Use Case Scenarios

From Filtered Push Wiki

Scenarios are stories that end with an accomplishment. Later, we will try to analyze them into Use cases (draft in: File:Fp use), of which there might be several in each scenario; the use cases are likely to be repeated and fewer in number than the scenarios. Don't hesitate to add scenarios that seem only a little different from those given here. Variant scenarios can be indicated with sub-numbering using ##.

See also LiteratureUserStories

Scenarios for taxonomist who can edit specimen records

  1. A taxonomist reviews a specimen record over the course of a few days, and updates certain fields. The record is the subject of collaborative editing.
    1. Taxonomist Tina opens a collection editor and selects a record.
    2. The system indicates that the record has a duplicate in the network ((Req: Query a set based on criteria that define the current record; Req: Define criteria; MetaReq: Define how to define criteria)).
    3. Tina inquires what institutions that participate in the network hold duplicates or potential duplicates for this record ((Req: Return attribute metadata of a known Set)).
      1. Actors: FindDuplicates, SomeOtherActor, ...
    4. The system replies with a list of institutions, with indications of the dates and authors of the last annotation of duplicates, if any. The list also shows the name and last annotation date for any person who annotated a duplicate. ((Req: Someone had to be able to annotate! Req: system can query about annotation))
    5. The list is ordered by date of last annotation (Req: able to query for date of annotation) and also contains an indication of whether the annotation work is still in progress (Req: need to know whether WIP is asserted).
    6. Tina filters the list to all in-progress items (ClientReq), <CommunityReq>and generates a message to all those on the list announcing her intention to work on the record, together with a request that people whose work in progress has had no changes for more than 30 days please release the WIP indication.</CommunityReq>
    7. She logs out indicating that she wants the network to send her email of any responses.
    8. By the next day, email has arrived from three of the four WIP participants saying that they have released their WIP.
    9. <ClientReq>She logs back on to the system,</ClientReq> <CommunityReq>re-opens her own WIP and asks for contact details of the recalcitrant.</CommunityReq>
    10. She telephones his office and a phone message says he is in Costa Rica and may see email every few days.
    11. <CommunityReq>She leaves her WIP in place</CommunityReq>, and <ClientReq>logs off the network</ClientReq> <NoteByRAM: why is there such a thing as "logging on to the network"? Paul: this is really about authentication of messages</NoteByRAM>.
    12. Three days later, a <CommunityReq>notification email arrives from the network</CommunityReq> indicating that she is now the only participant holding a WIP on the record.
    13. <CommunityReq req="ability to subscribe to enough to get the notice following:">She returns to the system, which tells her that an edit has been offered for push against the record she has a WIP on.</CommunityReq>
    14. <CommunityReq>She asks for details and learns that the recalcitrant meanwhile has ignored her WIP and has edited his own record to <AnnotationRequirement FP CORRECTION ASSERTION now FP_ANNOTATION>correct a place name spelling</AnnotationRequirement> and <ClientReq>pushed his edit</ClientReq>.</CommunityReq>
    15. The <CommunityReq>system tells her that <CommunityReq req="must have mechanism to accept">all other duplicates have accepted his push</CommunityReq>, and asks if she wants to accept it.</CommunityReq>
    16. She does so, and then adds an additional correction of the Collector Name spelling, which she pushes to the network. Then she releases her WIP and asks the system to order a sugar free vanilla latte.
  2. The scenario is the same, except that it ends with a tall drip half-decaf, with two pumps of mocha and one of peppermint syrup.
  3. This is the same, but ask for raspberry chai. How do we know who Taxonomist Tina is? See: Who authenticates where?
    Sequence diagram for login, authentication handled within native software
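
Steps 5 and 6 of the first scenario (ordering the list by last annotation date, filtering to in-progress items, and finding WIP claims that have sat unchanged for more than 30 days) could be sketched as below. The `WipEntry` record, its field names, and the sample rows are hypothetical and not part of any Filtered Push API; only the 30-day threshold comes from the scenario.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class WipEntry:
    # Hypothetical shape of one row in the list the system returns to Tina.
    institution: str
    annotator: str
    last_annotation: date
    in_progress: bool


def stale_wip(entries, today, max_idle_days=30):
    """Return in-progress entries idle longer than max_idle_days, oldest first."""
    cutoff = today - timedelta(days=max_idle_days)
    stale = [e for e in entries if e.in_progress and e.last_annotation < cutoff]
    return sorted(stale, key=lambda e: e.last_annotation)


# Made-up sample data for illustration only.
entries = [
    WipEntry("Inst1", "annotator1", date(2008, 3, 30), True),
    WipEntry("Inst2", "annotator2", date(2008, 1, 15), True),
    WipEntry("Inst3", "annotator3", date(2007, 12, 1), False),
    WipEntry("Inst4", "annotator4", date(2008, 2, 20), True),
]
for entry in stale_wip(entries, today=date(2008, 4, 1)):
    print(entry.institution, entry.annotator, entry.last_annotation)
```

The entries returned are the ones Tina would ask to release their WIP indication; a real client would presumably get `last_annotation` and the WIP flag from the network queries named in the Req notes.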

  4. Taxonomist Trudi wants to plan the next three months of annotation of uncataloged lichen specimens from the Morris 2007 Lake Woebegone expedition.
    1. She asks for an inventory of institutions with known or potential duplicates of the specimens held in her collection.
    2. A list is returned that indicates the institution, the number of specimens with putative duplicates, and the number of specimens with Work In Progress flags on them, together with the date of the last activity in any such WIP.
    3. The result convinces her that three participants are actively working on some of the specimens in her own list, and she schedules a Skype conference with the indicated participants to discuss coordinating the activity.

  5. Taxonomist Tanya knows that network participant Steve is working on a revision of the genus Woebegonia.
    1. She trusts Steve and authorizes her network node to push all the way to her database any relevant name changes proposed by Steve to specimens of Woebegonia, whether or not her specimens are putative duplicates.
    2. For those which are duplicates, her changes are sent to the network with an indication that they were "auto accepted".
    3. Later, Steve notifies the network that his revision has been published, and the publication data is suitably pushed to her database.
    4. She receives an email that this has been done.
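
Tanya's standing authorization in scenario 5 amounts to a trust rule keyed on annotator and genus. A minimal sketch follows, assuming a name change carries the annotator and the old and new names; `TrustRule`, `NameChange`, and the species epithets are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrustRule:
    # Hypothetical rule: accept this annotator's changes within this genus.
    annotator: str
    genus: str


@dataclass
class NameChange:
    annotator: str
    old_name: str
    new_name: str


def auto_accepts(rules, change):
    """True if some rule trusts this annotator for the genus of the old name."""
    parts = change.old_name.split()
    if not parts:
        return False
    genus = parts[0]
    return any(r.annotator == change.annotator and r.genus == genus for r in rules)


rules = [TrustRule("Steve", "Woebegonia")]
change = NameChange("Steve", "Woebegonia alba", "Woebegonia rubra")
print(auto_accepts(rules, change))
```

Note that the rule does not look at whether the specimen is a putative duplicate, which matches step 1; flagging the outgoing change as "auto accepted" (step 2) would be a separate step.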

BGM scenario

  1. At a conference, curator Charles learns that BioGeomancer has updated place names and georeferences for all of Borneo based on the completion of a project by the government of Borneo.
  2. He asks his network node to acquire from BGM all the updates of impacted placenames in his data. [Probably more realistic is: Charles asks network to acquire all recent name changes for Borneo. ]
  3. Charles' agent finds impacted records and does appropriate "warranty repairs" by adding annotations to Charles' data. [--Bob Morris 15:46, 1 April 2008 (EDT) HUH meeting]
  4. Charles' agent launches a push proposal onto the network documenting those changes.
  5. He receives notice from the network that 65,536 such proposals have been launched potentially affecting seven participating institutions.
  6. Agents at six of these indicated that they auto-accepted the pushes.
  7. These are logged locally for inspection by Charles at his convenience.
  8. The agent at one indicated that the annotations of all but three records were accepted because there were WIP locks on those three records, but that notice of the event had been queued for attention by a human, whose contact details were given.
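
The behavior of the receiving agents in steps 6 to 8 (auto-accept everything except records under a WIP lock, and queue those for a human) can be sketched as a simple partition. The dictionary keys and record identifiers here are assumptions made for the example.

```python
def apply_push(proposals, wip_locked_ids):
    """Split proposals into auto-accepted ones and ones queued for a human."""
    accepted, queued = [], []
    for proposal in proposals:
        if proposal["record_id"] in wip_locked_ids:
            queued.append(proposal)    # a WIP lock blocks automatic acceptance
        else:
            accepted.append(proposal)  # logged locally for later inspection
    return accepted, queued


# Hypothetical proposals: five georeference updates, one on a locked record.
proposals = [{"record_id": i, "field": "georeference"} for i in range(1, 6)]
accepted, queued = apply_push(proposals, wip_locked_ids={3})
print(len(accepted), "accepted;", len(queued), "queued for review")
```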

"learns that BioG. has updated...." more generally: this is like QC or perhaps "warranty" and maybe should extend QC usecase --Bob Morris 15:36, 1 April 2008 (EDT) HUH meeting
Implies FPN has a use case: GetNewGeoDataFromBGM

What sort of different roles are there?
possible differences between the allowed actions of different actors

Finding Duplicates

Finding Duplicates, The Henry Scenario

  1. James has a chat with Henry, a member of his curatorial staff, and tells him that he has made a commitment to produce a treatment of the genus Rubus for the Flora of North America.
  2. He asks Henry to database all of the specimens in the collection for 77 Rubus taxa.
  3. Henry agrees (what else was he going to do...) and logs on to Specify.
  4. He goes to a menu item that says "Search Network" which presents several search boxes categorized by theme.
  5. He chooses to search by taxon and types in Rubus.
  6. He is then presented with a list of taxa junior to Rubus and selects all the 77 taxa James requested and hits the submit button, hoping to have all network information available to him when he returns to the task.
  7. Henry then switches to counting and identifying bugs stuck on sticky traps as part of his IPM duties. A little later he returns to a message in Specify telling him that his request to the network has been completed.
  8. Henry then retrieves a physical specimen folder from James' collection of specimens containing one of the taxa James was interested in.
  9. Henry goes to the specimen rapid data capture page he designed in Specify and types in the collector name followed by the collector number.
  10. A pop-up box appears with a list of records from the collector he input and a highlighted record containing the collector number he input. This should represent a duplicate of the specimen of interest.
  11. With a click on the highlighted record he is presented with data that looks identical to the label data on his specimen. The information presented looks enough like his physical label that he has no problem recognizing it.
  12. He then pushes a button that uploads the data into his capture form.
  13. Henry laughs at how easy this job he just landed is and moves on to the next label. See also: Use case meeting whiteboard image
    A pseudo sequence diagram based loosely on this scenario
    A pseudo sequence diagram extending this story to include a push of an annotation

  14. On the next sheet, Henry types in the collector name and number and gets no match.
  15. He then adds the date and a pop-up tells Henry that the collector does have records on the network that are related based on the date of collection.
  16. Henry browses the list and realizes that one of the collections that has a number very similar to the one he is looking at appears to have the same location.
  17. Henry is overjoyed because there is also a georeference which saves him the hassle of having to figure it out. He submits to the local database.
  18. In pushing the submit button, Henry hopes that his input makes it through quality control and does NOT appear on James's list of records that need to be reviewed.
  19. A second scenario here is that the location data throws a flag indicating that the georeference does not match to some criterion.
  20. Henry checks the georeference using Biogeomancer and maps the record and realizes that it is indeed correct...
  21. ...OR Henry realizes that the georeference is incorrect and edits the georeference.
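
Henry's lookup in steps 9 to 16 first tries collector name plus collector number and, when that finds nothing, falls back to records by the same collector on the same date. A minimal sketch under those assumptions is below; the `Record` fields are hypothetical, and the sample values reuse the Macklin collection mentioned later on this page with an invented second record.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    # Hypothetical minimal view of a network specimen record.
    collector: str
    number: str
    collected: date
    locality: str


def find_candidates(records, collector, number, collected=None):
    """Exact collector+number matches first; otherwise same collector and date."""
    same_collector = [r for r in records if r.collector.lower() == collector.lower()]
    exact = [r for r in same_collector if r.number == number]
    if exact or collected is None:
        return exact
    return [r for r in same_collector if r.collected == collected]


records = [
    Record("Macklin", "345", date(1999, 8, 1), "Boston"),
    Record("Macklin", "346", date(1999, 8, 1), "Boston"),  # invented neighbor
]
print(find_candidates(records, "Macklin", "345"))
print(find_candidates(records, "Macklin", "3450", collected=date(1999, 8, 1)))
```

The fallback returns candidates only; deciding that a near-miss number with the same locality really is a duplicate stays with Henry, as in step 16.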

Same as above for taxon name.

  1. Record's locality returned is identical but the identification is different. This is very likely a typographical error. For older hand written labels this can be more problematic. Henry has difficulty discerning nasty handwriting and unfortunately only writes in two languages.

Finding Duplicates, Alien Data Sources

  1. Susan is satisfied that a consensus has been reached about all the duplicates visible in her FP network, but she wonders about these things:
    1. Are there duplicates discoverable from BiSciCol that have not participated in the consensus, and how can she remedy that?
    2. Ditto for ABCD data perhaps via AnnoSys

KOS musings about duplicates

Making Annotations

  1. Making Annotations, The James Scenario: James, the knower of all things Rubus, is curious how Henry is progressing. He opens a report in Specify that shows him that Henry input 37 specimens today. James then goes to a UI that shows him any records from Henry that were flagged. James notices that one of the records has a lat/long that is screwy, realizes that he has the field notes for this particular collection, and looks up the collecting event. He realizes that the lat/long is incorrect for all duplicates input and he makes a revision. He submits the change to the UI. James leaves, content that he has done a great service to his colleagues in notifying them of this mistake. Rich, my good friend at UMICH, goes to his Specify UI and sees that James has made a change; since it is on Rubus, Rich implicitly accepts it, knowing that James knows all in this regard.

    1. The next day, after sleeping on it, James wonders just how many similar errors there may be for Rubus records on the network. So, James uses the UI to see a list of all Rubus in taxon A according to the label identification. In looking through these James identifies a few more mistakes by mapping the georeferenced records. James also makes these corrections and submits them.
      pseudo sequence diagram for a variant of this scenario

    2. The next day, James again looks to the reporting agent that tells him what is new with records relating to his collection. James decides to spend a few hours looking at these. James sees there have been several entries on Aster which are new and interesting. James accepts these annotations and the associated metadata and flags them in the UI, which moves them to a print queue. When a page of these annotations has accumulated, the document is printed and James cuts the annotation labels out. James then takes the Aster annotations, goes into the collections, finds the corresponding sheets, glues the labels on, and refiles the specimens appropriately as identified.
      Sequence diagram for messages involved in an annotation and the acceptance or rejection of that annotation

  2. James does a great job of convincing the community to participate in the network (through using Specify or with other collection databases accessing the network through the same API). UMASS joins the network but does not use Specify. Immediately (with certain inherent latency...), the data within their database is indexed on several fields and a mass of messages begins to fill the reporting agent. The curator is both elated and overwhelmed by the number of annotations found on the network that appear to relate to specimens in the UMASS herbarium. The curator promises to dedicate the rest of his/her year to processing this wealth of botanical information. In beginning this daunting task, it becomes immediately obvious that some of the fields are not mappable to the UMASS collection database (e.g., metadata fields for geo-referencing, hybrid status, multiple annotations...). The curator is then faced with a decision: add the fields to the current database and include them in the GUI; switch to Specify, which would solve this problem; or choose not to store this additional information locally and query the network for it if required (note that with this choice they also would not contribute this additional information to the network). And the winner is...
  3. Use Case: Researcher Seeks DwC Metadata. A researcher from a museum that does not belong to the network wants to use specimen data to test hypotheses on global climate change. In pulling down lots of data from GBIF, he/she realizes that the Darwin Core does not provide metadata, including uncertainty, for its georeferenced specimen records. Seeking to obtain this additional information, he/she contemplates asking every major herbarium for data on the taxa of interest, which would include data outside of the Darwin Core. This would be just too much of a hassle for him/her and the curators of other herbaria, BUT... A colleague mentions the eTaxonomy Portal, which has a query interface to find detailed specimen data from all members of the network. The researcher runs a batch query and receives a large report as a CSV file with more than enough data to begin running models.
    1. Alternatively, the researcher may be interested in the taxonomy of a particular group of plants. The query to the network would produce an annotation history for the specimens of interest that show the stability (or instability) of a name over time. They could even note that certain collectors/annotators consistently called a plant 'X' over time or perhaps were consistently wrong in an identification.
  4. A taxonomist viewing an image in Morphbank of a specimen held in HUH (but deposited by someone else into morphbank) enters a new determination of that specimen into morphbank. The new determination (and the identity of the taxonomist) is sent to HUH and presented to Henry in his list of pending annotations. Henry accepts the annotation (notifying the network of this), prints a label, and associates it with the specimen. Another taxonomist queries the network and can find the specimen directly from HUH under the new determination.
    1. As in the above, except Henry never gets to the annotation. Another taxonomist queries the network and can still find the specimen under the pending annotation, even though HUH hasn't accepted the annotation.
    2. As in the above, except that James tells Henry to reject the annotation. Henry doesn't print a label, and tells the system to reject the pending annotation. The system tells morphbank that the annotation was rejected.
  5. A remote taxonomist views an image of a herbarium sheet containing three barcoded collection objects (collections in the botanical sense). The taxonomist decides that a previous annotator's association of a label with one of these collections is incorrect and that the label data should be associated with a different collection. The taxonomist files an annotation that disassociates the label data from one barcoded collection and associates it with another. This annotation is brought to Henry's attention; he pulls the sheet, examines the data, accepts the annotation, and prints a label, and the system changes the association between the label data and the collection object records.
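
Scenario 4 and its variants imply a small lifecycle for an annotation: it starts as pending, can be accepted or rejected once, and stays discoverable on the network while pending. The sketch below is one way to model that; the class, the status names, and the specimen identifiers are assumptions rather than the Filtered Push data model.

```python
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


class Annotation:
    def __init__(self, specimen_id, determination, annotator):
        self.specimen_id = specimen_id
        self.determination = determination
        self.annotator = annotator
        self.status = Status.PENDING  # every annotation starts out pending

    def resolve(self, accept):
        """Accept or reject a pending annotation exactly once."""
        if self.status is not Status.PENDING:
            raise ValueError("annotation already resolved")
        self.status = Status.ACCEPTED if accept else Status.REJECTED


def find_specimens(annotations, name):
    """Specimens findable under a name via accepted or still-pending annotations."""
    return sorted({a.specimen_id for a in annotations
                   if a.determination == name and a.status is not Status.REJECTED})


# Hypothetical identifiers and annotator labels.
a1 = Annotation("HUH-0001", "Carex grayii", "taxonomist1")
a2 = Annotation("HUH-0002", "Carex grayii", "taxonomist2")
a2.resolve(accept=False)
print(find_specimens([a1, a2], "Carex grayii"))
```

Printing a label and notifying Morphbank of a rejection would hang off `resolve` in a fuller model; they are left out here.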

Taxonomist.png draft use case diagram for solo uses by taxonomist or parataxonomist.

Taxonomist 1.png

See also: Image:Collection_manager_too_late_at_night.png

Request duplicates might be a subset of a more general class of analytical requests; another variant might be a request for outliers. It is a more general case of "In looking through these James identifies a few more mistakes by mapping the georeferenced records".

  1. Expert botanist Tony Reznicek at the University of Michigan sits down at his computer for another day of fun-filled Carex research. Tony begins by logging in to his favorite on-line resource, the Herbaria Collection Portal. Tony notes that there are now 53 member herbaria on the network. He navigates to the query-building search page and types "Carex" in the genus box, "grayii" in the species box, and "Massachusetts" in the state/province box and presses the search button. Tony is presented with several results in list format. Tony sorts the city field alphabetically and selects the first record under "Boston." Five records appear below which represent herbarium sheets from different institutions identified by their acronyms. Tony sees that this represents a set of duplicates which were collected by Macklin under his number 345 on August 1, 1999. Tony checks a box beside each record indicating that he would like to see more detail...

Additional Scenarios

See also: Network Monitoring Use Cases

Syndication Scenario

A taxonomist has a certain group or groups of taxa that he/she is interested in. Many taxonomists maintain their own databases of specimens and occurrences of the taxa they are interested in for several reasons such as continuously updating the distributions of these taxa, keeping track of pristine specimens, identifying outliers which may be significant etc. In an ideal world, it would be very useful for them to know what specimens have been captured for the taxa that they are interested in, and especially what annotations have been made on them. So, the taxonomist logs into the FP network and goes to their configuration page where they are provided with an advanced query form that allows them to select the breadth of taxa they are interested in and how much information they would like presented to them and how. This query is saved and fed to a syndication service which monitors the network for any changes that match the criteria in the query and sends results to an API which presents the taxonomist a result set. The taxonomist can control how the syndication service would notify them of interesting happenings in the network via e-mail, RSS, wiki, custom client, etc. The syndicated content can then be viewed by the taxonomist and annotated by them if there is reason.
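
The saved query and the choice of notification channels can be pictured as a predicate plus a dispatch step. This is a minimal sketch, assuming network events are dictionaries with a `scientific_name` key and that interest is expressed as a set of genera; none of these names come from an actual syndication service.

```python
from dataclasses import dataclass, field


@dataclass
class SavedQuery:
    # Hypothetical saved query: who is interested, in which genera, notified how.
    owner: str
    genera: set
    channels: list = field(default_factory=lambda: ["email"])

    def matches(self, event):
        parts = event.get("scientific_name", "").split()
        return bool(parts) and parts[0] in self.genera


def dispatch(queries, event):
    """Return (owner, channel) pairs that should be notified about the event."""
    return [(q.owner, ch) for q in queries if q.matches(event) for ch in q.channels]


queries = [
    SavedQuery("tony", {"Carex"}, ["email", "rss"]),
    SavedQuery("james", {"Rubus"}),
]
event = {"kind": "annotation", "scientific_name": "Carex grayii"}
print(dispatch(queries, event))
```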

Field Biologist Images Scenario

A scientist in the Kruger National Park connects her camera and SD cards to her laptop after five days in the field gathering thousands of images. Her laptop brings up a local web form that helps add [MRTG metadata]. Then, upon plugging into the internet, everything is automagically contributed to Morphbank, with metadata available through [IPT] indexers. Furthermore, every time she plugs in again she gets notices of annotations on her images, and also gets some kind of usage report of all her previous contributions. There are two kinds of IPT portals in play, participating as Filtered Push nodes in specialized Filtered Push networks. The first is an FP net devoted to conservation and management. Annotations in that network are dedicated to those concerns. The second is dedicated to taxonomic concerns. The two are operated independently as an administrative convenience and to minimize traffic and filtering due to the small intersection of communities, which overlap mainly in the data contributors. Morphbank is not a node in the former, nor is it a node in the taxonomic network, but her laptop is a temporary, opportunistic node in both. --Bob Morris 23:10, 26 February 2009 (UTC)

EOL annotation Scenario

From discussion with Patrick, NOMINAIII, Margaret River, 2008Oct17. A logged-in EOL user makes an annotation to an element on a species page. EOL knows the semantics of the annotated element displayed on the page, the source of the content of the annotated element, and the custom mapping used to map the semantics from the provider of that content to EOL's species page semantics. EOL can generate a targeted message to the original provider with the old content, the new content, and the annotator, and direct it at the specific mapped concept in the custom integration between the provider and EOL (or a standard mapping such as the SDD or Species Profile Model).

Scenarios for Bulk Data Loading

  1. Cam goes into the field and makes a collection of 10,000 herbarium sheets, enters their data into a local database, prints out labels, associates them with the specimens, and ships the 10,000 sheets to A. Cam also sends all the data for the 10,000 sheets to A as a push. Henry sees the incoming batch of data, and accepts it en masse into the A database.

Scenarios for Quality Control

  1. Set of duplicates with different current determinations (a special case of dependency of records).
  2. character dependency based validation
    1. If a set of characters (may contain only one character) has some kind of correlation with another set of characters, we may use a consistency check of the values of those two sets of characters to measure the quality of a record. For example, longitude and latitude have a very close relationship with country name, city name, etc. Inconsistency between them should raise flags.
    2. Some thought from Richard Moe (Aug 2009):
      1. If coordinates don't lie within a county's outline, then either the county or the coordinates, or both, are wrong (and always keep in mind the "or both"). Same with elevation. Analogous checks with collection date and birth and death date of collector.
      2. If a taxon occurs above or below its postulated elevation or geographic range, then the range is incomplete or the identification is wrong or the data points are wrong or mistranscribed.
      3. If a collector is associated with a record from outside his stomping grounds, then something is wrong. Joseph Tracy collected thousands of specimens from northern California, none from southern California.
      4. If the distance between collection sites for collections supposedly made on the same day is greater than some constant (variously determined), then the date is wrong or the coordinates are wrong or the collector is wrong.
      5. All records made on the first day of any month, and particularly on the first day of the year, are suspect, since sometimes software has assigned 1 as the day when no day is provided. We had a lot of records from January 1 from the Sierra Nevada, nearly all assignable only to the year, since most plants in the region are covered by several feet of snow on Jan 1. You can identify a lot of these first-day problems by seeing if there are records by that collector from the previous or following day.
      6. Heteroduplicates: if supposed duplicate specimens don't have the same name, then there are nomenclature differences (possibly not a data problem) or data problems causing non-duplicates to be treated as duplicates.
      7. If supposed duplicates don't map in the same place, then there are problems.
      8. Sources of label data error are many, and by no means restricted to keyboarding. There are often multiple errors that contribute to an anomaly. Errors aren't uniformly distributed: TRS coordinates are very hard to transcribe accurately (and in California there is the multiple meridian problem). Elevations are often gross estimates. Collectors and other contributors to information in herbaria are not uniformly reliable.
      9. On top of everything is the possibility of database error, by which I mean error caused by the database software. Horrible things can be imagined. Here's something not so horrible, but still totally unexpected. Our (i.e. SMASCH) collector number was designed to have 3 fields: a text field for prefixes, a numeric field for the actual number, and a text field for suffixes. This takes care of the kind of numbering most collectors use, and allows proper sorting. So if I use something like RLM-1234B, RLM- goes into the prefix, 1234 into the numeric field, and B into the suffix. More complicated numbering schemes produce problems, as different data enterers parse up a number differently. What no one ever noticed, until I realized it recently, is that 1234-9 satisfies the restrictions of a numeric field ... and gets stored as 1225. I only found this out when I wondered about some negative collector numbers.
    3. If a set of records has some correlation among themselves, we can also use this to do consistency checks. The only case Zhimin can think of with half imagination is relations of data records maintained by the biSciCol project.
  3. dependency finding:
    1. Domain expert may define some kind of rules
    2. Let machine learn it
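
One of Richard Moe's checks above, flagging collections by one collector on one day whose sites are implausibly far apart, lends itself to a short sketch. It uses the standard haversine great-circle formula; the 100 km default stands in for the "some constant (variously determined)" of the text and is purely an assumption, as are the record keys and sample coordinates.

```python
import math
from collections import defaultdict
from itertools import combinations


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def same_day_outliers(records, max_km=100.0):
    """Pairs of record ids by one collector on one day lying more than max_km apart."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["collector"], rec["date"])].append(rec)
    flagged = []
    for recs in groups.values():
        for a, b in combinations(recs, 2):
            if haversine_km(a["lat"], a["lon"], b["lat"], b["lon"]) > max_km:
                flagged.append((a["id"], b["id"]))
    return flagged


# Invented records: two sites a few km apart, one on the other side of the continent.
records = [
    {"id": 1, "collector": "X", "date": "1999-08-01", "lat": 42.36, "lon": -71.06},
    {"id": 2, "collector": "X", "date": "1999-08-01", "lat": 42.37, "lon": -71.11},
    {"id": 3, "collector": "X", "date": "1999-08-01", "lat": 34.05, "lon": -118.24},
]
print(same_day_outliers(records))
```

A flag here says only that the date, the coordinates, or the collector is wrong, not which one; that judgment stays with a person.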

Scientific Name Validation

    1. Misspelling
      1. check against a name service which does name reconciliation (Global Names Index); this would be the first action before going on to comparing against a taxon-based vocabulary.
      2. check against a controlled taxon-based vocabulary
      3. check against a controlled vocabulary which includes oddly constructed names (e.g. M. maclurea), or odd abbreviations for infraspecific ranks (e.g., nothovars., f./form/forma/formas)
      4. example: (JH) Search MCZbase (ornithology collection) for the genus Phaethon. Then repeat the search for Phaeton. I'm not an ornithologist, but I'll bet my monthly rent that the latter is a misspelling of the former.
        1. service-based: ITIS, Catalogue of Life, International Plant Name Index (IPNI), ZooBank...
        2. list-based: Need to store the list to check against e.g., research-specific lists
    2. Homonyms
      1. a genus name governed under one code of nomenclature is identical to another
        1. use a homonym reconciliation service or controlled vocab
    3. Genus/species match
      1. use controlled vocabularies and services above to verify a known match for genus and species (note that you may need or want to cross reference against several of these to ensure better coverage of combinations)
        1. issue that a relatively new combination may not appear in the controlled vocab(s)
    4. Synonymy
      1. the genus-species combination given is not the currently accepted name [JAM: This is not necessarily about QC but is a valuable service in providing additional information to the user about updates to the taxonomy of the records they provided; could be seen as an automated new determination from a given source]
        1. issue that not all controlled vocabs have synonymy information
        2. issue that controlled vocabs may conflict in opinion as to what is accepted
        3. example: (JH) Search MCZbase (herpetology collection) for Rana catesbeiana, the American bullfrog. Then search for the same species in Amphibian Species of the World. The page that comes up identifies Rana catesbeiana as a synonym for Lithobates catesbeianus, a new name that was erected in 2006 and regarded by some as the appropriate, valid name.
    5. Author
      1. Fill in missing author name (if unique).
      2. if author names are included in taxon name string they will need to be parsed and handled appropriately
        1. use a scientific name string parser
        2. can validate authors against controlled vocabularies (e.g., Harvard Index of Botanists) and author abbreviations (e.g., the publication: R. K. Brummitt & C. E. Powell, ed. (1992). Authors of Plant Names: a List of Authors of Scientific Names of Plants, with Recommended Standard Forms of their Names, Including Abbreviations. Royal Botanic Gardens, Kew. ISBN 0-947643-44-3.)
    6. GUID from nomenclatural authority
      1. Lookup the GUID provided by a nomenclatural authority for the nomenclatural act that formed the name (e.g., International Plant Name Index (IPNI), ZooBank)

Collecting Event Date Validation

    1. Date-specific
      1. Can we obtain a date from the record?
        1. Is DwC eventDate populated? Is the date in an understandable format (ISO 8601)? Is the date within range (i.e., month <= 12 or day <= 31...)?
        2. Is DwC verbatimEventDate populated? If it can be parsed, does it agree with eventDate? If there is no eventDate, populate it.
        3. Note that in addition to eventDate, there are fields in DwC for startDayOfYear, month, day, year.
        4. Also note that dates might be presented in more granularity than the source data had. Beware of dates like Jan. 1. 19xx, this may mean the original data only had a year value.
    2. Lifespan-based
      1. Was the specimen collected within the lifespan of the collector?
        1. the eventDate (based on year) falls outside the period when the collector was alive; if the specimen was collected before s/he was born (or before s/he was 15 years old) or after s/he died, then the date is suspicious.
        2. requires a service/control for lookup of collector's birth/death dates (e.g., Harvard Index of Botanists); note that a list like this only exists for botanists as far as I know. [Paul: Note that a QC test on HUH data shows 6988 records where the date collected falls outside the birth/death dates for the collector. (This could also be a problem matching to the correct collector, or the collector's dates have the wrong type: there's birth/death, flourished, and collected types.)]
    3. Geographically-based
      1. Are the collecting dates similar for two or more specimens that were collected really far from one another?
        1. requires the definition of what "really far away" is by using a polygon/bounding box/radius
    4. Collector number-based
      1. Does the collecting number (DwC recordNumber or possibly fieldNumber) coincide with the eventDate?
        1. If the collection number on a given date is "100" and a few days later, the collection number for a given collector is "1355" then this would be suspicious. Obviously, the more specimens from the same collector you have data for, the better you will be able to decide whether an error has been made. We could index several relevant fields for collectors based on the digitized records we can access and use this as the control [we need to do this for duplicate specimens as well]
    5. Phenology-based
      1. Does the timing/date of the event coincide with the reproductive state [this is primarily a botany example but timing is also very important in animals, for example emergence dates of insects]?
        1. if eventDate is in springtime and the plant collected is fruiting then date could be incorrect/inaccurate
        2. need phenological data for a given taxon [Lei used data from Flora of North America as an example, which James provided for her]
    6. Annotation-based
      1. Is the eventDate earlier than any annotation date provided?
        1. A simple check against any annotations (digitized or born digital) which contain dates (dateIdentified and/or possibly a date stamp in OA for FP-based annotations); If the eventDate is July 1, 1980 and there is an annotation date of Jan 1, 1945 then the eventDate is suspicious and may be erroneous (of course, the annotation date may contain the error...)
    7. Record metadata-based
      1. Is the eventDate earlier than the date that the digital record was created?
        1. A simple check against the record-level metadata would validate this. Unfortunately, in DwC we only record the date last modified (=modified), so this may not be the date of the original digitization event...
    8. Other
      1. There are 3 other date-based fields in DwC that may be useful/relevant: georeferencedDate, relationshipEstablishedDate, measurementDeterminedDate
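
Several of the checks above reduce to simple comparisons once the control data is available. The following is a minimal Python sketch, not an FP implementation. The function names, the 15-year minimum age parameter, and the assumption that birth/death years arrive from some collector lookup service are all illustrative.

```python
from datetime import date
from math import asin, cos, radians, sin, sqrt
from typing import Iterable, Optional


def outside_lifespan(event_year: int, birth_year: Optional[int],
                     death_year: Optional[int], min_age: int = 15) -> bool:
    """Lifespan check: True if the collecting year looks suspicious.

    birth_year and death_year are assumed to come from a collector lookup
    service; None means the value is unknown and is not tested.
    """
    if birth_year is not None and event_year < birth_year + min_age:
        return True
    if death_year is not None and event_year > death_year:
        return True
    return False


def later_than_annotation(event_date: date,
                          annotation_dates: Iterable[date]) -> bool:
    """Annotation check: True if any annotation predates the eventDate."""
    return any(a < event_date for a in annotation_dates)


def distance_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle (haversine) distance, usable for a radius-based 'really far'."""
    earth_radius_km = 6371.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    h = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * earth_radius_km * asin(sqrt(h))


# Example from the text: eventDate 1 July 1980, annotation dated 1 Jan 1945.
suspicious = later_than_annotation(date(1980, 7, 1), [date(1945, 1, 1)])  # True
```

A flag from any of these functions only marks the date as suspicious; as noted above, the error may lie in the control data rather than in the eventDate.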

Scenarios for DBAs

  1. Specify DB - it just works....
  2. Non-Specify biodiversity collections DB
  3. Non-biodiversity collections DB (working with a different set of concepts).

Scenarios for Network Administrators

  1. Lucinda at RSABG wants to join the network. Lucinda sets up a network node and generates an authentication token. Lucinda sends a request to join the network to James at HUH. James says sure, verifies Lucinda's authentication token over an out of band channel, and authorizes Lucinda's node to read all network data and push changes into the network.
    Possible Sequence for adding a node to the network

  2. Stan at CAS wants to query the network. He can do so directly, but is not authorized to view information marked as private to the network such as detailed locality records for endangered species. This isn't good enough data for Stan's work, so he sets up a network node, generates an authentication token, and sends a request to Bob at UMB to join the network in a no-push capacity. Bob says OK, verifies Stan's authentication token over an out of band channel, and authorizes Stan's node to read all network data.
  3. Removal of nodes?
  4. Downgrading nodes?
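
The join scenarios above can be sketched as a small registry that grants read-only or read-and-push access once a token has been verified out of band. This is a minimal Python sketch under stated assumptions: `NetworkRegistry`, `fingerprint`, and the permission names are hypothetical, and a truncated SHA-256 digest stands in for whatever the administrators actually compare over the out-of-band channel.

```python
import hashlib
import hmac
from dataclasses import dataclass, field
from typing import Dict, Set


def fingerprint(token: str) -> str:
    """Short digest of a token that can be read out over an out-of-band channel."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()[:16]


@dataclass
class NetworkRegistry:
    """Hypothetical record of which nodes may read and/or push."""
    pending: Dict[str, str] = field(default_factory=dict)          # node -> token
    permissions: Dict[str, Set[str]] = field(default_factory=dict)  # node -> rights

    def request_join(self, node: str, token: str) -> None:
        """A new node submits its authentication token with its join request."""
        self.pending[node] = token

    def authorize(self, node: str, confirmed_fingerprint: str,
                  allow_push: bool) -> bool:
        """Grant access only if the fingerprint confirmed out of band matches."""
        token = self.pending.get(node)
        if token is None:
            return False
        if not hmac.compare_digest(fingerprint(token), confirmed_fingerprint):
            return False
        del self.pending[node]
        self.permissions[node] = {"read", "push"} if allow_push else {"read"}
        return True


registry = NetworkRegistry()
registry.request_join("RSABG", "lucinda-token")   # scenario 1: read and push
registry.request_join("CAS", "stan-token")        # scenario 2: no-push
registry.authorize("RSABG", fingerprint("lucinda-token"), allow_push=True)
registry.authorize("CAS", fingerprint("stan-token"), allow_push=False)
```

In this sketch, removing or downgrading a node would amount to deleting or shrinking its entry in `permissions`; whether the network should support that is the open question in items 3 and 4.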

Administrative Use Case Questions

The answers to questions starting with "who" might be "sysadmins," "developers," "collection managers", "end users" (i.e. someone in front of a web browser).

  1. Who obtains the software?
  2. How is the software obtained? Download from SourceForge? Download from FP wiki? Private ftp site?
  3. On what platforms will the software be installed? Linux/Mac/Windows? Server or workstation, laptop or mobile device?
  4. Who is expected to do installs?
  5. Should the install process be driven by a GUI ("setup wizard"), or be a command-line script? Should we create a package for some particular flavor of Linux?
  6. How should updates be accomplished? Should the software check for updates?
  7. Who deals with maintaining the software prerequisites (e.g. Glassfish)?
  8. How often will schema/ontology changes occur? I.e., how often will Apple Pie's rules change?
  9. For the prerequisite software (e.g. Glassfish), will these components be available for use by other software, or are they private to Filtered Push?
  10. How many software installations are required per network?
  11. Who has access to install, configure, update, and add "capabilities?" Can this be done via web interface, or only by access to the machine?
  12. How does one discover and add "capabilities?" Can capabilities be removed?
  13. Will all network members be known at install time? Can new members be added later? Can members be removed?
  14. Do we need to be able to move an installation from one machine to another? Do we need to have a mirror set up at install time? Do we need to have a way to export data? Do we need to have a way of selectively exporting data, or is it an all-or-nothing thing?
  15. Backwards compatibility?
  16. Will the different programs (client/driver/node) need to always operate with every version of each other program?
  17. Do we need to support removing annotations? For example, if someone wants to demonstrate how to use the software and puts in a "fake" annotation, do we need to be able to support removing single annotations? Do we need to support removing annotations made during a particular time frame? Do we need to support "trial runs," purging data used during an initial training period?
  18. What would be involved in uninstalling the software?

Scenarios for malicious uses

  1. External attacker E inserts an annotation into the system containing a SQL injection payload. The network rejects and logs the exploit attempt. See:
    Adding signatures to messages should help with this

  2. External attacker E inserts packets into the system containing fuzzing data (overlong, wrong type, integer overflow) and protocol timing attacks. The network recognizes and rejects these packets.
  3. Overstressed junior faculty member C inserts annotations into the system under the apparent credentials of other highly respected workers to create an interesting pattern in the data. C then mines the network in an open manner, finds the pattern and publishes it in a high impact publication.
  4. External attacker runs code on a computer network node that injects a malicious payload into the network by accessing the API for the local node as if it were the local user interface software.
    Sequence diagram for this scenario
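
Two of the defenses implied above, message signatures and treating annotation content strictly as data, can be sketched together. This is a minimal Python sketch under assumptions: the `annotation` table, the per-sender key dictionary, and the function names are hypothetical, and an HMAC with a shared secret stands in for whatever signature scheme the network would actually adopt (a public-key signature may be more appropriate).

```python
import hashlib
import hmac
import sqlite3
from typing import Dict


def sign(body: str, key: bytes) -> str:
    """HMAC-SHA256 tag over the message body (shared-secret stand-in for a signature)."""
    return hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()


def accept_annotation(conn: sqlite3.Connection, keys: Dict[str, bytes],
                      sender: str, body: str, signature: str) -> bool:
    """Store the annotation only if the signature matches the sender's key."""
    key = keys.get(sender)
    if key is None or not hmac.compare_digest(sign(body, key), signature):
        return False  # unknown sender or forged credentials: reject (and log)
    # Placeholders keep the body as data, so embedded SQL is never executed.
    conn.execute("INSERT INTO annotation (sender, body) VALUES (?, ?)",
                 (sender, body))
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE annotation (sender TEXT, body TEXT)")
keys = {"tina": b"hypothetical-shared-secret"}
payload = "x'); DROP TABLE annotation; --"
accept_annotation(conn, keys, "tina", payload, sign(payload, keys["tina"]))
```

The injection payload ends up stored as ordinary text, and an annotation submitted under someone else's name (scenario 3) fails the signature check unless the attacker also holds that person's key.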

Scenarios for NLP training sets

Suggested by discussion at HERBIS Imaging and data capture meeting at FMNH, 2008 Jun 27.

  1. Gordon generates a natural language processing training set for a set of herbarium sheets for a collector at the University of Alaska herbarium and distributes this training set to other herbaria that are using NSP to parse data off of images, as they might hold specimens with labels from this collector. Subsequently another collection starts using this training set and makes corrections that improve the quality of the set. That collection then submits its corrections back to the other sources that are using this training set.

Scenarios for CBoL

Here are some suggestions generated by a phone conversation with Jim Beach

  1. Specimen Collection Manager Maggie barcodes all her specimens of genus Aus. Her collection management system is a Filtered Push node and is configured to launch an FP message announcing these names to all subscribers to the 'Aus Determinations' message queue (AusDMQ), giving the barcode GUID, the taxon concept assigned in her collection, and perhaps other data. Sven, a subscriber to AusDMQ, has a filter set to notify him of messages alleging a barcode for any of the putative Aus specimens he holds. If any of his current barcodes contradict the concepts that the message asserts, his filter launches a query against nomenclators to ascertain whether any of his names and Maggie's are synonyms. This data is offered to Sven for human mediation, e.g., discussion with Maggie, generating an annotation, changing his determination, launching an opinion of disagreement into the FPnet, etc.
  2. As above, but Sven has no barcodes at all, so he accepts them all with an annotation as to the source of the determination. Thus the queue acts as a subscription list through which interested parties find out (through a push) when a particular species has been barcoded.
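
The queue-and-filter behaviour described in these scenarios can be sketched as a tiny publish/subscribe loop. This is a minimal Python sketch, not the FP messaging API: `MessageQueue`, `BarcodeMessage`, and the `specimen_key` used to match Maggie's specimens to Sven's are all hypothetical, and the nomenclator query is left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class BarcodeMessage:
    barcode_guid: str
    specimen_key: str  # hypothetical key shared by duplicates, e.g. collector + number
    taxon: str         # taxon concept asserted by the sender


Filter = Callable[[BarcodeMessage], bool]
Handler = Callable[[BarcodeMessage], None]


class MessageQueue:
    """Delivers each published message to subscribers whose filter accepts it."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._subscribers: List[Tuple[Filter, Handler]] = []

    def subscribe(self, message_filter: Filter, handler: Handler) -> None:
        self._subscribers.append((message_filter, handler))

    def publish(self, message: BarcodeMessage) -> None:
        for message_filter, handler in self._subscribers:
            if message_filter(message):
                handler(message)


# Sven's local determinations, keyed the same way as the messages.
sven_determinations: Dict[str, str] = {"Gray 3731": "Aus bus"}
needs_review: List[BarcodeMessage] = []


def contradicts_sven(message: BarcodeMessage) -> bool:
    """Accept only messages about specimens Sven holds under a different name."""
    held = sven_determinations.get(message.specimen_key)
    return held is not None and held != message.taxon


aus_dmq = MessageQueue("AusDMQ")
aus_dmq.subscribe(contradicts_sven, needs_review.append)
aus_dmq.publish(BarcodeMessage("guid-1", "Gray 3731", "Aus cus"))  # queued for Sven
```

Messages that pass the filter are only collected for human mediation here; the synonym lookup and any resulting annotation would hang off the handler.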

--Bob Morris 19:51, 28 December 2008 (EST)

Scenarios for publication annotation


Scenarios for Web Client

FP Web Client Scenarios

Users have three main roles:

  1. Data capture; find duplicates (the Filter)
  2. Collection Management of annotations (the Push)
  3. Research use (“I am interested in”)

Role 1: Data capture

Jason, a student at the XXX herbarium (which does not have an FP client in-house), has been asked to capture the data associated with the Acer (Maple) specimens in the collection. The curator mentions that there may be knowledge about duplicate specimens and annotations in the FP network that could benefit his work. He goes to the search page and creates an account in order to log into the web client. He uses the simple search for “Acer” and is notified that there are thousands of records that may take some time to aggregate, along with a suggestion to further limit his search. The specimens are ordered in the cabinet alphabetically by species, so he looks at the first folder and finds the species “amplum.” He uses the advanced search to add the species name and is presented with a “thinking icon” so that he is assured that a result set is coming. When the set is complete, a prompt asks how he would like to view the results: either on screen using a choice of fields, or as a CSV file that he can download. He chooses the on-screen version and selects to have it sorted by collector and collector number. By luck, the first specimen whose data he chooses to capture, with collector name Gray and collector number 3731, is found on the result list. He then...

  1. Chooses to view this entire record and uses cut and paste to populate the data form in his client.
  2. Tags the record on-line to download later as a CSV file that he can import into his database semi-directly.

While examining this specimen Jason realizes that the county name is misspelled, based on a gazetteer he consulted. He makes the change in the XXX client but also clicks on the 'annotate' button at the beginning of the record presented in the web client. This opens an annotation window which already has Jason's name, institution and time stamp along with the record GUID. Several drop-down boxes allow him to choose fields for which he has corrections/additions, so he chooses the county field and types in the correctly spelled county. Because he populated the county field, another drop-down box appears allowing him to add other metadata based on how/why he made the change. In this case he chooses “gazetteer.” He then chooses “okay” and goes back to the list and moves on to the next specimen.

Aside: Two other possibilities associated with this scenario:

  1. The record from the FP network has an annotation associated with it that is automatically produced by quality control analysis tools which suggest that the county does not exist and that a likely match is “X.” The human filter, Jason, can then accept this annotation, further legitimizing it.
  2. There is more than one duplicate in the network and a consensus record is presented which aids Jason in making decisions towards capturing the highest quality data possible. [there should be a link here to the place that consensus records are discussed]
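
The annotation window in this scenario (pre-filled name, institution, time stamp and record GUID, plus a chosen field, its new value, and the reason for the change) suggests a simple record structure. This is a minimal Python sketch with hypothetical field names and example values; it is not the FP annotation format.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class FieldCorrection:
    """One correction to one field of one record (hypothetical layout)."""
    record_guid: str   # pre-filled from the record being viewed
    annotator: str     # pre-filled from the logged-in account
    institution: str   # pre-filled from the logged-in account
    field_name: str    # chosen from a drop-down, e.g. "county"
    new_value: str     # typed by the annotator
    evidence: str      # how/why the change was made, e.g. "gazetteer"
    created: str       # ISO 8601 time stamp, set when the window opens


def new_correction(record_guid: str, annotator: str, institution: str,
                   field_name: str, new_value: str,
                   evidence: str) -> FieldCorrection:
    """Build a correction stamped with the current UTC time."""
    created = datetime.now(timezone.utc).isoformat()
    return FieldCorrection(record_guid, annotator, institution,
                           field_name, new_value, evidence, created)


def to_json(correction: FieldCorrection) -> str:
    """Serialize the correction for pushing to the network."""
    return json.dumps(asdict(correction), sort_keys=True)


# Illustrative values only; the GUID and county name are made up.
example = new_correction("urn:example:specimen:1", "Jason", "XXX",
                         "county", "Middlesex", "gazetteer")
```

Keeping the evidence alongside the new value lets a later reviewer, human or automated, weigh the correction without contacting the annotator.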

Role 2: Collection Management

As part of her weekly duties, Maureen runs a query using the FP web client for all the annotations that have accumulated for GUIDs that are based on herbarium XXX (which does not have an in-house FP network client). To do this she selects the herbarium and then the time frame she is interested in. Since she is logged in, the client knows when her last log-in was and asks if she wants to download all records since that time or is interested in another time frame. Maureen selects all since her last log-in and selects to view the annotations on the screen instead of downloading them as a CSV file. She begins by sorting the records based on taxonomy and then geography. She then...

  1. goes through the entire list selecting the records that she feels are relevant to the specimens housed in the collection and downloads these as a CSV file that she can then hand to Jason, her student helper, to make the updates in the XXX database client.
  2. goes through the entire list selecting the records that she feels are relevant to the specimens housed in the collection and uses the on-line PDF annotation label producer to generate labels, which are then printed and given to Jason to attach to the relevant specimens.
  3. goes through the entire list selecting the records that she feels are relevant to the specimens housed in the collection, and these are simply stored in the annotation store for reference (either the XXX database does not have the relevant fields and thus cannot store them, or the CM decides not to store them in the database as it is too time consuming at this particular time).
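
Maureen's weekly routine (select by herbarium and time frame, sort by taxonomy and then geography, optionally export as CSV) can be sketched as follows. This is a minimal Python sketch assuming annotations arrive as dictionaries; the `herbarium` and `created` keys are hypothetical, while `scientificName`, `stateProvince` and `county` follow Darwin Core term names.

```python
import csv
import io
from datetime import datetime
from typing import Any, Dict, List

Annotation = Dict[str, Any]


def annotations_since(annotations: List[Annotation], herbarium: str,
                      since: datetime) -> List[Annotation]:
    """Annotations for one herbarium made after `since`, sorted by
    taxonomy (scientificName) and then geography (stateProvince, county)."""
    rows = [a for a in annotations
            if a["herbarium"] == herbarium and a["created"] > since]
    return sorted(rows, key=lambda a: (a["scientificName"],
                                       a["stateProvince"], a["county"]))


def to_csv(rows: List[Annotation], columns: List[str]) -> str:
    """Render the selected rows as CSV text, keeping only the given columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Passing the time of the last log-in as `since` gives the default behaviour described above; any other time frame is just a different `since` value.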

--James Macklin 20:16, 2 February 2010 (UTC)

Role 3: Research Use

Dr. Tony Reznicek, an expert in the all-too-complicated genus Carex, is writing a monograph for North America. To complete this work he must examine thousands of herbarium specimens and compare a huge volume of literature on Carex. As this is a lengthy process, Tony is constantly discovering new information which will help him discern taxa and better understand their distributions. Tony is not the only expert in Carex and relies heavily on the opinions of other researchers.

Tony has recently discovered the FP web client and believes that this will be a very efficient way for him to gain knowledge about Carex specimens, see any annotations relating to these specimens, and contribute annotations to the benefit of others. Tony logs onto the network and proceeds to a page which has a set of stored queries that Tony selected when he first started using the network. Initially, Tony used the simple and advanced searches to define several queries which he could use to assess information on Carex. His queries are the following...

  1. He selects a query which asks the network for all of the new specimens of Carex added to the network since his last login (could have date/time control functions).
  2. He selects a query which asks the network for all annotations that contain the genus Carex.
  3. He selects a query which asks the network for all annotations that have been made to his specimens.
  4. He selects a query which asks the network for all Carex specimens found in Michigan.

  • Note: All of the above could be answered by one larger query and then sorted/filtered for only what Tony is interested in during that particular session. Thus the query would be "tell me all the network knows about Carex." This large query could also be run a first time and then appended to as annotations and specimens are added to the network. This avoids network latency issues.
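
The "run once, then append" idea in this note can be sketched as a stored query that remembers when it last ran and asks the network only for newer records. This is a minimal Python sketch: `StoredQuery` and the `fetch` callable are hypothetical stand-ins for whatever call the web client would make to the network.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Callable, Dict, List, Optional

Record = Dict[str, Any]
# Hypothetical network call: return records added after the given time,
# or everything when the time is None (first run).
Fetch = Callable[[Optional[datetime]], List[Record]]


@dataclass
class StoredQuery:
    name: str
    fetch: Fetch
    results: List[Record] = field(default_factory=list)
    last_run: Optional[datetime] = None

    def refresh(self, now: datetime) -> List[Record]:
        """Append records added since the last run and return only the new ones."""
        new_records = self.fetch(self.last_run)
        self.results.extend(new_records)
        self.last_run = now
        return new_records

    def select(self, **criteria: Any) -> List[Record]:
        """Filter the cached results locally, e.g. select(stateProvince="Michigan")."""
        return [r for r in self.results
                if all(r.get(k) == v for k, v in criteria.items())]
```

Tony's four queries then become local `select` calls over one cached Carex result set, with `refresh` standing in for "pick up where you left off".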

The record sets generated based on the queries could then be examined by Tony either on the web record by record, or downloaded as a CSV file, as described in the former scenarios. The records in the web client could also be annotated by Tony based on his knowledge and pushed back into the network as outlined previously.

An alternative scenario here could be based on geography. Tony also has a long standing interest in the flora of Michigan. Thus, Tony also saves a general query which captures all specimens in the network that are asserted to be within Michigan. Tony then sorts this query by county and adds records to his distributions based on this new knowledge.

  • A key commonality for all scenarios is the need to pick up where you left off. This could be done automatically based on last login etc. or a choice by users based on date/time span they are interested in.

--James Macklin 19:51, 11 February 2010 (UTC)

Scenarios for schema mapping

See also: Darwin_Core_to_Specify_Mapping, Arctos to Darwincore, Mapping types

  1. A user queries a local application for some information item. Since the local application does not have all the data to answer the question, it initiates a call to the network with the query transformed into a global schema. The network then tries to answer it by querying some global service and by querying data providers, which requires another round of query transformation from global to local (the answers need to be in the global schema). After that, the network integrates and analyzes the results and returns an answer in the global schema to the local application, which transforms that answer back into its local schema.
  2. The network notifies a curator that a specimen under his or her care has a new determination from a respected source. The curator decides to accept the new name, a name+author+year string in a global schema. The local schema treats name and authorship as different fields, and names and authors are also managed in two tables. If the system can find entries corresponding to the name and the author in the annotation, it can automatically change the specimen record by referencing those entries; if it cannot find corresponding entries, it needs human intervention to figure out how to update the database (it could change the existing entries referenced by the specimen record or insert new records into the taxon and author tables).
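
The local-to-global and global-to-local transformations in scenario 1 can be sketched as a pair of field-name mappings. This is a minimal Python sketch: the local column names are hypothetical, the global names are Darwin Core terms, and real mappings (such as the split name/author case in scenario 2) would need more than renaming.

```python
from typing import Any, Dict, Tuple

# Hypothetical local column names mapped to Darwin Core terms.
LOCAL_TO_GLOBAL: Dict[str, str] = {
    "taxon_name": "scientificName",
    "collector": "recordedBy",
    "coll_no": "recordNumber",
    "date_collected": "eventDate",
}
GLOBAL_TO_LOCAL: Dict[str, str] = {g: l for l, g in LOCAL_TO_GLOBAL.items()}


def translate(record: Dict[str, Any],
              mapping: Dict[str, str]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Rename keys according to `mapping`.

    Returns (translated, unmapped); unmapped fields are kept aside so a
    person can decide what to do with them instead of silently dropping them.
    """
    translated: Dict[str, Any] = {}
    unmapped: Dict[str, Any] = {}
    for key, value in record.items():
        if key in mapping:
            translated[mapping[key]] = value
        else:
            unmapped[key] = value
    return translated, unmapped


# A local query goes out in the global schema; the answer comes back and is
# translated into local terms again.
global_query, _ = translate({"collector": "Gray", "coll_no": "3731"}, LOCAL_TO_GLOBAL)
local_answer, _ = translate({"scientificName": "Acer amplum"}, GLOBAL_TO_LOCAL)
```

Returning the unmapped fields separately mirrors the "needs human intervention" branch of scenario 2: the automatic path handles what it can and hands the rest to a person.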

Scenarios for User-based Queries

  1. Simple queries
    1. Where does this species occur?
    2. What species occur in this country?
    3. What data is held by "X" institution?
    4. When (year) was a specimen of this species last collected in the field? Where was this species last collected, and by whom? Where is it now?
    5. Which institution holds the holotype of species X?
    6. The longitude of the georeferenced locality for this specimen can't be correct (wrong continent). Can someone please correct it?
    7. I can provide a more precise georeferenced location for this locality. May I send it to you?
  2. More complex queries
    1. Have species latitudinal ranges moved northwards in the last 100 years?
    2. What are the phylogenetic and geographic characteristics of species whose ranges have increased in the last 100 years?
    3. Is there a relationship between latitude and alpha/beta/gamma biological diversity?
    4. Is there a relationship between biogeographic range and stability of taxonomic names?
    5. What is the relationship between depth and alpha/beta/gamma diversity in the oceans?
    6. Is the species' "extent of occurrence" estimated to be less than 100 km2? The International Union for the Conservation of Nature (IUCN) defines a species' extent of occurrence as "the area contained within the shortest continuous imaginary boundary which can be drawn to encompass all the known, inferred or projected sites of present occurrence of a taxon, excluding cases of vagrancy.... This measure may exclude discontinuities or disjunctions within the overall distributions of taxa (e.g. large areas of obviously unsuitable habitat).... Extent of occurrence can often be measured by a minimum convex polygon (the smallest polygon in which no internal angle exceeds 180 degrees and which contains all the sites of occurrence)." Extent of occurrence less than 100 km2 is one of the criteria used to officially declare a species as "Critically Endangered."
    7. Is the species' "area of occupancy" estimated to be less than 10 km2? IUCN defines a species' area of occupancy as "the area within its 'extent of occurrence' ... which is occupied by a taxon, excluding cases of vagrancy. The measure reflects the fact that a taxon will not usually occur throughout the area of its extent of occurrence, which may contain unsuitable or unoccupied habitats. In some cases (e.g. irreplaceable colonial nesting sites, crucial feeding sites for migratory taxa) the area of occupancy is the smallest area essential at any stage to the survival of existing populations of a taxon. The size of the area of occupancy will be a function of the scale at which it is measured, and should be at a scale appropriate to relevant biological aspects of the taxon, the nature of threats and the available data.... To avoid inconsistencies and bias in assessments caused by estimating area of occupancy at different scales, it may be necessary to standardize estimates by applying a scale-correction factor. It is difficult to give strict guidance on how standardization should be done because different types of taxa have different scale-area relationships." Area of occupancy less than 10km2 is one of the criteria used to officially declare a species as "Critically Endangered."
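
The extent-of-occurrence query above mentions measurement by a minimum convex polygon. A minimal Python sketch of that calculation follows, using Andrew's monotone chain for the hull and the shoelace formula for the area. It assumes the occurrence points have already been projected to planar coordinates in kilometres; the function names are illustrative, and none of the IUCN exclusions (vagrancy, discontinuities) are handled.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x_km, y_km) in some planar projection


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Minimum convex polygon (Andrew's monotone chain), counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def polygon_area(vertices: List[Point]) -> float:
    """Shoelace formula; zero for fewer than three vertices."""
    n = len(vertices)
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


def extent_of_occurrence_km2(points_km: List[Point]) -> float:
    """Area of the minimum convex polygon around the occurrence points."""
    return polygon_area(convex_hull(points_km))
```

A query such as "is the extent of occurrence less than 100 km2?" then becomes a comparison on `extent_of_occurrence_km2` over the georeferenced occurrences the network returns for the species.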

Scenarios for Annotation Exchange with Annosys

These include those that the FP Annotation Generator presently produces. The hypertext links on the FP Annotations are to (possibly obsolete) JSON representations. JSON may be the interchange format of convenience.

  • agent wants an FP DwC-based InsertIdentification annotation produced as an Annosys ABCD-based Annotation
  • agent wants an FP DwC-based insertGeoreference annotation produced as an Annosys ABCD-based Annotation