Report: FP Use Cases


FilteredPush Use Cases

The FilteredPush prototype was built around two core use cases. These use cases both pertain to the domain of Natural Science Collections data.

Use Case: Find Duplicates

Find records of distributed duplicate botanical specimens. This is a core use case for ApplePie.

Business Process

Goal: A herbarium collection manager wishes to avoid having to retype all of the data about a specimen that belongs to a set of duplicates when the same data for another specimen belonging to the same set of duplicates has already been captured at another institution.

Summary: A collection manager capturing data about a specimen begins by entering data elements that are likely to identify members of a set of duplicates (e.g., in order: collector name, collector number, date collected, current determination). If data have been captured from the same set of duplicates, the FP network presents these data to the collection manager, who can accept them for entry into the local database, saving the effort of retyping much of the rest of the data.

Diagram

FindDuplicates use case diagram.png

Actors

Collection Manager (sensu lato): a person involved in the management of specimens and information in a herbarium collection, possibly a curator, a collection manager, a collection assistant, or a data capture person. This person isn't using the FP network directly, but is using other software (e.g. Specify Workbench, Specify) that interacts with the FP network through FP Messages.

The local Collection Database.

FP Network.

Preconditions

The FP Network is aware of and can rapidly return relevant duplicate records.

The collection manager (as a person) has authenticated into the local software they are using for collection management, and this software has authenticated to the FP network.

Triggers

Beginning data capture on a new (herbarium) specimen.

Course of Events

  1. A collection manager begins data entry on a herbarium sheet by entering the collector's name and collector's number (~= field number) from the sheet into the data entry interface for their local herbarium database (e.g. Specify Workbench).
  2. The local database's data entry interface queries its local Filtered Push network node for any sets (of one or more duplicate specimens) matching this collector and collector number (FP_Messages#FP_INVENTORY, FP_Messages#FP_FIND_SETS, FP_Messages#FP_GET_DATA). Matching data are returned very rapidly to the user, more rapidly than they could type in the rest of the record. Requirement: Rapid return of analyzed duplicates from the network.
  3. A match for the duplicate record is presented to the collection manager, who can accept the data into the appropriate fields of the user interface in front of them. Requirement: Highly atomic data in the network, along with semantic mapping to local schemas.
  4. The match having been recognized, the newly entered specimen is added to the relevant set of duplicates in the network (FP_Messages#FP_ADD_SHEET).
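
The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the LocalNode class, the DuplicateSet record, the FIELD_MAP mapping, and all field names are hypothetical, and the methods are rough stand-ins for the FP_FIND_SETS and FP_ADD_SHEET messages, whose actual payloads are not described here. The sketch also accepts a match automatically, whereas in the use case the collection manager decides whether to accept the data.

```python
from dataclasses import dataclass, field


@dataclass
class DuplicateSet:
    """One set of duplicate specimens known to the (hypothetical) node."""
    set_id: str
    collector: str
    collector_number: str
    data: dict                                   # atomic fields shared by the set
    sheets: list = field(default_factory=list)   # member specimen identifiers


class LocalNode:
    """In-memory stand-in for a local FP network node (illustrative only)."""

    def __init__(self, sets):
        # Index on the two fields entered first, so a lookup is one dict access.
        self._index = {(s.collector, s.collector_number): s for s in sets}

    def find_sets(self, collector, collector_number):
        """Rough analogue of FP_FIND_SETS: return the matching sets, if any."""
        match = self._index.get((collector, collector_number))
        return [match] if match else []

    def add_sheet(self, dup_set, sheet_id):
        """Rough analogue of FP_ADD_SHEET: record a new member of the set."""
        dup_set.sheets.append(sheet_id)


# Hypothetical semantic mapping from network field names to local schema fields.
FIELD_MAP = {"dateCollected": "coll_date", "determination": "current_det"}


def capture(node, collector, collector_number, sheet_id):
    """Steps 1-4: query on collector + number, prefill fields, join the set."""
    record = {"collector": collector, "collector_number": collector_number}
    matches = node.find_sets(collector, collector_number)
    if matches:
        dup_set = matches[0]
        for network_name, local_name in FIELD_MAP.items():
            if network_name in dup_set.data:
                record[local_name] = dup_set.data[network_name]
        node.add_sheet(dup_set, sheet_id)
    return record
```

Indexing the sets on collector and collector number is one simple way to meet the rapid-return requirement in step 2; a real node would presumably need fuzzier matching on collector names.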

Alternative Paths

  1. No matching specimens are found. Data capture continues from the specimen, which is used as the basis for a new duplicate set in the network (FP_Messages#FP_ADD_NEW_SET).
  2. Matching specimens are found, but with problematic data that need correction. These data are pulled in and corrected, and Use_cases#Use_Case:_Annotate_Specimen is triggered.
  3. No matching specimen is found from just collector and collector number, but as more data are added, a match is found on other criteria (FP_Messages#FP_INVENTORY, FP_Messages#FP_FIND_SETS, FP_Messages#FP_GET_DATA). The course of events continues as if a match had been found on the first set of data.
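
Alternative paths 1 and 3 amount to a fallback in the matching logic, which might look like the sketch below. The field names, the choice of date collected and determination as the "other criteria", and the threshold of two agreeing fields are all assumptions made for illustration; only the message names FP_ADD_SHEET and FP_ADD_NEW_SET come from this page.

```python
PRIMARY = ("collector", "collector_number")
SECONDARY = ("date_collected", "determination")   # assumed "other criteria"


def agrees(entered, data, key):
    """True when the field has been entered and equals the set's value."""
    value = entered.get(key)
    return value is not None and value == data.get(key)


def find_set(known_sets, entered, min_secondary=2):
    """Return the id of a matching duplicate set, or None.

    First try collector + collector number; if that fails, fall back to
    the secondary fields entered so far (alternative path 3).
    """
    for set_id, data in known_sets.items():
        if all(agrees(entered, data, k) for k in PRIMARY):
            return set_id
    for set_id, data in known_sets.items():
        if sum(agrees(entered, data, k) for k in SECONDARY) >= min_secondary:
            return set_id
    return None


def closing_message(known_sets, entered):
    """Choose the message sent when capture ends (paths 1 and 3)."""
    set_id = find_set(known_sets, entered)
    if set_id is not None:
        return ("FP_ADD_SHEET", set_id)
    return ("FP_ADD_NEW_SET", None)
```

Calling find_set again each time a field is filled in gives the behaviour of path 3: a record whose collector name is spelled differently from the stored one can still be matched once the date and determination have been typed.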

Postconditions

Specimen record in the local herbarium database is populated.

Specimen record has been added in the network to an appropriate set of duplicates.

Business Rules

The local collection database must present collector and collector number as among the first fields for data capture from a herbarium sheet.

The local collection database must query the network node for matching duplicates as soon as collector and collector number are entered.

Particular herbaria will have the original field notes, maps, and similar data to validate the data associated with sets of duplicates distributed by a particular collector. It is expected that herbaria will pay particular attention to validating the data of duplicate sets for which they hold additional authoritative data sources beyond the herbarium sheet itself. This may impose a requirement for a data element in an annotation to describe the authoritative source on which an annotation is based.

Assumptions

Knowledge of which specimens might be duplicates, and their data, are right at hand for the FP network, so that query/response and network transport lags do not make the network's response slower than capturing the same data by typing it in from the specimen.

The specimen/collection object in question is a Herbarium specimen.

Authentication and authorization of the individual person using the FP network is a problem for the local software. Authorizing the software to generate FP messages is a problem for the FP network.

Notes

Duplicate specimens (parts of one plant collected at the same time, attached to different herbarium sheets, and distributed to several herbaria) are largely a phenomenon of botanical collections. This use case is only relevant for botanical collections amongst which duplicate specimens have been distributed. (Note: whether duplicates have been distributed might be a useful piece of information for the network for more rapidly returning relevant records.)

Use Case: Annotate Specimen

Or, Make Annotation, which seems more general, applying to images or other media files vouchering observations as well as specimens. This is a core use case for ApplePie.

Business Process

Goal: Data about specimens in natural history collections is being brought to the desktops of the researchers and specialists best able to correct and clean those data, without those researchers being brought back in to the specimen collections themselves. The visits of researchers to collections and the annotations of specimens by researchers is one of the key processes that keeps collections and their data vital, alive, and current. A critical need in the growing global networks of specimen data is a means for corrections, annotations, and new identifications (all of the processes of data improvement that keep natural history collections vital) to be brought from the remote desktops at which those data are seen back to the collections that hold the specimens.

Summary:

Diagram

Annotate specimens use case diagram.png

Note: Not in the diagram is the creation of automatic filtering rules by the User, or the application of those rules to auto accept or auto reject messages with particular characteristics. This may be a separate use case.

Actors

Taxonomist (a taxonomist or other researcher at a remote data portal, able to contribute a correction or new information related to data that they are seeing at that portal. Taxonomist here comprises both the human and the portal software (which may be a web portal or a collection database)).

CollectionDatabase (the software that manages and communicates with an authoritative database containing information about physical or electronic vouchers of organisms, such as the specimen catalog of a zoological collection in a natural history museum, or a database of field images of organisms).

User (a collection manager, or other gatekeeper for the collection database, the human filter for the acceptance or rejection of annotations directed towards their collection database.)

Preconditions

Data about specimens are present in the FP network.

Triggers

Taxonomist sees an error or has new information that relates to a collection object or defined set of collection objects.

Course of Events

  1. The Taxonomist enters structured data comprising an annotation and injects the new annotation into the network. Note on addressing: the taxonomist doesn't know the network address of the destination node, but provides enough information in the annotation for there to be a high likelihood of the network being able to find the correct address. The taxonomist provides either a GUID (through the software) or a Darwin Core triplet for the record to be annotated.
  2. The network retains knowledge of the annotation and forwards it to the user who is the gatekeeper of the institution that holds the authoritative record for the specimen or observation that is being annotated. Note on addressing: the network infers the destination of the annotation from the content of the annotation. For example, if the annotation is about MCZ-Ornit-35151, the destination must be the MCZ node.
  3. The gatekeeper may accept the annotation, in which case the data are transformed to fit the local schema and pushed into the authoritative local database.
  4. Notice of the acceptance of the annotation is injected back into the network.
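
The addressing note in step 2 can be illustrated with a small routing sketch. It assumes that a triplet is written as institution, collection, and catalog number joined by hyphens, as in the MCZ-Ornit-35151 example; the NODE_REGISTRY table and the node address in it are hypothetical, and nothing here describes how the FP network actually resolves destinations.

```python
from typing import NamedTuple, Optional


class Triplet(NamedTuple):
    institution: str
    collection: str
    catalog_number: str


def parse_triplet(text: str) -> Triplet:
    """Split 'MCZ-Ornit-35151' into its three parts.

    Splitting at most twice lets the catalog number itself contain hyphens;
    institution or collection codes containing hyphens would not parse
    correctly under this assumed format.
    """
    parts = text.split("-", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not an institution-collection-catalog triplet: {text!r}")
    return Triplet(*parts)


# Hypothetical registry mapping institution codes to network node addresses.
NODE_REGISTRY = {"MCZ": "node://mcz.example"}


def destination_for(target: str, registry=NODE_REGISTRY) -> Optional[str]:
    """Infer the destination node from the annotation's target record.

    Returns None when the institution is unknown to the network, the case
    in which a message back to the annotator would probably be generated.
    """
    return registry.get(parse_triplet(target).institution)
```

For the example in step 2, destination_for("MCZ-Ornit-35151") returns the address registered for MCZ, while a triplet naming an institution that is not in the registry returns None.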

Alternative Paths

The annotation is on a set of collection objects and is forwarded to the gatekeeper of each institution holding one or more members of the set.

The gatekeeper rejects the annotation. No change is made to the local database, and notice of the rejection is injected back into the network.

The gatekeeper agrees with and accepts the annotation. Notice of the agreement and acceptance of the annotation is injected back into the network.

The gatekeeper ignores the annotation.

The annotation is for a specimen that isn't on the network because the collection isn't on the network. The network retains knowledge of the annotation. This could also be an error condition, since it is a case where the network can't determine the destination address for the annotation, and it should thus probably generate a message back to the annotator.

The annotation is for a specimen that isn't on the network because the specimen hasn't had its data captured yet. Message goes to destination node, handling is dependent on gatekeeper.

Potential alternative path: The content of the annotation undergoes automated quality control analysis, problems are found, and a FP_Messages#FP_QUALITY_ISSUE_ASSERTION message is generated referring to the original message.

Postconditions

The annotation is stored as part of the authoritative record of the relevant specimen.

The network retains knowledge of the annotation and its status.

Business Rules

Annotations must contain enough information to determine destination address.

Assumptions

Taxonomist knows which specimen (or specimen set) to annotate. That is, the taxonomist can address an annotation to a particular collection object, which needs to be known to the portal through some GUID or through a set of concepts that form a likely identifier for a single collection object (a Darwin Core triplet).

Notes

The annotation use case suggests three specific types of messages for the injection of an annotation into the network, and an additional message injected back into the network at a filtering endpoint in response to one of these messages.

One annotation message is a New Determination: a message carrying an identification, to some level in the taxonomic hierarchy, made by some person about some voucher of some organism, based on their remote observation of data concerning that organism (e.g. a taxonomist provides a new determination of a specimen in a museum collection based on viewing an image of that specimen). A new determination is made by someone at some time (which may not be the person or time of the injection of the new determination message into the network) and applies to some thing, identified by a globally unique identifier or by a Darwin Core triplet of institution, collection, and catalog number. A new determination message requires a TaxonName, mappable as a Darwin Core TaxonName element, and can contain any of the taxon name related Darwin Core elements, or TaxonConceptSchema elements. Mapping local concept spaces onto the required and optional elements of the new determination message is the job of the mapping interface.

A new determination for a specimen injected into the network is forwarded to the network node associated with the institution holding that specimen, where it is expected to be queued for examination and filtering. The new determination is remembered by the network and can be retrieved from the network in association with the data for the referenced specimen, even though the annotation may still be pending review.

On receipt, a human (or a set of filtering rules, e.g. automatically reject all annotations from some person) can examine the new determination in a local software system (our goal in the prototype is to do this from within the Specify user interface) and can then inject a message back into the network making an assertion about the annotation. Such response messages may accept the annotation, indicating its acceptance into the local authoritative data store without commenting on its validity ("someone said this, I trust them, I've accepted their statement"), agree with the annotation ("someone said this, I agree with them, I've accepted their statement"), or reject the annotation.
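
As a rough sketch of the New Determination message and of rule-based filtering at the receiving node, the Python below models the required elements and the three possible responses. The class, field, and function names are hypothetical and are not taken from the FP message definitions; in particular, the sketch assumes that "annotations from some person" means annotations whose determiner is that person.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Response(Enum):
    ACCEPT = "accept"   # taken into the local store, no comment on validity
    AGREE = "agree"     # accepted, and the gatekeeper concurs
    REJECT = "reject"


@dataclass(frozen=True)
class NewDetermination:
    taxon_name: str                  # required
    determiner: str                  # who made the determination
    date_determined: str             # when it was made, not when it was injected
    guid: Optional[str] = None       # target identified by a GUID ...
    triplet: Optional[tuple] = None  # ... or by (institution, collection, catalog no.)

    def __post_init__(self):
        if not self.taxon_name:
            raise ValueError("a new determination requires a taxon name")
        if self.guid is None and self.triplet is None:
            raise ValueError("target needs a GUID or a Darwin Core triplet")


# A rule returns a Response to act automatically, or None to abstain.
Rule = Callable[[NewDetermination], Optional[Response]]


def reject_from(person: str) -> Rule:
    """Build a rule that rejects every determination made by `person`."""
    def rule(msg: NewDetermination) -> Optional[Response]:
        return Response.REJECT if msg.determiner == person else None
    return rule


def triage(msg: NewDetermination, rules) -> Optional[Response]:
    """Return the first automatic response, or None to queue for a human."""
    for rule in rules:
        verdict = rule(msg)
        if verdict is not None:
            return verdict
    return None
```

A message for which every rule abstains stays pending, which corresponds to the queue for examination by the gatekeeper described above.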