Remote Data Capture

From Filtered Push Wiki

Use Case: Remote Data Capture

Business Process

Goal: Data is transcribed from an image of a specimen and its labels held in an image repository, then transported as a new specimen record to the appropriate authoritative collection database, where it is ingested.



[Figure: Remote data capture use case]


Data entry person.


Authoritative Collections Database.


Preconditions:

No captured specimen or observational data in collection database.

Image of specimen in remote repository, e.g. Morphbank.

Catalog (barcode) number assigned to specimen before or at time of transcription.


Trigger: Transcription of a new occurrence record for some collection at a data capture site not connected to the authoritative collections database for that collection.

Course of Events

General Case

  1. Creation of structured data in a system not connected to the authoritative data store for that data.
  2. Construction and transport of a record using data structures and domain vocabulary understood by the creator and consumer.
  3. Ingest of the structured data into the authoritative database.
  4. Ingest of the structured data into alternative data stores.

Biodiversity Data Domain example

  1. Transcription of Label Data from Specimen in a system not connected to the authoritative collections database (e.g. Morphbank, Symbiota, Primary Digitization Apparatus).
  2. Construction and transport of a new specimen record as annotation(s).
  3. Ingest of annotation into a portal.
  4. Ingest of annotation into collection database.
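
The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the FilteredPush implementation: the `Annotation` class, the `transcribe`, `make_consumer` and `deliver` functions, and the example URI are all hypothetical, the two data stores are plain in-memory lists, and the term names in the body are borrowed from Darwin Core purely as an example of a shared domain vocabulary.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Annotation:
    """Hypothetical envelope carrying a new specimen record."""
    target: str            # what was transcribed, e.g. the URI of the specimen image
    body: Dict[str, str]   # the new record, keyed by domain-vocabulary terms


def transcribe(image_uri: str, label_data: Dict[str, str]) -> Annotation:
    """Steps 1-2: wrap transcribed label data as an annotation on the image."""
    return Annotation(target=image_uri, body=dict(label_data))


def make_consumer(name: str, store: List[Dict[str, str]]) -> Callable[[Annotation], str]:
    """Build an ingest function for one data store (portal or collection database)."""
    def consume(annotation: Annotation) -> str:
        store.append(dict(annotation.body))  # each store keeps its own copy
        return f"{name}: ingested {annotation.body.get('catalogNumber', 'unknown')}"
    return consume


def deliver(annotation: Annotation,
            consumers: List[Callable[[Annotation], str]]) -> List[str]:
    """Steps 3-4: hand the same annotation to every interested consumer."""
    return [consume(annotation) for consume in consumers]


# In-memory stand-ins for the portal and the collection database.
portal: List[Dict[str, str]] = []
collection_db: List[Dict[str, str]] = []

annotation = transcribe(
    "http://example.org/images/12345",  # placeholder URI, not a real image
    {"catalogNumber": "00012345",
     "scientificName": "Acer rubrum",
     "recordedBy": "A. Collector"},
)
receipts = deliver(annotation, [make_consumer("portal", portal),
                                make_consumer("collection database", collection_db)])
```

The point of the sketch is that the record is constructed once and each consumer ingests it independently, so a portal and a collection database can both receive the same new record.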

Alternative Paths

Alternative Trigger: Collection of a new specimen in the field along with capture of a core set of specimen data.

Alternative Postcondition: Record already exists in ingesting database. Consequence: Record is not ingested. The annotation receives a comment noting that it may need to be rephrased as an update rather than an insert.


Postcondition: Specimen record exists in Collection Database.

Business Rules

If record exists in the consuming database, do not ingest the data.

If record does not exist in the consuming database, ingest the body of the annotation (the new record).

If record does not exist in the consuming database, and the conditions to ingest all the content of the body of the annotation (all the elements of the new record) are not met, engage in a conversation with a data curator about adding information to satisfy the missing conditions.
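
As a minimal sketch, the three rules can be read as a single decision function. Everything named here is an assumption made for illustration: the `Outcome` enum, the `apply_business_rules` function, matching existing records by catalog number, and the choice of `catalogNumber` and `scientificName` as the conditions a record must meet. A real consuming database would define its own matching and its own conditions.

```python
from enum import Enum
from typing import Dict, List, Sequence, Set, Tuple


class Outcome(Enum):
    SKIPPED_EXISTS = "record exists; not ingested"
    INGESTED = "record ingested"
    NEEDS_CURATOR = "conditions not met; refer to a data curator"


# Hypothetical ingest conditions: terms that must be present and non-empty.
REQUIRED_TERMS = ("catalogNumber", "scientificName")


def apply_business_rules(
    body: Dict[str, str],
    existing_catalog_numbers: Set[str],
    required: Sequence[str] = REQUIRED_TERMS,
) -> Tuple[Outcome, List[str]]:
    """Decide what to do with the body of a new-record annotation."""
    catalog_number = body.get("catalogNumber")

    # Rule 1: the record already exists, so do not ingest.
    if catalog_number is not None and catalog_number in existing_catalog_numbers:
        return Outcome.SKIPPED_EXISTS, []

    # Rule 3: the record is new but incomplete, so report what is missing.
    missing = [term for term in required if not body.get(term)]
    if missing:
        return Outcome.NEEDS_CURATOR, missing

    # Rule 2: the record is new and complete, so ingest the body.
    return Outcome.INGESTED, []
```

For example, with `existing = {"00012345"}`, a body carrying catalog number `00012345` is skipped, while a body with a new catalog number but no scientific name comes back as `NEEDS_CURATOR` together with the list of missing terms to raise with the curator.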


Database exists (the annotation is not requesting the creation of a new database following some schema).

Database has a schema that can be mapped onto the domain vocabulary used in the body of the annotation.
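
One way to picture the schema mapping is a lookup table from vocabulary terms to database columns. The sketch below assumes a flat, one-term-to-one-column mapping; the `TERM_TO_COLUMN` table, the column names, and the `map_body_to_row` function are hypothetical, and a real collections schema would usually spread a record over several tables.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from domain-vocabulary terms to database columns.
TERM_TO_COLUMN: Dict[str, str] = {
    "catalogNumber": "catalog_number",
    "scientificName": "taxon_name",
    "recordedBy": "collector",
}


def map_body_to_row(
    body: Dict[str, str],
    term_to_column: Dict[str, str] = TERM_TO_COLUMN,
) -> Tuple[Dict[str, str], List[str]]:
    """Translate an annotation body into a database row.

    Returns the row plus the list of terms that have no column in the
    target schema, so the caller can decide how to handle them.
    """
    row: Dict[str, str] = {}
    unmapped: List[str] = []
    for term, value in body.items():
        column = term_to_column.get(term)
        if column is None:
            unmapped.append(term)
        else:
            row[column] = value
    return row, unmapped
```

A non-empty list of unmapped terms signals that the schema cannot yet hold every element of the new record.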



  • Data entry person at primary digitization apparatus images herbarium sheet and collects a core/minimal set of occurrence data. Data entry person performs quality control checks on images and data.
  • At end of day, a batch of images and data is prepared consisting of the image files, a metadata document that describes the image files, and a metadata document that describes the specimen data captured for each sheet. This batch is provided to iPlant for storage of the image files.
  • The metadata documents are provided to the NEVP Symbiota instance, and ingested into Symbiota as occurrence records with associated media links (to the media files in iPlant's infrastructure).
  • The specimen record metadata documents are provided (via FilteredPush or directly?) to processing software sitting next to the Specify collections databases at each of the participating herbaria (except for Yale, which uses KE EMu), and are ingested into those databases to form new occurrence records.
  • Enhancements made to the data records in Symbiota are passed on via FilteredPush and the annotation processor as annotations following Annotate_Specimen.
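
The end-of-day batch described in the workflow above has three parts: the image files, a metadata document describing them, and a metadata document describing the specimen data for each sheet. A rough sketch of assembling such a batch follows. The JSON serialization, the field names, and the `build_batch` function are assumptions for illustration only; the actual document formats are not specified here.

```python
import json
from typing import Dict, List


def build_batch(sheets: List[Dict[str, str]]) -> Dict[str, object]:
    """Assemble one batch from the sheets captured during a day.

    Each sheet is a dict holding an 'image_file' name plus the core
    occurrence data captured for that sheet (hypothetical field names).
    """
    image_files = [sheet["image_file"] for sheet in sheets]

    # Metadata document 1: describes the image files.
    image_metadata = [
        {"file": sheet["image_file"], "catalogNumber": sheet["catalogNumber"]}
        for sheet in sheets
    ]

    # Metadata document 2: describes the specimen data for each sheet.
    specimen_metadata = [
        {key: value for key, value in sheet.items() if key != "image_file"}
        for sheet in sheets
    ]

    return {
        "image_files": image_files,
        "image_metadata": json.dumps(image_metadata, indent=2),
        "specimen_metadata": json.dumps(specimen_metadata, indent=2),
    }


batch = build_batch([
    {"image_file": "00012345.jpg", "catalogNumber": "00012345",
     "scientificName": "Acer rubrum"},
    {"image_file": "00012346.jpg", "catalogNumber": "00012346",
     "scientificName": "Quercus alba"},
])
```

In this sketch the catalog number ties the two documents together, so that occurrence records ingested from the specimen metadata can be linked to the stored media files.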