Remote Data Capture
Use Case: Remote Data Capture
Goal: Data is transcribed from an image of a specimen and its labels held in an image repository, transported to the appropriate authoritative collection database, and ingested there as a new specimen record.
Actors:
- Data entry person.
- Authoritative collections database.
Preconditions:
- No captured specimen or observational data in the collection database.
- Image of the specimen in a remote repository (e.g. Morphbank).
- Catalog (barcode) number assigned to the specimen before or at the time of transcription.
Trigger: Transcription of a new occurrence record for some collection at a data capture site not connected to the authoritative collections database for that collection.
Course of Events
- Creation of structured data in a system not connected to the authoritative data store for that data.
- Construction and transport of a record using data structures and domain vocabulary understood by the creator and consumer.
- Ingest of the structured data into the authoritative database.
- Ingest of the structured data into alternative data stores.
Biodiversity Data Domain example
- Transcription of label data from a specimen in a system not connected to the authoritative collections database (e.g. Morphbank, Symbiota, Primary Digitization Apparatus).
- Construction and transport of a new specimen record as annotation(s).
- Ingest of annotation into a portal.
- Ingest of annotation into collection database.
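The construction step above can be sketched as follows. This is a minimal illustration, not the FilteredPush annotation schema: the class `NewOccurrenceAnnotation`, the function `build_annotation`, the field names `target_image_uri`, `annotator`, `body`, and `expectation`, and the example URI are all hypothetical. The keys used in the body are Darwin Core term names, chosen here on the assumption that Darwin Core is the shared domain vocabulary. The check for a catalog number reflects the stated requirement that a catalog (barcode) number is assigned before or at the time of transcription.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class NewOccurrenceAnnotation:
    """Hypothetical container for a new-record annotation (not a real schema)."""
    target_image_uri: str                      # image the data was transcribed from
    annotator: str                             # the data entry person
    body: dict = field(default_factory=dict)   # the new record, keyed by vocabulary term
    expectation: str = "insert"                # consumer is expected to insert, not update


def build_annotation(image_uri, annotator, transcribed):
    """Wrap transcribed label data as the body of a new-record annotation."""
    # Keep only terms that actually carry a value, with whitespace trimmed.
    body = {term: value.strip()
            for term, value in transcribed.items()
            if value and value.strip()}
    # A catalog (barcode) number must exist before or at transcription time.
    if "catalogNumber" not in body:
        raise ValueError("a catalog (barcode) number is required")
    return NewOccurrenceAnnotation(image_uri, annotator, body)


annotation = build_annotation(
    "https://example.org/images/12345",        # placeholder image URI
    "data-entry-01",
    {
        "catalogNumber": "barcode-00012345",
        "scientificName": "Acer rubrum L.",
        "recordedBy": " A. Collector ",
        "eventDate": "",                       # left blank on the label; dropped
    },
)
print(json.dumps(asdict(annotation), indent=2))
```

Serializing the annotation as JSON is only one possible transport format; the point is that the body carries the whole new record in terms both the creator and the consumer understand.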
Alternative Trigger: Collection of a new specimen in the field along with capture of a core set of specimen data.
Alternative Postcondition: Record already exists in the ingesting database. Consequence: The record is not ingested. A comment is made on the annotation noting that it may need to be rephrased with an expectation of update rather than insert.
Postcondition: Specimen record exists in the collection database.
If the record exists in the consuming database, do not ingest the data.
If the record does not exist in the consuming database, ingest the body of the annotation (the new record).
If the record does not exist in the consuming database, and the conditions for ingesting all the content of the body of the annotation (all the elements of the new record) are not met, engage in a conversation with a data curator about adding information to satisfy the missing conditions.
Database exists (the annotation is not requesting the creation of a new database following some schema).
Database has a schema that can be mapped onto the domain vocabulary used in the body of the annotation.
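The three ingest rules and the schema-mapping condition above might be expressed as a single decision function. This is a sketch under stated assumptions: the `Outcome` enum, `process_annotation`, the `TERM_TO_COLUMN` mapping, the `REQUIRED_COLUMNS` set, and the column names are all hypothetical, and a plain dictionary keyed by catalog number stands in for the collection database.

```python
from enum import Enum


class Outcome(Enum):
    REJECTED_EXISTS = "record exists; not ingested"
    INGESTED = "new record ingested"
    NEEDS_CURATOR = "conditions not met; referred to a data curator"


# Hypothetical mapping from vocabulary terms in the annotation body
# to columns in the consuming database's schema.
TERM_TO_COLUMN = {
    "catalogNumber": "catalog_number",
    "scientificName": "taxon_name",
    "recordedBy": "collector",
}
# Hypothetical columns the consuming database requires for a new record.
REQUIRED_COLUMNS = {"catalog_number", "taxon_name"}


def process_annotation(body, database):
    """Apply the ingest rules to one annotation body.

    `database` is a dict keyed by catalog number, standing in for the
    consuming collection database. Returns (outcome, list of problems).
    """
    catalog_number = body.get("catalogNumber")
    # Rule 1: the record already exists, so nothing is ingested.
    if catalog_number in database:
        return Outcome.REJECTED_EXISTS, []

    # Map the body onto the database schema and collect anything that
    # prevents ingesting the whole record.
    unmapped = [term for term in body if term not in TERM_TO_COLUMN]
    row = {TERM_TO_COLUMN[t]: v for t, v in body.items() if t in TERM_TO_COLUMN}
    missing = sorted(REQUIRED_COLUMNS - set(row))
    problems = ([f"unmapped term: {t}" for t in unmapped]
                + [f"missing: {c}" for c in missing])

    # Rule 3: conditions are not met, so hand the problems to a curator.
    if problems:
        return Outcome.NEEDS_CURATOR, problems

    # Rule 2: the record is new and complete, so ingest it.
    database[catalog_number] = row
    return Outcome.INGESTED, []
```

In this sketch the "conversation with a data curator" is reduced to returning a list of problems; a real consumer would presumably route that list back to the annotator or a curator as a response annotation.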
Use in NEVP TCN
- Data entry person at primary digitization apparatus images herbarium sheet and collects a core/minimal set of occurrence data. Data entry person performs quality control checks on images and data.
- At end of day, a batch of images and data is prepared consisting of the image files, a metadata document that describes the image files, and a metadata document that describes the specimen data captured for each sheet. This batch is provided to iPlant for storage of the image files.
- The metadata documents are provided to the NEVP Symbiota instance, and ingested into Symbiota as occurrence records with associated media links (to the media files in iPlant's infrastructure).
- The specimen record metadata documents are provided (via FilteredPush or directly?) to processing software sitting next to the Specify collections databases at each of the participating herbaria (except for Yale, which uses KE EMu), and are ingested into those databases to form new occurrence records.
- Enhancements to the data records made in Symbiota are passed on via FilteredPush and the annotation processor as annotations following Annotate_Specimen.
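The end-of-day batch preparation described above (image files, a metadata document describing the image files, and a metadata document describing the specimen data for each sheet) could look roughly like the sketch below. The formats are not specified here, so everything concrete is an assumption: JSON as the document format, the file names `image_metadata.json` and `specimen_metadata.json`, the `*.jpg` pattern, the SHA-256 checksum, and the function name `prepare_batch` are all hypothetical.

```python
import hashlib
import json
from pathlib import Path


def prepare_batch(image_dir, specimen_data, batch_dir):
    """Write the two metadata documents for one day's batch.

    `specimen_data` maps an image file name to the dict of core
    occurrence data captured for that sheet.
    """
    image_dir, batch_dir = Path(image_dir), Path(batch_dir)
    batch_dir.mkdir(parents=True, exist_ok=True)

    image_metadata, specimen_metadata = [], []
    for image in sorted(image_dir.glob("*.jpg")):
        if image.name not in specimen_data:
            continue  # no data captured for this sheet; leave it out of the batch
        checksum = hashlib.sha256(image.read_bytes()).hexdigest()
        # Document 1: describes the image file itself.
        image_metadata.append({
            "file": image.name,
            "bytes": image.stat().st_size,
            "sha256": checksum,
        })
        # Document 2: describes the specimen data captured for the sheet.
        specimen_metadata.append({"file": image.name, **specimen_data[image.name]})

    (batch_dir / "image_metadata.json").write_text(json.dumps(image_metadata, indent=2))
    (batch_dir / "specimen_metadata.json").write_text(json.dumps(specimen_metadata, indent=2))
    return image_metadata, specimen_metadata
```

Keeping the file name in both documents is one simple way to let a downstream consumer join an occurrence record to its media file; the actual linkage used between Symbiota and the image store is not stated here.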