- 1 ABCD Concept mappings
- 2 Concepts to think about for draft implementation
- 3 Grouping things that might be duplicates
- 3.1 Records from different sources are likely to represent duplicate herbarium sheets if they are similar in:
- 3.2 Records from different sources that represent duplicate herbarium sheets are likely to differ in:
- 3.3 Records from different sources that represent duplicate herbarium sheets are almost certain to differ in:
- 4 Data to return
ABCD Concept mappings
"Subset of ABCD" specified in proposal. (see: http://www.bgbm.org/TDWG/CODATA/Schema/ for ABCD schema docs).
Examples of relevant mappings of a subset of ABCD XPATH elements are at:
DarwinCore (1.4) http://www.bgbm.org/TDWG/CODATA/Schema/Mappings/DwCAndExtensions.htm
Concepts to think about for draft implementation
Popular ABCD Concepts
There are about 50 ABCD concepts related to specimen data (rather than metadata about the result sets) that are in widespread use.
For Finding Duplicates
Use Case: Use_cases#Use_Case:_Find_Duplicates
A collection manager makes a query to find duplicates for a collector and collector number, which generates messages (FP_Messages#FP_INVENTORY, FP_Messages#FP_FIND_SETS, FP_Messages#FP_GET_DATA), and then views any duplicate data returned.
Collector: an individual botanist. Zero or more botanists may be the collectors for a gathering, and one botanist is the primary collector for a particular specimen. The primary collector is usually (but not always) the botanist whose number series supplies the collector number.
Relevant ABCD xpaths. See: http://www.bgbm.org/tdwg/codata/schema/ABCD_2.06/HTML/ABCD_2.06.html#element_GatheringAgent_Link03174CE0
- Gathering/Agents/GatheringAgentsText (full concatenated list of collectors)
- Gathering/Agents/GatheringAgent/ with attribute <xs:attribute name="primarycollector" type="xs:boolean" use="optional"/>
- Gathering/Agents/GatheringAgent/AgentText (concatenated components of single collector's name - should be in standard botanical collector's form).
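As a rough illustration of how the three xpaths above fit together, the following sketch reads the collectors out of a Gathering fragment. The sample XML is invented for the example and omits the XML namespaces that a real ABCD document carries; the function name `read_collectors` is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented sample fragment. A real ABCD 2.06 document is namespaced;
# namespaces are left out here to keep the sketch short.
SAMPLE_GATHERING = """
<Gathering>
  <Agents>
    <GatheringAgent primarycollector="true">
      <AgentText>Macklin, J.A.</AgentText>
    </GatheringAgent>
    <GatheringAgent>
      <AgentText>Example, B.</AgentText>
    </GatheringAgent>
    <GatheringAgentsText>Macklin, J.A.; Example, B.</GatheringAgentsText>
  </Agents>
</Gathering>
"""


def read_collectors(gathering_xml):
    """Return (primary collector or None, list of all collector names)."""
    root = ET.fromstring(gathering_xml)
    primary = None
    names = []
    for agent in root.findall("Agents/GatheringAgent"):
        name = (agent.findtext("AgentText") or "").strip()
        if not name:
            continue
        names.append(name)
        # xs:boolean permits "true"/"1" as well as "false"/"0".
        flagged = agent.get("primarycollector", "false") in ("true", "1")
        if flagged and primary is None:
            primary = name
    return primary, names
```

Because the primarycollector attribute is optional, a caller has to cope with `None` for the primary collector, for example by falling back to the first listed name or to GatheringAgentsText.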
Collectors as organizations - in some unusual cases, the list of collectors can involve an organization. Example: LAE (Papua New Guinea forestry service) is included as part of the set of collectors for material collected under their auspices. This concept might be present in the list of collectors, but also is quite likely to be elsewhere, particularly in notes fields of various sorts.
Collector number: the (hopefully) unique identifier assigned by a collector to a single specimen in the field at the time of its gathering, drawn from an identifier series that the collector uses to identify individual collections of specimens. The collector number for a herbarium specimen will usually come from the number series of that specimen's primary collector. Typical formats are prefix-number-suffix, where prefix and suffix are text strings and number is an integer (JAM-73-2009, which is Macklin, J.A.'s 73rd collection in 2009), or a bare number (10113, which is the 10113th thing this particular botanist has collected in their life). Punctuation separating the semantic components may or may not be present and may vary, both between collectors and over time for a particular collector. Exact punctuation may also be changed in data capture (the sheet reads JAM-73.2009. but the captured data is fitted to a local institutional standard as JAM/73/2009).
Some botanists use a collection-event-based number series, where the number identifies a collecting/gathering event and additional information is added to identify the individual specimens collected in that event (e.g. [hypothetically] JAM-73a, where "a" marks one specimen from JAM's 73rd collecting event).
Relevant ABCD xpath: Unit/CollectorsFieldNumber
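Since separators in collector numbers vary between sheets and data capture conventions, comparisons probably need to work on the semantic components rather than the raw string. The following is a minimal sketch of one way to do that, assuming the prefix-number-suffix shape described above; the function name and the regular expression are illustrative assumptions, not part of ABCD or the proposal.

```python
import re

# Optional alphabetic prefix, an integer, and an optional suffix, with any
# run of punctuation or whitespace allowed between and after the parts.
_COLLECTOR_NUMBER = re.compile(r"^([A-Za-z]*)[\W_]*(\d+)[\W_]*(.*?)[\W_]*$")


def parse_collector_number(raw):
    """Split a collector number into (prefix, number, suffix).

    Returns None when no integer component can be found (e.g. "s.n.").
    Prefix and suffix are upper-cased so that comparisons ignore case.
    """
    match = _COLLECTOR_NUMBER.match(raw.strip())
    if match is None:
        return None
    prefix, number, suffix = match.groups()
    return prefix.upper(), int(number), suffix.upper()
```

With this approach, two captured forms of the same number that differ only in separators parse to the same tuple. Note that for event-based series the suffix is significant: JAM-73a and JAM-73b would be different specimens from the same collecting event, so the suffix should not be discarded when matching.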
Pathology of PrimaryCollector not having CollectorNumber
Sometimes the first listed or primary collector is not the collector whose number series is used to generate the Collector Number.
Example: Sargent material from the late 1800s and early 1900s (see: Macklin JA, Phipps JB, Boufford DE. 2000. Charles Sargent's type concept: a guide to interpreting his names in Crataegus (Rosaceae). Harvard Pap. Bot. 5(1): 123-128). Sargent often used other collectors' numbers (such as those of Ashe) when making his own labels. Sargent would go into the field with someone else, e.g. Ashe; both gathered lots of duplicate material, and both returned to their own institutions and made their own labels for it. They would then correspond about the material, and Sargent would use Ashe's number for a particular specimen on his (Sargent's) label.
Grouping things that might be duplicates
Records from different sources are likely to represent duplicate herbarium sheets if they are similar in:
- Collector name
- Collector's number
- Date Collected
- One of the identifications: the identification on the original label made by the collector, which is unlikely to be clearly distinguished from the other identifications in the record.
- Atomized higher geography (though different variants may be used in different places, USA, United States, etc.).
- Verbatim locality - it should be a copy of the same text from the same label
- If recorded, the source institution from which the specimen came, though this may vary if the specimen was sent as a duplicate to one institution and then transferred to another.
- Family - Often explicit on the label, but it might depend on local practice and on how local hierarchies are maintained. Robustness of matches will vary with the fluidity of placement of genera into families in different (messy/not so messy) taxonomic groups.
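One way to turn the list above into something testable is a weighted similarity score over a few of these fields. The sketch below uses fuzzy string comparison from the Python standard library; the field names and weights are hypothetical placeholders chosen for illustration, not values from the proposal, and a real implementation would likely compare dates and collector numbers in a more structured way.

```python
from difflib import SequenceMatcher

# Hypothetical field names and weights; the weights sum to 1.0.
WEIGHTS = {
    "collector": 0.3,
    "collector_number": 0.3,
    "date_collected": 0.2,
    "verbatim_locality": 0.2,
}


def text_similarity(a, b):
    """Similarity in [0, 1], ignoring case and runs of whitespace."""
    a = " ".join(a.lower().split())
    b = " ".join(b.lower().split())
    if not a or not b:
        return 0.0  # a missing value is treated as no evidence of a match
    return SequenceMatcher(None, a, b).ratio()


def duplicate_score(rec_a, rec_b):
    """Weighted similarity of two records given as dicts of strings."""
    return sum(
        weight * text_similarity(rec_a.get(field, ""), rec_b.get(field, ""))
        for field, weight in WEIGHTS.items()
    )
```

A pair of records scoring above some threshold could then be proposed as members of the same duplicate set. Where to set that threshold is an open question that would need tuning against real data.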
Records from different sources that represent duplicate herbarium sheets are likely to differ in:
- The exact list of identifications - with each sheet having a different determination history after the distribution event for the duplicate set - though the determinations are likely to be within some close taxonomic distance - within some sphere. Even the same person annotating specimens in the same duplicate set in different institutions may put different determinations on different sheets from the same duplicate set...
- Date of data capture - though this will be more recent than the date collected, etc.
- Date accessioned - though this may cluster around some unknown date(s) of distribution.
Records from different sources that represent duplicate herbarium sheets are almost certain to differ in:
- Institution code
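Because sheets in a duplicate set are expected to sit in different institutions, a simple first pass could group records on a collector/collector number key and keep only the groups that span more than one institution code. The following is a minimal sketch under that assumption; the dictionary keys are hypothetical names, not ABCD concepts.

```python
from collections import defaultdict


def candidate_duplicate_sets(records):
    """Group records by (collector, collector number) and return the
    groups that contain more than one distinct institution code."""
    groups = defaultdict(list)
    for rec in records:
        key = (
            rec["collector"].strip().lower(),
            rec["collector_number"].strip().lower(),
        )
        groups[key].append(rec)
    return [
        group
        for group in groups.values()
        if len({rec["institution_code"] for rec in group}) > 1
    ]
```

Exact-match keys like this will miss records whose collector name or number was captured differently, so in practice the key would probably be built from normalized values and followed by a fuzzier comparison within each group.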
Data to return
A canonical record for the duplicate set, and details for each herbarium sheet in that duplicate set.
The response to a FP_Messages#FP_GET_DATA message should include the following subset of ABCD concepts:
The Darwin Core triplet:
- Unit/SourceInstitutionID (type=InstitutionCode) Institution (which should, in theory, be limited in botany to standard acronyms)
- Unit/SourceID Catalog number series
- Unit/UnitID Catalog number
- ScientificName/NameAtomised/Botanical (of type NameBotanical) http://www.bgbm.org/TDWG/CODATA/Schema/ABCD_2.06/HTML/ABCD_2.06.html#complexType_NameBotanical_Link0306C890
- ScientificName/NameAtomised/Botanical/AuthorTeam (type=string)
- not including hybrid, cultivar, trade designation elements in the prototype.
- ScientificName/NameAtomised/Zoological (of type NameZoological - with its atomic parts ) http://www.bgbm.org/TDWG/CODATA/Schema/ABCD_2.06/HTML/ABCD_2.06.html#complexType_NameZoological_Link03071560
- NamedAreas (of type NamedAreas; see: http://www.bgbm.org/TDWG/CODATA/Schema/ABCD_2.06/HTML/ABCD_2.06.html#element_NamedAreas_Link03171340). Matching named areas may require examination of both AreaClass and AreaName. Perhaps include the explicit ranked primary division and secondary division (state, county) concepts from DarwinCore, so that mappings don't need to understand the semantics of the text within AreaClass to make comparisons of data.
- GML: most likely just a lat/long pair in decimal degrees (may need conversion).
- Identification/PreferredFlag (current id)
- Identification/Result/TaxonIdentified/ScientificName/NameAtomised -> taxon name of appropriate type
- Identification/Result/TaxonIdentified/ScientificName/IdentificationQualifier (for cf. and name components, not questionable id).
- Identification/Identifiers/ -> to atomic botanist bits
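The lat/long item above notes that coordinates may need conversion to decimal degrees. As a hedged illustration, assuming the source data holds degrees, minutes, seconds and a hemisphere letter (an assumption; sources may store coordinates in other forms), the conversion could look like this. The function name is hypothetical.

```python
def dms_to_decimal(degrees, minutes=0.0, seconds=0.0, hemisphere="N"):
    """Convert degrees/minutes/seconds plus hemisphere to decimal degrees.

    South and west are returned as negative values.
    """
    hemisphere = hemisphere.upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    if degrees < 0 or not 0 <= minutes < 60 or not 0 <= seconds < 60:
        raise ValueError("degrees, minutes or seconds out of range")
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value
```

For example, 42 degrees 22 minutes 30 seconds N converts to 42.375, and any western longitude comes out negative.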
Other bits: ?Type status, ?Typification/verification information
Possibly appropriate for grouping individual specimens of a duplicate set into a set: