Test Data Sets
Paul's Three Data Sets
Webprojects has three Specify databases, one for each of Paul's three test data sets. The mapping of database names to data sets is:
- testfrog: dataset a
- testfish: dataset b
- testfern: dataset c
The database names do not have any particular significance (for example, testfrog does not contain frog records).
Umbfp has one Specify database:
- testfish: dataset b
The duplicate (same collector number and collector name) records:
| Data sets | CatalogNumber | Barcode | CollectorNumber | Collectors | Taxon |
|---|---|---|---|---|---|
| a & c | 431 | 000000267 | 4887 | B. A. Krukoff | Tovomita krukovii |
| | 432 | 000000268 | 4887 | B. A. Krukoff | null null |
| b & c | 448 | 000000284 | 1866 | G. Klug | Graffenrieda colombiana |
| | 453 | 000000289 | 1866 | G. Klug | Graffenrieda colombiana |
| a & b | [none] | | | | |
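The duplicate rule stated above (same collector number and same collector name) can be sketched as a grouped query. This is a minimal illustration run against an in-memory SQLite table holding the four records listed above. The table name `specimen` and its column names are assumptions made for the example and are not the Specify schema.

```python
import sqlite3

# Hypothetical flat table; names are illustrative, not the Specify schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table specimen (catalog_number text, barcode text, "
    "collector_number text, collectors text, taxon text)"
)
# The four records from the duplicates table, with values copied as shown there.
conn.executemany(
    "insert into specimen values (?, ?, ?, ?, ?)",
    [
        ("431", "000000267", "4887", "B. A. Krukoff", "Tovomita krukovii"),
        ("432", "000000268", "4887", "B. A. Krukoff", "null null"),
        ("448", "000000284", "1866", "G. Klug", "Graffenrieda colombiana"),
        ("453", "000000289", "1866", "G. Klug", "Graffenrieda colombiana"),
    ],
)


def find_duplicates(connection):
    """Return (collector_number, collectors, count) for keys seen more than once."""
    return connection.execute(
        "select collector_number, collectors, count(*) "
        "from specimen "
        "group by collector_number, collectors "
        "having count(*) > 1 "
        "order by collector_number"
    ).fetchall()


duplicates = find_duplicates(conn)
for row in duplicates:
    print(row)
```

Grouping on both columns matters: two specimens that share a collector number but were collected by different people are not reported. The taxon is deliberately left out of the key, since records 431 and 432 differ in taxon but are still treated as duplicates.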
Simple Data set of 20 records
Two tables with a one-to-one relationship, slightly more complex than DarwinCore, but only trivially so:
Collection object with a barcode/catalog number, a determination, a type status, and a collector's number.
Fields: BARCODE HERB_ACRONYM FORMAT REPRO TAXON TYPE_STATUS COLLECTOR_NO SITE_ID where site_id is the internal HUH database foreign key value for the site of the collecting event at which this specimen was collected, and collector_no is the field number/collector's number assigned to the specimen by the botanist who collected it, at the time of collection.
Tab delimited file: Image:Test1_collection_object.csv
Collecting event with a collector, locality description, and geopolitical placement.
Fields: SITE_ID COLLECTOR START_YEAR START_MONTH START_DAY LOCALITY H_COUNTRY_NAME H_PRIMARY_NAME where start year, month, and day are the date collected, and h_primary_name is the name of the state/province level geopolitical entity within the country where the collecting event occurred.
Tab delimited file: Image:Test1_collecting_event.csv
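Although the two files carry a .csv extension, they are tab delimited, so a reader has to be told the delimiter explicitly. The sketch below parses a stand-in for the collecting event file. It assumes the file starts with a header row of the field names listed above, which is not confirmed here, and the data row is invented for illustration.

```python
import csv
import io

# Stand-in for the file contents. Assumes a header row; the data row is made up.
SAMPLE = (
    "SITE_ID\tCOLLECTOR\tSTART_YEAR\tSTART_MONTH\tSTART_DAY\t"
    "LOCALITY\tH_COUNTRY_NAME\tH_PRIMARY_NAME\n"
    "1\tA. Collector\t1900\t1\t2\tSomewhere, near a river\tCountry\tState\n"
)


def read_tab_delimited(handle):
    """Parse a tab-delimited file object into a list of dicts keyed by header."""
    return list(csv.DictReader(handle, delimiter="\t"))


events = read_tab_delimited(io.StringIO(SAMPLE))
print(events[0]["LOCALITY"])
```

With `delimiter="\t"`, commas inside a locality description stay within their field. For a file on disk, `open(path, newline="")` would take the place of `io.StringIO(SAMPLE)`.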
select BARCODE, HERB_ACRONYM, FORMAT, REPRO, TAXON, TYPE_STATUS, COLLECTOR_NO,
COLLECTOR, START_YEAR, START_MONTH, START_DAY, LOCALITY, H_COUNTRY_NAME, H_PRIMARY_NAME
from collection_object
left join collecting_event on collection_object.SITE_ID = collecting_event.SITE_ID
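To try the join above without the real files, the two tables can be created in an in-memory SQLite database. This is only a sketch: the column types are guesses, and the rows are placeholders invented for illustration, not records from the test data set. The second collection object deliberately has no matching collecting event, to show what the left join returns in that case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names follow the field lists above; the types are assumptions.
conn.execute(
    "create table collection_object (BARCODE text, HERB_ACRONYM text, "
    "FORMAT text, REPRO text, TAXON text, TYPE_STATUS text, "
    "COLLECTOR_NO text, SITE_ID integer)"
)
conn.execute(
    "create table collecting_event (SITE_ID integer, COLLECTOR text, "
    "START_YEAR integer, START_MONTH integer, START_DAY integer, "
    "LOCALITY text, H_COUNTRY_NAME text, H_PRIMARY_NAME text)"
)
# Placeholder rows, invented for illustration only.
conn.executemany(
    "insert into collection_object values (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("B1", "X", "sheet", "", "Taxon one", "Holotype", "101", 1),
        ("B2", "X", "sheet", "", "Taxon two", "Isotype", "102", 2),
    ],
)
conn.execute(
    "insert into collecting_event values (?, ?, ?, ?, ?, ?, ?, ?)",
    (1, "A. Collector", 1900, 1, 2, "Somewhere", "Country", "State"),
)

QUERY = """
select BARCODE, HERB_ACRONYM, FORMAT, REPRO, TAXON, TYPE_STATUS, COLLECTOR_NO,
       COLLECTOR, START_YEAR, START_MONTH, START_DAY, LOCALITY,
       H_COUNTRY_NAME, H_PRIMARY_NAME
from collection_object
left join collecting_event
  on collection_object.SITE_ID = collecting_event.SITE_ID
order by BARCODE
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Because this is a left join, every collection object appears in the result. A specimen whose SITE_ID has no collecting event still produces a row, with the collector, date, and locality columns left empty (`None` in Python).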
Generated from ASA with query:
select barcode, herb_acronym, format, repro, taxon, type_status, collector_no,
collector, start_year, start_month, start_day, locality, h_country_name, h_primary_name
from view_specimen_leftjoin_item, view_type_specimen_join_taxon, view_site_and_geo
where (collector_id = 138403 or collector_id = 109283 or collector_id = 104041)
and view_specimen_leftjoin_item.id_specimen = view_type_specimen_join_taxon.specimen_id
and view_specimen_leftjoin_item.site_id = view_site_and_geo.id_site
order by taxon_id