Filtered Push Project Charter (Filtered Push Production/FPCQC)
The Filtered Push project will produce a system for improving the fitness for purpose of distributed data through analysis, annotation, and human review of data quality annotations.
The software will support networks that allow data providers and consumers to define potential errors in data, develop metrics for those errors, and analyze distributed data to detect them. It will close the quality management cycle by providing a network architecture that moves assertions about data quality, such as corrections, back to the curators of the original distributed data sets.
Objectives (by project end, Summer 2014):
- Have functioning FP nodes at Harvard University, UMass Boston, UC Davis, and five other institutions/museums curating natural science collections.
- Have user-installable, production-level FP software capable of detecting duplicates, pulling and pushing annotations, filtering them, and applying updates to a collection database through a client.
- Have a library of Kepler actors, which exploit authoritative lists and analytical tools for data assessment, from which scientists can assemble their own workflows.
- Release documentation for the FP network API, along with a client-side Java library, to enable independent development of FP clients (e.g., Specify, Kepler, IPT, Morphbank).
- Release prototype GoldenGate and MediaWiki clients.
- Have W3C and Biodiversity Information Standards (TDWG) ratify the Annotation Ontology (now the W3C Open Annotation (OA) ontology).
- Produce training materials for teaching biodiversity informatics to undergraduates.
- Produce a testbed Flora of North America portal that uses an FP network to annotate resources and communicate between them.
Resources:
- $1,640,289 from NSF
- Two full-time programmers for three years (Harvard); one postdoc for three years (UC Davis); one system architect for two months per year (Harvard); one technician for six months in year three (Harvard).
- Non-NSF-funded supervisory contributions from James Hanken, James Macklin, Paul Morris, Bertram Ludaescher, and Robert Morris.
- Letters of collaboration from DataONE, USDA-Plants, Morphbank, Biodiversity Literature Semantic Markup, iPlant, EOL, and the Avian Knowledge Network.
PI: James Hanken. Co-PIs/Senior Personnel: James Macklin, Bertram Ludaescher, Paul Morris.