Draft Filtered Push Project Charter


The Filtered Push project will produce a system for improving the fitness for purpose of distributed data through analysis, annotation, and human review of data quality annotations. --Paul J. Morris 16:37, 26 April 2011 (EDT)

The Filtered Push Continuous Quality Control (FPCQC) project will put into production a network that addresses the quality and fitness for use of distributed species-occurrence data.

(We need to distinguish between the generic software and one or more specific instances of networks in the species-occurrence data domain --Paul J. Morris 14:05, 26 April 2011 (EDT) )

The software provides networks that allow data providers and consumers to define potential errors in data, develop metrics for those errors, analyze distributed data to detect potential errors, and close the quality management cycle by providing a network architecture that moves assertions about data quality, such as corrections, back to the curators of the original distributed data sets.
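As a rough illustration of this cycle only (not the project's design; every class and function name below is hypothetical), the steps of defining a potential error, computing a metric for it, analyzing records from several data sets, and routing the resulting assertions back to the curators of each data set might be sketched in Python as follows. The field name decimalLatitude is borrowed from Darwin Core purely as an example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Record:
    dataset: str      # hypothetical identifier of the providing collection
    record_id: str
    fields: dict


@dataclass
class Annotation:
    dataset: str      # data set whose curators should receive the assertion
    record_id: str
    field_name: str
    assertion: str


# A "check" is one definition of a potential error: it returns an
# Annotation when the record looks wrong, otherwise None.
Check = Callable[[Record], Optional[Annotation]]


def check_latitude(record: Record) -> Optional[Annotation]:
    """Flag latitudes outside the valid range of -90 to 90 degrees."""
    lat = record.fields.get("decimalLatitude")
    if lat is None or -90 <= lat <= 90:
        return None
    return Annotation(record.dataset, record.record_id, "decimalLatitude",
                      f"latitude {lat} is outside -90..90")


def analyze(records: List[Record], checks: List[Check]) -> List[Annotation]:
    """Run every check against every record and collect the assertions."""
    found = []
    for record in records:
        for check in checks:
            annotation = check(record)
            if annotation is not None:
                found.append(annotation)
    return found


def error_rate(records: List[Record], annotations: List[Annotation]) -> float:
    """A simple metric: the fraction of records with at least one assertion."""
    flagged = {(a.dataset, a.record_id) for a in annotations}
    return len(flagged) / len(records) if records else 0.0


def route_to_curators(annotations: List[Annotation]) -> Dict[str, List[Annotation]]:
    """Group assertions by originating data set, closing the cycle."""
    routed = defaultdict(list)
    for annotation in annotations:
        routed[annotation.dataset].append(annotation)
    return dict(routed)


# Made-up example data from two hypothetical data sets, "A" and "B".
records = [
    Record("A", "1", {"decimalLatitude": 42.4}),
    Record("A", "2", {"decimalLatitude": 142.4}),
    Record("B", "7", {"decimalLatitude": -95.0}),
]
annotations = analyze(records, [check_latitude])
routed = route_to_curators(annotations)
```

In this sketch the curators of each data set would review only the assertions routed to them; whether to accept a correction remains a human decision.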

Objectives: [Move "By project end (Fall 2013)" to this line and delete other instances below, so that it is stated just once.--Hanken. Done. PJM.]

  • By project end (Fall 2013) have functioning FP nodes, at a minimum --Paul J. Morris 16:47, 19 April 2011 (EDT), at Harvard University, UMASS Boston, UC-Davis, and 5 institutions/museums curating natural science collections supported by servers paid for by the project, forming an instance of an FP network for finding botanical duplicates --Paul J. Morris 16:47, 19 April 2011 (EDT). The production-level software for nodes should be capable of pulling and pushing annotations, filtering them (client side, PJM), and applying updates (client side, PJM) through a client into their collection database.

To what extent do we need to comment on client side and network side responsibilities here? --Paul J. Morris 14:14, 26 April 2011 (EDT)

  • By project end (Fall 2013) have user-installable, production-level FP software with documentation for use by institutions/museums and associated biodiversity cyber-infrastructure platforms, including DataONE and iPlant.

  • By project end, be able to include authoritative lists and analytical tools for data assessment within Kepler for scientists to use in assembling their own scientific workflows.
    • Including, e.g., IPNI, GNI, possibly GNUB services, Biogeomancer, GeoLocate. --Paul J. Morris 16:47, 19 April 2011 (EDT)
  • By project end, have an ability to program the network by providing it with an arbitrary analysis defined by a workflow. --Paul J. Morris 14:13, 26 April 2011 (EDT)
  • By project end have specific analytical capability for detecting botanical duplicates deployed in the botanical duplicate detection instance of the network. --Paul J. Morris 14:13, 26 April 2011 (EDT)
  • By project end, create APIs to facilitate communication with an FP network in several clients: Specify, Kepler, IPT, GoldenGate, Morphbank, and MediaWiki.

(Some of these client implementations (e.g. GoldenGate, MediaWiki) are FP project responsibilities, others are partner responsibilities --Paul J. Morris 13:53, 26 April 2011 (EDT))

Also, provide software libraries for developers to use in creating APIs to other potential clients. (Move reference to software library to a separate point with a timeline --Paul J. Morris 13:53, 26 April 2011 (EDT)).

  • Publish an Annotation Ontology for use with FP but more broadly applicable to biodiversity science through the Biodiversity Information Standards (TDWG) process. Applicable standards processes? PJM
    • This should probably involve (1) a W3C process for the annotation ontology, (2) a TDWG process discussing adoption of the W3C process, and (3) a TDWG process for domain-specific annotation content (e.g. how to assert a new determination in an annotation) --Paul J. Morris 16:47, 19 April 2011 (EDT)
  • Produce training materials for teaching Biodiversity Informatics for K-12 and undergraduates.

[It is unrealistic to claim that we will be able to produce age-appropriate training materials for kindergarten through college seniors. I suggest narrowing the scope to a more realistic goal--Hanken] I concur. Suggest specific focus on undergraduates. --Paul J. Morris 13:53, 26 April 2011 (EDT)

  • Produce a prototype Flora of North America portal that uses an FP network to annotate and communicate between resources.
  • Release source code 1.0 for network nodes by X.
  • Release Client to network API documentation by the end of year 1. --Paul J. Morris 16:47, 19 April 2011 (EDT)

  • Release a Client side library for client-network interactions implemented in Java by the end of year 1. --Paul J. Morris 13:53, 26 April 2011 (EDT)
  • Not discussed, but probably should be: Web client for viewing annotations. Annotation store in network. Information security in the botanical duplicates instance. No single point of failure. Analytical and storage capabilities in network. Mapping tool for clients. --Paul J. Morris 14:38, 26 April 2011 (EDT)
  • Kepler goals: Kepler actors as clients. Workflow Annotation. Workflows running as analytical capability in the network. --Paul J. Morris 14:38, 26 April 2011 (EDT)
  • ? Discuss a regular release process with incremental addition of functions here? --Paul J. Morris 14:38, 26 April 2011 (EDT)

Committed Resources:

  • $1,640,289 from NSF
  • 2 full-time programmers for 3 years (Harvard); one postdoc for 3 years (Davis); one system architect for 2 months per year (Harvard); one technician for 6 months in year three. [It seems odd to specify one location but not others--Hanken]
  • Non-funded supervisory contributions from James Hanken, James Macklin, Paul Morris and Bertram Ludaescher.
  • Letters of collaboration from William Michener, DataONE; Stinger Guala (now Gerry Moore), USDA-Plants; Greg Riccardi, Morphbank; Hong Cui, Biodiversity Literature Semantic Markup; Steve Goff, iPlant; Cynthia Parr, EOL; and Steve Kelling, Avian Knowledge Network. [Did Stinger Guala change identities? It might be better just to list the institutions--Hanken]

Authorizing Players:

PI: James Hanken; Co-PIs/Senior Personnel: James Macklin, Bertram Ludaescher, Paul Morris.