FP Production Requirements
From Filtered Push Wiki
Requirements that must be fulfilled for ApplePie.
- 1 (Maureen) It must be possible for a collection manager (data entry person) to retrieve and insert specimen information from existing duplicate records faster than they can type the same information off of the labels on a sheet in front of them.
- 2 Rapidly find potential duplicate botanical specimen records by fuzzy matching on collector name, collector number, and date collected (faster than a data entry person can type the same information off of the labels on a sheet in front of them).
- 3 Filtered Push can directly update local data stores
- 4 Database administrators can produce a mapping from several classes of annotations that are expected to be frequently used onto insert and update queries that can be fired against their local data stores.
- 5 (Bob) An annotation can be about another annotation and can contain assertions about that other annotation.
- 6 (Maureen) Annotations in a domain must express their domain-specific assertions in terms of commonly understood semantics within that domain.
- 7 Annotations must be able to express domain concepts.
- 8 (Lei) Data curators must be able to receive annotations that are potentially relevant to changes they might make to the data sets that they curate.
- 9 (Lei) Data curators must be able to decide which annotations, and which parts of those annotations, to accept into their local data stores.
- 10 (Lei) Data curators must be able to ignore incoming annotations that are reported to them.
- 11 Annotations must be able to carry structured data
- 12 A message injected into a FilteredPush network must be able to carry an authentication token allowing the validation of which user provided the annotation.
- 13 Annotations must be able to distinguish between the data object being annotated and the assertions made in the annotation.
- 14 Data in motion should be encryptable by configuration in a particular network instance.
- 15 Clients must authenticate to network access points.
- 16 (Bob) Messages require identifiers.
- 17 (Bob) Clients should be able to inquire on the status of a message by its identifier.
- 18 The network can access data about an object with a guid from sources external to a message that refers to the guid.
- 19 People can register interests to the network.
- 20 The network can match interests to messages and associated content and deliver messages to the interested users.
- 21 The network must be able to respond to clients' queries for annotations
- 26 (Domain Specific): Cluster potential duplicate botanical specimen records by fuzzy matching on collector name, collector number, date collected, locality, identifications, etc.
- 27 Injected workflows must be able to create annotations.
- 22 Client elicits Annotation as Structured Data from a user.
- 23 Client system injects the user's annotation with at least a minimal context (e.g. GUID) of what data object the user was annotating.
- 24 Individual Human Users of Clients are Authenticated.
- 25 The workflow environment must be able to authenticate the user and pass authentication tokens to the network in messages.
- 33 All interactions between clients and the network must be able to occur over port 80/443 on connections initiated by the client.
- 28 Find clusters in data known to the network on arbitrary criteria.
  - I accept this, but think we should get more specific so we can measure success. --Bob Morris 17:36, 15 March 2011 (EDT)
  - Specific Example: Find potential duplicate herbarium sheets based on collector, collector number, date collected, taxon and locality. --Paul J. Morris 11:47, 29 November 2011 (EST)
  - Specific Example: Find clusters of specimens based on family membership (using current APG taxonomy) using generic and specific names associated with those specimens. --Paul J. Morris 11:47, 29 November 2011 (EST)
- 29 (Annotations): Annotations must be able to group records into arbitrary sets.
  - Not so clear how we would measure success on this. --Bob Morris 17:36, 15 March 2011 (EDT)
  - Specific Example: Provide an annotation to cluster a set of specimens into a duplicate set. --Paul J. Morris 11:47, 29 November 2011 (EST)
  - Specific Example: Provide an annotation to cluster a set of collecting events into a track of a particular person. --Paul J. Morris 11:47, 29 November 2011 (EST)
  - Specific Example: Provide an annotation to cluster a set of specimens into a family. --Paul J. Morris 11:47, 29 November 2011 (EST)
- 30 (Annotations): It must be possible to annotate workflows.
- 31 Data Curators must be able to automatically apply filters to incoming annotations so that they can examine and process those of most interest to them.
- 32 A Data Curator can use an incoming annotation to generate a pattern for automatic rejection or acceptance of future matching annotations.
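As an illustration of requirement 2 (and the broader requirement 26), fuzzy matching on collector name, collector number, and date collected might be sketched as follows. This is only a minimal sketch under stated assumptions: the `SpecimenRecord` fields, the unweighted mean score, and the 0.8 threshold are all hypothetical choices made for the example, and the standard-library `difflib.SequenceMatcher` stands in for whatever matching technique a real network node would use.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class SpecimenRecord:
    # Hypothetical minimal record; field names are illustrative only.
    collector: str
    collector_number: str
    date_collected: str  # assumed to be an ISO 8601 string, e.g. "1880-05-02"


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def duplicate_score(x: SpecimenRecord, y: SpecimenRecord) -> float:
    """Unweighted mean of the three field similarities (an arbitrary choice)."""
    return (
        similarity(x.collector, y.collector)
        + similarity(x.collector_number, y.collector_number)
        + similarity(x.date_collected, y.date_collected)
    ) / 3


def find_candidates(query, records, threshold=0.8):
    """Return (score, record) pairs at or above the threshold, best first."""
    scored = [(duplicate_score(query, r), r) for r in records]
    hits = [(s, r) for s, r in scored if s >= threshold]
    return sorted(hits, key=lambda pair: pair[0], reverse=True)
```

With a score like this, a record typed as "Asa Gray" can still surface as a candidate for a query on "A. Gray" when the collector number and date agree, which is the behaviour the data entry scenario in requirement 1 depends on.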
Further analysis and research required
- Requirement: A means for administration of automatically triggered workflows present in the network.
- Requirement: Workflows injected into the network must be discoverable by at least their creator, and probably by all users. No clear requirement for access control has yet been defined for workflows.
- Requirement: Query rewriting/mapping for select queries on local data stores.
- Requirement: Query rewriting/mapping for update/insert queries from network into local data stores.
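The mapping of annotations onto update queries (requirement 4 and the last item above) could look roughly like the sketch below. Everything here is an assumption made for illustration: the annotation is modelled as a plain dictionary with `target_guid` and `assertions` keys, the Darwin Core style terms in `FIELD_MAP` and the `specimen` table are hypothetical, and SQLite is used only so the example is runnable.

```python
import sqlite3

# Hypothetical mapping written by a database administrator:
# annotation term -> column in the local data store.
# Column names come only from this trusted mapping, never from the annotation.
FIELD_MAP = {
    "dwc:recordedBy": "collector",
    "dwc:eventDate": "date_collected",
}


def build_update(annotation, table="specimen", key_column="guid"):
    """Turn the mapped assertions of an annotation into a parameterized UPDATE."""
    mapped = [
        (FIELD_MAP[term], value)
        for term, value in annotation["assertions"].items()
        if term in FIELD_MAP
    ]
    if not mapped:
        return None  # nothing in this annotation applies to the local schema
    set_clause = ", ".join(f"{column} = ?" for column, _ in mapped)
    sql = f"UPDATE {table} SET {set_clause} WHERE {key_column} = ?"
    params = [value for _, value in mapped] + [annotation["target_guid"]]
    return sql, params


def apply_annotation(conn: sqlite3.Connection, annotation) -> int:
    """Apply an accepted annotation; return the number of rows changed."""
    query = build_update(annotation)
    if query is None:
        return 0
    sql, params = query
    with conn:  # commits on success, rolls back on error
        return conn.execute(sql, params).rowcount
```

Values are passed as query parameters rather than interpolated into the SQL text, so the content of an annotation arriving from the network cannot alter the structure of the query. Terms with no entry in the mapping are simply skipped, which leaves the decision about what reaches the local data store with the administrator who wrote the mapping.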
List of Systems to Develop
- Base Network Node
- Network capabilities
- Knowledge capability
- Analytical capability
- Messaging capability
- Kepler Client
- Specify Client
- Symbiota Client
- Morphbank Client
- Web Client
- Client Helper