Requirement8



Requirement

Report:_FP_Requirements#8 Data curators must be able to receive annotations that are potentially relevant to changes they would make to the data sets they curate.

This is a requirement for the FilteredPush network software (in directing annotations to relevant message queues). Annotations are provided to the network with subjects (annotatesResource) that are not likely to contain sufficient information to identify all of the relevant data curators (interested parties).

Solution

A job plan triggered by an annotation consists of the following steps. (Open question: what do the query and the query result of an annotation mean?)

  1. A query (or set of queries) is run to find related data.
  2. The annotation and query results are provided to a filtering message system.
  3. The filtering message system delivers the message to queues representing interests that match data in the annotation or its associated data.
    1. The annotations which annotate data in a particular local database are filtered out and sent to that mapper instance for database update.
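
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the dictionary-based "data store", and the queue names are all hypothetical and are not part of the FilteredPush software.

```python
from collections import defaultdict

def find_related_data(annotation, data_store):
    """Step 1: look up data related to the annotation's subject (hypothetical store)."""
    return data_store.get(annotation["annotatesResource"], {})

def deliver(annotation, data_store, interests):
    """Steps 2-3: combine annotation and query results, then route to matching queues."""
    related = find_related_data(annotation, data_store)
    message = {**annotation, **related}
    queues = defaultdict(list)
    for queue_name, interest in interests.items():
        # An interest matches when every one of its key-value pairs is in the message.
        if all(message.get(k) == v for k, v in interest.items()):
            queues[queue_name].append(message)
    return dict(queues)

# Hypothetical example data.
data_store = {"urn:example:guid-1": {"collectionCode": "GH"}}
interests = {
    "gh-curators": {"collectionCode": "GH"},
    "a-curators": {"collectionCode": "A"},
}
annotation = {"annotatesResource": "urn:example:guid-1"}
result = deliver(annotation, data_store, interests)
```

Here the annotation itself carries only a GUID; the match on collectionCode is possible only because the query result was merged into the message before filtering.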

Competency Question0

The annotation is of a GUID; how is it delivered to a collection database?

Question

An annotation has as a subject annotatesResource={some GUID}. How is this annotation delivered to a message queue established on an interest collectionCode="GH"?

Answer

Basically, the collection code (or collection id) needs to be inferred from the GUID so that the FP messaging system can deliver the message to the queue established on an interest in a particular collection code. How and when to do this inference has little to do with the query language; it is mostly a matter of software design, and the FP network should provide it as a system-level service.

  1. The inference could be done when the annotation is created, completely transparently to the annotator. The inferred collection code is then a kind of metadata of the annotation, and we could put it in the message as a property for convenient filtering afterwards.
  2. The inference could be done when the annotation arrives at the FP messaging system. Based on the current subscription requests, the messaging system can decide whether the inference is needed and which queue the message should go to. If the mapping between collection code and GUID is stored in a triple-store, a SPARQL query could be used.
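
The two options differ only in when the lookup happens. A minimal sketch, assuming the GUID-to-collection mapping is available as a simple in-memory table (in practice it might be a triple-store query); all names and the example GUID are hypothetical:

```python
# Hypothetical mapping from GUID to collection code.
GUID_TO_COLLECTION = {"urn:example:guid-1": "GH"}

def infer_collection_code(guid, index=GUID_TO_COLLECTION):
    """Return the collection code for a GUID, or None if it is unknown."""
    return index.get(guid)

def annotate_at_creation(annotation):
    """Option 1: attach the inferred code as a message property when created."""
    enriched = dict(annotation)
    enriched["collectionCode"] = infer_collection_code(annotation["annotatesResource"])
    return enriched

def route_on_arrival(annotation, subscriptions):
    """Option 2: infer on arrival, and only if some subscription needs the code."""
    message = dict(annotation)
    needs_code = any("collectionCode" in s for s in subscriptions.values())
    if needs_code and "collectionCode" not in message:
        message["collectionCode"] = infer_collection_code(message["annotatesResource"])
    return [
        name
        for name, s in subscriptions.items()
        if all(message.get(k) == v for k, v in s.items())
    ]

created = annotate_at_creation({"annotatesResource": "urn:example:guid-1"})
routed = route_on_arrival(
    {"annotatesResource": "urn:example:guid-1"},
    {"gh": {"collectionCode": "GH"}, "a": {"collectionCode": "A"}},
)
```

Option 1 pays the lookup cost once per annotation regardless of subscriptions; option 2 can skip it when no current subscription filters on collection code.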

Competency Question1

How can an annotation containing a collectionCode as part of the annotatesResource be delivered to the correct collection?

Question

How can a filter be specified to select annotations that annotate records from the local database?

Answer

  • Assuming the collection code is used in the annotation to specify the annotated records.
  • Assuming the local database is at HUH (e.g. collection code is A).

As SPARQL:

PREFIX ao: <http://purl.org/ao/>
PREFIX aod: <http://etaxonomy.org/ontologies/ao/aod.owl#>
PREFIX dwc: <http://rs.tdwg.org/dwc/terms/index.htm#>
PREFIX dwcfp: <http://etaxonomy.org/ontologies/dwcfp#>

# Select annotations whose subject carries collection code "A".
SELECT ?annotation
WHERE {
	?annotation a ao:Annotation .
	?annotation aod:annotatesResource ?subject .
	?subject a dwc:references .
	?subject dwcfp:collectionCode "A"
}

As JMS key-value pairs without domain terminology: N/A

As JMS key-value pairs with domain terminology:

message.contentType: ao:Annotation
message.contentFormat: RDF/XML
message.content.annotatesResource.collectioncode: A
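
A key-value filter of this kind amounts to checking that every pair in the interest is present, with the same value, among the message properties. A minimal sketch (the `matches` helper is hypothetical; the keys are the ones listed above):

```python
def matches(properties, interest):
    """True when every key-value pair of the interest appears in the message properties."""
    return all(properties.get(key) == value for key, value in interest.items())

# Interest registered for the local database (collection code A).
interest = {
    "message.contentType": "ao:Annotation",
    "message.content.annotatesResource.collectioncode": "A",
}

# Properties of an incoming message.
incoming = {
    "message.contentType": "ao:Annotation",
    "message.contentFormat": "RDF/XML",
    "message.content.annotatesResource.collectioncode": "A",
}

selected = matches(incoming, interest)
```

Properties that the interest does not mention (here the content format) are ignored, so a subscriber only states the pairs it cares about.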

Competency Question2

Question

How can a filter be specified to select annotations that annotate records from the local database and come from a particular annotator?

Answer

  • Assuming the collection code is used in the annotation to specify the annotated records.
  • Assuming the local database is HUH (collection code is A).
  • Assuming the particular annotator is Bob.

As SPARQL:

PREFIX ao: <http://purl.org/ao/>
PREFIX aod: <http://etaxonomy.org/ontologies/ao/aod.owl#>
PREFIX dwc: <http://rs.tdwg.org/dwc/terms/index.htm#>
PREFIX dwcfp: <http://etaxonomy.org/ontologies/dwcfp#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX pav: <http://purl.org/pav/>

# Select annotations on collection "A" that were created by the person named "Bob".
SELECT ?annotation
WHERE {
	?annotation a ao:Annotation .
	?annotation aod:annotatesResource ?subject .
	?subject a dwc:references .
	?subject dwcfp:collectionCode "A" .
	?annotation pav:createdBy ?annotator .
	?annotator a foaf:Person .
	?annotator foaf:name "Bob"
}

As JMS key-value pairs without domain terminology: N/A

As JMS key-value pairs with domain terminology:

message.contentType: ao:Annotation
message.contentFormat: RDF/XML
message.content.annotatesResource.collectioncode: A
message.content.curator.name: Bob
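
If these pairs were to be evaluated by a JMS provider, they would have to be expressed as a message selector, which uses an SQL-92-like conditional syntax over message properties. To our understanding, property names used in selectors must be valid identifiers, so the dotted keys above could not be used verbatim. The sketch below builds a selector string from the pairs; replacing dots with underscores is purely an assumed naming convention, and both helper functions are hypothetical.

```python
import re

def to_property_name(key):
    """Replace every character that is not a letter, digit, or underscore."""
    return re.sub(r"\W", "_", key)

def to_selector(interest):
    """Join the pairs into an AND-ed selector with single-quoted string literals."""
    clauses = []
    for key, value in interest.items():
        literal = str(value).replace("'", "''")  # a quote inside a literal is doubled
        clauses.append(f"{to_property_name(key)} = '{literal}'")
    return " AND ".join(clauses)

selector = to_selector({
    "message.content.annotatesResource.collectioncode": "A",
    "message.content.curator.name": "Bob",
})
```

The resulting string could then be supplied when the consumer for a subscriber's queue is created, so that the broker rather than the subscriber performs the filtering.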

Competency Question3

Question

How can a filter be specified to select an annotation which is about new determinations relevant to local specimen records?

Answer

  • Assuming the collection code is used in the annotation to specify the annotated records.
  • Assuming the local database is HUH (collection code is A).

As SPARQL:

PREFIX ao: <http://purl.org/ao/>
PREFIX aod: <http://etaxonomy.org/ontologies/ao/aod.owl#>
PREFIX dwc: <http://rs.tdwg.org/dwc/terms/index.htm#>
PREFIX dwcfp: <http://etaxonomy.org/ontologies/dwcfp#>

# Select annotations that propose inserting an identification for a record in collection "A".
SELECT ?annotation
WHERE {
	?annotation a ao:Annotation .
	?annotation aod:hasExpectation aod:Expectation_Insert .
	?annotation ao:hasTopic ?topic .
	?topic a dwc:Identification .
	?annotation aod:annotatesResource ?subject .
	?subject a dwc:references .
	?subject dwcfp:collectionCode "A" .
	?subject dwcfp:catalogNumber ?catalogNumber
}

As JMS key-value pairs without domain terminology: N/A

As JMS key-value pairs with domain terminology:

message.contentType: ao:Annotation
message.contentFormat: RDF/XML
message.content.annotatesResource.collectioncode: A
message.content.annotationType: insert_determination

Competency Question4

Question

How can a filter be specified to select annotations about local specimen records with a specific determination taxon, e.g. with the scientific name Ateleia gummifera?

Answer

Annotation with determination information

For annotations that contain determination information, such as insert_determination, update_determination, insert_occurrence and delete_determination, it is possible to write SPARQL queries and key-value pairs against the incoming annotation to find out whether some subscriber is interested in it.

The SPARQL query and key-value pairs are omitted here since they are similar to the examples above.

Annotation without determination information

For annotations that do not contain determination information, such as insert_georeference, update_georeference, Show_Inconsistency and Systematic_Error, neither a SPARQL query nor key-value pairs applied to the incoming annotation (message) alone can specify the filter for annotation selection.

In this case, to decide whether the annotation is of interest, we need to take one of two approaches:
(1) Assuming we have a triple-store abstraction of the collection database, we could run a SPARQL query against the annotation plus the collection database to find out whether the annotated specimen has the specific determination.
(2) Extract the specimen record id by parsing the annotation, then find the determination of this specimen record by querying the collection database. If the subscriber is interested in the determination, send the annotation to the subscriber; if not, do nothing.
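
Approach (2) can be sketched as below. This is a simplification under stated assumptions: real annotations are RDF/XML rather than the flat text parsed here, and the record ids, the in-memory "database", and the function names are all hypothetical.

```python
import re

# Hypothetical stand-in for the collection database: record id -> current determination.
COLLECTION_DB = {
    "A:12345": "Ateleia gummifera",
    "A:67890": "Quercus alba",
}

def extract_record_id(annotation_text):
    """Parse the annotated record id out of the annotation (text format is assumed)."""
    match = re.search(r"annotatesResource=(\S+)", annotation_text)
    return match.group(1) if match else None

def should_deliver(annotation_text, wanted_name, db=COLLECTION_DB):
    """Look up the record's determination and compare it with the subscriber's interest."""
    record_id = extract_record_id(annotation_text)
    if record_id is None:
        return False  # nothing to look up, so do nothing
    return db.get(record_id) == wanted_name

georef = "type=update_georeference annotatesResource=A:12345"
deliver_it = should_deliver(georef, "Ateleia gummifera")
```

The annotation itself says nothing about the determination; the decision depends entirely on the lookup, which is why this kind of filter cannot be evaluated on the message alone.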

As SPARQL query:

  • Assuming there is a triple-store abstraction of the collection database.
  • The query is run against the incoming annotation plus the collection database.

PREFIX ao: <http://purl.org/ao/>
PREFIX aod: <http://etaxonomy.org/ontologies/ao/aod.owl#>
PREFIX dwc: <http://rs.tdwg.org/dwc/terms/index.htm#>
PREFIX dwcfp: <http://etaxonomy.org/ontologies/dwcfp#>

# Select annotations whose annotated specimen is determined as "Ateleia gummifera".
SELECT ?annotation
WHERE {
	?annotation a ao:Annotation .
	?annotation aod:annotatesResource ?subject .
	?subject a dwc:references .
	?subject dwcfp:collectionCode "A" .
	?subject dwcfp:identifierOf ?specimen .
	?specimen dwcfp:scientificName "Ateleia gummifera"
}

As JMS key-value pairs without domain terminology: N/A

As JMS key-value pairs with domain terminology:

  • The scientific name of the annotated specimen, obtained either through a SPARQL query of the triple store or through an SQL query of the relational database, needs to be put in the message as a key-value property.
message.contentType: ao:Annotation
message.contentFormat: RDF/XML
message.content.annotatesResource.SpecimenRecord.ScientificName: Ateleia gummifera


Notes

  1. For the FP messaging system, there are two kinds of annotations.
    1. Annotations containing all the information needed to decide whether some subscriber is interested in them. E.g. the annotation in competency question 1, which uses collection code and catalog number to specify the target specimen record. (This will probably never hold for interests other than collections, as interested parties may express arbitrary query criteria as interests.)
    2. Annotations containing information that needs to be further processed (or queried together with the triple-store of the collection database) to decide whether some subscriber is interested in them.
      1. Simple processing. E.g. the annotation in competency question 1 using an occurrence id instead of collection code and catalog number. The occurrence id is used to infer the collection code.
      2. complex processing. E.g. the annotation in competency question 4 without determination information.
  2. Maybe we could use both SPARQL and key-value pairs to specify the filter. A simple subscription could be expressed with key-value pairs alone. A complex subscription could correspond to a group of filters, with the key-value filter applied before the SPARQL query filter as a pre-classification. In this way, performance is better than with SPARQL queries alone, and flexibility (richness in expressing the interest) is better than with key-value pairs alone. For the subscriber, making a subscription should be an easy job through a GUI.
    1. Some information, like the annotation type and collection code, could be provided as key-value pairs, so we do not need to find it by running a SPARQL query.
    2. Some information, like whether the annotated record has a specific determination, could be obtained by a SPARQL query.
  3. Key-value pairs without domain terminology do not seem to be a solution, since many subscriptions are related to domain concepts.
  4. What information is in the triple-store of the collection database? We might need to define some rules to support SPARQL queries (reasoning) against the annotation, the collection database and the annotation store. E.g. if specimen S belongs to family A and family A has B as a synonym, then S also belongs to B.
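
Notes 2 and 4 can be illustrated together: a cheap key-value stage pre-classifies messages, and only those that pass reach a more expensive stage, which here applies the synonym rule. This is a sketch under stated assumptions; the lookup tables stand in for a triple-store query, and all names and data are hypothetical.

```python
# Hypothetical data: family synonyms and the family each specimen is filed under.
FAMILY_SYNONYMS = {"Leguminosae": {"Fabaceae"}, "Fabaceae": {"Leguminosae"}}
SPECIMEN_FAMILY = {"A:12345": "Leguminosae", "A:67890": "Fagaceae"}

expensive_calls = []  # records which messages reached the second stage

def families_of(specimen_id):
    """Rule from note 4: a specimen in family A also belongs to A's synonyms."""
    family = SPECIMEN_FAMILY.get(specimen_id)
    if family is None:
        return set()
    return {family} | FAMILY_SYNONYMS.get(family, set())

def key_value_stage(message, pairs):
    """Stage 1: cheap comparison of message properties."""
    return all(message.get(k) == v for k, v in pairs.items())

def lookup_stage(message, wanted_family):
    """Stage 2: stands in for a query against the collection data."""
    expensive_calls.append(message["specimen"])
    return wanted_family in families_of(message["specimen"])

def is_of_interest(message, pairs, wanted_family):
    # `and` short-circuits, so the lookup runs only for pre-classified messages.
    return key_value_stage(message, pairs) and lookup_stage(message, wanted_family)

messages = [
    {"collectionCode": "A", "specimen": "A:12345"},
    {"collectionCode": "GH", "specimen": "A:67890"},
    {"collectionCode": "A", "specimen": "A:67890"},
]
selected = [
    m["specimen"]
    for m in messages
    if is_of_interest(m, {"collectionCode": "A"}, "Fabaceae")
]
```

The message from collection GH never reaches the lookup stage, which is the performance benefit the pre-classification is meant to provide; the first specimen is selected even though it is filed under a synonym of the family the subscriber asked for.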