2013Feb20



Etherpad for meeting notes: http://firuta.huh.harvard.edu:9000/FP-2013Feb20

Agenda

  • iDigBio Portal
  • Embedded Kepler
    • Progress on embedded QC job.
    • Work needed on actors to interact with current FP APIs.
  • Knowledge/Interests implementation.
  • Annotations
    • NewOccurrence example for NEVP
    • Progress on: Rewriting dwFP

Non-Tech

  • Increasing Burn Rate.
  • Annotations
    • Annotation MS
    • Task Group for Applicability Statement on OA
  • Collaborations
    • Specify/Symbiota
    • SCAN TCN
    • NEVP TCN

Carry to next week

  • Prospective meetings, development targets, ApplePie
    • OA West coast (will pay expenses of the presenter). Looking for demonstration and higher level technical details of what works and doesn't with OA.
    • OA East coast. (essentially same presentation - two sessions for convenience).
    • Semantics in biodiversity (opportunity to develop for publication).
    • SPNHC ( [1] ) ApplePie
    • TDWG (late October) http://www.tdwg.org/homepage-news-item/article/tdwg-2013-call-for-symposia-and-workshops/
    • CNH: meeting will include NEVP. Good to workshop to get feedback from the botanists present. In Vermont, July? ApplePie

Reports

.

Notes

iDigBio

Perspective from iDigBio team on what they're most interested in:

A web-based FP interface (FP lite), to be put into the iDigBio portal in this release or the next (June, or January next year); a new developer will handle this.

They are interested in both display and creation of annotations.

Paul: two cases for annotations: 1) identifying a particular business process, like a new determination or a new georeference; in both cases there is a form for that. 2) Arbitrary small changes to data, e.g. the name of the county is misspelled; not tied to a particular business process, just an error someone noticed.

Two approaches to those cases: 1) look at the handling code behind the form for the business process, e.g. the Symbiota new determination form; hook into that code and have it populate the form and send it to the FP network, where annotations are stored. 2) Look at Symbiota's data store and create an OAI-PMH provider over it; harvest those changes incrementally by modification date and wrap those records as annotations.
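A minimal sketch of the second approach (incremental OAI-PMH harvest by modification date), using only the Python standard library. The `verb`, `metadataPrefix`, `from` and `resumptionToken` arguments come from the OAI-PMH protocol itself; the endpoint URL, the `oai_dc` default and the shape of the wrapped annotation are assumptions made for illustration, not FP's or Symbiota's actual interfaces.

```python
from urllib.parse import urlencode


def list_records_url(base_url, since, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords URL for records modified since `since`."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument: it replaces
        # metadataPrefix/from when continuing a partial result list.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix, "from": since}
    return base_url + "?" + urlencode(params)


def wrap_as_annotation(record):
    """Wrap one harvested record as an annotation-like dict.

    The keys used here (target, modified, body) are hypothetical; a real
    annotation would be an RDF document following the annotation model.
    """
    return {
        "target": record["identifier"],
        "modified": record["datestamp"],
        "body": record["metadata"],
    }
```

A harvester would remember the datestamp of its last run, pass it as `since`, and follow resumption tokens until the provider stops returning one.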

Last year at SPNHC FP demo'd an ecosystem w/ Specify, Morphbank, Symbiota, and FP network. Hooked into forms in Morphbank & Symbiota as in the first case above. Also in Symbiota, added tab for display of annotations to display of occurrence record, where annotations come from a query to FP. Haven't done this yet for Morphbank but should be easy to do, also easy to add to the iDigBio portal.

Gil: Might be easier to do annotation display first, then annotation creation, esp. with new developer coming on board in March

Paul: we could prepare code for Morphbank for creation and display of annotations. We should think about documentation for the new developer, FP working with the new iDigBio developer, a testbed, and a platform for annotation storage in production. FP is looking at supporting two network instances, SCAN and NEVP. In NEVP we're dealing with duplicate detection as well as annotation, and there is a higher information security need (unredacted endangered species locality data), but what we're setting up for SCAN could be expanded to other Symbiota portals working with iDigBio.

Alex: Other than portal code itself, what is needed to run the FP service?

Paul: We will probably run a centralized resource rather than instances that talk to each other. Lifecycle of an annotation:

  • an occurrence record exists in Symbiota
  • a researcher gives it a new determination
  • an annotation is created from that
  • annotation injected into an FP access point
  • that annotation document (RDF/XML) goes into a Fedora Commons repository as a document store
  • that annotation is dropped into a triplestore (Fuseki), where there are triples harvested from e.g. taxon authority files turned into ontologies, and triples from annotations. This lets us reason on taxon names and relationships even when particular taxon names don't appear in annotations. Also would work for geographical hierarchies. Annotation could go into other indexes, perhaps Lucene
  • iDigBio has a copy of the occurrence record, and someone visits the page in the portal. Code behind the page can submit an annotation to FP or query FP for annotations known to pertain to this occurrence record based on its Darwin Core triple; results can be accompanied by a stylesheet for human display
  • if the occurrence record is associated with a Specify CollectionObject record, then a data entry person could import the annotation as a new Determination in Specify. That action generates an annotation that says the annotation was "accepted" as a new Determination.
  • "Accept" annotations are now displayed in Symbiota/iDigBio

Alex: How to handle load of network traffic generated by this process?

Paul: could set up a local triplestore cache with a SPARQL interface

David: GlassFish supports clustering

Paul: two queries: show me annotations pertaining to a Darwin Core triple; show me all annotations. If there are no other queries, e.g. show annotations pertaining to a chain of reasoning, then the cache might be a good idea
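The two canned queries could look roughly like the sketch below, which only builds SPARQL text. The `oa:` and `dwc:` namespaces are the published Open Annotation and Darwin Core ones, but the assumption that an annotation's target carries the Darwin Core triple directly as `dwc:institutionCode`, `dwc:collectionCode` and `dwc:catalogNumber` properties is ours; the actual FP graph shape may differ. The function and constant names are hypothetical.

```python
PREFIXES = """PREFIX oa: <http://www.w3.org/ns/oa#>
PREFIX dwc: <http://rs.tdwg.org/dwc/terms/>
"""

# Query 2: show me all annotations.
ALL_ANNOTATIONS_QUERY = PREFIXES + """SELECT ?annotation WHERE {
  ?annotation a oa:Annotation .
}"""


def sparql_literal(value):
    """Quote a string as a SPARQL literal, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def annotations_for_triple_query(institution_code, collection_code, catalog_number):
    """Query 1: annotations whose target matches a Darwin Core triple.

    Assumes (hypothetically) that the target node holds the three
    Darwin Core terms as plain literals.
    """
    return PREFIXES + f"""SELECT ?annotation WHERE {{
  ?annotation oa:hasTarget ?target .
  ?target dwc:institutionCode {sparql_literal(institution_code)} ;
          dwc:collectionCode {sparql_literal(collection_code)} ;
          dwc:catalogNumber {sparql_literal(catalog_number)} .
}}"""
```

Because both queries are fixed apart from three literals, their results are easy to serve from a local cache keyed on the triple.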

David: the more interests are registered, the more interests have to be updated

Paul: we can start by using the same resource stack; we can give you the same triple store as we use; if scaling issues begin to appear, we can look at how to configure GlassFish to handle it. (Paul is drawing diagrams here...) Symbiota, Morphbank, and the iDigBio portal are all clients of FP; annotations coming in to FP go into two different Fusekis, one of which the clients can query directly.

David describes the client helper code: for annotation display, XSLT is applied to query results; the query is canned. For submitting annotations, if you're developing in PHP there is code that uses the FP web service for generating annotations. There is also Java code that does the same.

The result of annotation generation then needs to be submitted to the FP network. The client helper code helps with this step for the two different kinds of FP deployments, "lite" and "medium."

Client helper libraries also help in creating XML signatures.

Paul: we want to restrict who can create new annotations to the authorized users of participating systems so that spammers can't create new annotations. FP doesn't authenticate users but rather the systems that create annotations. The systems have to police their own users. This requires the system to have an authorization process.
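To illustrate authenticating the submitting system rather than the individual user, here is a minimal sketch. It deliberately substitutes a shared-secret HMAC from the Python standard library for the XML signatures that the FP client helper libraries create, purely to keep the example self-contained; the registry, the system identifier and the function names are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical registry of participating systems: system id -> shared secret.
TRUSTED_SYSTEMS = {"symbiota-scan": b"example-secret"}


def sign(system_id, document, secret):
    """Sign an annotation document (bytes) on behalf of a system."""
    message = system_id.encode("utf-8") + b"\n" + document
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def accept_submission(system_id, document, signature):
    """Accept a submission only if it comes from a registered system.

    The individual user is never checked here; the submitting system is
    expected to police its own users.
    """
    secret = TRUSTED_SYSTEMS.get(system_id)
    if secret is None:
        return False
    expected = sign(system_id, document, secret)
    # Constant-time comparison avoids leaking how much of the signature matched.
    return hmac.compare_digest(expected, signature)
```

An unregistered system, or a document altered after signing, is rejected without FP ever needing to know who the end user was.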

Alex: there may be some iDigBio user information, but mostly not

Paul: an annotation system that allows users to create arbitrary annotations allows them to create attacks on users of other systems, such as an annotation containing a malicious link

Gil: perhaps some data providers will want to allow annotations and some not, will have to determine how to qualify data providers; might be two levels of determination relating to whether someone can annotate

Paul: the broader model is that any authorized user of a participating system can annotate an object, and any authorized user can see those annotations. Within the system there is an authoritative data store that determines whether annotations will be ingested into its own data store. So authorization is for ingesting annotations, not for creating them.

Gil: unaccepted annotations don't get distributed and displayed?

Paul: no, they can see it, but they will see that it is in conflict with the record owner's version, and perhaps a reason why it was not accepted, e.g. local policy prohibits it. There are use cases for access control on records themselves.

Gil: we don't own any of the records, we're just aggregating, so we need to be in sync with what the data providers need

Paul: our goal is to try to connect the end users of the aggregated data with the authoritative data stores, so that when a researcher looks at a data record and says something about that record, there is a way for the authoritative data store to find out about what was said. Annotations are not guaranteed to agree with each other, but over the long term they will help ensure consistency of the larger data pool. There are ways to construct FP queries to retrieve and display only those annotations made by the curator of a given object.

March-April-May sounds like when work on code will start to happen. There is some developer documentation already available.

Ed: in Symbiota, collection managers can make objects publicly editable, and those changes can later be accepted by the collection manager.

Paul: we could capture the changes at the point the manager approves the changes, or we could capture them at OAI harvest time

Ed: either way sounds good

Paul: when iDigBio's new developer is on board, we can start discussion of testbed platforms and updates to developer documentation

Embedded Kepler

Tianhong: we can now read MongoDB data with a Kepler actor.

Paul: Kepler data cleaning workflow:

  • first find records to be cleaned with a parameterized query
  • read the records from MongoDB
  • apply the workflow to the records

Maureen will prepare a MongoDB query that returns a limited set of data, selected by collection code, for an appropriately sized collection
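The three workflow steps can be sketched as below. This is not the Kepler actor itself: it uses an in-memory list in place of MongoDB so that it runs without a database, and the field name `collectionCode`, the default limit and the cleaning step are assumptions for illustration. With a real database the same filter document could be passed to a MongoDB `find` call with a limit applied.

```python
import itertools


def build_query(collection_code, limit=1000):
    """Parameterized query: a MongoDB-style filter document plus a record limit."""
    return {"filter": {"collectionCode": collection_code}, "limit": limit}


def find_records(records, query):
    """In-memory stand-in for reading matching records from MongoDB."""
    matches = (
        r for r in records
        if all(r.get(key) == value for key, value in query["filter"].items())
    )
    return list(itertools.islice(matches, query["limit"]))


def strip_whitespace(record):
    """Example cleaning step (hypothetical): trim whitespace from string fields."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}


def run_workflow(records, steps):
    """Apply each cleaning step, in order, to every record."""
    cleaned = []
    for record in records:
        for step in steps:
            record = step(record)
        cleaned.append(record)
    return cleaned
```

A web page that launches the job would only need to collect the collection code and limit, then hand the resulting query to the workflow.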

We need to set up a web page to launch the queries.

Increasing Burn Rate

Paul met with Damari and Kristin.

Comments on the Leiden meeting

Annotations well received. Bob and James presented on annotations, James on FP, Bob on OA.