2013May01

From FilteredPush


Etherpad for meeting notes: http://firuta.huh.harvard.edu:9000/FP-2013May01

Agenda

  • Curation Provenance Models for Quality Control
  • Upcoming Meetings
    • OA East coast rollout meeting.
      • Rebuild of demonstration (and video)
    • SPNHC ( [1] ) ApplePie
      • Registration by May 15, Abstracts due May 15
  • Kepler Georeference QC Actor.
  • Annotations
    • Progress on rewriting dwFP, OAD, and example annotations.
  • MCZbase Driver

Non-Tech

  • Annotations
    • Annotation MS
  • Collaborations
    • Specify/Symbiota
    • SCAN TCN
      • Niko will be looking for a FP/SCAN update in a week or two.
    • NEVP TCN

For Future meetings

Reports

  • Paul
    • Work on paper revisions with Bob.
    • Unpacking a new copy of the GBIF Cache.
    • Working with Patrick on communication between the NEVP primary digitization apparatus, iPlant, and Symbiota to feed new specimen records with image links onto ApplePie.
  • Bertram
    • I'd like to elevate our FP curation workflows to one of the featured use cases in the DataONE/Provenance Working Group. Specifically, I'd like to collect provenance traces in different formats and for different variations of the curation workflow. Having a good understanding of the various provenance models and formats will also be important for FP, including for the export to spreadsheets.

Notes

FilteredPush Team Meeting 2013 May 01

Present: Tianhong, Sven, David, Maureen, Paul, Bob, Jim

  • Curation Provenance Models for Quality Control

Paul: Bertram is putting more provenance documentation on the table.

Tianhong: It's not clear what the provenance model is for the curation workflows. Kuration has its own actor to record provenance.

Paul: Is this about provenance of the actions taken by Kepler actors (this actor validated this record), as opposed to the internal, domain-specific reasons why this actor validated this record?

Maureen: Perhaps need to match Tianhong with a user.

Paul: Is this a question where interaction with a domain expert would help, or is this about Kepler internals for provenance management?

Sven: Bertram is perhaps looking for how much provenance information is available to be exposed in the spreadsheet.

Paul: Tianhong and Sven to provide Bertram with a first cut of documentation on what is being put into the provenance, and which decisions by Kuration actors are not.

  • Upcoming Meetings
    • OA East coast rollout meeting.

Bob: Next week.

      • Rebuild of demonstration (and video)

Paul: David has the virtual machine reconfigured; need to get a copy, load data, and record the video.

Paul: Scheduled for tomorrow; also record a video of the embedded Kepler workflow.

    • SPNHC ( [2] ) ApplePie
      • Registration by May 15, Abstracts due May 15

Time to get abstracts written.

  • Kepler Georeference QC Actor.

Paul: Needs to make progress?

Tianhong: Have a new shapefile dataset of county boundaries. Representing the shapefile in Java and adding validation logic. The actor is working; need to add the FilteredPush step.
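The validation Tianhong describes amounts to asking whether a record's coordinates fall inside the boundary of its stated county. A minimal Java sketch of that point-in-polygon test follows, assuming the county boundary has already been read from the shapefile into longitude/latitude vertices; it uses the JDK's `java.awt.geom.Path2D`. The class and method names are hypothetical, the sample square is made up (not real boundary data), and a real actor would also need to handle multipart counties, holes, and coordinate reference systems.

```java
import java.awt.geom.Path2D;

/**
 * Hypothetical sketch: does a georeference fall inside a county boundary?
 * Shapefile reading is out of scope; vertices are passed in directly as
 * alternating longitude/latitude values and treated as planar coordinates.
 */
class CountyBoundaryCheck {

    /** Builds a closed polygon from alternating longitude/latitude values. */
    static Path2D.Double polygon(double... lonLat) {
        Path2D.Double path = new Path2D.Double();
        path.moveTo(lonLat[0], lonLat[1]);
        for (int i = 2; i < lonLat.length; i += 2) {
            path.lineTo(lonLat[i], lonLat[i + 1]);
        }
        path.closePath();
        return path;
    }

    /** True if the point (x = longitude, y = latitude) lies inside the boundary. */
    static boolean insideCounty(Path2D.Double boundary, double lon, double lat) {
        return boundary.contains(lon, lat);
    }

    public static void main(String[] args) {
        // Made-up square "county", for illustration only.
        Path2D.Double county = polygon(-72.0, 42.0, -71.0, 42.0, -71.0, 43.0, -72.0, 43.0);
        System.out.println(insideCounty(county, -71.5, 42.5)); // true
        System.out.println(insideCounty(county, -70.5, 42.5)); // false
    }
}
```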

Paul: Three cases (actually 4 after discussion):

  1. Valid: No annotation needed.
  2. Invalid: Can fix. Generate update georeference annotation (implemented)
  3. Invalid: Can't fix. Generate solve with more data annotation (needs implementation).
  4. Can't tell: Preconditions aren't met. No annotation needed (provenance needed in analysis results). (Annotations could highlight records in which there is much interest in improving the data.)
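The four cases can be read as a small decision procedure. The hypothetical Java sketch below assumes the actor has already determined whether its preconditions hold, whether the record validates, and whether it could propose a corrected georeference; how those inputs are computed is not specified in these notes. The comments record the annotation outcome listed above for each case.

```java
import java.util.Optional;

/** Hypothetical sketch of the four georeference QC outcomes; names are illustrative. */
class GeoreferenceQc {

    enum Outcome { VALID, INVALID_FIXABLE, INVALID_UNFIXABLE, CANNOT_TELL }

    /**
     * @param preconditionsMet e.g. coordinates and county present and a boundary found
     * @param valid            the georeference passed validation
     * @param proposedFix      a corrected georeference, if the actor could derive one
     */
    static Outcome classify(boolean preconditionsMet, boolean valid, Optional<String> proposedFix) {
        if (!preconditionsMet) {
            return Outcome.CANNOT_TELL;       // case 4: no verdict; record it in provenance
        }
        if (valid) {
            return Outcome.VALID;             // case 1: no annotation needed
        }
        return proposedFix.isPresent()
                ? Outcome.INVALID_FIXABLE     // case 2: update georeference annotation
                : Outcome.INVALID_UNFIXABLE;  // case 3: "solve with more data" annotation
    }

    public static void main(String[] args) {
        System.out.println(classify(true, true, Optional.empty()));           // VALID
        System.out.println(classify(true, false, Optional.of("42.5,-71.5"))); // INVALID_FIXABLE
        System.out.println(classify(true, false, Optional.empty()));          // INVALID_UNFIXABLE
        System.out.println(classify(false, false, Optional.empty()));         // CANNOT_TELL
    }
}
```

Checking preconditions first keeps case 4 distinct from case 3: a record that could not be evaluated is never reported as invalid.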

Bob is constructing an annotation that references a part of an image of a botanical sheet.

Is the target selector a DwC triplet for the "fix with more evidence" example? It depends on what the body is trying to say. No selector is needed in the case of an identification, because the data is not coming from an existing data source; it's coming from within the annotation.

The examples are in the AO directory? There's a new subdirectory called manuscript examples. If something in the aod examples hasn't been changed since February, it's not current; if it's been modified in the last month, it's probably current.

Case 3: David will update the "fix with more evidence" annotation example so that Tianhong will have something to work with.

There's a case 4 in which the actor doesn't know whether the data is wrong.

Bob: Is it possible to characterize the exception/failure as the actor saying that it can't complete its job and will only report some failure condition?

Tianhong: This isn't expressed as an exception. Downstream actors will be able to operate if they don't depend on the output of the failed actor; otherwise, they operate independently on the COMAD stream. There will be an annotation; the problem is what belongs in that annotation.

Requirements:

Kepler workflow produces a result set along with provenance (which is reported to the original requestor of the analysis)

Actors may be embedded in a workflow to create annotations.

An annotation has a scope determined by configuration (e.g. an update georeference annotation or an insert determination annotation).

An annotation applies to a target, which is represented by a Darwin Core triplet selector (in ApplePie).

A record (e.g. an occurrence record) in a dataset loaded into Kepler in a curation workflow may result in the creation of zero to many annotations.

The generation of annotations is independent of the "serialize results to MongoDB" or "serialize results to CSV file" actors in the workflow (the same workflow, with different load/save actors, can be run on the desktop or embedded).

A workflow can accept a parameter which will prevent the assertion of annotations.

Annotation generation can be separated from annotation injection: we may wish to serialize annotations to a triplestore rather than inject them into an access point, so that an intermediary may reason on annotation documents.

ApplePie expectations are: case 1, no annotation; cases 2, 3, and 4, an annotation. Case 2 asserts a correction; cases 3 and 4 assert "solve with more data".

Consumers of annotations (e.g. the annotation processor) must be able to provide users with a concise view of multiple identical annotations on a single record, and let users handle them as a single problem.
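Several of these requirements can be illustrated together. The Java below is a hypothetical sketch, not the FilteredPush or ApplePie API. It assumes a Darwin Core triplet written as `institutionCode:collectionCode:catalogNumber`, maps QC outcomes to zero or more annotations following the ApplePie expectations (case 1 none, case 2 correction, cases 3 and 4 "solve with more data"), keeps generation separate from a pluggable sink, honors a flag that suppresses assertion, and collapses identical annotations on a record into one entry with a count. All type and method names are illustrative; it uses records and arrow-form `switch`, so it needs Java 16 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical sketch of the annotation requirements; all names are illustrative. */
class AnnotationRequirements {

    /** Darwin Core triplet used as the annotation target selector. */
    record DwcTriplet(String institutionCode, String collectionCode, String catalogNumber) {
        @Override public String toString() {
            return institutionCode + ":" + collectionCode + ":" + catalogNumber;
        }
    }

    enum Outcome { VALID, INVALID_FIXABLE, INVALID_UNFIXABLE, CANNOT_TELL }

    enum Motivation { CORRECTION, SOLVE_WITH_MORE_DATA }

    /** The annotation type (e.g. "update georeference") comes from configuration. */
    record Annotation(String type, DwcTriplet target, Motivation motivation) {}

    /** Destination for generated annotations: an access point, a triplestore, etc. */
    interface AnnotationSink { void accept(Annotation annotation); }

    /** Maps one QC outcome for a record to zero or more annotations. */
    static List<Annotation> annotate(String configuredType, DwcTriplet target, Outcome outcome) {
        List<Annotation> out = new ArrayList<>();
        switch (outcome) {
            case VALID -> { }  // case 1: no annotation
            case INVALID_FIXABLE -> out.add(new Annotation(configuredType, target, Motivation.CORRECTION));
            case INVALID_UNFIXABLE, CANNOT_TELL ->
                    out.add(new Annotation(configuredType, target, Motivation.SOLVE_WITH_MORE_DATA));
        }
        return out;
    }

    /** Injection is a separate step; assertAnnotations = false suppresses it entirely. */
    static void emit(List<Annotation> generated, AnnotationSink sink, boolean assertAnnotations) {
        if (assertAnnotations) {
            generated.forEach(sink::accept);
        }
    }

    /** Concise view: identical annotations collapse into one entry with a count. */
    static Map<Annotation, Long> collapse(List<Annotation> annotations) {
        return annotations.stream().collect(
                Collectors.groupingBy(Function.identity(), LinkedHashMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        DwcTriplet rec = new DwcTriplet("INST", "COLL", "0001"); // made-up record
        List<Annotation> received = new ArrayList<>();           // stands in for a triplestore
        for (int run = 0; run < 2; run++) {                      // two runs flag the same problem
            emit(annotate("update georeference", rec, Outcome.INVALID_UNFIXABLE), received::add, true);
        }
        System.out.println(received.size());           // 2
        System.out.println(collapse(received).size()); // 1
    }
}
```

Because the sink is only an interface, the same generator could feed an access-point injector or a triplestore serializer, which leaves room for an intermediary to reason on annotation documents before anything is injected.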

Implementation:

(1) Proceed with generation of annotations by one Kepler actor (and injection of annotations by another).

(2) Explore implementation of a second analytical engine (e.g. R), and the generation of annotations from workflow results by a separate agent in the FP network.

  • Annotations
    • Progress on rewriting dwFP, OAD, and example annotations.

Bob: Constructing a new set of examples that fully expand on those provided in the paper (the paper has snippets of Turtle and diagrams; these are fleshed out in the examples).

  • MCZbase Driver

Maureen: Have access to the VM; working on getting the system configured to begin development. Need some low-level VM/firewall/system configuration support from Research Computing to proceed.

Paul: If blocked, next priority is insert/update georeference in specify drivers.

Non-Tech

  • Annotations
    • Annotation MS

Bob: Deep in writing the set of examples for ancillary materials that flesh out the brief examples in the paper (complete and correct examples). Have revision 12.20 out to Paul, revising the discussion of multiplicity to reduce use of the Bernardo example. Paul needs to revise before we circulate.

Paul: Bob and I identified a large set of issues over the last week and have dealt with most of them.

Bob: very close (days) to being able to circulate in group now, much better from the intense work.

  • Collaborations
    • Specify/Symbiota
    • SCAN TCN
      • Niko will be looking for a FP/SCAN update in a week or two.

Nothing back yet from the iDigBio folks about implementation in the portal; need to follow up.

    • NEVP TCN

Ed has signed off on the ingest-new-occurrence annotation code. It still needs signoff from the iDigBio folks. Working with Patrick on communication with iPlant.

Bob: Hong Cui has received tenure. Roger Hyam published a nice paper on using HTTP URIs for specimen records, and how to do it reliably. There is a meeting in Berlin this week on an EU project where the argument is being made to push forward with LSIDs, but there are good arguments for using HTTP URIs.