From Filtered Push Wiki

Etherpad for meeting notes: http://firuta.huh.harvard.edu:9000/FP-2013Jul03

Reminder: Change of meeting time effective Sept 4: (12-1 Eastern 9-10 Pacific).


  • Annual NSF project report submitted this morning.
  • NEVP TCN Support
    • Status update - Annotation processor for Patrick to test?
    • Image and Specimen XML documents.
      • Small issue about oad-defined Motivations. See Issue 138
    • Production node deployment target date 2013 Aug 15.
    • Site for AnnotationProcessor/SpecifyDriver testing?
    • Duplicate Finding Find_Duplicates
  • SCAN TCN Support
    • Trees
    • Status update on deployments.
      • Symbiota instance for Nico to test.
      • Annotation processor as in SPNHC demo for Nico to test.
    • Production node deployment target date 2013 July 31.
  • Kepler
  • MCZbase Driver
  • dwcFP and DarwinCore RDF guide - need to provide immediate feedback.


  • Third Project Programmer, Burndown.
  • Recent Contacts
  • Collaborations
    • Specify/Symbiota
    • Names

For Future meetings


  • Paul
    • Provided information for Annual Report.
    • Some updates to Roadmap


FilteredPush Team Meeting 2013 July 03

Present: Paul, Bob, Maureen, David, Bertram, Jim, Tianhong, James

  • Annual NSF project report

Jim: submitted this morning.

  • Names proposal

Jim: They need evaluation of work from MCZ and response from Paul and James about their queries.

Bob: Time to revisit the report of invention submission.

  • Third Project Programmer:

http://employment.harvard.edu/ -> Administrative/Staff Jobs (External Candidates) -> Search Positions -> Auto req ID 29736BR

Paul: Position is now publicly available.

  • NEVP TCN Support
    • Status update - Annotation processor for Patrick to test?

David: Setup on FP3. Ready to test if Symbiota isn't needed. New georeference and solve with more data annotations available.

Paul: David to send link/info to Patrick. Maureen to send script for test and questions to Patrick.

    • Image and Specimen XML documents.
      • Small issue about oad-defined Motivations.

Bob: See https://sourceforge.net/apps/mantisbt/filteredpush/view.php?id=138 This is a generic issue, not limited to XML serialization. It is that OA validators that do/can not import OAD will fail when encountering a Motivation object not explicitly declared to be an oa:Motivation.
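
A minimal sketch of the check Bob describes, assuming triples are held as plain (subject, predicate, object) tuples of prefixed names. This is not the project's validator; `oad:exampleMotivation` and the `urn:example:` identifiers are hypothetical placeholders, and the function names are invented for illustration.

```python
# Prefixed names are kept as plain strings; no RDF library is assumed.
RDF_TYPE = "rdf:type"
OA_MOTIVATED_BY = "oa:motivatedBy"
OA_MOTIVATION = "oa:Motivation"


def untyped_motivations(triples):
    """Return motivations that are used but never declared as oa:Motivation."""
    typed = {s for s, p, o in triples if p == RDF_TYPE and o == OA_MOTIVATION}
    used = {o for s, p, o in triples if p == OA_MOTIVATED_BY}
    return sorted(used - typed)


def declare_motivations(triples):
    """Return the triples plus an explicit type triple for each untyped motivation."""
    missing = untyped_motivations(triples)
    return list(triples) + [(m, RDF_TYPE, OA_MOTIVATION) for m in missing]


# Hypothetical annotation: the motivation comes from OAD and is not typed
# in the document itself, so a validator that does not import OAD cannot
# tell that it is an oa:Motivation.
annotation = [
    ("urn:example:anno1", RDF_TYPE, "oa:Annotation"),
    ("urn:example:anno1", OA_MOTIVATED_BY, "oad:exampleMotivation"),
]

print(untyped_motivations(annotation))
print(untyped_motivations(declare_motivations(annotation)))
```

Stating the type in each serialized document is one possible workaround; whether FP should do that is the open question tracked in the issue.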

Bob: Went back and forth with Patrick on a few issues. On images, AC needed a little bit more structure. Identified issues with blank nodes that we need to think about in FP. Noted issue above about OAD motivations.

Paul: Grant as motivation?

Bob: Something to look at again offline.

Paul: UUIDs generated on the fly - botanists?

Patrick: Minted new UUIDs for botanists rather than looking up. Intent in production code is to do a lookup and populate from there, development version may not have access to authority file.

Paul: Questions for us to look at:

<oad:hasEvidence rdf:resource="urn:uuid:8a4e29f6-2f6f-4dc5-8de6-197632b12f75" />

Bob and Paul to examine.

   <dwcFP:Taxon rdf:about="urn:uuid:e316c97e-75fb-422a-b626-bc3308382687">

Patrick: Yes, that uuid is coming from the bonap name list.

Patrick: Given updates from Paul/Bob can make those changes to code that generates these.

Bob: See: https://sourceforge.net/apps/mantisbt/filteredpush/view.php?id=138 for motivations structure.

    • Production node deployment target date 2013 Aug 15.

Patrick: Week of July 22 targeted for setup of primary digitization apparatus at Harvard. OK team members coming to Harvard that week.

Paul: Key triad probably primary digitization apparatus, iPlant, test ingest into test Symbiota instance.

Paul: Primary issue for FP probably one or two documents to FP for specimen data and links to iPlant images.

Paul: Suggest Bob, Patrick, James review Find_Duplicates and evaluate.

Bob, Maureen, and Paul to then review related requirements.

Patrick: Happy to provide feedback.

Bob: Are we likely to exchange duplicate information with Anosys?

James: Unknown, good to try to find out.

James: Possibly also duplicate info on: http://wiki.filteredpush.org/wiki/Finding_Duplicates,_The_Henry_Scenario and under QC. (Check James' recent edits).

TODO: Paul and James to check for duplicates info and put on/link to on Find_Duplicates

Paul: Let's put on the table some examples of annotations using Occurrence and MaterialSample.

James: Will bring Joel, Bob, Paul into conversation about this.

  • SCAN TCN Support
    • Trees

Paul: Put on the table for Ed to think about OQGraph http://openquery.com/products/graph-engine

    • Status update on deployments.
      • Symbiota instance for Nico to test.
      • Annotation processor as in SPNHC demo for Nico to test.

David: Annotation processor is deployed on FP3. Symbiota on Symbiota2 has schema up to date, current Symbiota trunk, bringing client helper code up to date to allow connection to FP3.

Paul: David to point Nico at resources when ready, Maureen to send script for test to Nico as well.

    • Production node deployment target date 2013 July 31.

David: Configuration reconfiguration nearly done, needs tests.

Paul: When reconfigured, produce set of example annotations to Bob for review.

Bob: Sounds like a good plan.

Paul: Also compare generated annotations with Lei's list on: ApplePieRules#AnnotationTypeRule

Bob to review: ApplePieRules#ResponseAnnotationRule

Tianhong: Example code not working consistently, getting sporadic timeouts. Can see from example code how to use the service.

Paul: Proposal, update GBIFService class within Kuration actors and then, if seeing timeout issues with GBIF service to contact Marcus et al about them.
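
One way to tell sporadic timeouts from a persistent service problem is to retry a few times and count the failures before reporting them. The sketch below is a generic illustration only; it is not the GBIFService class, and `call_with_retries` and `make_flaky` are hypothetical names with a simulated service standing in for the real one.

```python
import time


def call_with_retries(fetch, attempts=3, delay_seconds=0.0):
    """Call fetch() up to `attempts` times; return (result, timeout_count)."""
    timeouts = 0
    for attempt in range(attempts):
        try:
            return fetch(), timeouts
        except TimeoutError:
            timeouts += 1
            # Pause before the next try, but not after the final one.
            if attempt < attempts - 1:
                time.sleep(delay_seconds)
    return None, timeouts


def make_flaky(failures):
    """Build a stand-in service call that times out `failures` times, then succeeds."""
    state = {"left": failures}

    def fetch():
        if state["left"] > 0:
            state["left"] -= 1
            raise TimeoutError("simulated timeout")
        return "ok"

    return fetch


result, timeouts = call_with_retries(make_flaky(2))
print(result, timeouts)
```

The timeout count gives something concrete to include if the issue is raised with the service maintainers.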

    • Provenance and rendering - anything more needed here?

Bertram: Support for Kurator proposal, use this experience to help identify challenges - usability, scaling, etc - identifying those as tasks. Good to produce video that shows provenance in action.

Bertram: Anything we can do by July 15th can inform the DataONE planning meeting that's happening then.

James: Thought that the results spreadsheet very valuable to researchers - two cases, check existing local data, screen data known to network.

Bertram: Documentation on the wiki and a screen capture video would be very helpful.

Paul: Created stub wiki page at: Scaling_Workflows_and_Provenance

TODO: Paul - screencast of SPNHC demo.

Two challenges:

  1. performance issue; a million records, .. analysis in a reasonable amount of time; for demo: need small subsets of data
  2. what happens with similar analysis on overlapping kinds of data, receiving contradictory / multiple annotations
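
Both challenges can be sketched in a few lines, assuming annotations are reduced to (record id, field, proposed value) tuples. The function names, record ids, and field values below are hypothetical and are not taken from the workflow code.

```python
import random
from collections import defaultdict


def sample_subset(records, size, seed=0):
    """Draw a reproducible random subset so a demo runs in reasonable time."""
    rng = random.Random(seed)
    return rng.sample(records, min(size, len(records)))


def find_conflicts(annotations):
    """Map (record_id, field) to its proposed values when they disagree."""
    proposed = defaultdict(set)
    for record_id, field, value in annotations:
        proposed[(record_id, field)].add(value)
    return {
        key: sorted(values)
        for key, values in proposed.items()
        if len(values) > 1
    }


# Hypothetical annotations from two overlapping analyses.
annotations = [
    ("rec1", "country", "USA"),
    ("rec1", "country", "Canada"),
    ("rec2", "country", "USA"),
    ("rec2", "country", "USA"),
]

print(find_conflicts(annotations))
print(len(sample_subset(list(range(1000)), 10)))
```

Detecting the conflict is the easy part; how contradictory annotations should be presented or resolved is left open here.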

  • MCZbase Driver

Maureen: Nothing back yet from Brendan.

Paul: Forwarded feature request from Rod Eastwood to Maureen.

Maureen: One element there was "people aren't going to write a long letter to describe a data problem". General principle: keep the process short.

  • dwcFP and DarwinCore RDF guide - need to provide immediate feedback.

Paul and Bob need to provide feedback from dwcFP to Steve and TDWG RDF group. [Bertram] I'm also looking at those emails... interesting :)

  • Recent Contacts

Bertram: FYI: Heading to Humboldt University in Berlin on Sunday, talking about provenance there. Focus is at least in part theory, but I want to add some practice as well, so probably using some curation workflow stories -> will bug Tianhong and maybe Paul for screenshots etc :)

Tianhong: OK

Paul: OK.

Bertram: At Berlin I'll also meet with Mr. ETL/Data Cleaning (Felix Naumann). Should be useful due to the application of provenance in curation workflows. Maybe we could employ ETL technology in the Kurator proposal... (Unfortunately, the schedule is very tight, so I probably won't be able to meet with Berendsohn et al.) Last week: DataONE/ProvWG meeting

  • Collaborations
    • Specify/Symbiota
  • Kurator proposal resubmit

Bertram: I guess we're going again for this one: http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=5444 Full Proposal Deadline Date: August 13, 2013

Paul: Harvard date August 6 (confirmed with Kristin).

  • CNH FP workshop planning

James: Are we organized yet? Coming soon. Not far off from what we did at SPNHC. Can several people play with it at the same time?

Paul: That's what we want to do.

James: July 19th, morning, Burlington, VT.