2014Aug13

From Filtered Push Wiki


Etherpad for meeting notes: http://firuta.huh.harvard.edu:9000/FP-2014Aug13

Agenda

Non-Tech

  • Publications
  • Kurator
    • iDigBio Data Management Interest Group
    • ELODENS Jeremy Miller, ELODINS Christina Flann
  • James: TDWG Symposium
  • InvertEBase
  • Possible firuta server move.

Tech

  • Coordination: David out next few days.
  • QC for SCAN
    • David: Status of updates to Occurrence and Taxon harvests (new data, missing collection code from some data).
    • Run on full NAU dataset (using entomologists list from solr), send report to Neil
      • Tianhong: Status of workflow and this run.
      • David: Alternative report just listing actionable items.
  • QC work to do
    • Tianhong: Preparation of jar for workflow runnable by Bertram
    • Adding agent authority file to Symbiota - harvest to solr index - use in actor.
  • DarwinCore issues 204-226 under discussion: https://code.google.com/p/darwincore/issues/list?sort=-id
  • Status of going live with Morphbank integration
  • iDigBio crowdsourcing deployment
  • David: Update on current Status of FP2 and SCAN
  • Metrics for SCAN: http://symbiota2.acis.ufl.edu/symbiota/scan/scan_reports.html
  • Updating Roadmap
  • Upcoming work
    • NEVP
      • Node configuration, symbiota integration
      • Annotation processor
    • InvertEBase
  • Java 1.8 (test builds on 1.8, build for 1.6 - dependencies Ptolemy, Morphbank).

Reports

Notes

FilteredPush Team Meeting 2014 Aug 13

Present: Bertram, Bob, David, Paul, James, Jim, Tianhong.

Non-Tech

  • Publications
  • Kurator
    • iDigBio Data Management Interest Group

James: FP came up in the new iDigBio Data Management interest group. Deb pinged us to see whether we could give a demonstration.

Paul: Provided links to Deb. Plausible to do a demonstration for them in October.

    • ELODENS Jeremy Miller, ELODINS Christina Flann

Bob: Inquiry from Paddy on state of FP. Consortium making a response to a European biodiversity call.

Bob: If we haven't answered all the questions, we might be able to bring Jeremy in on the call next week.

Jim: Just be cautious that we don't make commitments at this stage. We are interested in having any community that can use FP use it. That is the model we have followed in working with the TCNs: being upfront with all comers about that.

  • James: TDWG Symposium

James: No more data on symposium at this point.

  • TDWG call for abstracts: September 25th
  • James: How many talks will we give? 1. Workflow Symposium; 2. ?

James: Current state of Kurator in symposium. Also a data quality symposium, is there a second FP talk to go there?

Bob: Lots of people could probably give such a talk, but what nobody else is doing is constructing and launching annotations from a workflow system.

Bertram: We should have a presence in the DQ session.

James: have a presentation on current "Kuration1.0" in the wf session, and have plans for Kurator2.0 in the DQ session!?

  • InvertEBase

Jim: No final approval from NSF yet. Message from Petra Sierwald, 6 August 2014: "Delaware Museum of Natural History never had an NSF grant and needed to submit extensive documentation, such as financial policy documents, etc.... Liz Shea ... has now supplied all the paperwork to NSF, and now NSF is switching its financial software. Judy Skog and Anne Maglia, the program officers, are working on this and we hope to get the award letters soon."

  • Possible firuta server move.

Paul: No date yet.

Jim: Keep our eye on the ball of producing something that end users benefit from.

Tech

  • Coordination: David out next few days.

Tianhong: Tech call tomorrow?

Bertram: can use the slot for a call with Tianhong

Paul: Tianhong - Bertram call.

  • QC for SCAN
    • David: Status of updates to Occurrence and Taxon harvests (new data, missing collection code from some data).

David: No progress yet. Have code checked out but haven't gotten into it.

    • Run on full NAU dataset (using entomologists list from solr), send report to Neil

Postprocessing of the curation workflow output (JSON?) into a spreadsheet.

Tianhong: Current status: MCZ: 17627 records, 1102 distinct names, 999 meaningful names (without symbols), 167 have true matches, 117 have only one name component (no space in the string), which raises the issue of many-to-one matches.

  • If lastname and firstname have same first letter --> more false positives !?
  • Scalability not an issue? (Good!)
  • Can Tianhong also run David's code?

David: It is just a command-line utility. Right now it queries Mongo for the JSON; it can also load it from the command line.

Paul: Two tasks: (1) Run the QC workflow on the NAU data on FP2 and generate the spreadsheet report from the results in FP2's Mongo. (2) Package the command-line post-processing utility to read JSON from the filesystem, and make it available on FP1 so that Bertram can see the file -> Akka -> JSON file -> spreadsheet flow.

David: Could use a copy of file of JSON output (to make sure how file starts/ends, in case there are minor differences from Mongo retrieval) for testing.
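The post-processing step in task (2) (JSON file in, spreadsheet out) can be sketched as below. This is a minimal sketch only, not the FP utility. It assumes the JSON output is an array of result objects, each carrying a hypothetical `actor` field; the real FP2 output format may differ. Each actor's results become one CSV table, which stands in for one sheet of the report.

```python
import csv
import io
import json
from collections import defaultdict


def json_to_actor_tables(json_text):
    """Split a JSON array of workflow results into one CSV table per actor.

    Assumes each record is a flat JSON object; 'actor' is a hypothetical
    field name. Returns {actor_name: csv_text}, ordered by actor name.
    """
    records = json.loads(json_text)
    groups = defaultdict(list)
    for rec in records:
        groups[str(rec.get("actor", "unknown"))].append(rec)

    tables = {}
    for actor, rows in sorted(groups.items()):
        # Column set is the union of keys across this actor's records,
        # in first-seen order; missing values are written as empty cells.
        fields = []
        for row in rows:
            for key in row:
                if key not in fields:
                    fields.append(key)
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        tables[actor] = buf.getvalue()
    return tables
```

Each returned string could be written to its own file or pasted into its own sheet. Keeping the conversion separate from file I/O makes it easy to test against a saved copy of the JSON output, as David suggests.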

Tianhong: Often only the last name is available. Is there not much to be done with these?

Tianhong: One problem is that we don't know whether it is the last name or the first name.

Paul: Assumption to make in this context is that it is the last name, almost always correct.

Tianhong: Another problem is that unless the last name is unique, we cannot tell which result is the right match.

Paul: If there is more than one possible match to a collector, report an error if the date collected is outside the date ranges for all of the possible matches.
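Paul's rule for ambiguous collector matches, together with the working assumption that a single-component name is a last name, can be sketched as follows. The `Collector` record and its active date range are hypothetical stand-ins for whatever the agent authority file or the entomologists list in solr actually provides. This is an illustration of the rule, not the workflow actor.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Collector:
    """Hypothetical authority-file entry for one collector."""
    last_name: str
    first_name: str
    active_from: date  # earliest date this person is known to have collected
    active_to: date    # latest date this person is known to have collected


def check_collection_date(recorded_name: str, collected: date,
                          authority: List[Collector]) -> Optional[str]:
    """Flag a record whose collection date fits none of the possible collectors.

    The input is assumed to be a single-component name (no space), which is
    treated as a last name and compared case-insensitively. Returns an error
    message, or None when the record is plausible or nothing matches.
    """
    last_name = recorded_name.strip()
    matches = [c for c in authority if c.last_name.lower() == last_name.lower()]
    if not matches:
        return None  # no candidate collector: this rule cannot say anything
    # With one match this is an ordinary range check; with several, the record
    # is only an error if the date falls outside every candidate's range.
    if any(c.active_from <= collected <= c.active_to for c in matches):
        return None
    return (f"{last_name}: collected {collected.isoformat()} is outside the "
            f"date ranges of all {len(matches)} possible matches")
```

This only catches impossible dates. As noted above, when several candidates remain plausible, it still cannot say which one is the right match.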

      • Tianhong: Status of workflow and this run.

Tianhong: Can run it; there are some outstanding issues with false positives on entomologists.

Paul: Let's run it and send the result to Neil for comments.

      • David: Alternative report just listing actionable items.

David: Made the requested changes to spreadsheet generation: each actor's results are now on their own sheet.

  • QC work to do
    • Tianhong: Preparation of jar for workflow runnable by Bertram

Bertram: We will try this out tomorrow. I appear to have access. Want Tianhong to make sure that he can run it by then.

David: Michael is using Java 1.6 with Tomcat on Ubuntu 10.04. Have done a build for this for him.

David: Nothing back yet.

Paul: Have Bob follow up?

Bob: Need emails and some context.

Paul: Q for Ed, where does this go? Q for Neil, does this give you the information you need.

  • Updating Roadmap
  • Upcoming work
    • NEVP
      • Node configuration, symbiota integration
      • Annotation processor
    • InvertEBase
  • Java 1.8

Paul: Consensus from discussion at this point: test builds on 1.8, build for 1.6 (dependencies Ptolemy, Morphbank). David: Have builds working with 1.8 with 1.6 as target.