2014May14

From FilteredPush


Etherpad for meeting notes: http://firuta.huh.harvard.edu:9000/FP-2014May14

Agenda

Non-Tech

  • SPNHC
  • James: TDWG Symposium: Who to invite
  • James: FaunaEuropaea
  • InvertEBase
  • Request for second NCE
  • iDigBio, next actions?

Tech

  • Report from Thursday call
  • Driver
    • Report Maureen: Status of driver - current annotation processor integration.
    • Chuck: State of getting set up to work on annotation processor.
  • SCAN
    • Test of Akka workflow with SCAN data.
    • Query for harvested data and analysis results.
  • Analysis
    • Tianhong: Progress on cleaning data with data.
    • Report Bob: Progress on Duplicate Finding data mining.
  • Nodes
    • Report Maureen: Status of ingests (taxon/occurrence) on FP2 and FP3
    • Report David: Morphbank integration update.
  • FP-DataEntry
    • Report Chuck: Duplicate detection integration into Yale data entry application
  • NEVP
    • Report David: Progress on updating deployment.
    • Akka Workflow for NEVP
  • SemanticMediaWiki as FP Client


  • For Thursday:

Reports

  • Paul
    • InvertENet budget and justification revisions now in fastlane.
    • Met with NEVP team, including Binil and Patrick, to review data capture rate for NEVP. Limited downstream changes at this point. May go to capturing only the first collector.

Notes

Present: James, Jim, Tianhong, David, Chuck, Paul, Maureen, Bob.

Non-Tech

  • SPNHC

Paul: Time to specify what the demonstration script should look like.

  • James: TDWG Symposium: Who to invite?

James: Time for this discussion. Haven't heard back from Berlin yet on first thoughts.

  • James: FaunaEuropaea

James: Folks there are checking on who is coming to SPNHC to see if it is worth our meeting there. Interested in annotation of text-based resources, names, and related information from textual sources.

Bob: Descriptive taxon treatments?

James: Some of it, not sure how much.

Bob: Perhaps some synergy with ETC. FP would need a vocabulary; relevant work is coming from Plazi, Terry, Hong, etc.

  • InvertEBase

We have received the go-ahead from NSF to upload revised budgets and budget justifications, which should be completed today. "The tentative start date is 07/01/2014 or as soon thereafter as award paperwork can be processed." Harvard total award, $76,320; direct costs, $58,385.

  • Request for second NCE

Approved! Harvard will now process a matching NCE for the UC Davis subcontract.

  • iDigBio, next actions?

Tech

  • Report from Thursday call

Maureen: Looked at the Akka analysis results in Mongo, found several places where the actors were asserting failure conditions too early.

  • Driver
    • Report Maureen: Status of driver - current annotation processor integration.

Maureen: Close, not quite there yet.

Maureen: Stub driver is working. It returns a fixed set of records when asked and supports the minimum needed for the annotation processor to function: fetch a record by identifier, update a record, and fetch records matching a list of field values. Essentially a mock object for the AnnotationProcessor.

Maureen: A driver that works with the current Specify trunk is in progress. It makes use of two workbench configuration files.
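For illustration, the stub driver's role can be sketched as a small mock object. This is a minimal Python sketch under stated assumptions, not the FilteredPush driver: the class name StubDriver, the methods fetch_by_id, update, and fetch_by_fields, and the fixed records (using Darwin Core term names as field keys) are all hypothetical, since the notes do not give the real interface.

```python
from copy import deepcopy
from typing import Dict, List, Optional

# Hypothetical fixed record set; field names borrowed from Darwin Core for illustration only.
_FIXED_RECORDS: Dict[str, Dict[str, str]] = {
    "rec-1": {"catalogNumber": "00001", "scientificName": "Quercus alba", "recordedBy": "J. Doe"},
    "rec-2": {"catalogNumber": "00002", "scientificName": "Acer rubrum", "recordedBy": "J. Doe"},
}


class StubDriver:
    """Hypothetical mock driver: serves a fixed set of records so an annotation
    processor can be exercised without a live collection database."""

    def __init__(self) -> None:
        # Deep copy so updates made through one driver never alter the shared fixture.
        self._records = deepcopy(_FIXED_RECORDS)

    def fetch_by_id(self, record_id: str) -> Optional[Dict[str, str]]:
        """Return a copy of the record with this identifier, or None if unknown."""
        record = self._records.get(record_id)
        return dict(record) if record is not None else None

    def update(self, record_id: str, changes: Dict[str, str]) -> bool:
        """Apply field changes to an existing record; return False if the id is unknown."""
        if record_id not in self._records:
            return False
        self._records[record_id].update(changes)
        return True

    def fetch_by_fields(self, criteria: Dict[str, str]) -> List[Dict[str, str]]:
        """Return copies of all records whose fields match every key/value pair in criteria."""
        return [
            dict(rec)
            for rec in self._records.values()
            if all(rec.get(key) == value for key, value in criteria.items())
        ]
```

Usage: after `driver = StubDriver()` and `driver.update("rec-1", {"scientificName": "Quercus rubra"})`, `driver.fetch_by_id("rec-1")` returns the updated name, and `driver.fetch_by_fields({"recordedBy": "J. Doe"})` returns both records. Returning copies keeps callers from mutating the stub's state except through `update`.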

    • Chuck: State of getting set up to work on annotation processor.
  • SCAN
    • Test of Akka workflow with SCAN data.

Tianhong: Fixed some issues in SciNameValidator. How is the follow-up discussion on email going?

Paul: No substantive feedback yet.

Tianhong: Next step?

Paul: Bug James and me for feedback.

    • Query for harvested data and analysis results.
  • Analysis
    • Tianhong: Progress on cleaning data with data.

Tianhong: Refactored both the lifespan check and the outlier detection to use Solr on FP1, though there is not yet enough collector lifespan data.

Paul: Have approval from Ed to build this into Symbiota.

Bob: Do most collectors go into the field once per season? If so, outliers from clusters that are one year apart may be highlighting plausible errors. May be worth thinking about whether there are patterns in how people collect, i.e., in their collection practices, that may signify whether two collectors are or aren't the same person.
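The idea behind a collector lifespan check can be sketched as follows. This is a minimal Python sketch, assuming each record carries a collector name and a collection year and that birth/death years are known for some collectors; it does not query Solr as the FP1 version does, and the names Lifespan, Occurrence, lifespan_violations, and the min_age threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class Lifespan:
    """Known birth/death years for a collector; None means that bound is unknown."""
    birth_year: Optional[int]
    death_year: Optional[int]


@dataclass
class Occurrence:
    record_id: str
    collector: str
    year: int


def lifespan_violations(occurrences: Iterable[Occurrence],
                        lifespans: Dict[str, Lifespan],
                        min_age: int = 10) -> List[str]:
    """Return ids of records dated before the collector reached min_age (an assumed
    threshold) or after the collector's death year. Records whose collector has no
    lifespan data are skipped, since there is nothing to check them against."""
    flagged = []
    for occ in occurrences:
        span = lifespans.get(occ.collector)
        if span is None:
            continue
        too_early = span.birth_year is not None and occ.year < span.birth_year + min_age
        too_late = span.death_year is not None and occ.year > span.death_year
        if too_early or too_late:
            flagged.append(occ.record_id)
    return flagged
```

Skipping collectors without lifespan data reflects the current limitation noted above: with little lifespan data, most records simply cannot be checked, rather than being flagged.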

    • Report Bob: Progress on Duplicate Finding data mining.

Nothing this week.

  • Nodes
    • Report Maureen: Status of ingests (taxon/occurrence) on FP2 and FP3
    • Report David: Morphbank integration update.
  • FP-DataEntry
    • Report Chuck: Duplicate detection integration into Yale data entry application

Chuck: A service that Patrick can use is deployed. Have HTML from Patrick that resembles his form, and a demo page that operates with the service.

Chuck: Also working on getting the HUH-rapid data entry running in the test environment again, then will integrate.

Maureen: the Solr batch load that created the index was run from a Perl script. Ongoing updates need to be written into another script patterned after the ones that import into Mulgara and Mongo.

Chuck: Where should the Solr jar for indexing go?

Maureen: Need to work out location.

  • NEVP
    • Report David: Progress on updating deployment.
    • Akka Workflow for NEVP
  • SemanticMediaWiki as FP Client

Bob: Provided support for a machine web interface that invokes a Java-based service (a stand-alone client helper tool from David) to turn annotation assertions in JSON into an annotation. A PHP extension in MediaWiki can invoke the underlying Java service. Unclear what the scalability issues are. Have this working with new identification assertions; not as relevant for treatments.

David: Can provide the annotation body as arbitrary Simple Darwin Core in its entirety.
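The JSON-assertion-to-annotation step can be sketched roughly as below. This is a minimal Python sketch, not David's Java helper tool: the input shape (keys target, assertion, annotator), the function name assertion_to_annotation, and the required target terms are hypothetical, and the output only loosely follows Open Annotation terms (oa:hasTarget, oa:hasBody, oa:motivatedBy), with the body carrying the asserted Simple Darwin Core terms as given.

```python
import json
from typing import Any, Dict

# Hypothetical choice of terms needed to identify the target specimen record.
REQUIRED_TARGET_TERMS = ("institutionCode", "collectionCode", "catalogNumber")


def assertion_to_annotation(assertion_json: str) -> Dict[str, Any]:
    """Wrap a JSON identification assertion as a simplified, Open Annotation-style dict.
    The real FilteredPush annotation structure may differ."""
    data = json.loads(assertion_json)
    target = data["target"]
    missing = [term for term in REQUIRED_TARGET_TERMS if term not in target]
    if missing:
        raise ValueError(f"target is missing Darwin Core terms: {missing}")
    return {
        "@type": "oa:Annotation",
        "oa:motivatedBy": "oa:editing",  # assumed motivation for a new identification
        "oa:annotatedBy": data.get("annotator", "anonymous"),
        "oa:hasTarget": {"@type": "oa:SpecificResource", "dwc": dict(target)},
        # Body passes through whatever Simple Darwin Core terms the assertion supplied.
        "oa:hasBody": {"dwc": dict(data["assertion"])},
    }
```

Passing the assertion's terms through unchanged into the body matches the point that the body can be arbitrary Simple Darwin Core; only the target terms are validated, because an annotation that cannot be matched to a record is not useful.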

  • For Thursday:
    • Review harvest process.
    • Next steps for Akka QC for SCAN and NEV