2013Dec04

From FilteredPush


Etherpad for meeting notes: http://firuta.huh.harvard.edu:9000/FP-2013Dec04

Agenda

  • Kepler
    • Report: Current state of Kepler/Akka work.
      • CSV load/save actors.
      • Date Validation actor
      • Vertnet github issue filing
  • Driver
    • Report: Development status
    • Symbiota Driver, appears as need from SCAN.
    • Validation of ingest of all current annotation types (new determination, updated determination, new georeference, updated georeference, new occurrence, updated locality).
    • Plan for validation of future annotation types (updated habitat, phenological state descriptions).
  • Analysis
    • Discussion: Proposal for ranking annotations based on queries, not repeated analysis of same records.
    • Report: Progress on implementation of OAI/PMH harvesting through firewalls.
  • SCAN TCN Support
    • Revisit sanity check.
    • Working on scheduling visit in week of Jan 6.
  • NEVP TCN Support
    • Report: Preparations to update Annotation Processor Deployment at UNH.
    • Annotations in OCR/Crowdsourcing pathways
      • Hackathon. Progress on setting up FP-Lite instance and development environment.
  • iDigBio integration, possible schedule in February.
  • FP Infrastructure
    • Report: Status of FP Node Refactoring

Non-Tech

  • Need to increase burndown rate.

Next Week

  • Duplicate finding

For Future meetings

  • Duplicate finding.
  • Prospective meetings, development targets.

Reports

  • Chuck
    • Last week, a little more cleaning for Maureen.
    • This week, installing, testing, and documenting FP-Services.
    • I'm not confident that I can get a stable development environment in time for Florida. (It works for a couple days, and then I update and it breaks and it takes a morning and more commits to get it working again.) Instead, focus on what it takes to configure FP1 for the particular story about crowd-sourcing transcriptions.
    • Need to write up ChuckAtIdigBioHackathon use case.

Notes

Present: Maureen, Jim, Tianhong, Chuck, Paul, James, David, Bob

  • Kepler
    • Report: Current state of Kepler/Akka work.
      • Integrated the FNA data set into flowerTimeValidator. How large should the dataset be?

Tianhong: Can use current small set.

James: Volume 19 is the current test set; it is fairly small and covers part of Asteraceae. Can extract from everything else that is available, about half of the flora.

James: Extract down to what taxonomic level? Flowering times vary by variety. Source is lumping to various extents.

Paul: Are the differences between varieties large enough that month/day transpositions would go undetected?

James: Yes. Will put on Joel's list of things to do.
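
For illustration only, a minimal sketch of the kind of month/day transposition check discussed here. It is not the flowerTimeValidator code: the function name and the representation of a flowering period as a set of month numbers are assumptions made for this example.

```python
from datetime import date


def transposition_candidate(collected, flowering_months):
    """Return the day/month-swapped date if the swap would explain an
    out-of-season record, otherwise None.

    flowering_months is a set of month numbers (1-12); this representation
    is an assumption for this sketch, not the actor's data model.
    """
    if collected.month in flowering_months:
        return None  # already in season, nothing to explain
    if collected.day > 12:
        return None  # the day cannot be read as a month
    # The swapped day is the original month (1-12), valid in every month.
    swapped = date(collected.year, collected.day, collected.month)
    return swapped if swapped.month in flowering_months else None
```

With a flowering period of June to August, a record dated 2013-02-07 would be flagged with 2013-07-02 as the candidate correction. The wider the flowering period (for example, when varieties are lumped), the fewer records fall out of season and the fewer transpositions this kind of check can catch.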

Bob: Plazi has been invited to a Taverna workshop to look at Plazi's extraction of data from treatments in workflows. It would be good to make them aware of this actor. What should we point them at? Does Lei's video make sense?

      • CSV load/save actors.

Tianhong: Added some error handling cases; will check in the code when it's done.

Tianhong: What if the dataset is not valid?

Discussion: How to handle this in the network.

David: The result of a workflow includes the system error state; query Mongo for these.

Chuck: Write errors encountered in reading from the database into the database?

Maureen: Write errors for administrator into log.

David: Mongo currently serving as data store for request/result documents.

Discussion: (1) Logging to a log file is the responsibility of each component. (2) Create response documents indicating the failure state if possible. (3) Monitor with icinga.
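
For illustration only, a minimal sketch of points (1) and (2): the component logs the failure itself and still returns a response document that records the failure state. The logger name, the function name, and the document fields are hypothetical, and a plain dict stands in for the document that would be stored in Mongo.

```python
import logging

logger = logging.getLogger("fp.workflow")  # hypothetical logger name


def run_step(request_id, step, payload):
    """Run one workflow step and always return a response document.

    The field names (request_id, status, error, result) are assumptions
    for this sketch, not the FP request/result document schema.
    """
    try:
        result = step(payload)
    except Exception as exc:
        # (1) Each component writes its own errors to its log.
        logger.exception("request %s failed", request_id)
        # (2) The response document carries the failure state.
        return {
            "request_id": request_id,
            "status": "failed",
            "error": f"{type(exc).__name__}: {exc}",
        }
    return {"request_id": request_id, "status": "ok", "result": result}
```

Failed requests can then be found by querying the stored documents for the failure status, and the log file remains available to the administrator and to monitoring.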

Tianhong: What if the record to write is not a valid CSV dataset?

Tianhong: E.g., what happens if there are too many records?

      • Date Validation actor

Tianhong: Starting to implement collectionEventOutlier in Akka, using a simple approach first and later looking at scaling up to handle large data sets.

      • Vertnet github issue filing

Paul: Pinged David Bloom.

  • Driver
    • Report: Development status

Maureen: Testing, haven't started integration with Annotation Processor yet. Driver piece that takes spreadsheet and saves it as a workbench object, then driver piece that takes workbench object and saves it in the dataset. Working now on testing second piece.

    • Symbiota Driver, appears as need from SCAN.

No work.

    • Validation of ingest of all current annotation types (new determination, updated determination, new georeference, updated georeference, new occurrence, updated locality).

Maureen: David has provided example annotations.

David: These are in FP-Service.

Paul: Need to make sure that we also have examples live from the wild: from SCAN and from the NEVP new occurrence annotations.

    • Plan for validation of future annotation types (updated habitat, phenological state descriptions).

Paul: When we get to the driver/annotation processor integration tests, start writing documentation on how to add cases to those tests.

  • Analysis
    • Discussion: Proposal for ranking annotations based on queries, not repeated analysis of same records.

Maureen: Use the ListIdentifiers verb from OAI to get counts of things that have been modified and would be included in a harvest. Once a certain threshold has been met, or the oldest unharvested item is old enough, do the harvest.
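
For illustration only, a minimal sketch of that trigger rule. It assumes the datestamps from the ListIdentifiers record headers have already been parsed into datetime values; the function name and the default threshold values (100 items, 7 days) are placeholders, not values agreed in the meeting.

```python
from datetime import datetime, timedelta


def should_harvest(pending_datestamps, now,
                   count_threshold=100, max_age=timedelta(days=7)):
    """Decide whether to harvest now.

    pending_datestamps: datestamps of modified, not yet harvested items,
    as reported by ListIdentifiers. Threshold defaults are placeholders.
    """
    if not pending_datestamps:
        return False  # nothing to harvest
    if len(pending_datestamps) >= count_threshold:
        return True   # enough modified items have accumulated
    # Otherwise harvest only if the oldest pending item is old enough.
    return now - min(pending_datestamps) >= max_age
```

Only the record headers are needed for this decision, so the check stays cheap compared with a full harvest.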

Maureen: Put the harvested documents into Fedora and Mongo. Run the chosen analysis workflows. Put the resulting cleaned documents into Fedora with incremented version numbers. Put the resulting provenance documents into Fedora. Create relations in Fedora from the harvested documents to their provenance documents.

David: Two options from here: record counts of how many times the Mongo queries have been executed, or record counts associated with queries on Fedora for occurrence documents. In Mongo, we need to relate the selector in the annotation to the data objects in Mongo, and we need somewhere to put the count. Another option: if these are dwcfp occurrence records, SPARQL queries can retrieve the documents, and the counts can be stored in the triple store. Depends on how we store harvested documents.

Discussion: Return to this on Friday.

    • Report: Progress on implementation of OAI/PMH harvesting through firewalls.

No work on Maureen's side.

  • SCAN TCN Support
    • Revisit sanity check.

Need to do.

    • Working on scheduling visit in week of Jan 6.
  • NEVP TCN Support
    • Report: Preparations to update Annotation Processor Deployment at UNH.

Will reschedule when the driver is done with initial testing and the client deployment is stable.

    • Annotations in OCR/Crowdsourcing pathways
      • Hackathon. Progress on setting up FP-Lite instance and development environment.

Chuck: Need to work with David on configuration documents for FP1. Need to look at endpoints for putting new data in.

David: Annotation model object for that.

Discussion: Need to build and test deployments, document for Chuck how to build the pieces he needs, document for Chuck how to configure/build/deploy the client helper artifact, and document for developers how to use the client helper artifact.

Put example case for consensus annotation on table for Friday.

  • iDigBio integration, possible schedule in February.
  • FP Infrastructure
    • Report: Status of FP Node Refactoring

Non-Tech

  • Need to increase burndown rate.

Paul to meet with Kristin.