2013Nov06



Etherpad for meeting notes: http://firuta.huh.harvard.edu:9000/FP-2013Nov06

Agenda

  • Annotation Paper.
  • Report from TDWG.
  • Summary of Friday Tech Call.
  • Driver development - need to reallocate some resources into critical path items (Specify6 Driver, Specify-HUH Driver, MCZbase Driver).
  • Kepler
    • Report: Current state of Kepler work.
    • Need workflow with georeference validator and scientific name validator composed with GBIF service for SCAN.
  • dwcFP and DarwinCore RDF guide - Need to provide feedback.
  • Analysis
    • Discussion:
    • Report: Progress on implementation of OAI-PMH harvesting through firewalls.
    • Report: State of investigation of duplicates in GBIF cache.
  • SCAN TCN Support
    • Sanity Checking, updates to AnnotationProcessor installation.
    • Progress statistics and other deliverables: Discussion with ASU for David and me to visit for a week of hacking with Ed and Nico.
  • NEVP TCN Support
    • Report: Preparations to update Annotation Processor Deployment at UNH.
    • Report: Testbed UI for data entry for duplicate finding.
    • Annotations in OCR/Crowdsourcing pathways
      • Hackathon. Who to send?
  • FP Infrastructure
    • Report: Status of FP Node Refactoring

Non-Tech

  • Need burndown numbers from Kristin

Next Week

  • Supporting repeated QC requests for same records.

For Future meetings

  • Prospective meetings, development targets.
  • Burndown

Reports

  • Paul
    • At TDWG. Delivered the "Semantic matching of annotations to interests" talk. In the annotations interest group, obtained consensus to proceed with a task group to produce an Applicability Statement on OA. The Applicability Statement will need a mechanism to update as OA updates and/or moves in W3C. Participated in the NOMINA meeting on Nomenclator-Publisher integration for assignment of GUIDs for scientific names for registration processes. Participated in part of the Dina-Specify collaboration meeting (had to leave for the TDWG Technical Architecture Group meeting).


Notes

FilteredPush team Meeting 2013 Nov 06

Present: Paul, Bob, Maureen, David, Chuck, Jim, Tianhong, Bertram, Sven

Agenda

  • Annotation Paper.

Bob: Published.

  • Report from TDWG.

Bob: Significant set of Semantic Sessions.

Bertram: Very interesting two days (Thursday, Friday) at TDWG, specifically the semantics workshop. Workflow presentation (authored by Bertram and delivered by James) apparently went well. Also: are there any specific follow-ups from an FP angle?

Action items: FP Lite install on fp1.acis.

  • Summary of Friday Tech Call.

No call.

  • Driver development - need to reallocate some resources into critical path items (Specify6 Driver, Specify-HUH Driver, MCZbase Driver).

Paul: Need to reallocate some resources. Discussion with Maureen suggests that help with the testing framework for the drivers would be useful. Suggestion is to put this on Chuck's plate. Maureen would like to demonstrate something on Friday.

  • Kepler
    • Report: Current state of Kepler work.

Tianhong: Working on the data validation actor; working on 4-5 of the 8 or so required functions. Developed in the "system-independent" services framework, to be deployed either as Kepler actors or Akka actors.

Paul: Pair of data output actors that parallel the Mongo data and provenance writers (MongoProvenanceWriter); have that also for outputting CSV files.

Paul: On desktop: CSVReader-(composition of QCActors)-CSVOutput-CSVProvenanceOutput
In network: MongoReader-(composition of QCActors)-MongoOutput-MongoProvenanceOutput

Bertram: Two areas of Tianhong's (and Sven's) work: curation functionality (services) and scalability of these services. Will be featured in Tianhong's upcoming qualifying exam.

Bertram: Sven focused on writing thesis.

Bob: One thing that hasn't been clear to me in tech meetings: while we can get speedup on web service calls by caching, are there more fundamental scaling issues than network latency? Are there any NP-hard problems remaining?

Bertram: Several observed scaling issues: the data flow pipeline not being parallelized, and then service calls having limits on parallelization and network latency. Algorithmic complexity within the domain is an interesting issue to consider

=> for Tianhong QE
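As an illustration only, the desktop composition Paul describes (reader, composed QC steps, data output, provenance output) can be sketched in a few lines of Python. This is not the Kepler or Akka code: the function names, the two QC checks, and the provenance layout (row number plus remark) are all assumptions made for the example.

```python
import csv
import io
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
# A QC step returns the (possibly amended) record plus provenance remarks.
QCStep = Callable[[Record], Tuple[Record, List[str]]]


def compose(*steps: QCStep) -> QCStep:
    """Chain QC steps so the record produced by one feeds the next."""
    def run(record: Record) -> Tuple[Record, List[str]]:
        remarks: List[str] = []
        for step in steps:
            record, new_remarks = step(record)
            remarks.extend(new_remarks)
        return record, remarks
    return run


def trim_scientific_name(record: Record) -> Tuple[Record, List[str]]:
    """Hypothetical check: strip surrounding whitespace from scientificName."""
    name = record.get("scientificName", "")
    if name != name.strip():
        amended = {**record, "scientificName": name.strip()}
        return amended, ["scientificName: trimmed whitespace"]
    return record, []


def check_latitude(record: Record) -> Tuple[Record, List[str]]:
    """Hypothetical check: flag a latitude that is non-numeric or out of range."""
    raw = record.get("decimalLatitude", "")
    try:
        valid = -90.0 <= float(raw) <= 90.0
    except ValueError:
        valid = False
    if valid:
        return record, []
    return record, [f"decimalLatitude: invalid value {raw!r}"]


def run_pipeline(source, qc: QCStep, data_out, provenance_out) -> None:
    """CSV reader -> composed QC steps -> data output + provenance output."""
    reader = csv.DictReader(source)
    data_writer = csv.DictWriter(data_out, fieldnames=reader.fieldnames)
    data_writer.writeheader()
    prov_writer = csv.writer(provenance_out)
    prov_writer.writerow(["row", "remark"])
    for row_number, record in enumerate(reader, start=1):
        cleaned, remarks = qc(record)
        data_writer.writerow(cleaned)
        for remark in remarks:
            prov_writer.writerow([row_number, remark])
```

Swapping the CSV reader and the two writers for Mongo-backed equivalents would give the shape of the in-network variant, with the composed QC steps unchanged.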
    • Need workflow with georeference validator and scientific name validator composed with GBIF service for SCAN.

Paul: Other item for Tianhong to check. To do in email.

  • dwcFP and DarwinCore RDF guide - Need to provide feedback.

Todo for Bob and Paul.

  • Analysis
    • Report: Progress on implementation of OAI-PMH harvesting through firewalls.

Maureen: Still needs FP message, unclear of content (zip, tar, single record, xml, etc). Todo: Maureen and David to discuss.
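To make the open packaging question concrete, here is a minimal Python sketch of two of the options mentioned: one gzip-compressed tar archive holding all harvested records, versus one message body per record. The function names and the record identifiers are hypothetical, and nothing here reflects the actual FP message format, which was still undecided.

```python
import io
import tarfile
from typing import Dict, List


def as_tarball(records: Dict[str, str]) -> bytes:
    """Pack every record (identifier -> XML text) into one .tar.gz payload."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for identifier, xml in records.items():
            payload = xml.encode("utf-8")
            info = tarfile.TarInfo(name=f"{identifier}.xml")
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def read_tarball(blob: bytes) -> Dict[str, str]:
    """Inverse of as_tarball: recover identifier -> XML text."""
    records: Dict[str, str] = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as archive:
        for member in archive.getmembers():
            content = archive.extractfile(member).read().decode("utf-8")
            records[member.name[: -len(".xml")]] = content
    return records


def as_messages(records: Dict[str, str]) -> List[bytes]:
    """Alternative: one message body per record, sent individually."""
    return [xml.encode("utf-8") for xml in records.values()]
```

The trade-off the sketch exposes: a single archive means one message to route and sign but the receiver must unpack it, while per-record messages are simpler to process but multiply the message count.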

  • SCAN TCN Support
    • Sanity Checking, updates to AnnotationProcessor installation.

David: Latest update needs to be deployed. Errors in the sanity check appeared to be on the client side, on new occurrences that relate to transcription.

    • Progress statistics and other deliverables: Discussion with ASU for David and me to visit for a week of hacking with Ed and Nico.
  • NEVP TCN Support
    • Report: Preparations to update Annotation Processor Deployment at UNH.

Pending completion of driver work.

    • Report: Testbed UI for data entry for duplicate finding.

Chuck: Solr schema, SQL that pulls from the GBIF cache, and a UI with 4 fields: geography, collector name, collector number, taxon. Matches on those produce a list of Darwin Core labels; if there is just one, it is presented as data to edit; if there are several, they appear as a select list, and the selected one switches to editable. Ran the index last night on all fungi in GBIF. This is a test platform before testing in HUH-Rapid. Numbers are a remaining problem. Which fields to pull from the schema is also a question.
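The matching behaviour described for the testbed UI can be sketched as follows. This is an assumption-laden illustration in Python: an in-memory list with exact, case-insensitive matching stands in for the real search index, and the field keys, function names, and the "new" mode for zero matches are invented for the example.

```python
from typing import Dict, List, Optional

Record = Dict[str, str]

# The four query fields named in the notes (key spellings are assumptions).
FIELDS = ("geography", "collector_name", "collector_number", "taxon")


def find_matches(index: List[Record], query: Record) -> List[Record]:
    """Return records agreeing with every non-empty query field."""
    filled = {
        field: query[field].strip().lower()
        for field in FIELDS
        if query.get(field, "").strip()
    }
    if not filled:
        return []
    return [
        record
        for record in index
        if all(record.get(f, "").strip().lower() == v for f, v in filled.items())
    ]


def present(matches: List[Record]) -> Dict[str, object]:
    """One match: show it for editing. Several: offer a select list."""
    if not matches:
        return {"mode": "new", "record": None, "choices": []}
    if len(matches) == 1:
        return {"mode": "edit", "record": dict(matches[0]), "choices": []}
    return {"mode": "select", "record": None, "choices": list(matches)}


def choose(state: Dict[str, object], position: int) -> Dict[str, object]:
    """Picking an entry from the select list switches it to editable."""
    chosen: Optional[Record] = state["choices"][position]  # type: ignore[index]
    return {"mode": "edit", "record": dict(chosen), "choices": []}
```

A real index would add fuzzy matching on collector names and taxa, which exact string comparison deliberately leaves out here.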

    • Annotations in OCR/Crowdsourcing pathways
      • Hackathon. Who to send?

iDigBio hackathon, last full week before Christmas. Opportunity: iDigBio is interested in consensus forming, and we might learn interesting things from this. Discussion: Risks and benefits.

  • FP Infrastructure
    • Report: Status of FP Node Refactoring

David: Development/FP Lite mechanism for deploying a node using embedded Jetty. Working on a client helper which will work with an (actually) asynchronous message-driven node. Camel provides functionality we've been implementing by hand, including XML DSig, simplifying the code. Configurable routes are composable within a class. This lets us write unit tests for the services, and separately unit tests for functionality in beans. Works with EJBs, web services, file systems, Mongo, Spring beans, etc. Looked at integrating Akka with Camel; some minor work needed.

Non-Tech

  • Need burndown numbers from Kristin
  • @ UC Davis, Sven has to finish his thesis to start his new job at Google in December. Hence he's resigning from UC Davis effective mid-November (next week).

For Friday:

  • Driver demo.
  • OAI-PMH harvesting message: deliver all results in a tarball, or a message per record?
  • Look through the set of existing annotations on the SCAN Symbiota instance; look at the sanity check.