2013Nov27

From Filtered Push Wiki


Etherpad for meeting notes: http://firuta.huh.harvard.edu:9000/FP-2013Nov27

Agenda

  • Summary of Friday Tech Call.
  • Kepler
    • Report: Current state of Kepler/Akka work.
      • CSV load/save actors.
      • Date Validation actor
    • Generalizing annotation generation actor - vertnet github issue filing as second case.
  • Driver
    • Report: Development status
    • Symbiota Driver, appears as need from SCAN.
    • Validation of ingest of all current annotation types (new determination, updated determination, new georeference, updated georeference, new occurrence, updated locality).
    • Plan for validation of future annotation types (updated habitat, phenological state descriptions).
  • Analysis
    • Report: Progress on implementation of OAI/PMH harvesting through firewalls.
    • Discussion: Supporting repeated QC requests for same records. QC on harvest, information to gather on query?
  • SCAN TCN Support
    • Working on scheduling visit, possibly week of Jan 6.
  • NEVP TCN Support
    • Report: Preparations to update Annotation Processor Deployment at UNH.
    • Annotations in OCR/Crowdsourcing pathways
      • Hackathon. Progress on setting up FP-Lite instance and development environment.
  • iDigBio integration, possible schedule in February.
  • FP Infrastructure
    • Report: Status of FP Node Refactoring

Non-Tech

  • Need to increase burndown rate.

Next Week

  • Proposal for ranking annotations based on queries, not repeated analysis of same records.
  • SCAN, revisit sanity check.

For Future meetings

  • Duplicate finding.
  • Prospective meetings, development targets.

Reports

  • Paul
    • Was at iDigBio Summit III
  • Chuck
    • Small refactorings / small tests for Maureen
    • Documenting the installation process for FP-Deployment. Getting closer, but have not gotten an installation from scratch to work smoothly yet.
    • Refactoring / testing data-entry tool in the gaps.

Notes

Present: Maureen, Chuck, Bob, Jim, Paul, Bertram, David, Tianhong, James

  • Summary of Friday Tech Call.

Maureen: Update from Paul on iDigBio Summit. Discussed vertnet using github for data quality control. Looked at CSV reader issues that Kepler is having.

Paul: iDigBio has FP integration into the portal on the schedule for February. Greg OK with us contributing FP code to morphbank repository and Deb OK with putting that into production.

  • Bertram: The National Academy of Sciences Board on Research Data and Information (BRDI; www.nas.edu/brdi) is holding an open challenge to increase awareness of current issues and opportunities in research data and information.
    • Are we responding to this with a Letter of Intent?

(fyi: Bob's reply to Amber's email went to all of DataONE, not just to FP)

  • Bertram: according to Markus Doering, all GBIF data is accessible for download.

Indeed, this is the new distribution mechanism. iDigBio has a similar mechanism now.

Bob: Will write a letter of intent.

Bertram: As a project we should be clear about the intent, fits clearly into data quality improvement in biodiversity data.

  • Kepler
    • Report: Current state of Kepler/Akka work.
      • CSV load/save actors.

Tianhong: Finished csv reader/writer actors in akka. Whole workflow running in about 10 minutes on 100k name records. Many names in test set are not validating.

Bertram: From Friday, phenomenon might be starting over again with each subset. Could be a comad (memory/resources) issue as well.

Tianhong: Appears to be an issue with comad.

      • Date Validation actor

Tianhong: On pause waiting for CSV reader/writer.

Bertram: Issues to address.

Tianhong: Internal consistency check and within data set consistency checks.

Discussion: How to approach: data mining problem that gets run periodically, or as record level data quality control on ingest of sets of records?

Bertram: Build index of valid time points for collector strings might be a way to do this. Start with records with known birth/death dates, then add mined data of occurrences of dates associated with name strings, provide as a service.
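The index Bertram describes could be sketched roughly as follows. This is a minimal illustration, not project code: the class name `CollectorDateIndex`, its methods, and the collector names and dates are all hypothetical. It assumes known birth/death dates are treated as authoritative and that mined collecting dates are consulted only for collector strings without a known lifespan.

```python
from datetime import date
from typing import Dict, Optional, Tuple

DateRange = Tuple[date, date]


class CollectorDateIndex:
    """Hypothetical index of plausible date ranges per collector string."""

    def __init__(self) -> None:
        self._lifespans: Dict[str, DateRange] = {}  # known birth/death dates
        self._mined: Dict[str, DateRange] = {}      # ranges mined from records

    def add_lifespan(self, collector: str, birth: date, death: date) -> None:
        # Seed the index with a collector whose birth/death dates are known.
        self._lifespans[collector] = (birth, death)

    def add_observation(self, collector: str, collected: date) -> None:
        # Widen the mined range to cover a date seen with this name string.
        start, end = self._mined.get(collector, (collected, collected))
        self._mined[collector] = (min(start, collected), max(end, collected))

    def check(self, collector: str, collected: date) -> Optional[bool]:
        # True/False if the date falls inside/outside the known range;
        # None if the index holds nothing for this collector string.
        rng = self._lifespans.get(collector) or self._mined.get(collector)
        if rng is None:
            return None
        return rng[0] <= collected <= rng[1]


# Made-up data, for illustration only.
index = CollectorDateIndex()
index.add_lifespan("J. Doe", date(1850, 1, 1), date(1920, 12, 31))
index.add_observation("A. Smith", date(1901, 5, 3))
index.add_observation("A. Smith", date(1910, 7, 9))

print(index.check("J. Doe", date(1935, 6, 1)))    # False: after death date
print(index.check("A. Smith", date(1905, 1, 1)))  # True: inside mined range
print(index.check("Unknown", date(1905, 1, 1)))   # None: not in the index
```

Returning `None` for unknown collectors keeps "no evidence" distinct from "date out of range", which matters if the index is offered as a service to record-level QC on ingest.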

    • Generalizing annotation generation actor - vertnet github issue filing as second case.

Bertram: Seems a useful approach.

Paul: Will contact David Bloom.

Bob: Also doesn't have FP's arbitrary notification on interests.

  • Driver
    • Report: Development status

Maureen: Have a main method that runs the output, so is just about there. Need to test integration with annotation processor and test thoroughly.

Chuck: Started into testing.

    • Symbiota Driver, appears as need from SCAN.

On table. Linda has put PHP development on table for MCZbase, so PHP based driver code with shared components with MCZbase and Symbiota might make sense (two core driver codebases - Java for Specify and Specify-HUH and PHP for MCZbase and Symbiota (or MCZbase could go in Java/ColdFusion)).

    • Validation of ingest of all current annotation types (new determination, updated determination, new georeference, updated georeference, new occurrence, updated locality).

Paul: Collect set of current examples. Test ingest of these through annotation processor into database through driver.

Bob: Have set include both generated and hand generated test cases.

Bob: Hand generated ones need to be updated.
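One way to track whether the example set covers every current annotation type from both sources is a simple coverage check. The sketch below is illustrative only: the function `missing_cases` and the `(type, origin)` tuple layout are assumptions, and it does not touch the annotation processor or the driver.

```python
from typing import Iterable, List, Tuple

# The six current annotation types listed in the agenda item above.
ANNOTATION_TYPES = [
    "new determination",
    "updated determination",
    "new georeference",
    "updated georeference",
    "new occurrence",
    "updated locality",
]
# The two sources of test cases mentioned in the discussion.
ORIGINS = ["generated", "hand generated"]


def missing_cases(examples: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return every (annotation type, origin) pair with no example yet."""
    have = set(examples)
    return [(t, o) for t in ANNOTATION_TYPES for o in ORIGINS if (t, o) not in have]


# Hypothetical inventory of collected examples.
collected = [
    ("new determination", "generated"),
    ("new georeference", "hand generated"),
]
for annotation_type, origin in missing_cases(collected):
    print(f"still needed: {annotation_type} ({origin})")
```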

    • Plan for validation of future annotation types (updated habitat, phenological state descriptions).

Paul: Will Driver X support new Annotation Y?

Bob: This is beyond tests of rule sets?

Bob: Are we only testing things that are known to meet the apple pie rules?

Paul: First case is does the driver work with annotations in the wild with SCAN and NEVP. Second case is extending that to tests of invalid annotations and attacks, needs a bit more thought on where the attack mitigation occurs.

Bob: Need documentation of the assumptions of the tests. One of those is the rule set.

  • Analysis
    • Report: Progress on implementation of OAI/PMH harvesting through firewalls.

Maureen: Haven't had time to get back to this yet.

    • Discussion: Supporting repeated QC requests for same records. QC on harvest, information to gather on query?

Paul: Maureen and David to look at early next week and put a proposal on the table.

  • SCAN TCN Support
    • Working on scheduling visit, possibly week of Jan 6.

Nico good with that week. Need to start lining up scheduling.

Paul: Target: bringing inadequately identified material that is worth identifying to the attention of specialists. A good starting point is Richard ?'s proposal at iDigBio.

Maureen: Stackoverflow has a nice model for this: the more questions you give good answers to the better your reputation gets - doing things well improves your reputation.

Bob: Want to schedule EOL people sometime in January. Will put up doodle poll.

  • NEVP TCN Support
    • Report: Preparations to update Annotation Processor Deployment at UNH.

Maureen: Haven't scheduled yet. Want to wait until we have been able to show driver working with test cases.

    • Annotations in OCR/Crowdsourcing pathways
      • Hackathon. Progress on setting up FP-Lite instance and development environment.

Chuck: Working on refining README documents for installation. Repeated installs and refinement.

  • iDigBio integration, possible schedule in February.

Paul: David review documentation on install/run, see what needs to be brought up to date.

  • FP Infrastructure
    • Report: Status of FP Node Refactoring

David: Finished client helper and put a service over it. Have FP-Lite working over the new code. Have work to do in annotation processor and Symbiota to let those use the new annotation processor. FP-Lite and FP-Medium now using same code and same client helper, FP-Lite is just a subset of FP-Medium.

Chuck: FP-Lite is a dependency for FP-Medium?

David: FP-Core is the dependency, FP-Lite and FP-Medium are deployment descriptions that depend on this.

Chuck: FP-Service, how does this relate?

David: Includes all of what the Jobs do now, as a set of services. FP-Lite is only using the annotation service. Message driven services are in FP-service, these are services that wrap FP-Core functionality. Annotation service, analysis service, interest service all used by FP-Medium. FP-Service depends on FP-Core.
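The relationships David describes can be written down as a small dependency map. This is only a reading of the discussion, not the actual build configuration: the edges from FP-Lite and FP-Medium to FP-Service are an assumption inferred from their use of services that live in FP-Service, and the helper `transitive_dependencies` is hypothetical.

```python
from typing import Dict, List, Set

# Direct dependencies as stated in the discussion, except where noted.
DEPENDS_ON: Dict[str, List[str]] = {
    "FP-Core": [],
    "FP-Service": ["FP-Core"],               # stated: FP-Service depends on FP-Core
    "FP-Lite": ["FP-Core", "FP-Service"],    # FP-Service edge is an assumption
    "FP-Medium": ["FP-Core", "FP-Service"],  # FP-Service edge is an assumption
}

# Services each deployment description uses, per the discussion.
SERVICES_USED: Dict[str, Set[str]] = {
    "FP-Lite": {"annotation"},
    "FP-Medium": {"annotation", "analysis", "interest"},
}


def transitive_dependencies(name: str) -> Set[str]:
    """Collect everything reachable from `name` in the dependency map."""
    seen: Set[str] = set()
    stack = list(DEPENDS_ON[name])
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(DEPENDS_ON[dep])
    return seen


print(sorted(transitive_dependencies("FP-Lite")))
# FP-Lite uses a subset of the services FP-Medium uses.
print(SERVICES_USED["FP-Lite"] <= SERVICES_USED["FP-Medium"])
```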

Non-Tech

  • Need to increase burndown rate.

Paul: TODO: Check with Kristin early next week.