2013Aug28

From Filtered Push Wiki


Etherpad for meeting notes: http://firuta.huh.harvard.edu:9000/FP-2013Aug28

Agenda

  • Kepler
  • NEVP TCN Support
    • Annotation Processor hardening
    • UNH site for AnnotationProcessor deployment, visit (Aug 5).
    • Status update (production deployment target date 2013 Aug 15).
    • Specify-HUH driver
  • SCAN TCN Support.
    • Status update on deployments.
    • Turned on FP in production SCAN Symbiota?
  • MCZbase Driver
    • Update on MCZbase test instance for driver development access:

Non-Tech

Next Week

  • Demo: run of current Kepler taxon name cleaning workflow on SCAN data, evaluate scaling issues.

Week after next

  • Look at duplicate finding (Lucene etc).

For Future meetings

  • dwcFP and DarwinCore RDF guide - feedback.
  • Prospective meetings, development targets.
  • Burndown
  • Duplicate Finding Find_Duplicates. State of old code.

Reports

  • Paul
    • Nothing substantive.
  • Maureen
    • Harvested SCAN data into Mongo & Fuseki on fp2.
    • Harvesting NEVP data into Mongo & Fuseki on fp3. Documenting the process.
  • Bob
    • Based on advice from Zhimin, began to look at native Lucene solutions for duplicate finding. In doing so, found an interesting Lucene class called MoreLikeThis and am pursuing it.
    • Back to dealing with production staff requests for PLoSOne manuscript.

Notes

FilteredPush Team Meeting 2013 Aug 28

Present: Maureen, Paul, David, Chuck, Tianhong, Bob, Sven, James


  • Kepler
    • Status of current Kuration workflow development.

Tianhong: Preparing to run the demonstration on FP2; the deployment on FP2 isn't parallel to that on FP3.

David: Dependency on the annotation generation model; the Kepler jar may have a different copy of the configuration than the node itself. The error message reports that Evidence:VerbatimEventDate is missing.

Tianhong: Configuration changed since SPNHC demo?

David: No.

Paul: Deployment difference on FP3 vs FP2 might point to issue with configuration deployed on FP2.

David: Probably minor issues that David and Tianhong need to coordinate.

Tianhong: The taxon name validation actor still needs work on homonym validation. The GBIF backbone taxonomy does not include the ambiguous names, so we can't use it for this. Working on finding multiple entries in other checklist bank lists.

Paul: Probably in a state where we can run it against some real data and evaluate the results.

Tianhong: Markus from GBIF has a description of a new (in progress) service over the GBIF backbone taxonomy that should help with homonym resolution, not expected to be deployed for a month or so.
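
The homonym check Tianhong describes (finding multiple entries for one name across checklists) might be sketched as follows. This is a minimal illustration, not the Kepler actor's code: the function name `find_homonyms`, the record layout, and the sample rows (which use the placeholder name "Aus bus" with made-up authorships and sources) are all assumptions.

```python
from collections import defaultdict

def find_homonyms(records):
    """Return names that appear with more than one distinct authorship.

    `records` is an iterable of (canonical_name, authorship, source) tuples;
    this layout is a hypothetical stand-in for checklist entries.
    """
    authorships = defaultdict(set)
    for name, authorship, _source in records:
        # Normalise case and whitespace so the same name groups together.
        authorships[name.strip().lower()].add(authorship.strip())
    # A name with two or more different authorships is a homonym candidate.
    return {n: sorted(a) for n, a in authorships.items() if len(a) > 1}

# Made-up sample rows, for illustration only.
sample = [
    ("Aus bus", "Smith, 1900", "checklist A"),
    ("Aus bus", "Jones, 1950", "checklist B"),
    ("Cus dus", "Smith, 1900", "checklist A"),
]
candidates = find_homonyms(sample)
```

Names flagged this way would still need review, since differing authorship strings can also come from spelling or formatting variants of the same author.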

Sven: Haven't gotten started on this yet, what is focus of current work?

Paul: Kepler Kuration actor development is focused on taxon name validation. Also working on an alternative mechanism for deployment of Kepler as a separate EAR with service invocation.

David: Have also been looking at invocation of Maven from Ant for dependency management in Kepler deployment.

Sven: Akka is a pure Java workflow system.

Paul: Look at how Akka might be integrated on Friday. Next Wednesday run the embedded Kepler demo on SCAN data.

Maureen: Friday meeting at 8:10 Pacific, 11:10 Eastern, starting this week for the next three weeks.

  • NEVP TCN Support
    • Annotation Processor hardening

Chuck: Home and about are visible without login, other URLs produce a login prompt. Working with David on refactoring database users/roles/principals. Integrating Spring authentication/authorization management.

    • UNH site for AnnotationProcessor deployment, visit (Aug 5).

Maureen: Train goes up, drops off in middle of campus. No code customizations. Have been scoping out the snapshot they sent - some local customization fields, have more than one collection.

    • Status update (production deployment target date 2013 Aug 15).

David: All node infrastructure should be in place. May be issues involving Kepler and configuration, need to follow up with Tianhong.

Maureen Notes: Harvesting NEVP data into Mongo & Fuseki on fp3. Documenting the process: take a snapshot of data from Symbiota, manipulate the data on a workstation, then upload into FP3 capabilities. In process.

  • Roles for ApplePie: network deployment of annotation processor
    • admin (everything)
    • project administrator (can add/remove users and manage user rights)
    • research users: project-level access (can view unredacted data per project), i.e. access to data, where FP Network instance = project. (Also: research users who aren't involved in the project need authorization from the project administrator.)
    • Specify-HUH driver

Maureen: Nothing specific yet, just general driver work.
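
The ApplePie roles listed above could be modelled roughly as below. This is a sketch under stated assumptions, not the annotation processor's actual Spring configuration: the class and function names are hypothetical, a project is represented by a plain string, and letting a project administrator view unredacted data for their own project is an assumption the notes do not confirm.

```python
from dataclasses import dataclass, field
from enum import Enum

class Role(Enum):
    ADMIN = "admin"
    PROJECT_ADMIN = "project administrator"
    RESEARCH_USER = "research user"

@dataclass
class User:
    name: str
    role: Role
    # Projects (FP Network instances) this user is authorized for.
    projects: set = field(default_factory=set)

def can_manage_users(user, project):
    """Admins manage everything; project administrators manage their own projects."""
    if user.role is Role.ADMIN:
        return True
    return user.role is Role.PROJECT_ADMIN and project in user.projects

def can_view_unredacted(user, project):
    """Unredacted data is visible per project (assumed to include project admins)."""
    if user.role is Role.ADMIN:
        return True
    return project in user.projects

def authorize(granter, researcher, project):
    """Grant an outside research user access; requires user-management rights."""
    if not can_manage_users(granter, project):
        raise PermissionError(f"{granter.name} cannot manage users for {project}")
    researcher.projects.add(project)

# Hypothetical users and project names, for illustration only.
pa = User("pa", Role.PROJECT_ADMIN, {"NEVP"})
outsider = User("outsider", Role.RESEARCH_USER)
before = can_view_unredacted(outsider, "NEVP")
authorize(pa, outsider, "NEVP")
after = can_view_unredacted(outsider, "NEVP")
```

Keeping the project set on the user makes the "authorization from project administrator" step a single explicit grant, which is easy to audit.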

  • SCAN TCN Support.
    • Status update on deployments.

Maureen Notes: Harvested SCAN data into Mongo & Fuseki on fp2.

Bob: Maureen, I use Fuseki 0.2.7 on http://filteredpush.org/endpoint. It has some wrappers David wrote for that UI, which are certainly irrelevant in general. (Maureen: OK, thanks.)

David: Server capacity increased. All components except Kepler ready to go.

Maureen: Data in place.

    • Turned on FP in production SCAN Symbiota?

David: Not yet.

Paul: Suggest we validate the configuration first, then turn on soon.

David: Target date to turn on: Tuesday.

  • MCZbase Driver
    • Update on MCZbase test instance for driver development access:

Maureen: All access issues resolved, able to start work, haven't started development yet.

Non-Tech

Paul: Three abstracts?

James: Sept 4 is the current deadline.

  1. Anton's session on workflows, like SPNHC but more technical.
  2. Anton's session on workflows, Bertram's talk.
  3. FP talk on semantics of biodiversity. Technology session: mechanisms for annotation of biodiversity data. OA/OAD/FP

Paul: TODO for James: send out an email to get the discussion of who needs to write which abstract going online.

  • Annotation Paper

Bob: Working on production editor requirements. Hopefully complete today. Still need to address making good images.

Next Week

  • Demo: run of current Kepler taxon name cleaning workflow on SCAN data, evaluate scaling issues.

Week after next

  • Look at duplicate finding (Lucene etc).