2014Jan29

From Filtered Push Wiki


Etherpad for meeting notes: http://firuta.huh.harvard.edu:9000/FP-2014Jan29

Agenda

  • Report from Friday call
  • FP Infrastructure
    • Report David: Status of refactoring to camel.
  • Analysis
    • Report Tianhong: Current state of Kepler/Akka work.
    • Report: Progress towards deploying Akka for QC in SCAN and NEVP nodes.
    • Report Bob: Progress on Duplicate Finding data mining
    • Report Chuck: Duplicate detection UI.
    • Discussion: Duplicate detection
  • Driver
    • Discussion: Driver
    • Report Paul: Testing Alternate NEVP->Specify-HUH ingest tool.

Non-Tech

  • Davis burndown rate.
  • James: TDWG session

Reports

  • Paul
    • Got FP-Deployments checked out (mostly updating eclipse for dependencies) and maven build working following David's README. Working on local configuration files.
    • Got changes needed to support NEVP->Specify-HUH ingest tool checked back into classes in symbiota trunk, so that the same SpecProcessorNEVP.php file (which parses the NEVP new occurrence annotation documents) is used for both projects. Did more testing and fixed the issues found, including support for agent and occurrence GUIDs. A small number of TODOs are left; they shouldn't block deployment.


Notes

FilteredPush Team Meeting 2014 Jan 29

Present: Bertram, Bob, David, Chuck, Paul, Maureen, Tianhong, Jim H.

Agenda

Non-Tech

  • Davis burndown rate.

Jim: If Bertram continues paying Tianhong at the current rate, then Bertram will still have funds left at the expiration of the grant, so we don't need to plan to put more funding on the Davis side. Bertram: Further discussion offline later.

  • Accelerating burndown.

Paul: Clear that we need to do this. Discussion: Proceed with LHT/Temp postings. Paul: TODO: Circulate Temp/LHT position drafts. James: Concur. Yes, whatever we can do quickly!

  • James: TDWG session

Paul: James has circulated a link to draft text for the session proposal. Please comment in the document. James: I did not include everyone on this, only the conveners, but could make it available. Joel and I (along with the usual suspects) are also submitting a DwC symposium along the lines of last year's to keep up the momentum.

Tech

  • Report from Friday call

Maureen: Notes at: http://firuta.huh.harvard.edu:9000/FP-2014Jan24 Discussed how to let network clients discover analysis capabilities. Options: a few different workflows, tied to a query on harvested data. Paul: Also looked at updating Kepler Kuration and making an Akka deployment artifact for end users.

  • FP Infrastructure
    • Report David: Status of refactoring to camel.

David: Have all the camel stuff working in the network. Starting to move out to the clients - working on PHP integration in symbiota, using client helper services. Developing integration test for registering/checking interest. Working through some maven issues in build system. Paul: Have the packages checked out and building, working on getting the development environment configured and working again. David: Also working on an Akka connector for the Camel system.

  • Analysis
    • Report Tianhong: Current state of Kepler/Akka work.

Tianhong: Completed Akka read/write csv files. Tianhong: Getting deployment of executable jar file with workflows ready.

    • Report: Progress towards deploying Akka for QC in SCAN and NEVP nodes.

Paul: Pieces needed are:

  • Harvest

Maureen: Harvesting from symbiota to Mongo on symbiota2 - fp1 (test instances). Need to update the taxon tree harvest process as we've changed from fuseki to mulgara. Maureen: Need to change how records are being written into mongo. Paul: Thus the need for a UI element to launch queries for harvested data and QC results on those data - informing how the data should look in mongo. Maureen: Side note: will need to update the java harvester that exists for Specify instances. Paul: Needed for the NEVP harvest.

Production goals:

    • symbiota1 - symbscandb -> taxa to fp2
    • symbiota1 - symbscandb -> occurrencedata to fp2
    • symbiota1 - cnhdb -> taxa to fp3
    • specifyinstances -> occurrencedata to fp3
    • symbiota1 - cnhdb -> added collections occurrences to fp3

  • Two workflows - that load data from harvest and write out related records

Tianhong: Two workflows assembled, need to assemble actors, and load data from MongoDB actor. Date validation actor is not yet complete. Bertram: Are all actors other than date validation ready? Tianhong: Have workflows assembled, need to integrate with camel routes. Paul: 4 workflows.

First, for desktop use:
(1) csv->sciname(nomenclatural)->georeferencevalidate->datevalidation->csv
(2) csv->sciname(taxonomic)->georeferencevalidate->datevalidation->csv

Second, for embedded use:
(3) mongo->sciname(nomenclatural)->georeferencevalidate->datevalidation->annotations->mongo
(4) mongo->sciname(taxonomic)->georeferencevalidate->datevalidation->annotations->mongo

Tianhong: What to do with the date validation actor? Bertram: Very open ended, have a version that does date validation for some cases, and then improve that over time. Tianhong: Will put in current working version (with limited capability) for integration testing.
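For illustration only, the shape of desktop workflow (1)/(2) can be sketched as a chain of stages applied to each CSV record. This is plain Python, not the project's Akka actors. The stage functions, the added status columns, and the use of Darwin Core column names (scientificName, decimalLatitude, decimalLongitude, eventDate) are assumptions made for the sketch. The name stage is a placeholder that does no nomenclatural or taxonomic lookup, and the date stage is deliberately limited, in the spirit of the "limited capability" version mentioned above.

```python
import csv
import io
from datetime import datetime


def validate_scientific_name(record):
    """Placeholder name stage (hypothetical): only checks a name is present."""
    name = (record.get("scientificName") or "").strip()
    record["nameStatus"] = "present" if name else "missing"
    return record


def validate_georeference(record):
    """Checks that latitude and longitude parse as numbers and are in range."""
    try:
        lat = float(record.get("decimalLatitude") or "")
        lon = float(record.get("decimalLongitude") or "")
        ok = -90 <= lat <= 90 and -180 <= lon <= 180
    except ValueError:
        ok = False
    record["georefStatus"] = "valid" if ok else "invalid"
    return record


def validate_date(record):
    """Limited date stage: only year-month-day dates such as 2014-01-29 parse."""
    raw = (record.get("eventDate") or "").strip()
    try:
        datetime.strptime(raw, "%Y-%m-%d")
        record["dateStatus"] = "valid"
    except ValueError:
        record["dateStatus"] = "unparsed"
    return record


# Stage order follows csv->sciname->georeferencevalidate->datevalidation->csv.
STAGES = [validate_scientific_name, validate_georeference, validate_date]
STATUS_COLUMNS = ["nameStatus", "georefStatus", "dateStatus"]


def run_workflow(csv_text):
    """Reads CSV text, runs every stage on every row, returns CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for record in reader:
        for stage in STAGES:
            record = stage(record)
        rows.append(record)
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=list(reader.fieldnames or []) + STATUS_COLUMNS,
        extrasaction="ignore",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Swapping the first entry of STAGES would give the nomenclatural versus taxonomic variants; the embedded workflows (3)/(4) would replace the CSV reader and writer with Mongo reads and annotation writes.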

  • Akka integrated into node - launch on harvest

David: In progress, testing, not checked in. David: Parameters supplied need to get passed in and used (same ones that we were using for Kepler). David: One parameter is a query on mongo to retrieve data for Akka to run.

  • Query mechanism to retrieve harvested data and QC results.

David: A mongo query again. David: Launch Akka from UI? Maureen: Also need a test mechanism to launch Akka. Paul: Harvest would launch Akka to QC latest harvest. Discussion: Don't need to connect this to FP-Message to run analyses; camel route can be exposed on localhost for invocation by either a command line test or by harvest. David: In essence, provide an http service method to launch Akka. Maureen: Useful to be able to launch a command line test from a deployed environment to validate that service is available and working.
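As a rough sketch of the idea discussed above (an http method bound to localhost that takes a query and launches the analysis, plus a command line style check that the service answers), the following uses only the Python standard library. It is not the Camel route itself. The /launch path, the JSON request body, the response fields, and the launch_analysis function are all hypothetical names chosen for the example.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def launch_analysis(query):
    """Hypothetical stand-in for starting the workflow on records matching query."""
    return {"status": "launched", "query": query}


class LaunchHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/launch":  # assumed path, not from the notes
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            query = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            self.send_error(400, "body must be JSON")
            return
        body = json.dumps(launch_analysis(query)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the smoke test output quiet


def start_server(port=0):
    """Binds to localhost only; port 0 lets the OS pick a free port."""
    server = HTTPServer(("127.0.0.1", port), LaunchHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def smoke_test(port, query):
    """Posts a query to the local service and returns the decoded reply."""
    request = urllib.request.Request(
        f"http://127.0.0.1:{port}/launch",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Empty ProxyHandler so the local call never goes through a configured proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=5) as response:
        return json.loads(response.read())
```

The same smoke_test call could be made by the harvester after a harvest completes or by hand from a deployed environment, which matches the two invocation paths mentioned in the discussion.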

    • Report Bob: Progress on Duplicate Finding data mining

Bob: Got clustering working with a reasonable algorithm (one which will detect an arbitrary number of clusters, and should have clusters of botanical duplicates - i.e. not k-means), evaluating results.
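The notes do not say which algorithm Bob is using. As one example of clustering that does not fix the number of clusters in advance (unlike k-means), here is a minimal sketch of threshold-based single-linkage clustering with a union-find structure. The record fields (collector, number, date), the token-based Jaccard similarity, and the 0.6 threshold are assumptions for illustration, not the project's method.

```python
import re


def tokens(record):
    """Lower-cased alphanumeric tokens from the assumed identifying fields."""
    text = " ".join(str(record.get(k, "")) for k in ("collector", "number", "date"))
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(records, threshold=0.6):
    """Groups record indices; any pair at or above threshold shares a cluster."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    token_sets = [tokens(r) for r in records]
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if jaccard(token_sets[i], token_sets[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

The number of clusters falls out of the data and the threshold. The pairwise comparison is quadratic in the number of records, so a real run over harvested data would need some blocking or indexing step first.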

    • Report Chuck: Duplicate detection UI.

Chuck: Lots of javascripting, working back on tests. Got minimal integration with rapid data entry application working. Passing data structures between application and plugin in the javascript layer. More productization needed (lots of references to localhost, caching issues, etc). Paul: TODO: get exsiccatae data to Chuck.

    • Discussion: Duplicate detection
  • Driver
    • Discussion: Driver

Move to Friday.

    • Report Paul: Testing Alternate NEVP->Specify-HUH ingest tool.

Paul: All blocking issues resolved, some minor todos left, ready to deploy when CNH/NEVP is ready on symbiota1.

For Friday: Driver discussion. Touch base on akka/kepler deployment artifacts.