2014Jan22



Etherpad for meeting notes: http://firuta.huh.harvard.edu:9000/FP-2014Jan22

Agenda

  • FP Infrastructure
    • Report David: Status of refactoring to Camel.
  • Analysis
    • Report Tianhong: Current state of Kepler/Akka work.
    • Discussion: Targets and work needed for deploying Akka for QC in SCAN and NEVP nodes.
    • Report Bob: Progress on Duplicate Finding data mining
    • Report Chuck: Duplicate detection UI.
    • Discussion: Duplicate detection
  • Driver
    • Report Maureen: Development status
    • Timelines for revisiting annotation processor
    • Report Paul: Testing Alternate NEVP->Specify-HUH ingest tool.

Non-Tech

  • Davis burndown rate.
  • James: TDWG session

Reports

  • Paul
    • Got Specify-HUH NEVP New Occurrence Annotation ingest tool checked in, did more testing and debugging. Nearly ready to use.

Notes

Present: Bertram, Bob, David, Chuck, Maureen, Paul, Jim, Tianhong, James.

  • Scheduling issue:

Bertram would like to have the meeting start two hours later on Wednesdays (class schedule conflict through March). This would be a start at 2PM Eastern, 11AM Pacific.

Works for Jim.

Works for James.

Paul: Will need to check on room availability.

  • FP Infrastructure
    • Report David: Status of refactoring to Camel.

David: Have checked the refactored branch back into trunk. Have a client helper that provides what we need for SCAN and most of what is needed for NEVP; still need to do Akka integration. There are two separate deployable artifacts: client helper and node. Working on integrating the updated client helper with Symbiota.

Paul: To check ease of setting up development environment (per Maureen's request), I'll work on getting the current trunk set up in a development environment.

David: Main work remaining is in clients. PHP library functions are now available from the client helper, so clients using that need refactoring. The annotation processor will also need refactoring.

  • Analysis
    • Report Tianhong: Current state of Kepler/Akka work.

Bertram: Paper in draft stage, working on cleaning up. Should be finished soon.

Tianhong: Started back into cleaning up loose ends in Akka, making sure that workflows in Akka and Kepler are consistent.

    • Discussion: Targets and work needed for deploying Akka for QC in SCAN and NEVP nodes.

David: We need to have an Akka provider for Camel, and invoke it from the Camel routes. What is the state of the workflow definition?

Tianhong: Workflow definition is still hardcoded (as a Java class).

Bertram: Is it possible to produce a serialization of this?

Tianhong: Yes, but substantive work.

Bertram: Perhaps that is a task for subsequent work.

Paul: Plausible given the deliverables; we have explored programming the analysis component by injecting Kepler workflow XML documents into storage.

Paul: The question is probably what metadata is needed to discover which workflows are available.

Bob: Revisit the use cases and use case scenarios to develop a success test.

Paul: Let's phrase a test as follows: a research user visits the FP web presence on a node, launches a query for data, and wishes to have particular QC tests run on it. How does this user discover which tests are available (and, in the current model, have already been run at data harvest time)?

Bertram: Also, what sort of deployments do we want to make?

Paul: Similar packaging to the Kepler/curation packages (update Lei's package for Kepler). Need something similar for Akka.

Tianhong: Not sure at this point what would be needed for Akka.

Bertram: Should be easier than Kepler deployment!?

Tianhong: Depends on what kind of format is desired for deployment.

David: With the Camel-Akka integration we could provide Akka as a standalone HTTP service.

Bertram: In Kepler, have a desktop application that can invoke remote services. In Akka, more flexible services and parallelization.

David: A deployable artifact for Akka that could run from the command line.

Bertram: Good target.

David: Camel also provides a file connector. Also easy to write a wrapper with a Main method around Akka.

Paul: Another item for discussion of details on Friday.

    • Report Bob: Progress on Duplicate Finding data mining

Bob: Feel like I am in a place to try with real data.

Paul: On my plate, grab me this afternoon.

    • Report Chuck: Duplicate detection UI.

Chuck: Working on integration with the HUH rapid data entry tool. The duplicate finding UI can pass text strings on. The rapid UI has form elements that need to know the ID for a record as well as the text string, so working on retrieving these IDs behind the scenes and connecting the two applications.

    • Discussion: Duplicate detection

Bob: Parallel tracks still make sense for the time being. Can't tell yet if it is a realistic task to build a cache of duplicates. Will require some integration of the work Chuck has done for preprocessing to prepare it for duplicate finding.

Chuck: Most of the prototypical instances of cleaning in GBIF involve filling in the taxonomic hierarchy for scientific names.

  • Driver
    • Report Maureen: Development status

Maureen: In a very good place to start integration into the annotation processor.

    • Timelines for revisiting annotation processor

Maureen: Now is good.

    • Report Paul: Testing Alternate NEVP->Specify-HUH ingest tool.

Paul: Testing nearing completion, want to sit down with Maureen and go through some test ingests.

Non-Tech

  • Davis burndown rate.

Jim: Bertram close to having the figures he needs. Expect to have email exchanges over next couple of days.

  • James: TDWG session

James: Anton wrote back to us; he is keen and interested in bringing more BioVeL/Taverna talks as final outreach from that project. Also to discuss services used by those workflows and registries for such services. Given this, it makes sense to ask for two time blocks. Ready to write something up.

  • James: EarthCube and FP proposal?

James: Pinged a few people. EarthCube is NSF infrastructure funding for earth sciences; program officers are very interested in Bio-Geo interactions.

Bob: Involved in discussions in W3C on efforts for data citation in electronic publications. There is an attempt there to bring geological science folks into these problems.

James: EarthCube deadline in early March for full proposals.

Jim: ADBC call did fund paleo, and program officers are very keen on ADBC - geo - paleo proposals. If we can partner with some existing group, this might make a lot of sense for us.

James: Perhaps an existing suitable paleo partner.

Paul: Perhaps Bruce.

For Friday:

  • How are we going to let network clients discover available analyses (= analyses run at harvest time)? What are the available Akka workflows?
  • Details of what's involved in updating the Kepler kuration package.
  • Building a deployable Akka artifact with some canned workflows to be run from the command line.
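
As a starting point for the Friday discussion, the first and third items above can be sketched together: a command-line wrapper with a main method around a small registry of canned workflows, where a listing option serves as the simplest form of discovery. This is only an illustrative sketch in plain Java and does not use Akka or Camel. The class name WorkflowLauncher, the --list option, and the two workflow names are hypothetical placeholders, not existing FP code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class WorkflowLauncher {
    // Hypothetical registry of canned workflows: name -> function applied to one input record.
    // In a real deployment each entry would start an Akka workflow instead of a lambda.
    static final Map<String, Function<String, String>> WORKFLOWS = new LinkedHashMap<>();
    static {
        WORKFLOWS.put("trim-whitespace", record -> record.trim());
        WORKFLOWS.put("flag-empty", record -> record.trim().isEmpty() ? "EMPTY" : "OK");
    }

    // Discovery: names of the available workflows, one per line, in registration order.
    static String listWorkflows() {
        return String.join("\n", WORKFLOWS.keySet());
    }

    // Run one named workflow on one record; unknown names are rejected.
    static String run(String name, String record) {
        Function<String, String> workflow = WORKFLOWS.get(name);
        if (workflow == null) {
            throw new IllegalArgumentException("Unknown workflow: " + name);
        }
        return workflow.apply(record);
    }

    public static void main(String[] args) {
        if (args.length == 1 && args[0].equals("--list")) {
            System.out.println(listWorkflows());
        } else if (args.length == 2) {
            System.out.println(run(args[0], args[1]));
        } else {
            System.err.println("usage: WorkflowLauncher --list | <workflow> <record>");
            System.exit(2);
        }
    }
}
```

With Java 11 or later this can be tried directly as `java WorkflowLauncher.java --list`. The same registry could later back whatever discovery mechanism network clients end up using, so that the command line and the node report the same set of workflows.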

Friday notes