2015May05

From FilteredPush


Etherpad for meeting notes: http://free.primarypad.com/p/BqyI5n6LjM

Agenda

Non-Tech

  • Progress from Meeting With AnnoSys
  • Publications
    • Paul/James: Collection Objects
    • Paul/Bob/David: QC Reports
      • Bob: Progress on draft.
    • Bob: Refactoring Dup finding cluster analysis
      • Bob: Access to larger scale infrastructure
    • Bob: List of additional topics
  • SCAN
    • Annual Report, NCE, updates.
  • Schedule: Next call in 2 weeks.

Kurator

  • Scientific name validation in nomenclatural vs taxonomic mode.

Tech

  • Maven repository
  • Annotation Processor
  • State of Deployments
    • FP2.acis
      • Status of InvertEBase setup
    • FP3.acis
      • State for harvest for NEVP
  • Morphbank integration
  • Habitat, Phenology Ontology work.
  • Agenda for Thursday Tech Call

Reports

Notes

Present: Bob, David, Tianhong, Tim, Jim, Paul, Bertram

Non-Tech

  • Progress from Meeting With AnnoSys

David: Haven't heard from them recently, we should touch base with them to see if they need anything from us to proceed.

Bob: Think they were busy through May.

  • Publications
    • Paul/James: Collection Objects

Paul: No Update.

    • Paul/Bob/David: QC Reports
      • Bob: Progress on draft.

Bob: Pushing it along, not at a state to circulate yet. Making more use of the recent discussions.

Tim: Explain Data cube?

Bob: Suggestion for going forward. The spreadsheet produced by the postprocessor looks somewhat like a data cube: a summary sheet plus one sheet per actor. Think of Z=0 as the summary plane and Z=1, 2, 3 as planes for validators. The first set of columns in each plane are DwC attributes, with additional columns added at each plane. The principal assumption of the model as expressed is that each cell uniquely participates in a single validator. The flowering time validator does violate this assumption.
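A minimal sketch of the cube idea, to make the planes concrete. Everything named here is an assumption for illustration: the column list, the validator names, the `_status` column suffix, and the simplification of tracking participation per column rather than per cell. The notes do not say why the flowering time validator breaks the assumption; the sketch only shows one way such an overlap could be detected.

```python
from collections import defaultdict

# Hypothetical Darwin Core columns shared by every plane.
DWC_COLUMNS = ["scientificName", "eventDate", "country"]


def build_cube(records, validators):
    """Return a list of planes: index 0 is the summary, index k is validator k.

    `validators` maps a validator name to (columns_it_touches, check_function).
    """
    # Z=0: summary plane holding only the DwC attributes.
    planes = [[{col: rec.get(col) for col in DWC_COLUMNS} for rec in records]]
    for name, (_touched, check) in validators.items():
        plane = []
        for rec in records:
            row = {col: rec.get(col) for col in DWC_COLUMNS}
            # Additional column added at this plane (the name is illustrative).
            row[name + "_status"] = check(rec)
            plane.append(row)
        planes.append(plane)
    return planes


def shared_columns(validators):
    """Columns touched by more than one validator.

    A non-empty result means the single-validator assumption does not hold.
    """
    owners = defaultdict(list)
    for name, (touched, _check) in validators.items():
        for col in touched:
            owners[col].append(name)
    return {col: names for col, names in owners.items() if len(names) > 1}
```

Running `shared_columns` before building the spreadsheet would flag any validator set for which the one-plane-per-cell layout is ambiguous.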

Bertram: Perhaps explore in parallel with small pieces in Kurator-P: what does this actor do? Need a higher-level description of what an actor does without going into a flowchart.

Paul: Lends itself naturally to Bob's model: a Z=3 layer could be a projection of Z=3a, 3b, 3c layers representing actions by smaller sub-actors, naturally pointing in the same direction as Bertram proposes, towards an algebra for data curation assertions.

Bob: Rational numbers for layers.
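One possible reading of the sub-layer idea, sketched under stated assumptions: layers are keyed by rational indices (here `fractions.Fraction`), sub-layers of Z=3 are those with an index strictly between 3 and 4, and the projection lets later sub-layers overwrite earlier ones on the same column. The function name and the overwrite rule are hypothetical choices, not something settled in the discussion.

```python
from fractions import Fraction


def project_sublayers(layers, parent):
    """Collapse all sub-layers of `parent` into a single projected layer.

    `layers` maps a Fraction index to a dict of column -> assertion.
    Sub-layers are visited in index order, so a later sub-layer wins
    when two of them assert something about the same column.
    """
    parent = Fraction(parent)
    projected = {}
    for index in sorted(layers):
        if parent < index < parent + 1:
            projected.update(layers[index])
    return projected
```

For example, sub-layers at 13/4, 7/2 and 15/4 would all project onto Z=3, while layers at 2 or 4 are left out.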

Tim: Have explored some Akka actors that perform SQL-like operations on a CSV model; may need a more complex data model for transport of data objects through actors (at least a star schema of moving data objects). Nested workflows implementing complex flow charts with internal state tracking get quite complex. At the top level use workflow actors, but rapidly drop into code (perhaps with YesWorkflow markup, which can then be used to extract provenance information).
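To illustrate what dropping into code with YesWorkflow markup might look like, here is a small sketch. The `@begin`, `@desc`, `@in`, `@out` and `@end` keywords are YesWorkflow comment annotations; the block name, the port names, the function itself and the date rule are hypothetical and are not taken from any FP-Akka actor.

```python
import re

# Hypothetical rule: accept only full ISO 8601 calendar dates.
ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


# @begin validate_event_date @desc Flag records whose eventDate is not an ISO date.
# @in record
# @out checked_record
def validate_event_date(record):
    """Return a copy of the record with an added eventDate_status column."""
    checked_record = dict(record)  # leave the input record untouched
    value = record.get("eventDate") or ""
    checked_record["eventDate_status"] = "ok" if ISO_DATE.match(value) else "unparseable"
    return checked_record
# @end validate_event_date
```

The annotations live in comments, so the code runs unchanged while a tool that reads the markup can recover the declared inputs and outputs of the step.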

    • Bob: Refactoring Dup finding cluster analysis
      • Bob: Access to larger scale infrastructure

Bob: No action.

    • Bob: List of additional topics
  • SCAN
    • Annual Report, NCE, updates.

Jim: Request is in for the NCE (but relying on Nico at Arizona, who is planning on doing this); everything is set on this end. Will put in the annual report shortly; have the report from the main grant and can just submit that.

  • Schedule: Next call, next week, SPNHC coming up on us.

Kurator

  • Scientific name validation in nomenclatural vs taxonomic mode.

Tianhong: in progress, looking at separating out a component (integrated with sciName subworkflow).

Paul: Please check in when you can. I would like to look at how we are assessing authorship similarity, and at putting some assertions about similarity into the workflow output.

Tech

  • Maven repository

Tim: Issue with permissions still needs to be resolved.

  • Annotation Processor

David: Got a build and deployment working, but running into issues with the deployment that cause Tomcat to become unresponsive and run out of heap space; something in application startup isn't working correctly.

  • State of Deployments
    • FP2.acis
      • Status of InvertEBase setup

David: Switched on and in same state as SCAN symbiota.

Paul: Would be good to do an analysis of the Chicago data set and send to Petra.

    • FP3.acis
      • State for harvest for NEVP

David: Have been working on improving the harvest, which has been failing about halfway through. Mongo queries were taking most of the time; replaced a sequence of find-if-record-exists, delete, then write or insert with a single mongo insert-or-update operation (upsert), which very significantly improves performance. Looking at batch inserts into mongo, which are vastly faster: starting with a new data set, batch insert is about 60 times faster than the current approach. Still running into some issues with XML (the OAI-PMH provider may not be correctly handling some value from the db, and is producing an empty XML document part way through the data set).
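The three write patterns mentioned above can be sketched as follows. This is an illustration only: the notes do not say which driver or language the harvester uses, so the sketch assumes a collection object with PyMongo-style methods (`find_one`, `delete_one`, `insert_one`, `replace_one`, `insert_many`), records keyed by `_id`, and a hypothetical batch size.

```python
def harvest_find_delete_insert(collection, records):
    """Pattern being replaced: up to three round trips per record."""
    for rec in records:
        key = {"_id": rec["_id"]}
        if collection.find_one(key) is not None:
            collection.delete_one(key)
        collection.insert_one(rec)


def harvest_upsert(collection, records):
    """Single insert-or-update (upsert) per record: one round trip each."""
    for rec in records:
        collection.replace_one({"_id": rec["_id"]}, rec, upsert=True)


def harvest_batch_insert(collection, records, batch_size=1000):
    """Batch insert for a fresh data set: one round trip per batch.

    Assumes none of the records already exist in the collection.
    """
    batch = []
    for rec in records:
        batch.append(rec)
        if len(batch) >= batch_size:
            collection.insert_many(batch, ordered=False)
            batch = []
    if batch:  # flush the final partial batch
        collection.insert_many(batch, ordered=False)
```

The gain comes from fewer round trips to the database: the first pattern issues two or three calls per record, the upsert one, and the batch insert one per `batch_size` records.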

  • Morphbank integration

David: Have a current dump and have Morphbank trunk on a local machine. Working on integrating changes into current trunk, about halfway done with that; will put them in a branch for them to look at, then merge and use on production.

  • Habitat, Phenology Ontology work.

Paul: Haven't heard anything.

  • Agenda for Thursday Tech Call
    • Run workflows for SPNHC.
    • Look at Bob's cube model
    • Consider the granularity of actors, consider YesWorkflow markup of FP-Akka actors.