2013Aug21

From Filtered Push Wiki


Etherpad for meeting notes: http://firuta.huh.harvard.edu:9000/FP-2013Aug21

Agenda

  • Annotation Paper
  • Kepler
  • NEVP TCN Support
    • UNH site for AnnotationProcessor deployment, visit (Aug 5).
    • Status update (production deployment target date 2013 Aug 15).
    • Duplicate finding (Find_Duplicates): state of the old code.
  • SCAN TCN Support.
    • Status update on deployments.
    • All set to turn on FP in production SCAN Symbiota?
  • MCZbase Driver
    • Update on MCZbase test instance for driver development access:
      1. Can: Access the database with SQL Developer over an SSH X tunnel.
      2. Can: SSH in to see/edit ColdFusion code.
      3. Can: Access the specimen search page, run searches, and see search results.
      4. Can: Have root on the machine; can install Tomcat if needed.
      5. Can: Access the database from SQL Developer on the desktop (firewall issues).
      6. Can't: Create a user account (the web form returns an error message). This appears to be the same issue as with searches over collection objects, which throw a stack trace. Working with Brendan on this; it appears that not all permissions were copied over correctly.
      7. Can: Start/stop ColdFusion.

Non-Tech

For Future meetings

  • dwcFP and DarwinCore RDF guide - feedback.
  • Prospective meetings, development targets.
  • Burndown

Reports

  • Paul
    • Did some Wiki cleanup, particularly around classification of meeting pages.
    • Worked on final MS fixes with Bob.
  • Chuck
    • For AnnotationProcessor, added automated tests of login and navigation to each section.
    • Slow start on implementing Spring Security.

Notes

FilteredPush Team Meeting 2013 Aug 21.

Present: Bob, David, Chuck, Paul, Maureen, James, Jim, Bertram, Tianhong, Sven.

  • Annotation Paper

Bob: Submitted on Monday. Expect some work with the production team to improve the figures.

Bob: Still need to package the current release of the examples to go with the paper as a download on SourceForge.

  • Kepler
    • Name cleaning and GBIF backbone taxonomy.

Tianhong: Have added an option to the taxon name cleaning actor to invoke the GBIF backbone. Markus is seeing problems identifying homonyms in the GBIF backbone taxonomy that are very similar to the ones we are seeing.

Bertram: Alternative engine: no work yet other than design thoughts in Kurator.

Sven: Just back from Germany; haven't gotten back to working on it yet.

Bertram: Need to look at whether it makes sense to implement in Akka the georeference validation and taxon name validation actors that Tianhong has been working on.

Sven: Needs a decision on whether to pursue the Akka work, or to work more on scalability in Kepler.

Bertram: Need to investigate whether the scaling issues are inherent to Kepler, a result of COMAD workflows, or inherent to service invocation.

Bertram: Have identified that the COMAD implementation does add streaming lag.

Sven: Being able to launch n parallel web service invocations would speed things up considerably.

Bertram: Neither Kepler nor COMAD would allow easy parallelisation of workflows.

Bertram: Put on the table: examining scalability questions (for Tianhong and Sven).

Maureen: Consider not treating analysis as a service. We provide software that provides algorithms, but not the infrastructure. (I just don't want us to be writing the software infrastructure for queueing jobs, keeping everybody's data in their own partitions, restarting stalled workflows remotely, that kind of thing. It's a big, complicated problem. I totally agree that work to include parallelization with Akka is a good thing.)

David: Part of this (stalled workflows) is a motivation for moving away from Kepler wrapped in an EJB. The goal is to put the Kepler capability into its own enterprise application.

Paul: For Friday, discuss what needs to be deployed/configured to run two live analyses of SCAN data (on FP2), both with the current georeference cleaning and taxon name cleaning actors: one on a subset of about 6 records, another on a subset of about 5 minutes' worth of analysis. Contrast with the full size of the SCAN and NEVP data sets. Tentatively target running this as a demonstration next Wednesday.

  • NEVP TCN Support
    • UNH site for AnnotationProcessor deployment, visit

Maureen; general agreement on the visit date (Sept 5). Need to arrange transportation.

    • Status update (production deployment target date 2013 Aug 15).

David: All that is needed now is a harvest of the data.

Maureen: Have a harvest of NEVP records on FP3 as JSON; need to ingest it into Fuseki and MongoDB. Probably need to do some data manipulation first. Taxon records may or may not have what we need. Probably a couple of days' worth of work here.

David: We still need to work out how to get updates out to AnnotationProcessor installations at Specify sites.

Perhaps parallel with EFG mechanism.

Bob: Report on investigation with David: Found the lowest level (string distance algorithms) and the top level (interface) of Zhimin's code. We have multi-variable duplicate search elements, so there need to be several places where we lift into multi-dimensional space. Suspect, but haven't yet tested, that he parameterized that (probably the cutoff that we'll use). Have put copies of the relevant code into FP-Core.

David: Algorithms well encapsulated.

Bob: Indexing implementations available at a higher level.

  • SCAN TCN Support.
    • Status update on deployments.

David: Have put in a request to Alex to increase the capability of FP2; believe that's been completed (might need a restart of the VM).

Same state as NEVP: FP2 is set up, need to load the harvested data.

    • All set to turn on FP in production SCAN Symbiota?

David: Should be ready. Does need one more test to make sure all the pieces are working, and then a GlassFish restart.

David: Once it is turned on, how do we manage versioning (install4j?)? Do we want to manage software updates differently for SCAN and NEVP? How do we manage this with SVN: tag and build from the tag?

Maureen: Good topics for Friday.

Paul: General answer is that we can expect SCAN and NEVP to have different deployment schedules.

  • MCZbase Driver
    • Update on MCZbase test instance for driver development access:

Can't: Create a user account (the web form returns an error message). This appears to be the same issue as with searches over collection objects, which throw a stack trace. Working with Brendan on this; it appears that not all permissions were copied over correctly.

Maureen: No resolution of this issue yet.

Non-Tech

Workflow presentation:

Bertram: When would it be useful to be there?

James: Perhaps ramp up a bit more than what we did at SPNHC, looking in a little more detail at the relevant TDWG standards/technologies.

Bertram: Also W3C Prov as something to include in our workflows [and derived annotations]. Both workflows could be linked to concepts in a community ontology, and workflow results linked to detailed Prov provenance traces.

Bertram: Was thinking of having James/Paul present in Anton's session on Wednesday.

Paul: Perhaps a presentation for Anton's session and then on Wed/Thur/Fri have a software demonstration paralleling the SPNHC demo.

James: 20 min talk in Anton's session.

James: Believe the abstract deadline is Aug 31.

    • Annotations Interest Group session includes Task Group for Applicability Statement on OA.

Paul: Two initial agenda items: current state of OA, and the task group.

  • Collaborations
    • Specify/Symbiota