2013Jun26

From Filtered Push Wiki


Etherpad for meeting notes: http://firuta.huh.harvard.edu:9000/FP-2013Jun26

Reminder: Change of meeting time effective Sept 4: (12-1 Eastern 9-10 Pacific).

Agenda

  • Project/Package Refactoring: Developer Documentation
  • Annual NSF project report: update GoogleDocs doc by Today.
  • NEVP TCN Support
    • Status update
    • Annotation processor as in SPNHC demo for Patrick to test.
  • SCAN TCN Support
    • Timelines for deployment.
    • Symbiota instance for Nico to test.
    • Annotation processor as in SPNHC demo for Nico to test.
    • Production node deployment target date 2013 July 31.
  • Kepler
  • MCZbase Driver
  • dwcFP and DarwinCore RDF guide - need to provide immediate feedback.

Non-Tech

  • Third Project Programmer, Burndown.
  • Recent Contacts
  • Collaborations
    • Specify/Symbiota

For Future meetings

Reports

  • Paul
    • More debugging and finishing touches to SPNHC demo.
    • Gave Semantic Annotation talk at SPNHC.
    • With James, and with many thanks to Maureen, Tianhong, and David, gave a successful SPNHC demo.

Notes

FilteredPush Team meeting 2013 June 26th.

Present: Paul, Maureen, David, Jim, Nico, Patrick, Tianhong

  • Project/Package Refactoring: Developer Documentation

David: Working on annotation generation and rules. Still a good bit of documentation that needs to be updated (on an ongoing basis), but in a state where it could be used. We should update the documentation as we refactor and update the project.

  • Annual NSF project report: update GoogleDocs doc by Today.

Jim to review tomorrow, Paul to go through this afternoon.

  • NEVP TCN Support
    • Status update

Patrick: Still aiming for production in August. Script to generate RDF/XML annotation documents in test at OK. Symbiota instance at NEVP is set up. In discussion with Ed Gilbert about how transport will work from Primary Digitization to Symbiota.

Meeting with iPlant folks tomorrow to discuss image metadata interactions with ingest (1PM Eastern).
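For readers unfamiliar with the RDF/XML annotation documents mentioned in Patrick's status update, the following is a minimal sketch of what generating one might look like. It is not the script under test: the function name and all example.org URIs are hypothetical, and it assumes only the Open Annotation namespace (http://www.w3.org/ns/oa#) with its `oa:Annotation`, `oa:hasTarget`, and `oa:hasBody` terms. The project's actual documents presumably carry additional OAD/dwcFP content not shown here.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OA_NS = "http://www.w3.org/ns/oa#"

# Use readable prefixes instead of auto-generated ns0/ns1 in the output.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("oa", OA_NS)


def build_annotation(annotation_uri, target_uri, body_uri):
    """Return an RDF/XML string for one annotation (illustrative only)."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    annotation = ET.SubElement(
        root, f"{{{OA_NS}}}Annotation", {f"{{{RDF_NS}}}about": annotation_uri}
    )
    # The target is the record being annotated; the body is the assertion made.
    ET.SubElement(
        annotation, f"{{{OA_NS}}}hasTarget", {f"{{{RDF_NS}}}resource": target_uri}
    )
    ET.SubElement(
        annotation, f"{{{OA_NS}}}hasBody", {f"{{{RDF_NS}}}resource": body_uri}
    )
    return ET.tostring(root, encoding="unicode")


# Hypothetical URIs, for illustration only.
document = build_annotation(
    "http://example.org/annotation/1",
    "http://example.org/specimen/123",
    "http://example.org/determination/1",
)
print(document)
```

A real new-determination or georeference annotation would replace the body resource with structured content; the sketch only shows the overall document shape.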

    • Annotation processor as in SPNHC demo for Patrick to test.

TODO: Test Data - small set of records (ala SCAN/SPNHC demo), test annotations (new determination, update georeference, solve with more data). Use FP3, set up a specify database (but no UI).

  • SCAN TCN Support
    • Timelines for deployment.

To go into production, target date July 31 2013.

Nico: in production meaning deploy on main iDigBio server.

Paul: Deployed in the main SCAN symbiota instance.

    • Symbiota instance for Nico to test.

Nico: Access? What data do you need? (on demo VM..)

Paul: Two items to test: Specimens needing determination in Symbiota, and AnnotationProcessor. (Nico available to provide 5-10 specimens, with images, manually). Can send spreadsheet (csv) to Paul M. for ingestion.

Nico: Symbiota instance for SCAN is current (genetic links).

Paul: Get copy of current SCAN Symbiota database (symbiota1) and install current trunk on a test VM. Also need to populate the Symbiota usertaxonomy table with one or more records for Nico.

Nico's specific interest would be: Curculionidae.

David: Can also test dump locally before deploying to VM.

TODO: David to send out list of URIs for access to resources.

Paul: Decouple tests as two separate problems.

    • Annotation processor as in SPNHC demo for Nico to test.

User Test of annotation processor (Patrick and Nico).

Need: Specify DB on FP3 (no UI access needed). Set of sample records (in Specify and Mongo). Sample annotations: (1) New Determination, (2) New Georeference, and (3) Solve with more data. Workflow able to generate new georeference annotations on Mongo data. Need user accounts and database mappings for Patrick and Nico. Nico and Ed can connect into Friday's FP tech call (Hangout) to coordinate transfer of relevant weevil test data (about 20 records).

ASUHIC images: http://symbiota1.acis.ufl.edu/scan/portal/imagelib/photographers.php?collid=1

    • Production node deployment target date 2013 July 31.
  1. FP Node
  2. Supporting knowledge infrastructure (Fedora, Mongo, Fuseki).
  3. Harvest from Symbiota into Mongo/Fuseki (taxon tree into Fuseki with reasoning, need to add user interests, and occurrence records into mongo).
  4. ? AnnotationProcessor
  5. Update configuration/rules/harvest to use current OA/OAD/dwcFP.
  6. Configure SCAN symbiota to use this Node.

4 VMs - symbiota1, fp1, fp2, fp3. Use two for production (SCAN and NEVP) and two for development/testing.

Paul: annotation ingest into Symbiota?

Maureen: Driver for Symbiota would be easy.

Nico: Tradeoff: want SCAN folks working, but not on short term solution?

Nico: Primarily ASU as source for suitable images at this point, probably several months before a sufficient pool is available. http://symbiota1.acis.ufl.edu/scan/portal/imagelib/photographers.php

Are there SCAN images present in MorphBank? Nico will figure this out.

Jim: MCZ images there yet?

Nico: Not yet.

Paul: Adding MCZ images in process.

  • Kepler
    • Taxon name cleaning (and SCAN?)

Tianhong: Looking at GBIF service and why results didn't come back.

Paul: We should refactor the GBIF service code to use the new API.

TODO: Paul to send Tianhong links to the new GBIF api, documentation, bits in Kepler code, etc.
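As a starting point for that refactoring, here is a minimal sketch of a name lookup against a GBIF web service. It assumes the current public GBIF v1 species match endpoint (`https://api.gbif.org/v1/species/match`) and its `matchType` and `scientificName` response fields, which may differ from the API version discussed at the meeting. The function names are hypothetical and are not taken from the Kepler code, and the sample responses are hand-written rather than real GBIF output. The sketch makes no network call; it only builds the request URL and interprets an already-parsed JSON response.

```python
from urllib.parse import urlencode

# Assumption: the public GBIF v1 species match endpoint.
GBIF_MATCH_URL = "https://api.gbif.org/v1/species/match"


def match_url(name):
    """Build the request URL for matching one scientific name."""
    return GBIF_MATCH_URL + "?" + urlencode({"name": name})


def summarize_match(response):
    """Return the matched name from a parsed response dict, or None.

    A matchType of "NONE" (or a missing matchType) is treated as no match,
    so callers can tell "no result" apart from a service failure.
    """
    if response.get("matchType", "NONE") == "NONE":
        return None
    return response.get("scientificName")


# Hand-written sample responses, for illustration only.
sample_hit = {"matchType": "EXACT", "scientificName": "Curculionidae"}
sample_miss = {"matchType": "NONE"}

print(match_url("Curculionidae"))
print(summarize_match(sample_hit))
print(summarize_match(sample_miss))
```

Handling the no-match case explicitly may help when diagnosing why results did not come back, since an empty match and a failed request would otherwise look the same to the caller.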

  • MCZbase Driver

Maureen: Still haven't gotten the database setup from Brendan.

Maureen: General impression (as with Symbiota), would like to hijack form processing system (also plausible approach for web based specify).

  • dwcFP and DarwinCore RDF guide - need to provide immediate feedback.

Paul: Distinct relationships to dwcFP.

David: Concur, also wouldn't be difficult to reconfigure to use their proposals in many places, lots of similarities.

Non-Tech

  • Third Project Programmer, Burndown.

Brief update.

  • Recent Contacts

Bob has note from Berlin on discussions there.

Contact from Paddy, Paul needs to follow up.

  • Collaborations
    • Specify/Symbiota

Maureen: For Friday, examining dependencies related to Kepler deployment - making development and deployment easier.

Discussion of documentation:

Plan: Maintain javadocs as current (with links out and automated documentation build). Start writing some other documentation in private wiki. Write deployment documentation as deployments occur (write in late July for SCAN components, write in late August for NEVP deployments).

Automated builds (Hudson, etc.) are running into PermGen issues, which are probably non-trivial to address at this point. Could probably build javadocs on a regular basis. Key issues are in libraries used repeatedly by different classes (in particular XML-related libraries). Using JavaEE may simplify things, but refactoring is definitely needed, particularly for classes that do many things: split EJBs into multiple business logic components, and use other JavaEE mechanisms, e.g. injected resources for knowledge stores (which don't retain conversational state).
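The refactoring idea in the last point can be illustrated with a small, language-neutral sketch. It is written in Python for brevity, although the project code is Java, and every class name in it is hypothetical. It shows one large component split into two single-purpose components, each handed a knowledge store from outside instead of creating it. The store keeps data, but neither component holds any per-conversation state.

```python
from typing import Dict, Optional, Protocol


class KnowledgeStore(Protocol):
    """Interface the business logic depends on (hypothetical)."""

    def get(self, key: str) -> Optional[str]: ...

    def put(self, key: str, value: str) -> None: ...


class InMemoryStore:
    """Stand-in for a real store; used here only to make the sketch runnable."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def put(self, key: str, value: str) -> None:
        self._data[key] = value


class AnnotationIngest:
    """Single-purpose component: stores incoming annotation documents."""

    def __init__(self, store: KnowledgeStore) -> None:
        self._store = store  # injected, not constructed here

    def ingest(self, annotation_id: str, document: str) -> None:
        self._store.put(annotation_id, document)


class AnnotationLookup:
    """Single-purpose component: retrieves stored annotation documents."""

    def __init__(self, store: KnowledgeStore) -> None:
        self._store = store  # the same injected resource can be shared

    def find(self, annotation_id: str) -> Optional[str]:
        return self._store.get(annotation_id)


shared_store = InMemoryStore()
AnnotationIngest(shared_store).ingest("annotation-1", "<rdf:RDF/>")
print(AnnotationLookup(shared_store).find("annotation-1"))
```

In JavaEE the same shape would presumably be expressed with container-managed injection into stateless beans; the sketch only shows the separation of responsibilities, not the container mechanics.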