2013Jun05

From FilteredPush


Etherpad for meeting notes: http://firuta.huh.harvard.edu:9000/FP-2013Jun05

Agenda

  • Change of meeting time effective Sept 4: (12-1 Eastern 9-10 Pacific).
  • Annotation paper
  • Project/Package Refactoring: Developer Documentation
  • Annual NSF project report
  • Progress on SPNHC demonstration
    • Packaging
    • Driver
  • Annotations
    • Progress on rewriting dwcFP, OAD, and example annotations.
    • DwC RDF Guide
  • MCZbase Driver
  • Kepler

Non-Tech

  • Recent Contacts
  • Collaborations
    • Specify/Symbiota
    • SCAN TCN
    • NEVP TCN

For Future meetings

Reports

  • Paul
    • Compiled some homonym examples for Tianhong.
    • With James, discussed scientific name cleaning issues with Tianhong, provided some resources.
    • Further tests on Symbiota XML ingest code, updated, committed changes, and merged xmlingest branch back into trunk. Code is ready for testing by NEVP.

Notes

Present: Jim, Paul, Tianhong, David, James, Bertram, Maureen, Bob.

  • Change of meeting time effective Sept 4: (12-1 Eastern 9-10 Pacific).

Paul: Room reserved for 12-1:30 Eastern, starting Sept 4.

  • Annotation paper

Bob: Revision submitted and acknowledged. Repackaged copy is available on sourceforge.

  • Project/Package Refactoring: Developer Documentation

David: Updated page for documentation on sourceforge: http://filteredpush.sourceforge.net/ has navigation links, embeds wiki pages when appropriate. FP-Medium deployment documentation updated, needs a test on a clean machine, but should be ready to use. FP-Lite also up to date, as is installation of supporting software. No documentation yet for FP-Analysis. Annotation processor documentation is out of date. SPARQL rules and configuration documentation is out of date. A few other developer documentation pages are out of date since the most recent refactoring.

  • Annual NSF project report

Jim: All in place - Maureen sent out the link last week. Please add text for current work. Please have completed by June 26th. Please send any questions to Jim.

Research.gov may solicit additional classification information from named participants after report goes in.

  • Progress on SPNHC demonstration
    • Packaging

Paul to try install on his laptop using updated developer docs.

David and Tianhong to coordinate deployment of current version and data to FP3 VM (just node, annotation processor to run on laptop).

Also a Symbiota instance on the VM.

Tag in sourceforge, put up EAR for deployment.

David to get copies of the 4 test records to Maureen and Paul.

    • Video

Bob: Stretched out to allow easier understanding?

Paul: Slightly but not a great deal.

Bob: Would like to point to demo at Manchester (June 24th-25th). Would like to craft a wiki page explaining the demonstration, and the problems it is trying to solve, for people who work on annotation systems but not in biology.

    • Driver

Maureen: Working on refactoring driver, does import/export of the DarwinCore types Occurrence, Identification, Taxon, Event, Location. Still needs the integration with the annotation processor.

David: oauth is still part of the documentation.

Maureen: Demonstration using georeference?

David: Yes, same set as video, will send the set. Target showing full demonstration next Wednesday.

  • Annotations
    • Progress on rewriting dwcFP, OAD, and example annotations.

Bob: Paul and Bob should go through the non-manuscript examples and make sure that they are consistent with the manuscript (and have correct namespaces, etc). Very good to have a SPARQL endpoint and queries to test examples on. OA Validator is a nice tool, also lists optional things that you have omitted. http://austese.net/lorestore/validate.html

    • DwC RDF Guide

Paul: Draft of DwC RDF guide is up, TDWG TAG looking for feedback. Definitely worth looking at, likely also worth pointing them at dwcFP.

Bob: One recommendation in that doc is a namespace that includes what we are doing with the dwcFP:has_ID.

Paul: Good to tell them that, and then adopt their solution.

Paul: From discussion with David this morning, propose adding dwcFP:localityNumber, then dwcFP:fieldishNumber (needs better name) that generalizes dwc:recordNumber, dwc:fieldNumber and dwcFP:localityNumber. Move to discussion with Bob/David/Paul.

  • MCZbase Driver

Maureen: Brendan hasn't gotten the database dump up yet.

  • Kepler

Tianhong: Minor improvements to solve issues from last week.

Bertram: More elaboration?

Tianhong: Sven knows the details.

Sven: Comparing results of Akka with Kepler Kuration, they are different (1700 records in test set vary). Partly from different encoding; can now specify encoding for Akka output (reduces to 700 differences). Issue appears to be with Comad handling of these 700 (dates are null in some, may be a datatype conversion issue); Tianhong investigating.
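A comparison of this kind can be sketched as a field-level diff of two result sets keyed by record id. This is only an illustration, not the code used for the Akka/Kepler comparison: the record ids, field values, and function names (`diff_records`, `summarize`) are hypothetical, and the sample data merely imitates the two symptoms mentioned above (a mis-decoded string and a null date).

```python
from collections import Counter


def diff_records(left, right):
    """Return {record_id: [field, ...]} for records present in both sets whose fields differ."""
    differences = {}
    for record_id in left.keys() & right.keys():
        a, b = left[record_id], right[record_id]
        changed = sorted(f for f in a.keys() | b.keys() if a.get(f) != b.get(f))
        if changed:
            differences[record_id] = changed
    return differences


def summarize(differences):
    """Count how many records differ in each field."""
    return Counter(f for fields in differences.values() for f in fields)


# Hypothetical sample output from the two workflow engines.
collector = "A. M\u00fcller"
akka_out = {
    "rec1": {"eventDate": "1902-05-14", "recordedBy": collector},
    "rec2": {"eventDate": "1877-08-03", "recordedBy": "J. Smith"},
    "rec3": {"eventDate": "1910-01-01", "recordedBy": "B. Jones"},
}
kepler_out = {
    # UTF-8 bytes read back as Latin-1: the kind of difference an encoding option removes.
    "rec1": {"eventDate": "1902-05-14",
             "recordedBy": collector.encode("utf-8").decode("latin-1")},
    # Date lost somewhere in processing: a possible datatype conversion problem.
    "rec2": {"eventDate": None, "recordedBy": "J. Smith"},
    "rec3": {"eventDate": "1910-01-01", "recordedBy": "B. Jones"},
}

differences = diff_records(akka_out, kepler_out)
field_counts = summarize(differences)
print(differences)
print(field_counts)
```

Grouping the differences by field makes it easier to see whether the remaining mismatches cluster in one place, such as the date fields.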

James: Paul, Tianhong, and I talked about how a workflow to evaluate taxon names could work, and what its goals might be. Good to have metadata come in with the input (kingdom and goal: nomenclatural/taxonomic). Multiple steps:

First: Evaluate consistency/parsing in Darwin Core terms. Parse if needed.

Second: Reach out to external services - validate names against nomenclatural sources - correct/flesh out actors.

Third: If needed, deal with synonyms.

Tianhong to start on first element, then to start working with existing Lei/Paul code for invoking services for second.
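The three steps could be wired together roughly as follows. This is a minimal sketch under stated assumptions, not the planned workflow: the function names, the whitespace-based parsing, the in-memory `SOURCE` and `SYNONYMS` tables standing in for external services, and the placeholder names "Aus bus" and "Cus dus" are all made up for illustration. Real names carry authorship, ranks, and hybrid markers that this naive split ignores.

```python
def binomial(record):
    """Build 'Genus epithet' from the atomic Darwin Core terms."""
    return f"{record.get('genus', '')} {record.get('specificEpithet', '')}".strip()


def check_consistency(record):
    """Step 1: fill missing atomic terms from scientificName, then check they agree."""
    record = dict(record)
    parts = record.get("scientificName", "").split()
    if parts:
        record.setdefault("genus", parts[0])
    if len(parts) >= 2:
        record.setdefault("specificEpithet", parts[1])
    issues = []
    if parts and record.get("genus") != parts[0]:
        issues.append("genus does not match scientificName")
    if len(parts) >= 2 and record.get("specificEpithet") != parts[1]:
        issues.append("specificEpithet does not match scientificName")
    return record, issues


def validate_name(record, kingdom, lookup):
    """Step 2: ask a name source (any callable) about the name within a kingdom."""
    match = lookup(binomial(record), kingdom)
    if match is None:
        return record, [f"name not found: {binomial(record)}"]
    updated = dict(record)
    updated["scientificNameAuthorship"] = match["authorship"]
    return updated, []


def resolve_synonym(record, synonyms):
    """Step 3: record the accepted name when the supplied name is a known synonym."""
    updated = dict(record)
    accepted = synonyms.get(binomial(record))
    if accepted:
        updated["acceptedNameUsage"] = accepted
    return updated


def evaluate_name(record, kingdom, goal, lookup, synonyms):
    """Run the steps in order; synonym handling only applies to the taxonomic goal."""
    record, issues = check_consistency(record)
    if issues:
        return record, issues
    record, issues = validate_name(record, kingdom, lookup)
    if issues or goal != "taxonomic":
        return record, issues
    return resolve_synonym(record, synonyms), []


# Made-up stand-ins for a nomenclatural service and a synonymy source.
SOURCE = {("Aus bus", "Animalia"): {"authorship": "Smith, 1900"}}
SYNONYMS = {"Aus bus": "Cus dus"}

result, issues = evaluate_name(
    {"scientificName": "Aus bus"}, "Animalia", "taxonomic",
    lambda name, kingdom: SOURCE.get((name, kingdom)), SYNONYMS)
print(result, issues)

bad, bad_issues = evaluate_name(
    {"scientificName": "Aus bus", "genus": "Xus"}, "Animalia", "nomenclatural",
    lambda name, kingdom: SOURCE.get((name, kingdom)), SYNONYMS)
print(bad_issues)
```

Passing the kingdom into the lookup reflects the point about metadata arriving with the input: the same name string can be a homonym in different kingdoms.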

Paul: also good reading for us is the GBIF names data document: http://www.gbif.org/orc/?doc_id=2784

Maureen: Important to consider use cases - one off, one name at a time, or on bulk data. Dealing with bulk data, don't want to repeatedly call webservice if we don't have to. May be better to have substantive portions of the data internal.
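One simple way to avoid repeated calls on bulk data is to look up each distinct name once and reuse the answer. The sketch below assumes the service can be wrapped as a plain function; `resolve_in_bulk` and `fake_service` are hypothetical names, and the fake service only records that it was called.

```python
def resolve_in_bulk(names, call_service):
    """Resolve every name, calling the service at most once per distinct name."""
    cache = {}
    results = []
    for name in names:
        if name not in cache:
            cache[name] = call_service(name)
        results.append(cache[name])
    return results


calls = []


def fake_service(name):
    """Stand-in for a remote name service; logs each call it receives."""
    calls.append(name)
    return name.upper()


names = ["Aus bus", "Cus dus", "Aus bus", "Aus bus", "Cus dus"]
results = resolve_in_bulk(names, fake_service)
print(len(names), "names,", len(calls), "service calls")
```

For the one-name-at-a-time use case the cache adds little; for bulk data with many repeated names it cuts the number of remote calls to the number of distinct names. Holding substantive portions of the data internally, as suggested above, would go further than this.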

    • Provenance and rendering

David: Excel spreadsheet export, including styling and second level working.

Todo: Screenshots for Bertram to look at - send to FP list.

Maureen: More detailed specifications in task in Mantis.

Prototype code: https://sourceforge.net/p/filteredpush/svn/652/tree/FP-Network/trunk/src/java/edu/umb/cs/filteredpush/ is probably latest. Not sure if UI is in that directory too.

Maureen: Starting point: getting data into a form (under our control) - launch a faceted query to get suggested values for that field, or for the entire form.

Doesn't address consensus record building, or sets of duplicates.

Implicitly addresses duplicates by showing very similar data.

Non-Tech

  • Recent Contacts

Paul: Will get back to them pointing at new developer docs page.

  • Collaborations
    • Specify/Symbiota

Paul: Suggestion on the table: lightweight tool to ingest changesets as annotations from Symbiota into Specify, can work in same way for NEVP new occurrence annotations - synchronize data without network.

Maureen: I like the idea.

    • SCAN TCN
    • NEVP TCN

Paul: Patrick to return from field soon, should start coordinating rollout with him.

David: Do we want to update the rules for NEVP and/or SCAN?

Paul: Yes.

Bob: Anna's rules incorporated into validator yet?

David: Not yet. They are SPARQL, so easy to do, but haven't yet.

    • OA UK Rollout

Bob: Going and delivering our talk. Lutz and Anna will be there. Two days, one is rollout, other is face to face meeting of OA community, agenda includes outreach.

Paul: Time to push forward with applicability statement.

Bob: Should we be paying attention to other science communities with regard to data annotation?

SPNHC demo on agenda for Friday