2013Nov13

From Filtered Push Wiki


Etherpad for meeting notes: http://firuta.huh.harvard.edu:9000/FP-2013Nov13

Agenda

  • Summary of Friday Tech Call.
  • Kepler
    • Report: Current state of Kepler work.
  • dwcFP and DarwinCore RDF guide - Need to provide more feedback.
  • Driver
    • Report: Development status
    • Discussion: Resources needed to progress?
  • Analysis
    • Discussion: Supporting repeated QC requests for same records.
    • Report: Progress on implementation of OAI/PMH harvesting through firewalls.
    • Report: State of investigation of duplicates in GBIF cache.
  • SCAN TCN Support
    • Sanity Checking, updates to AnnotationProcessor installation.
    • Need to schedule visit to Arizona.
  • NEVP TCN Support
    • Report: Preparations to update Annotation Processor Deployment at UNH.
    • Report: Testbed UI for data entry for duplicate finding.
    • Annotations in OCR/Crowdsourcing pathways
      • Hackathon. Progress on setting up FP-Lite instance and development environment.
  • FP Infrastructure
    • Report: Status of FP Node Refactoring

Non-Tech

  • Need burndown numbers from Kristin

For Future meetings

  • Prospective meetings, development targets.
  • Burndown

Reports

  • Paul
    • Discussed response to use of OA annotations in DarwinCore RDF guide with Bob.
    • Working on NEVP new occurrence annotation ingest into Symbiota. Have the ingest working, currently on image lookup in iPlant and ingest of image metadata into Symbiota.images table.
  • Chuck
    • With the data-entry tool, I've figured out how to get most of the data out of GBIF that I need, with the exception of geography.
    • The UI is now handling the tuple-fields well for lat-long, country-country-code, and depth-elevation...
    • ... but it won't be able to handle the taxonomic tuples, where there are multiple values within the record, and we don't want these values scrambled.
    • Probably shifting gears to help Maureen with Specify testing, and to get a dev environment set up on the laptop.

Notes

FilteredPush Team Meeting 2013 Nov 13

Present: Bertram, James, Tianhong, Paul, Maureen, Jim, David, Chuck, Bob

Agenda

  • Summary of Friday Tech Call.

Maureen: Discussed the problem of annotations from the SCAN Symbiota instance not all getting into the triple store.
David: The issue was arrays in Symbiota; we were getting CollectionCode/InstitutionCode values from the wrong places. Fixing this. Adding validation to the annotation generation, which provides a list of the expected fields that are missing. Date and collectionCode or institutionCode are required in a new occurrence annotation, but are not always provided.
Bob: Checking OA validity?
David: Validation done for Symbiota at generation time should perhaps be less strict than the rules.
Bob: HTTP content negotiation comes to mind: the query sets out its expectations, and the response says it can't meet them but asks whether you want to go ahead anyhow. Something like that might increase the robustness of generators.
Discussion: Two cases of failure right now: no date available in the transcription of existing identifications, and our not finding the collection code from Symbiota.
David: In Symbiota, the client helper has a subset of the rules doing some checking prior to invoking the annotation generator, but a failure is not logged on the network side. Doing this in the annotation generator might be more robust.
Bertram: On Friday we also discussed the FloweringTime validation step. One question that emerged was: have we ever used the "flowering status" (or whatever it's called) to validate the collection time (of a plant in flowering stage)? The earlier workflow by Lei suggests yes. Tianhong's code currently doesn't do it. Maybe it wasn't done before either?
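
A minimal sketch of the kind of generation-time check David describes, returning the list of missing required fields rather than failing silently. The function name `missing_fields` is hypothetical and the keys simply reuse Darwin Core term names; this is not the actual Symbiota client helper or annotation generator code. It also assumes the rule means "a date, plus at least one of collectionCode or institutionCode", which is only one reading of the notes.

```python
def missing_fields(occurrence):
    """Return the names of required fields absent from a new occurrence record.

    Assumed rule (an interpretation of the notes): a date is required,
    plus at least one of collectionCode or institutionCode.
    """
    def present(key):
        # Treat None, empty strings, and whitespace-only strings as absent.
        value = occurrence.get(key)
        return value is not None and str(value).strip() != ""

    missing = []
    if not present("eventDate"):
        missing.append("eventDate")
    if not (present("collectionCode") or present("institutionCode")):
        missing.append("collectionCode|institutionCode")
    return missing
```

A caller could log the returned list on the network side and then decide whether to proceed anyway, in the spirit of Bob's negotiation idea; an empty list means the record passes this check.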

  • Kepler
    • Report: Current state of Kepler work.

Tianhong: Have looked at Lei's code; it doesn't appear to invoke time, just reproductive condition.
Bertram: Which is the remote service?
James: No remote service yet, just Flora of North America data. We will be able to expand this from the current single volume. Planning this work with Joel. Can provide more data than currently available.
Bertram: Is that a dataset we need to reacquire?
James: Yes. This is coming out of Hong's parser, just extracting the flowering time from the parse. Did this for one volume, need to extract from the others. It is on Joel's plate.
Tianhong: Do we want to use a remote service?
Bertram: We have a dataset, which will be expanded in the future; need to obtain it from James if we don't have it. Would be good to show as a remote service, but ...
Paul: An advantage of a remote service is that it can be automatically updated. (Bertram agrees :-)
Action: Tianhong to obtain the current data set from James.
Bob: Given the use of this data for climate change monitoring, is there a circularity here affecting our QC tests?
James: Phenological data coming from the flora is very general, no more specific than month.
Paul: Time resolution should still be good enough to deal with transposition errors of month and day.
James: Richard Primack and Chuck Davis (both on NEVP) have both worked with this.
Bertram: Tianhong has done updates on CSV reading/writing.
Tianhong: Unifying the reading/writing actors is not very easy in COMAD, due to expectations of data type.
Q: Are you using classes for COMAD data types?
David: Suggest using Jackson, JSON, ... follow up with Tianhong.
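
Paul's point that month-level flowering data is still enough to catch month/day transpositions can be illustrated with a short sketch. The function name, its return strings, and the representation of flowering time as a set of month numbers are all assumptions made for illustration; this is not Lei's workflow or Tianhong's Kepler actor.

```python
from datetime import date

def check_flowering_date(collected, flowering_months):
    """Classify a collection date against month-level flowering data.

    collected: datetime.date of a specimen recorded as flowering.
    flowering_months: set of month numbers (1-12) taken from the flora.
    """
    if collected.month in flowering_months:
        return "ok"
    # A day value of 1-12 may be a month that was entered in the wrong position.
    if collected.day <= 12 and collected.day in flowering_months:
        return "possible day/month transposition"
    return "outside flowering period"
```

For a taxon flowering May through July, `check_flowering_date(date(2013, 1, 6), {5, 6, 7})` flags a possible transposition, since reading the date as 1 June would fall inside the flowering period.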

  • dwcFP and DarwinCore RDF guide - Need to provide more feedback.

Paul: Bob and I need to comment further on this.

  • Driver
    • Report: Development status

Maureen: Have a plan for adding testing with Chuck. Have been extracting the code from the UI, down to about 14 compilation errors still to track down. The Specify Driver provides a webservice API as well as a Java API, which extends the options for deployments and the possibilities for supporting drivers for other systems (e.g., MCZbase/Arctos).

    • Discussion: Resources needed to progress?

Maureen: Chuck helping with testing should get us there. Perhaps we will be able to show off something on Friday. Put two weeks on the table for UNH.

  • Analysis
    • Discussion: Supporting repeated QC requests for same records.
    • Report: Progress on implementation of OAI/PMH harvesting through firewalls.

David: Haven't yet created a message, will create one and add to message factory.

    • Report: State of investigation of duplicates in GBIF cache.

Chuck: Working through elements and mapping to DarwinCore, very few remaining questions. Indexing multiple values on a few fields.
Bob: There was quite a bit of sympathy at TDWG for cleaning up the RDF representation of DarwinCore. One thing included was that when DarwinCore elements could be URIs instead of text, there should be a way of expressing this. The proposed syntax is to use two namespaces, one for URIs and one for text values. There is a set of preferred URIs for countries; could use dwc:uris for these.
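
The two-namespace idea Bob reports can be sketched as follows: the same term name is emitted under one namespace when the value is a literal and under another when it is a URI. `DWC_TEXT` is the standard Darwin Core terms namespace; `DWC_URI`, the `triple` helper, and the example.org identifiers are placeholders assumed for illustration, since the notes do not give the actual namespace for URI values.

```python
DWC_TEXT = "http://rs.tdwg.org/dwc/terms/"  # Darwin Core terms namespace (text values)
DWC_URI = "http://example.org/dwc-uri/"     # placeholder for the proposed URI-valued namespace

def triple(subject, term, value):
    """Return one N-Triples line, choosing the namespace from the value type."""
    if value.startswith(("http://", "https://")):
        # URI value: use the URI namespace and write the object as a resource.
        return f"<{subject}> <{DWC_URI}{term}> <{value}> ."
    # Text value: use the text namespace and write the object as an escaped literal.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{subject}> <{DWC_TEXT}{term}> "{escaped}" .'
```

With this split, a consumer can tell from the predicate alone whether to expect a resource or a string for a field such as country.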

  • SCAN TCN Support
    • Sanity Checking, updates to AnnotationProcessor installation.
    • Need to schedule visit to Arizona.
  • NEVP TCN Support
    • Report: Preparations to update Annotation Processor Deployment at UNH.
    • Report: Testbed UI for data entry for duplicate finding.
    • Annotations in OCR/Crowdsourcing pathways
      • Hackathon. Progress on setting up FP-Lite instance and development environment.
  • FP Infrastructure
    • Report: Status of FP Node Refactoring

Non-Tech

  • Need burndown numbers from Kristin