2014Nov04

From FilteredPush


Etherpad for meeting notes: http://firuta.huh.harvard.edu:9000/FP-2014Nov04

Agenda

Non-Tech

  • Reports from iDigBio Summit and TDWG 2014 next week.
  • Deployment for iDigBio.

Tech

Quick touching base

  • Kurator Integration
    • DarwinCore reader (zipped tab delimited input)
    • SVN reorganization/cleanup
  • QC for SCAN
    • Feedback
  • QC work
    • Adding agent authority file to Symbiota - harvest to Solr index - use in actor.
  • Firuta deployed apps.
  • Deployments
    • Access point updates
    • Bringing Annotation Processor up-to-date
    • Deploy and re-run harvest of occurrence records
    • Status of fp2 and fp3
  • For Thursday: Looking at Tim's command line packaging for configurable workflows.

Reports

  • Paul
    • Presented on PreCapture app and FilteredPush workflows in NEVP and SCAN at TDWG 2014
    • Annotations Interest Group at TDWG 2014, interest from Names community, interest from BHL. Drafting charters for two TDWG task groups: one to draft an applicability statement on OA and a biodiversity data use case/competency question library; the other to conduct evangelism activities and draft a document on extensions to OA needed for annotating biodiversity data.
    • Added support for collector number patterns to agents authority file for Symbiota.

Notes

FilteredPush Team Meeting 2014 Nov 04

Present: David, Paul, James, Tianhong.

Non-Tech

  • Reports from iDigBio Summit and TDWG 2014 next week.
  • Deployment for iDigBio.

Should be straightforward. Likely things to need:

  1. Kurator actor to load data from iDigBio API
  2. Node with Fedora+Mulgara.
  3. Kurator running against iDigBio data store rather than MongoDB (but store is JSON, so difference should be small).
  4. Means of filtering sensitive data from annotation feed out of FP2 and FP3 - taxon names plus sensitive flags on omoccurrences.
  5. Social use of annotations to reflect business operations.

David: Alex pointed us at Elasticsearch, which is much like Solr; should be able to get JSON from them pretty easily.

Tech: Quick touching base

  • Kurator Integration
    • DarwinCore reader (zipped tab delimited input)

Likely need for an actor that reads Darwin Core Archive files rather than CSV files as input. Plenty of code available, e.g. https://github.com/gbif/dwca-reader
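As a rough illustration of what such a reader has to do, here is a minimal Python sketch that pulls rows out of a zipped, tab-delimited core file. It is not the dwca-reader API. The function name `read_dwca_core` and the default file name `occurrence.txt` are assumptions made for this example, and it assumes the core file has a header row. A real archive describes its file names, delimiters and columns in meta.xml, which this sketch does not read.

```python
import csv
import io
import zipfile


def read_dwca_core(archive, core_name="occurrence.txt"):
    """Yield each row of a tab-delimited file inside a zip archive as a dict.

    archive   -- path or file-like object for the zipped archive
    core_name -- assumed name of the core data file (hypothetical default)
    """
    with zipfile.ZipFile(archive) as zf:
        with zf.open(core_name) as raw:
            # Decode bytes to text; newline="" lets the csv module handle line endings.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            # QUOTE_NONE: treat quote characters as ordinary data, tabs as the only delimiter.
            reader = csv.DictReader(text, delimiter="\t", quoting=csv.QUOTE_NONE)
            for row in reader:
                yield row
```

An actor built along these lines could hand each yielded dict downstream in the same way the CSV reader does now.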

    • SVN reorganization/cleanup

David: Still a couple small things in FP-Tools. Want to do more work on Driver and harvesting before reorganizing further.

  • QC for SCAN
    • Feedback

David: Did talk with Neil at iDigBio summit, wanted to bring another one of the collections on board. Also would like us to have a monthly call with Ed.

  • QC work
    • Adding agent authority file to Symbiota - harvest to Solr index - use in actor.

Paul: Have found three points where change needs to be coordinated with the other developers before we can move into trunk. In progress.

    • dwc:genus

Paul: dwc:genus is noted as not being a parse of dwc:scientificName, but an element of the current classification. See: http://rs.tdwg.org/dwc/terms/index.htm#genus Thus:

dwc:scientificName = Probates bicolor L.
dwc:genus = Rubus

Is correct.

Assertion is that what was described as Probates bicolor L. is now being placed in Rubus as Rubus bicolor (L.) Someone.

Tianhong: Can we tell if:

dwc:scientificName = Probates bicolor L.
dwc:genus = Probats

is an error?

Paul: We could look up 'Probats' and tell that it doesn't exist. We can't assert that Probats <> Probates is an inconsistency.

Proposed term is dwc:genericName, to hold the Probates parsed out of dwc:scientificName (along with a dwc:infragenericEpithet). It will go through the issue tracker and adoption process (at least a month, perhaps a year...).

James: We can also look at dwc:family, and see if the provided dwc:family contains a dwc:genus = Probats.

Tianhong: Only dwc:genus and dwc:subgenus need to be handled this way.

Paul: dwc:specificEpithet and dwc:infraspecificEpithet are defined as elements extracted from dwc:scientificName; dwc:genus and dwc:subgenus are the ones that aren't.
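The distinction discussed above can be sketched as a small check, assuming records arrive as dicts keyed by Darwin Core term names. The function names `generic_name` and `check_genus`, and the `known_genera` set standing in for an authority lookup, are hypothetical names introduced for this example. The check reports a dwc:genus value that cannot be found in the authority list, and it deliberately does not compare dwc:genus against the first word of dwc:scientificName, since a difference there may be a legitimate reclassification.

```python
def generic_name(scientific_name):
    """Return the first whitespace-delimited token of a scientific name, or None.

    A naive stand-in for the proposed dwc:genericName parse; hybrid markers and
    other edge cases are not handled.
    """
    parts = scientific_name.split()
    return parts[0] if parts else None


def check_genus(record, known_genera):
    """Return a list of findings about dwc:genus for a single record.

    known_genera -- set of genus names standing in for an authority lookup
                    (hypothetical; a real actor would query a name service).
    """
    findings = []
    genus = (record.get("genus") or "").strip()
    if genus and genus not in known_genera:
        findings.append("genus '%s' not found in authority list" % genus)
    # No comparison against generic_name(record["scientificName"]) here:
    # dwc:genus reflects current classification, so the two may differ.
    return findings
```

With an authority list containing Rubus and Probates, the first example from the discussion (scientificName "Probates bicolor L.", genus "Rubus") produces no findings, while genus "Probats" is reported as not found.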

  • Firuta deployed apps.

David: Things deployed, little configuration remains - query needs reworking. Still need current data from morphbank.

David: Archiva has been going down - not clear why, nothing in logs.

Paul: Also seeing sporadic issues with Apache/Tomcat for IPT.

  • Deployments
    • Status of fp2 and fp3

David: Dealing with change to query on firuta, will then push out to FP2 and FP3, start of FP3 pending that.

    • Bringing Annotation Processor up-to-date

David: Nothing further with that.

    • Deploy and re-run harvest of occurrence records

David: Harvest working (rapidly), did a re-harvest of occurrences from SCAN into MongoDB on FP2.

  • For Thursday: Looking at Tim's command line packaging for configurable workflows.