2014Jul23

From Filtered Push Wiki


Etherpad for meeting notes: http://firuta.huh.harvard.edu:9000/FP-2014Jul23

Agenda

Non-Tech

  • Publications
  • Kurator
  • James: TDWG Symposium
  • InvertEBase
  • From Bob: Meeting Time OK?
  • Possible firuta server move.

Tech

  • DarwinCore issues 204-226 under discussion: https://code.google.com/p/darwincore/issues/list?sort=-id
  • Status of going live with Morphbank integration
  • iDigBio crowdsourcing deployment
  • David: Status of FP2 and SCAN
  • QC for SCAN
    • Current state of Occurrence and Taxon harvests?
    • Run on full NAU dataset, send report to Neil
    • Run on full MCZ SCAN dataset, send report to Linda Ford
  • QC work to do
    • Revisions/Refactoring of actors
      • Tianhong: What issues are currently known with the actors.
      • Revision targets for Tianhong for August
    • Adding agent authority file to Symbiota - harvest to solr index - use in actor.
  • Metrics for SCAN
  • Updating Roadmap
  • Upcoming work
    • NEVP
    • InvertEBase

Reports

  • Paul
    • Reviewed results of Akka QC process on NAU SCAN data with Tianhong and David.
    • Running some queries in MongoDB on fp2.acis to look at data present there.

Notes

FilteredPush Team Meeting 2014 July 23

Present: Bertram (now from Champaign, IL), Jim, Tianhong, David, Paul, Bob.

  • Publications

Paul: Nothing today.

  • Kurator

Jim: Nothing new yet from NSF.

Bertram: From the program officer, moving through the system.

  • James: TDWG Symposium
  • InvertEBase

Jim: Nothing new yet.

  • From Bob: Meeting Time OK?

Bertram: Time works fine now. Semester hasn't started yet, starts in a couple of weeks.

Jim: I'm very flexible this fall.

  • Possible firuta server move.

Paul: Possible move in second week in August, likely partial-day outage, main effect would be on archiva repository.

Tech

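  • DarwinCore issues 204-226 under discussion: https://code.google.com/p/darwincore/issues/list?sort=-id

Paul: For us to keep an eye on and comment on as needed.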

Bob: The usual subjects are trying to converge on BCO.

Paul: Our point of view to contribute to the discussion: both high-level BCO concepts and OBO entanglement are undesirable.

  • Status of going live with Morphbank integration

David: Have built a WAR for the client helper that they can deploy in their Tomcat container. Have an integration test that they can run. Need to coordinate with them on the deployment on their VM.

Bob: Greg has produced a draft IPT extension for Audubon Core.

David: Which FP node(s) does Morphbank connect to?

Paul: Need to have FP2, nice to have FP3, if we can work out the camel routes.

  • iDigBio crowdsourcing deployment

David: No more contact there; need to talk with Michael on their end about that. Targeting discussion for after having the client helper running for Morphbank.

Paul: Need to remember to keep on the list image delivery from Symbiota.

  • David: Status of FP2 and SCAN

David: Updated the deployment on FP2 and updated the configuration/client helper on Symbiota4; annotations should now be visible live on Symbiota. Checking that this is working correctly by comparing omoccurdeterminations with the annotations in the store and with the tabs on Symbiota. The annotation tab on Symbiota4 appears not to be correctly connected right now (three tabs: user profile taxon interest, collection view of annotations, occurrence annotations tab).
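The comparison David describes could be sketched roughly as below. This assumes both sides can be exported as (occurrence id, scientific name) pairs; that layout, the function name, and the sample data are hypothetical and are not the actual Symbiota or FilteredPush schema.

```python
def compare_determinations(determinations, annotations):
    """Return the pairs that appear on only one side.

    Both arguments are iterables of (occurrence_id, scientific_name)
    tuples. This layout is an assumption made for illustration.
    """
    det = set(determinations)
    ann = set(annotations)
    return {
        # In the annotation store but not shown as a determination.
        "missing_from_symbiota": sorted(ann - det),
        # Shown as a determination but with no matching annotation.
        "missing_from_store": sorted(det - ann),
    }


# Hypothetical sample data.
determinations = [(101, "Apis mellifera"), (102, "Bombus impatiens")]
annotations = [(101, "Apis mellifera"), (103, "Xylocopa virginica")]
report = compare_determinations(determinations, annotations)
```

An empty list on both sides would suggest the two views agree for the exported fields.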

  • QC for SCAN
    • Current state of Occurrence and Taxon harvests?

David: Haven't rerun the harvests, waiting for test set to go out. Harvest is about a month stale. Need to run incremental harvest since then.

    • Run on full NAU dataset, send report to Neil

David: Tianhong has run the analysis; I've generated a spreadsheet updated to reflect the changes discussed last week. Have run into limits in Excel for hyperlinks: limited to 64k per worksheet.

Tianhong: Please forward me a copy.

David: Will upload to FP2 (37MB).
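One possible workaround for the hyperlink limit is to split the report across several worksheets. A minimal sketch follows, assuming a fixed number of hyperlinks per row. The function name and the two-links-per-row figure are hypothetical, and the 64,000 default is only the rough figure quoted above, so the exact limit should be checked against Excel's documentation.

```python
def split_rows_by_hyperlink_budget(rows, links_per_row, max_links_per_sheet=64_000):
    """Group rows into sheets so that no sheet exceeds the hyperlink budget."""
    rows = list(rows)
    if links_per_row <= 0:
        # No hyperlinks at all: a single sheet is enough.
        return [rows]
    rows_per_sheet = max_links_per_sheet // links_per_row
    if rows_per_sheet == 0:
        raise ValueError("a single row exceeds the hyperlink budget")
    return [rows[i:i + rows_per_sheet] for i in range(0, len(rows), rows_per_sheet)]


# 36290 is the NAU record count reported in these notes;
# two hyperlinks per row is an assumed value.
sheets = split_rows_by_hyperlink_budget(range(36290), links_per_row=2)
```

Each inner list can then be written to its own worksheet by whatever spreadsheet code is already in use.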

    • Run on full MCZ SCAN dataset, send report to Linda Ford

Tianhong: Have run this analysis, is on FP2, Collection name: MCZAll

David: Will need to generate the spreadsheet view of this.

  • QC work to do
    • Revisions/Refactoring of actors
      • Tianhong: What issues are currently known with the actors.

Tianhong:

  1. 3 records missing from the output for the whole NAU dataset (36290 in total), but none missing for MCZ
  2. 10% of records run into problems if empty authorship is allowed

Tianhong: A small number of records are not getting through the analysis. Probably some rare data cases are hitting failure cases in the workflow and not making it into the output. The output of the actor doesn't have the expected result (populating the authorship when blank) in some portion of cases; unclear why at this point. We want to do something in these cases, correct?

Paul: Yes, if we can fill in the authorship we should; if it is ambiguous, we should flag that.

Paul: The issue might come from multiple contradictory authorships returned by the services being queried.

Tianhong: What if the scientific name is empty and the authorship is populated?

Paul: This is an error case.

Tianhong: What if the atomic fields are populated?

Paul: We could very reasonably assemble a scientific name, validate it with the authorship and populate the scientificName term.
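The cases discussed above could be sketched as one function. This is an illustration only, not the existing actor code: `lookup_authorships` is a hypothetical callable standing in for the name services, the flag strings are invented, the handling of a supplied authorship that the service does not confirm is an assumption, and "Aus bus" is a placeholder name. The field names are Darwin Core terms.

```python
def curate_name(record, lookup_authorships):
    """Return (updated_record, flags) for one occurrence record (a dict)."""
    rec = dict(record)
    flags = []
    name = (rec.get("scientificName") or "").strip()
    author = (rec.get("scientificNameAuthorship") or "").strip()

    if not name:
        genus = (rec.get("genus") or "").strip()
        if not genus:
            # Empty scientific name and nothing to assemble: error case.
            flags.append("ERROR: no scientific name")
            return rec, flags
        # Atomic fields are populated: assemble a scientific name from them.
        parts = [genus] + [
            (rec.get(key) or "").strip()
            for key in ("specificEpithet", "infraspecificEpithet")
        ]
        name = " ".join(part for part in parts if part)
        rec["scientificName"] = name
        flags.append("FILLED: scientificName assembled from atomic fields")

    candidates = sorted(set(lookup_authorships(name)))
    if author:
        # Assumption: flag an authorship the service does not confirm.
        if candidates and author not in candidates:
            flags.append("FLAG: authorship not confirmed by name service")
    elif len(candidates) == 1:
        rec["scientificNameAuthorship"] = candidates[0]
        flags.append("FILLED: authorship")
    elif len(candidates) > 1:
        # Contradictory authorships from the services: flag, do not guess.
        flags.append("FLAG: ambiguous authorship")
    return rec, flags


# Hypothetical stand-in for the name services.
SERVICE = {
    "Apis mellifera": ["Linnaeus, 1758"],
    "Aus bus": ["Smith, 1900", "Jones, 1901"],
}


def lookup(name):
    return SERVICE.get(name, [])


filled, filled_flags = curate_name({"scientificName": "Apis mellifera"}, lookup)
```

Returning flags alongside the record keeps the "fill in" and "flag as ambiguous" outcomes separate, so a report can list them in different columns.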

      • Revision targets for Tianhong for August

Bertram: Split between productizing current Akka packages, and Tianhong's research on automated workflow design library, creating workflows automatically. Make sure the workflows run and fit in the FilteredPush infrastructure, and push the research envelope at the same time.

Bertram: Perhaps discuss in tech call tomorrow.

Tianhong: Similar perspective to Bertram, also would like to make sure that workflow is more deployable.

Paul: A potential deployable artifact for this would be coupling a Darwin Core archive reader with the Akka workflow and David's spreadsheet generation code, giving people something they can run on their DwC archive files.

Bertram: Makes a lot of sense. Discuss tomorrow.
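The shape of such an artifact could be sketched as below: read the core file of a Darwin Core archive, run a check on each record, and write a tabular report. This uses only the Python standard library and is not the Akka workflow or David's spreadsheet code. The names `read_core_records` and `run_qc` and the sample check are hypothetical, and the reader ignores extensions, default values, and `fieldsEnclosedBy`.

```python
import csv
import io
import xml.etree.ElementTree as ET
import zipfile

DWC_TEXT_NS = "{http://rs.tdwg.org/dwc/text/}"


def read_core_records(archive_file):
    """Yield one dict per core row, keyed by the short term name."""
    with zipfile.ZipFile(archive_file) as zf:
        core = ET.fromstring(zf.read("meta.xml")).find(DWC_TEXT_NS + "core")
        location = core.find(f"{DWC_TEXT_NS}files/{DWC_TEXT_NS}location").text
        # meta.xml writes the delimiter as an escape such as "\t".
        delimiter = core.get("fieldsTerminatedBy", ",").replace("\\t", "\t")
        skip = int(core.get("ignoreHeaderLines", "0"))
        columns = {
            int(field.get("index")): field.get("term").rsplit("/", 1)[-1]
            for field in core.findall(DWC_TEXT_NS + "field")
            if field.get("index") is not None
        }
        with zf.open(location) as raw:
            text = io.TextIOWrapper(
                raw, encoding=core.get("encoding", "UTF-8"), newline=""
            )
            for n, row in enumerate(csv.reader(text, delimiter=delimiter)):
                if n >= skip:
                    yield {
                        name: row[i] for i, name in columns.items() if i < len(row)
                    }


def run_qc(archive_file, check, report_file):
    """Run check(record) -> list of flags on every row; write a CSV report."""
    writer = csv.writer(report_file)
    writer.writerow(["scientificName", "flags"])
    for record in read_core_records(archive_file):
        writer.writerow([record.get("scientificName", ""), "; ".join(check(record))])


# Build a tiny in-memory archive for demonstration (made-up data).
META = """<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core encoding="UTF-8" fieldsTerminatedBy="\\t" ignoreHeaderLines="1"
        rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
  </core>
</archive>"""

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("meta.xml", META)
    zf.writestr("occurrence.txt", "id\tscientificName\n1\tApis mellifera\n2\t\n")
buffer.seek(0)


def sample_check(record):
    # Placeholder for the real QC workflow.
    return [] if record["scientificName"] else ["missing scientificName"]


report = io.StringIO()
run_qc(buffer, sample_check, report)
```

In a real artifact the `check` argument would be replaced by a call into the QC workflow, and the CSV writer by the spreadsheet generation step.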

Bob: Does any of this depend on the DarwinCore discussion that is on the table?

Paul: Probably later if substantive changes happen, and they get deployed into GBIF's IPT.

    • Adding agent authority file to Symbiota - harvest to solr index - use in actor.

Paul: On my list of things to do in Symbiota. After that can worry about harvest.

David: Have updated the solr schema for the entomologists, seems to be working well for names - using filters and tokenizers, should be working well for Tianhong. We can look at this tomorrow.

  • Metrics for SCAN

David: No progress since last week. Close to done. Targeting end of week.

  • Updating Roadmap
  • Upcoming work
    • NEVP

Paul: Let's get SCAN running solidly first, then move on NEVP node.

David: Will need to look at how to use the client helper with endpoints to support SCAN and NEVP on Symbiota4. Currently using ports.

    • InvertEBase

Pending hearing word.

  • For Tech call:
  1. Goals for Tianhong for August.
  2. Documentation and deployable workflow artifact.
  3. Evaluate solr indexing for date validation actor.