2014Jul30

Etherpad for meeting notes: http://firuta.huh.harvard.edu:9000/FP-2014Jul30

Agenda

Non-Tech

  • Publications
  • Kurator
  • James: TDWG Symposium
  • InvertEBase
  • Possible firuta server move.

Tech

  • QC for SCAN
    • Updates to Occurrence and Taxon harvests, new data, missing collection code from some data.
    • Run on full NAU dataset (using entomologists list from solr), send report to Neil
      • Tianhong: Status of workflow and this run.
    • Run on MCZ SCAN data, Comments back from Linda and Brendan.
      • David: Alternative report just listing actionable items.
  • QC work to do
    • Tianhong: Preparation of jar for workflow runnable by Bertram
    • Adding agent authority file to Symbiota - harvest to solr index - use in actor.
  • DarwinCore issues 204-226 under discussion: https://code.google.com/p/darwincore/issues/list?sort=-id
  • Status of going live with Morphbank integration
  • iDigBio crowdsourcing deployment
  • David: Update on current Status of FP2 and SCAN
  • Metrics for SCAN: http://symbiota2.acis.ufl.edu/symbiota/scan/scan_reports.html
  • Updating Roadmap
  • Upcoming work
    • NEVP
    • InvertEBase

Reports

  • Paul
    • Forwarded link to MCZ SCAN QC report to Linda, who forwarded to Brendan and Michelle. Substantive feedback from Brendan. Summarized in response to Brendan + Tianhong and David.
    • Reviewed SCAN QC reports for NAU and MCZ, sent some feedback to Tianhong and David.
  • Jim
    • No further news from NSF (or anyone else) regarding Kurator or InvertEBase.

Notes

FilteredPush Team Meeting 2014 July 30

Present: Bertram, Bob, James, Paul, David

Non-Tech

  • Publications

Bertram: Tianhong is working on an expansion of the IDCC abstract. For Tianhong we should shoot for a computer science publication.

Bob: We've talked about the architecture a lot at meetings but not published about it yet.

Bertram: Ecoinformatics sensible venue?

Bob: Kansas journal? Probably not as desirable as other venues.

Bob: How far you can get with configuration instead of code is one of our research goals, that would be a good target.

Bertram: This would fit well with Tianhong's work on workflows. => Tianhong: let me know if this isn't clear! I can explain in a meeting. TS: OK

Bob: How much code can you avoid writing - not a new idea, but new to the biodiversity informatics community.

James: Paul and I should follow up on Chuck's work on FP-DataEntry.

James: Sometime a little later (after CS and domain) we should do something in a very high-impact publication to call attention to what we've accomplished.

Bertram: A user "success story" (user satisfaction) would be a good topic to add as well.

James: We should collect this set, figure out who is working on what and move forward.

Paul: Three to get moving forward: (1) Tianhong's work. (2) Semantics, rules, configuration - limits we encountered in configurable system. (3) FP-DataEntry and botanical duplicates.

  • Kurator

Jim reports no updates yet.

Bertram: No news yet.

James: Time to start a separate Kurator call.

Bob: Dima has new funding for GNI.

James: Should be able to provide some good services for Kurator.

Paul: One specific we should look at is our use cases of validating name strings against nomenclatural acts for collections and clustering them into currently accepted names for researchers.

  1. IPNI: nomenclatural
  2. IF: Primarily nomenclatural
  3. Others: usually some mix of nomenclatural and taxonomic name services

  • James: TDWG Symposium

James: No new information. Have a Brazilian who would like to contribute to the symposium, seems like a good match.

James: Call is out for TDWG abstracts. Deadline Sept 25. http://www.tdwg.org/conference2014/

  • InvertEBase

Jim reports no updates yet.

  • Possible firuta server move.

Paul: Probably second week in August, nothing firm yet.

Tech

  • QC for SCAN
    • Updates to Occurrence and Taxon harvests, new data, missing collection code from some data.

David: Running the latest version of the harvest on Maureen's workstation - appears to be doing an initial harvest again instead of an incremental update. Also need to investigate the missing collection code from the MCZ records.

Paul: Then will have to set the harvester on FP2 and FP3 to automate harvests.

David: OAI provider is deployed on symbiota4. The OAI harvester (which outputs JSON) still needs to be deployed on FP2/3, and needs a script to load the data into targets.
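
As an illustration of the difference between an initial and an incremental harvest: assuming the provider speaks OAI-PMH, an incremental ListRecords request carries a `from` datestamp, while an initial one does not. This is a minimal sketch, not the project's harvester code; the base URL, the helper name `list_records_url`, and the example date are hypothetical.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", since=None):
    """Build an OAI-PMH ListRecords URL.

    With `since` (a YYYY-MM-DD datestamp) the request is incremental;
    without it the provider returns every record (an initial harvest).
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if since is not None:
        params["from"] = since
    return base_url + "?" + urlencode(params)


BASE = "http://example.org/oai"  # hypothetical provider endpoint

initial_url = list_records_url(BASE)
incremental_url = list_records_url(BASE, since="2014-07-23")  # hypothetical last-run date

print(initial_url)
print(incremental_url)
```

A harvester that does not persist the datestamp of its last successful run has nothing to pass as `since`, so every run looks like an initial harvest; whether that is what is happening here is not established by these notes.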

    • Run on full NAU dataset (using entomologists list from solr), send report to Neil
      • Tianhong: Status of workflow and this run.

Tianhong: Substantive progress in updating the workflow, running into some issues.

Paul: Status on hitting solr index?

Tianhong: Having problems querying the URL for solr from the workflow - issue may involve the # in the URI.

Tianhong: Cannot access http://fp2.acis.ufl.edu:8983/solr/ento-bios (no redirection), but I can access http://fp2.acis.ufl.edu:8983/solr/#/ento-bios

Bertram: "http://fp2.acis.ufl.edu:8983/solr/#/" works for me

Paul: We can access http://fp2.acis.ufl.edu:8983/solr/#/ento-bios from here. Requesting http://fp2.acis.ufl.edu:8983/solr redirects to http://fp2.acis.ufl.edu:8983/solr/#/ and picking ento-bios in the core selector then leads to http://fp2.acis.ufl.edu:8983/solr/#/ento-bios as well. Likewise, http://fp2.acis.ufl.edu:8983/solr/ento-bios gets a 404 error. However, http://fp2.acis.ufl.edu:8983/solr/ento-bios/select/?indent=on&q=namePre:%22W.%20M%3E%20.,%20Wheeler%22~4&fl=*,score works from here without a #, while adding a pound sign to this URI looks like it times out.

Tianhong: SolrJ library isn't working with this URI.

Paul: http://fp2.acis.ufl.edu:8983/solr/ento-bios/query returns a JSON document, while http://fp2.acis.ufl.edu:8983/solr/#/ento-bios/query produces the web application.

David: The URL without the pound sign is for the web service; with it, it is the control panel for human interaction on the web. Thus for the REST service (which SolrJ should be invoking), the pound sign should be omitted.
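
The distinction can be seen by splitting the two URLs: everything after `#` is a URL fragment, which a browser keeps to itself and does not send to the server. A small sketch using only the Python standard library (the variable names are illustrative):

```python
from urllib.parse import urlsplit

# Admin web application URL (with '#') and REST endpoint URL (without '#').
admin_ui = urlsplit("http://fp2.acis.ufl.edu:8983/solr/#/ento-bios")
rest_api = urlsplit("http://fp2.acis.ufl.edu:8983/solr/ento-bios/select")

# For the admin URL the server only ever sees the path '/solr/';
# '/ento-bios' is a fragment interpreted by the page's JavaScript.
print(repr(admin_ui.path), repr(admin_ui.fragment))

# For the REST URL the core name is part of the path, and there is no fragment.
print(repr(rest_api.path), repr(rest_api.fragment))
```

So a client handed the `#` form as its base URL would, at best, be talking to `/solr/` rather than to the ento-bios core.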

Paul: Tianhong, can you reach this URI and get a JSON document? http://fp2.acis.ufl.edu:8983/solr/ento-bios/select?q=namePre%3A%22W.+Wheeler%22&wt=json&indent=true

Tianhong: Can get a response, but it contains no records:

David: Needs the ~3 parameter.

Paul: http://fp2.acis.ufl.edu:8983/solr/ento-bios/select?q=namePre%3A%22W.+Wheeler%22~3%0A&wt=json&indent=true

Tianhong: That works.
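
In Lucene/Solr query syntax, `~N` after a quoted phrase is a proximity (slop) search, which presumably is why the exact phrase "W. Wheeler" returned no records while the `~3` form did. A minimal sketch of building such a select URL with proper encoding; the helper name `solr_select_url` is hypothetical, and the core URL and field name are taken from the discussion above:

```python
from urllib.parse import urlencode


def solr_select_url(core_url, field, phrase, slop):
    """Build a Solr /select URL for a phrase query with proximity slop."""
    # e.g. namePre:"W. Wheeler"~3 ; urlencode handles the quoting of ':' '"' and spaces
    q = '{}:"{}"~{}'.format(field, phrase, slop)
    params = {"q": q, "wt": "json", "indent": "true"}
    return core_url.rstrip("/") + "/select?" + urlencode(params)


url = solr_select_url(
    "http://fp2.acis.ufl.edu:8983/solr/ento-bios",  # core URL, no '#'
    "namePre",
    "W. Wheeler",
    3,
)
print(url)
```

Letting the library do the encoding avoids stray characters (such as an encoded newline) ending up inside the query string.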

    • Run on MCZ SCAN data, Comments back from Linda and Brendan.
      • David: Alternative report just listing actionable items.

Bob: Splitting out things that haven't been acted on yet.

David: Parameterizing the query that builds the spreadsheet to include/exclude based on the QC assertions.

Bob: If straightforward, add a button to the spreadsheet to hide/show already acted-upon records.
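
A rough sketch of what parameterizing the include/exclude step could look like, assuming each report row carries a single QC assertion label. The field name `assertion`, the labels, and the function `select_rows` are all hypothetical and are not taken from the actual report code:

```python
def select_rows(rows, include=None, exclude=None):
    """Keep rows whose assertion is in `include` (if given) and not in `exclude`."""
    include = set(include) if include is not None else None
    exclude = set(exclude or ())
    return [
        row for row in rows
        if (include is None or row["assertion"] in include)
        and row["assertion"] not in exclude
    ]


# Hypothetical report rows with hypothetical assertion labels.
rows = [
    {"id": 1, "assertion": "needs_action"},
    {"id": 2, "assertion": "acted_on"},
    {"id": 3, "assertion": "needs_action"},
    {"id": 4, "assertion": "no_issue"},
]

actionable = select_rows(rows, include=["needs_action"])
not_yet_acted_on = select_rows(rows, exclude=["acted_on"])
print([row["id"] for row in actionable])
print([row["id"] for row in not_yet_acted_on])
```

The same two parameters would cover both the "actionable items only" report and a hide/show toggle for records that have already been acted upon.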

Bertram: Need to leave, but would like to learn more about that feedback/report work. (=> Tianhong, please follow up) TS: OK

Paul: Continue from here in Tech call tomorrow.

  • QC work to do
    • Tianhong: Preparation of jar for workflow runnable by Bertram
    • Adding agent authority file to Symbiota - harvest to solr index - use in actor.
  • DarwinCore issues 204-226 under discussion: https://code.google.com/p/darwincore/issues/list?sort=-id
  • Status of going live with Morphbank integration

David: Improving the Tomcat client helper so that it can be handed off to Michael.

  • iDigBio crowdsourcing deployment

David: Nothing further here yet, waiting on tomcat update to client helper.

  • Metrics for SCAN: http://symbiota2.acis.ufl.edu/symbiota/scan/scan_reports.html

David: Have this report up and sent the link to Neil and Ed for feedback (on content and where to link). Seeing some schema/code incompatibilities in the SCAN deployment vs the current Symbiota version; the SCAN deployment may not be fully up to date.

  • Updating Roadmap
  • Upcoming work
    • NEVP
    • InvertEBase

For Tech Call:

  1. Preparation of Akka workflow Jar.
  2. Date validation actor accessing solr REST service on FP2.
  3. Deeper issues in workflow refactoring.
  4. Tianhong's expansion of IDCC abstract.