2014Jul30
Etherpad for meeting notes: http://firuta.huh.harvard.edu:9000/FP-2014Jul30
Agenda
Non-Tech
- Publications
- Kurator
- James: TDWG Symposium
- InvertEBase
- Possible firuta server move.
Tech
- QC for SCAN
- Updates to Occurrence and Taxon harvests, new data, missing collection code from some data.
- Run on full NAU dataset (using entomologists list from solr), send report to Neil
- Tianhong: Status of workflow and this run.
- Run on MCZ SCAN data, Comments back from Linda and Brendan.
- David: Alternative report just listing actionable items.
- QC work to do
- Tianhong: Preparation of jar for workflow runnable by Bertram
- Adding agent authority file to Symbiota - harvest to solr index - use in actor.
- DarwinCore issues 204-226 under discussion: https://code.google.com/p/darwincore/issues/list?sort=-id
- Status of going live with Morphbank integration
- iDigBio crowdsourcing deployment
- David: Update on current Status of FP2 and SCAN
- Metrics for SCAN: http://symbiota2.acis.ufl.edu/symbiota/scan/scan_reports.html
- Updating Roadmap
- Upcoming work
- NEVP
- InvertEBase
Reports
- Paul
- Forwarded link to MCZ SCAN QC report to Linda, who forwarded to Brendan and Michelle. Substantive feedback from Brendan. Summarized in response to Brendan + Tianhong and David.
- Reviewed SCAN QC reports for NAU and MCZ, sent some feedback to Tianhong and David.
- Jim
- No further news from NSF (or anyone else) regarding Kurator or InvertEBase.
Notes
FilteredPush Team Meeting 2014 July 30
Present: Bertram, Bob, James, Paul, David
Non-Tech
- Publications
Bertram: Tianhong is working on an expansion of the IDCC abstract. For Tianhong we should shoot for a computer science publication.
Bob: We've talked about the architecture a lot at meetings but not published about it yet.
Bertram: Ecoinformatics sensible venue?
Bob: Kansas journal?, probably not as desirable as other venues.
Bob: How far you can get with configuration instead of code is one of our research goals, that would be a good target.
Bertram: This would fit well with Tianhong's work on workflows. => Tianhong: let me know if this isn't clear! I can explain in a meeting. TS: OK
Bob: How much code can you avoid writing - not a new idea, but new to the biodiversity informatics community.
James: Paul and I should follow up on Chuck's work on FP-DataEntry.
James: Sometime a little later (after CS and domain) we should do something in very high impact publication to call attention to what we've accomplished.
Bertram: A user "success story" (user satisfaction) would be a good topic as well.
James: We should collect this set, figure out who is working on what and move forward.
Paul: Three to get moving forward: (1) Tianhong's work. (2) Semantics, rules, configuration - limits we encountered in configurable system. (3) FP-DataEntry and botanical duplicates.
- Kurator
Jim reports no updates yet.
Bertram: No news yet.
James: Time to start a separate Kurator call.
Bob: Dima has new funding for GNI.
James: Should be able to provide some good services for Kurator.
Paul: One specific we should look at is our use cases of validating name strings against nomenclatural acts for collections and clustering them into currently accepted names for researchers.
- IPNI: nomenclatural
- IF: Primarily nomenclatural
- Others: usually: some mix of nomenclature and taxonomic name services
- James: TDWG Symposium
James: No new information. Have a Brazilian who would like to contribute to the symposium, seems like a good match.
James: Call is out for TDWG abstracts. Deadline Sept 25. http://www.tdwg.org/conference2014/
- InvertEBase
Jim reports no updates yet.
- Possible firuta server move.
Paul: Probably second week in August, nothing firm yet.
Tech
- QC for SCAN
- Updates to Occurrence and Taxon harvests, new data, missing collection code from some data.
David: Running the latest version of the harvest on Maureen's workstation - appears to be doing an initial harvest again instead of an incremental update. Also need to investigate missing collection code from the MCZ records.
Paul: Then will have to set the harvester on FP2 and FP3 to automate harvests.
David: OAI provider is deployed on symbiota4. OAI harvester (which outputs JSON) needs to be deployed on FP2/3, and needs a script to load data into targets.
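For reference, a minimal sketch of how an initial and an incremental harvest request differ, assuming the provider on symbiota4 speaks standard OAI-PMH. The base URL and the function name are hypothetical placeholders, not the deployed harvester's code. In OAI-PMH, a ListRecords request without a `from` argument returns everything (an initial harvest), while supplying `from` restricts the response to records changed since that datestamp.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode, urlsplit, parse_qs

def list_records_url(base_url, metadata_prefix="oai_dc", last_harvest=None):
    """Build an OAI-PMH ListRecords URL.

    With last_harvest=None this is a full (initial) harvest; with a
    timezone-aware datetime it becomes an incremental harvest.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if last_harvest is not None:
        # OAI-PMH datestamps are expressed in UTC.
        stamp = last_harvest.astimezone(timezone.utc)
        params["from"] = stamp.strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"{base_url}?{urlencode(params)}"

def request_args(url):
    """Decode the query string of a request URL back into a dict."""
    return {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}

# Hypothetical endpoint, for illustration only.
BASE = "http://example.org/oai"
full = list_records_url(BASE)
incremental = list_records_url(
    BASE, last_harvest=datetime(2014, 7, 23, tzinfo=timezone.utc)
)
```

If the harvester repeats an initial harvest each run, one thing to check is whether the timestamp of the last successful harvest is being persisted and passed as `from`.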
- Run on full NAU dataset (using entomologists list from solr), send report to Neil
- Tianhong: Status of workflow and this run.
Tianhong: Substantive progress in updating the workflow, running into some issues.
Paul: Status on hitting solr index?
Tianhong: Having problems querying the url for solr from the workflow - issue may involve # in the uri.
Tianhong: cannot access: http://fp2.acis.ufl.edu:8983/solr/ento-bios, no redirection but I can access http://fp2.acis.ufl.edu:8983/solr/#/ento-bios
Bertram: "http://fp2.acis.ufl.edu:8983/solr/#/" will work for me
Paul: We can access http://fp2.acis.ufl.edu:8983/solr/#/ento-bios Requesting: http://fp2.acis.ufl.edu:8983/solr redirects to http://fp2.acis.ufl.edu:8983/solr/#/ then picking the core selector ento-bios goes to http://fp2.acis.ufl.edu:8983/solr/#/ento-bios Likewise, http://fp2.acis.ufl.edu:8983/solr/ento-bios gets a 404 error. However: http://fp2.acis.ufl.edu:8983/solr/ento-bios/select/?indent=on&q=namePre:%22W.%20M%3E%20.,%20Wheeler%22~4&fl=*,score works from here without a # (while adding a pound sign to this URI looks like it times out).
Tianhong: SolrJ library isn't working with this URI.
Paul: http://fp2.acis.ufl.edu:8983/solr/ento-bios/query returns a JSON document, while http://fp2.acis.ufl.edu:8983/solr/#/ento-bios/query produces the web application.
David: URL without the pound sign is for the web service; with it is for the control panel for human interaction on the web. Thus for the REST service (which SolrJ should be invoking), the pound sign should be omitted.
Paul: Tianhong, can you reach this URI and get a JSON document? http://fp2.acis.ufl.edu:8983/solr/ento-bios/select?q=namePre%3A%22W.+Wheeler%22&wt=json&indent=true
Tianhong: Can get a response, but it contains no records.
David: Needs the ~3 (proximity) parameter.
Tianhong: That works.
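A short sketch summarizing the resolution above, for the Tech call. It only builds and inspects URLs and makes no network calls. The function names are illustrative, and the workflow itself uses SolrJ rather than Python. It shows (1) that everything after the # is a fragment, which the client keeps and never sends to the server, so the admin UI URL cannot serve as a service URL, and (2) how the select query with the ~3 proximity suffix is formed.

```python
from urllib.parse import urlsplit, urlencode, parse_qs

SOLR_BASE = "http://fp2.acis.ufl.edu:8983/solr"  # from the notes above
CORE = "ento-bios"

def service_url(url):
    """Return the URL as the server sees it, i.e. with any #fragment removed."""
    return urlsplit(url)._replace(fragment="").geturl()

def select_url(name, slop=3, field="namePre"):
    """Build a select URL for a phrase query with a proximity (slop) suffix."""
    query = f'{field}:"{name}"~{slop}'
    params = urlencode({"q": query, "wt": "json", "indent": "true"})
    return f"{SOLR_BASE}/{CORE}/select?{params}"

def query_text(url):
    """Decode the q parameter of a select URL, to check what Solr will receive."""
    return parse_qs(urlsplit(url).query)["q"][0]

# The admin UI URL loses the core name once the fragment is dropped.
admin_seen_by_server = service_url(f"{SOLR_BASE}/#/{CORE}")
# The REST URL carries the core in the path and the proximity in the query.
wheeler = select_url("W. Wheeler")
```

For SolrJ, the corresponding base URL would be the core path without the #, i.e. http://fp2.acis.ufl.edu:8983/solr/ento-bios, with the query string supplied separately.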
- Run on MCZ SCAN data, Comments back from Linda and Brendan.
- David: Alternative report just listing actionable items.
Bob: Splitting out things that haven't been acted on yet.
David: Parameterizing the query that builds the spreadsheet to include/exclude based on the QC assertions.
Bob: If straightforward, add a button to the spreadsheet to hide/show already acted-upon records.
Bertram: need to leave, but would like to learn more about that feedback / report stuff.. (=> Tianhong, please follow up) TS: OK
Paul: Continue from here in Tech call tomorrow.
- QC work to do
- Tianhong: Preparation of jar for workflow runnable by Bertram
- Adding agent authority file to Symbiota - harvest to solr index - use in actor.
- DarwinCore issues 204-226 under discussion: https://code.google.com/p/darwincore/issues/list?sort=-id
- Status of going live with Morphbank integration
David: Working on the tomcat client helper to improve it to hand off to Michael.
- iDigBio crowdsourcing deployment
David: Nothing further here yet, waiting on tomcat update to client helper.
- David: Update on current Status of FP2 and SCAN
- Metrics for SCAN http://symbiota2.acis.ufl.edu/symbiota/scan/scan_reports.html
David: Have this report up and sent link to Neil and Ed for feedback (on content and where to link). Seeing some schema/code incompatibilities in SCAN deployment vs current symbiota version, may not be fully up to date.
- Updating Roadmap
- Upcoming work
- NEVP
- InvertEBase
For Tech Call:
- Preparation of Akka workflow Jar.
- Date validation actor accessing solr REST service on FP2.
- Deeper issues in workflow refactoring.
- Tianhong's expansion of IDCC abstract.