2014Apr23

From FilteredPush


Etherpad for meeting notes: http://firuta.huh.harvard.edu:9000/FP-2014Apr23

Agenda

Non-Tech

  • UC Davis Burndown
  • James: TDWG session
  • SPNHC (April 25/May 5)
    • Abstract Preparation
  • InvertEBase
  • iDigBio - Integration visit from Greg R. (Apr 29)

Tech

  • Report from Thursday call
  • Nodes
    • Report David: Status of deployments on FP2 and FP3.
    • Report Maureen: Status of ingests (taxon/occurrence) on FP2 and FP3
    • Report David: Morphbank integration status.
  • FP-DataEntry
    • Report Chuck: Duplicate detection integration into Yale data entry application
    • Report Chuck: Duplicate detection integration into Specify workbench/Dina-Specify
  • Discussion: Maureen: Should we have a feature for deleting annotations?
  • Driver
    • Report Maureen: Status of driver - current annotation processor integration.
  • SCAN
    • Report David/Tianhong: Akka integration in FP2
    • Test of Akka workflow with SCAN data.
    • Query for harvested data and analysis results.
    • Display of annotation on interests.
  • NEVP
    • Report David: Progress on updating deployment.
    • Akka Workflow for NEVP
  • Analysis
    • Tianhong: Progress on cleaning data with data.
    • Report Bob: Progress on Duplicate Finding data mining.
  • SemanticMediaWiki as FP Client, review of SMW use cases.
  • For Thursday:

Reports

  • Paul
    • Got first draft of abstract for SPNHC DemoCamp (FP-DataEntry) circulated.
  • Chuck
    • Put an end-to-end Selenium test in place. (Selenium was already being used, but only for checking the results of a QUnit suite.)
    • Helpful error message if plug-in is blocked as mixed content. (Instead of failing silently.)
    • New bookmarklet that looks at the inputs on a page and outputs the bash script that you can use to set up a demo server targeting that page.
    • UI for closing the plugin divs.
    • CSS reset so the plugin looks the same across more sites.
  • Jim
    • Still no request from Petra (Field Museum) that we upload FP budget for InvertEBase proposal to NSF.
    • Still no word from NSF regarding our request for a second NCE of the FP grant. Grants people at Harvard are going to contact relevant program officer for an update.

Notes

Non-Tech

  • UC Davis Burndown
  • James: TDWG session
  • SPNHC (April 25/May 5)
    • Abstract Preparation

Paul: Have circulated the democamp abstract for comments.

  • InvertEBase

Jim: Nothing new.

  • iDigBio - Integration visit from Greg R. (Apr 29)

Jim: Meet with him? Put 3PM on the schedule for the 29th. Paul: Documentation? Todo: Update client integration documentation. Bob: And send Greg (and the rest of us) a link to it. Also germane to what Joel and I will be doing three weeks later.

Tech

  • Report from Thursday call

Maureen: Discussed Chuck's new changes to the solr schema and how they might work with analysis. Tianhong gave Chuck a copy of his use cases (the field names he will need to search on). Also touched on maven build issues.

  • Nodes
    • Report David: Status of deployments on FP2 and FP3.

David: Node running on both FP2 and FP3. Symbiota 4 (SCAN) is pointing at FP2, and NEVP is ready to turn on pointing at FP3. Components: node, fedora, mulgara, mongo. Annotations collected to date from SCAN are in the embedded store and need to go into fedora/mulgara; some test annotations are in there as well, though. Not deployed yet: (1) Akka, (2) the solr indexed store, and (3) the FP-DataEntry plugin. Paul: Is it much effort to write a Camel route that loads annotation documents from the filesystem? David: No. David: Have some updates to the client helper from the morphbank work to deploy on symbiota 4; will do that and then turn on cnh/NEVP. Paul: Let's coordinate the NEVP turn-on with Patrick. Chuck: Is there a larger build script that assembles the full deployment, or is this done by hand? David: Currently by hand.

    • Report Maureen: Status of ingests (taxon/occurrence) on FP2 and FP3

Maureen: Harvested files for taxon trees (2 SCAN, 1 NEVP) and occurrences (1 SCAN, 1 NEVP) are on FP2 and FP3, ready to load into data stores. Maureen: Harvest update mechanism is working on test. Before production we need to (1) make sure the same schema is being used on production and test, and (2) confirm with Tianhong that the data is suitable for analysis. Maureen: The taxon harvester needs to switch away from using a very slow view.
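The notes don't describe how the harvest update mechanism works. A common approach, sketched here only as an illustration and assuming each source record exposes a last-modified timestamp, is to keep a high-water mark from the previous harvest and pick up only records changed since then. The field names `id` and `modified` and both function names are hypothetical.

```python
from datetime import datetime

# Hypothetical record shape: {"id": ..., "modified": datetime, ...}.
# These field names are illustrative, not the harvester's actual schema.

def select_updates(records, last_harvest):
    """Return records modified after last_harvest, plus the new high-water mark."""
    changed = [r for r in records if r["modified"] > last_harvest]
    # If nothing changed, keep the previous mark so the next run starts from the same point.
    new_mark = max((r["modified"] for r in changed), default=last_harvest)
    return changed, new_mark


def apply_updates(store, changed):
    """Upsert changed records into a local store keyed by record id."""
    for r in changed:
        store[r["id"]] = r
    return store
```

Persisting `new_mark` between runs is what makes the harvest incremental; the schema check Maureen mentions would still have to happen separately.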

    • Report David: Morphbank integration status.

David: Got insert determination working in morphbank with the client helper on a development machine. Have a little bit of logic to add to the morphbank form processor; close to ready to deploy to production. Found a few possible bugs related to PHP version while getting morphbank working. David: Need to document the client helper web service (posting JSON, and queries). Bob: In mediawiki we can probably write PHP functions to wrap the client helper, or do a lighter-weight invocation of the client helper services. The choice is between filling mediawiki templates and writing backing PHP code; would rather avoid the latter if possible, so that any arbitrary semantic mediawiki instance could use the client helper through templates alone. Paul: We need documentation on how to deploy, how to configure, and how to use the client helper. David: The first two are current (client helper + access point deployment, and client helper configuration); need to update the third. Will point Bob at the documents.
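As a rough picture of what "posting JSON" to the client helper could look like from a lightweight client, here is a sketch that builds (but does not send) such a request with Python's standard library. The endpoint URL, the function name, and every payload field are hypothetical placeholders; the real contract is whatever David's client helper documentation specifies.

```python
import json
import urllib.request

# Hypothetical endpoint; the real path is defined by the client helper deployment.
CLIENT_HELPER_URL = "http://localhost:8080/clienthelper/insertDetermination"


def build_determination_request(occurrence_id, scientific_name, identified_by,
                                url=CLIENT_HELPER_URL):
    """Build a JSON POST for a new determination annotation (hypothetical fields)."""
    payload = {
        "occurrenceId": occurrence_id,
        "scientificName": scientific_name,
        "identifiedBy": identified_by,
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it would be `urllib.request.urlopen(req)`. Keeping the client side this thin is in the spirit of Bob's point: a template-driven client only has to fill in a few values, not host backing code.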

  • FP-DataEntry
    • Report Chuck: Duplicate detection integration into Yale data entry application

Chuck: Have mockup ready to go, needs dataset in solr to connect to it. Not a deployed demo yet. This week did more testing and hardening of the platform (given a new site, bash script to set up the server). Blocking issue is having the NEVP data in solr.

    • Report Chuck: Duplicate detection integration into Specify workbench/Dina-Specify

Chuck: Web application, demo has working integration to one publication form. Haven't looked at workbench yet - might be better to have someone more experienced with specify tackle that - but all depend on having either GBIF's service with collector number or the NEVP data to work with.
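Both integrations hinge on matching a record being entered against existing records by collector and collector number. The sketch below shows that matching idea with a plain in-memory index instead of Solr. It assumes Darwin Core-style `recordedBy`/`recordNumber` fields and uses a deliberately naive surname normalization; it is not Chuck's implementation.

```python
import re
from collections import defaultdict


def collector_key(collector, number):
    """Normalize collector surname and collector number into a match key.
    Naively handles 'Surname, Initials' and 'Given Surname' forms only."""
    name = collector.strip().lower()
    if "," in name:
        surname = name.split(",")[0]
    else:
        parts = name.split()
        surname = parts[-1] if parts else ""
    surname = re.sub(r"[^a-z]", "", surname)
    num = re.sub(r"\s+", "", str(number)).lower()
    return (surname, num)


def build_index(records):
    """Map each (surname, number) key to the ids of records sharing it."""
    index = defaultdict(list)
    for r in records:
        index[collector_key(r["recordedBy"], r["recordNumber"])].append(r["id"])
    return index


def candidate_duplicates(index, collector, number):
    """Ids of existing records that may duplicate the one being entered."""
    return index.get(collector_key(collector, number), [])
```

A real deployment would push this matching into the search index (with fuzzier name handling), which is why the demo is blocked until the NEVP data is in solr.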

  • Discussion: Maureen: Should we have a feature for deleting annotations?

Maureen: I think so. How about suppressing them? Bob: Put them into their own named graph, "junk". Paul: There are two stores, the document store and the triple store. It is plausible to remove annotations from fedora, but not easy from the triple store. Bob: Some OA folks are putting each annotation in a separate named graph, which does make querying and comparison across annotations difficult. Bob: Retaining provenance is important for our use cases. Maureen: We should consider the triple store as an index that we can remove and reindex at any time, with the document store as the authoritative record: filter from the document store and rebuild. Bob: What needs documenting is that there are non-annotation things created with the annotation generator, and they do end up in the document store and triple store. And the annotation processor needs them!
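Maureen's proposal can be sketched as follows: the document store stays authoritative and nothing is deleted from it, suppression is just a set of ids, and the triple index is rebuilt from scratch while skipping suppressed annotations. Dicts and tuples stand in for fedora and mulgara here, and the ids and predicates are hypothetical. Because filtering is by suppressed id only, the non-annotation documents Bob mentions survive the rebuild.

```python
def rebuild_index(document_store, suppressed_ids):
    """Rebuild the derived triple index from the authoritative document store,
    skipping suppressed annotations. Documents are never removed from the store."""
    triples = set()
    for doc_id, doc in document_store.items():
        if doc_id in suppressed_ids:
            continue  # suppressed: kept for provenance, but not indexed
        triples.update(doc["triples"])
    return triples
```

The cost of this design is a full reindex whenever the suppression set changes. In return, the triple store never has to support deletes, and provenance stays intact in the document store.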

  • Driver
    • Report Maureen: Status of driver - current annotation processor integration.

Maureen: Working on getting the annotation processor to function with just a stub driver (compile, handle workflow for testing). Encountering and resolving bugs along the way. The code is not pretty and there are still bugs, but it is functional for testing at this point. Need a review of the requirements for what it needs to do. Would be good to rip out one of RichFaces/PrimeFaces. Will the current database-backed implementation scale to the expected annotation load? Paul: Lots there. Let's focus on getting the harvest and solr index up to support Chuck and Tianhong; once we have things to show there, we can reallocate effort onto the annotation processor and driver.

  • SCAN
    • Report David/Tianhong: Akka integration in FP2
    • Test of Akka workflow with SCAN data.

Tianhong: 1. Heard back from Markus: the GBIF checklist bank is under development, now at v0.9, and will switch to v1.0 in late May. Chuck: This would be awesome, as 1.0 should have collector numbers. 2. Solved the GeoLocate service issue. 3. What do we do about the lists of scientist names? For discussion tomorrow. We can harvest them into a local store and use that for cleaning, but that has a maintenance problem of keeping it updated.
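One hedged way to contain the maintenance problem in item 3, assuming the source list can simply be re-fetched in full, is a local copy that refreshes itself once it is older than a configured age. In the sketch, `fetch` is a hypothetical stand-in for whatever service supplies the names, and the clock is injectable so staleness can be tested.

```python
import time


class LocalNameList:
    """Local copy of a remote list of names, re-fetched once older than max_age seconds."""

    def __init__(self, fetch, max_age, clock=time.time):
        self._fetch = fetch          # hypothetical callable returning an iterable of names
        self._max_age = max_age
        self._clock = clock
        self._names = set()
        self._loaded_at = None

    def _refresh_if_stale(self):
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at > self._max_age:
            self._names = {n.strip().lower() for n in self._fetch()}
            self._loaded_at = now

    def is_known(self, name):
        """Case- and whitespace-insensitive membership test used during cleaning."""
        self._refresh_if_stale()
        return name.strip().lower() in self._names
```

This bounds staleness to `max_age` but still depends on the source being reachable. That trade-off is what tomorrow's discussion would weigh against calling Tianhong's service directly.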

    • Query for harvested data and analysis results.
    • Display of annotation on interests.
  • NEVP
    • Report David: Progress on updating deployment.
    • Akka Workflow for NEVP
  • Analysis
    • Tianhong: Progress on cleaning data with data.
    • Report Bob: Progress on Duplicate Finding data mining.
  • SemanticMediaWiki as FP Client, review of SMW use cases.
  • For Thursday:
    • Handling unredacted data and redacted data in solr.
    • Let's get a real test environment on fp1 so that Maureen isn't nervous about putting the harvester in production on symbiota1 (4?)
    • Do we need a plan for backup and restore of data aside from whatever iDigBio is providing for us?
    • Do we have Icinga on symbiota prod?
    • Tianhong has scientist data and has it available as a service, is that the direction we want to go? might be maintenance issues.

(** Bob: Maybe something about https://thepund.it)

    • Deleting annotations: need a process to flag annotations as suppressed for indexing, and for dumping the doc store and indexing in mulgara.