2014Mar19



Etherpad for meeting notes: http://firuta.huh.harvard.edu:9000/FP-2014Mar19

Agenda

Non-Tech

  • Davis: Need to check for any billing held up by NCE.
  • James: TDWG session
  • James: Progress on Finishing the SemanticMediaWiki as FP client deliverable.
  • James: SPNHC
  • InvertENet

Tech

  • Report from Friday call
  • SCAN
    • Report David: Progress on updating deployment.
    • Triggering Analysis.
    • Query for harvested data and analysis results.
    • UI for annotating annotations.
    • Display of annotation on interests.
  • Analysis
    • Next target for Tianhong
    • Report Bob: Progress on Duplicate Finding data mining.
    • Report Chuck: Duplicate detection UI.
  • Nodes
    • Report Maureen: Ingest progress.
    • Report David: Morphbank status.
  • Driver
    • Report David: Status of integration of last working driver version.
    • Report Maureen: Status of new driver approach.

Reports

  • Paul
    • Put up feed of NEVP image data for crowdsourcing; haven't yet heard back whether it meets requirements.
  • Chuck
    • Make plugin bookmarklet-able: You no longer require access to the source code of your data-entry application.
    • Simplified xml config: no special all-caps to lower-case mapping / concise data structures in xml / very little redundancy / tighter validation.
    • demo.sh is parameterized, could run the public demo with it, instead of editing source.
    • In the works: deployable war files.

Notes

FilteredPush Team Meeting 2014 Mar 19

Present: Maureen, Chuck, Bertram, James, Tianhong, Bob, Paul, David

Agenda Non-Tech

  • Davis: Need to check for any billing held up by NCE.

Paul: Will get our financial folks to talk with Davis' to check.

  • James: TDWG session

James: No updates yet.

  • James: Progress on Finishing the SemanticMediaWiki as FP client deliverable.

James: Joel has written to Bob.

Bob: In my court - looking at preferred dates. Perhaps last week of April/first week of May.

James: Need decision soon to get paperwork going.

  • James: SPNHC

James: abstract and demo camp call out, ideas (due April 25th)? Possibilities: NEVP data flow - talk; Chuck's FP-DataEntry tool for duplicate finding - Demo camp. More functional integration with Symbiota (SCAN and NEVP) - Demo camp.

  • James: Update on Source Materials workshop and annotation= OA/FP!

James: Last week at Yale at iDigBio workshop on Source Materials (fieldnotes, and things not on label(s) of specimen). Use for augmenting data quality. Outputs - virtual expeditions, linked data etc. Collectors, Library/Archivists, Informatics folks all contributing. Annotation was a heavily discussed subject - both documents and data.

  • James: Workshop on requirements for biodiversity information aggregators

James: Next week will be at workshop by Greg for iDigBio - in concert with conference on digitizing Pacific collections. Topics to be discussed include security, semantics, annotation, etc.

Paul: Creating annotations and getting them to interested parties is straightforward - needs good conventions for structure of the content. Crossing the last mile into local databases is non-trivial, except for very narrow cases.

  • InvertENet

Paul: Petra still revising budget, close to target.

Paul: See diagrams, reviewed SCAN diagram, worked on NEVP diagram.

Maureen: Tianhong was thinking of not using statistical methods for outlier detection.

  • SCAN
    • Report David: Progress on updating deployment.

David: Added response annotations to client helper and library, added to symbiota annotation tab. Also added interest tab to profile for annotations.

    • Triggering Analysis.

David: Still thinking of a camel approach with filesystem, haven't tried implementing yet.

    • Query for harvested data and analysis results.

David: Have an FP-message and mongo query to extract the data. Will need to coordinate with Tianhong about what is going into Mongo.

Paul: Different case from before as analysis run on slices of data in harvest, and queries are run on different slices of the data (e.g. query on a genus.)

David: Combination of query and fetch results, we'll need to do some work, but the infrastructure should all be the same. Also need to settle on return type - json, with the application interpreting it as we are doing now? Also thinking about pagination - in particular from looking at annotations on taxon interests.
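
Sketch of the pagination question: one possible shape for a paginated json response, assuming results are available as a list. The function name `paginate`, the field names (`page`, `page_size`, `total`, `has_more`, `items`) and the sample annotation records are hypothetical and not taken from the FP code.

```python
import json


def paginate(results, page, page_size=25):
    """Return one page of results plus the metadata a client needs to ask for more."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    total = len(results)
    start = (page - 1) * page_size
    return {
        "page": page,
        "page_size": page_size,
        "total": total,
        "has_more": start + page_size < total,
        "items": results[start:start + page_size],
    }


# Hypothetical annotations on a taxon interest (made-up records).
annotations = [{"id": f"anno-{i}", "genus": "Curculio"} for i in range(60)]

# The application would receive this json and interpret it, as it does now.
first_page_json = json.dumps(paginate(annotations, page=1))
last_page = paginate(annotations, page=3)
```

Returning `total` and `has_more` with each page would let the client decide whether to show a "next" link without a second query.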

    • UI for annotating annotations.

David: In place. Client helper passes back json for the annotation documents - like annotation digests in annotation processor. Have a small php script that transforms these to html and adds form/link elements to respond. Response form processor. Can update on symbiota2 for test before rollout of UI changes.
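
Sketch of the json-to-html step described above. The actual script is php; this is an illustrative Python equivalent only, and the digest keys (`id`, `annotator`, `body`), the `/respond` URL and the sample record are assumptions, not the client helper's real format.

```python
import html
import json
from urllib.parse import urlencode


def digest_to_html(digest, respond_url="/respond"):
    """Render one annotation digest as html with a link for responding to it."""
    # Hypothetical link format: the target annotation id goes in the query string.
    link = respond_url + "?" + urlencode({"target": digest["id"]})
    return (
        '<div class="annotation">'
        f'<p><b>{html.escape(digest["annotator"])}</b>: '
        f'{html.escape(digest["body"])}</p>'
        f'<a href="{html.escape(link)}">Respond</a>'
        "</div>"
    )


def digests_to_html(json_text):
    """Transform a json array of annotation digests into html fragments."""
    return "\n".join(digest_to_html(d) for d in json.loads(json_text))


# Made-up digest, standing in for what the client helper passes back.
sample = json.dumps([
    {"id": "urn:uuid:1234", "annotator": "A. Curator",
     "body": "Genus should be <i>Curculio</i>"}
])
page = digests_to_html(sample)
```

Escaping the annotation body before display matters here because annotation text comes from other parties.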

    • Display of annotation on interests.

David: In place.

  • Analysis
    • Next target for Tianhong

Paul: Coordinating with David on decoupling harvest-triggered analysis from researcher queries is a good next target. New process is (1) data is harvested, (2) quality control analysis runs on slice of data in harvest. Independently, researchers ask questions about the data (and the related quality control results), which cover different slices of the data than the slices in harvest. For example, a harvest that covers new ASU material won't correspond with a query on weevil data.
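
Sketch of the decoupling: quality control runs once per harvested slice and stores its results per record, and a later query selects a different slice together with the stored results. The in-memory dict stands in for the Mongo store, and the function names, QC checks and sample records are all hypothetical.

```python
def run_quality_control(record):
    """Toy QC checks; the real analyses are not specified in these notes."""
    flags = []
    if not record.get("genus"):
        flags.append("missing genus")
    if not record.get("collector"):
        flags.append("missing collector")
    return flags


def on_harvest(harvested_records, store):
    """Steps (1) and (2): store each harvested record with its QC results."""
    for record in harvested_records:
        store[record["id"]] = {
            "record": record,
            "flags": run_quality_control(record),
        }


def query_by_genus(store, genus):
    """Researcher query: a slice by genus, independent of any one harvest."""
    return [e for e in store.values() if e["record"].get("genus") == genus]


store = {}
# Two separate harvests (made-up records), each analyzed when it arrives.
on_harvest([
    {"id": "ASU-1", "genus": "Curculio", "collector": ""},
    {"id": "ASU-2", "genus": "Apis", "collector": "Smith"},
], store)
on_harvest([{"id": "other-7", "genus": "Curculio", "collector": "Jones"}], store)

# The weevil query cuts across both harvests and picks up stored QC flags.
weevils = query_by_genus(store, "Curculio")
```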

Bob: Also a concern of when the user asks the question - one off, or repeatedly interested in new data in the same query.

Paul: Also continuing on approach to cleaning data with data - quality control of new records based on other records in data set, as we were discussing on Friday.

Tianhong: Two issues: How to do it, and performance. Have added more discussion to the wiki page.

    • Report Bob: Progress on Duplicate Finding data mining.

Bob: Have written vectorizer for exsiccati data, need to examine and interpret data. In looking at three data sets, to put them into data mining form, each data model requires different coding. Next need to write cluster interpreters.
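
Sketch of what vectorizing records for duplicate finding can look like. This is not Bob's vectorizer: it swaps in a simple bag-of-tokens vector with cosine similarity, and the field names (`title`, `number`, `taxon`) and sample records are made up for illustration.

```python
import math
import re
from collections import Counter


def vectorize(record, fields=("title", "number", "taxon")):
    """Turn a record into a sparse token-count vector, one feature per field:token."""
    tokens = []
    for field in fields:
        text = str(record.get(field, "")).lower()
        tokens += [f"{field}:{tok}" for tok in re.findall(r"[a-z0-9]+", text)]
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


# Made-up records: the first two describe the same sheet with slightly different text.
rec1 = {"title": "Plantae Exsiccatae", "number": "123", "taxon": "Puccinia graminis"}
rec2 = {"title": "Plantae Exsiccatae", "number": "123", "taxon": "Puccinia graminis Pers."}
rec3 = {"title": "Plantae Exsiccatae", "number": "456", "taxon": "Quercus alba"}

v1, v2, v3 = vectorize(rec1), vectorize(rec2), vectorize(rec3)
likely_duplicate = cosine(v1, v2) > cosine(v1, v3)
```

Each data model would need its own mapping into the `fields` used here, which matches the point that each data set requires different coding.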

    • Report Chuck: Duplicate detection UI.

Chuck: Configuration file simplified. Have added option of javascript in bookmark to bring up iframe to lower barrier to entry. Working on getting applications deployable as a war file. Also set up a demonstration on Firuta using state quarters http://firuta.huh.harvard.edu:8086/.

  • Nodes
    • Report Maureen: Ingest progress.

Maureen: Working on getting setup running on workstation.

    • Report David: Morphbank status.

David: Still pending dump.

David: Still need Nico to upload a SCAN image.

Paul: Email me to coordinate.

  • Driver
    • Report David: Status of integration of last working driver version.
    • Report Maureen: Status of new driver approach.

Maureen: Working on getting setup running on workstation. From there, have code to commit on driver stub and refactored driver. For Friday:

  • is it ok or not to have object marshalling/unmarshalling info in the triplestore
  • coordinate launch of analyses, queries to get slices of the data
  • review NEVP diagram