2014Mar12

From Filtered Push Wiki


Etherpad for meeting notes: http://firuta.huh.harvard.edu:9000/FP-2014Mar12

Agenda

Non-Tech

  • Davis: NCE
  • James: TDWG session
  • James: Progress on Finishing the SemanticMediaWiki as FP client deliverable.
  • InvertEBase

Tech

  • NEVP
    • Requirements gathering for Phenology crowdsourcing
  • Report from Friday call
  • Analysis
    • Report Bob: Progress on Duplicate Finding data mining.
    • Report Paul/Chuck: Duplicate detection UI.
  • Nodes
    • Report Bob: RDFBean assertions and Annotation Stores.
    • Report Maureen: Ingest progress.
    • Report David: Morphbank status.
  • SCAN
    • Neil looking for brief report and current timelines
  • Driver
    • Report David: Status of integration of last working driver version.
    • Report Maureen: Status of new driver approach.

Reports

  • Paul
    • Went to talk with some Farlow data entry folks to make sure that they are aware of the Grab/FP-DataEntry functionality in Rapid; haven't gotten any feedback yet.
    • NEVP ingest into Specify-HUH is on hold pending some cleanup of collector names.
  • Jim
    • No word from NSF regarding our request for a second NCE.
    • I managed to sneak references to Dou et al. and Morris et al. into a manuscript about educational uses of natural history collections that will be published in BioScience later this year.
    • It looks like the InvertEBase TCN proposal will be funded, but NSF has requested they cut the budget approximately in half (from $3M to $1.5M). Paul has sent a revised budget for the FP component that is about 30% smaller than original request. Start date July, I believe.

FilteredPush Team Meeting 2014 Mar 12

Present: Tianhong, Bertram, Jim, Maureen, David, Paul, Chuck McC., Bob, Chelsea, Lian, Chris.

Agenda

Non-Tech

  • Davis: NCE

Bertram: All good from what I've heard from this end.

  • James: TDWG session
  • James: Progress on Finishing the SemanticMediaWiki as FP client deliverable.
  • InvertEBase

Jim: NSF looking to eliminate redundancy and remove travel.

Paul: Under certain assumptions, can reduce the FP component of the budget by about 30%.

Tech

  • NEVP
    • Requirements gathering for Phenology crowdsourcing

Edith Law is working with the Curio crowdsourcing platform, providing them with images of herbarium sheets along with occurrence metadata. The idea is to get the crowdsourcing community to code the images as having flowers or having seeds. Still working out what the returned codings will be; on the table are simple states (hasFlowers) and counts. Very much experimental. The crowdsourcing platform will have multiple people coding the same sheet.
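Because several people will code the same sheet, the returned codings have to be reconciled somehow. The following is only a minimal sketch of one possible approach (a majority vote on hasFlowers and a median of the reported counts); it is not Curio's method. The field name flowerCount, the agreement score, and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter


def aggregate_codings(codings):
    """Combine several volunteers' codings of one sheet into a consensus.

    codings: list of dicts such as {"hasFlowers": True, "flowerCount": 3}.
    The field names are hypothetical; the actual coding schema was still
    undecided at the time of the meeting.
    """
    if not codings:
        raise ValueError("need at least one coding")

    votes = Counter(c["hasFlowers"] for c in codings)
    yes, no = votes[True], votes[False]
    has_flowers = yes > no  # assumption: a tie resolves to False

    # Use only the counts reported by coders who saw flowers.
    counts = sorted(c["flowerCount"] for c in codings if c["hasFlowers"])
    # Upper median for even-length lists; 0 when the consensus is "no flowers".
    median_count = counts[len(counts) // 2] if (has_flowers and counts) else 0

    return {
        "hasFlowers": has_flowers,
        "flowerCount": median_count,
        "agreement": max(yes, no) / len(codings),  # share of coders in the majority
    }


sheet_codings = [
    {"hasFlowers": True, "flowerCount": 4},
    {"hasFlowers": True, "flowerCount": 6},
    {"hasFlowers": False, "flowerCount": 0},
]
consensus = aggregate_codings(sheet_codings)
```

A low agreement value could be used to flag sheets that need expert review.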

  • Report from Friday call
  • Analysis
    • Report Bob: Progress on Duplicate Finding data mining.

Bob: Fuzzy database db finished in 5 days with 1 core; not sure if I've got the correct idea of fuzzy. Looking for resources with more cores. Going back to try the other approach, which runs in minutes.

Chuck: Would reducing the size of the dataset to a set of, say, 100 help?

Bob: That would help if the 100 is known to contain duplicates.

Chuck: How about a synthetic data set?

Bob: Still need to talk about realistic noise. The algorithm claims a guarantee of clustering clean structures.
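A small synthetic data set with known duplicates, as Chuck suggests, could be generated along the following lines. This is a sketch under stated assumptions: the record fields (collector, number, date) and the noise model (random character deletions and substitutions in the collector name) are hypothetical, and as Bob notes, whether such noise is realistic is still an open question.

```python
import random


def add_noise(text, rng, error_rate=0.1):
    """Return a copy of text with random letter deletions or substitutions."""
    out = []
    for ch in text:
        if ch.isalpha() and rng.random() < error_rate:
            if rng.random() < 0.5:
                continue  # deletion: drop this letter
            out.append(rng.choice("abcdefghijklmnopqrstuvwxyz"))  # substitution
        else:
            out.append(ch)
    return "".join(out)


def make_synthetic(records, copies, rng):
    """Return (dataset, truth): each clean record plus `copies` noisy duplicates.

    truth[i] is the index of the source record for dataset[i], so the
    clustering produced by a duplicate finder can be scored against it.
    """
    dataset, truth = [], []
    for src, rec in enumerate(records):
        dataset.append(dict(rec))
        truth.append(src)
        for _ in range(copies):
            noisy = dict(rec)
            noisy["collector"] = add_noise(rec["collector"], rng)
            dataset.append(noisy)
            truth.append(src)
    return dataset, truth


# Hypothetical clean records, for illustration only.
clean = [
    {"collector": "A. Gray", "number": "123", "date": "1848-06-01"},
    {"collector": "M. L. Fernald", "number": "4567", "date": "1902-07-15"},
]
rng = random.Random(42)  # fixed seed so the data set is reproducible
dataset, truth = make_synthetic(clean, copies=3, rng=rng)
```

Since every row carries its source index, a set of 100 rows built this way is known to contain duplicates, which is the condition Bob gives for a small set to be useful.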

    • Report Paul/Chuck: Duplicate detection UI.

Paul: Have pointed some of the Farlow folks at the interface in Rapid, waiting for feedback.

Chuck: Have put parameters into an XML file (instead of an excessive list of command-line parameters), along with Solr configuration information. Have this working. Working now on productizing and generalizing to any sets of similar data.
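As an illustration of moving parameters out of the command line and into an XML file, here is a minimal sketch. The element and attribute names (duplicateFinder, index, param, name, type) and the parameter values are hypothetical and do not reflect the actual file format Chuck is using.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration file contents, for illustration only.
CONFIG_XML = """
<duplicateFinder>
  <index url="http://localhost:8983/example"/>
  <param name="similarityThreshold" type="float">0.85</param>
  <param name="maxCandidates" type="int">50</param>
  <param name="matchFields" type="list">collector, number, date</param>
</duplicateFinder>
"""

# Map the declared type of each parameter to a conversion function.
CONVERTERS = {
    "str": str,
    "int": int,
    "float": float,
    "list": lambda s: [item.strip() for item in s.split(",")],
}


def load_params(xml_text):
    """Parse the configuration and return a dict of typed parameters."""
    root = ET.fromstring(xml_text.strip())
    params = {}
    for node in root.findall("param"):
        convert = CONVERTERS[node.get("type", "str")]
        params[node.get("name")] = convert((node.text or "").strip())
    index = root.find("index")
    if index is not None:
        params["indexUrl"] = index.get("url")
    return params


params = load_params(CONFIG_XML)
```

Keeping the parameters in one file makes a run reproducible from the file alone, and pointing the tool at a different file is one way to generalize it to other sets of similar data.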

  • Nodes
    • Report Bob: RDFBean assertions and Annotation Stores.

Bob: Underlying issue: using RDF Beans is very profitable, but RDF Beans add extra RDF along with the Annotation. The advantage is easy coupling between generator and processor. The disadvantage is that additional assertions on the included ontology classes get added to the triple store. The logical place to put these would be as part of the Provenance of the Annotation, but it is unclear how difficult this would be to implement.

Bob: Anyone else is welcome to contribute to the wiki page. Questions about clarity are particularly welcome.

    • Report Maureen: Ingest progress.

Maureen: Working on integrating pieces for uploader, building graph of object and fields, adding implied includes, making sure authorization works.

    • Report David: Morphbank status.

David: Still waiting on dump.

    • Report David: Annotation processor.

Maureen: Work on refining driver API.

David: Got the annotation processor working in Tomcat using the new Camel framework; debugging issues in the notifications and register interests pages.

  • SCAN
    • Neil looking for brief report and current timelines.

David: In a position to roll out the node infrastructure. Will need to coordinate with Ed to do the Symbiota updates. Should be able to do this by next Wednesday. Then can set up the annotation processor as we've had it sitting on FP-2, working with the new infrastructure. Minimal use case of view annotations and respond is working there.

  • Driver
    • Report David: Status of integration of last working driver version.

David: Right now using the stub code for a driver. Have bugs to resolve first.

    • Report Maureen: Status of new driver approach.

Maureen: Have been working on a replacement for the implementation of the Driver API for Specify.

For Friday:

  • an item from Tianhong: ideas for outlier detection
  • quick review of the SCAN diagram and an analysis of the NEVP diagram