2014Jan15

From Filtered Push Wiki


Etherpad for meeting notes: http://firuta.huh.harvard.edu:9000/FP-2014Jan15

Agenda

  • Analysis
    • James: Kurator
    • Summary: Current state of Kepler/Akka work.
    • Summary Bob: Duplicate Finding data mining
    • Summary Chuck: Duplicate detection UI.
    • Discussion: Duplicate detection
  • Driver
    • Summary Maureen: Development status
    • Summary Paul: Alternate NEVP->Specify-HUH ingest tool.
  • SCAN TCN Support
    • Summary Paul/David: Work for SCAN


Non-Tech

  • Davis burndown rate.
  • James: TDWG session

Reports

Notes

Present: Paul, Bob, Maureen, Chuck, Tianhong, Jim, David


  • Analysis
    • James: Kurator

James: ABI panels were delayed by shutdown. Targeting notice of clear declines as soon as possible. May not notify others until July.

    • Summary: Current state of Kepler/Akka work.

Tianhong: Working on paper.

    • Summary Bob: Duplicate Finding data mining

Bob: Working on using Mahout to find clusters in the data. The complex parts have been developing the vectorization of various kinds of strings and choosing parameters for clustering.
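As an illustration of what vectorizing strings for clustering can look like, here is a minimal sketch in plain Python. It is not Bob's Mahout code: the character-trigram representation, the cosine measure, and the function names `ngram_vector` and `cosine_similarity` are all assumptions chosen for the example.

```python
from collections import Counter
from math import sqrt


def ngram_vector(text, n=3):
    """Map a string to a sparse vector of character n-gram counts.

    Hypothetical helper: lowercases, collapses whitespace, and pads with
    one space on each side so word boundaries contribute n-grams.
    """
    normalized = " ".join(text.lower().split())
    padded = f" {normalized} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Two spellings of the same collector name versus an unrelated name.
    same = cosine_similarity(ngram_vector("A. Gray"), ngram_vector("A Gray"))
    other = cosine_similarity(ngram_vector("A. Gray"), ngram_vector("J. Torrey"))
    print(same > other)
```

The n-gram length and the normalization rules are exactly the kind of parameters that need tuning per field (collector names, localities, dates), which is presumably part of why the vectorization step is the hard bit.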

    • Summary Chuck: Duplicate detection UI.

Chuck: Working on the build process to produce a jar that contains everything and runs from the command line. Working on heuristics for name extraction; one concern is how to handle distances between single people and lists of people. The Solr index for all fungi from GBIF runs in around an hour or more. Seeing lots of character encoding problems in GBIF data, often multi-byte Unicode sequences being interpreted as a single-byte encoding.
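The encoding problem described here can be reproduced in a few lines. The sketch below assumes the multi-byte encoding is UTF-8 and the single-byte encoding is Latin-1; the notes do not say which encodings are actually involved in the GBIF data (Windows-1252 is another common culprit), and the helper names are hypothetical.

```python
def simulate_mojibake(text, wrong_encoding="latin-1"):
    """Encode as UTF-8, then decode the bytes with a single-byte encoding."""
    return text.encode("utf-8").decode(wrong_encoding)


def repair_mojibake(text, wrong_encoding="latin-1"):
    """Try to reverse the mis-decoding; return the input unchanged on failure.

    If the text cannot be re-encoded in the single-byte encoding, or the
    resulting bytes are not valid UTF-8, the string was probably not
    damaged in this particular way, so it is left alone.
    """
    try:
        return text.encode(wrong_encoding).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


if __name__ == "__main__":
    damaged = simulate_mojibake("Müller")
    print(damaged)                    # two characters appear in place of "ü"
    print(repair_mojibake(damaged))   # round-trips back to the original
    print(repair_mojibake("Müller"))  # clean input is returned unchanged
```

A repair like this is only a heuristic: a string that was never damaged but happens to re-encode into valid UTF-8 would be altered, so in practice it is safer to flag suspect records for review than to rewrite them silently.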

    • Discussion: Duplicate detection

For Friday.

  • Driver
    • Summary Maureen: Development status

Maureen: Making progress. Need to review code and figure out how to split up remaining work. For Friday: Review of how to split up this work.

    • Summary Paul: Alternate NEVP->Specify-HUH ingest tool.

Paul: Quick summary; we can look at the code on Friday.

  • SCAN TCN Support
    • Summary Paul/David: Work for SCAN

Paul: David and I worked all week with Nico and Ed, producing tools in Symbiota to help specialists identify inadequately identified material.

Non-Tech

  • Davis Burndown rate

Waiting on Bertram.

  • James TDWG session

James: The symposium on workflows at TDWG last year was coordinated from BBG. TDWG has called for symposia, and the suggestion is that we organize one on workflows again. Anton suggests including registries (for services?).

  • Schedule

Jim: This slot looks OK for spring.

Tianhong: Bertram has class this week, not next week.

Paul: We'll need to confirm spring schedule with Bertram.

For Friday:

  • Ed will join the call - reviewing status of blocking issues from SCAN Hackathon.
  • milestones, roadmap
  • text mining discussion: how much is done by Lucene, how much do we need to add; how to integrate MySQL, Lucene, and Mahout, or whether to go with only one approach
  • reassessing deployment; how can we get to a point where every developer in the project can get up and running in under fifteen minutes
  • driver code review
  • quick look at Specify-HUH NEVP ingest code.