2014Sep17

From FilteredPush


Etherpad for meeting notes: http://firuta.huh.harvard.edu:9000/FP-2014Sep17

Agenda

Non-Tech

  • We need to schedule a new meeting date/time.
  • Publications
    • Progress
      • Tianhong: Workflows
      • Paul: Collection Objects
      • Bob: Refactoring duplicate-finding cluster analysis
  • TDWG Symposium
    • Abstracts, Registrations.

Tech

  • Firuta server move.
  • Status of Mongo on FP2
  • QC for SCAN
    • Run on full NAU dataset (using the entomologists list from Solr), send report to Neil
  • QC work
    • Tianhong/Bertram: More on running workflows.
    • Adding agent authority file to Symbiota, harvesting it into the Solr index, and using it in the actor.

Reports

  • Paul
    • Firuta move is pending; meeting tomorrow with Anne Marie and David to review it.
    • Worked on the draft of the complex collection objects paper.
    • Dina consortium call on Monday: mostly a general review of high-level architecture and organization. Provided information on schema changes for complex objects. Audio was generally poor and we didn't get into annotation; raised this with James by email.

Notes

FilteredPush Team Meeting 2014 Sept 17

Present: Tianhong, James, Jim, David, Paul, Bob

Non-Tech

  • We need to schedule a new meeting date/time.

Paul: Starting from the times when both Bertram and the room are available; will poll the group once some options are identified.

  • Publications
    • Progress
      • Tianhong: Workflows

Tianhong: IDCC paper has been submitted.

      • James: Collection Objects

James: Have a draft together; circulated it to the Dina group and discussed it in the call with them on Monday.

James: In the Dina meeting on Monday we didn't get to the details on annotation; provided them with more details in an email follow-up.

      • Bob: Refactoring duplicate-finding cluster analysis

Bob: Still working on the code; haven't gotten to the text yet.

  • TDWG Symposium

James: Nothing needed other than the abstract.

    • Abstracts, Registrations.

Paul: A first draft of the abstracts is on my list of things to do soon.

Bob: Have circulated to Paul a draft abstract about the comparisons.

Tech

  • Firuta server move.

Paul: Meeting with Anne Marie and David tomorrow to discuss.

  • Status of Mongo on FP2

David: Issues while indexing: Mongo freezes and requires a restart, and there are also heap space issues, since one collection (the NAU one) is 9 GB in size. Found a repeating string in the analysis results; Tianhong has fixed the source of this in the analysis, and the extra collections have been removed from Mongo, so we should be ready to retry the analysis.

  • QC for SCAN

David: Currently reharvesting occurrence records from SCAN. Baseline storage usage is 25 GB. We may need Alex to increase the allocation on FP2, but probably not, since the analysis results shouldn't take up that much space; this is something we need to keep an eye on.

Paul: State of monitoring?

David: We haven't added the remote processor/disk space monitoring on FP2 and FP3 yet; we would also like to add monitoring of MongoDB and Solr uptime.

    • Run on full NAU dataset (using the entomologists list from Solr), send report to Neil

Plan: (1) finish the load from the harvest (next hour or so); (2) Tianhong reruns the analysis on the NAU data and David creates a spreadsheet from the result set; (3) David and Tianhong assess resource consumption and probably schedule a run of the analysis on the full SCAN data set.

Tianhong: Expectation is that the full run will take 3 or 4 days.

David: Load from harvest should be done in about an hour, will email when done.

  • QC work
    • Tianhong/Bertram: More on running workflows.

Tianhong: Bertram has this running. Examining the larger context with him now, integrating it with the work described in the IDCC paper.

    • Adding agent authority file to Symbiota, harvesting it into the Solr index, and using it in the actor.

Paul: Haven't gotten to UI development in Symbiota yet.

David: Will need to do some configuration of the OAI-PMH provider for this.

For the Tech call on Thursday:

(1) Coordinating analysis on FP2

(2) Any workflow issues that Bertram wishes to cover