2014Oct21

From FilteredPush


Etherpad for meeting notes: http://firuta.huh.harvard.edu:9000/FP-2014Oct21

Agenda

Non-Tech

  • iDigBio TCN Summit
  • LepNet TCN
  • Publications
    • Progress
      • Paul: Collection Objects
      • Bob: Refactoring Dup finding cluster analysis
      • Tianhong - IDCC submission
  • No meeting next Tuesday?

Tech

  • Kurator Integration
    • Any remaining build issues?
    • DarwinCore reader
    • SVN reorganization/cleanup
  • QC for SCAN
    • Feedback from Neil/Paul Heinrich
    • Feedback from Nico
  • QC work
    • Adding agent authority file to Symbiota - harvest to solr index - use in actor.
  • Firuta server move.
    • State of deployed apps
  • Deployments
    • Access point updates
    • Bringing Annotation Processor up-to-date
    • Deploy and re-run harvest of occurrence records
    • Status of fp2 and fp3
  • For Thursday: Actor support for Tim's sprint

Reports

  • Paul
    • Have a minimal working UI for browsing/searching/displaying/editing agent records in Symbiota. Branched and committing to the branch. Have asked Ed for feedback on the schema changes in the branch and on a proposal to rename omcollectors to agents.

Notes

Present: Bertram, Jim, James, Tianhong, Paul, David.

Non-Tech

  • iDigBio TCN Summit

Jim and David are going.

Bertram: link to meeting info/program? Use same demo as for Kurator?

Jim: Will circulate.

Discussion: Yes if done, but no added pressure to complete by then.

  • LepNet TCN

Jim: Has been submitted.

  • Publications
    • Progress
      • Paul: Collection Objects

Paul: No further progress this week.

      • Bob: Refactoring Dup finding cluster analysis
      • Tianhong - IDCC submission

Tianhong: Merged all the comments except those from Paul.

  • No meeting next Tuesday?

No meeting next week, almost everyone traveling.

  • TDWG talks, etc.

Bertram: Need to put together. Will be talking about related projects tomorrow: http://www.lis.illinois.edu/events/2014/10/22/eresearch-roundtable-data-curation-biodiversity-informatics

James: Good to focus on FP-Kuration/Akka and Tianhong's work, then bring in Kurator at the end. Likely some material of use from last year's TDWG presentations.

Paul: Likewise, need to put together. Perhaps later today.

Tech

  • Kurator Integration
    • Help with test cases for the TDWG Kurator demo

Tianhong: Use some examples from last week?

Bertram: Who will do this?

Paul: Format is probably CSV for input (like occurrence.txt from a Darwin Core archive), and JSON for output.
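A minimal Python sketch of the input/output conversion Paul describes: delimited rows with a header line of Darwin Core terms are read with the standard csv module and written out as a JSON array of objects. The function name occurrences_to_json and the sample rows are hypothetical. The delimiter is a parameter because occurrence.txt files inside real archives are often tab-delimited, as declared in the archive's meta.xml.

```python
import csv
import io
import json


def occurrences_to_json(text: str, delimiter: str = ",") -> str:
    """Convert delimited occurrence rows (header + data) into a JSON array.

    Assumes the header row holds Darwin Core term names (e.g. scientificName);
    each data row becomes one JSON object keyed by those terms.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    records = [dict(row) for row in reader]
    # ensure_ascii=False keeps accented characters such as "Ü" readable in the output.
    return json.dumps(records, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    # Hypothetical sample input; a real file would come from an archive.
    sample = "occurrenceID,scientificName\nocc-1,Anelaphus moestus moestus\n"
    print(occurrences_to_json(sample))
```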

    • Any remaining build issues?

David: Looks like everything has been resolved. Made FP-Core no longer require a settings.xml file, and Tim has been able to get everything to build in his continuous integration system (Bamboo).

Tianhong: Issue about two external datasets to include in the jar. I found that one is free, but there is no information about the terms of use for the other one (a website hosting a dataset provided by an individual from Russia, so I'm not sure whom I should ask). Country boundaries: http://data.geocomm.com/editorspicks/data_world11.html

Discussion: Based on the assertions on that page, it is probably OK to include, with links to the source (the metadata on that page) and to the page itself.

Tianhong: OK, will do

    • DarwinCore reader

Paul: In the simple case, unzip and load occurrence.txt as Simple Darwin Core. In the more complex case, use a DwC-archive-aware library (GBIF has written one).
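The simple case can be sketched with only the Python standard library. This is an illustration, not the GBIF library: read_occurrences is a hypothetical helper that ignores meta.xml, so it assumes the core file is named occurrence.txt and is UTF-8, tab-delimited, with a header row and no field enclosure characters. Anything beyond that (extension files, other delimiters, default values declared in meta.xml) is the "more complex case".

```python
import csv
import io
import zipfile
from typing import BinaryIO, Dict, List, Union


def read_occurrences(
    archive: Union[str, BinaryIO],
    member: str = "occurrence.txt",
    delimiter: str = "\t",
) -> List[Dict[str, str]]:
    """Simple case: read the core file of a Darwin Core archive as Simple Darwin Core.

    meta.xml is ignored, so the file name, encoding, delimiter, header row
    and absence of quote characters are all assumptions.
    """
    with zipfile.ZipFile(archive) as zf:
        # utf-8-sig also strips a byte-order mark if one is present.
        with io.TextIOWrapper(zf.open(member), encoding="utf-8-sig", newline="") as text:
            # QUOTE_NONE: treat '"' as ordinary data (no field enclosure assumed).
            reader = csv.DictReader(text, delimiter=delimiter, quoting=csv.QUOTE_NONE)
            # Materialize the rows before the archive is closed.
            return [dict(row) for row in reader]
```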

    • SVN reorganization/cleanup

David: Few minor projects that need review. Main cleanup effort completed.

  • QC for SCAN
    • Feedback from Neil/Paul Heinrich

David: Next step with him is to run the analysis on more datasets and make the results available to him (preferably as downloads accessible from Symbiota).

Tianhong: More issues with the workflow:

  1. Correct fields and curated fields within a curated record can now be distinguished. Distinguish them by background color?
  2. quality of the services:
    1. In some cases, Ü in the record has been changed to U; in others, U in the record has been changed to Ü.
      1. Paul: Looks like character encoding issues, or data sources that have names with or without extended characters.
      2. Tianhong: Should we always use the one from the server, or compare to see if they're close enough and, if so, use the one in the record?
      3. Paul: Perhaps where the issue is an accented character, accept the case that contains the accented character when that is the only difference.
      4. Tianhong: OK
  3. subspecies handling
    1. E.g., does “Anelaphus moestus moestus” always mean “Anelaphus moestus subsp. moestus”? These subspecies consistently return no result from COL, and return a result but no author from the backbone.
      1. Paul, Jim: For a trinomial, when no rank is specified, in zoology, subsp. can be inferred for the rank.
      2. James: In botany this isn't the case; the rank may be variety and omitted.
      3. Tianhong: should we just flag and say "don't know validity"?
      4. Paul: Concur.
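The two rules agreed above can be sketched in Python as follows. This is a hypothetical illustration, not code from the QC workflow: prefer_accented keeps whichever spelling carries the diacritics when that is the only difference between the record and the service, and infer_infraspecific_rank infers subsp. for a bare zoological trinomial but returns None for a botanical one, leaving the caller to flag the name as "don't know validity". The function names and the "zoological"/"botanical" labels are assumptions.

```python
import unicodedata
from typing import Optional


def strip_diacritics(name: str) -> str:
    # NFD splits "Ü" into "U" plus a combining diaeresis; drop the combining marks.
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def _mark_count(name: str) -> int:
    # Number of combining marks (accents) in the decomposed form.
    return sum(1 for ch in unicodedata.normalize("NFD", name) if unicodedata.combining(ch))


def prefer_accented(record_name: str, service_name: str) -> Optional[str]:
    """Return the accented spelling when the names differ only in diacritics.

    Returns None when they differ in any other way, so the caller can fall back
    to its normal comparison. On a tie the record's spelling is kept.
    """
    if strip_diacritics(record_name) != strip_diacritics(service_name):
        return None
    if _mark_count(service_name) > _mark_count(record_name):
        return service_name
    return record_name


def infer_infraspecific_rank(name: str, nomenclatural_code: str) -> Optional[str]:
    """Rank of the final epithet of a bare trinomial with no rank marker.

    "zoological": the only rank below species is subspecies, so "subsp." is inferred.
    Any other code (e.g. "botanical"): the omitted rank may be subsp., var., etc.,
    so return None and let the caller flag the name as "don't know validity".
    """
    if len(name.split()) != 3:
        raise ValueError(f"expected a bare trinomial, got {name!r}")
    if nomenclatural_code == "zoological":
        return "subsp."
    return None
```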
    • Feedback from Nico

David: Nico is reviewing ASU spreadsheet, no comments back yet.

Bertram: Please forward copy.

  • QC work
    • Adding agent authority file to Symbiota - harvest to solr index - use in actor.

Paul: Symbiota UI in progress.

  • Firuta server move.
    • State of deployed apps

David: Fedora and Mulgara are deployed and configured for FP. Java version brought up to date, access point built and ready to deploy.

  • Deployments
    • Access point updates

David: FP3 was set up at the same time as Firuta, with its requirements (Fedora and Mulgara) and the access point. On FP2, need to work on the Apache configuration before updating the access point.

    • Bringing Annotation Processor up-to-date

David: No progress this week.

    • Deploy and re-run harvest of occurrence records

David: Got through the issues with the harvest: a view was being queried on constructed fields that could not be indexed. The occurrence harvester is working and deployed on FP2, and can run on the whole data set in a few hours. Not yet automated on FP2, but ready for that. The harvest isn't robust to errors: there are too many loosely coupled components without logging, so it is hard to detect failures when they occur.

Paul: Rationale for loose coupling using file system was multiple consumers of harvest (fedora, mongo, solr). If not working well, tightening up to identify and log errors in one place makes sense.
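One way to tighten the coupling while keeping multiple consumers is to drive every consumer from a single loop that logs and collects failures in one place. A hedged Python sketch: harvest is a hypothetical function, the consumer callables stand in for the Fedora, MongoDB and Solr loaders (not shown), and records are assumed to carry an "id" field.

```python
import logging
from typing import Callable, Dict, Iterable, List, Optional

logger = logging.getLogger("harvest")

# A consumer stores one harvested record somewhere (e.g. a repository or index).
Consumer = Callable[[dict], None]


def harvest(
    records: Iterable[dict], consumers: Dict[str, Consumer]
) -> Dict[str, List[Optional[str]]]:
    """Feed every record to every consumer, logging failures in one place.

    A failure in one consumer does not stop the others; the returned mapping
    lists, per consumer, the ids of the records it failed on.
    """
    failures: Dict[str, List[Optional[str]]] = {name: [] for name in consumers}
    for record in records:
        for name, consume in consumers.items():
            try:
                consume(record)
            except Exception:
                # logger.exception records the traceback alongside the message.
                logger.exception("consumer %s failed on record %s", name, record.get("id"))
                failures[name].append(record.get("id"))
    return failures
```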

    • Status of fp2 and fp3

See above.

  • For Thursday: Actor support for Tim's sprint