2015Jan20

From FilteredPush


Etherpad for meeting notes: http://firuta.huh.harvard.edu:9000/FP-2015Jan20

Agenda

Non-Tech

  • SPNHC 2015
  • Publications
    • Paul/James: Collection Objects
    • Bob: Refactoring Dup finding cluster analysis
      • Bob: Access to larger scale infrastructure
    • Bob: List of additional topics

Tech

  • QC work
    • Agent authority file to Symbiota - harvest to solr index - use in actor.
    • JSON to XLS into Kurator
    • Generate QC results for each SCAN collection.
  • State of Deployments
    • FP2.acis
      • Status of InvertEBase setup
    • FP3.acis
      • State for harvest for NEVP
  • Annotation Processor
  • Morphbank integration

Reports

Notes

Present: James, Jim, Bertram, Tim, Bob, Tianhong, David, Paul

Agenda


Non-Tech

James suggests "yes if" (final hurrah!?) on FP

- for Kurator: (a) demo camp: showing what we're able to do with Akka right now; here is a dataset, here's a tool to manipulate it; here's a result view / spreadsheet

(b) "regular" abstract: big picture; explaining FPush end game and transitioning into Kurator

- James: (1) how has the curation story changed since last year; (2) how to engage the community in Kurator

- poster idea: blank(-ish) poster; let users annotate the poster to get community input

- Bob: have a before- vs after- distinction on the poster

Decision: (1) demo camp on Kurator; (2) one presentation on transitioning FP into Kurator; (3) one poster for community feedback

  • Publications
    • Paul/James: Collection Objects

- making progress on monitoring progress ;-)

    • Bob: Refactoring Dup finding cluster analysis

Bob: No progress to report.


      • Bob: Access to larger scale infrastructure

Bob: waiting for yours truly (Bertram) to find a contact at NCSA

Bertram: Would like some slides to show around when talking to people at NCSA

Bob: Can provide slides from TDWG and some narrative.

Bertram: there's a "National Data Services Lab" initiative here at NCSA; maybe we can propose it for NDS labs...

    • Bob: List of additional topics

Bob: One of these: Annotation Generation: Working on trying to understand the details of what David is doing. http://wiki.filteredpush.org/wiki/JSON_Stuff

Bob: https://docs.google.com/document/d/1FyTIbaIRIzw3uizxs5HEBOgcoxfF4A07g26KK_xrEYk/edit for comments about what papers we might write

JSON-LD is now the "standard serialization" for OA (not RDF/XML)
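For reference, a minimal sketch of what an Open Annotation looks like when serialized as JSON-LD. The `oa:` terms come from the OA namespace (http://www.w3.org/ns/oa#); the `example.org` identifiers are hypothetical placeholders and do not refer to any FP data.

```python
import json

# An inline @context is used so the sketch does not depend on fetching
# a remote context document. All example.org URIs are hypothetical.
annotation = {
    "@context": {"oa": "http://www.w3.org/ns/oa#"},
    "@id": "http://example.org/annotations/1",
    "@type": "oa:Annotation",
    "oa:motivatedBy": {"@id": "oa:commenting"},
    "oa:hasBody": {"@id": "http://example.org/bodies/1"},
    "oa:hasTarget": {"@id": "http://example.org/specimens/1"},
}


def serialize(doc):
    """Return the annotation as pretty-printed JSON-LD text."""
    return json.dumps(doc, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(serialize(annotation))
```

The same graph could be written as RDF/XML; the JSON-LD form is plain JSON, so it can be produced and parsed with standard JSON tooling.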

James: Bob, Joel and I are looking at doing a graph experiment with JSON-LD, and we also have another staff member who is knowledgeable on LD, etc. Joel is also looking into JSON-LD vs the competing RDF/XML

Paul: Please circulate link to list of papers again, we'll all comment on it and address it as an agenda item next week.

Tech

  • QC work
    • Agent authority file to Symbiota - harvest to solr index - use in actor.

- Agent code submitted to Symbiota branch - targeting end of January for production roll out

    • JSON to XLS into Kurator

- David's tool is not incorporated into Kurator

    • Generate QC results for each SCAN collection.

- need to reharvest data - old collection can be used for testing in the meantime

- Tianhong: ran one data collection with existing curation workflow (MCZ) -- workflow we have has "issues" with sci-name validator -- breaking down complex actors (such as sci-name validator) into smaller pieces

- question: should we fix the already identified issues? (#event id please ;-)

- Tim: make clear that FP, Kurator are not in the "right answer/wrong answer" business; there might have been a misconception that FP is the "authority" to resolve issues; make it clear that even if users employ pre-existing workflows, the workflow users are still responsible for and in control of configuration, source selection etc

- Paul: let's put a corresponding clarification / disclaimer on the first page in the spreadsheet (curation report) - then run workflow on various collections - this way keep pushing on the community feedback

- Jim: can we document the authority choices that wf users have

Jim: Be upfront about the assumptions: The authority lists.

James: On the front page of the report, use a URL to each resource's "use policy" for reference. Each resource should have a page which discusses its limitations etc.

Paul: Plan: (1) Comment on and revise text for first page of spreadsheet report making clear what the assumptions are (2) David adds to that first page a summary of the sources used (James suggests including URIs for each source as well), (3) Tianhong runs the workflow on more SCAN data, and we send out the results for more feedback.
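As a rough illustration of step (2) of the plan, the sketch below builds front-page rows (a disclaimer followed by a summary of sources with URIs) from JSON QC results. Everything here is an assumption for illustration: the JSON layout, the `source` / `name` / `uri` field names, the disclaimer wording (a placeholder, not the agreed text), and the use of CSV from the Python standard library as a stand-in for XLS output. It is not David's tool.

```python
import csv
import io
import json

# Placeholder wording only; the actual text is to be revised by the group.
DISCLAIMER = (
    "This report flags possible issues for review; it does not assert "
    "right or wrong answers. Users remain responsible for workflow "
    "configuration and source selection."
)


def front_page_rows(qc_json):
    """Build front-page rows: the disclaimer, then one row per source."""
    results = json.loads(qc_json)
    # Keep each distinct source once, in first-seen order.
    sources = {}
    for record in results:
        src = record["source"]  # hypothetical field name
        sources.setdefault(src["name"], src["uri"])
    rows = [["Note", DISCLAIMER], [], ["Source", "URI"]]
    rows.extend([name, uri] for name, uri in sources.items())
    return rows


def to_csv(rows):
    """Serialize rows as CSV text (stand-in for an XLS sheet)."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


# Hypothetical QC results; the source names and URIs are made up.
SAMPLE = json.dumps([
    {"id": "rec-1", "source": {"name": "Authority A",
                               "uri": "http://example.org/authority-a"}},
    {"id": "rec-2", "source": {"name": "Authority B",
                               "uri": "http://example.org/authority-b"}},
    {"id": "rec-3", "source": {"name": "Authority A",
                               "uri": "http://example.org/authority-a"}},
])

if __name__ == "__main__":
    print(to_csv(front_page_rows(SAMPLE)))
```

Listing each source once with its URI gives readers a direct link to that resource's own documentation of its limitations.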

    • Next Kurator Sprint.

- ? add JSON-to-XLS functionality

- ? modularize complex actors


  • State of Deployments

David: FP2 and FP3 are up to date. Working to get the InvertEBase client helper configured. Minor bug fixes in progress in the client helper.

    • FP2.acis
      • Status of InvertEBase setup
    • FP3.acis
      • State for harvest for NEVP

Paul: Coordinating between David, Ed, and Patrick. Planning on updating the NEVP schema when Ed releases the next patch (end of Jan), updating the NEVP Symbiota client at that point, and at that time adding the FP client functionality for NEVP.

  • Annotation Processor

David: No progress this week.

  • Morphbank integration

David: Need to ping Greg again.