2015Feb24



Etherpad for meeting notes: http://firuta.huh.harvard.edu:9000/FP-2015Feb24

Agenda

Overflow from Kurator Call

  • iDigBio API hackathon.
  • Kurator webinar for iDigBio.
  • Agent authority file to Symbiota - harvest to solr index - use in actor.
  • Progress towards QC reports for all SCAN collections.
    • Feedback QC results for each SCAN collection - next collections:
      • NMSU, MCZ
  • Rest of SCAN

Non-Tech

  • Meeting With AnnoSys
    • Planning call
    • Logistics
  • Publications
    • Paul/James: Collection Objects
    • Paul/Bob/David: QC Reports
    • Bob: Refactoring Dup finding cluster analysis
      • Bob: Access to larger scale infrastructure
    • Bob: List of additional topics
  • Schedule: Next call Next week, then two weeks after that.

Tech

  • Annotation Processor
  • State of Deployments
    • FP2.acis
      • Status of InvertEBase setup
    • FP3.acis
      • State for harvest for NEVP
  • Morphbank integration

Reports

  • Paul
    • Response back from NMSU, pointing out a specific case where date validation seemed odd.
    • A little more work on dwcFP with Bob.
    • Began outline of QC reports paper, sent on to David.

Notes

FilteredPush Team meeting 2015 Feb 24

Present: Bertram, James, Tianhong, Paul, David

Overflow from Kurator Call

  • iDigBio API hackathon.

Bertram: what product?

Paul: Data loading actor for Kurator to get data from the iDigBio API.

Bertram: Who would go?

Tim and David are both booked on those days.

Tianhong would be the logical choice and may be available (another pending conference runs May 31st - June 4th). Hackathon dates are June 3-5. Deadline for application is Feb 28th. https://www.idigbio.org/content/call-participation-hackathon-idigbio-apisservices-and-interoperability-0

  • Kurator webinar for iDigBio.

Paul: Sent Deb Paul an outline of what we'd like to do and what we need to get in place before then; she came back suggesting shortly before SPNHC.

Tim: If we adopt the FP-towards-Kurator approach as the one we are working on, we should be able to get two or three releases out by then.

Paul: Will reply back to Deb and see if we can put some dates on the table.

  • Agent authority file to Symbiota - harvest to solr index - use in actor.

Paul: Still waiting for Ed to roll update into production systems.

  • Progress towards QC reports for all SCAN collections.
    • Feedback QC results for each SCAN collection - next collections:
      • NMSU, MCZ

David: Feedback came back from NMSU. Have spreadsheet for MCZ, need to mail it to Brendan.

    • Rest of SCAN

Tianhong: Workflow is running into occasional records that raise exception conditions which prevent the workflow from continuing. Working on breaking the data set into smaller chunks. Overall processing time for all of SCAN is several days. A common source of exceptions is handling of the response from the COL service. Exceptions are probably not being handled at the appropriate level; this needs review.

Bertram: This also seems to suggest a principled approach to exception handling. Cf. the work by the Little-JIL folks from UMass (Lee Osterweil, et al) -- not necessarily applicable for us, but something to think about more. I suggest Tianhong and Tim look at this and think about it. Here is a potentially relevant link from the Little-JIL camp: http://laser.cs.umass.edu/techreports/08-06.pdf
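A minimal sketch of the chunking idea discussed above, under stated assumptions: it is not the Kurator workflow code, and `process_in_chunks`, `chunked`, and `example_lookup` are hypothetical names. It illustrates catching exceptions at the level of a single record, so one bad service response is logged rather than stopping the run, and yielding results per chunk so each chunk can be saved before the next one starts.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Record = Dict[str, object]


def chunked(records: Iterable[Record], size: int) -> Iterator[List[Record]]:
    """Yield successive lists of at most `size` records."""
    chunk: List[Record] = []
    for record in records:
        chunk.append(record)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def process_in_chunks(
    records: Iterable[Record],
    validate: Callable[[Record], Record],
    chunk_size: int = 1000,
) -> Iterator[Tuple[int, List[Record], List[Tuple[Record, str]]]]:
    """Run `validate` on every record, one chunk at a time.

    Yields (chunk_index, validated_records, failures). A failure is a
    (record, error_message) pair; a failing record never aborts the chunk.
    """
    for index, chunk in enumerate(chunked(records, chunk_size)):
        validated: List[Record] = []
        failures: List[Tuple[Record, str]] = []
        for record in chunk:
            try:
                validated.append(validate(record))
            except Exception as exc:  # isolate the failure to this record
                failures.append((record, repr(exc)))
        yield index, validated, failures


def example_lookup(record: Record) -> Record:
    """Hypothetical stand-in for a name lookup against a remote service."""
    if not record.get("scientificName"):
        raise ValueError("empty scientificName")
    return {**record, "checked": True}
```

A caller would iterate over `process_in_chunks(records, example_lookup, chunk_size=...)`, write each chunk's validated records and failures to disk, and review the failure list afterwards. Catching the broad `Exception` is a deliberate simplification here; a reviewed design would likely distinguish service timeouts from malformed data.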

  • Yes-Workflow MS.

Bertram: Need feedback by end of week. Please check affiliations, assertions made in paper, etc. Also consider if there are scripts around that could be marked up as test cases.

  • James: DINA tools for data quality assessment and migration: possible Kurator role?

James: The DINA project is working on services for cleaning collection data (like BioVeL) using OpenRefine. It would be good to have a video conference with them to look at approaches.

  • James: Question for Tianhong or Paul: What fungal services/references are we currently using?

Kepler Kuration was using IF web services (CABI: likely an old copy...) See line 303 in http://sourceforge.net/p/filteredpush/svn/HEAD/tree/trunk/FP-Tools/FP-CurationServices/src/main/java/fp/services/IndexFungorumService.java

http://www.indexfungorum.org/IXFWebService/Fungus.asmx?op=NameSearch

Non-Tech

  • Meeting With AnnoSys
    • Planning call

Bob: Scheduled for this Thursday, 10 am EST. Expect to be in tomorrow(?) and Thursday. Will have some web pages up.

James: Will be able to join in.

Bob will circulate the proposed agenda again - the point of the call is developing an agenda for their visit. A few technical points: plan for JSON-LD, OAD extensions to OA. These are things that we need to make sure we get to during their visit.

Paul: Add travel to the agenda for the call.

    • Logistics

Paul: Travel seems to be in train. Not sure about accommodations. We can make sure we know where everything is on the call tomorrow.

  • Publications
    • Paul/James: Collection Objects

James: Some more work over last week. Need one more chunk of time to focus. Want to get this one done.

    • Paul/Bob/David: QC Reports

Paul: Started on outline, circulated to David, he'll get to Bob.

    • Bob: Refactoring Dup finding cluster analysis
      • Bob: Access to larger scale infrastructure

Bob: No news.

    • Bob: List of additional topics

Nothing further yet.

  • Schedule: Next call Next week, then two weeks after that?

Paul: Call next week?

James: We regularly run over our time for Kurator.

Bob: Lots of interfering activity with calls over much of March on Tuesdays (after the first week).

Paul: Let's plan on this call next week, and then work from there.

Tech

  • Annotation Processor

No progress yet, on for next week.

  • State of Deployments

David: Everything on FP2 and FP3 is up to date and at a stable build of client helper and access point. Added Icinga monitoring of more of the services; still need to monitor ActiveMQ (can also use this to monitor messaging activity in its web console).

Paul: Tag of build?

David: Will tag this build.

    • FP2.acis
      • Status of InvertEBase setup
    • FP3.acis

David: Need to do more testing of client helper under configuration for hitting multiple endpoints.

      • State for harvest for NEVP

David: Views need to be updated. OAI-PMH provider is running, so the script should be able to harvest from there. Need to check the Mulgara harvester, filesystem based harvester script, etc., to make sure that we get what we expect (for the taxon tree into Mulgara and the occurrence data into Mongo).
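For checking what the OAI-PMH provider actually returns, a small sketch along these lines may help. It is not the project's harvester script: `harvest_identifiers` and `fetch_url` are hypothetical names, and the base URL and metadata prefix of the NEVP provider are not given in these notes, so both are left as parameters. It only walks the standard `ListRecords` response and follows `resumptionToken` paging, yielding record identifiers (including those of records flagged as deleted).

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen

# Namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def fetch_url(url: str) -> bytes:
    """Default fetcher: plain HTTP GET, no retries."""
    with urlopen(url) as response:
        return response.read()


def harvest_identifiers(
    base_url: str,
    metadata_prefix: str = "oai_dc",
    fetch: Callable[[str], bytes] = fetch_url,
) -> Iterator[str]:
    """Yield the identifier of every record the provider lists.

    `fetch` is injectable so the paging logic can be exercised
    without a live provider.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        for header in root.iterfind(".//oai:record/oai:header", OAI_NS):
            identifier = header.findtext("oai:identifier", namespaces=OAI_NS)
            if identifier:
                yield identifier.strip()
        token = root.findtext(".//oai:resumptionToken", namespaces=OAI_NS)
        if not token or not token.strip():
            break  # a missing or empty token marks the last page
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token.strip()}
```

Counting the identifiers yielded and comparing that count against what ends up in Mulgara and Mongo would be one way to confirm that the harvest gets what we expect.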

  • Morphbank integration

David: Sent mail to Greg two weeks ago, haven't heard back yet.

James: Saw Greg last week, he says the preliminary work is done, just get back in touch.

David: Will send another email, cc James.

  • Agenda items for Thursday Tech Call:
  1. Examination of response back from Scott Bundy on the NMSU result set: records NMSU 1018061, 1018062, and 1018063. Date validation odd?
  2. Error handling in workflows - observation that we can't just run the workflow on the full SCAN data set.