2013Sep25

From Filtered Push Wiki


Etherpad for meeting notes: http://firuta.huh.harvard.edu:9000/FP-2013Sep25

Agenda

  • Kepler
    • Report: Summary of discussion on Friday: Lessons from metrics.
    • Report: Date Validation (QC for core of taxon name, georeference, date collected).
    • Discussion Duplicate Finding Find_Duplicates. State of old code, Lucene, etc.
  • Report: W3C RDF Validation Workshop, report of FP participation.
  • SCAN TCN Support
    • Report: Deployment
    • Report: MCZbase Driver
  • NEVP TCN Support
    • Report: Node Status update (production deployment target date 2013 Aug 15).
    • Report: Specify-HUH driver
  • FP Infrastructure
    • Report: FP Node Refactoring
    • Report: Annotation Processor Enhancements
  • Discussion: Approaches to repeated QC requests for same records.
  • Discussion: Issue Tracking: Revisit Mantis-BT? See: CodeHosting and 2011Feb01

Non-Tech

Next Week

    • Discussion: NEVP New Occurrence record ingest process

For Future meetings

  • dwcFP and DarwinCore RDF guide - feedback.
  • Prospective meetings, development targets.
  • Burndown

Reports

  • Paul
    • With David, did some sanity checking of annotations accumulated so far in SCAN.
    • Note from John Deck, Semantic annotation abstract accepted for TDWG, 9 minutes allocated.
  • Chuck
    • Lots of little commits on AnnotationProcessor: Implement "Profiles" (just UI sugar, really) / Unapprove users / Style navigation links to give better sense of context. / Checkboxes for user roles are working.
    • Filed new bugs / clarified workplan for password reminders.
    • Was going to start on workflow UI, but David discovered that our use of JPA is not working in Glassfish: I think we know the direction forward, but I'll need help from David on it.
  • Maureen
    • Extracted non-UI code from Specify for workbench auto-mapping. Will work next on integrating extracted code into Specify Driver (small task) and then doing the upload step in Specify workbench upload process (medium task)

Notes

FilteredPush Team Meeting 2013 Sept 25

Present: Paul, Bob, Jim, Maureen, David, Chuck, James, Tianhong, Sven

Non-Tech

  • InvertENet TCN Proposal

Deadline (for us and Chicago) is Oct 18

Jim: Make a good-faith effort to cut back on the budget, but make sure that we have enough to deliver; a smaller or larger cutback on our part won't significantly affect their bottom line.

We've heard back from Petra, she's about to start tackling this again.

TODO: Paul: Follow up with Petra.

Paul: Heard back from John Deck, Semantic abstract accepted.

James: Two other abstracts got accepted.

Bob: Greg is looking for something on OA in his session.

Paul: Need to get current burndown figures.

Jim: Reminder: Please check with Mellisa before travel planning.

Tech

  • Kepler
    • Report: Summary of discussion on Friday: Lessons from metrics.

Reviewed graphs of service request times.

Maureen: multiple potential causes to clustering, not enough data here to untangle.

Paul: Approach appears to be to QC records on addition/update rather than on query, then store the results and return them on request.

Bob: Some potential pitfalls to this approach, and TCNs are continually adding data.

Paul: QC based on correlations, e.g. collector tracks being one of these.
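
A minimal sketch of the "QC on addition/update, store the result, return it on query" idea, under stated assumptions: records are plain dictionaries, a content hash decides whether a record has changed, and results live in memory. The names QCResultStore and toy_qc are hypothetical and are not taken from the FP or Kepler code.

```python
import hashlib
import json


class QCResultStore:
    """Stores QC results per record and re-runs QC only when content changes."""

    def __init__(self, qc_function):
        self._qc = qc_function
        self._results = {}  # record_id -> (content_hash, qc_result)
        self.runs = 0       # how many times QC actually executed

    @staticmethod
    def _fingerprint(record):
        # Canonical JSON so that key order does not change the hash.
        canonical = json.dumps(record, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def upsert(self, record_id, record):
        """Call on addition/update; runs QC unless the content is unchanged."""
        digest = self._fingerprint(record)
        cached = self._results.get(record_id)
        if cached is not None and cached[0] == digest:
            return cached[1]
        result = self._qc(record)
        self.runs += 1
        self._results[record_id] = (digest, result)
        return result

    def lookup(self, record_id):
        """Call on query; returns the stored result, or None if never checked."""
        cached = self._results.get(record_id)
        return None if cached is None else cached[1]


def toy_qc(record):
    # Hypothetical stand-in for a real QC workflow.
    problems = []
    if not record.get("eventDate"):
        problems.append("missing eventDate")
    return problems
```

Repeated requests for an unchanged record then cost a dictionary lookup. This does not address Bob's concern about correlation-based checks, where new records from other sources could change the verdict on a record that has not itself changed.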

    • Report: Date Validation (QC for core of taxon name, georeference, date collected).

Paul: Third leg of QC for TCN science goals.

Tianhong: Looked at Lei's code for date validation, raises some new requirements.

Paul: Two issues: Converting strings to dates/validating date formats. Then testing to see if the dates are correct.

Chuck: 10/11/12 may have different requirements for validation.

Bob: Decoding strings into a valid date (or a set of possible dates) is one (standard) sort of problem; validating dates against other data is another.

Chuck: Different users will have different needs for fussiness about date cleaning.

Paul: 10/11/12 nice case - ambiguous on its own (parse problem), may be able to pin down elements with comparison to other data, as with georeference.
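
The two steps discussed above (parse to a set of possible dates, then narrow with other data) can be sketched as follows. This is an illustration only: the three field orderings, the candidate centuries, and the function names candidate_dates and narrow are assumptions, not Lei's code or an agreed specification.

```python
from datetime import date

# Index of (year, month, day) within the three numbers, per assumed ordering.
ORDERINGS = {"MDY": (2, 0, 1), "DMY": (2, 1, 0), "YMD": (0, 1, 2)}


def candidate_dates(text, centuries=(1800, 1900, 2000)):
    """Return every calendar date that an 'a/b/c' string could denote."""
    parts = [int(p) for p in text.strip().split("/")]
    if len(parts) != 3:
        raise ValueError("expected three '/'-separated numbers")
    found = set()
    for yi, mi, di in ORDERINGS.values():
        year, month, day = parts[yi], parts[mi], parts[di]
        # A two-digit year is expanded into each assumed century.
        years = [year] if year >= 100 else [c + year for c in centuries]
        for y in years:
            try:
                found.add(date(y, month, day))
            except ValueError:
                pass  # this reading is not a real calendar date
    return sorted(found)


def narrow(candidates, earliest, latest):
    """Keep only candidates inside a range known from other data."""
    return [d for d in candidates if earliest <= d <= latest]
```

With these assumptions, candidate_dates("10/11/12") yields nine readings. Narrowing to 1909-1911 (for example, a collector's known active years) leaves only 1910-11-12, while narrowing to 1912 still leaves 1912-10-11 and 1912-11-10, which is the kind of residual ambiguity that would be passed on to a human.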

Tianhong: Looking at an occurrence data set from FP3 (NEVP). James: Verbatim event date is the verbatim field.

Tianhong: Lei's code grouped records by collector, then looked for outliers within each collector's records. Sven: What kind of outliers do we want to detect? Either the date is wrong, or the location is wrong.

Bob: Or the collector name is wrong. James: Comes down to solve with more data - flag problems that we can't solve and pass on to a human.

Bob: There may be a relatively small number of possibilities for correcting the issue - UI presenting choices.
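
A rough sketch of per-collector outlier flagging, in the spirit of the approach described above but not a reproduction of Lei's code. The record layout (id, collector, date), the median-based rule, the 40-year window, and the minimum of three records per collector are all assumptions chosen for illustration.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def flag_date_outliers(records, max_years_from_median=40, min_records=3):
    """Return ids of records whose date lies far from their collector's median date."""
    by_collector = defaultdict(list)
    for rec in records:
        by_collector[rec["collector"]].append(rec)

    limit_days = max_years_from_median * 365.25
    flagged = []
    for recs in by_collector.values():
        if len(recs) < min_records:
            continue  # too little data to say what is typical for this collector
        mid = median(r["date"].toordinal() for r in recs)
        for r in recs:
            if abs(r["date"].toordinal() - mid) > limit_days:
                flagged.append(r["id"])
    return flagged


records = [
    {"id": 1, "collector": "Collector A", "date": date(1901, 6, 1)},
    {"id": 2, "collector": "Collector A", "date": date(1903, 7, 15)},
    {"id": 3, "collector": "Collector A", "date": date(1905, 8, 2)},
    {"id": 4, "collector": "Collector A", "date": date(1907, 5, 20)},
    {"id": 5, "collector": "Collector A", "date": date(2005, 8, 2)},
    {"id": 6, "collector": "Collector B", "date": date(1850, 1, 1)},
    {"id": 7, "collector": "Collector B", "date": date(1990, 1, 1)},
]
```

A flag here only says that something about the record is inconsistent; as noted above, the date, the location, or the collector name could be the wrong element, so the flagged ids are candidates for review rather than corrections.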

TODO: James: Start making some specifications, and some real examples, put on wiki.

  • Report: W3C RDF Validation Workshop, report of FP participation.

Bob: David co-author on a paper the OA people graciously included us on. Also provided a bit more context. We mentioned in the slides that our biggest concern in validation is convincing OA to adopt our extensions, we want to make sure that any OA validation procedure is consistent with our validation mechanisms (when extended to something common). Workshop is going to propose a workshop for RDF validation (not syntax, but rule/procedures describing whether the RDF has some semantics). Looks like several years of work coming up there.

David: Our approach is similar to several others - SPARQL rules and preconditions, desire to have a common language for expressing rules. Current rule-based validators only have partial functional overlap.
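
To make the "rules and preconditions" pattern concrete, here is a small sketch in which plain Python over tuples stands in for SPARQL over a triple store. The rule shown, the Rule class, and the ex: identifiers are made up for illustration; only oa:hasBody and oa:hasTarget are property names from the Open Annotation vocabulary, and nothing here is the FP validator.

```python
def objects(triples, subject, predicate):
    """Return all objects o such that (subject, predicate, o) is in triples."""
    return [o for s, p, o in triples if s == subject and p == predicate]


class Rule:
    """A validation rule that is only evaluated when its precondition holds."""

    def __init__(self, name, precondition, check):
        self.name = name
        self.precondition = precondition
        self.check = check

    def evaluate(self, triples, subject):
        if not self.precondition(triples, subject):
            return "not applicable"
        return "valid" if self.check(triples, subject) else "invalid"


# Hypothetical rule: an annotation that has a body must have exactly one target.
single_target = Rule(
    name="annotation with a body has exactly one target",
    precondition=lambda t, s: len(objects(t, s, "oa:hasBody")) > 0,
    check=lambda t, s: len(objects(t, s, "oa:hasTarget")) == 1,
)

graph = {
    ("ex:anno1", "oa:hasBody", "ex:body1"),
    ("ex:anno1", "oa:hasTarget", "ex:specimen1"),
    ("ex:anno2", "oa:hasBody", "ex:body2"),
    ("ex:anno3", "oa:hasTarget", "ex:specimen3"),
}
```

Separating "not applicable" from "invalid" is the point of the precondition: a rule that cannot be meaningfully asked of a resource says nothing about it, rather than reporting a failure.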

  • SCAN TCN Support
    • Report: Deployment

David: Sanity check found determination pairs with only one of the pair in an annotation; need to check with Ed. Some annotations aren't created because preconditions aren't met (from determination transcriptions).

David: We aren't treating new determinations differently from transcriptions.

Paul: We can distinguish in the annotation with oa:motivation, e.g. transcribing.

    • Report: MCZbase Driver

Maureen: Pending Specify Driver work.

  • NEVP TCN Support
    • Report: Node Status update (production deployment target date 2013 Aug 15).

David: Still all ready to go.

    • Report: Specify-HUH driver

Maureen: most of the way through extracting Specify code for doing the workbench upload process, took longer than anticipated, but almost done. This approach makes driver integration more seamless for Specify users, and less fragile for maintenance. (workbench upload in two parts: mapping, upload; neither requires workbench UI functionality to be present, so it will work for HUH too)

  • FP Infrastructure
    • Report: FP Node Refactoring

David: Finishing up javadocs and tests for knowledge. Implemented triple store interface for Mulgara (can use either that or Fuseki). Integrating new knowledge API with existing jobs, started into message-driven jobs. Need to implement message status to support this.

    • Report: Annotation Processor Enhancements

Chuck: User administration, able to update user status, improved navigation. Working on glassfish/JPA issues.

David: In Tomcat, we are using a non-JTA data source and Spring is handling transactions, using JPA for transactions in Glassfish deployments. The issue is the different use of JTA in the two deployment settings.

Chuck: Finding additional issues in trying to deploy on new workstation - evaluating.

Maureen: we're keeping Mantis; it has some quirks with some browsers, but we can work around them more easily than we could move to a new system.


Next Week

    • Discussion Duplicate Finding Find_Duplicates. State of old code, Lucene, etc.
    • Discussion: NEVP New Occurrence record ingest process
    • Discussion: Approaches to repeated QC requests for same records.

Friday tech meeting agenda:

  1. where to put code that is not in JavaSOA
  2. client identity verification
  3. more on Kepler and repeated requests for QC of a record; apply analysis on harvest or on demand