2014May28

From Filtered Push Wiki


Etherpad for meeting notes: http://firuta.huh.harvard.edu:9000/FP-2014May28

Agenda

Non-Tech

  • SPNHC
  • James: TDWG Symposium: Who to invite
  • James: FaunaEuropaea
  • InvertEBase
  • Request for second NCE
  • iDigBio, Discussions with Greg

Tech

  • Report from Thursday call
  • Status of going live with Morphbank integration
  • QC for SCAN
    • Run on full NAU dataset, send report to Neil
      • COL in GBIF Checklist bank as authority
      • Collector names and dates of birth - short list, avoid raising error conditions if can't validate.
  • Metrics for SCAN

Reports

Notes

FilteredPush Team Meeting 2014 May 29

Present: Bob, David, Chuck, Paul, James, Tianhong, Bertram

Agenda

Non-Tech

  • SPNHC
  • James: TDWG Symposium: Who to invite
  • James: FaunaEuropaea

James: Need to schedule a call with them next week, or early July. Might be better later.

  • InvertEBase
  • Request for second NCE
  • iDigBio, Discussions with Greg

Bob: Made contact with Greg, got things started. Discussed use cases for iDigBio, they are going to document internally at iDigBio.

Tech

  • Report from Thursday call

Reviewed feedback on QC results from James and Nico. Looked at what needs to be done to get QC report on NAU data to Neil within a week.

  • Status of going live with Morphbank integration

David: Had a conference call with Greg and one of his programmers to discuss how to do deployment. Have created a branch in the morphbank repository and will check in there; they will review code and merge into branch. Also discussed how to do deployments (both morphbank and a node to support what they want to do with the crowdsourcing system); looks like the plan will be to create a puppet configuration for them. Testing for morphbank would need to get done on our side; we won't have full access, so more coordination is needed.

Discussion: There is still domain-specific entanglement in FP-Core; desirable to refactor this out into a project that can be brought in as a dependency for a deployment.

  • QC for SCAN
    • Run on full NAU dataset, send report to Neil

Tianhong: Making the changes to the actors following the feedback. Running into issues from the combinations of the actors, and from differences in the services. COL appears to be returning more than one non-homonym result from some queries.

How to handle James' comment:

> "Actor Result" : "QUESTION"; Is this the equivalent of an orange-colored cell where the result is uncertain? We may need to consider a couple kinds of "orange-colored results." One case where we ran the service but the result was "null" because no data was available and a second where the service has data but it is different than the data being assessed. Thus, the input data may be incorrect based on the comparison and the user is being asked to verify it.

Bob: "I Don't Know" is the most frequent sort of correct answer.

(1) Question - inconsistency that can't be resolved.
(2) I Don't Know - information not available.
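The two kinds of uncertain result discussed here can be sketched as a small classification step. This is only an illustration, not the actors' actual code: the names `ActorResult` and `classify`, the `CONSISTENT` value, and the convention that a service returns `None` when it has no data are all assumptions.

```python
from enum import Enum
from typing import Optional


class ActorResult(Enum):
    """Hypothetical result categories; QUESTION and I_DONT_KNOW follow the notes."""
    CONSISTENT = "CONSISTENT"    # assumed name: service data agrees with the record
    QUESTION = "QUESTION"        # inconsistency that can't be resolved
    I_DONT_KNOW = "I_DONT_KNOW"  # information not available


def classify(record_value: str, service_value: Optional[str]) -> ActorResult:
    """Compare a value from the record with what a service returned.

    service_value is None when the service ran but had no data (assumed convention).
    """
    if service_value is None:
        return ActorResult.I_DONT_KNOW
    if service_value.strip().casefold() == record_value.strip().casefold():
        return ActorResult.CONSISTENT
    # The service has data, but it differs from the data being assessed,
    # so the user is asked to verify the input.
    return ActorResult.QUESTION
```

Keeping the "no data" case separate from the "data disagrees" case means a report can treat only the latter as something the user needs to verify.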

      • COL in GBIF Checklist bank as authority

Tianhong: Coming up with queries on names that return more than one result without appearing to be homonyms. May need to write more code to distinguish.

      • Collector names and dates of birth - short list, avoid raising error conditions if can't validate.

If the actor can't find the list, put in a comment that it can't find the list and go on.

Chuck: List of names is up on wiki.

David: Have to work from some of Maureen's harvest code to do a load into solr.

James: When there is an inconsistency between date collected and collector birth/death dates, it is good to report the birth and death dates in the results.

TODO: David to look at the load of the list from the wiki into solr on FP2, with a schema splitting last name, other name bits, start year, end year. Provide endpoint to Tianhong to invoke with actor (in Akka on localhost; OK if remotely accessible, actually probably good for testing).

  • Last name - case normalization, diacritic normalization
  • Other name bits - case normalization, index for query on first initial of first name.
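The normalization and the date check described in this item might look roughly like the following. It is a minimal sketch under stated assumptions: the function names are hypothetical, the start and end years are treated as the collector's birth and death years, and a plain string is returned as the comment instead of raising an error. The real lookup would go through the solr endpoint, which is not modelled here.

```python
import unicodedata
from typing import Optional


def normalize_name(name: str) -> str:
    """Case-normalize and strip diacritics, so 'Müller' and 'MULLER' compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_marks.casefold().strip()


def first_initial(other_names: str) -> str:
    """Normalized first letter of the first given name; '' if there is none."""
    return normalize_name(other_names)[:1]


def _fmt(year: Optional[int]) -> str:
    return "?" if year is None else str(year)


def check_collecting_year(year_collected: int,
                          start_year: Optional[int],
                          end_year: Optional[int]) -> str:
    """Compare a collecting year with a collector's start/end years.

    Always returns a comment; never raises when the years are missing.
    The years are included in the comment so the report shows them.
    """
    span = f"{_fmt(start_year)}-{_fmt(end_year)}"
    if start_year is None and end_year is None:
        return "I don't know: no years on file for this collector"
    if start_year is not None and year_collected < start_year:
        return f"Question: collected in {year_collected}, before collector's years {span}"
    if end_year is not None and year_collected > end_year:
        return f"Question: collected in {year_collected}, after collector's years {span}"
    return f"Consistent: collected in {year_collected}, within collector's years {span}"
```

For example, `check_collecting_year(1950, 1860, 1932)` produces a "Question" comment that carries the 1860-1932 span, which matches James's request to report the dates when an inconsistency is found.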

  • Metrics for SCAN

David: Constructing queries to produce the data that satisfies Neil's query.

Chuck: FP-Data entry demo over NEVP with Patrick's interface. Haven't heard back from him. For integration with Rapid, it has globals that still need to be degeneralized to support more than one instance being invoked from one application (there is a global constructor that gets clobbered by one of the instances).