Etherpad for meeting notes: http://firuta.huh.harvard.edu:9000/FP-2014Jun04





Agenda: Non-Tech

  • Kurator

Bertram: Kurator got recommended for funding. Need to provide an abstract. Need to let the program officer know that we'd like to change the start date to Sept 1. Moving to UIUC, so will need to do a pre-award transfer from UC Davis to there. Looking for some more information after that.
Paul: TDWG 2014 is scheduled for Nairobi, Kenya; no information yet on venues after that.
Bob: @Bertram: do you know what, if any, outstanding reports are blocking it? I don't recall what, if any, role I have on the Kurator grant (Paul: software engineering, 8% in years 1 and 2), and whether, given that role, a report due on a UMB grant would block it. @Paul: as Harvard employee (yes) or as consultant to Bertram? My guess is that the only blocking would occur if I were Senior Personnel, or maybe only as PI/CoPI. In any case, I hope to get the UMB report out of the way early next week.
Jim: Abstract draft is up on a Google Doc.
Bertram: Likely need a budget adjustment to reflect the move to UIUC.
Paul: Kristin also needs to adjust it to reflect the change in start date.
Jim: Adjust the budget to reflect current reality and let NSF respond to that.
Jim: Must have the FP report done and filed by the end of June; traveling after that. Documents are up on Google Docs.
Paul: Propose getting the FP report done by June 11 (this meeting next week).
Bertram: I like that.
James: Looking at namespaces: grab a domain name for Kurator (Kurator.net is open, but will probably cost something; Kurator.org is an art/science site). wiki.kurator.? on the biowiki farm. The instructions for the abstract in the Google Doc note that it is good to have somewhere to point people to. There are several other things in the instructions we haven't filled in yet.
Bertram: Some sensitivity with naming. Kepler-project.org
James: I like kurator-project.org as a parallel.
Bertram: Need to generalize the abstract some more.


Paul: Need to make travel arrangements.

  • CNH Meeting: Find duplicates talk (Friday, June 13)

James: Giving a talk on find duplicates at CNH - could talk through a quick video, or screenshots. Paul: I'll provide screenshots and some text, then we can talk. Paul: See: http://wiki.filteredpush.org/wiki/FP-DataEntry as a starting point. David: I could put a demo up on FP1 as well (NEVP one).

  • James: TDWG Symposium. Travel?
  • James: FaunaEuropaea

James: Wrote them, but haven't heard back yet.

  • InvertEBase
  • Request for second NCE

Paul: Looks like everything is in order on both ends. Bertram: Admin here is looking for a brief justification document.

  • iDigBio, Discussions with Greg

Paul: David is talking with one of Greg's folks later today.

Agenda: Tech

  • Report from Thursday call

David: Discussed moving forward with analysis and getting a view of the results.

  • Status of going live with Morphbank integration

David: Michael O. from Greg's group has been in contact. We have a call later today to coordinate. I have access to the Morphbank Subversion repository on SourceForge and am checking code into a branch there.

  • QC for SCAN
    • Run on full NAU dataset, send report to Neil

Tianhong: Numerous changes to the Akka workflow. Tested on NAU data of different sizes in fp2/mongo:

    query        size   collection name
    year:1944       4   NAU1944
    year:1952      16   NAU1952
    year:1958     146   NAU1958   (Checklistbank down during rerun)
    year:1966    1079   NAU1966

Paul: CoL (Catalogue of Life) also has a web service: http://webservice.catalogueoflife.org/col/webservice

Issues:
1. Can't distinguish homonyms/synonyms in some cases.
   Paul: We should probably ask Nico about the scarab case.
   Tianhong: Can usually resolve if the author is available, can't if it isn't.
   Paul: Reasonable to flag as a QC issue if the name is ambiguous and needs an authorship to resolve the ambiguity. That is, more than one result where we can't tell which one applies is a case to flag.
2. GBIF Checklistbank is not stable and is down for now.
   Paul: Suggests a requirement for Kurator: a workflow can provide metadata on the services it requires, and the workflow or an upstream actor can use that metadata to test the prerequisites.
   Bertram: Concept of dry run in Kepler. Also provide technology support for alternative services (if A isn't available, use B).
   Bertram: Fits with Tianhong's research on workflow design and planning: one can think of an "I'm alive" check as a constraint during workflow design/assembly, or more importantly, during the "set-up phase". We currently have no explicit support for this in Kepler, I think, but we probably need it for Kurator.
3. Some records are missing in the result (<1%).
   Tianhong: Not clear what is going on here. Not all records in the input are found in the result.
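Paul's requirement (workflow metadata on required services) and Bertram's "if A isn't available use B" idea can be sketched as a set-up-phase check. This is a minimal Python sketch, not Kepler or Kurator code: `ServiceRequirement`, `check_prerequisites`, and the endpoint labels are hypothetical, and the liveness probe is passed in as a function rather than assuming any particular HTTP client.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Sequence


@dataclass
class ServiceRequirement:
    """Hypothetical metadata a workflow could publish about one required service.

    `name` is a logical role (e.g. "taxon-authority"); `endpoints` lists
    alternative services for that role in order of preference.
    """
    name: str
    endpoints: Sequence[str]


@dataclass
class SetupReport:
    chosen: dict = field(default_factory=dict)   # role -> first endpoint that answered
    missing: list = field(default_factory=list)  # roles with no live endpoint


def check_prerequisites(
    requirements: Sequence[ServiceRequirement],
    is_alive: Callable[[str], bool],
) -> SetupReport:
    """Set-up phase: probe each role's endpoints in order and keep the first live one.

    `is_alive` is injected so the probe can be an HTTP ping, a cached status,
    or a stub in tests; this sketch does not assume any particular probe.
    """
    report = SetupReport()
    for req in requirements:
        live: Optional[str] = next((e for e in req.endpoints if is_alive(e)), None)
        if live is None:
            report.missing.append(req.name)
        else:
            report.chosen[req.name] = live
    return report
```

For example, with `ServiceRequirement("taxon-authority", ["gbif-checklistbank", "col-webservice"])` and a probe that reports Checklistbank as down, the report picks `col-webservice`. Running such a check before any records are processed would surface an outage like the Checklistbank one once, up front, rather than as per-record failures partway through a run.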

      • CoL in GBIF Checklistbank as authority

Mostly done (see results above); the remaining issue is mixed homonyms and synonyms.
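The flagging rule Paul proposed for the homonym/synonym cases (more than one candidate and no authorship to choose between them) could look roughly like the sketch below. `NameMatch`, `resolve_name`, the flag strings, and the placeholder names are hypothetical, and exact comparison of authorship strings is a simplification, since real authorships vary in format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NameMatch:
    """One candidate returned by a name lookup (hypothetical shape)."""
    scientific_name: str
    authorship: str
    status: str  # e.g. "accepted" or "synonym"


@dataclass
class QCResult:
    match: Optional[NameMatch]
    flag: Optional[str]  # None when the record passes


def resolve_name(candidates: List[NameMatch], authorship: Optional[str]) -> QCResult:
    """Pick a single candidate, or flag the record when the name is ambiguous."""
    if not candidates:
        return QCResult(None, "name not found in authority")
    if len(candidates) == 1:
        return QCResult(candidates[0], None)
    # Several candidates share the name: only the authorship can disambiguate.
    if authorship:
        wanted = authorship.strip().lower()  # naive normalization (assumption)
        hits = [c for c in candidates if c.authorship.strip().lower() == wanted]
        if len(hits) == 1:
            return QCResult(hits[0], None)
    # No authorship, or it did not single out one candidate: flag, don't guess.
    return QCResult(None, "ambiguous name: authorship needed to resolve")
```

The design choice is to return a flag instead of picking a candidate, which matches the point above that an unresolved ambiguity is itself a QC finding.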

      • Collector names and dates of birth - short list, avoid raising error conditions if can't validate.

Paul: Chuck provided an updated list just before he left (sent by email, as the wiki was read-only at that point). David: Working on getting Solr running to index these.
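One way to honor "avoid raising error conditions if can't validate" is to make "not validated" a normal outcome rather than an exception. The sketch below assumes, hypothetically, that the short list maps normalized (trimmed, lowercased) collector names to dates of birth and that the check compares the collecting date against that birth date; `Outcome`, `check_collector_date`, and the sample names are illustrative only.

```python
from datetime import date
from enum import Enum
from typing import Dict, Optional


class Outcome(Enum):
    CONSISTENT = "consistent"
    INCONSISTENT = "inconsistent"    # collected before the collector's birth
    NOT_VALIDATED = "not validated"  # no reference data: reported, not an error


def check_collector_date(
    collector: Optional[str],
    event_date: Optional[date],
    births: Dict[str, date],
) -> Outcome:
    """Compare a collecting date with the collector's date of birth, if known.

    `births` stands in for the short reference list; its keys are assumed to be
    already trimmed and lowercased, and the lookup normalizes the same way.
    """
    if not collector or event_date is None:
        return Outcome.NOT_VALIDATED
    born = births.get(collector.strip().lower())
    if born is None:
        # Collector not on the short list: nothing to check against.
        return Outcome.NOT_VALIDATED
    return Outcome.CONSISTENT if event_date >= born else Outcome.INCONSISTENT
```

Because most collectors will not be on a short list, most records would come back `NOT_VALIDATED`; keeping that distinct from `INCONSISTENT` avoids reporting missing reference data as a problem with the record.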

  • Metrics for SCAN

David: Created a query to show, for each taxonomic expert and each of their interests, how many occurrence records need ID. Running into issues displaying the determinations made by a particular expert, because the determiner is tracked only as a text field (whose content may be formatted differently from the user's name). Working with Ed to supplement the determination table and UI so they collect the information needed for accurate counts.
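The two counts described above can be sketched over a hypothetical in-memory model (`Occurrence` with `taxon_group`, `needs_id`, and `determiner_uid`) rather than the actual SCAN database schema. The second function assumes the determination table gains a structured link to a user ID, the kind of supplement being discussed with Ed; counting on that key avoids matching free-text determiner names. All names here are illustrative.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Occurrence:
    taxon_group: str                      # group the record falls under (e.g. a family)
    needs_id: bool                        # still awaiting a determination
    determiner_uid: Optional[int] = None  # hypothetical structured link to a user


def needs_id_counts(
    interests: Dict[str, List[str]], records: List[Occurrence]
) -> Dict[Tuple[str, str], int]:
    """Count records still needing ID for each (expert, interest) pair."""
    pending = Counter(r.taxon_group for r in records if r.needs_id)
    # Counter returns 0 for groups with no pending records.
    return {
        (expert, group): pending[group]
        for expert, groups in interests.items()
        for group in groups
    }


def determinations_by_user(records: List[Occurrence]) -> Counter:
    """Count determinations per user via the structured ID, not the text field."""
    return Counter(r.determiner_uid for r in records if r.determiner_uid is not None)
```

Usage: `needs_id_counts({"expert_a": ["Carabidae"]}, records)` yields one count per expert interest, including zero for interests with nothing pending.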

  • Roadmap

Paul: The most significant critical-path concern is the annotation processor/driver. David: Need to give some attention to making deployments more efficient. Need to add more monitoring in Icinga to confirm that things are in the expected states as deployment proceeds. Also need to do some tuning in Maven. This should free up time. Paul: Good target for reducing technical debt. Paul: David and I should meet (Friday) to review the NEVP and SCAN diagrams and the roadmap.

For Thursday: Examine the 4 NAU result sets (above) in mongo.