2015Mar03

From FilteredPush


Etherpad for meeting notes: http://firuta.huh.harvard.edu:9000/FP-2015Mar03

Agenda

Non-Tech

  • Meeting With AnnoSys
    • Results of Planning call
    • Logistics
  • Publications
    • Paul/James: Collection Objects
    • Paul/Bob/David: QC Reports
    • Bob: Refactoring Dup finding cluster analysis
      • Bob: Access to larger scale infrastructure
    • Bob: List of additional topics
  • Schedule: Next call in two weeks.

Tech

  • Annotation Processor
  • State of Deployments
    • FP2.acis
      • Status of InvertEBase setup
    • FP3.acis
      • State for harvest for NEVP
  • Morphbank integration
  • Habitat, Phenology Ontology work.

Reports

Notes

FilteredPush Team Meeting 2015 Mar 03

Present: Bob, Paul, Tianhong, David, Bertram, James, Jim

Agenda:

Non-Tech

  • Meeting With AnnoSys
    • Results of Planning call

Bob: We worked out an agenda with them. https://docs.google.com/document/d/12KXnv4Aj1p5CAzk-YxZhSNGZ1kWAy1JcZFRHsIfJ4mo/edit

    • Logistics

Paul: Flights in train. Need to check with Melissa about how best to handle accommodations. Pushing off to start one day later. In two weeks, 17-19th March.

    • Sanity check: production FP deployments are dwcFP 2.0

David: Currently using the original version of dwcFP, in filteredpush.org/ontologies/oa/dwcFP.owl. We are planning to switch to dwcFP 2.0, filteredpush.org/ontologies/FP/2.0/dwcFP.owl. Effect of changing should be very small other than namespaces.

Bob: The terms removed for compliance with the DwC RDF Guide weren't used in production; most of production is using datatype terms, which remain in the dwc: namespace.

  • Publications
    • Paul/James: Collection Objects

James: A little more progress. Will work more in the next week.

    • Paul/Bob/David: QC Reports

Outline to Bob, David to forward again.

    • Bob: Refactoring Dup finding cluster analysis

Bob: Nothing new since last week.

    • Yes Workflow paper:

Bertram: Tim has circulated a final draft, ready to go out, last call for comments.

      • Bob: Access to larger scale infrastructure

Bob: Haven't heard back yet from Illinois or Brazil.

    • Bob: List of additional topics

Nothing yet.

  • Schedule: Next call in two weeks?

Paul: Overlaps with the AnnoSys visit.

Bob: Gone next week.

Plan: No FP call next week. FP call week after next (17th) and week after that (24th).

Tech

  • Annotation Processor

David: Started to look at the NEVP ingest code and is creating a sequence diagram based on it, to understand communication between the driver and the annotation processor. Will use this to develop business logic for the driver.

  • State of Deployments
    • FP2.acis

David: Up to date, everything running, Icinga monitoring more services.

Paul: What is involved in moving to dwcFP 2.0?

David: Reconfiguration of annotation generation (build with new configuration and rules; rules will need to be updated with namespaces), then rewrite queries. Also need to convert the old annotations to new namespaces.

Bob: Any impact on document store?

David: Document store contains the input to the generation, so we can easily regenerate RDF from the document store (for newer annotations); will have to convert older annotations. Believe all responses are tests. If there are production responses, we'll need a mechanism to retain the id for the annotations.
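A minimal sketch of the namespace conversion for older annotations, assuming annotations can be handled as plain (subject, predicate, object) string triples. The two ontology paths are the ones given in these notes; the "http://" scheme, the trailing "#", the term name hasCollector, and the urn:example identifiers are assumptions made for illustration, not the production setup.

```python
# Namespace IRIs: the paths come from the notes; the "http://" scheme and the
# trailing "#" are assumptions.
OLD_NS = "http://filteredpush.org/ontologies/oa/dwcFP.owl#"
NEW_NS = "http://filteredpush.org/ontologies/FP/2.0/dwcFP.owl#"


def convert_iri(term: str) -> str:
    """Move a term from the old dwcFP namespace to the new one; leave others alone."""
    if term.startswith(OLD_NS):
        return NEW_NS + term[len(OLD_NS):]
    return term


def convert_triples(triples):
    """Rewrite every term of every triple. Annotation ids are retained because
    they do not live in the dwcFP namespace."""
    return [tuple(convert_iri(term) for term in triple) for triple in triples]


# "hasCollector" and the urn:example identifiers are hypothetical.
old = [
    ("urn:example:annotation-1", OLD_NS + "hasCollector", "urn:example:agent-7"),
    ("urn:example:annotation-1", "http://rs.tdwg.org/dwc/terms/country", "Brazil"),
]
new = convert_triples(old)
```

Terms in the dwc: namespace pass through unchanged, which matches the expectation that the effect of the switch is small other than namespaces.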

      • Status of InvertEBase setup

David: Have turned it on, have to test that it is working properly, may need an update to the InvertEBase Symbiota instance. Submitting annotations to node on FP2 as planned.

    • FP3.acis

David: Supporting infrastructure is installed and running (except for access point), consistent with FP2. Same monitoring in Icinga as FP2. Access point isn't currently up. Working on a multiple-destination client helper (could be used to support Morphbank): a single client helper on Symbiota4, which we would like to use for both SCAN/InvertEBase and NEVP, with the client helper configured to send SCAN/InvertEBase stuff to FP2 and NEVP stuff to FP3. Needs a little more client matching in the sequence of communication. Submitting things to multiple destinations is working.
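The routing described above could be sketched as a lookup from source to destination node. This is only an illustration of the idea: the source and node names follow the notes, but the table layout, the function names, and the (source, payload) pairs are hypothetical and not the client helper's actual interface.

```python
# Hypothetical routing table: source and node names follow the notes, the
# data layout is invented for illustration.
ROUTES = {
    "SCAN": ["FP2"],
    "InvertEBase": ["FP2"],
    "NEVP": ["FP3"],
}


def destinations_for(source: str) -> list:
    """Return the nodes an annotation from `source` should be submitted to."""
    try:
        return ROUTES[source]
    except KeyError:
        raise ValueError(f"no destination configured for source {source!r}")


def dispatch(annotations):
    """Group (source, payload) pairs into one outbox per destination node."""
    outbox = {}
    for source, payload in annotations:
        for node in destinations_for(source):
            outbox.setdefault(node, []).append(payload)
    return outbox


outbox = dispatch([("SCAN", "a1"), ("NEVP", "a2"), ("InvertEBase", "a3")])
```

Each source maps to a list of nodes, so sending one source to more than one destination only needs a longer list.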

      • State for harvest for NEVP

David: Haven't run a harvest yet. Looking at what is involved, need to update the view in Symbiota. On line for end of week with next SCAN harvest.

Paul: Would be good to have the occurrences (filtered) in Solr moved to FP3. Would be good to have the Harvard list of botanists indexed in Solr to support the collecting event date QC.

David: Harvests?

Paul: Occurrences into Mongo. Filtered Occurrences into Solr for DuplicateFinding. Taxon tree into Fuseki. Agents into Solr.

David: Where to run analysis?

Tianhong: Two performance issues: (1) a large data set can take up more than available memory; (2) runtime is many hours to days.

Bertram: What bottlenecks can we handle?

Tianhong: Probably at best we can handle right now with the FP-Akka workflow. Same workflow, with different input data, previous networking benchmarks apply.

Paul: We already "optimize" network issues (BL: through caching?). Memory requirements now an issue; streaming might help!? Freeing up memory ... Might need to throttle the data reading actor to avoid buffer overflow on the channel downstream.

Tianhong: Issue with memory may be that the reader can load data faster than the downstream consuming actor can consume it.

Bertram: This is something for Tim to look at. Two proposed optimizations: (1) Rate throttling on the data loading actor to prevent it from overflowing memory. (2) Review (with benchmarking) the caching of web service calls.

Tianhong: Also we'd brought up the redundancy in data sets, same values encountered over and over. (E.g., geoRefValidator needs identical latitude, longitude, country, locality, etc. to reuse a result, so redundant calls are unlikely because the locality string varies a lot, and caching may not help much to reduce geoLocate calls.)

Bertram: Tianhong to provide a summary of these issues for Tim. Delimit the issues, the things we think are at fault, and benchmarks that we have so far.
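The two proposed optimizations can be illustrated with a small sketch. It uses plain Python threads and a bounded queue rather than FP-Akka actors, so it shows the idea only and is not the workflow's implementation: the reader blocks when the channel is full, which caps memory use, and the cache only helps when every field of the key is identical. BUFFER_SIZE, the validate function, and the sample records are hypothetical.

```python
import queue
import threading
from functools import lru_cache

BUFFER_SIZE = 4  # hypothetical cap on records held between reader and consumer
DONE = object()  # sentinel marking the end of the stream


def reader(records, channel):
    """Load records; put() blocks while the channel is full, throttling the reader."""
    for record in records:
        channel.put(record)
    channel.put(DONE)


@lru_cache(maxsize=None)
def validate(latitude, longitude, country, locality):
    """Stand-in for a remote georeference call; the body runs only on a cache miss."""
    return bool(country) and -90 <= latitude <= 90 and -180 <= longitude <= 180


def consumer(channel, results):
    """Take records off the channel one at a time until the sentinel arrives."""
    while True:
        record = channel.get()
        if record is DONE:
            break
        results.append(validate(*record))


records = [
    (42.37, -71.11, "USA", "Cambridge"),
    (42.37, -71.11, "USA", "Cambridge"),         # identical key: cache hit
    (42.37, -71.11, "USA", "Cambridge, Mass."),  # locality differs: cache miss
]
channel = queue.Queue(maxsize=BUFFER_SIZE)
results = []
worker = threading.Thread(target=consumer, args=(channel, results))
worker.start()
reader(records, channel)
worker.join()
info = validate.cache_info()  # hit and miss counts for the stand-in call
```

The third record shows the concern raised about locality strings: the coordinates and country match, but the differing locality makes it a separate cache entry.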

  • Morphbank integration

David: Nothing back yet.

  • Habitat, Phenology Ontology work.

Paul: NEVP starting to move ahead with defining controlled vocabularies for Habitat and Phenology, starting to have a discussion with them about relevant ontologies.

Bob: Probably of interest to Plazi for applications against treatment.owl ontology for taxonomic treatments. Will raise it. What has Patrick done about habitat ontologies so far?

Paul: Email Patrick, James, and myself and we can see.