2011Jan04
Etherpad for developing the meeting notes: http://firuta.huh.harvard.edu:9000/FP-2011Jan04
Reports
- Zhimin:
- Indexed the GBIF cache with Solr. This suggests we may want to add a search function to the demo.
- Found an open source project [[1]] working on distributed SPARQL queries based on the capacities of source nodes. Combined with the D2RQ platform, it would be possible to support (only) SPARQL queries in our network.
- Relevant to our discussion of quality control based on attribute dependencies, there is ongoing research on automatic dependency finding and checking: [[2]], [[3]]
- James:
- Jim Hanken now officially the PI for the FP project.
- New candidates for the second programming position
- Made SOME progress on hiring Zhimin and Bob's contract
- Paul:
- At NOMINA VIII, Dave Remsen asked the NOMINA team to review two controlled vocabulary documents: http://rs.gbif.org/vocabulary/gbif/taxonomic_status.xml and http://rs.gbif.org/vocabulary/gbif/nomenclatural_status.xml. The general consensus was that these needed more terms, revision, and treatment as an ontology. I've been working on a draft ontology http://www.aa3sd.net/IPNI2_svn/trunk/IPNI2/ontologies/taxonomic_nomenclatural_status_terms.owl (contact me for a username/password)
- TDWG now has a wiki page up for the Annotations Interest Group http://wiki.tdwg.org/AnnotationsIG/ (but not the charter or a presence on the TDWG Activities page).
Agenda
- Workplan (need to break this down into components and prioritize them)
- Some things to prioritize:
- What kind of query the network supports: SPARQL, TAPIR, or a TAPIR-like schema.
- Public interface (annotation web interface over GBIF cache)
- Three node network
- Network APIs
- Short term goals
- Have running/available by 2011 April 1
- Have running/available by 2011 July 1
Notes
Time to meet - MA, CA, China (next couple of months).
- Needs to be around 6PM or 7PM Eastern Time. Perhaps do occasionally as needed.
Workplan
Some things to prioritize
- What kind of query the network supports: SPARQL, TAPIR, or a TAPIR-like schema. (Agenda item for discussion next week; include Lei.)
- Public interface (annotation web interface over GBIF cache)
- Web Client
- GBIF Cache, copy running, behind Web Interface
- Requirements for hardware/storage?
- Storage of annotations?
- Sandbox only?
- Collect and retain annotations, but allow users to flag annotations as test/sandbox annotations.
- Document use case for sandbox/training for annotations. (Per user defaults? Etc.).
- Design of UI for annotating data with tens to hundreds of concepts.
- Workshop on usability.
- Three node network
- Network APIs
- Test of Google Cloud integration
- Changes received from the network (into an arbitrary database): how can we get the annotations into the operational database? (Need mapping tool.)
Short term goals
- Tomorrow:
- Estimate of disk space for firuta
- Coordinate order of disks for firuta with Anne Marie
- Next Week
- Evaluate new network architecture, examine path (incremental?) to three node architecture (This Friday).
- Establish work area for examining data mapping and transformations from local collections databases to ABCD and DwC.
- Discuss queries to the network.
- James: Finish Reports (FP prototype, Nomina)
- Have Running by 2011 Feb 1
- Upgrade Firuta Storage
- GBIF Cache up on Firuta
- Copy of IPNI running on Firuta
- Revise branding of web interface
- Add Solr search capacity to web interface.
- Bob to create additional annotation examples in AO covering the rest of our use cases.
- KeplerGooglePackage - bundle of actors and sample workflows that allow curation workflows with clustering and duplicate finding, with email notifications out to data curators as a technology demonstration.
- James: Discuss Morphbank and Annotations with the Australians
- Have running/available by 2011 April 1
- Draft Network APIs
- Message Schemas
- Annotation Ontology
- Path to update copy of GBIF cache on demonstration.
- May be non-trivial task - may not have stable IDs - ask Tim about stability of identifiers in the GBIF cache.
- Three Node network (Firuta, Davis (workstation?), UMBFP).
- Physical server for Davis herbarium from set of servers in year 3?
- Test of Virtual machine with lightweight network node. (Might be introduction to iPlant).
- Initial cut at a local schema mapping tool (linking to HUH collections, doing simple field transformations (on a subset of fields)).
- Examine FNA Data.
SPNHC in the last week of May (Web client over GBIF Cache? QC, Workflow, KeplerGooglePackage?)
Need date/location for usability workshop.
- Have running/available by 2011 July 1 (Should be targeting release 2011 June 1, for End June/Nov 6 month release schedule (with more frequent subreleases and feature freezes)).
- Three node network with higher capabilities.
- Virtual machine with network node
- Network API documentation
- Possibly initial work with DataNet, might be starting a trial node at this point.
- Draft of client library.
- Network to local schema mapping tool with more complex transformations.
- ?Test QC of FNA data? ?Test QC of GBIF data with IPNI data?
TDWG 2011: Document for Annotations interest group. FP Demonstration. Focus group/workshop for FP?/Annotations?
AGU Fall Meeting 2011 - Semantic Annotation
More Notes
Bertram:
- contact Jim Quinn re. ecological data (more widely varying data and wider set of quality problems than specimen collections data) for FP, have a 3-way call
- find time to have Lei join calls
- re. "public face": should we ask Lei to keep up her Google site or delete it?
- Currently teaching ECS-166 (Scientific Data Management), looking for ideas for homework assignments :)
- incorporating updates into collection databases:
Paul:
- simple: transformations and field mappings
- harder: multiplicity; one vs multiple entities
- global FP schema vs local DB schemas
- schema mapping tool to map between global and local
Bob: internal correlations?
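Illustration of the "simple" case above (field mappings plus per-field transformations between a local DB schema and a global schema). This is only a rough sketch for discussion, not a design: the local column names (`coll_name`, `sci_name`, `country_cd`) and the helper names are hypothetical, and Darwin Core terms stand in for the global FP schema as an assumption. It does not address the harder cases (multiplicity, one vs multiple entities).

```python
from typing import Callable, Dict, Tuple

# One rule per local column: (global term, transformation applied to the value).
FieldRule = Tuple[str, Callable[[str], str]]

# Hypothetical local column names; the global terms are Darwin Core terms,
# used here only as a stand-in for the global schema.
LOCAL_TO_GLOBAL: Dict[str, FieldRule] = {
    "coll_name": ("recordedBy", str.strip),
    "sci_name": ("scientificName", str.strip),
    "country_cd": ("countryCode", lambda v: v.strip().upper()),
}


def to_global(local_record: Dict[str, str]) -> Dict[str, str]:
    """Map a local record to global terms; unmapped columns are skipped."""
    out: Dict[str, str] = {}
    for column, value in local_record.items():
        rule = LOCAL_TO_GLOBAL.get(column)
        if rule is None:
            continue  # column has no counterpart in the global schema
        term, transform = rule
        out[term] = transform(value)
    return out


def to_local(global_record: Dict[str, str]) -> Dict[str, str]:
    """Route values (e.g. from an annotation) back to local column names.

    No inverse transformation is applied; values are copied as received.
    """
    reverse = {term: column for column, (term, _) in LOCAL_TO_GLOBAL.items()}
    return {reverse[t]: v for t, v in global_record.items() if t in reverse}
```

Example use: `to_global({"coll_name": " A. Gray ", "country_cd": "us", "internal_id": "42"})` gives a record keyed by `recordedBy` and `countryCode`, with `internal_id` dropped because it has no mapping. Keeping the mapping as data rather than code is one way a mapping tool could let each collection maintain its own table.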