2011Jan04

From Filtered Push Wiki
User: David Lowery | User: James Macklin | User: Bob Morris | User: Zhimin Wang | User: BertramLudaescher | User: Paul J. Morris | User: Chinua Iloabachie



Etherpad for developing the meeting notes: http://firuta.huh.harvard.edu:9000/FP-2011Jan04

Reports

  • Zhimin:
    • Indexed the GBIF cache with Solr. This suggests we may want to add a search function to the demo.
    • Found an open source project [[1]] working on distributed SPARQL queries based on the capacities of source nodes. Combined with the D2RQ platform, it would be possible to support SPARQL queries (only) in our network.
    • Relevant to our discussion of quality control based on attribute dependencies, there is ongoing research on automatic dependency finding and checking: [[2]], [[3]].
  • James:
    • Jim Hanken is now officially the PI for the FP project.
    • New candidates for the second programming position.
    • Made SOME progress on hiring Zhimin and on Bob's contract.
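
As an illustration of the attribute-dependency quality control mentioned in Zhimin's report, here is a minimal sketch of checking one dependency ("genus determines family") over a set of records. It is not taken from the cited research [[2]], [[3]]; the function name, field names, and sample records are assumptions made for this example.

```python
from collections import defaultdict


def find_dependency_violations(records, determinant, dependent):
    """Return {determinant value: set of dependent values} for every
    determinant value that maps to more than one dependent value."""
    seen = defaultdict(set)
    for rec in records:
        key = rec.get(determinant)
        if key is None:
            continue  # records without the determinant cannot be checked
        seen[key].add(rec.get(dependent))
    return {k: v for k, v in seen.items() if len(v) > 1}


# Hypothetical sample records; the second one contains a misspelled family.
records = [
    {"genus": "Quercus", "family": "Fagaceae"},
    {"genus": "Quercus", "family": "Fagacae"},
    {"genus": "Acer", "family": "Sapindaceae"},
]

violations = find_dependency_violations(records, "genus", "family")
```

Records flagged this way would be candidates for annotation rather than automatic correction.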

Agenda

  • Workplan (need to break this down into components and prioritize them)
    • Some things to prioritize:
      • What kind of query the network supports: SPARQL, TAPIR, or a TAPIR-like schema.
      • Public interface (annotation web interface over GBIF cache)
      • Three node network
      • Network APIs
    • Short term goals
      • Have running/available by 2011 April 1
      • Have running/available by 2011 July 1

Notes

Time to meet - MA, CA, China (next couple of months).

  • Needs to be around 6PM or 7PM Eastern Time. Perhaps do occasionally as needed.

Workplan


Some things to prioritize

  • What kind of query the network supports: SPARQL, TAPIR, or a TAPIR-like schema. (Agenda item for discussion next week; include Lei.)
  • Public interface (annotation web interface over GBIF cache)
    • Web Client
    • GBIF cache: copy running behind the web interface
      • Requirements for hardware/storage?
      • Storage of annotations?
        • Sandbox only?
        • Collect and retain annotations, but allow users to flag annotations as test/sandbox annotations.
        • Document use case for sandbox/training for annotations. (Per user defaults? Etc.).
      • Design of UI for annotating data with tens to hundreds of concepts.
      • Workshop on usability.
  • Three node network
    • Network APIs
  • Test of Google Cloud integration
  • Changes received from the network (into an arbitrary database): how can we get the annotations into the operational database? (Need a mapping tool.)
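
One way to picture the "collect and retain, but flag as test/sandbox" option with per-user defaults is sketched below. This is only an assumed data model for discussion; the class, the field names, and the `USER_DEFAULTS` table are hypothetical and not part of any existing FP schema.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    target_id: str            # identifier of the annotated record (hypothetical)
    body: str                 # the annotation text
    author: str
    is_sandbox: bool = False  # True marks a test/training annotation


# Hypothetical per-user defaults: trainees annotate in the sandbox by default.
USER_DEFAULTS = {"trainee": {"is_sandbox": True}}


def make_annotation(target_id, body, author, is_sandbox=None):
    """Create an annotation, using the author's default when no flag is given."""
    if is_sandbox is None:
        is_sandbox = USER_DEFAULTS.get(author, {}).get("is_sandbox", False)
    return Annotation(target_id, body, author, is_sandbox)


def operational_annotations(annotations):
    """All annotations are retained; only unflagged ones are treated as real."""
    return [a for a in annotations if not a.is_sandbox]


notes = [
    make_annotation("occurrence-1", "Collector name misspelled", "curator"),
    make_annotation("occurrence-2", "Practice annotation", "trainee"),
]
kept = operational_annotations(notes)
```

Keeping the flag on the annotation itself, rather than in a separate sandbox store, means training annotations stay available for review while being easy to exclude.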

Short term goals

  • Tomorrow:
    • Estimate of disk space for firuta
    • Coordinate order of disks for firuta with Anne Marie
  • Next Week
    • Evaluate new network architecture, examine path (incremental?) to three node architecture (This Friday).
    • Establish work area for examining data mapping and transformations from local collections databases to ABCD and DwC.
    • Discuss queries to the network.
    • James: Finish Reports (FP prototype, Nomina)
  • Have Running by 2011 Feb 1
    • Upgrade Firuta Storage
    • GBIF Cache up on Firuta
    • Copy of IPNI running on Firuta
    • Revise branding of web interface
    • Add Solr search capability to web interface.
    • Bob to create additional annotation examples in AO covering the rest of our use cases.
    • KeplerGooglePackage - bundle of actors and sample workflows that allow curation workflows with clustering and duplicate finding, with email notifications out to data curators as a technology demonstration.
    • James: Discuss Morphbank and Annotations with the Australians
  • Have running/available by 2011 April 1
    • Draft Network APIs
    • Message Schemas
    • Annotation Ontology
    • Path to update copy of GBIF cache on demonstration.
      • May be a non-trivial task - may not have stable IDs - ask Tim about stability of identifiers in the GBIF cache.
    • Three node network (Firuta, Davis (workstation?), UMBFP).
      • Physical server for Davis herbarium from set of servers in year 3?
    • Test of Virtual machine with lightweight network node. (Might be introduction to iPlant).
    • Initial cut at a local schema mapping tool (linking to HUH collections, doing simple field transformations (on a subset of fields)).
    • Examine FNA Data.

SPNHC in the last week of May (Web client over GBIF Cache? QC, Workflow, KeplerGooglePackage?)

Need date/location for usability workshop.

  • Have running/available by 2011 July 1 (should be targeting a 2011 June 1 release, for an end of June/Nov six-month release schedule, with more frequent subreleases and feature freezes).
    • Three node network with higher capabilities.
    • Virtual machine with network node
    • Network API documentation
    • Possibly initial work with DataNet, might be starting a trial node at this point.
    • Draft of client library.
    • Network to local schema mapping tool with more complex transformations.
    • Test QC of FNA data? Test QC of GBIF data with IPNI data?

TDWG 2011: Document for Annotations interest group. FP Demonstration. Focus group/workshop for FP?/Annotations?

AGU Fall Meeting 2011 - Semantic Annotation

More Notes

Bertram:

  • Contact Jim Quinn re. ecological data for FP (more widely varying data and a wider set of quality problems than specimen collections data); have a 3-way call.
  • Find time to have Lei join calls.
  • Re. "public face": should we ask Lei to keep up her Google site or delete it?
  • Currently teaching ECS-166 (Scientific Data Management), looking for ideas for homework assignments :)

Incorporating updates into collection databases:

Paul:

  • Simple: transformations and field mappings.
  • Harder: multiplicity; one vs. multiple entities.

Global FP schema vs. local DB schemas: schema mapping tool to map between global and local.
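
The "simple" case Paul describes (field mappings plus per-field transformations) might look roughly like the sketch below. The local column names and the transformations are hypothetical; the target names are Darwin Core terms, used here as a stand-in for whatever the global FP schema turns out to be. The harder multiplicity case (one local row mapping to several entities) is deliberately not handled.

```python
# Map each local field to (global field, transformation).
# Local names on the left are assumptions for illustration only.
FIELD_MAP = {
    "sci_name": ("scientificName", str.strip),
    "collector": ("recordedBy", str.strip),
    "coll_date": ("eventDate", lambda s: s.replace("/", "-")),
}


def to_global(local_record):
    """Translate one local record into global field names.

    Fields without a mapping are dropped; missing fields are skipped.
    """
    out = {}
    for local_field, (global_field, transform) in FIELD_MAP.items():
        if local_field in local_record:
            out[global_field] = transform(local_record[local_field])
    return out


mapped = to_global({
    "sci_name": " Quercus alba ",
    "coll_date": "1902/05/14",
    "accession_note": "not mapped",
})
```

Keeping the mapping as data rather than code is one possible design for a mapping tool, since each collection could then supply its own table without changing the translation logic.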

Bob: internal correlations?