2011Feb22



Note: Starting today, the regular weekly team meeting will begin at 2:45 Eastern, not 2:00 Eastern.

Etherpad for meeting notes: http://firuta.huh.harvard.edu:9000/FP-2011Feb22


Agenda

  • Annotation Ontology
  • Queries
  • Short term development targets
    • SVN to Sourceforge
    • Milestone: Public access to web interface. Due in 5 days (02/28/11)
    • Milestone: Three Node Alpha. Due in 5 weeks (04/01/11)

Reports

  • Zhimin
    • Working with Bob on architecture design
    • Continuing to evaluate Tuscany and preparing to repackage the demo the Tuscany way.
    • Working with Lei on schema mapping
    • Helping James with the FilteredPush demo
  • Paul
    • Moving forward on candidate list.
    • Anne Marie still back and forth with Dell support on disks for Firuta.
  • Lei and Bertram
    • Using the ideas behind the Clio system to generate schema mapping rules and rewrite queries for the example mapping in Arctos. This technical route seems feasible. But since we could not find a practical system and the paper doesn't cover all the details, developing such a system will take some time, and we may run into functional and performance problems.
  • Bob
    • Working with Zhimin on architecture
    • More study of Semantic MediaWiki documentation
  • James
    • Gave a presentation to colleagues at AAFC (Agriculture and Agri-Food Canada) on "Being a botanist in the 21st century" including a significant section on Filtered Push.
    • Worked with Zhimin to work out bugs in the web interface.

Notes

2011 Feb 22 FilteredPush Team Meeting.

Present: Bertram, Lei, James, Jim Hanken, David, Zhimin, Bob, Paul.

   * Annotation Ontology 


Meeting with Paulo next week (Monday). Bob rearranged the web page about the annotation ontology to reflect changes. AO is just out of a 1-year incubation and is starting on the W3C standardization process. Bob is making minimal extensions to make it useful for data.

   * Queries 


Good time to formulate a list of domain scientist questions for (annotatable) pooled data.

    What questions would one ask of a large body of occurrence data?
    How would those questions vary if the data were vouchered specimen data or occurrence data?
    How would questions vary if the data quality is known or not known? What kind of confidence does one have in specimen data, and what kind of confidence is possible for observational data?

James: Concept of sampling and bias: Many collectors are sampling in a biased way (in order to find their organisms).

Bob: Are there systematic patterns that probabilistic models can account for?

James: Taxonomic researchers are often looking for variability, in contrast with observational data, which is often survey (time and space) oriented?

Bob: Perhaps ask about specimen data first, then as a separate enterprise ask about observational data.

Bertram: Involve the Davis team in AO work? Probably a good idea to avoid requirements in the ontology to do reasoning, and to limit it to effectively precompiled concept hierarchies. Also: link the questions by Jim and James to the ontology (which user queries need the ontology, and how do they need it?)

Bob: Started with AO, haven't looked hard at its limitations. The goal is to see if annotations would fit into something that is likely to be standardized. Good to see what Paulo has thought about the complexity of reasoning. http://etaxonomy.org/svn/FP/FP-Network/trunk/design/ontologies/ao/aod.owl


Mapping:

Lei: Clio mapping of Arctos data concepts. Document on Clio as a mediator to rewrite user queries to local schemas: "Data Schema Mapping in Filtered-Push Project". Specific Clio examples of local-global (DWC) mappings.
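To make the mediator idea concrete, here is a minimal sketch of rewriting a query on a global term into a query on a local schema. It is not Clio and does not reflect the actual Arctos schema: the table and column names, the `LocalColumn` type, and the `rewrite_selection` function are all hypothetical, and only single-term equality selections are handled. A real mediator would also have to deal with joins, value transformations, and terms that map to several local fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalColumn:
    """A column in one provider's local schema (names are hypothetical)."""
    table: str
    column: str


# Assumed mapping from global Darwin Core terms to local columns.
# The local table and column names are invented for illustration only.
DWC_TO_LOCAL = {
    "scientificName": LocalColumn("identification", "scientific_name"),
    "catalogNumber": LocalColumn("cataloged_item", "cat_num"),
    "recordedBy": LocalColumn("collector", "agent_name"),
}


def rewrite_selection(term, value, mapping=DWC_TO_LOCAL):
    """Rewrite 'global term = value' into a parameterized local SQL query.

    Table and column names come only from the trusted mapping; the
    user-supplied value is passed as a bound parameter.
    """
    if term not in mapping:
        raise KeyError(f"no local mapping for Darwin Core term {term!r}")
    target = mapping[term]
    sql = f"SELECT * FROM {target.table} WHERE {target.column} = ?"
    return sql, (value,)
```

Calling `rewrite_selection("catalogNumber", "12345")` yields a query against the hypothetical `cataloged_item` table with the value bound as a parameter; an unmapped term raises `KeyError` instead of silently returning nothing.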

Bertram: Tough for us to reinvent the (complex) wheel on mediation. Not easy to find an off the shelf query mediator that could do the job for us.

To what extent do we need a mediator approach as opposed to a catalog/warehouse approach?

Bob: This at best is a configuration question. Some people using a FP network might have a requirement to have distributed data, others might use a centralized instance.

Paul: Two kinds of queries: (1) SQL-style selections, where a warehouse may be natural; (2) updates via annotations, e.g. simple corrections: here is a new value for field X; a new identification for a specimen; add this new info.

Bob: Garrison Keillor on Unitarians: 6 commandments and 4 recommendations. Responsibility for mapping is in the client, and the client may take whatever is handed to it wholly as a recommendation.

Bertram: Generate updates in response to an annotation; craft some examples of update scripts. What kinds of issues come up when we try to do such updates?

Bob: The resulting database should have some provenance about why the record was changed.

Bertram: Good to have some undo ability, unwinding local changes. Interesting to review patterns of accumulation of provenance information over time.

Bertram: Desire specific examples for annotations that would trigger update queries (providing understanding of

   * Short term development targets


Bob has started cleaning the UMB svn repository and checking the licensing statements in the code; probably a week or two of work to finish.


   * SVN to Sourceforge 
   * Milestone: Public access to web interface.  Due in 5 days (02/28/11)  


James: Ask James Cuff about temporary storage while waiting for Firuta.

   * Milestone: Three Node Alpha. Due in 5 weeks (04/01/11)


Two nodes so far: UMBFP and Firuta. Third node? Bertram: It would be good to have one node here at UC Davis, so Lei could work with it directly, if a machine is available there. In the short term, we could "scavenge" an existing machine, but having a dedicated server later might be good, once we put some real data on it.

Bob: One question needs to be settled before starting on the APIs: how does the client obtain answers to queries?

About 3 weeks to write the APIs from the documents at this point. About a month beyond that for code.

Zhimin: Can have dumb components early on, add more complex later.

Bertram: Time for tech meeting (Friday?): either 3:15pm EST (= 12:15pm PST, to avoid a conflict with Bertram's group meeting) or 12pm EST (= 9am PST).

Fridays 3:30 EST, 12:30 PST for tech meeting.

Jim: Good to have a current FP poster.

We just need to revise.

Bob: We just gave Dave V. a couple of good overview slides for DataOne.