2011Jan18

From FilteredPush
User: Chinua Iloabachie | User: David Lowery | User: James Macklin | User: Bob Morris | User: Zhimin Wang | User: BertramLudaescher | User: James Hanken



Etherpad for meeting notes: http://firuta.huh.harvard.edu:9000/FP-2011Jan18

Reports

  1. Lei and Bertram (UC Davis)
    1. Developed an e-Bird count limit validation demo workflow and an associated number range checking actor that tests whether a bird observation count exceeds the limit for that species in that region.
    2. Documenting actors and demo workflows in kepler/curation package
    3. Read data schema mapping materials and paper (Clio: Schema Mapping Creation and Data Exchange).
  2. Zhimin
    1. With Bob and Paul, refactored the query processing diagram of the three-node network. Made a new sequence diagram for annotation insertion and notification dissemination, and a new architecture for dealing with the scalability of publishing annotations.
  3. James
    1. Produced a draft of the final project report for the FilteredPush Prototype project, circulated to Bob and Paul.
  4. Paul
    1. Working on revision of final project report for FilteredPush Prototype project.
    2. Revised taxonomic/nomenclatural status terms ontology; finished conversion of singleton classes to individuals typed to skos:Concept. http://webprojects.huh.harvard.edu/ontologies/ still needs substantial work on restrictions.
    3. Working with Linda to get approvals to release a subset of MCZ Ichthyology records to Bertram.

Agenda

  • Progress on e-Bird count workflow and discussion of quality control use cases.
  • What kind of query is supported by the network: SPARQL, Tapir, or a Tapir-like schema?
  • Progress on the three-node architecture and implications for workplan for next 3 and 6 months.
  • Requirements for bi-directional network exchange schema to local database mapping.

Meeting Notes

FilteredPush meeting notes. 2011 Jan 18.

Present: James, Bertram, Bob, Paul, Jim Hanken, Zhimin, Chinua, David.

Next week TACC will have webex available for us.

Agenda

  • Progress on e-Bird count workflow and discussion of quality control use cases.


Bertram: Lei has made progress working on an e-Bird count quality control workflow. This is an implementation based on a use case with Steve Kelling (of e-Bird fame) for data quality control.
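For illustration only, the kind of range check this workflow performs could be sketched as below. This is not the Kepler actor itself; the limits table, the species and region keys, and the function name are all hypothetical.

```python
# Hypothetical limits: (species, region) -> maximum plausible count per observation.
# The values are made up for illustration, not taken from e-Bird.
COUNT_LIMITS = {
    ("Turdus migratorius", "US-MA"): 500,
    ("Bubo scandiacus", "US-MA"): 5,
}


def check_count(species, region, count, limits=COUNT_LIMITS):
    """Return 'ok', 'exceeds_limit', or 'no_limit_known' for one observation."""
    limit = limits.get((species, region))
    if limit is None:
        # No limit is recorded for this species/region pair, so nothing can be asserted.
        return "no_limit_known"
    return "ok" if count <= limit else "exceeds_limit"
```

An observation flagged as 'exceeds_limit' would presumably be routed to a human reviewer rather than rejected outright, but that policy is an assumption here.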

Paul: Do we have a good understanding at this point of use cases for data quality control?

Bob: In the sequence diagrams we have been working on, we seem to have a good understanding of the messages that carry data, but need to work on the ones that operate on the data. We need to consider whether we support human assertions of the quality of the data, machine assertions, or both. Do we have measures or criteria for the quality of the data? One use case, the human annotating the data, is effectively the human asserting the data is high quality. We haven't put a place in the use case model for where the criteria should be hung. Perhaps we do need to go back to the use case diagrams and make sure we haven't assumed that data quality is something everyone knows.

Zhimin: There are sentences in the scenarios about the user specifying criteria for data quality (http://www.etaxonomy.org/mw/Use_Case_Scenarios).

James: There are both qualitative and quantitative assertions we can make about quality. Three general cases: quantitative, subjective, and both.

Bertram: We need to go back to the scenarios and prioritize the criteria proposed therein.

Action Item: Bob, Paul, Zhimin to meet this week and review Use Case Scenarios and Use Case for data quality control.

  • Progress on the three-node architecture and implications for workplan for next 3 and 6 months.


Bob: Substantial progress. We are about half done. We have good sequence diagrams about asking about data. We are close to having APIs specified well enough for Lei to start coding to ask questions of an FP network, that is, how a workflow should ask questions. We now need to schedule Zhimin's work on Java code for the interfaces.

Zhimin: Progress on the three-node network needs a copy of the data other than the GBIF cache to query.

Bob: Suggest we split out parts of the cache to represent local nodes.

Bob: We need to be constantly testing right now whether what we are developing is general or specific to specimen data.

  • What kind of query is supported by the network: SPARQL, Tapir, or a Tapir-like schema?


Zhimin: Tapir suffers from an issue of recursion down paths to potentially infinite depth. It also supports only a limited range of possible queries. ABCD explicitly avoids recursion and cross-referencing. In some cases you can specify that you have a fixed number of layers, but in other cases the layering isn't fixed. Example: representations of taxonomic trees (as a fixed hierarchy or as an edge representation).

Bob: Yes, we had to address this in NatureServe by setting an arbitrary depth limit. In ITIS, what is encoded in a node is its parent. Client-side code will need to be prepared to do recursion.
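A minimal sketch of the client-side walk Bob describes, assuming an edge representation in which each node stores only its parent, with an arbitrary depth limit as a guard. The table below is simplified toy data with made-up identifiers, not ITIS content, and the function name is hypothetical.

```python
# Hypothetical edge representation: taxon id -> (name, parent id).
# Toy data only; intermediate ranks are omitted and the ids are made up.
TAXA = {
    1: ("Animalia", None),
    2: ("Chordata", 1),
    3: ("Aves", 2),
    4: ("Turdus", 3),
}


def lineage(taxon_id, taxa=TAXA, max_depth=50):
    """Walk parent links from taxon_id up to the root, stopping at max_depth."""
    path = []
    current = taxon_id
    while current is not None:
        if len(path) >= max_depth:
            # Arbitrary limit: protects against cycles and unexpectedly deep trees.
            raise ValueError("depth limit reached; possible cycle or very deep tree")
        name, parent = taxa[current]
        path.append(name)
        current = parent
    return path
```

Here the client issues one lookup per level, which is the cost of not having recursion in the query language; each lookup against a remote node would be a separate request.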

Bertram: We should look at the particular questions; there may be ways to get around the problem without general support for recursion in the query language. XPath is a very simple language for querying XML, and has // as a means of working around not knowing the length of the path. There are many possible languages and technologies we could use. We should look at which are well supported by the languages we are using, and which support our particular use cases well.
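As a small illustration of the // idea, Python's standard xml.etree.ElementTree supports a limited XPath subset that includes the descendant step. The XML document and element names below are hypothetical, chosen only to show a nested tree of unknown depth.

```python
import xml.etree.ElementTree as ET

# Hypothetical nested taxonomy; the nesting depth is not known to the query.
DOC = """
<taxon name="Aves">
  <taxon name="Passeriformes">
    <taxon name="Turdidae">
      <taxon name="Turdus"/>
    </taxon>
  </taxon>
</taxon>
"""

root = ET.fromstring(DOC)

# ".//taxon" matches taxon elements at any depth below the root,
# so the query does not need to know how many levels the tree has.
names = [t.get("name") for t in root.findall(".//taxon")]

# The same step combined with an attribute predicate finds one node at unknown depth.
turdus = root.findall(".//taxon[@name='Turdus']")
```

This only works when the data is already held as a nested document; it does not by itself address recursion over an edge representation.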

Bob: This means we would need to know which nodes could support which query languages. We could give the query agent the opportunity (perhaps in a later grant) to say: if you can't answer my question in this language, answer it in German, and I'll worry about finding a translator. For the world that needs our immediate solutions, we could do worse than figure out which queries a Tapir service could support. Is Tapir a minimal expectation for a collections-based FilteredPush node?

Jim: Are there people responsible for maintaining Tapir? Task for Bob: find out if GBIF has formally published lifetime of services it operates or supports.

Bob: Yes, there is a substantial installed base of DiGIR and Tapir providers in GBIF, and GBIF has too much at stake not to support Tapir. It would be nice if GBIF would publish expectations for long-term support for particular software and technologies.

James: What is the interaction between Tapir and the DarwinCore star schema?

Zhimin: Tapir is independent of the schema; it just needs a query along a specified path in a schema space.

Bob: Without being a Tapir server you could fairly easily answer questions in the Tapir response language.

  • Requirements for bi-directional network exchange schema to local database mapping.

See the paper on Clio pointed out by Lei for data integration (www.almaden.ibm.com/cs/people/fagin/mylo09.pdf). Also Orchestra (www.cs.ucdavis.edu/~green/papers/sigmod07-demo.pdf). Multiple existing prototypes. Bertram proposes contacting several folks working in the area and seeing if they have usable software that we can build on.

Possibility of working with Chris Jordan (sp?) at TACC on data integration issues.

Bertram: It would be nice to have a paragraph describing what we are trying to do in mapping between a local schema and an exchange schema, so that we have a particular example to circulate when asking people to recommend particular systems.

Action Item: Bertram and Paul to draft this paragraph.

Bertram: DatabaseLanguage? allowing specification of mapping in both directions. They have a prototype (language for updatable views).

Relational Lenses: www.cis.upenn.edu/~bcpierce/papers/lenses-etapsslides.pdf This is part of the Harmony Project, which, it seems, produces lots of papers and an open-source prototype. Harmony: http://alliance.seas.upenn.edu/~harmony/old/index.html
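To make the bi-directional mapping requirement concrete, here is a toy sketch of a two-way field mapping between a local record and an exchange record. It is loosely inspired by the get/put idea in the lenses work but is not an implementation of Relational Lenses, Clio, or Orchestra. The local column names and function names are hypothetical; the exchange terms are Darwin Core term names.

```python
# Hypothetical mapping: local column name -> exchange schema term.
FIELD_MAP = {
    "cat_num": "catalogNumber",
    "sci_name": "scientificName",
    "coll_date": "eventDate",
}


def to_exchange(local_record, field_map=FIELD_MAP):
    """Project a local record onto the exchange schema (the 'get' direction)."""
    return {ex: local_record[loc] for loc, ex in field_map.items() if loc in local_record}


def to_local(exchange_record, original_local, field_map=FIELD_MAP):
    """Push exchange values back into a local record (the 'put' direction).

    Local fields with no exchange counterpart are carried over unchanged
    from original_local, so the round trip does not lose local-only data.
    """
    updated = dict(original_local)
    for loc, ex in field_map.items():
        if ex in exchange_record:
            updated[loc] = exchange_record[ex]
    return updated
```

A one-to-one rename like this is the easy case. The systems mentioned above are of interest precisely because real mappings involve joins, restructuring, and values that do not translate cleanly in both directions.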


James: Would be good to have a place where we can document large number of schemas in biodiversity informatics.

  • Report on wiki crash

Wiki was down for three days after a failed update; rolled back to a backup of the database. Need to recover data from around last week's meeting (meeting notes, sequence diagram, categorization of citations). The interest in maintaining the wiki is its semantic extensions.

Next meeting: 2011Jan25 (Tuesday, Jan 25)