2011Feb08

From FilteredPush

Etherpad for compiling meeting notes: http://firuta.huh.harvard.edu:9000/FP-2011Feb08


User: Bob Morris | User: Zhimin Wang | User: BertramLudaescher | User: James Hanken | User: Paul J. Morris | User: James Macklin



Agenda

  1. Progress on mapping. Limitations.
  2. Possibility: Use Clio's schema mapping language and implement a subset of functionality, incrementally adding broader support over the course of the project?
  3. James Macklin status of FNA and Hong Cui.
  4. SourceForge
  5. Review coding standards


Reports

  • Lei and Bertram
    • Read papers on Orchestra. It extends Clio's data mapping idea. Its support for update propagation and provenance might be very useful for us.
    • Read papers about data integration, especially query rewriting. "Constraint-Based XML Query Rewriting for Data Integration" (SIGMOD 2004) introduces a query rewriting algorithm based on Clio’s mapping rules which might be useful for us.
    • Discussed with Zhimin the issues we need to consider and the academic ideas and techniques we might use.
  • Zhimin: Reread several papers on schema mapping and worked on formalizing the schema mapping requirements.
  • James:
    • Cannot currently use the EtherPad application as the port is blocked. I will put in a request to have it unblocked but this could take some time...
    • Made contact with colleagues here who are informatics minded and will present a seminar on Feb. 21 where FP will be featured prominently.
    • I will attend a two-day meeting in Ottawa to discuss and write a major research proposal to build on the Canadensys Project. I think there is a great opportunity for the Filtered Push network to be used to aid with capture, quality control, and promoting research use of the data.
    • I will look into the WebEx equivalent which is used here for collaborative work. Maybe if I control the meeting we can at least fix it if we have issues...
  • Paul
    • Created a SourceForge project for FilteredPush: https://sourceforge.net/projects/filteredpush/
      • Filed a bug in Trac for migrating the code repository from UMB to SourceForge: https://sourceforge.net/apps/trac/filteredpush/ticket/1
      • Both SVN and Git are available. The plan is to migrate the existing repository to SVN; we shouldn't commit to it until Bob has migrated the repository.
      • Bugs can go into Trac: https://sourceforge.net/apps/trac/filteredpush/
      • Still need to set up the SVN-Trac integration.
      • MediaWiki, Codestriker (for code review), and dotProject (for project management) are all enabled.
      • Mailing lists exist for filteredpush-announce and filteredpush-commit.
    • AnneMarie has heard back from Dell, but thus far just with a message that the question is being pushed to an engineer. It isn't clear if firuta can accept disks of different sizes or whether they all need to be the same size.
  • Bob (reporting from Karlsruhe):
    • Further work on the suitability of a small extension of Paolo Ciccarese's Annotation Ontology (AO), named AOD (Annotation Ontology for Data; though maybe DAO would be more mnemonic and more charming, since DAO is an alternative spelling of TAO). Refactored the "Correct a spelling error" annotation example so that it stands more or less alone, i.e. it imports AOD rather than being part of it. Protege is very unfriendly to managing Individuals; it mainly wants to model classes and properties. One consequence is that whenever you make an assertion about an Individual A that uses a property P from an imported ontology, the resulting serialization always(?) includes a declaration of that property. In real data you probably wouldn't have or need this, and for these examples it is more annoying than important. Alas, if you construct the examples with another application, e.g. an editor, and neglect to declare the properties, the Protege OWL input parser will not accept them, even though there is an import statement. No doubt reasoners are not trying to model data as an ontology, whereas Protege is, so this is mainly a pain for constructing one-off examples.
    • Refactored the Wiki Ontology page in preparation for more discussion of DAO (the ontology formerly known as AOD).
    • Second example (annotating an image) to provide a taxonomic determination is also there. Third one (not there yet) will be the same determination example, but instead modeled as an annotation of Audubon Core image metadata, rather than annotation of the image data. Many questions for Paolo.
      See AOD_Extension_of_AO_for_Data for links to the files. Will probably put them on the wiki when the third one is done. Suggestions on that page for annotation scenarios radically different from annotating a DwC record are welcome.

Notes

Present: Zhimin Wang, Jim Hanken, James Macklin, Bob Morris, Paul Morris, Bertram Ludaescher

1. Progress on mapping. Limitations.

Bertram: Clio: talked with Lucian Popa at IBM; Clio is not freely available. Talked with several colleagues about possibilities (Orchestra, other tools). Data warehouse possibility: ingest minimal data into a warehouse and keep annotations on the warehouse. There are several things in the academic community, but none appear production ready.

Bertram: Lei was told her visa was approved; should be back next week.

Bob: Are we talking here about use cases that involve specimens?

Bertram: Same/similar question: Are we developing an integration solution for the specimen dbs (w/ or w/o annotation)?

WebEx disconnected.

Zhimin: See http://etaxonomy.org/mw/Use_Case_Scenarios#Scenarios_for_schema_mapping for examples.

Bob: Where in the architecture does this problem arise?

Paul: At the boundaries: movement of annotations into curators' data stores.

Bob: Didn't sleep on the plane...

What are the requirements for allowing all the users of a particular exchange schema to map their data onto that schema?

Bertram: Example?

Paul: Collector concept to CollectingEvent-Collector-Agent relationship, with single concept mapped to multiple rows in tables.

Discussion: Specify6, MCZbase, Arctos, mapping. Plan: make a prototype mapping elements of Darwin Core onto a subset of a complex schema, and explore limitations and options.
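Paul's Collector example (one flat concept fanning out into several rows) can be illustrated with a small hypothetical sketch of the kind of prototype mapping discussed. It assumes a pipe-separated dwc:recordedBy value and invents simplified Agent, Collector, and CollectingEvent classes; these are illustrative only and are not the actual Specify6, MCZbase, or Arctos schemas. The helper names mapRecordedBy and findOrCreateAgent are likewise hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one flat Darwin Core value (dwc:recordedBy) becomes
// several rows in a normalized CollectingEvent -> Collector -> Agent structure.
// Class and field names are illustrative only, not any real collection schema.
public class CollectorMapping {

    // Target Agent row: one person.
    static final class Agent {
        final int agentId;
        final String name;

        Agent(int agentId, String name) {
            this.agentId = agentId;
            this.name = name;
        }
    }

    // Target Collector row: links an Agent to a CollectingEvent, preserving order.
    static final class Collector {
        final int collectingEventId;
        final int agentId;
        final int orderNumber;

        Collector(int collectingEventId, int agentId, int orderNumber) {
            this.collectingEventId = collectingEventId;
            this.agentId = agentId;
            this.orderNumber = orderNumber;
        }
    }

    // Target CollectingEvent row together with its child Collector rows.
    static final class CollectingEvent {
        final int collectingEventId;
        final List<Collector> collectors = new ArrayList<>();

        CollectingEvent(int collectingEventId) {
            this.collectingEventId = collectingEventId;
        }
    }

    // Maps a pipe-separated recordedBy string onto one CollectingEvent with
    // one Collector row per name; new Agent rows are appended to agentTable.
    static CollectingEvent mapRecordedBy(int eventId, String recordedBy, List<Agent> agentTable) {
        CollectingEvent event = new CollectingEvent(eventId);
        int order = 0;
        for (String raw : recordedBy.split("\\|")) {
            String name = raw.trim();
            if (name.isEmpty()) {
                continue; // skip blank segments, e.g. the middle one in "A. Smith | | B. Jones"
            }
            Agent agent = findOrCreateAgent(agentTable, name);
            event.collectors.add(new Collector(eventId, agent.agentId, order++));
        }
        return event;
    }

    // Naive agent matching by exact name; real data needs proper agent resolution.
    static Agent findOrCreateAgent(List<Agent> agentTable, String name) {
        for (Agent existing : agentTable) {
            if (existing.name.equals(name)) {
                return existing;
            }
        }
        Agent created = new Agent(agentTable.size() + 1, name);
        agentTable.add(created);
        return created;
    }

    public static void main(String[] args) {
        List<Agent> agentTable = new ArrayList<>();
        CollectingEvent event = mapRecordedBy(42, "A. Smith | B. Jones", agentTable);
        for (Collector c : event.collectors) {
            System.out.println("event " + c.collectingEventId + ", collector #" + c.orderNumber
                    + " -> agent " + c.agentId);
        }
    }
}
```

The exact-name agent lookup is the weakest point of such a mapping and is one of the limitations a prototype would need to explore.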

2. Possibility: Use Orchestra's schema mapping language and implement a subset of functionality, incrementally adding broader support over the course of the project?

3. James Macklin status of FNA and Hong Cui.

4. SourceForge https://sourceforge.net/projects/filteredpush/

5. Review coding standards: http://etaxonomy.org/mw/Production_Coding_Standards

6. Brief report on AO. See http://etaxonomy.org/mw/2011Feb08#Reports

Bob: Hope 6 can go first as I'm not sure how much longer I can stay awake and have to review my notes on Guido's thesis before I go to bed. Not much sleep on flight.

Bob: Re 4. I will move the repo next week.

Bob: Re 5. I'd like every file to have its svn $Id$ accessible programmatically, e.g. with a get from a class static variable. This should be done in a standard way that we need to discuss (not at this meeting; maybe on the discussion page for Production_Coding_Standards).
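One possible shape for this, sketched in Java, assuming the file has the svn:keywords property set to include Id (e.g. via `svn propset svn:keywords Id SvnIdExample.java`), so that Subversion expands the keyword to a string containing the file name, last-changed revision, date, and author. The class name and the names SVN_ID, getSvnId, and isExpanded are hypothetical placeholders, not an agreed convention; the actual standard is still to be discussed.

```java
// Hypothetical convention for exposing this file's Subversion Id keyword at
// runtime. Assumes svn:keywords includes Id for this file; without that
// property, the literal below is never expanded.
public class SvnIdExample {

    // Subversion rewrites this literal in the working copy when keyword
    // expansion is enabled for the file.
    public static final String SVN_ID = "$Id$";

    // Accessor so other code (e.g. a version or "about" report) can read the value.
    public static String getSvnId() {
        return SVN_ID;
    }

    // True once Subversion has expanded the keyword. The comparison string is
    // built by concatenation so that Subversion does not expand it as well.
    public static boolean isExpanded() {
        return !SVN_ID.equals("$" + "Id" + "$");
    }

    public static void main(String[] args) {
        System.out.println(getSvnId() + (isExpanded() ? "" : " (not expanded)"));
    }
}
```

A static final constant plus a static getter keeps the value readable both by code in the same class and by external tools via reflection; whether to prefer a constant, a method, or an annotation is part of the discussion still to be had.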


What's the plan?

We'll need a list of phone numbers as a backup....