2011Jun28

From Filtered Push Wiki


Etherpad for meeting notes: http://firuta.huh.harvard.edu:9000/FP-2011Jun28


Agenda

  • Caution on BOUML and SVN.
  • Goals for TDWG meeting.
  • Brief summary of architecture review.
  • Role of RDF/OWL in architecture.

Reports

  • Paul
    • Began review and cleanup of AOD ontology examples.
  • Lei
    • Finished coding part of the Kuration package and began documentation
  • Maureen
    • Got Specify 6.3 up and running
  • Zhimin
    • Refactoring network architecture
  • Bob
    • More UML review
    • Preliminary configuration of Specify SourceForge Trac; further discussions with the Specify team about the same. SourceForge does not support adding fields to tickets, so there was some talk of using a local Trac.

Notes

Filtered Push Team Meeting 2011 June 28

Present: Maureen, Paul, Bob, David, Zhimin, James, Jim, Lei, Tim

  • Caution on BOUML and SVN.

BOUML and SVN: we have confirmation that BOUML is not friendly to collaborative editing of files. It caches some of the files it checks out, which causes synchronization problems. If you change the names of UML objects, they are properly changed everywhere the names are used; however, that can lead to people overwriting each other's updates. There was an instance of this that required Zhimin to re-do some work. Checking out is not a problem; committing is.

The procedure is to run svn operations only while BOUML is closed: do an svn update, open BOUML, make changes, close BOUML, then do another svn update and commit. Even that is not enough if two or more people are working on the same diagram.

The problem arises if you have BOUML open when you do an svn update. If you then save your work in BOUML, it will overwrite your svn updates with cached data, because it does not check whether the filesystem has newer information than what is in memory.

The way to avoid this is to not have BOUML open when svn updates and commits are done. If you are working on the BOUML project, please stay in contact with the rest of the group.
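The rule above could be enforced with a small wrapper around svn. This is only a sketch under assumptions: it guesses that the BOUML process name contains "bouml", it uses `ps` (so POSIX systems only), and the function names are made up for illustration. It does not solve the case of two people editing the same diagram.

```python
import subprocess


def bouml_is_running(process_names):
    """Return True if any process name contains 'bouml' (case-insensitive)."""
    return any("bouml" in name.lower() for name in process_names)


def list_process_names():
    """List the command names of running processes using `ps` (POSIX only)."""
    out = subprocess.run(
        ["ps", "-A", "-o", "comm="], capture_output=True, text=True, check=True
    )
    return [line.strip() for line in out.stdout.splitlines() if line.strip()]


def guarded_svn(svn_args, process_names=None, run=subprocess.run):
    """Run `svn <svn_args>` only when no BOUML process appears to be running.

    process_names and run can be injected for testing; by default the real
    process list and subprocess.run are used.
    """
    names = list_process_names() if process_names is None else process_names
    if bouml_is_running(names):
        raise RuntimeError(
            "BOUML appears to be open; close it before running svn "
            + " ".join(svn_args)
        )
    return run(["svn", *svn_args], check=True)
```

Typical use would be `guarded_svn(["update"])` before opening BOUML, and `guarded_svn(["update"])` followed by `guarded_svn(["commit", "-m", "..."])` after closing it.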

  • Goals for TDWG meeting.

How to approach TDWG: there are two separate tracks and three issues. One track is the annotation ontology track; Bob and Paul will work on that. The other track is the presentation of the project as a whole. The question for James and Jim is what we want to do overall. Tied to that is the technical issue of what the target goal is. On the table is an updated version of the SPNHC demo.

Last time we also discussed how, or whether, the Davis tools could be used to provide a service to the FP network. This reverses the nature of the SPNHC demo, and shows Kepler's engine running within the FP network, rather than FP being a service used by Kepler.

Since TDWG sessions are planned by leaders, we have to find a way to fit into one of those sessions. The only way to show the demo without it seeming rigged is to use the SPNHC demo approach.

We are only a subset of the members of the Annotation Interest Group. We should convince the rest of them that we need an Annotations session. We should remember that there are other approaches to annotations, and we should be pushing ways for annotations to be interoperable rather than pushing our own needs for AO. Perhaps if someone else, the Australians or BiSciCol maybe, feels comfortable showing uses of AO, we could demonstrate interoperability with them.

Paul is cleaning up some of the annotation examples to make them more clearly a part of AO. One of the kinds of annotations Bob hopes to get to this week involves asking a triplestore via SPARQL what its vocabulary is, to demonstrate that you need not know that information in advance.

A demo of interoperability might be to begin by asking the target source what its vocabulary is. We would have to ask the other annotation group members if that would involve a lot of work to support and whether they are interested.
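As an illustration of asking a source for its vocabulary, the sketch below builds two generic SPARQL queries (the distinct predicates and the distinct classes in use) and the GET URL that the SPARQL Protocol would use to send one of them. The endpoint URL and function name are placeholders, not a real service, and a large triplestore may need a LIMIT or a smarter query.

```python
from urllib.parse import urlencode

# Every distinct predicate used in the store.
PREDICATES_QUERY = "SELECT DISTINCT ?p WHERE { ?s ?p ?o }"

# Every distinct class that has at least one instance.
CLASSES_QUERY = "SELECT DISTINCT ?class WHERE { ?s a ?class }"


def sparql_get_url(endpoint, query):
    """Build a SPARQL Protocol GET URL: the query goes in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    # Hypothetical endpoint, used only to show the shape of the request.
    print(sparql_get_url("http://example.org/sparql", PREDICATES_QUERY))
```

A client that starts with these two queries learns the terms a target source actually uses before it tries to read or write annotations there.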

We would want half a day for this. The first part would be an introduction to what AO is and represents for the community and getting comments and input. We also want to find out in more detail what other people are doing with annotations. We could put out a call and ask if anyone is interested in contributing to that discussion.

We should ask for one session of presentations to gather current scenarios and their requirements, and then a quarter day of a working group session with those people. We want this in the following order: 1) the "what are you doing" talks, then 2) a talk about what the annotations interest group is about, then 3) a working session.

It is important to make the annotations group not FP-specific.

TDWG is an annual meeting; its scope shifts between taxonomic standards and biodiversity. They try to move the meetings so that they are sometimes in the US and sometimes outside. This year it is the second week of October. James, Paul, and Bob will be there, and maybe one of our programmers.

It has not yet been decided what sessions will be held or who will be running them. After that there will be a general call asking "do you fit into one of these sessions?" For TDWG, the abstracts are formally reviewed and released as a publication.

James is following up on suggestions that he work on a group about digitization bottlenecks.

Programmers: if we set a goal of being able to demo the Kepler analytical capability within the FP network, is that plausible by October?

Zhimin says that first we need an understanding of what kind of analysis engine interface we need for the network. The invocation is then easy; what happens after invocation probably needs more thought. He recalls a package that makes Kepler a web service, but doesn't know how far it can go.

Lei thinks that technically there shouldn't be a problem for Kepler to be running as a service. According to the use cases, we need to work more on the network part. There are a lot of details that need to be refined. If the user says they want the job to be run according to a set of conditions, how do you express that?

We should devote some time to a fallback position in case we encounter difficulties implementing FP2, perhaps using FP1.

Zhimin notes that if we needed to fall back to FP1, there is a concept of a KnowledgeBase which can be adapted with new Spring files.

With the sparqlpush approach, we could implement a scenario in which new data becomes known to the network, which generates a message to a Kepler engine, which re-processes the data. Sparqlpush could have a subscriber on the FP network which listens for this.

A particular use case for the Kepler-as-service demo is continuous quality control: initially some client calls for the network to run a Kepler workflow; that workflow completes and inserts some annotations; then, if any new data is discovered that meets given criteria, the workflow is re-run and the results are perhaps compared in some meaningful way.
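The control flow of that use case can be sketched in memory, without any network or Kepler involved. Everything here is hypothetical: the class name, the callback name, and the idea that the workflow is a plain callable are stand-ins for whatever the FP network and the push subscription would actually provide.

```python
class QualityControlSubscriber:
    """Re-runs a workflow whenever newly announced data matches the criteria."""

    def __init__(self, matches, run_workflow):
        self.matches = matches            # predicate: record -> bool
        self.run_workflow = run_workflow  # callable: list of records -> result
        self.records = []
        self.results = []

    def start(self, initial_records):
        """Initial client request: run the workflow once over the known data."""
        self.records = list(initial_records)
        self.results.append(self.run_workflow(self.records))

    def on_new_data(self, record):
        """Callback for a 'new data' notification; returns True if it re-ran."""
        if not self.matches(record):
            return False
        self.records.append(record)
        self.results.append(self.run_workflow(self.records))
        return True

    def latest_change(self):
        """Return (previous, current) results, or None if there was one run."""
        if len(self.results) < 2:
            return None
        return self.results[-2], self.results[-1]
```

How the two results are compared "in some meaningful way" is left open here, as it is in the discussion; `latest_change` only hands back the pair.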

Do we want to focus development effort on this use case and demo for the next few months, or do we want to focus on building the core network pieces and working on the mapper?

An alternative for TDWG would be demonstrating crossing the last mile with a mapper, showing an annotation being ingested into someone's data set. That also hasn't been shown before.

It is already known how to run Kepler "headless," but we have to make sure that no actors try to invoke Swing. There is no well-defined data interface, so you have to have actors pull the data themselves, by reading a file or whatever. The simplest way to do this is to start a JVM with Kepler in it for each message requiring it; however, that comes with overhead (we could probably make a stripped-down version, though). It would take a lot more work to make Kepler's service a daemon process.
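A rough sketch of the one-JVM-per-message approach follows. It sets `java.awt.headless=true` through the `JAVA_TOOL_OPTIONS` environment variable, so an actor that tries to open a Swing window should fail with a `HeadlessException` instead of hanging. The launcher command is passed in by the caller because the exact Kepler command line is not recorded in these notes, and `FP_INPUT_FILE` is a made-up variable name for telling an actor which file to read.

```python
import os
import subprocess


def headless_env(base_env=None):
    """Copy the environment and ask any JVM started from it to run headless."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("JAVA_TOOL_OPTIONS", "")
    env["JAVA_TOOL_OPTIONS"] = (existing + " -Djava.awt.headless=true").strip()
    return env


def run_workflow_for_message(launcher_cmd, input_path, run=subprocess.run):
    """Start one fresh process for one message.

    launcher_cmd: the command line that starts Kepler with a workflow
                  (supplied by the caller; not defined in this sketch).
    input_path:   file the workflow's actors are expected to read themselves.
    """
    env = headless_env()
    env["FP_INPUT_FILE"] = input_path  # hypothetical name an actor could read
    return run(list(launcher_cmd), env=env, check=True)
```

Each call pays the full JVM start-up cost, which is the overhead mentioned above; a daemon would avoid that but needs the extra work already noted.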

Which gives a bigger bang for the buck: showing the embedded Kepler in the FP network, or mapping?

One vote for mapping: very serious consequences for our own goals if we can't map.

Bob wants to know sooner rather than later whether there is a lot of bridge work to be done for using FP1 vs FP2. We don't have an interface at this point in FP1 to say "here is a new data set," nor for a client with old data to insert a trigger. What we can do is inject a message onto the network that carries some assertion.

Paul gives an example: let's say an annotation comes into the network that says an object has a new georeference. Suppose we now have an analytical capability that runs an actor that tests for collecting events that don't appear to fit in the track of a collector.

There is some data that represents a collector's track. There is a specimen that used to fall within that track, but the new annotation says that the specimen now doesn't. We could trigger an actor dedicated to determining collector tracks by new georeference annotations. (This is perhaps a story we can tell with FP1.)
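To make Paul's example concrete, here is one simple way such a test could work; it is an assumption for illustration, not the actor's actual logic. A georeference is taken to fit a collector's track if it is within a plausible travel distance of every other collecting event, given the number of days between them. The 100 km per day limit is an arbitrary placeholder.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = (
        sin((lat2 - lat1) / 2) ** 2
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def fits_track(track, day, point, max_km_per_day=100.0):
    """True if `point` on `day` is reachable from every event in the track.

    track: list of (day_number, (lat, lon)) collecting events for one collector.
    Events on the same day are allowed one day's worth of travel.
    """
    for other_day, other_point in track:
        allowed_km = max_km_per_day * max(abs(day - other_day), 1)
        if haversine_km(point, other_point) > allowed_km:
            return False
    return True
```

When a new georeference annotation arrives, an actor could call `fits_track` with the collector's other events and the annotated coordinates, and raise a quality-control annotation if it returns False.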

  • Brief summary of architecture review.

The tech group got together and walked through Zhimin's set of sequence diagrams of the new architecture. There are two main changes. Paul and Bob found a problem with publishing and subscription being in the same interface, so Zhimin will separate them. The other change is that we have added to Triage a service called "Discharge," which addresses how we release the answer when we do an async query.
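One possible reading of those two changes is sketched below. It is not taken from Zhimin's diagrams: the method names and the ticket mechanism are guesses, meant only to show publishing and subscribing as separate interfaces and an answer to an async query being held until the client collects it.

```python
import uuid
from abc import ABC, abstractmethod


class Publisher(ABC):
    """Interface for clients that only send messages."""

    @abstractmethod
    def publish(self, topic, message):
        ...


class Subscriber(ABC):
    """Interface for clients that only receive messages."""

    @abstractmethod
    def subscribe(self, topic, callback):
        ...


class Discharge:
    """Holds answers to asynchronous queries until the client collects them."""

    def __init__(self):
        self._answers = {}

    def open_ticket(self):
        """Register a pending query and return its ticket id."""
        ticket = uuid.uuid4().hex
        self._answers[ticket] = None
        return ticket

    def deliver(self, ticket, answer):
        """Store the answer for a known ticket; unknown tickets raise KeyError."""
        if ticket not in self._answers:
            raise KeyError(ticket)
        self._answers[ticket] = answer

    def collect(self, ticket):
        """Return and forget the answer, or None if it is not ready yet."""
        answer = self._answers[ticket]
        if answer is not None:
            del self._answers[ticket]
        return answer
```

With the interfaces split, a client that only publishes does not have to implement or depend on subscription, which is one plausible version of the problem Paul and Bob raised.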

Paul's take on the discussion is that we're getting a good understanding of how clients interact in the network, in particular around data, and of whether we should lean more towards data binding, dependency injection, or service discovery. Bob was glad to have some fresh eyes in the form of Lei and Tim. Tim says the discussions were useful for him too, to identify assumptions being made.

  • Role of RDF/OWL in architecture.

Let us put off discussion of this until next week.

  • New business

Bob worked on Specify's Trac in SourceForge, making it as close to their old Bugzilla as possible. Their Trac is lacking one feature in particular: in a standalone Trac, it's easy to configure attributes for tickets. It's static once you configure it, but if you're careful you can change it. That is not supported in the SourceForge Trac. In Bugzilla there were pulldowns for platform/architecture, but that is not represented in the SourceForge Trac. One advantage of using Trac is that SourceForge itself uses it as their ticketing system, so they will have the same problems as SourceForge customers and will be motivated to support features we all have in common.

At the moment there is also no mechanism for importing local Tracs into SF Trac.

Lei notes that her Kuration package will be released soon, and it has actors that depend on services on firuta. How should we handle that? Those services are being made available through Tomcat on port 80, but when we implement FP2 it may be another port.
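One way to keep released actors from breaking when the port changes is to read the service location from configuration instead of hard-coding it. The sketch below assumes a made-up environment variable, `FP_SERVICE_BASE_URL`, and a made-up service path; only the host name and port 80 come from these notes.

```python
import os

# Current location per these notes: Tomcat on firuta, port 80.
DEFAULT_BASE_URL = "http://firuta.huh.harvard.edu:80"


def service_url(path, env=None):
    """Build a service URL from an overridable base URL.

    FP_SERVICE_BASE_URL is a hypothetical variable name; if it is unset,
    the default above is used.
    """
    env = os.environ if env is None else env
    base = env.get("FP_SERVICE_BASE_URL", DEFAULT_BASE_URL).rstrip("/")
    return base + "/" + path.lstrip("/")
```

For example, `service_url("/lookup")` uses the default today, and setting `FP_SERVICE_BASE_URL` to the FP2 location later would redirect the same actors without a new release.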

A suggestion: deploy FP as a VirtualBox virtual machine. Kuration package users wanting to use FP would download that.

  • Action items:

Bob will open some lines of discussion to gauge interest in the annotations group, after cleaning up the AO examples. Candidates: NCEAS (National Center for Ecological Analysis and Synthesis), BiSciCol, the Australians, NESCent (Phenoscape).

Paul will explain to potential AIG (annotation interest group) members what the goal of AIG will be.

Bob and Paul will provide Lei a more detailed use case for embedding Kepler as a client of the FP network. James volunteers to read over what they've got done.

Meeting adjourned.