From Filtered Push Wiki

Etherpad for meeting notes: http://firuta.huh.harvard.edu:9000/FP-2011Jun21


  • Annual Report. Have all contributions in by this Friday.
  • Goals for TDWG meeting.
  • Progress on Sparqlpush and PubSubHubbub
  • Progress on pipes/Camel


  • Jim
    • Emailed out current version of annual report. Areas needing further work are highlighted. The Davis team needs to enter their information ASAP.
  • Bob
    • Assisted with annual report
    • Reading the 2nd edition of Allemang and Hendler, Semantic Web for the Working Ontologist
    • Reviewing FP Network architecture
  • Maureen
    • Got Sparqlpush and PubSubHubbub working! Woo!!
  • Zhimin
    • Working on implementing a pipeline using Camel for the network
  • James
    • Made contact with JSTOR re FP API to the JSTOR Plant Science resource, which includes type specimen data, images, and connections to literature. They are currently working on a mechanism to push annotations back to the collection where a comment (=annotation) has been made.
    • Got permission to hire a programmer to support my research. Hopefully in place sometime in the Fall
  • Lei
    • Working on Kepler/Kuration package release


Filtered Push team meeting 2011 June 21

Present: James, Lei, Maureen, Bob, Zhimin, Paul, David.


  • Annual Report. Have all contributions in by this Friday.

Lei circulated a draft of the Davis additions. Tim will flesh it out further today. James made some edits to the draft.

Bob is willing to migrate text from the Google Doc to FastLane on Saturday.

  • Goals for TDWG meeting.

James: Call for sessions(?) imminent.

Bob: Sent out messages to interest group conveners asking if they want meetings.

Paul: Hasn't seen any such message yet.

James: Two pieces: Annotation Ontology (working group), ?presentation to larger group?

Paul: a single working group session.

James: second, the digitization theme: how to fit in FilteredPush. Kepler workflow is relevant.

Paul: by then we can demonstrate it over a simple network. what dates are we looking at?

Lei: Kepler workflow with FilteredPush, what difference from SPNHC demo?

Paul: Very similar presentation, very different audience.

James: the audience will be more technical, people who are interested in standards.

Bob: the audience might be interested in the FP2 architecture also.

Paul: Bob and I should pursue the annotations interest group.

Bob: Paolo said his schedule permits him to get back to us in mid-July. we should plan on writing a paper then.

James: Not sure about what to do with AppleCore and TDWG. Poster? Not sure where a talk would fit in. AppleCore is important too.

Paul: AppleCore is a story, talking about FP technology.

Bob: AppleCore terminology?

Paul: AppleCore is the guidance document for herbaria on how to use Darwin Core. FP is something else. Maybe James and Jim can sound people out about TDWG.

James: OK. ApplePie for the network?

Paul: if you put in a bottleneck session, we could put an FP architecture talk in, and Kepler demo. Bob and I will do annotations, James and Jim can do non-annotations.

Bob: Plant ontology (should) commit to covering morphological characters (saving duplication of work). FNA would be motivated not to wait around but to propose an extension to PO.

James: from discussions in Boulder, they are open to anything like phenotype. there will be a discussion this afternoon about what goes into PATO (?). will find out how low-level they are willing to go.

Paul: serrated? would that be a term in PATO?

James: don't know. shapes are difficult.

Bob: he would probably accept "oblong." not a shape, though, is it?

James: it's a descriptive character of an edge.

Paul: it's a shape descriptor of a line rather than shape descriptor of an area.

Lei: if we want to show Kepler in this meeting, will we do a demo or technical talk or poster... ?

Paul: in the past TDWG has set up a venue for posters accompanied by computer demos, where people schedule times for being in the poster hall with the computers running the demo. Poster session as software demonstration, rather than poster per se.

Bob: we can keep the audience's attention focused properly if we demo FP as a service of Kepler as well as the other way around. It would be interesting to show that use in the case of continuous quality control. Simulate discovery of a new data set (or have a message sent: "here's my new data set"); an annotation on the queue represents a trigger to re-run on the new integrated data.
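A rough sketch of the trigger idea Bob describes, for discussion only. It is plain Python, not Kepler or FP code, and every name in it (`process_annotations`, `run_quality_control`, the `"new-dataset"` annotation type, the record fields) is hypothetical. An annotation on the queue announcing a new data set causes the quality-control step to re-run on the integrated data.

```python
import queue


def run_quality_control(records):
    """Hypothetical QC workflow: return ids of records with no collector."""
    return [r["id"] for r in records if not r.get("collector")]


def process_annotations(annotations, dataset, workflow):
    """Drain the annotation queue; re-run the workflow on each new data set."""
    reports = []
    while True:
        try:
            annotation = annotations.get_nowait()
        except queue.Empty:
            break
        if annotation["type"] == "new-dataset":
            # Integrate the announced records, then re-run on the whole set.
            dataset.extend(annotation["records"])
            reports.append(workflow(dataset))
    return reports
```

With concocted data, as Bob suggests, a demo would only need to put one "here's my new data set" annotation on the queue and show the resulting report.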

Paul and James like that idea too. For TDWG it might be ambitious.

Bob thinks maybe if we use concocted data it might not be too ambitious. Let's leave it on the table but not discuss it now.

Paul: that would get Lei back into programming for the network, making Kepler an analytical node. Put on agenda for next week for further discussion.

Bob: if we're not after showing absolute generality, it would be a good target.

    • Action item: Paul and Bob to follow up on annotations with TDWG.
    • Action item: James and Jim to follow up on FP with TDWG.


James: preliminary, but it was said at the NE herbaria consortium that one of the main goals this summer is to write, inside their environment, a way for people to make comments about specimens (determinations, e.g.); they want to get that back in a structured fashion to the owners. right now they have an internal system that you log in to, and the comments come as email. people have been complaining that email isn't so useful. that is exactly the FP problem to be solved. James suggested they work together. top priority is Hong's proposal, but will get back. we could guide them in how to structure

Paul: they would want to implement a network or join the apple pi network

Bob: they'd be a good apple pi node

James: hopefully I can convince them. they need to get people into JSTOR that will pay for it.

  • Progress on Sparqlpush and PubSubHubbub

Maureen: All turned fun when I got it to work.

Working network with one PubSubHubbub server and one triple store endpoint. Load triples into the store, set up queries; new data matching a query results in a push of a notification through the PubSubHubbub server. You register a callback with the PubSubHubbub server, and it notifies you on a new piece of data.
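The flow Maureen describes can be sketched in memory as follows. This is an illustration under stated assumptions, not the Sparqlpush or PubSubHubbub API: the `Hub` and `TripleStore` classes, the topic name, and the `ex:` terms are all made up, and a registered "query" is reduced to a Python predicate over a single triple.

```python
from collections import defaultdict


class Hub:
    """Stand-in for the PubSubHubbub server: topics mapped to callbacks."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        # Register a callback to be notified about new data on a topic.
        self._subscribers[topic].append(callback)

    def publish(self, topic, triple):
        for callback in self._subscribers[topic]:
            callback(topic, triple)


class TripleStore:
    """Stand-in for the triple store endpoint with registered queries."""

    def __init__(self, hub):
        self._hub = hub
        self._triples = []
        self._queries = {}  # topic -> predicate over a triple (hypothetical)

    def register_query(self, topic, matches):
        self._queries[topic] = matches

    def load(self, triple):
        # Store the triple, then push a notification for every matching query.
        self._triples.append(triple)
        for topic, matches in self._queries.items():
            if matches(triple):
                self._hub.publish(topic, triple)


received = []
hub = Hub()
store = TripleStore(hub)
store.register_query("determinations", lambda t: t[1] == "ex:hasDetermination")
hub.subscribe("determinations", lambda topic, triple: received.append(triple))
store.load(("ex:specimen1", "ex:hasDetermination", "ex:Quercus_alba"))
store.load(("ex:specimen1", "ex:collector", "ex:someone"))
# received now holds only the first triple; the second matches no query.
```

In the real setup the callback would be an HTTP endpoint registered with the hub, and the match would come from a SPARQL query rather than a predicate.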

Bob: Fantastic for me, having gotten the second edition of the Hendler book (with a nice chapter on SPARQL). Would like to be able to make an annotation (in any query language) that takes the form: These kinds of returns... These should be the properties for any valid answer to this query. Any valid answer to this query meets these requirements. Anyone making such a query is expressing a desire for something in the internal logic of the triple store. Would be good to try to formulate annotations in these terms using the system Maureen set up.

Maureen: if we have time at end of meeting we can look at screenshots.

  • Progress on pipes/Camel

Zhimin: still doing some preparation. focused on the concurrent part. if we do pipes we need scalability. they already have quite a bit of work done. another nice thing is that Camel has some built-in async and sync communication, probably we only need to wrap it. read a chapter of the book, and it seems to be addressing the problems we have. our problems are typical data integration problems.

Bob: we'll have a better sustainability story if most of our code is just calls to other people's code.

Paul: it introduces dependencies too

Bob: we should choose other code wisely and consider other people's update schedules

Zhimin: adapting Camel would not be at the system level but at the programming level.

Paul: when some message comes into a FP network, some component in the network says "these tasks need to be strung together in this order to deal with this problem." that task list is handed off to a Camel-supported structure to pipe relevant input/output into the right pipes.

Zhimin: sometimes you want parallel, sometimes pipes, you can parallelize part if you want
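To make the "task list handed to a pipe structure" idea concrete, here is a small sketch in plain Python. It is not Camel and does not use Camel's API; `run_route`, `Parallel`, and the example steps are hypothetical names. Stages run in order, and a stage marked `Parallel` runs its steps concurrently on copies of the message and merges the results.

```python
from concurrent.futures import ThreadPoolExecutor


class Parallel:
    """Marks a group of steps that may run concurrently on the same input."""

    def __init__(self, *steps):
        self.steps = steps


def run_route(message, route):
    """Run stages in order; Parallel stages fan out and merge their results."""
    for stage in route:
        if isinstance(stage, Parallel):
            with ThreadPoolExecutor() as pool:
                # Each branch gets its own copy; results come back in order.
                results = list(pool.map(lambda s: s(dict(message)), stage.steps))
            for result in results:
                message.update(result)
        else:
            message = stage(message)
    return message


# Hypothetical steps for a specimen record.
def normalize_name(msg):
    return {**msg, "name": msg["name"].strip().capitalize()}


def check_date(msg):
    return {"date_ok": len(msg["date"]) == 10}


def check_locality(msg):
    return {"locality_ok": bool(msg["locality"])}


def summarize(msg):
    return {**msg, "valid": msg["date_ok"] and msg["locality_ok"]}


route = [normalize_name, Parallel(check_date, check_locality), summarize]
```

The route itself is just data, so a network component could build a different task list per incoming message and hand it to the same runner.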

Paul: contrast with Hadoop: using it as a backbone in a similar role, it was parallelizing by default without the flexibility

Zhimin: it is for enterprise integration patterns (EIP). part of our story is EIP.

Paul: feels relevant. we could see an interaction between that system and the sparqlpush components, having them included somewhere in the pipes.

Zhimin: add a record for sparql push somewhere in the pipe, "listen to this topic," e.g.

Maureen: showed sparqlpush screenshots

James: VertNet is using PubSubHubbub, should we be talking to them?

Paul and Bob: we could interact with them, depending on security requirements, perhaps SSL would be sufficient

Zhimin: the subscription occurs with SSL, everything follows that key event. you have to trust the last mile

Bob: in one regard that's not our problem, but if there's no solution to trust the last mile it is our problem. I doubt apple pi has a requirement to be secure against snooping on the wire. most of the access control is authorization

Paul: one of the known threats is people listening on the wire for endangered species data. their requirement is that all endangered species data is encrypted while in motion. It would be easier to not try to figure out which data this applies to and just encrypt all data. we don't have a requirement in apple pi for data at rest to be encrypted. the network consists of authorized users, authorized users have access to all data.

Bob: "there's a change in species x which we know we can't communicate to you on this channel, please log in with the proper credentials" decryption keys passed on a different channel.

Paul: a likely problem is low-level interoperability, if the place we want to interoperate at is annotations

Bob: encryption is just another kind of opaque data.

Next meeting: overview of architecture, 1:30 Eastern time tomorrow. We may have a webex session, maybe not. This will be a different webex link than usual; Paul will send it out when he has confirmation.

Meeting adjourned.

Lei: Formulating questions about TDWG.