From Filtered Push Wiki

Etherpad for meeting notes: http://firuta.huh.harvard.edu:9000/FP-2011May17


  • Agenda for Meeting at UC Davis
  • Review Web Presence Requirements
  • Specify as a FP Client


  • Zhimin:
    • working with Bob on design
    • help Lei on demo
    • gave Maureen an introduction to the prototype system
  • Maureen:
    • followed Zhimin's instructions to get FP code checked out, built, and running
    • followed Lei's instructions to get Kepler FP demo suite checked out, built, and running
    • followed Specify's instructions to get Specify trunk checked out, built, and running
  • Tim
    • Began looking into technical issues around mapping annotations communicated via FP to queries on connected databases.
    • Reading about data-mapping tools.
  • Lei
    • Finished the function to identify collecting-event outliers using the local dataset and data queried from the Filtered-Push network. Now we have two access points to the Filtered-Push network in the SPNHC demo.
    • Added an actor to send a text message notifying the curator of an incoming curation request.
    • Making a backup video

Transcription (by Maureen, this is not a precise recording or synthesis of what was said)

Filtered Push Team Meeting 2011 May 17

Present: Bertram, Maureen, Zhimin, David, James, Lei, Tim McPhillips, Jim, and from Specify: Jim, Rod, Tim, Ben; and also at HUH today, Bob's former students Josh and Matt


1. Specify as a FP Client
2. Agenda for Meeting at UC Davis
3. Review Web Presence Requirements

Introductions. Can somebody please transcribe (@ Harvard?) Thanks :)

Transcribing (Maureen).

at Kansas: Jim Beach, director of Specify project, head of informatics. Rod Spears, Specify. Ben. Tim Noble.

at Davis: Bertram, computer science. Lei, in Bertram's group. Tim McPhillips, project scientist.

Bob: 3 items on the agenda. discuss order? could put review of web presence requirements last. goal for Specify topic: how to get functionality from branched version into Kansas' repository.

Should do Specify first? (agreed).


Which Specify people at SPNHC? Just Andy.

Bob: what do we need to do to get started? Rod and Maureen...

Jim B: don't have text of proposal. we're open to all possibilities. what has changed since you submitted it is that Specify was awarded a grant. Ben has been hired to design and implement. We're calling it SGR, scatter gather reconcile. Most of you have seen the report to NSF. Interested in releasing as production software to allow people to discover and annotate their own collection records as a component of our thick client. That functionality would work off network caches like GBIF to discover duplicate records and locally compare and reconcile them with locally matching records.

There are some natural ways that functionality could interact w/ FP, we should talk about that.

Bob: there's another relationship that Lei will show at demo camp. In ways that can be independent of FP networks, she has a Kepler based workflow architecture for specimen quality control. Important to get Specify input on how that applies to SGR.

Jim B: Ben did a demo at Chicago

Bob: we need to know what functionality we're all working on. We're committed to having Specify launch annotations into FP network. What makes the annotations separate question.

Jim B: we're interested in an interface for that

Rod: did a demo at Woods Hole for TDWG, hadn't changed much between then and Ben's demo

James: there are parallels, we're using the same resources like GBIF cache. Dealing w/ GBIF cache is difficult, data is dirty, putting that burden on users is something we don't want to do. You guys have had success accessing that data with speed. The key for us is annotations, what comes after SGR, we want to give back to the community what works with those records, we need to plug in with that piece. have to figure out how to exchange annotations. What we learned at Harvard was that for botanists to use Specify there are things to be filled in, like containers by Rod.

Rod: from what Jim said, it sounded like FP a lot more interested in making people aware of annotations taking place rather than the creation of annotations.

Bob: right

Rod: all for that kind of infrastructure

Jim H: not clear if SGR is to work for Specify only?

Rod: initially, for Specify, but if it becomes a set of web services, anyone could plug in. we'll make available APIs. If we do this with specify, there will be a lot of users

Bob: don't yet have a clear vision of what we need to do on our side to help decide where we make Specify in general and SGR in particular clients to launch stuff into an FP network. there's a corresponding question of understanding your architecture for whether SGR has notification mechanisms, whether those can be invoked for SGR to push things into FP queues

Rod: by end of year release, someone would have a stack of herbaria sheets to catalog with min info for a label, and then be able to search w/ SGR to find duplicates and be able to take that info and create / select records combining various fields from info they got back, for ex. lat long, may not be there but can tell the item is a duplicate, can save to database. what happens then is it would be an annotation, could put that on FP network, then whoever's registered could find out that this person created this record for this specimen and accept that info into their collection.

Jim B: from your side, could you talk about interaction points with Specify? a web client?

Bob: that's a client by which you make annotations on records that are fetched by a back end, so it's another kind of FP network client independent in principle of Specify

Rod: so anyone with a non-Specify client could do that?

Bob: yes, in principle they could annotate any record that could be fetched by a web client host, they just have to be authenticated

Rod: how to get info back into own collection?

Bob: they would have to use some client

Zhimin: in the first vision, you just manually download a CSV to incorporate into the local db; another is to use the web client to do that

Rod: you do work on web client, save work to csv, import to db?

Zhimin: yes
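The manual CSV round trip Zhimin describes (save annotation work from the web client as CSV, then import it into the local database) could be sketched roughly like this; the column names and table schema are invented for illustration, not a real FP or Specify format:

```python
# Illustrative sketch of the CSV round trip discussed above: the web client
# exports annotations as CSV, and a curator imports them into a local db.
# Column names here are made up for the example, not a real FP schema.
import csv
import io
import sqlite3

csv_text = """record_id,field,new_value
SPEC-001,locality,Davis CA
SPEC-002,collector,L. Dou
"""

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE annotations (record_id TEXT, field TEXT, new_value TEXT)")

# Import each CSV row as one annotation record.
for row in csv.DictReader(io.StringIO(csv_text)):
    db.execute("INSERT INTO annotations VALUES (?, ?, ?)",
               (row["record_id"], row["field"], row["new_value"]))

count = db.execute("SELECT COUNT(*) FROM annotations").fetchone()[0]
print(count)  # 2
```

In practice the import step would map CSV columns onto the local collection's own schema, which is the "two way mapping of databases" problem James raises next.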

James: you've created the workbench, could csv it to Specify. the other problem is the two way mapping of databases, have made progress but a difficult problem

Rod: with Kepler, the interface?

Bertram: no

James: it's hard to say what the relationships would be without understanding some details

Bob: is SGR in any form that we could download something and play with it, related to anything Tim will show?

Ben: at this point, it's all on the workstation; in theory it could be demoed in its current state, but it has certain other requirements

Bob: two goals on the table, are they stated at a high-enough level? One is because we believe network participants will want to use Specify, we need a version of Specify that can launch annotations into the FP network and can also read from the FP network when notifications arrive, to be able to do something with them. One of the things we're committed to is we are going to deploy a duplicates network, a functional one, among a bunch of biology clients, guessing all based on Specify as the management system. How are we going to connect FP2? we're just in the process of formalizing APIs with FP as a client. the other thing is how we can exploit the value of what you're adding with things like SGR?

Bertram: can't comment on SGR yet, but to the first point, that's where I see symmetry and value for both teams, with all the Specify installations out there, wouldn't it be lovely if they could be connected in lightweight manner. whether there are plans to federate Specify db? FP could be used for annotations transport, there is value in that, why not empower scientists who already use Specify to tap into FP network. working together on that would be beneficial.

Jim B: agreed on that. we have no immediate plans for federation, this makes perfect sense to use FP for that purpose.

Bertram: could you say something about SGR, what it is?

Jim B: it's two or three capabilities we conceptualize as modules at the end of a data-entry workflow. take partial records from a database and discover pre-existing duplicate records in other databases. currently working w/ GBIF and a db from the Mexican institute Conabio. Mexican plants in the Michigan herbarium

Bob: so what that tells me is that on our side we need to ... there are three things we need to find out how much overlap they have. One is what Lei has done, the other is what SGR has overlap with, the third is we know we have to make a particular FP network that does this-- what FP does about things that are dups or have missing data.

It's wrong to say FP has a scientific goal to manage duplicates. Its scientific goal is to manage annotations, makes sure they come from and go to the right place. One use case we're committed to is identifying missing and bad data. One of the things in particular about a FP network in our old and new architectures, there's a part of the system called Triage which has a responsibility for figuring out what kinds of services are on the network. For example, there are places where a triage module might try to say "SGR can answer this." Triage module might invoke that.
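The Triage idea described above (a module that knows what kinds of services are on the network and routes a request to one that can answer it, such as SGR) might be sketched like this; the service names, request shapes, and matching rule are all hypothetical, not the actual FP design:

```python
# Illustrative sketch of the Triage idea: a registry of network services,
# each declaring what kinds of requests it can answer; Triage dispatches
# an incoming request to the first capable service. All names hypothetical.

def sgr_service(request):
    # e.g. SGR answering a duplicate-discovery request
    return f"SGR handled {request['type']}"


# Registry of (capability predicate, handler) pairs. A real Triage module
# would discover these capabilities on the network, not hard-code them.
SERVICES = [
    (lambda req: req["type"] == "find-duplicates", sgr_service),
]


def triage(request):
    """Route a request to the first service that claims it can answer it."""
    for can_handle, handler in SERVICES:
        if can_handle(request):
            return handler(request)
    return None  # no service on the network can answer this


print(triage({"type": "find-duplicates"}))
```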

Everything is bidirectional as FP networks are concerned. Sometimes clients invoke network and sometimes vice versa. What are the plans for interfaces?

Rod: FP 1 has this aspect of how the night before you figure out what to search for, and then go out and get the answer and the next day the results are in. In FP2?

Bob: yes, query and annotation launching are both parts of the architecture. there's a lot out there not accomplishable instantaneously. you might make a synchronous request and get back an answer that's "here's what I got so far," "here's an address for notification when more data is available"
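Bob's partial-answer pattern (a synchronous request returns whatever is available now, plus an address for notification when more data arrives) could be sketched as follows; the field names and subscription mechanism are assumptions for illustration, not the FP2 API:

```python
# Hypothetical sketch of the partial-result pattern described above:
# a synchronous query returns current results plus a notification
# address for the rest. Field names are illustrative, not FP2's API.
from dataclasses import dataclass


@dataclass
class QueryResponse:
    """What a client might get back from a synchronous FP query."""
    results: list            # "here's what I got so far"
    complete: bool           # False if the network is still gathering answers
    notify_url: str = ""     # address to register for later notification


def handle_response(resp, subscribe):
    """Process partial results now; subscribe for the rest if incomplete."""
    for record in resp.results:
        print("got:", record)
    if not resp.complete and resp.notify_url:
        subscribe(resp.notify_url)   # invoked again when more data arrives


resp = QueryResponse(results=["rec-1", "rec-2"], complete=False,
                     notify_url="http://example.org/notify/123")
handle_response(resp, subscribe=lambda url: print("subscribed:", url))
```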

Rod: how is that different from DiGIR? one way that failed is that you could ask the same question twice and get different answers

Bob: we're talking about annotations, a general architecture. part of the data quality story is, yes, things are changing, and the story changes from day to day. This phase of FP we informally call "continuous quality control": once an answer comes back, people can be notified of the change, including for data that previously had no known connection. One of the things that's wrong with DiGIR is what we're exploiting

Rod: in the context of annotations that makes sense, but not in terms of searching for data

Bob: let's put that on the agenda for a future meeting

Rod: the Harvard data model for Specify had forked. talked at Woods Hole about making changes to begin to bring it back to trunk

Bob: we should talk about what that involves. we should start something at which Maureen and you look at current state of your trunk, what is necessary for us to functionally have what Maureen had in mind when we finished old project. Don't necessarily need to merge. sense from Maureen we don't need to merge, maybe there's only a week or two of work to look at merger direction. Perhaps we can just stash our divergence. We'll just re-do them from current code base. Might be easier for all of us?

Jim: can Maureen come to Kansas in June?

Maureen: I'm OK with it.

James: like to see baseline botany functionality, like to see interface for the containers, in next version?

Rod: no tree view of containers, but UI for adding containers to containers, probably shipping tomorrow. Also shipping collection relationships.

Bob: at UMASS Boston we're collaborating with some anthropologists on a pollinator grant that specifies Specify. When you say shipping, does that entail sourceforge site update? Branch on svn?

Rod: we haven't done tarballs for release; we could. we'll make a branch for the 6.3.0 release, let trunk continue on; bug fixes would go on the branch and trunk. if we need a tarball we can do that. svn does have anonymous checkout

Bob: we have near-term things for Maureen to do, talk with Paul about what they are, try to figure out the functional overlap between what Lei's doing and Specify

Rod: I would be interested in hearing about what Kepler's being used for and how

Bob: we could ask them to show a demo, maybe record the SPNHC demo?

Bertram: there's the particular use of Kepler in FP, there are earlier demos available (videos), there will be another as a backup for demo camp. for this community, there's Lei's project.

James: Rod, is workbench separate or incorporated in this version?

Rod: still have a standalone version that ships, but with 6.3 it's integrated in Specify. you could just use the workbench within Specify; with 6.3 have added a lot of validation to the workbench before upload, color coding, uses your own authority files to see what taxa are in there

(Specify folks have to leave the meeting at this point)

Bob: schedule for Davis

(discussion of travel plans)

Tim: what kind of agenda? from very concrete engineering discussions through public relations, where on that spectrum do we expect to be, or all over the place?

Bob: probably all over the place. one of the most important things is schedule for APIs for you to start writing against in FP2. what you have now is FP1. there is always the issue of bidirectionality, what can we do, you may already have APIs. If we invoke a workflow from Triage, what are we supposed to do? Could have half hour talk at beginning of high level architecture. There are three high level UML diagrams, sequence diagrams about fairly abstract things. Couldn't make code from them, but could be a good place to start.

Bertram: two things to start with, one would be to see where we are w/ respect to SPNHC demo, see what the data is and what needs to be done in last days remaining, the other is to take a step back at the overall milestones and see what is to be delivered by FP quickly, narrow down the many use cases to those that would be used right away

Tim: a good opportunity to make sure we all understand what everyone's talking about. easy on phone calls to gloss over things and look back later. one possibility on the more engineering side of things is to pick a usage scenario that would show most of the functionality, work through it, and have people record what needs to be done. what are all the implications for messages? what are the implications for the client API? each of us could have responsibilities for different questions. hard to do by teleconference

James: if we are going to talk about things where Maureen and Zhimin are available, need to schedule time with them

Bertram: 1-2 pm local Pacific time on WebEx? have something for them to look at then

James: if there are questions, specific conversations where they have the information

Tim: most valuable if discussing engineering issues, what are they really trying to do, build a shared understanding. any video available for whiteboard?

James: Thursday maybe time for us to work with Lei, SPNHC demo, Friday might be best to do what Tim suggests.

Bob: URL for 3 high level diagrams into chat


Bob: let's put Tim's item on Thursday agenda, with tasks and schedules.