From FilteredPush

Etherpad for meeting notes: http://firuta.huh.harvard.edu:9000/FP-2011Mar15


  1. Introduction, Tim from UCD.
  2. User Community Meeting.
  3. SPNHC abstract.
  4. SPNHC
  5. Report from Friday meeting.
  6. REU supplement.
  7. NSF July deadline that Bob circulated.


  • Paul
    • Finished the example annotation ontology instance document, along with proposed changes to the data extension for AO. Reviewed with Bob. The proposed changes are merged into the AOD file; it and the example instance are in svn.
    • Reviewed financials with OEB admin.
    • James Cuff's research computing group can provide us with network storage plus backup for firuta at a significantly lower cost than new drives. Zhimin and I will be working with them to determine the best combination of deployments amongst network storage, vm space, and space on firuta.
  • Zhimin
    • Working on system design
    • Working on a primitive process for mapping annotations into local database
    • Set up an environment for James to introduce the FP web demo to Amanda.
  • James
    • Attended a 2-day workshop to begin the "Arctic Flora of Canada and Alaska." Strong influence from me and others led the group to agree to try out Scratchpads as a way to develop, manage and communicate for the project. This made me consider whether Scratchpads should be on our list of potential clients...
    • Spent an afternoon discussing FP and demoing the web client with Amanda and Jason at BRIT with support from Zhimin.
    • Got Bertram a last-minute invitation to the "NSF-sponsored Scientific Software Innovation Institute (S2I2) program workshop to explore the potential for technology to overcome entrenched collections workflows and achieve significant efficiencies in the rate of biocollections digitization."
  • Bob
    • Working with Paul, merged his AOD extensions with mine in support of his data annotation example.
    • Began listing coding practices for ontologies
    • Attended the OTS Informatics Advisory Subcommittee of the Science Committee meeting at LaSelva. Lots of data are collected by field station visitors and served in real time by weather stations above and under the LaSelva canopy.
  • Lei, Tim and Bertram
    • Working on abstract for SPNHC2011 and curation workflow demo

Meeting Notes

Present: Jim, James, Bertram, Lei, Tim, Zhimin, Bob, David, Paul


1 Introduction, Tim from UCD.

Bertram: Asking Tim to join the effort at UCD, helping Lei, and helping out with the overall project management.

Tim: Background in chemistry, working on workflow systems. Lots of experience gathering requirements and producing production systems with high reliability requirements.

2 User Community Meeting.

Jim: Planting the seed for a meeting. Participant support costs are not easy to redeploy and lack indirect costs, so we have the option to schedule another meeting. The all-hands meeting was held at the beginning of the grant. A good target for a meeting would be a select group of people who represent the target user communities, for formal or informal discussions: introduce the product and hear what they would like to do with it. Would want to get on everyone's calendar before the end of the summer.

Bob: Important thing to do. Good to identify a set of people who would be participants in the deployed duplicates network.

James: I agree. Great use of the funds. Limiting factor, how far along will our product be when we have that meeting, what will we have to show them.

Bob: Proposal for the line: there should be something they can go home and start playing with as a real duplicate-finding network, even if it has a lot of warts.

Jim: What pressure is there on the date to expend funds? Best to schedule when we have something to work with the user community.

Bertram: NSF constraints are likely to be soft constraints. Bob's idea of having something to show is good. At the same time, generally like to talk to users early and often. There is a balance to strike between a focused meeting with users early on and the users we talk with along the way. For particular target users, good to talk with them early and often. Undecided about aggressive scheduling of the meeting.

Jim: Point well made. Discuss again next week when we have a clear understanding of the financial constraints.

James: SPNHC also available for discussion with users.

3 SPNHC abstract.

Bertram: Also have questions about the meeting next week. For the workflow, for SPNHC, who are the users? One approach is to use the network to improve one's own data (curation workflow); another is where other parties are saying things about other parties' data (general set of users).

Bob: We are talking about FilteredPush as a component of a workflow?

Bertram: Yes, demonstration of pull.

Bob: Two kinds of human clientele. One is people who are already using a specimen management system; Kepler has to be able to take in specimen management systems.

Demonstration to people who are principally involved in collections and are already using some kind of specimen management system. The outer envelope is a workflow; such a demonstration should show a specimen management system as an actor in a workflow.

Bertram: Next week more technology demo. SPNHC more integrated.

Bob: What things might specimen data managers do? Consult name authorities? Scenario: someone has a specimen that hasn't been cataloged, and a workflow is defined conversationally; some expert has made a determination of it as a binomial with an authority, and the collection manager wishes to see if this is the current name (and what its higher taxonomy is).

James: That's a very specific example. For workflow, perhaps talk more about data sets than individual records. Extract a set of data, push it into a workflow, use authorities (and tests) to check the data, and provide output of issues in the data. Relates very closely to curating researcher-specific data sets and bringing them into a collection database: evaluate a dirty data set and prepare it for ingest.

Bob: By next week we have an informal workflow called "clean up collections data set". Most of the work would be bringing in existing actors to clean the data, e.g. georeference fixes, weakly specified taxon concepts, errors in collector name spelling.

James: Focus next week on efficiency in the data capture process. FP is useful for finding duplicates and being efficient. A Kepler workflow also could be used to manage the larger process.

Bertram: Need a few slides, specific to the target audience, on efficiency.

Bob: How much overlap with DataOne meeting last week?

Bertram: Probably small overlap.

Bob: There is a good small set of slides set in the DataOne context; should work with almost any audience. Bob will post them.

Lei has drafted a first version of abstract, plans to circulate later today for comments.

Friday submission deadline.


James: There is also a discussion among the botany folks about a minimal set of required fields.


Plan: Meeting just before SPNHC, Paul to stay for few days at end. We'll firm up dates this week.

5 Report from Friday meeting.

Zhimin: Talked about a possible architecture for the client library, particularly for mapping. Went through a complete scenario from the client side for a user injecting an annotation into the network and another user consuming it. Currently sketching out an algorithm for handling the insert/update case. For deletion, haven't worked out any details yet; likely to be a tricky case. Bob is skeptical about the generality of the approach.

Lei: Mostly focused on SPNHC abstract.

6 REU supplement.

Asked David for CV. Will plan on asking for an NSF REU supplement.

7 NSF July deadline that Bob circulated.

New program: Call for Systematics and Biodiversity Science.

James: This certainly needs to have a very strong biology component.

Jim: Two focal areas: biodiversity discovery/analysis, and phylogenetics. Also another: Advancing Revisionary Taxonomy and Systematics (ARTS), which might be an appropriate target, as it includes novel approaches that achieve goals without compromising the integrity of the process. The major focus area funds actual work on doing the systematics. Can ask whether biodiversity informatics is a focus for this program.

James: Translating discovery work and moving it to dissemination might be interesting.

Jim: Likely to need a focus on a revision of a particular group of organisms.

Program has two deadlines per year, July and January.

Bob, James, and Paul will be in Tucson next Tuesday, but should be able to participate in the meeting as normal.

Field Museum Mtg: Scientific Software Innovation Institute (S2I2) Workshop for Biological Collections http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=1064422
>>> need a couple of slides for an FPush intro at the beginning

SPNHC abstract: additional info requested: different curation workflow use cases and scenarios
>>> who are the (primary) users?

Paul: Key messages:

  1. Duplicates problem in botany nailed (!?) [efficiency]
  2. Designing a general system

Bob: Main message:

  1. "The FilteredPush" vs *A* FilteredPush network

Many communities could add *a* FP network.