2011May31



Etherpad for meeting notes: http://firuta.huh.harvard.edu:9000/FP-2011May31



User: David Lowery | User: Zhimin Wang | User: BertramLudaescher | User: Lei Dou | User: Tim McPhillips | User: James Hanken | User: Paul J. Morris | User: Maureen Kelly



Agenda

  • Proposal to change meeting time to 1PM Eastern, 10AM Pacific.
  • Review of SPNHC Demo.
  • Programming targets for next 2 weeks.

Reports

  • Zhimin: Help Lei install a local copy of prototype demo; Prepare for network implementation
  • Bob
    • At Davis worked with team on specs/design of mapper and evaluating frozen yogurt.
    • Gathering annual report material.

Notes

Filtered Push Team Meeting 2011 May 31

Present: Bertram, Tim, Bob, Zhimin, David, Maureen.

Agenda:


   *  Proposal to change meeting time to 1PM Eastern, 10AM Pacific. 


Bertram: Works after next week. Jim: Works.

Next week at 2:45 Eastern.

Week after (Tue June 14): begin at 1PM Eastern, 10AM Pacific. Confirm room, notify TACC, notify James.

   *  Review of SPNHC Demo. 


Bob: One interesting thing that showed up: about 10 minutes after tweeting the demonstration, a commercial company, Datenhammer, started following the tweet feed. Their site says "coming soon".

Bertram: Translation from their site: Controlled data quality.

Bob: Were there folks from BiCycle there?

Bertram: Already have a story worth telling. Good time to look for a journal to publish in. Two questions arose from the demo: How do we work with large data? How do we deal with long-running workflows (feedback from curators might take days)? (Sven Koehler has been working on recovery/restart of workflows based on checkpoints.) Digital collections community as target for publication?

Bob: There are examples in the demo of non-georeferencing errors that are only found by georeferencing. Related to the question "what is the most useful attribute". The data mining community does look at how and when to weight attributes.

Jim: Biodiversity Informatics as a possible journal target.

Bob: Ecological informatics another possible target, but not open access.

Paul: PLoS ONE / PLoS Biology?

Bob: Paper would be similar for all three, though PLoS would need more discussion of the context.

Tim: Might keep in mind the issues that Bertram has raised for publication, e.g. what about scaling? Reviewers would likely ask these questions, and we should have an answer ready for them.

Bertram: We should be thinking about this, and possibly have an answer to include.

Tim: Advantage of shooting lower is that the paper can be a report on the demonstration. Might be a harder sell to broader audiences without addressing scaling issues.

Bob: Assume a time frame of 6 months before we would be able to assert that we've addressed scaling issues. This is a shorter timescale than reviewers' comments.

Bertram ToDo: Think about whether it will be feasible to have a response to scaling issues within the next 6 months; have a response by next week. If positive, we'll start by targeting PLoS ONE.

   *  Programming targets for next 2 weeks. 


Maureen: I need to understand what would make this a network. Started looking at platforms for P2P networks with Java implementations. FreePastry is one such implementation. Will investigate.

Zhimin: Will start to generate code from the design documents, in parallel with Maureen's work. P2P is incorporated later: first step is static, later is dynamic. Can focus on the skeleton of the network while Maureen is investigating overlay networks.

Paul: The interface between the two is evident; static is a good place to start defining it.

Zhimin: JXTA is another possibility; seems popular.

Tim: First step is to figure out how to decompose the deliverables for the subtask. Need to decide what objectives to target first.

Bertram: Do we have outcomes from last week's meetings pulled together?

Tim: Have access to enough of the information to discuss this.

Paul: Logical place to communicate between

Bob: Where and what schedule for defining client APIs?

Paul: Three areas of API: Message injection, retrieving query/interest results, other interactions with the mapper that aren't message injection or query/response of network (e.g. data harvesting end of queries).
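As a way of making the three API areas concrete, they could be written down as an abstract interface. This is only an illustrative sketch: the class name `MapperClientAPI`, the method names, and the argument and return types are all assumptions made here for illustration, not a design agreed at the meeting.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class MapperClientAPI(ABC):
    """Hypothetical grouping of the three API areas (all names are assumed)."""

    @abstractmethod
    def inject_message(self, message: Dict[str, Any]) -> str:
        """Area 1: inject a message (e.g. an annotation); return an identifier."""

    @abstractmethod
    def fetch_results(self, interest_id: str) -> List[Dict[str, Any]]:
        """Area 2: retrieve results for a registered query or interest."""

    @abstractmethod
    def harvest(self, criteria: Dict[str, Any]) -> List[Dict[str, Any]]:
        """Area 3: other mapper interactions, e.g. data harvesting for a query."""
```

Keeping the three areas as separate methods would let a client that only publishes annotations implement or call the first one without touching the others.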

Bob: Publishing clients (e.g. GoldenGATE) probably don't launch queries, but launch annotations. Are there clients that only launch queries?

Maureen: Does Specify know that there is a mapper?

Zhimin: Is a mapper a client?

Maureen: Tend to view mapper as a service.

Bob: High utility to consumers of annotations if they can be in a global ontology.

Paul: Probably three specific cases to start with: (1) Client asks network for potential duplicates and gets results. (2) Network asks data provider for data in which potential duplicates might exist. (3) Network provides an annotation and client consumes it and integrates into local database.
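The three cases can be walked through with a toy in-memory model. This is a sketch under stated assumptions, not the project's design: `ToyNetwork`, `make_provider`, the record fields, and the rule of matching duplicates on collector plus collector number are all hypothetical and chosen only to make the flow visible.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, str]


@dataclass
class ToyNetwork:
    """Hypothetical in-memory stand-in for a network node."""
    providers: List[Callable[[Record], List[Record]]] = field(default_factory=list)
    subscribers: List[Callable[[Record], None]] = field(default_factory=list)

    def find_duplicates(self, record: Record) -> List[Record]:
        # Case 1: a client asks the network for potential duplicates.
        candidates: List[Record] = []
        for provider in self.providers:
            # Case 2: the network asks each data provider for candidate records.
            candidates.extend(provider(record))
        return candidates

    def publish_annotation(self, annotation: Record) -> None:
        # Case 3: the network hands an annotation to each consuming client.
        for consume in self.subscribers:
            consume(annotation)


def make_provider(records: List[Record]) -> Callable[[Record], List[Record]]:
    """Build a provider matching on assumed 'collector' and 'number' fields."""
    def provider(query: Record) -> List[Record]:
        return [r for r in records
                if r.get("collector") == query.get("collector")
                and r.get("number") == query.get("number")]
    return provider


local_db: List[Record] = []  # stands in for a client's local database
network = ToyNetwork()
network.providers.append(make_provider([
    {"id": "A-1", "collector": "Smith", "number": "42"},
    {"id": "A-2", "collector": "Jones", "number": "7"},
]))
network.subscribers.append(local_db.append)

matches = network.find_duplicates({"collector": "Smith", "number": "42"})
network.publish_annotation({"target": "A-1", "note": "possible duplicate"})
```

In this sketch case (2) is the only point where collection data enters the network, which is why it is a natural first target for implementation.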

Tim: In parallel to figuring out how this works, my inclination is to start with getting data from a collection database into a filtered push network.

Paul: (2) above.

Maureen: I'd like to make sure that Tim's point about how data gets into the network gets on the agenda for next week.