
Etherpad for meeting notes: http://firuta.huh.harvard.edu:9000/FP-2011Apr26

User: David Lowery | User: Zhimin Wang | User: BertramLudaescher | User: James Macklin | User: Lei Dou | User: Tim McPhillips | User: James Hanken | User: Paul J. Morris


  • Review Discussion of FP Charter document. Talk:FP_Charter
  • Progress on SPNHC Demo
  • Start of review of architecture/requirements documentation.
  • Goals for meeting at Davis before SPNHC
  • Review discussion of Web Presence Requirements Talk:Web_Presence_Requirements



  • Finished switching fuzzy search from Mtree to Solr.
  • Network design
  • Help Lei prepare Demo
  • Take a close look at the latest design documents of DataOne authentication module.


  • Synthesize the dataset based on what Paul provided so that it fits the demo
  • Built a workflow prototype and generated the CurationSummaryReport
  • Discussing the client library, configuration, and API for injecting annotations into the FPush network with Zhimin


  • Began reviewing system requirements and design documents for completeness. Started listing key engineering ideas (actors, components, use cases, design diagrams, etc), with references to other pages, at FP_Engineering. Will be organizing and expanding over the next few days.
  • Current focus is on clarifying the FP engineering effort from the point of view of users and client software (e.g., Kepler workflows, the web client, and the Specify client) and beginning a specification for the client library/protocol.


  • Reviewed/commented on charter document.
  • Discussed goals for review of existing design documents with Tim. Discussed goals for API documentation and other outcomes desired from meeting at Davis before SPNHC.
  • Progress is happening on connecting firuta to research computing's scalable resources.


  • Reviewed, commented on Web Presence requirements.
  • Have an ECS-265 class project on using Map-Reduce based Duplicate Detection (GBIF folks are interested, too!)

Meeting Notes

Filtered Push Team meeting 2011 Apr 26

Present: Bertram, James, Lei, Tim, James, Paul, Zhimin, David.


  • Review Discussion of FP Charter document. Talk:FP_Charter

Training materials: Focus on undergraduates.

Tim: Goals?

Paul: Review, prepare to revise and move on.

James: Timelines? Specificity?

Tim: Dependencies - timeline might reflect dependencies for components within the project, and dependencies for interactions with other projects.

James: Does this sort of broad timeline fit here in the charter?

Tim: Key things that need to be done on a timeline or it will be very critical (problematic?) for success of the project.

Paul: 3 critical things:

  • AO
  • FP API for clients
  • Library of SW, using the API

James: Can also break this down into subcharters and project timeline tools.

Tim: Any questions/concerns?

Bertram: Yes some concerns, wording, specificity, will work with Tim.

Paul: Goals for Kepler?

Bertram: Annotating workflows is of least importance; we have some capability in Kepler for this already. Dearer to our hearts are Kepler as a client and workflows running within the network. High level charter,

Tim: For having external users write kepler workflows and have a method to deploy in network, this is something we should spell out if we want to have this as a goal. Your workflow, running on behalf of other people, on our network.

Paul: Two broad goals in proposal: (1) production quality software to deploy in domain, (2) experiments on ways of enhancing this software to do other things.

James: Should look at categorizing the goals in this framework.

  • Progress on SPNHC Demo

Running synthesis using both purely synthetic data and data fuzzed from the GBIF cache. Significant progress on actors to perform each of the QC analyses. Discussing API for annotation injection with Zhimin. Working on visualization actors.

Bertram: What method for duplicate detection?

Zhimin: Match on collector name/collector number, no significant analysis for demo.
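Editor's illustration: the exact-match approach Zhimin describes could look roughly like the sketch below. The field names (`id`, `collector`, `collector_number`) and the whitespace/case normalization are assumptions made for the example, not the demo's actual record schema or code.

```python
from collections import defaultdict


def match_key(record):
    """Build an exact-match key from collector name and collector number.

    Normalization (lowercasing, collapsing whitespace) is an assumption;
    the notes only say the demo matches on these two fields.
    """
    name = " ".join(record["collector"].lower().split())
    number = record["collector_number"].strip()
    return (name, number)


def group_duplicates(records):
    """Return lists of record ids that share the same match key."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record["id"])
    # Only keys seen more than once indicate candidate duplicates.
    return [ids for ids in groups.values() if len(ids) > 1]
```

This does no fuzzy comparison at all, which is consistent with "no significant analysis for demo": two records are grouped only when both normalized fields are identical.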

  • Start of review of architecture/requirements documentation.

Tim: Sorting through the documents, various ages and maturity. Working on finding more recent, more authoritative things. Making a list of entities for engineering in the project, finding things that should be clearly defined. Missing pieces might be overviews - e.g. actors and systems. Reviewing documents from the perspective of someone outside the deployment, client software, local collection databases, domain scientist users: how complete/contradictory is the documentation.

  • Goals for meeting at Davis before SPNHC

Paul: API documentation on table.

James: We should look at the list of clients. We should try to group the clients by architecture, etc, and priority, showability, and needs.

Bertram: API sounds good to me. Have we identified clients?

James: Yes, some in proposal, some emerging.

Bertram: Working on API could be part of the system design/requirements for new system.

Tim: Particular artifact by the time we get together?

Paul: A question, what should we have going in to the meeting and what coming out?

Bertram: Knock over architecture diagrams?

Tim: Approaching system as a black box as a client, what can a client say to the network and what comes back. Some advantages - we can develop clients to a network that doesn't exist yet. We can start developing system tests. Now that we know what the clients are doing, we can focus work on the innards.

Bertram: like the idea of "outside perspective": FP providing services for client tools

Zhimin: Two sides of interface in the system, one for client user, the other for the data provider. What are the expectations for a data provider for making their data servable in the network?

Zhimin: Also may need to abstract the annotation ontology - difficult for clients to use. For example, java object model generated from the ontology document is painful to traverse when generating an annotation instance in java or when specifying a query.

Tim: If the annotation ontology is difficult to use and classes are difficult to traverse, who is it making life easy for?

Zhimin: Ontology for exchanging opinions. Goal: Allow people to easily use the ontology. Ontologies tend to be complicated - lots of structure, lots of relations - but end users have very specific perspectives. Need some utility for converging the two.

Tim: We should be able to say clearly why this is critical.

Bertram: Means to rally the community around an entry point to the project.

James: In a community sense: having means for communicating, both amongst the people and in software. Common vocabulary. Also a means for communicating outside the community.

Bertram: Is the ontology a prereq for FP client-APIs? (Somehow I just had to think of Nico's comments at the last TDWG meeting ;-)

  • Review discussion of Web Presence Requirements Talk:Web_Presence_Requirements

Put off to next week.

Bertram: Not too many comments is probably a good thing.

Bertram, random questions:

  • what function(s) to use for clustering / duplicate detection?

(for ECS-265 student projects)

Zhimin: The basic strategy for large-volume duplicate detection, and the most popular approach, is rough blocking, as straight clustering is too costly (n^2 comparisons). Typically use a sliding block with clustering within the block.
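Editor's illustration: one common way to realize a sliding block is to sort records by a cheap blocking key and compare each record only with its next few neighbors in that order. The sketch below assumes hypothetical field names (`id`, `collector`, `collector_number`), a made-up blocking key, and a simple string-similarity threshold; it is not taken from the project's code, and a real system would likely cluster within each window rather than just report pairs.

```python
from difflib import SequenceMatcher


def blocking_key(record):
    """Cheap sort key (an assumption): collector prefix plus collector number."""
    prefix = record["collector"].lower().replace(" ", "")[:4]
    return prefix + record["collector_number"].strip()


def candidate_pairs(records, window=3):
    """Yield pairs of records that fall within the same sliding window.

    Sorting costs O(n log n) and each record is compared with at most
    window - 1 successors, instead of all n - 1 other records.
    """
    ordered = sorted(records, key=blocking_key)
    for i, left in enumerate(ordered):
        for right in ordered[i + 1 : i + window]:
            yield left, right


def similar(a, b, threshold=0.85):
    """Compare two records by string similarity of their key fields."""
    text_a = f'{a["collector"]} {a["collector_number"]}'.lower()
    text_b = f'{b["collector"]} {b["collector_number"]}'.lower()
    return SequenceMatcher(None, text_a, text_b).ratio() >= threshold


def find_duplicates(records, window=3, threshold=0.85):
    """Return id pairs judged similar among the windowed candidates."""
    return [
        (a["id"], b["id"])
        for a, b in candidate_pairs(records, window)
        if similar(a, b, threshold)
    ]
```

The window size trades recall for cost: a larger window catches duplicates whose keys sort further apart, at the price of more comparisons.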

Bertram: Will communicate with Zhimin and Lei.

Zhimin: Several good papers on approach. Will forward.