2011Apr05

From Filtered Push Wiki


Etherpad for meeting notes: http://firuta.huh.harvard.edu:9000/FP-2011Apr05


Agenda

  • Follow up on an overhaul of our project web presence. Who can take this on?
    • Categories
    • Overhaul FilteredPush Page
    • Develop General and Collaborating Project entry point pages
    • Others?
  • Ontology progress
  • SPNHC Demonstration progress
    • Synthetic data to clean
    • Review specific QC demonstration goals
  • Client Library/Web Client progress and targets
  • Quick check of high-level project deliverables categorization

Reports

  • Paul
  1. Did distribution upgrade from squeeze to lenny on firuta, providing tomcat 5.5 required for the demonstration from the distribution.
  2. Research computing has set up storage and a VM for backend storage of the GBIF cache, but they haven't finished setting up our access to those resources. (The plan is for the tomcat container for the demonstration and the node software to run on firuta, the back end mysql instance to run on the VM, and the database to live in attached scalable storage.)
  • Zhimin
  1. Deployed demo onto firuta: http://firuta.huh.harvard.edu:8080/JSF2FilteredPush . Fuzzy matching is still slow; some optimization or a change of strategy is needed.
  2. Making class diagram for messages (annotation, query, notification, result)
  • James
    • Some discussion with Hong and Lei about FNA treatment information as authorities for data cleaning in a workflow.
  • Lei, Tim and Bertram
  1. Worked on SPNHC demo: tried to find out how to use phenological data; evaluated services of BioGeoMancer, GeoLocate and USDA Plant
  2. Worked on provenance browser
  • Bob
    • Still wrestling with AO

Notes

2011 Apr 5 Filtered Push Team Meeting.

Present: Zhimin, Bob, David, Paul, James, Tim, Lei, Bertram, Jim.

Agenda:

  • Follow up on an overhaul of our project web presence. Who can take this on?


James: Following up.

  • Categories


Return to that next week.

  • Overhaul FilteredPush Page


Return to next week.

  • Develop General and Collaborating Project entry point pages


Tim: One-page project charter approach. Willing to contribute a rough draft of a charter that separates the technology discussion from the goals.

Action Item: Tim to sketch out rough draft, send to James, refine then circulate to group.

Need a simple public presence entry point.

  • Others?


Project management is set up on SourceForge; use is pending migration of the UMB svn repository.

Donna did a round of work on wiki cleanup and presence but didn't get very far.

Bob: Possible to produce static pages or wiki pages that look like static pages.

James: At the top level, useful to summarize progress thus far.

Action Item: Paul: Set up private wiki presence.

Tim: Scope of etaxonomy.org? Bob: Broad scope. Primarily FP, but other broad goals.

  • Ontology progress


Bob: 5-6 examples fleshed out, about to engender discussion on Annotation Ontology mailing list. Trying to focus on getting these examples in a consistent form to this group for discussion. Cleaning out old cruft from examples and cleaning up the namespaces to make the examples more readable for the group. A couple more days work needed. Software engineering issue is the absence of good tools for working with instance data.

  • SPNHC Demonstration progress
    • Synthetic data to clean
    • Review specific QC demonstration goals


Lei: Looking at what can be done with phenological data set.

Data to quality control?

Paul: Suggest creating synthetic data.

Zhimin: Question is which data is under quality control.

James: FNA data from Hong Cui is authoritative for this case.

Bob: Construct data by starting with real data and introducing errors.

Paul: Yes.

James: Can pull out some obvious examples.

Lei: Vascular plants.

James: Starting point is a data set. I have a set of occurrence records and want to know more about the quality of this data set, so I submit it to the workflow, which assesses its quality by comparing the submitted data with authoritative sources.

Bob: Is this demonstration going to illustrate the launch of any annotations into a filtered push network?

Tim: Categorization may help. (see below)

James: Story for SPNHC should be a curation story. Check out occurrence data from a database, quality control, check back into local database.

Paul: A scenario for the demonstration: 1) Start with a collection manager with a specimen data set (taken from real specimen data, but fuzzed to create synthetic quality issues). 2) This data set is submitted to a quality control workflow. 3) The workflow runs the data against a series of authority sources and a duplicate finder. 4) Specific quality issues are found and brought to the collection manager's attention for correction. 5) In the background, duplicates are located. 6) Collection manager asserts corrections on data set. 7) Annotations on corrections are sent to members of duplicate set(s). 8) Collection manager at other institution views incoming quality annotations based on the first collection manager's corrections of their own data.
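Steps 1 to 4 of this scenario can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the record layout, the `fuzz_record` and `find_issues` helpers, and the three-name authority list are hypothetical and are not part of the Filtered Push code base or the planned workflow.

```python
import random

# Hypothetical authority list. In the demonstration this role would be
# played by authoritative sources such as the FNA data.
AUTHORITY_NAMES = {"Quercus alba", "Acer rubrum", "Pinus strobus"}


def fuzz_record(record, rng):
    """Return a copy of the record with one character dropped from its name."""
    name = record["scientific_name"]
    i = rng.randrange(len(name))
    fuzzed = dict(record)  # copy, so the real record is left untouched
    fuzzed["scientific_name"] = name[:i] + name[i + 1:]
    return fuzzed


def find_issues(records, authority):
    """Report every record whose name is not found in the authority list."""
    return [
        {"id": r["id"], "field": "scientific_name", "value": r["scientific_name"]}
        for r in records
        if r["scientific_name"] not in authority
    ]


# Step 1: real records, one of which is fuzzed to create a synthetic issue.
real = [
    {"id": 1, "scientific_name": "Quercus alba"},
    {"id": 2, "scientific_name": "Acer rubrum"},
    {"id": 3, "scientific_name": "Pinus strobus"},
]
rng = random.Random(0)
synthetic = [real[0], fuzz_record(real[1], rng), real[2]]

# Steps 2 to 4: submit the data set and collect the issues to show the
# collection manager.
issues = find_issues(synthetic, AUTHORITY_NAMES)
```

Because the errors are introduced deliberately, the expected result is known in advance (here, a single issue on record 2), which makes the demonstration repeatable.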

Zhimin: A component that is pluggable for the local data provider, able to identify annotations of interest.

James: Need a set of fields defined for the demonstration. Issues for generating annotations and finding problems: level of data set, and...

Bob: Actor in workflow, generates annotation and injects into annotation store - lightweight annotation injection client. Easy to give a plausible story where the network consists of a global annotation store.

Zhimin: Lei can already push messages into the FP network via the API.

James: Last step to open web client and show annotations introduced from workflow.

Paul: Requirement for a system maintenance function to clean out existing annotations, to let us test and re-run the same annotation scenario.
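The injection client and the clean-out function discussed above might look roughly like the following sketch. It assumes an in-memory store and a per-run tag; the `AnnotationStore` class, its method names, the `run_id` tag, and the sample annotation values are all hypothetical and do not describe the actual FP network API.

```python
class AnnotationStore:
    """In-memory stand-in for a global annotation store (hypothetical)."""

    def __init__(self):
        self._annotations = []

    def inject(self, annotation, run_id):
        """Store an annotation, tagged with the demonstration run that made it."""
        self._annotations.append({"run_id": run_id, **annotation})

    def for_record(self, record_id):
        """Return all stored annotations that refer to the given record."""
        return [a for a in self._annotations if a["record_id"] == record_id]

    def purge_run(self, run_id):
        """Remove every annotation from one run; return how many were removed."""
        before = len(self._annotations)
        self._annotations = [a for a in self._annotations if a["run_id"] != run_id]
        return before - len(self._annotations)


store = AnnotationStore()
# Made-up annotations, as a workflow actor might inject them.
store.inject({"record_id": 2, "field": "scientific_name", "suggested": "Acer rubrum"}, run_id="demo-1")
store.inject({"record_id": 3, "field": "collector", "suggested": "A. Gray"}, run_id="demo-1")
store.inject({"record_id": 2, "field": "country", "suggested": "USA"}, run_id="other")

# Maintenance step: clear one run so the same scenario can be re-run.
removed = store.purge_run("demo-1")
```

Tagging annotations by run is one possible design; it lets a test run be cleared without touching annotations that came from elsewhere.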

  • Client Library/Web Client progress and targets


Zhimin: For SPNHC we can use the old technology and interface, and can show off the new scheme of annotations coming from the workflow. For new development, struggling with how to make the implementation of annotations in the network (and client library) independent of the current draft of the annotation ontology, which may change, and of the query technologies, which will also likely change. Examining dependency injection to handle some of the discovery issues, particularly if more than one protocol is in use in the network at once (as some parts of a network upgrade before others).
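As a rough illustration of the dependency injection idea, the sketch below passes protocol-specific encoders into a client instead of hard-coding them, so that nodes on different protocol versions can coexist. The class names, the protocol identifiers `fp-json-1` and `fp-json-2`, and the message layouts are invented for this example and are not the project's actual protocols.

```python
import json


class JsonV1Codec:
    """Encoder for a hypothetical older wire format."""
    protocol = "fp-json-1"

    def encode(self, annotation):
        return json.dumps({"v": 1, "body": annotation}, sort_keys=True)


class JsonV2Codec:
    """Encoder for a hypothetical newer wire format."""
    protocol = "fp-json-2"

    def encode(self, annotation):
        return json.dumps({"version": 2, "annotation": annotation}, sort_keys=True)


class AnnotationClient:
    """Client that knows nothing about wire formats; codecs are injected."""

    def __init__(self, codecs):
        self._codecs = {c.protocol: c for c in codecs}

    def send(self, annotation, node_protocol):
        """Encode the annotation with whichever codec the target node speaks."""
        codec = self._codecs.get(node_protocol)
        if codec is None:
            raise ValueError(f"no codec for protocol {node_protocol!r}")
        return codec.encode(annotation)


client = AnnotationClient([JsonV1Codec(), JsonV2Codec()])
old_wire = client.send({"record_id": 2}, "fp-json-1")  # node not yet upgraded
new_wire = client.send({"record_id": 2}, "fp-json-2")  # upgraded node
```

With this arrangement, a change to the ontology draft or the query technology would mean adding or replacing a codec rather than modifying the client.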

  • Quick check of high-level project deliverables categorization


Tim: Categories of use cases - using filtered push to clean up your own data set.

Can FPush deliverables be thought of as facilitating four broad categories of user scenarios?

  • I. FPush-assisted data entry (primarily an efficiency story)
    • (web client, specify client)
  • II. FPush-enabled local data cleaning
    • (workflow)
    • (specify client, filtering incoming annotations from network)
  • III. FPush-powered community data quality control
    • (embedded workflows)
  • IV. FPush-powered notification of changes of interest
    • (web client - my interests)

All depend on: Annotations, network...

  • Possible Specify6 collab project

Discussion. Looks likely. Need clarification of goals and synergies with Kansas team.