2012Feb01

From Filtered Push Wiki


Etherpad for meeting notes: http://firuta.huh.harvard.edu:9000/FP-2012Feb01

Agenda

  • NSF Workshop RFP
  • SPNHC 2012
  • AOD
    • Progress on paper
    • Generation of annotations from Specify determinations
  • Mapper (now AnnotationProcessor)
    • Integration Progress
    • UI Design
    • API For Query and Cluster Finding
  • Morphbank
  • Pending Tech Issues
    • FilteredPush Requirements Report_on_FP_Requirements
    • Tech group needs to make a decision on or set a date for decision for the query language for pub-sub for Apple Pie.
    • Tech group needs to make a decision on or set a date for decision for the domain objects supported for Apple Pie.
    • Tech group needs to decide on or set a date for decision for the scope, composition, and implementation of the "global cache."

Reports

  • Paul
    • Editing AOD manuscript with Bob.
    • Some work on cleaning up and categorizing on wiki
    • Fixed broken BOUML model for FP-Network.
  • Lei
    • Working on integration of mapper
  • Maureen
    • Got Morphbank server and web services running, did a bit of tweaking.
    • Worked on integration of mapper with web client demo

Notes

Present: Bertram, Jim, Lei, Maureen, David, Bob, Paul

Agenda:

  • NSF Workshop RFP

Draft went to NSF, haven't heard anything back.

  • SPNHC 2012

Jim: Registration is open for SPNHC 2012. We are planning a presentation for the meeting. Are we planning to have a meeting of our own in parallel? 11 June through 16 June. [Just checking my calendar -- looks OK so far, modulo teaching; which I have to double check]

Discussion: Yes. James is probably the most constrained. Could meet at Harvard and travel to Yale, or meet during SPNHC at Yale.

Action Item: Paul to check with James about constraints and circulate possible dates for consideration.


Discussion: Yes, good to have a

  • Status on Morphbank install.

Maureen: Installation up and running, both web services and main part.

  • AOD
    • Progress on paper

Bob: Paolo is looking at draft of MS. Will circulate after we've addressed his comments. May need to add createdWith as well as createdBy to express what software created the annotation.

    • Generation of annotations from Specify determinations

David: Have code that generates, from a Specify instance, an RDF instance document for annotations containing new determinations. Sent first copy to Lei. Next step is to add fuzzing.

Put discussion of the details of this on agenda for Friday meeting.

Bob: Point is a test bed.

Paul: Generate annotation instances for testing annotation processor, and provide code to reuse in generating annotations from Morphbank web services and generating reply annotations from annotation processor. Cases should help us understand how to link annotations as generality to rules to workflows.

Bob: This morning's discussion of this clearly pointed out issues about using URIs for physical objects versus using URIs for digital representations of those objects. The current iDigBio request for comments on GUIDs is underspecified here.

  • Mapper (now AnnotationProcessor)
    • Integration Progress

Maureen: Working with Lei to integrate code. Moving Lei's tree interface into the web interface, then start implementing mapper driver pieces. Working on integrating PrimeFaces/MyFaces widgets. Lei using tomcat 7.

Action Item: David to install tomcat 7 on firuta.

Paul: How to link in David's work on annotations.

Maureen: Purpose, as understood, to get more annotations based on real data.

Next step is to generalize to taking annotation object as input.

Bob: Redstore is OK for testing but definitely not for deployment - easy to get started with, but not for production use (e.g. single thread...).

Lei: Generated annotation files directly, then fed them into the annotation processor directly. Then used code to generate the response annotation.

Bob: Installing Specify at Davis?

Bertram: Yes, helping the entomology group at Davis on the side.

Action Item: David to install tomcat 7 on firuta, then a redstore instance, put notes documenting the process on the wiki, then return to refactoring the annotation generation code.

Bob: Might be good to have a fixed set of sparql queries to ask the end point for testing.
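A minimal sketch of what such a fixed query set could look like, not something presented at the meeting. The endpoint URL and the query names are hypothetical, and the three queries are generic SPARQL smoke tests rather than queries agreed on by the group. The sketch only builds request URLs using the `query` parameter of the SPARQL protocol; it does not contact a server.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the real host, port, and path depend on
# how the redstore instance is installed.
ENDPOINT = "http://localhost:8080/sparql"

# A fixed, named set of queries to run after every load or restart.
TEST_QUERIES = {
    "store_not_empty": "ASK { ?s ?p ?o }",
    "sample_triples": "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10",
    "distinct_types": "SELECT DISTINCT ?type WHERE { ?s a ?type }",
}


def build_query_url(endpoint, query):
    """Return a GET URL carrying the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


# One URL per named test query, ready to hand to any HTTP client.
urls = {name: build_query_url(ENDPOINT, q) for name, q in TEST_QUERIES.items()}
```

Keeping the queries in one named table means the same checks can be run against any end point by changing only the endpoint value.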

Bob: Might deploy redstore as a maintenance tool within each node (performance not an issue there).

    • UI Design

Maureen: Emailed James Macklin, who agreed to do some testing of the UI. Will send out screenshots and/or a script on the interface a couple of times a week.

Paul: Jim also?

Jim: Scope?

Maureen: Pure end user testing - script of actions to try in user interface to carry out a task, or screenshots to look at. Will include time estimate of completion.

    • API For Query and Cluster Finding

Paul: Two levels of API - getting data and framing questions.

Bob: Clustering has three elements: the dataset to be clustered, the attributes on which to cluster, and the algorithm used to cluster. The rest is a matter of implementation.

Maureen: The only clustering in the requirements is clustering of botanical duplicates.
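As an illustration of the three elements Bob lists, here is a minimal sketch of an interface that takes the dataset, the attributes, and the algorithm as separate parameters. It is not the project's API: the names `find_clusters` and `exact_match`, the record fields, and the sample values are all hypothetical, and exact matching on normalized values merely stands in for whatever clustering algorithm is eventually chosen.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

Record = Dict[str, str]
ClusterAlgorithm = Callable[[Sequence[Record], Sequence[str]], List[List[Record]]]


def exact_match(dataset: Sequence[Record], attributes: Sequence[str]) -> List[List[Record]]:
    """Group records whose normalized values agree on every chosen attribute."""
    buckets = defaultdict(list)
    for record in dataset:
        # Normalize lightly so trivial case and whitespace differences do not split a cluster.
        key = tuple(record.get(name, "").strip().lower() for name in attributes)
        buckets[key].append(record)
    return list(buckets.values())


def find_clusters(
    dataset: Sequence[Record],
    attributes: Sequence[str],
    algorithm: ClusterAlgorithm = exact_match,
) -> List[List[Record]]:
    """Apply the given algorithm to the dataset over the given attributes."""
    return algorithm(dataset, attributes)


# Made-up specimen records; the first two differ only in case and spacing.
records = [
    {"id": "A-1", "collector": "J. Smith", "number": "1024"},
    {"id": "B-7", "collector": "j. smith ", "number": "1024"},
    {"id": "C-3", "collector": "M. Jones", "number": "88"},
]
clusters = find_clusters(records, ["collector", "number"])
```

Because the algorithm is just a parameter, a different clustering method can be swapped in later without changing callers.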

Bob: That describes the output.

Paul: Path to defining the problem.

Bob: We should examine Zhimin's UML and code and ask if it addresses the proper level of generality.

Bertram: Kepler could support a clustering mechanism as long as we write the actors to do the clustering where dataset and attributes are parameters.

Lei: What kind of generality do we want?

Paul: First level of generality is more types of cluster - taxa, localities. Second level of generality is analysis of data, particularly for quality control.

Jim: Also we need to establish boundaries for Lei and others to get their work done. Need to recognize that we may need to be less general. Collections data sets contain lots of dirty data that are dirty in simple ways, can get lots of traction in data clean up with a few simple questions about data quality.

Bob: We can make choices that make it easier to refactor later.

Maureen: Clustering is like partitioning the network into a bunch of buckets whose contents resemble each other. Is the main goal for users to obtain one specific bucket, or to have the full array of buckets to browse?

Discussion: Primarily (for collection managers, taxonomists, etc.) a user wants one bucket or a small set of buckets at a time. Reusers of the data (e.g. climate change science) are most likely to be interested in multiple buckets at once.

Lei: Need a copy of the Specify data set to go with David's code.

Action Item: David to provide copy.

  • Pending Tech Issues
    • FilteredPush Requirements Report_on_FP_Requirements
    • Tech group needs to make a decision on or set a date for decision for the query language for pub-sub for Apple Pie.
    • Tech group needs to make a decision on or set a date for decision for the domain objects supported for Apple Pie.
    • Tech group needs to decide on or set a date for decision for the scope, composition, and implementation of the "global cache."