2011Oct25



Etherpad for meeting notes: http://firuta.huh.harvard.edu:9000/FP-2011Oct25


Agenda

  • Reports from TDWG
    • iDigBio
    • Symbiota
    • Specify6
      • Implication of Specify-Web/Dina for Mapper
  • Specify6-Symbiota-Morphbank-FilteredPush for AppleCore
  • FP Messages
  • Programmer search update (Harvard)
  • FP collaboration with Southwest TCN
  • A possible process for moving forward with ApplePie:
    • agree on a list of next steps (such as this one) and put a name and estimated date of completion on each
    • identify the main non-FP software components (Specify6, Symbiota, iDigBio etc.) and provisional target schema
    • identify the high-level flow of information among the components
    • set up a test environment: create documentation and scripts for installing and running each component that can be independently installed; for unrecreatable external services, obtain connection information and credentials for testing; include documentation and scripts for loading sample data; create scriptable tests that verify proper installation of each component of the test environment
    • analyze the non-FP components and identify changes or protocols needed
    • stub out the interfaces between components; start with tests
  • FP presentation at upcoming iDigBio Summit: Nov. 30 - Dec. 1 (Macklin)

Reports

  • Paul
    • At TDWG. Multiple contacts with iDigBio, existing TCNs, Specify, and Symbiota.
      • iDigBio to provide, to start, a 'Repository' rather than a 'Cache' of aggregated data. iDigBio to provide some high-level infrastructure for linking collections data. iDigBio does not have a solution for image storage costs. iDigBio to provide packaging and support for deployment of a cluster of useful digitization applications, as identified by the TCNs and community.
      • Ed Gilbert with Symbiota and the Lichen/Bryophyte TCN quite eager to collaborate with FP.
      • Specify6 team, HUH, Agriculture Canada, and the European Dina project are discussing collaboration to produce a Java EE version of Specify, with a focus on a three-tier JSF web application (db, JPA middle layer, JSF presentation layer).
    • Ran Annotations Interest Group (meeting notes at: http://firuta.huh.harvard.edu:9000/TDWG2011AIG). Discussed technical issues with TDWG infrastructure and AIG presence on the TDWG website; for now, working around those issues by posting the AIG charter on the interest group's page on the TDWG wiki. Interest group expressed substantive interest in AO/AOD. Interest group has consensus to form a Task Group, but needs to work on consensus for deliverables for such a group.
    • David Patterson very interested in Annotations. Would like to discuss with FP team. Perhaps some of us travel to Woods Hole?
    • Tony Rees very interested in Annotations, might meet on Thursday on way through Boston.
    • Mary Panahiazar expressed interest in some sort of interaction with FP team.
  • Maureen
    • Personal time off. Finished a working draft of the OAI-PMH component for Specify, committed to the local svn repository with build file and unit tests, deployable as a WAR file. This provides a standardized way for OAI-PMH harvesters to collect Specify CollectionObject data as Simple Darwin Core occurrence XML. The mapping from CollectionObject to DWC is based on a configuration file generated by a utility already present in the Specify thick client, so Specify users can configure the mapping with the GUI.
    • Next step is open for discussion, but here are some possibilities. I presume that any coding should be accompanied by a build file, unit tests, and a basic readme.txt containing instructions on how to install and run the code:
      • modify an existing sample harvester to deposit the harvested data in a central Lucene index
      • write sample code to take one harvested DWC occurrence record and convert it to an FP2 message (need to know whether it would be an annotation message or something different)
      • write stub FP2 server application that would receive FP messages and update a status page (this would involve stripping down the FP1 code and mapping the relevant pieces to FP2)
      • create a simple web registration interface to identify OAI repositories for the harvester to target
      • extend specify-oai to provide taxon and/or botanist metadata (need suggestions for appropriate xml schemas)
  • Lei
    • Finished the poster and one-page slide for IDCC11.
  • Bertram
    • Attended DataONE AHM in Tamaya near Albuquerque; co-hosted the ProvenanceWG meeting.
  • James M.
    • Attended TDWG and chaired the Annotation Symposium which was very successful.
    • There was great interest in annotations as witnessed by Paul's notes above.
    • Invited to the iDigBio summit meeting in Gainesville, FL. FP will have at least 15 minutes to present.


Notes

FilteredPush Team Meeting 2011 Oct 25: Bertram, Maureen, Paul, James (who can now actually see the Etherpad!), Jim, Lei


Notes:

  • Reports from TDWG
  • iDigBio
  • Symbiota
  • Specify6
  • Implication of Specify-Web/Dina for Mapper
  • Specify6-Symbiota-Morphbank-FilteredPush for AppleCore
  • Report from Bertram on DataONE

    • Co-hosted the ProvenanceWG meeting at the DataONE AHM.
    • DataONE 1st release by the end of the year.
    • Tutorials on "ONE-drive", "ONE-R", ...
    • Making progress on the authentication layer: able to delegate to OpenID, GoogleID, etc. Progress on making authentication easier to use.
    • Moving forward with the D-OPM model; goal is to accommodate provenance models of different sci-wf systems (Kepler, Taverna, Vistrails, Restflow, ...).
    • Post-doc ad out now.

  • FP Messages

  • Programmer search update (Harvard)

  • FP collaboration with Southwest TCN
  • A possible process for moving forward with ApplePie:

TDWG

James: ran the annotation session; it was successful and people were interested. Many have annotation use cases for their projects, including GNA (Global Names Architecture).

Paul: three different kinds of systems deal with annotations:

  • All-in-one-silo (static annotations).
  • ALA store-and-query systems: one dataset with an annotation store, where the presentation collects the annotations, but the datastore is at least in part harvested from other repositories.
  • FP distribution system: any point in the network can be a source of annotations (FP and BiSciCol).

iDigBio

They are envisioning a very large-scale enterprise bus architecture, with a repository rather than a cache of harvested data. They would like to integrate/bundle a set of software components that the community has indicated interest in.

iDigBio is having a meeting at the end of November.

Specify6-Symbiota-Morphbank-FilteredPush seems like a natural bundling of software. Perhaps there is a role for OAI-PMH in allowing Symbiota to harvest incremental updates.
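
To make the incremental-harvest idea concrete, here is a minimal sketch in Python of what a harvester might do. It relies only on the standard OAI-PMH ListRecords verb and its optional "from" argument, which limits a request to records changed since a given date. The base URL, the metadataPrefix value "dwc", the record identifier, and the sample response are all assumptions for illustration; they are not taken from the specify-oai component or from Symbiota.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DWC_NS = "http://rs.tdwg.org/dwc/terms/"


def list_records_url(base_url, metadata_prefix, from_date=None):
    """Build a ListRecords request; 'from' restricts it to records changed since from_date."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return (identifier, datestamp, {dwc term: value}) for each record in a response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter("{%s}record" % OAI_NS):
        header = rec.find("{%s}header" % OAI_NS)
        identifier = header.findtext("{%s}identifier" % OAI_NS)
        datestamp = header.findtext("{%s}datestamp" % OAI_NS)
        terms = {}
        metadata = rec.find("{%s}metadata" % OAI_NS)
        if metadata is not None:  # deleted records carry a header but no metadata
            for el in metadata.iter():
                if el.tag.startswith("{%s}" % DWC_NS):
                    terms[el.tag.split("}", 1)[1]] = (el.text or "").strip()
        records.append((identifier, datestamp, terms))
    return records


# Hypothetical response body, shaped like a ListRecords reply carrying Simple Darwin Core.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:specify/1001</identifier>
        <datestamp>2011-10-20</datestamp>
      </header>
      <metadata>
        <SimpleDarwinRecord xmlns="http://rs.tdwg.org/dwc/xsd/simpledarwincore/"
                            xmlns:dwc="http://rs.tdwg.org/dwc/terms/">
          <dwc:catalogNumber>00012345</dwc:catalogNumber>
          <dwc:scientificName>Quercus alba L.</dwc:scientificName>
        </SimpleDarwinRecord>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

# Hypothetical endpoint; ask only for records changed since the last harvest.
url = list_records_url("http://example.org/specify-oai", "dwc", "2011-10-01")
harvested = parse_records(SAMPLE)
```

A real harvester would also follow resumptionToken elements for large result sets and remember the datestamp of its last successful run; both are left out here to keep the sketch short.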

Southwest Arthropod TCN

The proposal is mature and ready to go in. The MCZ would supply support for the Specify-FP-Symbiota integration: configuring an FP instance, including features in FP web for improving specimen data: find records needing further work, sending messages to relevant experts. Our piece has been scaled back.

There is a parallel story with the Northeast Herbaria consortium: Specify-FP-Morphbank integration with different workflows.

Specify 6 Dina meeting at TDWG

Dina is a project of the Swedish Museum and country-level partners for building collections management software. They did a detailed analysis of existing resources and chose Specify's datamodel as a basis but prefer to use a new web interface (Java EE) rather than the thick client in order to solve problems of distribution.

The meeting involved Dina, Specify, James, Paul, and others. There was general agreement to collaborate.

There is an immediate concern for us. Their design is to use JPA with a caching mechanism between the JPA layer and the database, and a JSF interface on top of that. Given the existence of the cache, all interactions that change the database will need to go through Java objects rather than direct SQL. We should re-evaluate our approach to the mapper to include this special case for Specify integration.
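
The concern can be illustrated with a toy example. The sketch below is Python with sqlite3, not JPA, and the table, class, and method names are hypothetical; it only shows the general failure mode: once an object layer caches a row, a writer that updates the database with direct SQL leaves the cache stale, while a write that goes through the object layer keeps both in step.

```python
import sqlite3


class CachedStore:
    """Toy object layer: reads go through an in-memory cache keyed by row id."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}

    def get_name(self, row_id):
        # Read-through: hit the database only on a cache miss.
        if row_id not in self.cache:
            row = self.conn.execute(
                "SELECT name FROM taxon WHERE id = ?", (row_id,)
            ).fetchone()
            self.cache[row_id] = row[0]
        return self.cache[row_id]

    def set_name(self, row_id, name):
        # Writes through the object layer update the database and the cache together.
        self.conn.execute("UPDATE taxon SET name = ? WHERE id = ?", (name, row_id))
        self.conn.commit()
        self.cache[row_id] = name


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taxon (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO taxon VALUES (1, 'Quercus alba')")
conn.commit()

store = CachedStore(conn)
first_read = store.get_name(1)  # fills the cache

# A writer that bypasses the object layer with direct SQL.
conn.execute("UPDATE taxon SET name = 'Quercus rubra' WHERE id = 1")
conn.commit()
stale_read = store.get_name(1)  # the cache still holds the old value

# The same change made through the object layer is visible immediately.
store.set_name(1, "Quercus rubra")
fresh_read = store.get_name(1)
```

Here stale_read still returns the old name even though the database row has changed, which is the situation a SQL-based mapper would create behind a cached JPA layer.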

For SPNHC next year: high priority to work on the mapper

For Friday meeting: discussion of implications of Dina for FP and mapper in particular.

Thoughts on FP process. Maureen: Would like to have concrete tasks assigned. Start with high levels and fill in the details as we go along. Assign tasks to people with dates. More structure would help.


  • agree on a list of next steps (such as this one) and put a name and estimated date of completion on each
  • Target for SPNHC (2nd week of June 2012): (i) collecting annotations from a client; (ii) transporting annotations through an FP network that identifies recipients; (iii) the recipients ingest the annotations into their databases. Consensus on this target.
  • identify the main non-FP software components (Specify6, Symbiota, MorphBank, DataONE, iPlant[?]). Consensus on this list.
  • identify the FP software components (Kepler client, Kepler service, web client, annotation store, cache, mapper, network, client library). Stake in the ground for discussion.
  • identify possible schemas to use for collection objects, and: determinations, taxon names, botanist names (Action Item: Paul)
  • choose provisional target schema from the possibilities: SimpleDarwinCore pending further analysis and discussion (new determination and new collection object annotations)
  • identify the high-level flow of information among the components (synthesize documents on the wiki; include the requirements list; include a review of the requirements list with some specific demonstration scenarios; relate demo requirements to specific use cases). (Action Item: Maureen and Paul over the next week.) Also develop slides on architecture/capabilities/workflow for James to use at the iDigBio meeting.
  • set up a test environment: create documentation and scripts for installing and running each component that can be independently installed; for unrecreatable external services, obtain connection information and credentials for testing; include documentation and scripts for loading sample data; create scriptable tests that verify proper installation of each component of the test environment. Produce deployment (maven/ivy) environment. (Maureen will create the documentation and scripts for Specify over the next week.)
  • analyze the non-FP components and identify changes or protocols needed. After test environment is up.
  • stub out the interfaces between components; start with tests. After test environment is up.
  • Review definitions of Client-Network messages (Bob)
  • Firm up definition of annotations. (Bob, Paul, AO/AOD)
  • Examine within network communications.
  • Mapper targets (discussion Friday).
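
As a starting point for the "stub out the interfaces; start with tests" step, the three-part SPNHC target (collect, transport to identified recipients, ingest) might be stubbed roughly as below. This is only a sketch in Python: the class names, the fields on an annotation, and the rule that recipients are matched by collection code are all hypothetical. The actual message and annotation definitions are still open items in the list above.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    # Hypothetical fields; the real annotation definition is still to be firmed up.
    target_collection: str
    catalog_number: str
    field_name: str
    new_value: str


@dataclass
class Recipient:
    name: str
    collections: set  # collection codes this recipient has registered interest in
    database: dict = field(default_factory=dict)  # stand-in for the recipient's own database

    def ingest(self, ann):
        # (iii) the recipient ingests the annotation into its database
        record = self.database.setdefault(ann.catalog_number, {})
        record[ann.field_name] = ann.new_value


class Network:
    def __init__(self):
        self.recipients = []

    def register(self, recipient):
        self.recipients.append(recipient)

    def push(self, ann):
        """(ii) transport: deliver to each interested recipient and return their names."""
        delivered = []
        for recipient in self.recipients:
            if ann.target_collection in recipient.collections:
                recipient.ingest(ann)
                delivered.append(recipient.name)
        return delivered


network = Network()
herbarium_a = Recipient("herbarium-a", {"A"})
museum_b = Recipient("museum-b", {"B"})
network.register(herbarium_a)
network.register(museum_b)

# (i) a client collects an annotation proposing a new determination
ann = Annotation("A", "00012345", "scientificName", "Quercus rubra L.")
delivered = network.push(ann)
```

Stubs like these could be replaced one component at a time (client, network, mapper) once the test environment is up, with the same assertions kept as regression tests.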