From Filtered Push Wiki

Etherpad for meeting notes: http://firuta.huh.harvard.edu:9000/FP-2011Feb15

User: David Lowery | User: Bob Morris | User: Zhimin Wang | User: BertramLudaescher | User: James Hanken | User: Paul J. Morris | User: James Macklin | User: Lei Dou


  1. Review of architecture, settled on component design/responsibilities?
  2. Tuscany
  3. James Macklin status of FNA and Hong Cui.
  4. SourceForge
  5. Second Project Programmer Position


  • Zhimin: Starting to design the client API, intending to make it schema agnostic. Looking at a Service Component Architecture implementation (Apache Tuscany), a likely candidate for managing and wiring our services.
  • Lei and Bertram: Prepare slides of koogle-kuration package and video of SpecimenMerge workflow for Ptolemy-mini conference
  • Paul: Anne Marie has had more communication from Dell, but they have yet to provide new information on disks available for firuta.


Present: James, Lei, Bertram, David, Jim, Bob, Paul, Zhimin


  1. Review of architecture, settled on component design/responsibilities?

Where are our specification and architecture docs again? (URL please.) http://etaxonomy.org/mw/FP2_Design: Thx. It is a TOC but needs more added, especially the SVN pointers. The goal is that it is always the place to start. Will improve it some this afternoon after the meeting. (Bob)

Review of: http://etaxonomy.org/mw/List_of_Component_Responsibilities

Bob: There's one in the client library that may be implicit: Messages might send notifications. Client's responsibility to listen for notifications.

Zhimin: Hook to a callback mechanism.
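The callback idea could look roughly like the following. This is only a sketch: `NotificationListener`, `ClientLibrary`, and `deliver` are hypothetical names invented for illustration, not part of any existing FP code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these types come from the FP code base.
class NotificationSketch {

    /** Callback the client application implements to hear about notifications. */
    interface NotificationListener {
        void onNotification(String topic, String payload);
    }

    /** Stand-in for the client library: keeps listeners and fans messages out to them. */
    static class ClientLibrary {
        private final List<NotificationListener> listeners = new ArrayList<>();

        void addListener(NotificationListener listener) {
            listeners.add(listener);
        }

        /** Called when a message arrives; returns how many listeners were notified. */
        int deliver(String topic, String payload) {
            for (NotificationListener listener : listeners) {
                listener.onNotification(topic, payload);
            }
            return listeners.size();
        }
    }

    public static void main(String[] args) {
        ClientLibrary client = new ClientLibrary();
        // The client decides what to do with a notification; the library only calls back.
        client.addListener((topic, payload) ->
                System.out.println("got " + topic + ": " + payload));
        client.deliver("annotation", "new determination proposed");
    }
}
```

The library owns delivery, while listening stays the client's responsibility, which matches the split Bob describes.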

Bob: Where is the responsibility for authorization/authentication/user management? Implementations as calls to DataOne. Agent management.

Analogy: Nice car to drive away from the dealer, but no gas in the gas tank.

Additional component:

  • Agent/UserManagement
    • UserRegistration
    • MintingAuthorizationTokens
    • ValidatingAuthorizationTokens
    • (Expect to refactor this component once 1) we have an access model, and 2) DataOne services are better defined.)
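As a rough illustration, the three responsibilities of the proposed component could be expressed as one interface. Everything here is an assumption made for the sketch: the interface name, the method signatures, and the in-memory implementation, which stands in for whatever would eventually call DataOne.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical sketch; names are illustrative and not taken from the FP code base.
class AgentManagementSketch {

    interface AgentManagement {
        void registerUser(String userId);      // UserRegistration
        String mintToken(String userId);       // MintingAuthorizationTokens
        boolean validateToken(String token);   // ValidatingAuthorizationTokens
    }

    /** In-memory placeholder; a later version might delegate to external services. */
    static class InMemoryAgentManagement implements AgentManagement {
        private final Set<String> users = new HashSet<>();
        private final Map<String, String> tokenToUser = new HashMap<>();

        @Override
        public void registerUser(String userId) {
            users.add(userId);
        }

        @Override
        public String mintToken(String userId) {
            if (!users.contains(userId)) {
                throw new IllegalArgumentException("unknown user: " + userId);
            }
            String token = UUID.randomUUID().toString();
            tokenToUser.put(token, userId);
            return token;
        }

        @Override
        public boolean validateToken(String token) {
            return tokenToUser.containsKey(token);
        }
    }

    public static void main(String[] args) {
        AgentManagement agents = new InMemoryAgentManagement();
        agents.registerUser("bob");
        String token = agents.mintToken("bob");
        System.out.println(agents.validateToken(token));      // true
        System.out.println(agents.validateToken("made-up"));  // false
    }
}
```

Keeping callers on the interface is what would make the expected refactoring cheap once an access model exists.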

Bob: Also missing: System administration, logging, etc.

Traffic shaping / traffic management in general. Hopefully able to delegate to something else.

Paul: Is it a responsibility of each component to report its status?

Bob: Not at the point of shaping this collaboration graph. Question of need and not provision. Now a time when we should turn the list of component responsibilities to a UML component diagram, making sure that each responsibility is reflected as a provision and a need.

Zhimin: Need a good boundary between the management level and the application level. Client API should be independent of the message schema.

Bob: Right abstraction is marshaling and unmarshaling - xml and xml schema being one instance.

Zhimin: In moving across the interface, transform from the object model to a serialization, and on the other side back to the object model.
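One way to picture the marshaling abstraction discussed here: the client API works only with objects, and the wire format sits behind an interface. The `Marshaller` interface, the `Annotation` class, and the toy line-based format below are all assumptions for illustration; an XML or RDF implementation would plug in the same way.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; the types and the wire format are invented for illustration.
class MarshalingSketch {

    /** Minimal object model for an annotation. */
    static class Annotation {
        final String target;
        final String body;

        Annotation(String target, String body) {
            this.target = target;
            this.body = body;
        }
    }

    /** The abstraction: object model to bytes and back. */
    interface Marshaller<T> {
        byte[] marshal(T value);
        T unmarshal(byte[] bytes);
    }

    /** Toy format: target on the first line, body after it. Assumes target has no newline. */
    static class LineMarshaller implements Marshaller<Annotation> {
        @Override
        public byte[] marshal(Annotation a) {
            return (a.target + "\n" + a.body).getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public Annotation unmarshal(byte[] bytes) {
            String[] parts = new String(bytes, StandardCharsets.UTF_8).split("\n", 2);
            return new Annotation(parts[0], parts[1]);
        }
    }

    public static void main(String[] args) {
        Marshaller<Annotation> marshaller = new LineMarshaller();
        byte[] wire = marshaller.marshal(new Annotation("specimen-1", "corrected collector name"));
        Annotation back = marshaller.unmarshal(wire);
        System.out.println(back.target + " / " + back.body);
    }
}
```

Swapping `LineMarshaller` for another implementation would not touch any code that only sees `Annotation` objects, which is the sense in which the client API stays schema independent.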

Bob: XML schema is a substantial headache for extension. RDF is a better choice.

Bertram: Not yet ready to commit.

Bob: Agree, not quite at the point of making that decision.

Bertram: Some technologies add a level of indirection that isn't needed. For example, implement in native Java, or layer in web service, or layer in generic service...

Bob: The advantage of dependency injection frameworks is the ability to escape design decisions on a component-specific basis, without having to rearchitect.
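A hand-wired sketch of the point about dependency injection: the consumer depends on an interface, so whether a component is native Java or sits behind an extra layer is decided at wiring time only. All names are hypothetical, and the wiring in `main` is a manual stand-in for what a framework such as Spring would do from configuration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; manual constructor injection, no framework involved.
class WiringSketch {

    interface AnnotationSink {
        void accept(String annotation);
    }

    /** Direct, in-process implementation. */
    static class InMemorySink implements AnnotationSink {
        final List<String> received = new ArrayList<>();

        @Override
        public void accept(String annotation) {
            received.add(annotation);
        }
    }

    /** An added layer around another sink, e.g. where a service hop might go. */
    static class EnvelopeSink implements AnnotationSink {
        private final AnnotationSink next;

        EnvelopeSink(AnnotationSink next) {
            this.next = next;
        }

        @Override
        public void accept(String annotation) {
            next.accept("<envelope>" + annotation + "</envelope>");
        }
    }

    /** Consumer: knows only the interface, never the concrete choice. */
    static class Publisher {
        private final AnnotationSink sink;

        Publisher(AnnotationSink sink) {
            this.sink = sink;
        }

        void publish(String annotation) {
            sink.accept(annotation);
        }
    }

    public static void main(String[] args) {
        InMemorySink store = new InMemorySink();
        new Publisher(store).publish("a1");                    // direct wiring
        new Publisher(new EnvelopeSink(store)).publish("a2");  // layered wiring
        System.out.println(store.received);  // [a1, <envelope>a2</envelope>]
    }
}
```

`Publisher` is identical in both wirings, so the layering decision can be revisited per component without rearchitecting.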

  2. Tuscany

http://tuscany.apache.org/ "Apache Tuscany simplifies the task of developing SOA solutions by providing a comprehensive infrastructure for SOA development and management that is based on Service Component Architecture (SCA) standard."

Our design incorporates many services, so a Service Component Architecture implementation such as Apache Tuscany may be relevant. Components can be local or remote. Given a service and a configuration file, Tuscany acts as a platform for integrating services. It applies the Inversion of Control pattern in a broader sense (e.g. Spring).

Bob: Analogy: Runtime Spring.

Bertram: What problem is solved using Tuscany?

Zhimin: Architecture includes lots of components that provide services, and lots of service discovery (see: Find Service calls in Annnotation_Sequence_Proposed_in_network.png )

Bertram: In some projects we have seen attention spent on emerging technologies and standards (e.g. OSGi: http://en.wikipedia.org/wiki/OSGi) to modularize the software. But this distracted from making the hard software refactoring decisions. In a way, "new technologies" can parasitize effort that otherwise could be spent on tackling the problem head on ;-)

Zhimin: Currently evaluating.

  3. James Macklin status of FNA and Hong Cui.

Collaborative proposal to do parsing of biodiversity literature, with Hong Cui as PI. Connected to FP through Hong providing data from Flora of North America for a digital flora demonstration, with FP allowing annotation of the flora. Bob, James, Alex, and Paul are going to Tucson in late March to look at algorithms, terms, logic, and movement towards a demonstration. The FP proposal includes the FNA demonstration flora. Bob has a subcontract from Hong at UMB.

  4. SourceForge

Bob to do migration from UMB this week.

  5. Second Project Programmer Position
  6. Roll-over from last week? Prototype to accept annotations, apply changes to (production?) database

Lightweight implementation of a client to accept annotations. Need: A copy of a target database. Annotation request format. Then implement.

Zhimin: Comes back to the query language.

Bertram: Prototype to see incoming annotations.

James: Web interface.

Bob: Users have problems that they want to solve. We can easily, and should, write down all of the questions that can be answered by a duplicates resolution network. The next part of that has to be (hidden from the users) what the requirements for the query languages are. As an example, for DiGIR it was regarded as successful that queries had to be expressed in a small subset of SQL but tied to DarwinCore; Tapir was more general, effectively a more expressive SQL that was less dependent on a schema. What are our requirements for queries and responses? Are they met by RDF and SPARQL, and can we express this question sufficiently in UML?

Do we have a list of top-3 (top-N) kinds of change requests (annotations) that we need to support?

  1. Correction of a value.
  2. New Determination.
  3. These things are members of the same set.
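The three kinds of change request listed above, applied to a copy of a target database, might be prototyped along these lines. This is a sketch under stated assumptions: the class names, the in-memory maps standing in for the database copy, and the field name `recordedBy` are all illustrative, and no annotation request format has been agreed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not an agreed annotation format.
class ChangeRequestSketch {

    enum Kind { VALUE_CORRECTION, NEW_DETERMINATION, SAME_SET }

    static class ChangeRequest {
        final Kind kind;
        final List<String> recordIds;
        final String field;  // used by VALUE_CORRECTION only
        final String value;  // corrected value, or taxon name for a determination

        private ChangeRequest(Kind kind, List<String> recordIds, String field, String value) {
            this.kind = kind;
            this.recordIds = recordIds;
            this.field = field;
            this.value = value;
        }

        static ChangeRequest correction(String recordId, String field, String value) {
            return new ChangeRequest(Kind.VALUE_CORRECTION, Arrays.asList(recordId), field, value);
        }

        static ChangeRequest determination(String recordId, String taxonName) {
            return new ChangeRequest(Kind.NEW_DETERMINATION, Arrays.asList(recordId), null, taxonName);
        }

        static ChangeRequest sameSet(String... recordIds) {
            return new ChangeRequest(Kind.SAME_SET, Arrays.asList(recordIds), null, null);
        }
    }

    /** In-memory stand-in for a copy of the target database. */
    static class TargetCopy {
        final Map<String, Map<String, String>> fields = new HashMap<>();
        final Map<String, List<String>> determinations = new HashMap<>();
        final List<Set<String>> sets = new ArrayList<>();

        void apply(ChangeRequest r) {
            switch (r.kind) {
                case VALUE_CORRECTION:  // overwrite one field of one record
                    fields.computeIfAbsent(r.recordIds.get(0), k -> new HashMap<>())
                          .put(r.field, r.value);
                    break;
                case NEW_DETERMINATION:  // append, never overwrite earlier determinations
                    determinations.computeIfAbsent(r.recordIds.get(0), k -> new ArrayList<>())
                                  .add(r.value);
                    break;
                case SAME_SET:  // record that these records belong together
                    sets.add(new HashSet<>(r.recordIds));
                    break;
            }
        }
    }

    public static void main(String[] args) {
        TargetCopy db = new TargetCopy();
        db.apply(ChangeRequest.correction("rec-1", "recordedBy", "A. Gray"));
        db.apply(ChangeRequest.determination("rec-1", "Quercus alba"));
        db.apply(ChangeRequest.sameSet("rec-1", "rec-2"));
        System.out.println(db.fields.get("rec-1").get("recordedBy"));  // A. Gray
    }
}
```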

See notes in: http://etaxonomy.org/mw/Use_cases#Use_Case:_Annotate_Specimen http://etaxonomy.org/mw/Use_cases#Notes_2

  7. Separate out management and technical discussions again.