2008May14

From Filtered Push Wiki


Zhimin wants to restructure the architecture of the project. Hadoop is not very efficient for communication. We should make Triage a web service layer, with Hadoop running in the background only for specific jobs. We need an interface for Triage. It could be very simple: Triage accepts only a string of XML, or maybe there can be some typing with WSDL.
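A minimal sketch of the "string of XML" interface, assuming nothing beyond what is noted above. The function name `triage_accept`, the `annotation` element name, and the two route labels are hypothetical and only illustrate the idea of a thin front layer with background jobs behind it.

```python
import xml.etree.ElementTree as ET


def triage_accept(message_xml: str) -> dict:
    """Hypothetical Triage entry point: take one XML string, return a routing decision."""
    try:
        root = ET.fromstring(message_xml)
    except ET.ParseError as exc:
        # Malformed input is rejected before any background job is started.
        return {"accepted": False, "reason": f"malformed XML: {exc}"}

    # Assumption: the root element name identifies the kind of message.
    kind = root.tag

    # Assumption: small annotation messages are handled directly by the
    # service layer; everything else is handed to a background job.
    route = "inline" if kind == "annotation" else "background"
    return {"accepted": True, "kind": kind, "route": route}
```

A WSDL-typed interface would replace the single string parameter with declared message types, at the cost of a less trivial client.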

One problem is that parsing large files might be slow. Can you stream SOAP? Maybe; Zhimin thinks he has seen some commercial products that do. We could use indirection to handle large files: send pointers to the data rather than the data itself. Most messages will be small, like annotation messages.
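The indirection idea could look roughly like the following. This is a sketch under stated assumptions: the 64 KiB threshold, the `Payload` type, and the `store` callable (which saves the data somewhere and returns a URI for it) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Assumed cut-off; the notes do not say what counts as "large".
MAX_INLINE_BYTES = 64 * 1024


@dataclass
class Payload:
    """Either the data itself (inline) or a pointer to it (href), never both."""
    inline: Optional[bytes] = None
    href: Optional[str] = None


def make_payload(data: bytes, store: Callable[[bytes], str]) -> Payload:
    """Embed small data in the message; replace large data with a reference."""
    if len(data) <= MAX_INLINE_BYTES:
        return Payload(inline=data)
    # Large data is saved out of band and only its URI travels in the message.
    return Payload(href=store(data))
```

With this shape, small annotation messages carry their data inline, and only the rare large file pays for a second fetch.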

Authentication might take place in Triage.

Not all nodes need to run a Triage service. For example, Harvard may have six nodes but only HUH runs a Triage service. Triage should be able to decide whether a query needs to be authenticated or not. Triage can enforce policy but not do authentication. Triage can decide what kinds of credentials are acceptable without verifying the credentials. Maybe nodes can shop for Triage services that meet their needs. We can use simple discovery.
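The split between enforcing policy and doing authentication might be sketched as below. The message kinds, the credential type names (`x509`, `saml`), and the three outcome labels are hypothetical; the point is only that Triage looks at the type of credential offered and never verifies it.

```python
# Hypothetical policy table: per message kind, whether authentication is
# required and which credential types are acceptable.
POLICY = {
    "query": {"auth_required": False, "accepted": set()},
    "annotation": {"auth_required": True, "accepted": {"x509", "saml"}},
}


def check_policy(kind, credential_type=None):
    """Decide what to do with a message without verifying any credential."""
    rule = POLICY.get(kind)
    if rule is None:
        return "reject"  # unknown kind of message
    if not rule["auth_required"]:
        return "forward"  # no authentication needed
    if credential_type in rule["accepted"]:
        # Acceptable type: verification is delegated to a separate service.
        return "forward_to_authenticator"
    return "reject"  # missing or unacceptable credential type
```

A node shopping for a Triage service could compare such policy tables against its own requirements.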

It is a requirement that clients can reach Triage very easily through firewalls. This may become difficult for some NIH scenarios, where firewalls might not let through encrypted messages that weren't encrypted locally. We need to enable keepers of secure networks to use FP in a way that suits their needs. FP ought to redirect requests for security services that it doesn't want to handle.

---

Zhimin needs to know what kind of information will be given to the network for annotation messages. What is the source record that is the basis for the annotation? How do we identify the source record? Source institutions should provide DateLastChanged for records when available. We need a GUID for the institution and a unique identifier for the record within the institution.
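One possible shape for the source record identifier, as a sketch only. The class name, the field names, the `/` separator in `key()`, and the use of ISO 8601 date strings are assumptions; the notes fix only the three pieces of information (institution GUID, record identifier within the institution, DateLastChanged when available).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceRecordRef:
    """Hypothetical reference to the record an annotation is based on."""
    institution_guid: str
    record_id: str  # unique only within the institution
    date_last_changed: Optional[str] = None  # assumed ISO 8601, may be absent

    def key(self) -> str:
        """Network-wide identifier: institution GUID plus local record id."""
        return f"{self.institution_guid}/{self.record_id}"

    def changed_since(self, current: Optional[str]) -> Optional[bool]:
        """True/False if both dates are known, None if we cannot tell."""
        if self.date_last_changed is None or current is None:
            return None
        return current != self.date_last_changed
```

Carrying DateLastChanged in the reference lets a recipient notice that the record has changed since the annotation was made.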

Include the whole original record, or a reference to it, in annotation messages? Include the original record. There is a problem with records composed of multiple distributed files. Include the file which is the basis for the annotation. Paul has a scenario in which a researcher wishes to make a new determination on a specimen on the basis of an image of it that he saw in Morphbank. There are parallels with human scholarship and citations. It is possible that reification in RDF will provide enough functionality that we don't need to worry about it. Can an OWL document have, for example, three top-level "abouts"? (About the image file, about the specimen database record, ...)

---

Zhimin asks if there is a difference between an annotation of a single record and an annotation of a collection of records. Not if handled with RDF. To what types of things can we apply the property "IsAnnotatable"? To what types of things can we not apply that property? The representation of the thing referenced must be resolvable.
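A rough illustration of why the single-record and collection cases need not differ when annotations are expressed as triples. Plain tuples stand in for RDF statements here, and the predicate names `ex:hasBody` and `ex:annotates` are hypothetical, not taken from any agreed vocabulary.

```python
def annotate(annotation_id, targets, body):
    """Build (subject, predicate, object) triples for one annotation.

    The same code path serves one target or many: each target simply
    contributes one more 'annotates' statement.
    """
    triples = [(annotation_id, "ex:hasBody", body)]
    for target in targets:
        triples.append((annotation_id, "ex:annotates", target))
    return triples


# One record and a collection of records go through the same function.
single = annotate("urn:example:anno:1", ["urn:example:rec:A"], "new determination")
several = annotate(
    "urn:example:anno:2",
    ["urn:example:rec:A", "urn:example:rec:B", "urn:example:img:C"],
    "same collecting event",
)
```

Each target is only an identifier, which is why the representation of the thing referenced has to be resolvable.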

---

Graph, from the whiteboard, of the sort of things involved in a domain-neutral annotation, with an example of data elements in the DarwinCore1.2 namespace: Annotation graph.png