FP Messages

From Filtered Push Wiki


General Concepts

Informal description of atomic messages: One message = one purpose!!!

Implementation description of atomic messages: One message type = one concrete job.

What is the "message originator": a person, or a client? Perhaps both. The Header element <xsd:element name="Originator"/> could then be a complex type consisting of a GUID for the node that originates the message and a string that represents an (untrusted) assertion by the client software of the name of the person who is logged in and using the client to generate the message.

--David Lowery (talk) 18:35, 27 February 2013 (CET) In the current implementation the message originator is an authorized client. Each authorized client has a private/public key pair, and the only metadata associated with a key pair is an alias such as "symbiota-scan". The message originator has a stub implementation in org.filteredpush.identity.ClientIdentity.java, which is not currently used. It probably needs to associate more metadata with the public key the client uses for authentication: at present the network only knows that whoever made a request was authorized to do so, and does not retain information about who that was.
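As a rough illustration of the Originator idea discussed above, the sketch below pairs a node GUID with an unverified user name. The class name `Originator` and its methods are hypothetical; this is not the existing ClientIdentity stub, only one way the proposed complex type might look on the Java side.

```java
import java.util.Objects;
import java.util.UUID;

/** Hypothetical value type for the proposed Originator header element. */
final class Originator {
    private final UUID nodeId;         // GUID of the node that originates the message
    private final String assertedUser; // untrusted name asserted by the client; may be null

    Originator(UUID nodeId, String assertedUser) {
        this.nodeId = Objects.requireNonNull(nodeId, "nodeId");
        this.assertedUser = assertedUser;
    }

    UUID getNodeId() { return nodeId; }

    String getAssertedUser() { return assertedUser; }

    /** The network never verifies the asserted name, so it is labelled as such. */
    String describe() {
        String who = (assertedUser == null || assertedUser.isEmpty())
                ? "unknown user"
                : assertedUser + " (unverified)";
        return who + " via node " + nodeId;
    }
}
```

Keeping the two fields separate makes it harder to mistake the client-asserted name for an authenticated identity.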

Multiple clients at same node? Probably easier than not

--David Lowery (talk) 18:35, 27 February 2013 (CET) Multiple clients can currently authenticate with the same network node (via the access point). The key from the XML dsig is compared against the certificates of all authenticated clients in a PKCS12 keystore. PKCS12 was chosen over the Java keystore format to maintain compatibility with other implementations, e.g. SparqlPuSH, which is written in PHP and can use the SSL libraries to obtain keys from the PKCS12 store.
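The keystore comparison described above might look roughly like the following. It uses only the standard java.security API; the class name `AuthorizedClients` and its method names are assumptions made for illustration and are not taken from the FilteredPush code base.

```java
import java.io.InputStream;
import java.security.KeyStore;
import java.security.KeyStoreException;
import java.security.PublicKey;
import java.security.cert.Certificate;
import java.util.Collections;
import java.util.Optional;

/** Hypothetical helper: look up a client key in a PKCS #12 keystore. */
final class AuthorizedClients {

    /** Loads a PKCS #12 store; passing null for the stream creates an empty store. */
    static KeyStore load(InputStream in, char[] password) throws Exception {
        KeyStore store = KeyStore.getInstance("PKCS12");
        store.load(in, password);
        return store;
    }

    /** Returns the alias of the first certificate whose public key equals the candidate. */
    static Optional<String> aliasFor(KeyStore store, PublicKey candidate) throws KeyStoreException {
        for (String alias : Collections.list(store.aliases())) {
            Certificate cert = store.getCertificate(alias);
            if (cert != null && cert.getPublicKey().equals(candidate)) {
                return Optional.of(alias);
            }
        }
        return Optional.empty();
    }
}
```

Returning the alias rather than a boolean would give the network a handle on which client made the request, which is the information noted above as not currently retained.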

Atomicity of Message filtering? All or none for now.

Transport

To satisfy Requirement14, FP Messages can be encrypted XML messages.

Requirement15 is met through client software signing FP Messages with XML dsigs, and the FP AccessPoint maintaining a list of authorized client public keys.
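A minimal sketch of the signing and validation step, using the standard javax.xml.crypto.dsig API with an enveloped RSA-SHA256 signature. The element name `FPMessage`, the `type` attribute, and the class `SignedMessageSketch` are placeholders chosen for this example; the algorithms and document layout of the real implementation may differ.

```java
import java.security.KeyPair;
import java.security.PublicKey;
import java.util.Collections;
import javax.xml.crypto.KeySelector;
import javax.xml.crypto.dsig.CanonicalizationMethod;
import javax.xml.crypto.dsig.DigestMethod;
import javax.xml.crypto.dsig.Reference;
import javax.xml.crypto.dsig.SignedInfo;
import javax.xml.crypto.dsig.Transform;
import javax.xml.crypto.dsig.XMLSignature;
import javax.xml.crypto.dsig.XMLSignatureFactory;
import javax.xml.crypto.dsig.dom.DOMSignContext;
import javax.xml.crypto.dsig.dom.DOMValidateContext;
import javax.xml.crypto.dsig.spec.C14NMethodParameterSpec;
import javax.xml.crypto.dsig.spec.TransformParameterSpec;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class SignedMessageSketch {
    // Algorithm URI for RSA with SHA-256, written out for older JDKs that lack a constant.
    static final String RSA_SHA256 = "http://www.w3.org/2001/04/xmldsig-more#rsa-sha256";

    /** Builds a placeholder message document (element and attribute names are assumptions). */
    static Document newMessage(String type) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().newDocument();
        Element root = doc.createElementNS(null, "FPMessage");
        root.setAttributeNS(null, "type", type);
        doc.appendChild(root);
        return doc;
    }

    /** Client side: appends an enveloped signature over the whole document. */
    static void sign(Document doc, KeyPair clientKeys) throws Exception {
        XMLSignatureFactory fac = XMLSignatureFactory.getInstance("DOM");
        Reference ref = fac.newReference("",
                fac.newDigestMethod(DigestMethod.SHA256, null),
                Collections.singletonList(
                        fac.newTransform(Transform.ENVELOPED, (TransformParameterSpec) null)),
                null, null);
        SignedInfo info = fac.newSignedInfo(
                fac.newCanonicalizationMethod(CanonicalizationMethod.INCLUSIVE,
                        (C14NMethodParameterSpec) null),
                fac.newSignatureMethod(RSA_SHA256, null),
                Collections.singletonList(ref));
        XMLSignature signature = fac.newXMLSignature(info, null);
        signature.sign(new DOMSignContext(clientKeys.getPrivate(), doc.getDocumentElement()));
    }

    /** Access point side: checks the signature against one authorized public key. */
    static boolean isValid(Document doc, PublicKey authorizedKey) throws Exception {
        NodeList found = doc.getElementsByTagNameNS(XMLSignature.XMLNS, "Signature");
        if (found.getLength() == 0) {
            return false; // unsigned messages are rejected
        }
        DOMValidateContext ctx = new DOMValidateContext(
                KeySelector.singletonKeySelector(authorizedKey), found.item(0));
        XMLSignature signature = XMLSignatureFactory.getInstance("DOM").unmarshalXMLSignature(ctx);
        return signature.validate(ctx);
    }
}
```

An access point holding several authorized keys could call `isValid` once per key, or supply a custom KeySelector that consults its key list.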

Message arguments

  • Network object ID (or list of same[?])

Tradeoff - send message multiple times, or send large complex messages. Implementation level decision?

--David Lowery (talk) 19:05, 27 February 2013 (CET) In the current implementation each message has a unique "messageId" field that contains a UUID. This messageId is also passed back in the response to the client. The client can use the messageId as a handle to obtain the results of the request initiated by the message (e.g. query results, analysis results, annotations meeting an interest). Internally, the messaging system can attach other response messages to the original message identified by this handle: any message created to store results for retrieval has its inResponseTo field set to the UUID of the original message. A call by the client to check messages with the original handle will return all messages that are "inResponseTo", or otherwise associated with, the original message.
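The messageId / inResponseTo handle mechanism can be sketched with a small in-memory store. The names `StoredMessage`, `MessageStoreSketch`, `submit`, `attachResponse`, and `checkMessages` are invented for this example and are not the API of the actual messaging system.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Hypothetical stored message: inResponseTo is null for an original request. */
final class StoredMessage {
    final String messageId;
    final String inResponseTo;
    final String content;

    StoredMessage(String messageId, String inResponseTo, String content) {
        this.messageId = messageId;
        this.inResponseTo = inResponseTo;
        this.content = content;
    }
}

final class MessageStoreSketch {
    private final Map<String, StoredMessage> byId = new LinkedHashMap<>();

    /** Stores an original request and returns its messageId, which serves as the client's handle. */
    String submit(String content) {
        String id = UUID.randomUUID().toString();
        byId.put(id, new StoredMessage(id, null, content));
        return id;
    }

    /** Attaches a result message to an original request via inResponseTo. */
    String attachResponse(String originalId, String content) {
        if (!byId.containsKey(originalId)) {
            throw new IllegalArgumentException("Unknown messageId: " + originalId);
        }
        String id = UUID.randomUUID().toString();
        byId.put(id, new StoredMessage(id, originalId, content));
        return id;
    }

    /** Returns every message stored in response to the given handle, in insertion order. */
    List<StoredMessage> checkMessages(String handle) {
        List<StoredMessage> responses = new ArrayList<>();
        for (StoredMessage m : byId.values()) {
            if (handle.equals(m.inResponseTo)) {
                responses.add(m);
            }
        }
        return responses;
    }
}
```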

  • Operation ID

What kind of message is this (e.g. FP_TAXON_COMMENT)? Extensible and schema based? The portal to the community is another edge where messages apply: privileged on one side, not privileged on the other (relevance of XML access controls).

--David Lowery (talk) 19:05, 27 February 2013 (CET) In the current implementation this is the type field of an FPMessage. The representation of a message type is currently extensible (via class hierarchies) but is not schema based. One idea proposed is to use xsd enumerations to represent valid types and to extend these types using xsd union (for example the union of the base fp message type enumerations and the apple pie message type enumerations describes the set of all valid message types in use by the system configured by that schema).
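The xsd union idea has a rough Java analogue: two enumerations sharing an interface, with the set of valid types computed as their union. The names `MessageType`, `BaseType`, `ApplePieType`, and `MessageTypes` are hypothetical and do not reproduce the existing class hierarchy; the enum constants are the type names listed on this page.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Common interface; every enum already provides name(). */
interface MessageType {
    String name();
}

enum BaseType implements MessageType {
    PING, STATUS, QUERY, ERROR, ANNOTATION, FP_FIND_SETS
}

enum ApplePieType implements MessageType {
    REGISTER_INTEREST, DELETE_INTEREST, NOTIFICATION, FIND_DUPLICATES, RUN_ANALYSIS, ADD_ANALYSIS
}

final class MessageTypes {
    /** Union of the base and Apple Pie enumerations, analogous to an xsd union. */
    static final Set<String> VALID = union(BaseType.values(), ApplePieType.values());

    static Set<String> union(MessageType[]... groups) {
        Set<String> names = new LinkedHashSet<>();
        for (MessageType[] group : groups) {
            for (MessageType type : group) {
                names.add(type.name());
            }
        }
        return Collections.unmodifiableSet(names);
    }

    static boolean isValid(String typeName) {
        return VALID.contains(typeName);
    }
}
```

A deployment with further message types would add another enumeration to the union, much as a schema would add a member type.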

  • Message Signature
    • Signature by the client code that generated the message. The public key is available in the network to validate the signature and the source of the message as a known client. Requirement15
  • Destination of message

FilteredPush messages do not have explicit destinations other than the FilteredPush access point to which they are submitted.

--David Lowery (talk) 19:05, 27 February 2013 (CET) The destination of a message is determined by the jobplanner/jobrunner, which maps the message type to the appropriate job implementation classes. The message is then associated with the job, which contains the code for interpreting the message content according to the scheme property of the message. After extracting the message content, the job invokes the messaging system to store the message metadata and content in the messaging database, currently implemented in MySQL. The schema of the MySQL message store consists of key-value pairs associated with messageIds. Current plans are to implement the messaging using MongoDB: content that can be structured as key-value pairs (message type, scheme, and other message metadata) will be stored using MongoDB's key-value store capabilities, and other content (such as the message content XML) will be stored as binary data using its filesystem store (GridFS).
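The type-to-job mapping can be pictured as a registry keyed by message type, where the selected job then interprets the content according to the scheme. This is a simplified sketch; `Job`, `JobPlannerSketch`, `register`, and `dispatch` are hypothetical names, and the real jobplanner/jobrunner is not reproduced here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical job: interprets message content according to the message scheme. */
interface Job {
    String run(String scheme, String content);
}

final class JobPlannerSketch {
    private final Map<String, Supplier<Job>> jobsByType = new HashMap<>();

    /** Associates a message type with a factory for the job that handles it. */
    void register(String messageType, Supplier<Job> factory) {
        jobsByType.put(messageType, factory);
    }

    /** Looks up the job for the message type and hands it the scheme and content. */
    String dispatch(String messageType, String scheme, String content) {
        Supplier<Job> factory = jobsByType.get(messageType);
        if (factory == null) {
            throw new IllegalArgumentException("No job registered for message type " + messageType);
        }
        return factory.get().run(scheme, content);
    }
}
```

For example, `planner.register("QUERY", () -> (scheme, content) -> scheme + ":" + content)` registers a trivial job, and `planner.dispatch("QUERY", "SPARQL", "...")` routes a message to it.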

It is the responsibility of the FilteredPush network to match messages to interests/destinations.

Are there messages whose destination is anything other than broadcast on the network? No.

Message destination other than access point may be irrelevant as this level of message handling is delegated to the messaging system.

--David Lowery (talk) 19:05, 27 February 2013 (CET) The services provided by the network must be invoked via the access point which authenticates the message. As a result all messages must go through the access point as the single point of entry.

  • Message content - still needs elucidation.

Content: CDATA

ContentScheme:

--David Lowery (talk) 19:05, 27 February 2013 (CET) Current implementation is Java based and content is a String field of FPMessage (maps to xsd string when deserialized as xml). The proposed schema based implementation with content element typed as xs:anyType as opposed to CDATA would allow for more structured message content as xml in addition to simple strings of character data such as kvp.

    • See message types below

Sets

Is a Set immutable? One application is material that seems to be a duplicate but isn't, or that seems not to be a duplicate until something later determines that it is.

Are sets non-intersecting?

What are the relations between Sets?


How is a Set defined? Hopefully some mix of automatic and people-originated. Replace "Set" with "Cataloged (Virtual)Container" ?

Other General Comments

Sites as collections

Primitive is cataloged collection object?

Composite objects: what investigates passing messages down to composite pieces?

"Cataloged" = "GUIDable"?

Value to an FP network of "I want to know if these investigators ever begin working on these network objects (or objects defined by these properties)". Also, "What are the things for which other people say the above is met? Subscribe me to those."

Need authentication of agents (people or software), whether or not through a node, perhaps only through a portal.


Potential Client to Network Messages Listed in Use Cases

Use Case Find Duplicates

  • findDuplicates
    • makeQuery
    • findSets
  • makeAssertion ANNOTATION
  • addToSet

Use Case AnnotateSpecimen

Use Case Quality Control New Record

  • makeQuery
  • makeAnnotation ANNOTATION
  • injectWorkflow
  • invokeWorkflow
  • discoverWorkflows
  • (also) annotateWorkflow ANNOTATION

Use Case Overview

  • analyzeGroups
  • addRule
  • applyRule

Use_Cases_from_Web_Client_Scenarios

Network_Monitoring_Use_Cases

  • reportAccessPointStatus PING
  • reportNetworkStatus

Use Case Researcher_Seeks_DwC_Metadata

Potential Network to Client Messages listed in Use Cases

Use Case Annotate Specimen, Use Case Ingest Annotation

  • receiveNotification

Use_cases#Overview

  • listenForEvents

The Network_Monitoring_Use_Cases could also call for event notification of administrative clients.

  • reportSuspiciousActivity
  • reportNetworkProblems

Messages

 

The following message types are currently defined and have implementations of jobs unless stated otherwise.

BaseFPMessageTypes

PING

FP Client wishes to know whether an FP Access Point is listening. The Access Point returns a MessageID and takes no further action.

STATUS

FP Client wishes to know the status of resources in an FP Network Instance. The message instantiates a job that checks and reports on knowledge/messaging/analysis capabilities.

QUERY

FP Client wishes to execute a query against knowledge. The message instantiates the Query job, which launches the appropriate query depending on the message scheme (e.g. SPARQL, KVP).

ERROR

Used internally to signal an error. There is currently no implementation for this type, but an ERROR message would probably be attached to the original message for which the client has a handle, and returned to the client after the call to check for messages.

ANNOTATION

Wraps an OA/OAD annotation.

The semantics of what is being annotated is delegated to the annotation.

Annotation typing is delegated to rules: See ApplePieRules for typing of annotations.

Annotation messages were typed in the Prototype; this entangles the concerns of the domain with the concerns of the annotation and with the concerns of the transport layer. We do not advocate the structure used in PrototypeTypedAnnotations.

FP_FIND_SETS

--David Lowery (talk) 19:21, 27 February 2013 (CET) This is currently in the BaseFPMessageType but I am not sure what it does. No implementation yet.

ApplePieMessageType

REGISTER_INTEREST

Semantics: originator registers interest in <something>

DELETE_INTEREST

Originator would like to remove a previously registered interest from the system and stop receiving notifications on it. This is currently not implemented.

NOTIFICATION

--David Lowery (talk) 19:21, 27 February 2013 (CET) This is currently in ApplePieMessageType but I am not sure what it does. No implementation yet.

FIND_DUPLICATES

Deals with the find duplicates use case.

RUN_ANALYSIS

Run an analysis using the available analysis engine on the data and the named workflow provided as parameters in the message.

--David Lowery (talk) 19:21, 27 February 2013 (CET) We are currently working on an implementation of analysis using Kepler. At present, triage plans and runs the analysis job, which invokes Kepler wrapped in an EJB; this EJB prints the message "Start kepler!" to the logs.

ADD_ANALYSIS

Currently not implemented. Enables a client to add to the set of available workflows.

Schemes

BaseFPMessageScheme

Currently no schemes defined in this class.

ApplePieMessageScheme

RDF_XML

The RDF/XML scheme describes the message content of an annotation message type.

SPARQL

The SPARQL scheme describes the message content of a query message.

KVP

The KVP scheme describes the message content of a query message or an interest message.

PROCEDURE

The PROCEDURE scheme describes the message content of a query message that refers to a stored procedure.

--David Lowery (talk) 22:09, 27 February 2013 (CET) The annotation processor currently uses this stored procedure scheme to run canned queries stored in the network (or queries that require multiple steps). Examples are the getAnnotations(annotationid) and getResponses(originalAnnotationId) queries. Stored procedures have message content of the form <procedureName>(<procedureArgs>).
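Content of the form <procedureName>(<procedureArgs>) could be split into a name and an argument list along the following lines. The class `ProcedureCall` is hypothetical, and the sketch assumes arguments are comma separated with no quoting or nesting, which the page does not specify.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for PROCEDURE scheme content such as getAnnotations(someId). */
final class ProcedureCall {
    // A name, then everything between the first "(" and the final ")".
    private static final Pattern FORM =
            Pattern.compile("\\s*([A-Za-z_][A-Za-z0-9_]*)\\s*\\((.*)\\)\\s*", Pattern.DOTALL);

    final String name;
    final List<String> args;

    private ProcedureCall(String name, List<String> args) {
        this.name = name;
        this.args = args;
    }

    static ProcedureCall parse(String content) {
        Matcher m = FORM.matcher(content);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not of the form name(args): " + content);
        }
        List<String> args = new ArrayList<>();
        String raw = m.group(2).trim();
        if (!raw.isEmpty()) {
            // Assumption: arguments are separated by commas and contain none themselves.
            for (String arg : raw.split(",")) {
                args.add(arg.trim());
            }
        }
        return new ProcedureCall(m.group(1), Collections.unmodifiableList(args));
    }
}
```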

Potential Messages

General

 

FP_NOTIFICATION

Note: These retain substantial baggage from the prototype and need to be reworked.

E.g.

  • An asynchronous message has a reply waiting for you (network notifies client)
    • one of your subscriptions has a new publication
  • I am a data provider with new data available to the network (client notifies network)
  • Network has a new subscription that people might be interested in (network broadcasts?)(what about authorization?)

FP_DATAHASCHANGED

Semantics: A data provider is indicating that data they have available for query or harvest has changed.

FP_ASSERT (deprecated)

Generalization is FP_Messages#FP_ANNOTATION

args: (true, false, accept, not-accept). Semantics: the originator is asserting that something is true (and thus accepting it) or false (and thus rejecting it), or is accepting (or not accepting) it without agreeing or disagreeing with its validity. The fourth case, not-accept, emerged in discussions at TDWG 2008, including with Mark Mayfield, who indicated a desire not to accept some subset of new determinations that might be true but that reflect a new combination their institution might not want to store in its database or record as an annotation on the specimen. The value not-accept is essentially a formal mechanism for ignoring the message (possibly distinguishing institutions that review incoming annotations from those that ignore them).

Examples: James accepts all annotations made by Tony. James says that this determination made by Anne is correct. James says that this determination made by Henry is incorrect.

Queries

 

FP_QUERY

General, or specific subtypes (inventory, find sets, get data)?

FP_INVENTORY

Semantics: How many sets do you know about with property X?

FP_FIND_SETS

Semantics: Which sets do you know about with property X?

FP_GET_DATA

Given a set, retrieve all associated data.

Set Operators

These look like they can be generalized, and may be two sorts of operation, with some (add/remove) being expressed as annotations and others (build sets/add generation rule) being analysis instructions.

FP_ADD_SHEET

Message: FP_ADD_SHEET. args: SetID; SpecimenID. Semantics: the message originator is asserting that a specimen belongs in the given Set. (Is this an annotation, or should there be a way to enforce it?)

FP_REMOVE_SHEET

Semantics: reverse of FP_ADD_SHEET

FP_ADD_NEW_SET

Semantics: ???

FP_ADD_SET_GENERATION_RULE

Semantics: message originator is describing a novel set of rules for creating sets and determining set membership (e.g. a new rule for building sets of collection objects that have determinations within the same taxonomic concept).

FP_BUILD_SETS

Semantics: given a rule and optionally limiting criteria, build sets with that rule (find all sets of duplicate specimens of Rubus, find all sets of duplicate specimens known to the network).

Community Messages

Need further discussion and elucidation.

FP_WIP

Semantics: Notification that work is in progress on a network-identifiable object. Up to client to interpret what to do with the object, e.g. if it is decomposable at the client side. Does this entail producing new, transient(?) identifiable objects?

Might include:

  1. Mark as Work In Progress
  2. Release Work In Progress
  3. Query for Work In Progress
  4. Inventory Work In Progress

Next to do

Spell out specifics of these messages. Link to use cases.

XML Schema

For current authoritative representation see: https://sourceforge.net/p/filteredpush/svn/HEAD/tree/trunk/FP-JavaSOA/FP-Modules/FP-Core/src/main/resources/fpmessage.xsd?format=raw


Old Versions: File:Message.xsd and File:MessageReturned.xsd