Annotation Stake In The Ground
This page describes a stake in the ground for an Annotation ontology. The ontology Annotation.owl consists of several concepts, illustrated in the figure below. Following that is an instance example. The OWL files may be fetched from this zip file. --Bob Morris 16:27, 26 September 2010 (UTC)
Hilmar Lapp reminded me that the Annotation Ontology (AO) is a restriction of Annotea that shares some of our concerns. It is aimed at annotating documents, and it is not clear that it could support using arbitrary vocabularies for the annotation concerns, but it is worth re-examining in its recent form. --Bob Morris 23:10, 3 October 2010 (UTC)
Concepts in Annotation.owl
The key concepts are:
- an Annotator, the agent making the annotation
- the Annotation itself, and a few attributes:
- a GlobalIdentifier
- an Annotator
- the subject of the Annotation, which is an object in a class named "InterpretableObject", discussed later. In the example, it is a small fragment of a putative DarwinCore object containing a misspelled country name.
- the date at which the annotation was made
- the content of the annotation, which is an InterpretableObject. It may be RDF or "opaque". In all uses of an InterpretableObject, a URI given by hasInterpretationURI guides the consuming application in interpreting the object.
- The motivation, an InterpretableObject representing the Annotator's motivation for making the annotation.
- The evidence, an InterpretableObject representing the nature of the evidence offered for the annotation. None is provided in the example below.
- The annotator's expectation of an outcome given by a small enumeration.
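As a reading aid, the concept list above can be sketched as plain Python data classes. This is only an illustration of the shape of the model, not a rendering of Annotation.owl itself: the field names, the layout of InterpretableObject, and especially the Expectation values are assumptions (the page says only that the expectation is "a small enumeration"), and the identifiers and URIs are placeholders.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Expectation(Enum):
    # Hypothetical values; the real enumeration is defined in Annotation.owl.
    UPDATE = "update"
    NO_ACTION = "no-action"


@dataclass
class InterpretableObject:
    # URI that tells a consumer how to interpret the data (hasInterpretationURI).
    interpretation_uri: str
    # Exactly one of these is expected to be set in this sketch.
    structured: Optional[dict] = None   # RDF-like content
    opaque: Optional[str] = None        # opaque string content


@dataclass
class Annotation:
    global_identifier: str
    annotator: str
    subject: InterpretableObject
    date: datetime
    content: InterpretableObject
    motivation: Optional[InterpretableObject] = None
    evidence: Optional[InterpretableObject] = None  # none given in the example
    expectation: Optional[Expectation] = None


# Placeholder identifiers and URIs, for illustration only.
DWC_HINT = "http://example.org/dwcterms.owl"

example = Annotation(
    global_identifier="urn:example:annotation-1",
    annotator="James Macklin",
    subject=InterpretableObject(DWC_HINT, structured={"country": "Mangalia"}),
    date=datetime(2010, 9, 26, tzinfo=timezone.utc),
    content=InterpretableObject(DWC_HINT, structured={"country": "Mongolia"}),
    expectation=Expectation.UPDATE,
)
```

Making evidence and motivation optional mirrors the example on this page, where no evidence is supplied.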
InterpretableObjects and dwcterms.owl
Annotation.owl intends to take no position on the content of an Annotation. Instead, the hasContent and hasOpaqueContent properties provide data by which any external vocabulary can be used. A hint, the object of hasInterpretationURI, gives guidance to consumers about how to interpret the data given by hasContent and hasOpaqueContent. With hasOpaqueContent, the content is simply a string, and a consumer must extract it and use the interpretation URI for guidance, following some agreed-upon community standards. Two cases illustrate this. The first might be a string containing XML from an occurrence record served as DarwinCore. In this case, the consumer would interpret it using tools for dealing with DwC, and the URI might just be something signifying that the opaque content is XML (with an embedded schema reference). The second case, used in the example following the ontology diagram, uses an InterpretableObject given by hasContent. Its interpretation URI references an OWL2 representation of DarwinCore created for the purpose (it is in the zip file as dwcterms.owl). This design permits SPARQL queries directly on the annotation content.
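How a consumer might act on the interpretation hint can be sketched in a few lines of Python. Everything concrete here is an assumption made for illustration: the two hint URIs are placeholders, the flat dictionary stands in for an InterpretableObject, and the XML element names are invented. Only the property names hasInterpretationURI, hasContent and hasOpaqueContent come from the ontology.

```python
import xml.etree.ElementTree as ET

# Placeholder hint URIs; a community would have to agree on real ones.
XML_HINT = "http://example.org/hints/xml"
DWC_OWL_HINT = "http://example.org/dwcterms.owl"


def interpret(record):
    """Return the record's content as a dict of term -> value.

    `record` is a flat dict standing in for an InterpretableObject.
    """
    hint = record["hasInterpretationURI"]
    if "hasOpaqueContent" in record:
        # Opaque content is just a string; the hint says how to parse it.
        if hint == XML_HINT:
            root = ET.fromstring(record["hasOpaqueContent"])
            return {child.tag: child.text for child in root}
        raise ValueError("no interpreter registered for hint: " + hint)
    if "hasContent" in record:
        # Structured content is already in the vocabulary named by the hint.
        return dict(record["hasContent"])
    raise ValueError("record has neither hasContent nor hasOpaqueContent")


opaque_record = {
    "hasInterpretationURI": XML_HINT,
    "hasOpaqueContent": "<record><country>Mangalia</country></record>",
}
structured_record = {
    "hasInterpretationURI": DWC_OWL_HINT,
    "hasContent": {"country": "Mongolia"},
}

opaque_result = interpret(opaque_record)
structured_result = interpret(structured_record)
```

The opaque path forces the consumer to carry a parser per hint, while the structured path can be handed to generic RDF tooling, which is the trade-off the paragraph above describes.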
The design of dwcterms.owl is simple: I started with the RDF representation of DwC. For each term, each of the attributes that were about the term itself was turned into an OWL2 AnnotationProperty (these can be reasoned on independently of reasoning on the data described by the ontology). This adds no formal semantics to the data given by a term (there is none now, which is a shame and should be addressed by TDWG, though it is probably a big job), but it should still support SPARQL queries.
The example following the ontology diagram is intended to show the use of Annotation.owl with content defined using the vocabulary of dwcterms.owl, i.e. of DarwinCore. To accomplish this, dwcterms.owl has a single class named DwCOwlFragment. To bind the dwcterms vocabulary to Annotation for use in InterpretableObjects, an instance of Annotation asserts dwcterms:DwCOwlFragment to be a subclass of annot:InterpretableObject, thereby making annot:hasContent, whose range is annot:InterpretableObject, able to refer to vocabulary from dwcterms.
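The effect of that subclass assertion can be illustrated with a toy triple set in Python. This is a hand-rolled sketch of rdfs:subClassOf reasoning, not what an OWL reasoner actually runs, and the individual ex:fragment1 is hypothetical.

```python
SUBCLASS_OF = "rdfs:subClassOf"
TYPE = "rdf:type"

triples = {
    # The binding asserted alongside the annotation.
    ("dwcterms:DwCOwlFragment", SUBCLASS_OF, "annot:InterpretableObject"),
    # A hypothetical piece of DarwinCore content.
    ("ex:fragment1", TYPE, "dwcterms:DwCOwlFragment"),
}


def superclasses(cls, triples):
    """All classes reachable from `cls` by following rdfs:subClassOf."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == SUBCLASS_OF and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def is_instance_of(individual, cls, triples):
    """True if `individual` is typed as `cls` directly or via a subclass."""
    for s, p, o in triples:
        if s == individual and p == TYPE:
            if o == cls or cls in superclasses(o, triples):
                return True
    return False


# With the binding, the fragment qualifies as a value of annot:hasContent.
with_binding = is_instance_of("ex:fragment1", "annot:InterpretableObject", triples)

# Without it, the same fragment would not be recognised.
unbound = {t for t in triples if t[1] != SUBCLASS_OF}
without_binding = is_instance_of("ex:fragment1", "annot:InterpretableObject", unbound)
```

Here `with_binding` is true and `without_binding` is false, which is the whole point of asserting the subclass relation outside dwcterms.owl itself: the DarwinCore vocabulary stays untouched.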
One point of the architecture using hasContent and InterpretableObjects is that it is straightforward to annotate Annotations. In this case a community-defined vocabulary for such annotation would be of great utility, and applications that understand both that and the content vocabularies could easily manage the provenance of the underlying content.
The owl files are all OWL2 compliant, but not OWL2 DL. However, that's because I included foaf for descriptions of Agents, and foaf has some needless obstructions to DL. Anyway, it's unclear whether foaf is the right vocabulary for describing agents.
The Content and OpaqueContent models seem as though they should be modeled in some way more related to one another.
Absence of domains on all properties and ranges on Object Properties is intentional, but debatable. Domains help detect nonsense (an observation of Jonathan Rees), but they nail down some semantics and need careful thought. Since the content of an annotation is meant to be expressed in a controlled vocabulary chosen by the Annotator at annotation time, maybe that is also the point at which domains and ranges should be specified, so that consuming applications can detect whether an annotation is internally consistent on its own terms. --Bob Morris 16:07, 27 September 2010 (UTC)
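To make the "detect nonsense" idea concrete, here is a minimal Python sketch of a consumer checking an annotation against domains supplied along with it. Note the assumption: it treats a domain as a closed-world constraint to validate against, which is how an application might use it, whereas an OWL reasoner would instead infer the subject's type from the domain. The property and individual names are illustrative.

```python
def find_domain_violations(triples, types, domains):
    """Return triples whose subject is not typed with the property's domain.

    triples: iterable of (subject, property, object)
    types:   dict mapping an individual to the set of its asserted classes
    domains: dict mapping a property to its declared domain class
    """
    violations = []
    for s, p, o in triples:
        expected = domains.get(p)
        if expected is not None and expected not in types.get(s, set()):
            violations.append((s, p, o))
    return violations


# Domains declared by the Annotator at annotation time (hypothetical).
domains = {"dwcterms:country": "dwcterms:DwCOwlFragment"}

types = {
    "ex:content1": {"dwcterms:DwCOwlFragment"},
    "ex:JamesMacklin": {"foaf:Person"},
}

content = [
    ("ex:content1", "dwcterms:country", "Mongolia"),
    # Nonsense: a person is given a country as if it were a record fragment.
    ("ex:JamesMacklin", "dwcterms:country", "Mongolia"),
]

violations = find_domain_violations(content, types, domains)
```

Only the second triple is reported. Properties without a declared domain are never flagged, which matches the current ontology where no domains are given.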
Annotation ontology diagram
The diagram was produced by the CMapTools ontology editor.
Errata: hasInterpretationURI takes values xsd:anyURI, not xsd:dateTime.
In the example below, an Annotator named James Macklin asserts that in a cited specimen record, the spelling 'Mangalia' of a DarwinCore country name should be 'Mongolia'. He expects that anyone holding this record would want to replace that spelling, though a consumer of the annotation can do what they like. Keep in mind that dwcterms:DwCOwlFragment is a subclass of annot:InterpretableObject. The hasContent is way at the right... sorry about that.
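For readers without the diagram, the same example can be approximated as a handful of triples with a tiny pattern matcher standing in for a SPARQL query. This is a rough sketch rather than a transcription of the instance file: annot:hasAnnotator, annot:hasSubject and dwcterms:country are assumed names, the ex: identifiers are placeholders, and only annot:hasContent and dwcterms:DwCOwlFragment are taken from the text.

```python
triples = [
    ("ex:annotation1", "annot:hasAnnotator", "ex:JamesMacklin"),
    ("ex:annotation1", "annot:hasSubject", "ex:subject1"),
    ("ex:subject1", "rdf:type", "dwcterms:DwCOwlFragment"),
    ("ex:subject1", "dwcterms:country", "Mangalia"),
    ("ex:annotation1", "annot:hasContent", "ex:content1"),
    ("ex:content1", "rdf:type", "dwcterms:DwCOwlFragment"),
    ("ex:content1", "dwcterms:country", "Mongolia"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def proposed_value(triples, annotation, term):
    """Follow annot:hasContent, then read `term` from the content node."""
    values = []
    for _, _, content_node in match(triples, s=annotation, p="annot:hasContent"):
        values.extend(o for _, _, o in match(triples, s=content_node, p=term))
    return values


def current_value(triples, annotation, term):
    """Follow annot:hasSubject, then read `term` from the subject node."""
    values = []
    for _, _, subject_node in match(triples, s=annotation, p="annot:hasSubject"):
        values.extend(o for _, _, o in match(triples, s=subject_node, p=term))
    return values


old = current_value(triples, "ex:annotation1", "dwcterms:country")
new = proposed_value(triples, "ex:annotation1", "dwcterms:country")
```

Because the content is ordinary triples in the DarwinCore vocabulary, the lookup needs no knowledge of the annotation beyond the two linking properties, which is what makes direct SPARQL queries on the content possible.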
Perhaps more notes during or after TDWG.