TDWG2010 Annotation Use Cases

From FilteredPush

Presentations

Presentations in introductory session.

Michael Giddens, SilverBiology

Cyberflora of Louisiana: imaging 1.3 million herbarium sheets. Each image has a scale bar. Use Case: Add annotations to images describing plant morphology, e.g. leaf widths. The goal is to share the annotated data to support things like building interactive keys.
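Because every image carries a scale bar, a pixel measurement on the image can be turned into a physical measurement. The minimal Python sketch below assumes the scale bar's length in pixels and in millimetres are known; the helper names and the image identifier are hypothetical.

```python
def pixels_per_mm(scale_bar_pixels: float, scale_bar_mm: float) -> float:
    """Image resolution implied by the scale bar (pixels per millimetre)."""
    if scale_bar_pixels <= 0 or scale_bar_mm <= 0:
        raise ValueError("scale bar length must be positive")
    return scale_bar_pixels / scale_bar_mm


def measure_mm(feature_pixels: float, scale_bar_pixels: float, scale_bar_mm: float) -> float:
    """Length of a measured feature (e.g. a leaf width) in millimetres."""
    return feature_pixels / pixels_per_mm(scale_bar_pixels, scale_bar_mm)


# Example: a 10 mm scale bar spans 400 px; a leaf measures 1300 px across.
leaf_width = measure_mm(1300, 400, 10)
annotation = {
    "subject": "image:example-sheet-0001",  # hypothetical image identifier
    "character": "leaf width",
    "value_mm": round(leaf_width, 1),
}
print(annotation)
```

Storing the measurement as a separate annotation on the image, rather than editing the image, keeps the shared data usable for building keys.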

Discussion: The primary object is the image (or the specimen); the annotation is added information. A new determination is a form of annotation, familiar as a determination written on a strip of paper that is then added to the herbarium sheet, or added as an additional label to the tray/bottle/pin container for the specimen/lot.

Question about how to expose annotations.

Question: What, other than the original object (specimen or observation), is an annotation?

Jason Best, BRIT

Images of herbarium sheets; extracting textual data, ultimately into semantically structured content, involves both machine (OCR, parsing) and human (ROI recognition, parsing) processes. Creation of Regions of Interest (ROIs), classification of those ROIs, and OCR of the text found in them can all be considered annotations; likewise, human transcription of that text. Text blocks are marked up, e.g. by recognizing which parts are the collector name, and this can be thought of as another annotation.

Amanda Neil, BRIT

Atrium: online access to plant collections. Forms for displaying specimen data. Forms for collecting assertions about the specimen (collection data, specimen, collection images) as annotations. Typed annotations, e.g. Determination. Records the basis of a determination (e.g. saw a particular herbarium sheet, saw field photographs, or cumulative evidence). Also records the annotator, the annotator's institution, the date of annotation, and notes about the annotation. An annotation label can be printed out and associated with the herbarium sheet. The annotation type "Other" can be applied to particular sheets, e.g. a remark about removal of a sample for DNA analysis.
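The typed annotations described for Atrium can be modelled as a small record type. This is a hypothetical Python sketch, not Atrium's actual data model: the class and field names, the sheet identifier, and the label layout are assumptions, chosen only to show a Determination carrying its basis and an "Other" annotation carrying a free-text note.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class AnnotationType(Enum):
    DETERMINATION = "Determination"
    OTHER = "Other"


class DeterminationBasis(Enum):
    HERBARIUM_SHEET = "saw herbarium sheet"
    FIELD_PHOTOGRAPHS = "saw field photographs"
    CUMULATIVE_EVIDENCE = "cumulative evidence"


@dataclass(frozen=True)
class SheetAnnotation:
    """One typed annotation on a herbarium sheet (field names are hypothetical)."""
    sheet_id: str
    annotation_type: AnnotationType
    annotator: str
    institution: str
    annotated_on: date
    notes: str = ""
    taxon_name: Optional[str] = None            # required for Determination
    basis: Optional[DeterminationBasis] = None  # why the determination was made

    def __post_init__(self) -> None:
        # A determination without a name carries no new information.
        if self.annotation_type is AnnotationType.DETERMINATION and not self.taxon_name:
            raise ValueError("a Determination annotation needs a taxon name")

    def label_text(self) -> str:
        """Text for a printed annotation label to attach to the sheet."""
        lines = [f"{self.annotation_type.value}: {self.taxon_name or self.notes}"]
        if self.basis is not None:
            lines.append(f"Basis: {self.basis.value}")
        lines.append(f"{self.annotator} ({self.institution}), {self.annotated_on.isoformat()}")
        return "\n".join(lines)


det = SheetAnnotation("SHEET-0001", AnnotationType.DETERMINATION, "A. Annotator", "BRIT",
                      date(2010, 9, 27), taxon_name="Xus yus",
                      basis=DeterminationBasis.HERBARIUM_SHEET)
dna = SheetAnnotation("SHEET-0001", AnnotationType.OTHER, "A. Annotator", "BRIT",
                      date(2010, 9, 28), notes="Sample removed for DNA analysis")
print(det.label_text())
print(dna.label_text())
```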

Rod Spears, Specify

Morphbank-Specify integration. People are able to annotate specimen images in Morphbank (e.g. with new determinations); these annotations need to be transported to the relevant specimen records in Specify. Example: a set of Mexican plants digitized in Michigan, correlating records with CONABIO and GBIF data to look for duplicate specimen records (GBIF data: 32 million plant specimen records, about 11 million with collector numbers [lots of data quality issues: Basis of Record has 1600 different values; Collector number and Collector transposed]). Record sets in the workbench (sets of related records): ability to create a consensus record by dragging fields amongst the set, and a button to choose a base record from the set, all of which can be treated as annotations. Records get changed, and there is a desire to track and disseminate these changes. Use Case: Disseminate Annotations.
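A rough sketch of the consensus-record idea: start from a chosen base record and pull individual field values from other records in the set, emitting one change annotation per altered field. This is an illustrative Python sketch under assumed field names, not the Specify workbench implementation.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def consensus_record(records: List[Record], base_index: int,
                     picks: Dict[str, int]) -> Tuple[Record, List[dict]]:
    """Build a consensus record from a set of duplicate records.

    base_index: which record supplies the starting values.
    picks: field name -> index of the record whose value should win
           (the equivalent of dragging a field value in from another record).
    """
    base = records[base_index]
    consensus = dict(base)          # copy, so the source records stay untouched
    changes: List[dict] = []
    for field_name, source_index in picks.items():
        new_value = records[source_index].get(field_name)
        if new_value is None or new_value == base.get(field_name):
            continue                # nothing new to record for this field
        consensus[field_name] = new_value
        changes.append({"field": field_name, "old": base.get(field_name),
                        "new": new_value, "source_record": source_index})
    return consensus, changes


duplicates = [
    {"collector": "J. Doe", "collector_number": "1234", "country": "Mexico", "locality": ""},
    {"collector": "J. Doe", "collector_number": "1234", "country": "Mexico",
     "locality": "Sierra Madre Oriental"},
]
consensus, changes = consensus_record(duplicates, base_index=0,
                                      picks={"locality": 1, "country": 1})
print(changes)
```

Only fields whose value actually changes produce an annotation, so the change list is what would be disseminated.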

Arron Steele, Silver Lining

Morphbank publishing annotations, Specify subscribing. Issue of scale - large numbers of subscribers, large numbers of annotations. Investigating PubSubHubbub http://en.wikipedia.org/wiki/Pubsubhubbub as a model for handling large numbers of subscribers.
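The core of the PubSubHubbub model is that the publisher notifies a hub once and the hub fans the message out to all subscribers. The in-memory Python sketch below illustrates only that fan-out; the real protocol works over HTTP with subscriber callbacks and feeds, which are not modelled here, and the topic and subscriber names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Callback = Callable[[str, dict], None]


class Hub:
    """In-memory stand-in for a PubSubHubbub-style hub (no HTTP involved)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, annotation: dict) -> int:
        """The publisher notifies the hub once; the hub fans out to every subscriber."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(topic, annotation)
        return len(callbacks)


inboxes: DefaultDict[str, List[dict]] = defaultdict(list)
hub = Hub()
for subscriber in ("specify-instance-a", "specify-instance-b"):
    # The default argument binds the current subscriber name to each callback.
    hub.subscribe("morphbank/annotations",
                  lambda topic, ann, name=subscriber: inboxes[name].append(ann))

delivered = hub.publish("morphbank/annotations",
                        {"subject": "specimen:123", "type": "determination"})
print(delivered, dict(inboxes))
```

The publisher's cost stays constant as subscribers are added, which is the property of interest for large numbers of subscribers.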

Brian Heidorn, BiSciCol Tracker

Just funded by NSF. Biological Science Collections Tracker. Collections are distributed.

Multiple duplicates are distributed and there are multiple related specimens; when one specimen in one location is annotated, there is a desire to distribute the annotations to related objects.

An update harvester collects assertions about specimens, and assertions are made about relationships amongst objects. Assertions are distributed to related objects.

Links are created between objects, e.g. specimen X is a duplicate of specimen Y, and X is related to a unique identifier in GenBank. Resolvers for the different GUIDs in the system.

Use Case: Distribute annotations amongst related objects. Use Case: Mark objects as related.
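Distributing an annotation amongst related objects amounts to walking the graph of relationship links. The Python sketch below assumes links are (object, relation, object) triples, treats "isDuplicateOf" as symmetric and transitive, and follows only the relation types it is told to; the identifiers and relation names are hypothetical, not BiSciCol's.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Set, Tuple

Link = Tuple[str, str, str]   # (object, relation, object)


def related_objects(start: str, links: List[Link], relations: FrozenSet[str]) -> Set[str]:
    """All objects reachable from `start` through links of the given relation types.

    Links are treated as symmetric (A isDuplicateOf B implies B isDuplicateOf A).
    """
    graph: Dict[str, Set[str]] = {}
    for a, relation, b in links:
        if relation in relations:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:                       # breadth-first traversal
        current = queue.popleft()
        for neighbour in graph.get(current, set()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    seen.discard(start)
    return seen


def distribute(annotation: dict, links: List[Link],
               relations: FrozenSet[str] = frozenset({"isDuplicateOf"})) -> List[dict]:
    """Copy an annotation onto each related object, remembering its origin."""
    origin = annotation["subject"]
    return [{**annotation, "subject": target, "propagatedFrom": origin}
            for target in sorted(related_objects(origin, links, relations))]


links = [
    ("herb:A/1", "isDuplicateOf", "herb:B/7"),
    ("herb:B/7", "isDuplicateOf", "herb:C/3"),
    ("herb:A/1", "hasSequence", "genbank:XX000000"),
]
copies = distribute({"subject": "herb:A/1", "determination": "Xus yus"}, links)
print([c["subject"] for c in copies])
```

A determination reaches both duplicates but not the GenBank record, since "hasSequence" is not a relation along which determinations should flow.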

Needs a gateway to manage filtering of the input of large numbers of incoming annotations. Annotation: a human assertion about another annotation (e.g. accepted/rejected).

Jetz, ?? Synthesis/Biocase Annotations

Biocase: Prototype; the model for annotations didn't work as well as desired. Data portal for annotations, with an annotation service integrated into the portal. Ability to collect assertions about records observed in the portal (ABCD, ~1000 elements). Use Case: Allow users to annotate rich content. Went back to the ABCD annotation document, allowed users to annotate that document, stored the different document, and used a style sheet to display the differences between versions of the XML document. The user gets XML from the portal, changes the XML document, and returns the changed document; an XML diff and a style sheet display the annotations. Very flexible. Issues: invalid XML documents (too many real world records were not valid). Networking issues: needed to retrieve raw XML from the provider to the portal and then to the user.
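The "annotation as XML diff" approach can be sketched by comparing leaf element values between the record the portal served and the copy the user returns. This Python sketch uses simplified, ABCD-like element names (not the real ABCD schema) and a deliberately naive path scheme: repeated sibling elements with the same tag would collide, and an absent element is treated like an empty one. Invalid XML raises `ET.ParseError`, the same failure mode reported for real-world records.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def leaf_values(xml_text: str) -> Dict[str, str]:
    """Map each leaf element's path (e.g. 'Unit/Gathering/Country') to its text."""
    root = ET.fromstring(xml_text)   # raises ET.ParseError on invalid XML
    values: Dict[str, str] = {}

    def walk(element: ET.Element, path: str) -> None:
        children = list(element)
        if not children:
            values[path] = (element.text or "").strip()
        for child in children:
            walk(child, f"{path}/{child.tag}")

    walk(root, root.tag)
    return values


def xml_annotations(original: str, edited: str) -> List[Tuple[str, str, str]]:
    """(path, old value, new value) for every leaf the user changed or added."""
    before, after = leaf_values(original), leaf_values(edited)
    changes = []
    for path in sorted(before.keys() | after.keys()):
        old, new = before.get(path, ""), after.get(path, "")
        if old != new:
            changes.append((path, old, new))
    return changes


original = "<Unit><UnitID>123</UnitID><Gathering><Country>Mangalia</Country></Gathering></Unit>"
edited = ("<Unit><UnitID>123</UnitID><Gathering><Country>Mongolia</Country></Gathering>"
          "<Notes>checked gazetteer</Notes></Unit>")
for change in xml_annotations(original, edited):
    print(change)
```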

Greg Riccardi, Morphbank

Linnaeus: Collect specimens, make drawings to illustrate morphology.

CTOL Images. Apparent work processes: collect image, annotate (illustrate specific collections of characters in the image). Presence of one or more characters of interest in an image. Photoshopped images with embedded arrows marking anatomical characters. Issues: can't detect Photoshop arrows; an image contains multiple characters; need to expose the annotation as an independent, discoverable item associated with, but not embedded in, an image.

Annotated character occurring in part of an image: a term in an ontology describes a feature in an image. This image of this specimen has an area of interest which contains this feature, which causes me to say that this specimen belongs to this species. Morphbank (images, annotations in images) - Specify (specimens) - Morphster (ontology terms) integration.

Paul J. Morris, Filtered Push

Presentation: File:TDWG2010 5minuteFP.pdf

Walter: Need to be free of the specimen data concepts when making assertions about other sorts of data, such as checklists. Users may not want to see a constant stream of annotations, but rather the aggregated results of annotations and changes to a data set ten years later.

Second Session Use Case Discussion

So far there are lots of examples involving specimens, but other subjects exist.

Subjects of annotations

Annotations about annotations. Annotations about images. Annotations about specimens. Annotations about data sets. Annotations about literature.

Rich: Almost every piece of information relates to an assertion; all of our data could be GUIDs representing objects and a long list of assertions about those objects. Where is the boundary between annotations and the data? Taxonomy is a domain where annotations are common. Assertions about taxon names are also subject to annotation.

Walter: Locality of annotations? They are created in databases and created on the web. Where are we going to store annotations? How are we going to communicate annotations?

James: Specimen with annotations on sheet, are the database records of these annotations themselves annotations?

Rich: Let's keep this practical.

James: Annotation at the level of the record. Annotation at the level of the field/attribute. Annotation at the level of the set. An annotation can define a set.

Walter: Sets can be defined by criteria (e.g. all specimens from Harvard with elevation greater than 1000 m.).
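A set defined by criteria is really a query, and its membership depends on when the query is evaluated. The Python sketch below makes that explicit using the Harvard/elevation example; the `Specimen` fields and GUIDs are hypothetical. It shows why annotating such a set is tricky: a specimen added later silently becomes a member, and the set itself has no persistent identifier.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Specimen:
    guid: str
    institution: str
    elevation_m: float


def members(specimens: Iterable[Specimen], criterion: Callable[[Specimen], bool]) -> List[str]:
    """Resolve a criteria-defined set into the GUIDs of its members right now."""
    return [s.guid for s in specimens if criterion(s)]


def harvard_above_1000_m(s: Specimen) -> bool:
    return s.institution == "Harvard" and s.elevation_m > 1000


catalog = [Specimen("urn:example:1", "Harvard", 1200.0),
           Specimen("urn:example:2", "Harvard", 800.0),
           Specimen("urn:example:3", "BRIT", 1500.0)]
before = members(catalog, harvard_above_1000_m)
catalog.append(Specimen("urn:example:4", "Harvard", 2100.0))
after = members(catalog, harvard_above_1000_m)
print(before, after)   # the same criterion now has a different membership
```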

(?): Does the scope include metadata about collections themselves? If yes, we are going down a very slippery slope.

Jim: Why try to constrain the levels at which people can annotate? The annotation model should be general and able to annotate at any particular level.

Walter: Yes, scope should be any level, but this does increase complexity of implementation.

(?): Back to Niko's point, one person's data is another's metadata. Scared about annotating virtually defined sets, e.g. sets of specimens defined by altitude.

Rich: Anything that has a GUID should be annotatable. What about annotations of arbitrarily defined sets? Issue of inheritance: members of a set may not inherit annotations about the set. Without a persistent identifier for the set, it can't be identified.

(?): Working in a lab with a microscope, looking at a bug: I used this body of information to identify this critter. Use Case: Document the basis of making a determination of an observation. Use the basis of the determination as a way of linking the concept used in making the initial determination with the subsequent

James: Question: what part of that story becomes an annotation?

(?): The original record is instantiated in my database. Everything subsequent seems to be an annotation of the original record. Use Case: Document inferences made about original data. Use Case: Provide changes/updates to original data.

(?) Use case from the barcoding community. Something is collected in the field (not yet a specimen until accessioned into a holding institution), samples are extracted and sent to labs, with multiple extractions and process steps, and sequences are derived. Process steps are stored in different databases; the term annotation doesn't seem to fit well with this series of process steps that produce chains of related data. Can annotation track these parent-child relationships? These relationships may be flattened out or become higher level records. Tightly related to workflows.

Annotating workflows? Annotation of workflow steps?

An assertion that something is a child record of something else can be an annotation. Duplicate sets are identified by conventions in the herbarium community (same collector, same collector number, different annotations).
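The herbarium convention for spotting duplicates (same collector, same collector number) can be sketched as a grouping step. This Python sketch uses hypothetical record fields and only a naive normalization (collapsing whitespace, ignoring case); real data, such as the GBIF records with collector and collector number transposed, would need far more careful matching.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

Key = Tuple[str, str]


def duplicate_sets(records: List[dict]) -> Dict[Key, List[str]]:
    """Group record ids sharing collector and collector number (naive normalization)."""
    groups: DefaultDict[Key, List[str]] = defaultdict(list)
    for record in records:
        key = (" ".join(record["collector"].split()).lower(),
               record["collector_number"].strip())
        groups[key].append(record["id"])
    # Only groups with more than one member are candidate duplicate sets.
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


records = [
    {"id": "A-1", "collector": "J. Doe", "collector_number": "1234", "determination": "Xus yus"},
    {"id": "B-9", "collector": "J. Doe ", "collector_number": "1234", "determination": "Xus zus"},
    {"id": "C-4", "collector": "R. Roe", "collector_number": "77", "determination": "Xus yus"},
]
print(duplicate_sets(records))
```

Each resulting group could itself be recorded as an annotation asserting that its members are duplicates, even though their determinations differ.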

(?) Since we do not build our databases with tracking of changes, we need comments on changes to individual fields and linkages between separate records (a specimen database wants to link to, rather than ingest, barcode data). Use Case: Link Records. Use Case: Make corrections to incorrect links.

(?) Requirement: ability to uniquely identify records.

Rich: At a top level, putting aside what a GUID is, can you attach an annotation to a piece of logic? E.g. everything in your collection called Xus yus should be called Xus zus. Is this an annotation expressed as an annotation of a logical set? It applies to things that don't exist yet, so it is time dependent.
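One way to picture an annotation attached to logic rather than to a GUID: store the condition with the annotation and test each record against it at data-entry time, so it also catches records that did not exist when the annotation was made. The class and field names in this Python sketch are hypothetical; "scientificName" is the Darwin Core term, used here only as an example field.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RuleAnnotation:
    """An annotation attached to a condition rather than to a GUID."""
    annotator: str
    field: str
    match_value: str
    proposed_value: str

    def applies_to(self, record: Dict[str, str]) -> bool:
        return record.get(self.field) == self.match_value


def pending_annotations(record: Dict[str, str], rules: List[RuleAnnotation]) -> List[RuleAnnotation]:
    """Rules a curator could be told about while entering or editing this record."""
    return [rule for rule in rules if rule.applies_to(record)]


rules = [RuleAnnotation("a-taxonomist", "scientificName", "Xus yus", "Xus zus")]
new_record = {"scientificName": "Xus yus"}   # entered after the rule was written
for rule in pending_annotations(new_record, rules):
    print(f"{rule.annotator} proposes {rule.field} = {rule.proposed_value}")
```

This also illustrates the use case of informing a user of annotations that may apply to their current data.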

Bob: FP is working with the Kepler workflow project. Annotations could schedule re-examination of data and conclusions. Ability to consider the Apollo system, with a 96 bit event space in which the last 24 bits are a time.

Rich: As a collection manager entering data, you are informed that an annotation exists for the set of circumstances you just met; you are told that you can

(?) What do you do with annotations that are no longer relevant?

Bob: Annotations to annotations (though this brings complications of discovery).

Use Case: Annotations of annotations expressing opinions about rejection.

Niko: (missing important bits), annotations, assertions what's the difference?

Rich: Use case: Inform user of annotations that may apply to their current data.

Niko: Let's stop here at annotation of existing data objects, not annotating things that don't exist yet.

Walter: Return to the object of annotation. When annotating an XML document by making changes to that document, don't we need a means (e.g. a schema) to define what was annotated? In this case, two annotation documents might have the same identifier, just different versions.

Third Session: Domain Issues

Introduction.

James: Can the Darwin Core handle all the concerns in the domain, without the need for any additional description of annotations?

Donald Hobern, ALA

ALA, the Atlas of Living Australia: an Australian government initiative linking data from as many primary sources as possible and improving the quality of those data by building new linkages.

Approach very conditioned by ideas of linked data. Open-ended model. Things that can be identified and referred to with stable identifiers. Clarify as a community what sorts of things those are: specimen record, taxon concept, etc. Facilitate exposure of sets of properties associated with any identified object. The Atlas should be trying to get everyone to put up their data with identifiers and associated information; ALA stitches this into a patchwork of connected data. Catalogs of protected areas, as an axis for organizing data, are one example. People provide sets of assertions about their objects; the goal is connecting these things.

Recognized that there are at least two very significant ways of enriching this web of data: 1) a member of the public or a researcher may be able to come to the web site and tell us about things we didn't know before, e.g. a specialist adding a new identification of an image. Possibility of a user interface for people to provide additional properties about one of the identified things. Working for three years with annotations in a very general sense: annotations as snippets of information about something for which we have an identifier. Use case: Form on a web page to capture a structured set of properties about an identified object. This immediately leads to a generalized concept of an agent as providing an annotation, leading to automated quality control and data enhancement agents. Different users of different tools can assign annotations to different objects in different contexts, and the context can determine how to display these to users. Team at the University of Queensland, using the Annotea (pseudo-)standard for annotations.

Use case: embed a tool in the browser allowing a user to annotate any web page. Less useful than the use case of annotating identified objects known to the system.

Turning out to be harder than foreseen to apply this general concept of annotation to the Atlas. Snippets of data can be stored, but it is often easier to store them locally, as other tables in the data cache used for aggregating data, as domain specific ideas. Backends and tools tend to be relatively tightly tied to the web UI for particular views of the data.

Examples of interfaces in progress: Proposing corrections to Darwin Core specimen and observation data. Thumbs up/thumbs down for the accuracy or value of specific data elements, particularly common names and images, e.g. is this common name (in French, but not coded as such) as valuable as this (Australian) common name? Annotations of BHL: article boundaries in journal runs, linking species data (species descriptions, protologues, etc.) to species names. Identifying illustrations in the literature with the relevant linked data (the species shown in the illustrations, the specimens shown in the illustrations); here user annotations become very interesting.
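Thumbs up/thumbs down ratings on property values can be aggregated into a simple net score per (object, property, value). The Python sketch below is one plausible aggregation, not ALA's: it assumes each vote is +1 or -1, picks the highest-scoring value per property, and uses hypothetical identifiers and names.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

Vote = Tuple[str, str, str, int]   # (object id, property, value, +1 or -1)


def tally(votes: Iterable[Vote]) -> Counter:
    """Net score for each (object, property, value)."""
    scores: Counter = Counter()
    for object_id, prop, value, vote in votes:
        if vote not in (1, -1):
            raise ValueError("a vote is +1 (thumb up) or -1 (thumb down)")
        scores[(object_id, prop, value)] += vote
    return scores


def preferred_value(scores: Counter, object_id: str, prop: str) -> Optional[str]:
    """Highest-scoring value for one property of one object, if any were rated."""
    candidates = {value: score for (obj, p, value), score in scores.items()
                  if obj == object_id and p == prop}
    return max(candidates, key=candidates.get) if candidates else None


votes = [
    ("taxon:1", "commonName", "Name A", 1),
    ("taxon:1", "commonName", "Name A", 1),
    ("taxon:1", "commonName", "Name B", 1),
    ("taxon:1", "commonName", "Name B", -1),
]
scores = tally(votes)
print(preferred_value(scores, "taxon:1", "commonName"))
```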

Use Case: Propose Correction.

Use Case: Rate Value of Linked Content/Property values. Use Case: Collect Feedback. Use Case: Tag properties.

Use Case: Tag identifiable objects with Properties.

Use Case: Annotate Structure of Documents.

Use Case: Annotate to link portions of documents to related data.

Annotation Agents: Specialists, General Public, Software.

Less concentration so far on workflows, distribution of annotations, and chains of annotations.

Annotations typically come from a source outside the set of standard data providers. A very open world concept in which anyone can add assertions as properties of identifiable objects; visualization of who is saying what is very interesting. The vision on the IT side is to ask what kind of tool would allow you to see all of the information from the original sources together with subsequent assertions, with a level-of-trust slider to display changing links in the graph.
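The level-of-trust slider can be thought of as a threshold filter over assertions, keyed by who made them. In this Python sketch the trust scores, source names and assertions are all hypothetical, and sources with no trust score are assumed to be untrusted (0.0).

```python
from typing import Dict, List


def visible(assertions: List[dict], trust: Dict[str, float], threshold: float) -> List[dict]:
    """Assertions whose source's trust score is at or above the slider position."""
    return [a for a in assertions if trust.get(a["source"], 0.0) >= threshold]


trust = {"original-provider": 1.0, "specialist": 0.8, "anonymous-user": 0.2}
assertions = [
    {"subject": "specimen:9", "property": "country", "value": "Mexico", "source": "original-provider"},
    {"subject": "specimen:9", "property": "identifiedAs", "value": "Xus zus", "source": "specialist"},
    {"subject": "specimen:9", "property": "identifiedAs", "value": "Xus yus", "source": "anonymous-user"},
]
for position in (0.9, 0.5, 0.1):
    print(position, [a["value"] for a in visible(assertions, trust, position)])
```

Moving the slider down adds edges to the displayed graph; moving it up removes them, without any assertion being deleted.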

Some stages of concepts: the record is (raw) original data in Darwin Core. The record is internally augmented (e.g. standardizing use of names) by the source. The record has added corrections from subsequent sources. The record has added properties with a cardinality of one to one. The record has added properties with cardinalities other than one to one. Compositional rules for how annotations are supposed to relate to the original source data. Variable cardinality is an interesting issue.

Discussion

Are there conventions we need to devise within the domain? Links between objects?

William: Are there categories of annotation at the annotation level? E.g. this is a georeference that corrects your data.

Amanda: There's a typing layer between Annotation and DomainAnnotationConventions: refinement of annotations. A new georeference as a refinement might require different elements than a new georeference as a correction. The risk of not typing is having all possible properties available to users when making annotations, without the typing conveying the purpose of the annotation and the sorts of things that might go into it.

James: Can we handle a very substantial set of cases by defining a small set of Domain Annotation Conventions?

An annotation has a subject. Some domains might have subjects (identifiable objects) to which a particular type of annotation does not apply. Do we allow people to make assertions that we can't understand?

(?) Multiple actions. Can we make multiple assertions in the same annotation?

Is the DomainAnnotationConvention layer simply a user interface issue? Is typing of annotations relevant at this level? These look like actions that the annotation proposes.

(?) Person making annotation may not know what the action to be taken should be.

(?) Tie the actions to be taken to the choice of domain at the user interface level. If the subject is a specimen and the motivation is a correction, infer that this is a determination. Generalize to the actions to be taken by the

Pete: Domain Annotation Conventions look like a controlled vocabulary for expectations, or the expected consequences that the annotator would like to have happen on the object. DomainAnnotationConventions as just an extension of hasExpectation for the domain. Controlled vocabulary extends what the annotator intends for the annotation.

Zhimin: Domain Annotation Conventions connects the general concept of annotation with domain specific vocabularies - how to draw connections between the domain concepts and the motivations and expectations.

Pete: Annotation hasPurpose: deliver a new determination. This is different from hasMotivation "to give feedback to the author of the dataset"; hasMotivation sounds too subjective. The purpose of an annotation sounds very helpful. Is hasPurpose tied to a controlled vocabulary? My expectation might be different if I am annotating a piece of my agency's data than if I am annotating someone else's data.

Jason: hasExpectation is likely to differ between the user who submits an annotation and the underlying systems.

(?) From perspective of collections database administrator in having people improve my data, I'm interested in the subject, the annotator, and the content, much less so in the expectation or the motivation or the evidence. I get to choose which things I accept.

James: Can you annotate your own data?

(?) Wikipedia-style approach: annotations can be rolled back and are describable, with motivation describable but very optional. Changes made can easily be tracked, reviewed, and flagged.

Use Case: Bring annotations of interest to the attention of data curators.

Trust level is an optional choice of the data curator.

Conclusions: hasMotivation is very subjective. Propose replacing it with hasPurpose. Domain specific values for hasPurpose handle the DomainAnnotationConventions layer.

Proposal for Terms for hasPurpose

  • Interpretation
  • Augmentation
  • Correction
  • Refinement
  • Replacement
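If the proposed hasPurpose terms become a controlled vocabulary, a consumer can reject values outside it. A minimal Python sketch, assuming the five proposed terms are the whole vocabulary and that matching ignores case and surrounding whitespace:

```python
from enum import Enum


class Purpose(Enum):
    INTERPRETATION = "Interpretation"
    AUGMENTATION = "Augmentation"
    CORRECTION = "Correction"
    REFINEMENT = "Refinement"
    REPLACEMENT = "Replacement"


def parse_purpose(term: str) -> Purpose:
    """Map a submitted hasPurpose value onto the controlled vocabulary."""
    try:
        return Purpose(term.strip().capitalize())
    except ValueError:
        allowed = ", ".join(p.value for p in Purpose)
        raise ValueError(f"unknown hasPurpose term {term!r}; expected one of: {allowed}") from None


print(parse_purpose(" refinement "))
```

Domain communities could extend the enumeration with their own values, which is how the proposal lets hasPurpose absorb the DomainAnnotationConventions layer.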

Fourth Session: Transport and Interchange

Discussion:

Is hasUrgency a needed concept?

Is hasUrgency a domain-relevant issue, as with the severity and urgency of an issue in bug tracking systems (is this a blocker)? Or is this a transport issue (this needs to be delivered at high priority)?

There are definitely annotation level issues of urgency and severity.

John: http://code.google.com/p/pubsubhubbub/

John: There is an available hub for distribution of pub/sub messages. A provider receives data and passes it on to the pubsub hub; messages are passed on to subscribers. E.g. a vocabulary manager subscribes to VertNet for any value of the term 'sex' that hasn't been seen before. The vocabulary manager can map that value to the existing set of standard values for 'sex'. Optimized messaging on the same protocol, without having to build an infrastructure.
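The vocabulary-manager example can be sketched as a subscriber callback that queues only values it has never seen, then lets a curator map each one to a standard value. In this Python sketch the class is hypothetical, and the standard values ("female", "male", "hermaphrodite") are assumed as the target vocabulary for illustration.

```python
from typing import Dict, List, Set


class VocabularyManager:
    """Subscriber that flags unseen values of the 'sex' term for mapping."""

    def __init__(self, standard: Set[str], known_mappings: Dict[str, str]) -> None:
        self.standard = set(standard)
        self.mappings = dict(known_mappings)
        self.seen: Set[str] = self.standard | set(known_mappings)
        self.review_queue: List[str] = []

    def on_message(self, record: dict) -> None:
        """Callback a hub would invoke for each record published by the provider."""
        raw = record.get("sex")
        if raw is None or raw in self.seen:
            return
        self.seen.add(raw)
        self.review_queue.append(raw)   # never seen before: a curator maps it

    def map_term(self, raw: str, standard_value: str) -> None:
        if standard_value not in self.standard:
            raise ValueError(f"{standard_value!r} is not a standard value")
        self.mappings[raw] = standard_value
        if raw in self.review_queue:
            self.review_queue.remove(raw)


vm = VocabularyManager({"female", "male", "hermaphrodite"}, {"F": "female", "M": "male"})
for message in ({"sex": "F"}, {"sex": "fem."}, {"sex": "fem."}, {"sex": "male"}):
    vm.on_message(message)
print(vm.review_queue)
vm.map_term("fem.", "female")
```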

Clear requirement for systems to push annotations to subscribers.

PubSubHubBub will be tested in the Silver Lining project.

DiGIR failures due to direct tie of transport to schema. Very important to separate the transport from the content being transported.

John: Missing layer here - applications that are handling the messages (where urgency perhaps applies). Institutions lack ability to handle any incoming messages, let alone urgent ones... Except in a financial model... I will answer your urgent questions for $0.50 now.

Urgency is relevant, but not to the discussion of annotations.

Questions

(?) The subject of an annotation is a DwC resource; can subjects be from other domains? The Observations working group has been discussing annotations at a very high level of detail: this particular observation was on a particular entity, it used a particular measurement standard, and it was a measurement of a particular attribute (e.g. the height of a tree).

Answer: Yes. Annotation layer should be domain independent.

John: DomainAnnotationConventions causes some concern. A couple of use cases: an ad hoc observation about a record (I was there in 2006, and it didn't look like this place); I was there in 2006 and I have a set of very specific things.

Typing: DomainAnnotationConventions should probably move off to separate templates. It is not a concern of annotations in any way, but of "Pete's annotation template" or best practice documents defining how to make domain specific annotations useful to people.

Take the green layer, build something that uses it and let people use it.

Use case scenario: provide an image of a leaf off a herbarium sheet to citizen scientist and let them classify the leaf shape.

Brief discussion of FilteredPush progress and workplan.

Matt: hasSubject? In the EntityQuality ontology there is a quality; is there an equivalent to this in the annotation model here? Answer: not sure.

Niko: Have a parent-child relationship, what are the guarantees that a query on one will retrieve the annotations related to both.

Bob: An instance of an FP network has a permanent network store of annotations. The instance can set rules for the tree propagation of queries, based on the configuration of that instance of the network. It is possible that answering these questions may be an NP-complete problem. It isn't clear yet if this will be a problem for instances of FP networks in the domain.

Fifth Session: Ontologies

Bob: a stake-in-the-ground ontology, AnnotationOntology, designed to be shot at to find where the holes are and where the needs of the community aren't being met (particularly for interchanging annotations between annotation systems).

Note: in the Annotation Ontology Diagram, annot:hasInterpretationURI is not xsd:dateTime but xsd:anyURI.

A facility into which you can put all your terms, regardless of the vocabulary you are using, should leave you no worse off than a plain social contract and a leap of faith that things are similar, even if the result can't be formally reasoned upon.

Most important things: 1) Philosophical point: I'm opposed to putting premature domains and ranges on properties, so this is a very lightweight OWL ontology, though it doesn't pass the OWL DL validators, probably for harmless reasons (FOAF being in play is the probable reason). 2) The critical thing is a class called InterpretableObject. It has very little in it, but has some properties (for which it would be the domain, if you were insistent): hasContent and interpretationURI. There are multiple instances of InterpretableObject in the example.

interpretationURI needs the community to agree on informal semantics; in the example, a modified Darwin Core terms ontology that passes the OWL 2 validator. For things like the subject (what the annotation is about) and the content (what the annotation is asserting), interpretationURI and hasContent carry the information. Community agreement: if I see a particular namespace in the interpretationURI, I will take the content and do with it what I normally do with anything else in that namespace. hasContent bears the terms in the interpretationURI's namespace.

hasSubject and hasContent are both interpretableObjects. Probably, in practice, most annotations in most people's annotation systems will be about something and will say something about it.

To DarwinCore terms, had to add a class, dwcterms:DwCOwlFragment, declared as a subclass of an interpretableObject. dwcterms:DwCOwlFragment can have all of the darwin core terms applied to it - this conversationally provides a place off of which to hang the darwin core terms.

Probably not very controversial: annotations have annotators (unfortunately, FOAF was chosen to add this content in the example), and annotations occur at dates and times.

Example: the annotation hasSubject, which is an interpretableObject with a hasInterpretationURI of Darwin Core, making a convention that the attached terms are Darwin Core terms. A validator, given enough formal semantics, would be prepared to say "you told me that this would be Darwin Core and you gave me some mrtg terms and I don't know what to do." Several terms here: catalog number, institution code, and the original data of the Darwin Core country, "Mangalia." There could be a need for some formal semantics here instead of a convention about the nature of the attached terms. Requirement for hasSubject to either tell the consumer what the subject is, or give enough information to query to find out what the subject is.
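The validator behaviour described above can be approximated without formal semantics by checking that every term in hasContent lives in the namespace named by hasInterpretationURI. This Python sketch uses the standard Darwin Core terms namespace as a stand-in for the modified ontology in the example, represents an InterpretableObject as a plain dict, and treats plain prefix matching as sufficient, which is an assumption.

```python
from typing import List

DWC = "http://rs.tdwg.org/dwc/terms/"   # standard Darwin Core terms namespace


def foreign_terms(interpretable_object: dict) -> List[str]:
    """Property URIs in hasContent that are outside the declared interpretationURI namespace."""
    namespace = interpretable_object["hasInterpretationURI"]
    return sorted(term for term in interpretable_object["hasContent"]
                  if not term.startswith(namespace))


subject = {
    "hasInterpretationURI": DWC,
    "hasContent": {DWC + "catalogNumber": "12345",
                   DWC + "institutionCode": "XYZ",
                   DWC + "country": "Mangalia"},
}
mislabelled = {
    "hasInterpretationURI": DWC,
    "hasContent": {DWC + "catalogNumber": "12345",
                   "http://example.org/other-vocabulary/airTemperature": "21"},
}
print(foreign_terms(subject))       # []
print(foreign_terms(mislabelled))   # the one term that is not Darwin Core
```

A non-empty result corresponds to the "you told me this would be Darwin Core" complaint; what to do with such terms is left to the consumer.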

Model: who, what, why, where, when.

hasContent is similar to hasSubject, and is an interpretableObject.

Important use case in our minds: the subject and content of the annotation may not use the same vocabularies. A simple example: the content of the annotation is encrypted, but the subject isn't.

In this example, interpretationURI for hasContent is again darwin core, and content is now dwcterms:country = "Mongolia".

Some things here are perhaps about transportation, but are likely needed for the annotations to be useful; it is possible to stop here. Additional things the consumer is likely to be interested in: Why did the annotator say this? What am I supposed to do about this? hasMotivation ("I looked this up in a gazetteer and I didn't find this"), (new) hasPurpose, and hasExpectation (community agreement needed on common expectations; might need guidance from the interest group). hasExpectation here is "update", indicating the annotator's expectation that the recipient will fix the record in their database.
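Putting the example together, the Python sketch below encodes the Mangalia/Mongolia annotation as nested dicts and shows one way a consumer might act on hasExpectation "update": apply the content only if the subject still matches its local record. The "annot:" prefixes follow the talk; "annot:annotator" and "annot:annotatedAt" are hypothetical stand-ins for the FOAF annotator and date/time properties, content keys are Darwin Core local names, and the institution code and catalog number are invented.

```python
from typing import Dict, Tuple

DWC = "http://rs.tdwg.org/dwc/terms/"

annotation = {
    "annot:annotator": {"foaf:name": "A. Annotator"},   # hypothetical property name
    "annot:annotatedAt": "2010-09-28T10:00:00Z",         # hypothetical property name
    "annot:hasSubject": {
        "annot:hasInterpretationURI": DWC,
        "annot:hasContent": {"institutionCode": "XYZ", "catalogNumber": "12345",
                             "country": "Mangalia"},
    },
    "annot:hasContent": {
        "annot:hasInterpretationURI": DWC,
        "annot:hasContent": {"country": "Mongolia"},
    },
    "annot:hasMotivation": "I looked this up in a gazetteer and I didn't find this",
    "annot:hasExpectation": "update",
}


def apply_update(ann: dict, local: Dict[Tuple[str, str], dict]) -> bool:
    """Apply an 'update' annotation if its subject still matches a local record."""
    subject, content = ann["annot:hasSubject"], ann["annot:hasContent"]
    if ann.get("annot:hasExpectation") != "update":
        return False
    if subject["annot:hasInterpretationURI"] != DWC or content["annot:hasInterpretationURI"] != DWC:
        return False   # this consumer only understands Darwin Core
    fields = subject["annot:hasContent"]
    record = local.get((fields["institutionCode"], fields["catalogNumber"]))
    if record is None or any(record.get(k) != v for k, v in fields.items()):
        return False   # subject not found, or local data changed since the annotation
    record.update(content["annot:hasContent"])
    return True


local_db = {("XYZ", "12345"): {"institutionCode": "XYZ", "catalogNumber": "12345",
                               "country": "Mangalia"}}
print(apply_update(annotation, local_db), local_db[("XYZ", "12345")]["country"])
print(apply_update(annotation, local_db))   # subject no longer matches
```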

The framework should be able to handle the controlled vocabularies produced by the communities who are annotating their data. Everything here should be tested with "first do no harm": is this going forward or going backwards?

Is this something someone in some other community already knows how to do (e.g. is this a reinvention of named graphs)?

Discussion

Amanda: FOAF, with James Macklin as a foaf:Person. Does it make more sense to include this here, or does this point to a need to extend Darwin Core to include annotations/annotators?

Bob: FOAF was available to use in Protege in the early stages of the development. Is Darwin Core rich enough to encompass these terms? That is a question for the Darwin Core folks. For this session, the discussion is looking for a level that is free of the particular vocabularies. More important: can this ontology handle the case of an extended Darwin Core that can identify annotators, the case of annotators being described by FOAF, or annotators being described in ABCD? The ontology still survives if more than one vocabulary is used for one concept, e.g. annotator.

Matt: Does this model fit well with EntityQuality ontology? Which is the Entity, which is the Quality? Thinking through the example, the subject is the Entity, the content is the Quality?

Mark: This seems to be a curation process ontology.

Bob: I am involved with folks working in electronic publishing and can make a convincing example, with similarities to this one, that fits the concepts of annotating data in electronic publishing to this model. That is an important test; however, we need to compile a very broad scope of use cases for annotations on the interest group's TDWG wiki. It is slightly backwards to say that this is an annotation ontology before we have said what annotations are supposed to accomplish; it is thus a means of investigating the interest group's concerns.

Bob: What if someone is completely satisfied with Annotea, but they want to exchange annotations with other systems? Is that something that this interest group needs to address?
