ApplePieRules

From FilteredPush

This page defines a set of rules about the kinds of curation requests that can be expressed through annotations in ApplePie. Open questions and items that need to be discussed are marked in red.

Please put your comments on the discussion page.

Rules Pertaining to Annotations

The rules in this category concern only the annotations themselves, such as the annotation date, annotator, and evidence. This information is common to annotations in all domains.

AnnotateDateRule

  • Each annotation has zero or one annotation date
  • Each annotation date has information:
    • Required
      • The date and time when the annotation was created (oa:annotatedAt). This can be the date and time when the annotation document was created, which might follow the date and time when a person carried out an action, such as a form submission, to create the annotation.

AnnotatorRule

The annotator object is of type foaf:Person.

  • Each annotation has one or more annotators (oa:annotatedBy)
  • Each annotator has information:
    • Required
      • full name (foaf:name)
    • Optional
      • institution (MISSING)
      • phone (foaf:phone)
      • mailbox (foaf:mbox)
      • hash of email address (foaf unique identifier) (foaf:mbox_sha1sum)
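
The AnnotatorRule can be checked mechanically. The following is a minimal sketch in Python, assuming each annotator is represented as a plain dictionary keyed by the FOAF property names listed above. The function name `check_annotators` and the dictionary representation are illustrative assumptions, not part of ApplePie.

```python
# Hypothetical sketch: annotators are plain dicts keyed by FOAF property
# names taken from the AnnotatorRule. This is not an ApplePie API.

REQUIRED_ANNOTATOR_FIELDS = ("foaf:name",)


def check_annotators(annotators):
    """Return a list of rule violations for a list of annotator dicts."""
    problems = []
    # Each annotation has one or more annotators (oa:annotatedBy).
    if len(annotators) < 1:
        problems.append("annotation needs at least one annotator (oa:annotatedBy)")
    for index, annotator in enumerate(annotators):
        # Only the full name is required; phone, mbox, etc. are optional.
        for field in REQUIRED_ANNOTATOR_FIELDS:
            if not annotator.get(field):
                problems.append(f"annotator {index}: missing required {field}")
    return problems
```

An empty result means the annotator list satisfies the rule as stated; optional properties are not inspected.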

MotivationRule

See Motivations .

The motivation object is of type oa:Motivation. A motivation is an instance, taken where possible from the canonical ones listed in Motivations. If a textual form is required, it should follow the CNT model, as described under the required information below.

  • Each annotation has zero, one or more motivations (oa:motivatedBy)
  • Each motivation has information
    • Required
      • Plain text describing the motivation. This requires adding rdf:type cnt:ContentAsText to the instance of oa:Motivation, and setting the value of cnt:chars on the instance to the desired text. (TODO Put an example here).
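
As a minimal sketch of the textual form described above, the following Python builds the statements for one motivation as plain (subject, predicate, object) tuples. The identifiers `ex:annotation1` and `ex:motivation1`, the sample text, and the helper name `textual_motivation` are hypothetical; only the property and class names come from the rule.

```python
# Hypothetical sketch: triples are plain tuples of prefixed names, not
# objects from any RDF library.

def textual_motivation(annotation_id, motivation_id, text):
    """Build the triples for a motivation that carries plain text."""
    return [
        (annotation_id, "oa:motivatedBy", motivation_id),
        (motivation_id, "rdf:type", "oa:Motivation"),
        # Typing the same instance as cnt:ContentAsText allows cnt:chars.
        (motivation_id, "rdf:type", "cnt:ContentAsText"),
        (motivation_id, "cnt:chars", text),
    ]


for triple in textual_motivation(
    "ex:annotation1", "ex:motivation1", "Transcription error in the collector name."
):
    print(triple)
```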

EvidenceRule

The evidence object is of type aod:Evidence.

  • Each annotation has zero, one or more evidence (aod:hasEvidence)
  • Each evidence has information:
    • Required
      • Plain text describing the evidence (aod:asText). Informally, we expect evidence to be the primary motivation for annotations in science. TODO: rephrase for CNT.
    • Optional

ExpectationRule

  • Each annotation has one expectation (oad:hasExpectation)
  • Expectation can only be one of the following (subclass of aod:Expectation):
    • Expectation_Insert (oad:Expectation_Insert)
    • Expectation_Update (oad:Expectation_Update)
    • Expectation_Delete (oad:Expectation_Delete)
    • Expectation_Group (oad:Expectation_Group)
    • Expectation_Solve_With_More_Data (oad:Expectation_Solve_With_More_Data)
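
The closed list of expectations maps naturally onto an enumeration. The sketch below is one possible encoding, assuming the expectation arrives as a list of prefixed term strings; the names `Expectation` and `parse_expectation` are illustrative only.

```python
from enum import Enum


class Expectation(Enum):
    """The five expectation terms listed in the ExpectationRule."""
    INSERT = "oad:Expectation_Insert"
    UPDATE = "oad:Expectation_Update"
    DELETE = "oad:Expectation_Delete"
    GROUP = "oad:Expectation_Group"
    SOLVE_WITH_MORE_DATA = "oad:Expectation_Solve_With_More_Data"


def parse_expectation(values):
    """Return the single Expectation of an annotation, or raise ValueError."""
    # Each annotation has exactly one expectation.
    if len(values) != 1:
        raise ValueError(f"expected exactly one expectation, got {len(values)}")
    # Enum lookup by value raises ValueError for terms outside the list.
    return Expectation(values[0])
```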

CardinalityRule

  • Each annotation has one and only one oa:Target.
  • Each annotation has one or more hasBody, each of which declares what should be changed for the Target.
  • Annotation sets are not used.
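
A minimal sketch of the cardinality check, assuming the targets and bodies of an annotation have already been collected into two lists. The function name is hypothetical, and the check reflects the rule as currently stated, before any of the open questions about empty bodies or multiple targets are settled.

```python
def check_cardinality(targets, bodies):
    """Return rule violations for an annotation's targets and bodies."""
    problems = []
    # One and only one oa:Target.
    if len(targets) != 1:
        problems.append(f"expected exactly one oa:Target, found {len(targets)}")
    # One to many hasBody.
    if len(bodies) < 1:
        problems.append("expected at least one hasBody, found 0")
    return problems
```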

ResponseAnnotationRule

A response annotation is an annotation that asserts acceptance or rejection of another annotation.

The content and structure of the response annotation need to be decided.

  • annotator (optional)
  • annotation date/time (optional)
  • Target (subject of the Annotation)
    • id of annotated annotation
  • hasBody
    • A controlled vocabulary term for Agree/Disagree/Neutral.
  • expectation
    • Expectation_Insert
  • Motivation
    • Optional text describing motivation for Accept/Reject. Statement of application of auto filter rule could go here: "This annotation was accepted because it met the criteria of a filtering rule established by _x_ on _y_" Annotator would then be the software agent applying the filter, rather than the human who set the filter.
  • Evidence
    • Optional text describing evidence for Agree/Reject.

Rules Pertaining to Annotation Types and Related Information in the Biodiversity Domain

The rules in this category define the biodiversity domain specific information expressed in the annotation.

AnnotationTypeRule

The curation requests supported by ApplePie can be classified into the following annotation types. The information required in the annotation is declared for each type below.

Insert_Identification

It's used to add a new determination for a specimen.

Insert_Location

It's used to add a georeference for a locality text.

Insert_Occurrence

Used to insert a new specimen record.

Update_Identification

Used to update fields linked to an identification of a collection object. The identification in the selector is used to locate the identification record(s) to be updated, while the identification in the Body contains the new values for the fields to be updated.

Update_Location

Used to update fields linked to a georeference of a collection object. The Location in the selector is used to locate the georeference record(s) to be updated, while the Location in the Body contains the new values for the fields to be updated.

Update_Occurrence

Used to update any field linked to a collection object.

Cluster_Duplicates

This annotation is used to assert that a group of occurrence objects belongs to a duplicate set. Collection managers receive this message as a notification and do not need to map it into their databases, so the mapper does not handle this annotation type.

The possible content of this annotation is:

  • Target: list of Occurrence
  • Expectation: Expectation_Group

Show_Inconsistency

This annotation indicates that there is something wrong with the specified occurrence object, but the annotator does not know for sure what the problem is.

  • Target: Identifier
    • occurrenceID or ((CollectionCode or CollectionId), catalogNumber)
  • Body: describing the inconsistency in plain text instead of DwC concepts
  • Expectation: Expectation_Solve_With_More_Data

Systematic_Error

This use case covers an annotation on a data set that marks a systematic problem with all records in that data set (such as a transposition of latitude and longitude in the presentation). Such an annotation could be initiated anywhere a systematic error could be present in the data or in the mappings. Other examples include incorrect mapping of year/month/day, or transposition of day and month (between European and US conventions). It is likely that some data sets will include a mixture of day/month/year and month/day/year data that have been mapped in one way onto year/month/day. Part of such a data set is then incorrectly mapped: a class of error that can be detected statistically in the data set, even though individual records may not be correctable.

  • Target: Identifier
    • DataSetName or (CollectionCode or CollectionId)
  • Body: plain text describing the problem
  • Expectation: Expectation_Solve_With_More_Data
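
To illustrate how a day/month transposition might be detected across a data set, the sketch below collects records whose month value is impossible as a month but plausible as a day, while the day value is plausible as a month. It assumes records are dictionaries with integer `dwc:month` and `dwc:day` values; the function names are hypothetical and the approach is only one of many possible heuristics.

```python
def unambiguous_day_month_swaps(records):
    """Return records whose month and day values look transposed."""
    suspects = []
    for record in records:
        month = record.get("dwc:month")
        day = record.get("dwc:day")
        if month is None or day is None:
            continue
        # A "month" of 13..31 with a "day" of 1..12 only makes sense swapped.
        if 12 < month <= 31 and 1 <= day <= 12:
            suspects.append(record)
    return suspects


def swap_fraction(records):
    """Fraction of records that are unambiguous swap candidates."""
    if not records:
        return 0.0
    return len(unambiguous_day_month_swaps(records)) / len(records)
```

This catches only the unambiguous cases. A record with day 4 and month 5 cannot be classified on its own, which is why such an error may be visible in the data set as a whole without being correctable record by record.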

Delete_Identification

Currently we don't have a good use case for deletion, but we won't exclude the possibility, so we keep this type mainly for testing purposes.

It's used to delete an identification of a collection object. The Identification in the Body is used to locate the identification record(s) to be deleted.

IdentifierRule

A group of identifiers is represented by a Target of type dwc:references, which points to the specific collection object, data collection, etc.

  • If it is the Target of Insert_Determination, Insert_Georeference, Update_Determination, Update_Georeference, Update_Occurrence, Show_Inconsistency or Delete_Determination, then the identifiers are used to uniquely locate the occurrence records. See also: CodesAndNumbers rule in AppleCore.
    • OccurrenceID (rdf:reference)
    • or DarwinCore Triplet (dwcFP:DwCTripletSelector subtype of oad:KVPairQuerySelector)
      • InstitutionCode (dwc:institutionCode)
      • CollectionCode (Herbarium Acronym) (dwc:collectionCode) or CollectionID (Biocol LSID for the herbarium) (dwc:collectionID)
      • catalogNumber (dwc:catalogNumber)
  • If it's the Target of Insert_Occurrence
    • CollectionCode or CollectionId
  • If it's the Target of Systematic_Error
    • DataSetName or (CollectionCode or CollectionId)
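
The first case of the IdentifierRule (locating an occurrence record) can be sketched as follows, assuming the Target's identifiers are held in a dictionary. The key `occurrenceID` and the function name are assumptions made for illustration; the Darwin Core keys follow the terms listed above. The check only confirms that a sufficient combination of identifiers is present, not that it resolves to a unique record.

```python
def has_occurrence_identifiers(selector):
    """True if the selector holds an OccurrenceID or a full DarwinCore triplet."""
    if selector.get("occurrenceID"):
        return True
    # The middle element of the triplet may be a code or a collection ID.
    has_collection = bool(
        selector.get("dwc:collectionCode") or selector.get("dwc:collectionID")
    )
    return (
        bool(selector.get("dwc:institutionCode"))
        and has_collection
        and bool(selector.get("dwc:catalogNumber"))
    )
```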

OccurrenceRule

The Occurrence object is of type dwcFP:Occurrence.

  • If the Occurrence is the Body of Insert_Occurrence, the Occurrence individual should contain the complete Occurrence information, which is:
    • Optional
      • catalogNum (dwc:catalogNumber)
      • RecordedBy (dwc:recordedBy)
      • RecordNumber (see also: CodesAndNumbers rule in AppleCore) (dwc:recordNumber)
      • YearCollected (dwc:year)
      • MonthCollected (dwc:month)
      • DayCollected (dwc:day)
      • DecimalLatitude (dwc:decimalLatitude)
      • DecimalLongitude (dwc:decimalLongitude)
      • Country (dwc:country)
      • StateProvince (dwc:stateProvince)
      • County (dwc:county)
      • ScientificName (dwc:scientificName)
      • ScientificNameAuthorship (dwc:scientificNameAuthorship)
      • GeodeticDatum (dwc:geodeticDatum)
      • Family (dwc:family)
      • Locality (dwc:locality)
  • If the Occurrence is the Body of Update_Occurrence, the Occurrence individual should contain the fragment of occurrence information that the annotator wants to change, which is:
    • Optional
      • RecordedBy
      • RecordNumber
      • YearCollected
      • MonthCollected
      • DayCollected
      • GeodeticDatum
      • DecimalLatitude
      • DecimalLongitude
      • Country
      • StateProvince
      • County
      • Locality
      • Family
      • ScientificName
      • ScientificNameAuthorship

IdentificationRule

The identification object is of type dwcFP:Identification.

See also the Identification rule and ScientificName rule in AppleCore.

  • If the identification is the Body of Insert_Determination, the Identification individual should contain complete information about a new determination, which is:
    • Required
      • identifiedBy (dwc:identifiedBy)
      • dateIdentified (dwc:dateIdentified)
      • ScientificName (dwc:scientificName)
      • ScientificNameAuthorship (dwc:scientificNameAuthorship)
    • Optional
      • genus (dwc:genus)
      • subgenus (dwc:subgenus)
      • specificEpithet (dwc:specificEpithet)
      • infraspecificEpithet (dwc:infraspecificEpithet)
      • taxonRank (dwc:taxonRank)
      • identificationQualifier (dwc:identificationQualifier)
  • If the identification is the Body of Delete_Determination, or the Body or selector of Update_Determination, the Identification individual only needs to contain a fragment of the determination information, which is:
    • Optional
      • identifiedBy
      • dateIdentified
      • ScientificName
      • ScientificNameAuthorship
      • genus
      • subgenus
      • specificEpithet
      • infraspecificEpithet
      • taxonRank
      • identificationQualifier
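
The difference between a complete identification (insert) and a fragment (update or delete) can be sketched as a single check. The code below assumes the Body is a dictionary keyed by Darwin Core terms. Treating the listed fields as a closed set is an assumption made here for illustration, and the function name is hypothetical.

```python
IDENTIFICATION_REQUIRED = (
    "dwc:identifiedBy",
    "dwc:dateIdentified",
    "dwc:scientificName",
    "dwc:scientificNameAuthorship",
)
IDENTIFICATION_OPTIONAL = (
    "dwc:genus",
    "dwc:subgenus",
    "dwc:specificEpithet",
    "dwc:infraspecificEpithet",
    "dwc:taxonRank",
    "dwc:identificationQualifier",
)


def check_identification(body, annotation_type):
    """Return rule violations for an Identification body of the given type."""
    known = set(IDENTIFICATION_REQUIRED) | set(IDENTIFICATION_OPTIONAL)
    # Assumption: fields outside the listed ones are reported, not rejected.
    problems = [f"unlisted field {name}" for name in sorted(set(body) - known)]
    # Only an insert needs the complete set; updates and deletes are fragments.
    if annotation_type == "Insert_Determination":
        problems += [
            f"missing required field {name}"
            for name in IDENTIFICATION_REQUIRED
            if name not in body
        ]
    return problems
```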

LocationRule

The location object is of type dwcFP:Georeference.

  • If the location is the Body of Insert_Georeference, the Location individual should contain complete information about the new coordinates, which is:
    • Required
      • decimalLatitude (dwc:decimalLatitude)
      • decimalLongitude (dwc:decimalLongitude)
      • geodeticDatum (dwc:geodeticDatum)
      • coordinateUncertaintyInMeters (dwc:coordinateUncertaintyInMeters)
      • georeferencedBy (dwc:georeferencedBy)
      • georeferenceProtocol (dwc:georeferenceProtocol)
    • Optional
      • coordinatePrecision (dwc:coordinatePrecision)
      • georeferenceSources (dwc:georeferenceSources)
      • georeferenceVerificationStatus (dwc:georeferenceVerificationStatus)
      • georeferenceRemarks (dwc:georeferenceRemarks)
      • verbatimCoordinates (dwc:verbatimCoordinates)
      • pointRadiusSpatialFit (dwc:pointRadiusSpatialFit)
      • footprintWKT (dwc:footprintWKT)
      • footprintSRS (dwc:footprintSRS)
      • footprintSpatialFit (dwc:footprintSpatialFit)
  • If the location is the Body or selector of Update_Georeference, the Location individual only needs to contain a fragment of the coordinate information, which is:
    • Optional
      • decimalLatitude
      • decimalLongitude
      • geodeticDatum
      • coordinateUncertaintyInMeters
      • georeferencedBy
      • georeferenceProtocol
      • coordinatePrecision
      • georeferenceSources
      • georeferenceVerificationStatus
      • georeferenceRemarks
      • verbatimCoordinates
      • pointRadiusSpatialFit
      • footprintWKT
      • footprintSRS
      • footprintSpatialFit
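
For an inserted georeference, the required fields can be checked together with a basic sanity check on the coordinates. This is a sketch under stated assumptions: the Body is a dictionary keyed by Darwin Core terms, latitude and longitude are already numeric, and the range checks go beyond what the LocationRule itself states. The function name is hypothetical.

```python
GEOREFERENCE_REQUIRED = (
    "dwc:decimalLatitude",
    "dwc:decimalLongitude",
    "dwc:geodeticDatum",
    "dwc:coordinateUncertaintyInMeters",
    "dwc:georeferencedBy",
    "dwc:georeferenceProtocol",
)


def check_insert_georeference(body):
    """Return rule violations for the Body of an Insert_Georeference."""
    problems = [
        f"missing required field {name}"
        for name in GEOREFERENCE_REQUIRED
        if name not in body
    ]
    latitude = body.get("dwc:decimalLatitude")
    longitude = body.get("dwc:decimalLongitude")
    # Assumption beyond the rule: reject coordinates outside valid ranges.
    if latitude is not None and not -90 <= latitude <= 90:
        problems.append("dwc:decimalLatitude out of range")
    if longitude is not None and not -180 <= longitude <= 180:
        problems.append("dwc:decimalLongitude out of range")
    return problems
```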

Possible Extension in the Future

We might need to extend the annotation in the following respects to express curation requests with richer content (e.g. for the FNA/mediawiki deliverable).

EvidenceRule

To Be Decided

The following items need to be decided.

catalogNum known by curator when insert_occurrence

Do the curators know which catalogNum they should use for the occurrence they want to insert into a collection? If each database has its own rule for constructing the catalogNum, the curators might not know it. In that case this information should be left for the local database administrator to fill in when the annotation is decided.

update request need to be refined

  • What's the proper Target for update_occurrence?
  • For update_determination and update_georeference, do the fields listed in the IdentificationRule and LocationRule include all (or the most commonly used) fields that might be updated?
  • For update_determination and update_georeference, the Target contains the identifier for the collection object while the Body contains the identifier for the identification or georeference. The two identifiers together determine the identification or georeference that is to be updated. Ideally both identifiers would be in the Target, pointing to the item to be updated. But then we don't have a proper type for this kind of Target, because it is neither an occurrence nor an identification or location. Is it OK to split the identifiers between Target and Body?
  • Do all datasets expose the identifier for the identification to the curators? The HUH database doesn't do that (barcode00107080). In this case, how can the curator know the identifier of the identification? The same applies to the identifier of the georeference. If the curators don't know these identifiers, shall we still allow them to inject annotations?
  • Are there any other update requests we should support? Shall we support updating a botanist or a taxon name? (This is not the taxon name in an identification. For example: a family in the taxonomy has the name A, and we now want to change its name to B.)

one Target constraint vs cluster_duplicates

Although the cluster_duplicates annotation won't go through the Mapper, such annotations will still be injected into the network and subscribed to by collection managers. So we still need to decide its content and structure.

With one Target constraint, how to model the "cluster_duplicates" annotation?

One solution (proposed by Paul): annotatesData is a set of duplicates, and the hasBody is a list of occurrences to add to the set. The question then becomes how to create a set.

  • What does the set mean? Is it an identifier of a duplicate set? What if the annotation is intended to create a new duplicate set?


Another solution is to put the list of collection objects that should be clustered into one duplicate set in the Target, and change the one Target constraint to:

  • If it's a cluster_duplicates annotation, then the annotation can have one to many annotatesData (each of which points to one record belonging to this duplicate set).
  • Otherwise each annotation can have one and only one annotatesData.

This solution results in an empty Body.

Which solution shall we use? Or do we have other solutions?

allow empty Body?

Usually the annotation contains both a Target and a Body. We can therefore say: insert a georeference (Body) for the collection object (Target).

But sometimes we might not need a Body, as in an annotation of type "insert_occurrence". There we would not need to give the CollectionCode in the Target and then give the occurrence data, which also contains the CollectionCode, in the Body. We can simply put the whole occurrence data, which contains everything we need, in the Target.

Can we have an empty Body? Or do we want to stick to the rule that both Target and Body must appear in the annotation?

need DatasetIdentifier or CollectionIdentifier type?

In the Target of the "systematic_error" and "insert_occurrence" annotations, we might need to point to a specific dataset or collection.

The Target of "systematic_error" is DataSet and CollectionCode (or CollectionID). Do we need to come up with a DataSetIdentifierType to represent these two identifiers in the Target?

The Target of "insert_occurrence" is CollectionCode (or CollectionID). Do we need to come up with a CollectionIdentifierType to represent the collection identifier in the Target instead of simply using the String type? And if the Target in this case is a plain string, how can the annotation parser know whether it is a collectionCode or a collectionId? Currently the dwcFPModel has DataProperties for collectionCode and collectionId, but here we might need a Class or DataType instead of a property.

what does a response annotation look like

When the collection managers receive an annotation, they will decide whether to accept or reject it, and may render a judgment on it. Such a response is also expressed as an annotation and injected into the FPush network. It's an annotation of an annotation.

The question is: what are the content and structure of the response annotation?

Current discussion seems to favor acceptance resulting in an FP message indicating changed data, and an FP message containing an annotation with the judgement (agree/disagree/neutral). Sending an FP message indicating changed data has a synchronization issue: the data available for harvest may not have changed at the time the change is actually written into the database. Informing interested parties that changed data is available for harvest is probably a responsibility of the data provider itself, not the annotation processor.

One source of terms about judgement is An OWL Ontology of Norms and Normative Judgements, but the relevant term there (NormativeJudgement:Evaluative) describes a normative judgment, not a single person's opinion.

A source of terms about opinion is MARL, which has the relevant Opinion class, hasPolarity, and opinionText.

We could consider using the HELIOS resolution to indicate accept/reject (fix/{invalid,duplicate,wontfix}). In an environment where annotations are data change requests, HELIOS's description of issue tracking is relevant. Looking further, the elements we want are available from its parent BOM (bug ontology model).

Proposal

The annotation uses MARL to express opinion, and BOM to express the resolution of the issue posed by the first annotation.

  • Annotation
    • hasTarget: GUID of primary Annotation
    • hasBody: type marl:Opinion
      • marl:hasPolarity {marl:Positive, marl:Neutral, marl:Negative}
      • marl:opinionText annotator's statement of opinion
      • marl:describesObject GUID of primary Annotation
        • type bom:Issue
        • hasResolution type {bom:Fix,bom:Invalid, bom:WontFix, bom:Duplicate, bom:ThirdParty}
    • Expectation type Expectation_Insert
    • Evidence asText annotator's evidence for their opinion TODO:rephrase with CNT

Paul J. Morris 10:50, 6 February 2012 (EST)
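
As an illustration of the proposed structure, the sketch below assembles a response annotation as a nested dictionary. The dictionary layout, the key names `id` and `evidence`, and the function name `build_response` are assumptions made for this example; the polarity and resolution terms are those listed in the proposal.

```python
POLARITIES = {"marl:Positive", "marl:Neutral", "marl:Negative"}
RESOLUTIONS = {
    "bom:Fix", "bom:Invalid", "bom:WontFix", "bom:Duplicate", "bom:ThirdParty",
}


def build_response(primary_guid, polarity, opinion_text, resolution,
                   evidence_text=None):
    """Assemble a response annotation about the annotation `primary_guid`."""
    if polarity not in POLARITIES:
        raise ValueError(f"unknown polarity: {polarity}")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unknown resolution: {resolution}")
    response = {
        "hasTarget": primary_guid,
        "hasBody": {
            "type": "marl:Opinion",
            "marl:hasPolarity": polarity,
            "marl:opinionText": opinion_text,
            # The primary annotation is treated as an issue with a resolution.
            "marl:describesObject": {
                "id": primary_guid,
                "type": "bom:Issue",
                "hasResolution": resolution,
            },
        },
        "expectation": "Expectation_Insert",
    }
    # Evidence is optional in the proposal.
    if evidence_text is not None:
        response["evidence"] = evidence_text
    return response
```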

support for structured Body

An example where we might need a structured Body is an "insert_determination" annotation with identification history.

Currently we don't have a good solution for supporting a structured Body, and we may not want one, since a flat Body keeps everything simple. So our decision for now is not to support inserting identification history in the "insert_determination" annotation. We might come back to this in the future.

Validation Tools

ValidationTools is a page devoted to tools for validating annotations based on Rules or other constraints.
