Requirements development for QC UI


Elaborating on Quality_Control_New_Record

Scenarios

With AnnotationProcessor/WebClient UI Implications

Network: A data curator logs into an instance of the annotation processor and launches a quality control job on data that has been harvested from their collection [Is this happening within Specify or another CMS? It seems we would still be in the web client but "pushing" the results directly into the data store when finished...]. The workflow loads data from their collection, actors in the workflow identify quality control issues in the data, and actors in the workflow create annotations describing those issues and launch them into the network. The data curator processes the resulting annotations. The data curator can also view the results of the quality control analysis in overview and, in detail, examine the provenance traces to understand why particular results were or were not asserted. The data curator can use the results to establish rules for automatic filtering/processing of the annotations produced from the analysis. [Question 1: If a data curator launches a quality control job and receives a result from a georeferencing actor that "Latitude" and "Longitude" do not match "Higher_geography", is this an annotation? It seems more like metadata supporting a potential annotation (the decision rests with the curator). Suggest calling this a system-generated notification/alert (= a type of annotation?) to differentiate it from a human decision.]
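
One possible way to model the distinction raised in Question 1 is sketched below; every name here is illustrative only and not part of any Filtered Push API. The idea is that each finding carries an origin, so a machine-generated alert and a curator's decision stay distinguishable downstream:

 // Minimal sketch for Question 1; all names are hypothetical.
 import java.time.Instant;
 
 public class QcFinding {
     enum Origin { SYSTEM_GENERATED, HUMAN_ASSERTED }
 
     final Origin origin;
     final String recordId;    // identifier of the harvested record
     final String issue;       // e.g. "Latitude/Longitude do not match Higher_geography"
     final String assertedBy;  // actor name, or curator name after promotion
     final Instant created = Instant.now();
 
     QcFinding(Origin origin, String recordId, String issue, String assertedBy) {
         this.origin = origin;
         this.recordId = recordId;
         this.issue = issue;
         this.assertedBy = assertedBy;
     }
 
     /** A curator reviewing a system notification can promote it to a human-asserted annotation. */
     QcFinding promote(String curatorName) {
         return new QcFinding(Origin.HUMAN_ASSERTED, recordId, issue, curatorName);
     }
 
     public static void main(String[] args) {
         QcFinding alert = new QcFinding(Origin.SYSTEM_GENERATED, "urn:uuid:1234",
                 "Latitude/Longitude do not match Higher_geography", "GeoreferencingActor");
         QcFinding annotation = alert.promote("A. Curator");
         System.out.println(alert.origin + " -> " + annotation.origin);
     }
 }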

Network: A data curator who is using a database system for which a Driver is not available logs into an instance of the annotation processor and launches a quality control job on data that has been harvested from their collection. The workflow loads data from their collection and actors in the workflow identify quality control issues in the data (actors in the workflow create annotations describing the data quality issues and launch them into the network). The data curator can view the results of the quality control analysis in overview, sort them by issue, and, in detail, examine the provenance traces to understand why particular results were or were not asserted. The data curator can download a row-filtered subset of the result spreadsheet for manipulation and ingest into their database system.
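
The row-filtered download might work along the lines of the following minimal sketch. It assumes the result spreadsheet is a flat CSV whose per-row QC summary can be matched by substring; the file names and the issue code are made up for illustration, and the comma handling is deliberately naive:

 // Minimal sketch: keep only the rows flagged with the issue the curator
 // wants to work on. File names and the issue code are assumptions.
 import java.io.IOException;
 import java.nio.file.*;
 import java.util.List;
 import java.util.stream.Collectors;
 
 public class FilterResults {
     public static void main(String[] args) throws IOException {
         List<String> lines = Files.readAllLines(Paths.get("qc_results.csv"));
         String header = lines.get(0);
         List<String> subset = lines.stream().skip(1)
                 .filter(row -> row.contains("GEOREFERENCE_MISMATCH"))
                 .collect(Collectors.toList());
         subset.add(0, header);
         Files.write(Paths.get("qc_results_subset.csv"), subset);
     }
 }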

Network: A researcher logs into an instance of the annotation processor and launches a job to retrieve and quality control data known to the network that is of interest to them (e.g. they query by higher taxon or by geography). After the workflow has completed, the researcher downloads the spreadsheet of augmented/quality-controlled results for analysis.

Network: A data curator logs into an instance of the annotation processor and launches a quality control job on data that has been harvested from their collection. The workflow loads data from their collection and actors in the workflow identify quality control issues in the data; no annotation actors are present in the workflow (or they are disabled). The data curator downloads the spreadsheet of results, manipulates it, and uses it to augment data in their database. The data curator then launches a new quality control job with narrower query parameters to augment a smaller subset of their data using annotations. [We are not sure we understand why this process would take place. Is this because the data is so large or dirty that you just want a first pass to know how bad it is and to decide how to break up the problems in order to resolve them?]

[Suggest that we "type" the different kinds of annotation to allow better communication about them]
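
For discussion, one candidate typing might look like the sketch below; the constants are only suggestions drawn from the operations mentioned in these scenarios, not an agreed vocabulary:

 // Candidate annotation types; constants are suggestions only.
 enum AnnotationType {
     SYSTEM_NOTIFICATION,   // machine-generated alert, e.g. a georeference mismatch
     NEW_GEOREFERENCE,      // proposed replacement coordinates
     NEW_DETERMINATION,     // proposed scientific name change
     DATA_CORRECTION,       // general field-level correction
     COMMENT                // free-text remark by a curator
 }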

Kepler only

Local: A researcher has a data set, runs a Kepler quality control workflow on that data set (using actors that engage network services such as georeferencing and scientific name validation), and outputs the result to a spreadsheet.

Local: A data curator exports data from their database, runs a Kepler quality control workflow on that data set (using actors that engage network services such as georeferencing and scientific name validation), outputs the result to a spreadsheet, and then loads changed results into their database.
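
For orientation, the local pipeline in these two scenarios could be approximated outside Kepler in plain Java as below. The Actor interface stands in for network-service actors such as georeferencing and scientific name validation; every name, the column order, and the file names are hypothetical:

 // Plain-Java approximation of the local QC pipeline; all names are hypothetical.
 import java.io.IOException;
 import java.nio.file.*;
 import java.util.*;
 
 public class LocalQcPipeline {
     /** Stand-in for a network-service actor (georeferencing, name validation, ...). */
     interface Actor {
         String check(String[] row);   // returns a QC remark, or null if the row is clean
     }
 
     public static void main(String[] args) throws IOException {
         // Column order assumed: catalogNumber, decimalLatitude, decimalLongitude, scientificName
         List<Actor> actors = List.of(
                 row -> row[1].isEmpty() || row[2].isEmpty() ? "MISSING_COORDINATES" : null,
                 row -> row[3].isEmpty() ? "MISSING_SCIENTIFIC_NAME" : null);
 
         List<String> lines = Files.readAllLines(Paths.get("export.csv"));
         List<String> out = new ArrayList<>();
         out.add(lines.get(0) + ",qcSummary");   // extend the header row
         for (String line : lines.subList(1, lines.size())) {
             String[] row = line.split(",", -1); // naive CSV split, for the sketch only
             StringJoiner summary = new StringJoiner("; ");
             for (Actor a : actors) {
                 String remark = a.check(row);
                 if (remark != null) summary.add(remark);
             }
             out.add(line + "," + summary);
         }
         Files.write(Paths.get("qc_output.csv"), out);
     }
 }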

Requirements

Assumes: Results of a QC workflow, exclusive of provenance, can be represented as a flat spreadsheet (e.g. flat DarwinCore).

Assumes: Results of a QC workflow consist of (1) the results, (2) a per-row provenance summary for the results, and (3) provenance details.
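
Under these two assumptions, the shape of a workflow result might be sketched as follows; the names are hypothetical, and keying the provenance parts by row identifier is itself an assumption:

 // Sketch of the assumed three-part result structure; names are hypothetical.
 import java.util.List;
 import java.util.Map;
 
 class QcResult {
     List<Map<String, String>> rows;              // (1) flat DarwinCore-style records
     Map<String, String> provenanceSummary;       // (2) per-row summary, keyed by row identifier
     Map<String, List<String>> provenanceDetail;  // (3) full trace per row, e.g. actor firings
 }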

AnnotationProcessor user can request the launch of a quality control analysis of harvested data.

AnnotationProcessor user can specify query parameters on which harvested data are to be analysed.

AnnotationProcessor user can discover which pre-defined quality control workflows are available.

AnnotationProcessor user can specify which pre-defined quality control workflow is to be run.

AnnotationProcessor user can download a spreadsheet of results (data from query, plus columns added by data augmentation, plus QC summary for each row).

AnnotationProcessor user can download a filtered set of rows of the spreadsheet of results (data from query, plus columns added by data augmentation, plus QC summary for each row).

AnnotationProcessor user can view details of provenance for a row in the spreadsheet of results from an analysis.

AnnotationProcessor user can specify an analysis (by messageid) and operation (e.g. new georeference) as parameters for automatic filtering/actions on annotations.
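
Such a filtering rule might be represented along these lines; this is a sketch only, and the field names and Action values are not an agreed design:

 // Sketch of an automatic-filtering rule; names and Action values are illustrative.
 enum Action { ACCEPT, REJECT, HOLD_FOR_REVIEW }
 
 class AnnotationFilterRule {
     final String analysisMessageId;  // identifies the analysis the rule applies to
     final String operation;          // e.g. "new georeference"
     final Action action;             // disposition for matching annotations
 
     AnnotationFilterRule(String analysisMessageId, String operation, Action action) {
         this.analysisMessageId = analysisMessageId;
         this.operation = operation;
         this.action = action;
     }
 
     boolean matches(String messageId, String op) {
         return analysisMessageId.equals(messageId) && operation.equals(op);
     }
 }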

AnnotationProcessor user can sort or filter rows in the spreadsheet of results from an analysis.