File:SPNHC 2009 eol qc tool.pdf


Original file (1,654 × 1,239 pixels, file size: 1.49 MB, MIME type: application/pdf, 62 pages)

SPNHC 2009 presentation on quality control on image metadata for the Flickr EOL group pool.


Summary reports for users to improve data quality: Experimenting with tagged biodiversity information in social media

Morris, P.J.

The Encyclopedia of Life (EOL) established a group on the social media site Flickr to allow individuals to contribute images of organisms to the species pages of EOL. To enable uptake of images into EOL, two requirements were established: each image contributed to the group had to be licensed under a Creative Commons license suitable for EOL's needs, and each image had to be tagged with the scientific name of an organism in machine-readable form. Flickr's interfaces readily allow an image to be marked as "All Rights Reserved" or as licensed under one of six Creative Commons licenses. They also allow an image to be associated with both arbitrary text tags and machine-readable tags. Machine tags have the simple structure namespace:concept="value", where namespace, concept, and value can all be arbitrary text strings. EOL chose to require concepts from the preexisting taxonomy namespace, which includes taxonomy:family and taxonomy:binomial; for example, taxonomy:binomial="Aporrhais pespelecani".
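
The machine tag structure described above can be illustrated with a minimal Python sketch. This is not the code from the presentation; the function name `parse_machine_tag` is hypothetical, and the pattern assumes, more strictly than the "arbitrary text" rule stated above, that the namespace and concept contain no whitespace, colon, or equals sign.

```python
import re

# namespace:concept=value, where the value may be wrapped in quotation marks.
# Assumption: namespace and concept contain no whitespace, ':' or '='.
MACHINE_TAG = re.compile(
    r'^(?P<namespace>[^:=\s]+):(?P<concept>[^:=\s]+)=(?P<value>.+)$'
)


def parse_machine_tag(raw):
    """Return (namespace, concept, value), or None if raw is not well formed."""
    match = MACHINE_TAG.match(raw.strip())
    if match is None:
        return None
    # Drop straight or curly quotation marks around the value.
    value = match.group("value").strip().strip('"\u201c\u201d').strip()
    if not value:
        return None
    return match.group("namespace"), match.group("concept"), value


if __name__ == "__main__":
    print(parse_machine_tag('taxonomy:binomial="Aporrhais pespelecani"'))
    print(parse_machine_tag("Aporrhais pespelecani"))  # plain text tag
```

A plain text tag such as a bare species name yields `None` here, which is the kind of incorrectly formed tag a quality control report would need to flag.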

I have written an application to examine the data quality of the metadata in EOL's Flickr group pool. The application uses the open source Phlickr library over Flickr's API to obtain a list of images in the group pool and, for each image in the list, caches metadata elements (particularly the machine tags, license, contributor, and geocoding) in a local database. I then wrote a simple web interface to query the database and present quality control reports both on the pool as a whole and for individual contributors. Some 27% of the 13,000 images in the group did not meet EOL's standards, with incorrect licenses and incorrectly formed machine tags being typical. Individual contributors to the pool found this central quality control report to be helpful. Within a week of its release, about half of the existing quality problems had been corrected.
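
The cache-then-report workflow described above might be sketched as follows. This is a Python illustration under stated assumptions, not the original application (which used the PHP Phlickr library). The table layout, the function names, the newline-separated storage of machine tags, and the license labels in `ACCEPTED_LICENSES` are all hypothetical; the licenses EOL actually accepts are not listed in this abstract.

```python
import sqlite3

# Hypothetical placeholder labels, not EOL's real list of accepted licenses.
ACCEPTED_LICENSES = {"cc-license-a", "cc-license-b"}


def build_cache(rows):
    """Load (photo_id, contributor, license, machine_tags) rows into SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE photo (photo_id TEXT PRIMARY KEY, contributor TEXT, "
        "license TEXT, machine_tags TEXT)"
    )
    conn.executemany("INSERT INTO photo VALUES (?, ?, ?, ?)", rows)
    return conn


def has_taxonomy_tag(machine_tags):
    """True if any newline-separated tag looks like taxonomy:concept=value."""
    for tag in machine_tags.split("\n"):
        name, sep, value = tag.partition("=")
        if (sep and name.startswith("taxonomy:")
                and len(name) > len("taxonomy:")
                and value.strip().strip('"')):
            return True
    return False


def problems_by_contributor(conn):
    """Map each contributor to a list of (photo_id, [problem, ...]) entries."""
    report = {}
    query = ("SELECT photo_id, contributor, license, machine_tags "
             "FROM photo ORDER BY photo_id")
    for photo_id, contributor, license_name, tags in conn.execute(query):
        problems = []
        if license_name not in ACCEPTED_LICENSES:
            problems.append("license")
        if not has_taxonomy_tag(tags):
            problems.append("machine tag")
        if problems:
            report.setdefault(contributor, []).append((photo_id, problems))
    return report
```

Grouping problems by contributor mirrors the per-contributor reports described above; a pool-wide summary could be derived from the same query by counting the photos that have at least one problem.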

File history

17:07, 9 July 2009 (current): 1,654 × 1,239, 62 pages (1.49 MB). Paul J. Morris. SPNHC talk as delivered, with speaker's notes included.
13:55, 7 July 2009: 1,654 × 1,239, 28 pages (1.23 MB). Paul J. Morris. July 7 draft of SPNHC 2009 presentation.