Data Quality Control
Resources in the Community
Ontology Development 101: A Guide to Creating Your First Ontology, by Natalya F. Noy and Deborah L. McGuinness
- The data void in modeling current and future distributions of tropical species, Kenneth J. Feeley and Miles R. Silman, Global Change Biology (2011) 17, 626–630, http://onlinelibrary.wiley.com/doi/10.1111/j.1365-2486.2010.02239.x/pdf
- Numbers of Living Species in Australia and the World, 2nd edition (Report for the Australian Biological Resources Study), A. D. Chapman, Australian Biodiversity Information Services, Toowoomba, Australia, Sep. 2009, http://www.environment.gov.au/biodiversity/abrs/publications/other/species-numbers/2009/pubs/nlsaw-2nd-complete.pdf
- Distorted Views of Biodiversity: Spatial and Temporal Bias in Species Occurrence Data, Elizabeth H. Boakes, Philip J. K. McGowan, Richard A. Fuller, Ding Chang-qing, Natalie E. Clark, Kim O'Connor and Georgina M. Mace, PLoS Biol. 2010 June; 8(6): e1000385, http://hubs.plos.org/web/biodiversity/article/10.1371/journal.pbio.1000385
- Biological Taxonomy and Ontology Development: Scope and Limitations, Nico M. Franz and David Thau, Biodiversity Informatics. ISSN: 1546-9735. Vol 7 (2010), https://journals.ku.edu/index.php/jbi/article/view/3927/3790
- DuDe: The Duplicate Detection Toolkit, Uwe Draisbach and Felix Naumann, QDB 2010 8th International Workshop on Quality in Databases, in conjunction with VLDB 2010, September 13th, 2010. http://www.dbis.prakinf.tu-ilmenau.de/QDB2010/participants/program/papers/Draisbach.pdf (pdf), http://www.dbis.prakinf.tu-ilmenau.de/QDB2010/participants/program/slides/Draisbach.pdf (slides), http://www.dbis.prakinf.tu-ilmenau.de/QDB2010/participants/program/index.html (workshop paper list)
- Modeling Experts and Novices in Citizen Science Data for Species Distribution Modeling, Jun Yu, Weng-Keen Wong and Rebecca A. Hutchinson, In Proceedings of the IEEE International Conference on Data Mining (ICDM 2010), http://web.engr.oregonstate.edu/~yuju/pubs/Yu.Wong.Hutchinson.pdf
Chirita P., Idreos S., Koubarakis M., Nejdl W.: Designing Semantic Publish/Subscribe Networks Using Super-Peers, In "Semantic Web and Peer-to-Peer", Steffen Staab and Heiner Stuckenschmidt (editors), Springer, 2006.
- Comments from Bob:
- This paper describes formal languages for advertisements, subscriptions, and publications in a super-peer P2P network. Subscriptions are queries in a typed First Order Logic (FOL) based language. Subscriptions, publications, and notifications are defined so as to ensure that all three are network-wise close to the same super-peer. This is not a criterion we need to focus on, because our initial deployments can keep the entire pub/sub mechanism in one central location, since we will not be distributing the annotation store either. Scaling to many thousands of nodes might require revisiting this warehouse approach to the pub/sub mechanism and the annotation store.
- The main obstacle to actually adopting their FOL language is that subscriptions are, roughly, a conjunction of assertions that all have the same subject, together with a set of constraints expressed only on that subject and on the objects of the assertions in the first set (Defn 2). This essentially excludes ternary relations such as the one DwC requires to assign a scientific name to an Occurrence. If O is that occurrence, we can make an assertion that assigns O an Identification, I, but then I must become the subject of the assertion that assigns the scientific name. In general, the Simple Darwin Core XML Schema (CITE) can introduce arbitrary object properties (i.e., relations between classes) between any pair of the eight DwC classes, and this lets providers cascade the structure just described, which is what makes this paper's languages unsuitable for us.
- However, it is likely that its predecessor literature may have ideas for formal Pub/Sub descriptions that will be satisfactory. My guess is that the restrictions of Defn 2 could be relaxed for a warehoused implementation, but I also guess that the usual way of doing business in a semantic pub/sub system probably does just that, and we'll quickly find it in the literature.
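To make the Defn 2 restriction concrete, here is a minimal sketch, assuming a subscription is modeled as a list of (subject, predicate, object) triple patterns with variables marked by a leading "?". This is not the paper's formalism; the function is_single_subject and the linking property dwc:identifiedAs are hypothetical names chosen for illustration (dwc:country and dwc:scientificName are real Darwin Core terms). It shows why the Occurrence to Identification to scientificName chain falls outside a language in which every assertion shares one subject.

```python
from typing import List, Tuple

# A triple pattern: (subject, predicate, object). Strings starting with "?"
# are variables. This representation is an illustrative assumption, not the
# notation used by Chirita et al.
Triple = Tuple[str, str, str]


def is_single_subject(subscription: List[Triple]) -> bool:
    """Return True if every assertion in the conjunction has the same subject,
    roughly the shape described above for Defn 2."""
    subjects = {s for s, _, _ in subscription}
    return len(subjects) <= 1


# Flat subscription: constraints only on the occurrence itself.
flat = [
    ("?o", "rdf:type", "dwc:Occurrence"),
    ("?o", "dwc:country", "Australia"),
]

# Chained subscription: the scientific name hangs off an Identification,
# so a second subject (?i) is unavoidable.
chained = [
    ("?o", "rdf:type", "dwc:Occurrence"),
    ("?o", "dwc:identifiedAs", "?i"),  # hypothetical linking property
    ("?i", "dwc:scientificName", "Puma concolor"),
]

print(is_single_subject(flat))     # True
print(is_single_subject(chained))  # False
```

Each additional object property between DwC classes adds another subject to the pattern, so under this reading only the flattest subscriptions pass the check.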
E. Anceaume, M. Gradinariu, A. K. Datta, G. Simon, and A. Virgillito. 2006. A Semantic Overlay for Self-* Peer-to-Peer Publish/Subscribe. In Proceedings of the 26th IEEE International Conference on Distributed Computing Systems (ICDCS '06). IEEE Computer Society, Washington, DC, USA, p. 22. DOI=10.1109/ICDCS.2006.12
- Comments from --Bob Morris 22:35, 22 September 2011 (EDT)
- Self-* refers to self-organizing, self-configuring, and self-healing P2P networks. Here too, a main goal is to build a Pub/Sub network in which subscriptions reside on closely related peers, so that publications do not require notifications to be distributed widely through the network. Other goals include ensuring that the departure of nodes does not disable a subscription queue (i.e., the network can self-repair).
- Sec 2.2. The subscription language comprises conjunctions of triples AF = (name, Op, c), where name is drawn from a finite but unbounded universe of typed attributes, Op comes from a finite set of binary operators that depend on the type of the named attribute, and c is a constant of a type suitable for the right-hand side of the operator. Example: “occurrenceDate < 2004”. Subscriptions are thus filters on the data in the network, and in general we probably should not distinguish between interest filters in the FP sense and subscriptions. A publication is the insertion of an event into the network. An event is a conjunction of equalities of the form name = value, where name is taken from the attribute universe and value is a value taken by that attribute.
- In discussing when an event (i.e., a publication) matches a subscription, Sec 2.2 seems to suggest that all the operators are partial orders:
- “An event predicate AV matches a subscription predicate AF […] if the attribute names are the same in AV and AF, and the attribute value in AV is in the range defined by AF. An event matches a subscription iff for all the predicates in the subscription, a corresponding matching value appears in the event.”
- If I'm right that the constraints expressed by the operators must all be partial orders, then I suspect this model is again not suitable for us. But again, my guess is that such a restriction has something to do with tractability in a P2P network, and that as a general pub/sub language it could be generalized. And then it may just correspond to SPARQL....
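The matching rule quoted from Sec 2.2 can be sketched as follows. This is a minimal sketch, assuming Python's comparison operators stand in for the paper's finite, type-dependent operator set and that an event is a dict of name = value equalities; the names matches and OPS, and the sample attributes occurrenceDate and country, are illustrative and not taken from the paper.

```python
import operator
from typing import Any, Dict, List, Tuple

# Operator set: an assumption standing in for the paper's finite,
# type-dependent set of binary operators.
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
    ">=": operator.ge,
    ">": operator.gt,
}

# An attribute filter AF = (name, Op, c).
Filter = Tuple[str, str, Any]


def matches(event: Dict[str, Any], subscription: List[Filter]) -> bool:
    """An event matches iff, for every filter in the subscription, the event
    carries a value for that attribute name that satisfies the filter."""
    return all(
        name in event and OPS[op](event[name], constant)
        for name, op, constant in subscription
    )


sub = [("occurrenceDate", "<", 2004), ("country", "=", "Peru")]
print(matches({"occurrenceDate": 1998, "country": "Peru"}, sub))  # True
print(matches({"occurrenceDate": 2008, "country": "Peru"}, sub))  # False
print(matches({"country": "Peru"}, sub))                          # False
```

Each operator in OPS picks out a range over an ordered type, which mirrors the partial-order reading suspected above; a predicate that is not a range (for example, a pattern match on a name) would need a different matching rule.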
Christos Tryfonopoulos, Manolis Koubarakis, and Yannis Drougas. 2004. Filtering algorithms for information retrieval models with named attributes and proximity operators. In Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR '04). ACM, New York, NY, USA, 313-320. DOI=10.1145/1008992.1009047
- Comments from --Bob Morris 22:35, 22 September 2011 (EDT)
- This paper is devoted to filtering algorithms on text documents, as described in the title. As observed in my comments above on Anceaume2006, it is probably useful to equate subscriptions with filters. The paper describes efficient trie-based storage for text documents published into multiple servers, at which clients subscribe by providing filters in a formal language described in the paper. The filters (i.e., subscriptions) are based on conjunctions similar to those described above, but the partial orders in the constraints are explicit word-proximity intervals in the document. For this and the above, my guess is that one could pull off the same thing, at least with suitable graph distances in an AOD-based annotation. Or something.... At the very least, I guess this paper is about information stores with interesting partial orders on them, not really about text.
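A word-proximity constraint of the kind described can be sketched as below, assuming a document is a list of tokens and a filter asks for two words whose distance falls within an interval [lo, hi]. This ignores the paper's trie-based indexing entirely; within_proximity and its parameters are hypothetical names for illustration, not the paper's operators.

```python
from typing import List


def within_proximity(tokens: List[str], w1: str, w2: str, lo: int, hi: int) -> bool:
    """Return True if some occurrence of w2 appears between lo and hi
    positions after some occurrence of w1 (a one-directional interval)."""
    pos1 = [i for i, t in enumerate(tokens) if t == w1]
    pos2 = [i for i, t in enumerate(tokens) if t == w2]
    return any(lo <= j - i <= hi for i in pos1 for j in pos2)


doc = "the puma was photographed near the river at dusk".split()
print(within_proximity(doc, "puma", "river", 0, 5))  # True  (distance 5)
print(within_proximity(doc, "puma", "dusk", 0, 5))   # False (distance 7)
```

Replacing token positions with shortest-path lengths in an annotation graph would give the graph-distance analogue guessed at above, though that remains untested speculation.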
Martin Murth and Eva Kühn. 2009. Knowledge-based coordination with a reliable semantic subscription mechanism. In Proceedings of the 2009 ACM symposium on Applied Computing (SAC '09). ACM, New York, NY, USA, 1374-1380. DOI=10.1145/1529282.1529588
- Comments from --Bob Morris 22:35, 22 September 2011 (EDT)
- This seems even farther from our direction, in that its purpose is to meld pub/sub models with temporal logic, in order to provide semantic subscriptions for systems that conceive of subscriptions as coordination constraints on agent tasks, and of publications as events that trigger those tasks, which must be carried out in a semantically and temporally meaningful way. Setting the temporal aspect aside, it appears to me that our interest may lie in some of the predecessor literature, such as Petrovic et al., G-ToPSS: Fast Filtering of Graph-based Metadata, or Wang et al., An Ontology-Based Publish/Subscribe System. Or should we revisit PubSubHubbub?