Manuel Atencia, Jérôme David, François Scharffe, Keys and pseudo-keys detection for web datasets cleansing and interlinking, in: Proc. 18th international conference on knowledge engineering and knowledge management (EKAW), Galway (IE), (Annette ten Teije, Johanna Voelker, Siegfried Handschuh, Heiner Stuckenschmidt, Mathieu d'Aquin, Andriy Nikolov, Nathalie Aussenac-Gilles, Nathalie Hernandez (eds), Knowledge engineering and knowledge management, Lecture notes in computer science 7603, 2012), pp144-153, 2012
This paper introduces a method for analyzing web datasets based on key dependencies. The classical notion of a key in relational databases is adapted to RDF datasets. To better handle web data of variable quality, the notion of a pseudo-key, a key that tolerates some exceptions, is defined. An RDF vocabulary for representing keys is also provided. An algorithm to discover keys and pseudo-keys is described. Experimental results show that the runtime of the algorithm remains reasonable even for a large dataset such as DBpedia. Two applications are further discussed: (i) detection of errors in RDF datasets, and (ii) dataset interlinking.
Data Interlinking, Semantic Web, RDF Data Cleaning
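
The notions of key and pseudo-key mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a simplified reading in which a set of properties is a key when no two subjects carry identical value sets for all of those properties, and a pseudo-key when the share of subjects violating that condition stays under a threshold. The function names, the `threshold` parameter, the restriction to subjects that define every candidate property, and the `ex:` sample data are all hypothetical.

```python
from collections import defaultdict


def group_by_signature(triples, properties):
    """Group subjects by their value sets on the candidate properties.

    Assumption: only subjects that define every candidate property are considered.
    """
    values = defaultdict(lambda: defaultdict(set))
    for subject, prop, obj in triples:
        values[subject][prop].add(obj)

    groups = defaultdict(list)
    for subject, props in values.items():
        if all(p in props for p in properties):
            signature = tuple(frozenset(props[p]) for p in properties)
            groups[signature].append(subject)
    return groups


def exception_ratio(triples, properties):
    """Fraction of considered subjects that share their signature with another subject."""
    groups = group_by_signature(triples, properties)
    total = sum(len(g) for g in groups.values())
    if total == 0:
        return 0.0
    clashing = sum(len(g) for g in groups.values() if len(g) > 1)
    return clashing / total


def is_key(triples, properties):
    """True when no two subjects agree on all candidate properties."""
    return exception_ratio(triples, properties) == 0.0


def is_pseudo_key(triples, properties, threshold=0.1):
    """True when the exception ratio does not exceed the (hypothetical) threshold."""
    return exception_ratio(triples, properties) <= threshold


# Hypothetical sample data: (subject, property, object) triples.
TRIPLES = [
    ("ex:a", "ex:name", "Ann"), ("ex:a", "ex:born", "1980"),
    ("ex:b", "ex:name", "Bob"), ("ex:b", "ex:born", "1980"),
    ("ex:c", "ex:name", "Ann"), ("ex:c", "ex:born", "1990"),
]

print(is_key(TRIPLES, ["ex:name"]))             # ex:a and ex:c share a name
print(is_key(TRIPLES, ["ex:name", "ex:born"]))  # the pair distinguishes all subjects
```

Under this sketch, the subjects that end up in the same group are the natural candidates to inspect for the two applications named in the abstract: they may reveal erroneous values, or they may be duplicates that should be linked.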