Bibliography on ANR-Lindicle (2017-06-06)
Mustafa Al-Bakri, Manuel Atencia, Jérôme David, Steffen Lalande, Marie-Christine Rousset, Uncertainty-sensitive reasoning for inferring sameAs facts in linked data, in: Gal Kaminka, Maria Fox, Paolo Bouquet, Eyke Hüllermeier, Virginia Dignum, Frank Dignum, Frank van Harmelen (eds), Proc. 22nd european conference on artificial intelligence (ECAI), Den Haag (NL), pp698-706, 2016
Discovering whether or not two URIs described in Linked Data -- in the same or different RDF datasets -- refer to the same real-world entity is crucial for building applications that exploit the cross-referencing of open data. A major challenge in data interlinking is to design tools that effectively deal with incomplete and noisy data, and exploit uncertain knowledge. In this paper, we model data interlinking as a reasoning problem with uncertainty. We introduce a probabilistic framework for modelling and reasoning over uncertain RDF facts and rules that is based on the semantics of probabilistic Datalog. We have designed an algorithm, ProbFR, based on this framework. Experiments on real-world datasets have shown the usefulness and effectiveness of our approach for data linkage and disambiguation.
Jérôme Euzenat, Extraction de clés de liage de données (résumé étendu), in: Actes 16e conférence internationale francophone sur extraction et gestion des connaissances (EGC), Reims (FR), (Bruno Crémilleux, Cyril de Runz (éds), Actes 16e conférence internationale francophone sur extraction et gestion des connaissances (EGC), Revue des nouvelles technologies de l'information E30, 2016), pp9-12, 2016
Large amounts of data are published on the web of data. Linking them consists of identifying the same resources in two data sets, which enables the joint exploitation of the published data. But extracting links is not an easy task. We have developed an approach that extracts link keys. Link keys extend the notion of a key from relational algebra to several data sources. They are based on sets of pairs of properties that identify objects when they have the same values, or common values, for these properties. We present a way to automatically extract candidate link keys from data. This operation can be expressed in formal concept analysis. The quality of candidate keys can be evaluated depending on whether a sample of links is available (supervised case) or not (unsupervised case). The relevance and robustness of such keys are illustrated on a real example.
Armen Inants, Qualitative calculi with heterogeneous universes, Computer science thesis, Université de Grenoble, Grenoble (FR), April 2016
Qualitative representation and reasoning operate with non-numerical relations holding between objects of some universe. The general formalisms developed in this field are based on various kinds of algebras of relations, such as Tarskian relation algebras. All these formalisms, which are called qualitative calculi, share an implicit assumption that the universe is homogeneous, i.e., consists of objects of the same kind. However, objects of different kinds may also entertain relations. The state of the art of qualitative reasoning does not offer a general combination operation of qualitative calculi for different kinds of objects into a single calculus. Many applications discriminate between different kinds of objects. For example, some spatial models discriminate between regions, lines and points, and different relations are used for each kind of object. In ontology matching, qualitative calculi were shown useful for expressing alignments between only one kind of entity, such as concepts or individuals. However, relations between individuals and concepts, which impose additional constraints, are not exploited. This dissertation introduces modularity in qualitative calculi and provides a methodology for modeling qualitative calculi with heterogeneous universes. Our central contribution is a framework based on a special class of partition schemes which we call modular. For a qualitative calculus generated by a modular partition scheme, we define a structure that associates each relation symbol with an abstract domain and codomain from a Boolean lattice of sorts. A module of such a qualitative calculus is a sub-calculus restricted to a given sort, which is obtained through an operation called relativization to a sort. Of greater practical interest is the opposite operation, which allows for combining several qualitative calculi into a single calculus.
We define an operation called combination modulo glue, which combines two or more qualitative calculi over different universes, provided some glue relations between these universes. The framework is general enough to support most known qualitative spatio-temporal calculi.
Qualitative calculus, Schröder category, Relation algebra, Ontology alignment
Armen Inants, Manuel Atencia, Jérôme Euzenat, Algebraic calculi for weighted ontology alignments, in: Proc. 15th conference on International semantic web conference (ISWC), Kobe (JP), (Paul Groth, Elena Simperl, Alasdair Gray, Marta Sabou, Markus Krötzsch, Freddy Lécué, Fabian Flöck, Yolanda Gil (eds), The Semantic Web - ISWC 2016, Lecture notes in computer science 9981, 2016), pp360-375, 2016
Alignments between ontologies usually come with numerical attributes expressing the confidence of each correspondence. Semantics supporting such confidences must generalise the semantics of alignments without confidence. There exists a semantics which satisfies this but introduces a discontinuity between weighted and non-weighted interpretations. Moreover, it does not provide a calculus for reasoning with weighted ontology alignments. This paper introduces a calculus for such alignments. It is given by an infinite relation-type algebra, the elements of which are weighted taxonomic relations. In addition, it approximates the non-weighted case in a continuous manner.
Weighted ontology alignment, Algebraic reasoning, Qualitative calculi
Tatiana Lesnikova, Jérôme David, Jérôme Euzenat, Cross-lingual RDF thesauri interlinking, in: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis (eds), Proc. 10th international conference on Language resources and evaluation (LREC), Portoroz (SI), pp2442-2449, 2016
Various lexical resources are being published in RDF. To enhance the usability of these resources, identical resources in different data sets should be linked. If lexical resources are described in different natural languages, then techniques to deal with multilinguality are required for interlinking. In this paper, we evaluate machine translation for interlinking concepts, i.e., generic entities named with a common noun or term. In our previous work, the evaluated method was applied to named entities. We conduct two experiments involving different thesauri in different languages. The first experiment involves concepts from the TheSoz multilingual thesaurus in three languages: English, French and German. The second experiment involves concepts from the EuroVoc and AGROVOC thesauri in English and Chinese respectively. Our results demonstrate that machine translation can be beneficial for cross-lingual thesauri interlinking independently of the dataset structure.
Cross-lingual data interlinking, owl:sameAs, Thesaurus alignment
Tatiana Lesnikova, RDF data interlinking: evaluation of cross-lingual methods, Computer science thesis, Université de Grenoble, Grenoble (FR), May 2016
The Semantic Web extends the Web by publishing structured and interlinked data using RDF. An RDF data set is a graph where resources are nodes labelled in natural languages. One of the key challenges of linked data is to be able to discover links across RDF data sets. Given two data sets, equivalent resources should be identified and linked by owl:sameAs links. This problem is particularly difficult when resources are described in different natural languages. This thesis investigates the effectiveness of linguistic resources for interlinking RDF data sets. For this purpose, we introduce a general framework in which each RDF resource is represented as a virtual document containing text information of neighboring nodes. The context of a resource consists of the labels of the neighboring nodes. Once virtual documents are created, they are projected in the same space in order to be compared. This can be achieved by using machine translation or multilingual lexical resources. Once documents are in the same space, similarity measures to find identical resources are applied. Similarity between elements of this space is taken for similarity between RDF resources. We evaluated cross-lingual techniques within the proposed framework. We experimentally evaluate different methods for linking RDF data. In particular, two strategies are explored: applying machine translation or using references to multilingual resources. Overall, evaluation shows the effect of cross-lingual string-based approaches for linking RDF resources expressed in different languages. The methods have been evaluated on resources in English, Chinese, French and German. The best performance (over 0.90 F-measure) was obtained by the machine translation approach. This shows that the similarity-based method can be successfully applied to RDF resources independently of their type (named entities or thesauri concepts).
The best experimental results involving just a pair of languages demonstrated the usefulness of such techniques for interlinking RDF resources cross-lingually.
Semantic web, Cross-lingual data treatment, Artificial intelligence
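The virtual-document idea in the abstract above can be illustrated with a minimal sketch. It is not the thesis implementation: the triple format, the `rdfs:label` property name, the `ex:` identifiers and the restriction to outgoing neighbours at distance one are all assumptions made for the example.

```python
from collections import defaultdict

LABEL = "rdfs:label"  # assumed label property for this sketch


def virtual_documents(triples):
    """Build one text document per resource from its own labels and the
    labels of its directly connected (outgoing) neighbours."""
    labels = defaultdict(list)      # resource -> list of label strings
    neighbours = defaultdict(set)   # resource -> set of neighbouring resources
    for subject, predicate, obj in triples:
        if predicate == LABEL:
            labels[subject].append(obj)
        else:
            neighbours[subject].add(obj)

    docs = {}
    for resource in set(labels) | set(neighbours):
        parts = list(labels.get(resource, []))
        for neighbour in sorted(neighbours.get(resource, ())):
            parts.extend(labels.get(neighbour, []))
        docs[resource] = " ".join(parts)
    return docs


# Hypothetical toy graph
triples = [
    ("ex:Paris", "rdfs:label", "Paris"),
    ("ex:Paris", "ex:capitalOf", "ex:France"),
    ("ex:France", "rdfs:label", "France"),
]
docs = virtual_documents(triples)
```

In this toy graph the document for `ex:Paris` gathers its own label and the label of `ex:France`. Translating the documents or mapping them to multilingual identifiers, as the abstract describes, would be a separate step.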
Adam Sanchez, Tatiana Lesnikova, Jérôme David, Jérôme Euzenat, Instance-level matching, Deliverable 3.2, Lindicle, 20p., September 2016
This paper describes precisely an ontology matching technique based on the extensional definition of a class as a set of instances. It first provides a general characterisation of such techniques and, in particular, the need to rely on links across data sets in order to compare instances. We then detail the implication intensity measure that has been chosen. The resulting algorithm is implemented and evaluated on XLore, DBpedia, LinkedGeoData and Geospecies.
Instance-based matching, Ontology alignments
Jérôme David, Jérôme Euzenat, Manuel Atencia, Language-independent link key-based data interlinking, Deliverable 4.1, Lindicle, 21p., March 2015
Links are important for the publication of RDF data on the web. Yet, establishing links between data sets is not an easy task. We develop an approach for that purpose which extracts weak link keys. Link keys extend the notion of a key to the case of different data sets. They are made of a set of pairs of properties belonging to two different classes. A weak link key holds between two classes if any resources having common values for all of these properties are the same resources. An algorithm is proposed to generate a small set of candidate link keys. Depending on whether some links, valid or invalid, are known, we define supervised and unsupervised measures for selecting the appropriate link keys. The supervised measures approximate precision and recall, while the unsupervised measures are the ratio of pairs of entities a link key covers (coverage), and the ratio of entities from the same data set it identifies (discrimination). We have experimented with these techniques on two data sets, showing the accuracy and robustness of both approaches.
data interlinking, linked data, link key, candidate link key, coverage, dissimilarity
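As a rough illustration of the weak link key condition stated in the abstract above (resources are linked when they share a value for every property pair), here is a minimal sketch. The data layout, the property names and the exact formulas used for `coverage` and `discrimination` are assumptions made for this example; the abstract only describes these two measures informally, and the deliverable should be consulted for the actual definitions.

```python
def generate_links(data1, data2, link_key):
    """Link (a, b) when a and b share at least one value for every
    property pair (p, q) of the candidate link key."""
    links = set()
    for a, props_a in data1.items():
        for b, props_b in data2.items():
            if all(props_a.get(p, set()) & props_b.get(q, set())
                   for p, q in link_key):
                links.add((a, b))
    return links


def coverage(links, data1, data2):
    """Assumed reading: share of all resources that occur in some link."""
    linked_left = {a for a, _ in links}
    linked_right = {b for _, b in links}
    return (len(linked_left) + len(linked_right)) / (len(data1) + len(data2))


def discrimination(links):
    """Assumed reading: 1.0 when the links are one-to-one, lower otherwise."""
    if not links:
        return 0.0
    linked_left = {a for a, _ in links}
    linked_right = {b for _, b in links}
    return min(len(linked_left), len(linked_right)) / len(links)


# Hypothetical toy data sets: resource -> property -> set of values
data1 = {"a1": {"name": {"Ann"}, "city": {"Lyon"}},
         "a2": {"name": {"Bob"}, "city": {"Paris"}}}
data2 = {"b1": {"label": {"Ann"}, "town": {"Lyon"}},
         "b2": {"label": {"Bob"}, "town": {"Nice"}}}

strict_links = generate_links(data1, data2, [("name", "label"), ("city", "town")])
loose_links = generate_links(data1, data2, [("name", "label")])
```

On this toy data the two-pair key links only `a1` with `b1`, while the single-pair key links both persons, which is why it covers more resources.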
Jérôme Euzenat, Jérôme David, Angela Locoro, Armen Inants, Context-based ontology matching and data interlinking, Deliverable 3.1, Lindicle, 21p., July 2015
Context-based matching finds correspondences between entities from two ontologies by relating them to other resources. A general view of context-based matching is designed by analysing existing such matchers. This view is instantiated in a path-driven approach that (a) anchors the ontologies to external ontologies, (b) finds sequences of entities (paths) that relate entities to match within and across these resources, and (c) uses algebras of relations for combining the relations obtained along these paths. Parameters governing such a system are identified and made explicit. We discuss the extension of this approach to data interlinking and its benefit to cross-lingual data interlinking. First, this extension would require a hybrid algebra of relations that combines relations between individuals and classes. However, such an algebra may not be particularly useful in practice, as only in a few restricted cases could it conclude that two individuals are the same. But it can be used for finding mistakes in link sets.
Context-based data interlinking, Multilingual data interlinking, Context-based ontology matching, Algebras of relations, Semantic web
Armen Inants, Jérôme Euzenat, An algebra of qualitative taxonomical relations for ontology alignments, in: Proc. 14th conference on International semantic web conference (ISWC), Bethlehem (PA US), (Marcelo Arenas, Óscar Corcho, Elena Simperl, Markus Strohmaier, Mathieu d'Aquin, Kavitha Srinivas, Paul Groth, Michel Dumontier, Jeff Heflin, Krishnaprasad Thirunarayan, Steffen Staab (eds), The Semantic Web - ISWC 2015. 14th International Semantic Web Conference, Bethlehem, Pennsylvania, United States, October 11-15, 2015, Lecture notes in computer science 9366, 2015), pp253-268, 2015
Algebras of relations were shown useful in managing ontology alignments. They make it possible to aggregate alignments disjunctively or conjunctively and to propagate alignments within a network of ontologies. The previously considered algebra of relations contains taxonomical relations between classes. However, compositional inference using this algebra is sound only if we assume that classes which occur in alignments have nonempty extensions. Moreover, this algebra covers relations only between classes. Here we introduce a new algebra of relations, which, first, solves the limitation of the previous one, and second, incorporates all qualitative taxonomical relations that occur between individuals and concepts, including the relations "is a" and "is not". We prove that this algebra is coherent with respect to the simple semantics of alignments.
Relation algebra, Ontology alignment, Network of ontologies
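The remark in the abstract above about compositional inference and nonempty extensions can be made concrete with a small sketch. It does not implement the algebra introduced in the paper. It only derives, by brute-force enumeration over non-empty subsets of a tiny universe, a composition table for five class-level taxonomical relations. The relation names, the universe size of 4 and the assumption that this size is enough to realise every configuration are choices made for this example.

```python
from itertools import combinations


def nonempty_subsets(universe):
    """All non-empty subsets of the universe, as frozensets."""
    items = sorted(universe)
    return [frozenset(c)
            for n in range(1, len(items) + 1)
            for c in combinations(items, n)]


def relation(a, b):
    """Base taxonomical relation holding between two non-empty sets."""
    if a == b:
        return "="
    if a < b:
        return "<"          # a strictly included in b
    if a > b:
        return ">"          # a strictly includes b
    if a & b:
        return "overlap"
    return "disjoint"


def composition_table(universe_size=4):
    """Map (r, s) to the set of relations t such that A r B, B s C and A t C
    hold for some non-empty A, B, C."""
    sets = nonempty_subsets(range(universe_size))
    table = {}
    for a in sets:
        for b in sets:
            r = relation(a, b)
            for c in sets:
                s = relation(b, c)
                table.setdefault((r, s), set()).add(relation(a, c))
    return table


TABLE = composition_table()
```

For instance, `TABLE[(">", "<")]` excludes `"disjoint"`: if A includes B and B is included in C, then A and C share the elements of B. That conclusion relies on B being non-empty, which is the assumption the abstract points out.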
Tatiana Lesnikova, Jérôme David, Jérôme Euzenat, Algorithms for cross-lingual data interlinking, Deliverable 4.2, Lindicle, 31p., June 2015
Linked data technologies make it possible to publish and link structured data on the Web. Although RDF is not about text, many RDF data providers publish their data in their own language. Cross-lingual interlinking consists of discovering links between identical resources across data sets in different languages. In this report, we present a general framework for interlinking resources in different languages based on associating a specific representation to each resource and computing a similarity between these representations. We describe and evaluate three methods using this approach: the first two methods are based on gathering virtual documents and translating them, and the third one represents them as bags of identifiers from a multilingual resource (BabelNet).
data interlinking, cross-lingual link discovery, owl:sameAs
Tatiana Lesnikova, Jérôme David, Jérôme Euzenat, Interlinking English and Chinese RDF data using BabelNet, in: Pierre Genevès, Christine Vanoirbeek (eds), Proc. 15th ACM international symposium on Document engineering (DocEng), Lausanne (CH), pp39-42, 2015
Linked data technologies make it possible to publish and link structured data on the Web. Although RDF is not about text, many RDF data providers publish their data in their own language. Cross-lingual interlinking aims at discovering links between identical resources across knowledge bases in different languages. In this paper, we present a method for interlinking RDF resources described in English and Chinese using the BabelNet multilingual lexicon. Resources are represented as vectors of identifiers and then similarity between these resources is computed. The method achieves an F-measure of 88%. The results are also compared to a translation-based method.
Cross-lingual instance linking, Cross-lingual link discovery, owl:sameAs
Manuel Atencia, Jérôme David, Jérôme Euzenat, Data interlinking through robust linkkey extraction, in: Torsten Schaub, Gerhard Friedrich, Barry O'Sullivan (eds), Proc. 21st european conference on artificial intelligence (ECAI), Praha (CZ), pp15-20, 2014
Links are important for the publication of RDF data on the web. Yet, establishing links between data sets is not an easy task. We develop an approach for that purpose which extracts weak linkkeys. Linkkeys extend the notion of a key to the case of different data sets. They are made of a set of pairs of properties belonging to two different classes. A weak linkkey holds between two classes if any resources having common values for all of these properties are the same resources. An algorithm is proposed to generate a small set of candidate linkkeys. Depending on whether some links, valid or invalid, are known, we define supervised and unsupervised measures for selecting the appropriate linkkeys. The supervised measures approximate precision and recall, while the unsupervised measures are the ratio of pairs of entities a linkkey covers (coverage), and the ratio of entities from the same data set it identifies (discrimination). We have experimented with these techniques on two data sets, showing the accuracy and robustness of both approaches.
Manuel Atencia, Michel Chein, Madalina Croitoru, Jérôme David, Michel Leclère, Nathalie Pernelle, Fatiha Saïs, François Scharffe, Danai Symeonidou, Defining key semantics for the RDF datasets: experiments and evaluations, in: Proc. 21st conference on International Conference on Conceptual Structures (ICCS), Iasi (RO), (Graph-Based Representation and Reasoning (Proc. 21st conference on International Conference on Conceptual Structures (ICCS)), Lecture notes in artificial intelligence 8577, 2014), pp65-78, 2014
Many techniques were recently proposed to automate the linkage of RDF datasets. Predicate selection is the step of the linkage process that consists in selecting the smallest set of relevant predicates needed to enable instance comparison. We call keys this set of predicates that is analogous to the notion of keys in relational databases. We explain formally the different assumptions behind two existing key semantics. We then evaluate experimentally the keys by studying how discovered keys could help dataset interlinking or cleaning. We discuss the experimental results and show that the two different semantics lead to comparable results on the studied datasets.
semantics of a key, data interlinking
Manuel Atencia, Jérôme David, Jérôme Euzenat, What can FCA do for database linkkey extraction?, in: Proc. 3rd ECAI workshop on What can FCA do for Artificial Intelligence? (FCA4AI), Praha (CZ), pp85-92, 2014
Links between heterogeneous data sets may be found by using a generalisation of keys in databases, called linkkeys, which apply across data sets. This paper considers the question of characterising such keys in terms of formal concept analysis. This question is natural because the space of candidate keys is an ordered structure obtained by reduction of the space of keys and that of data set partitions. Classical techniques for generating functional dependencies in formal concept analysis indeed apply for finding candidate keys. They can be adapted in order to find database candidate linkkeys. The question of their extensibility to the RDF context would be worth investigating.
Tatiana Lesnikova, Jérôme David, Jérôme Euzenat, Interlinking English and Chinese RDF data sets using machine translation, in: Johanna Völker, Heiko Paulheim, Jens Lehmann, Harald Sack, Vojtech Svátek (eds), Proc. 3rd ESWC workshop on Knowledge discovery and data mining meets linked open data (Know@LOD), Hersounisos (GR), 2014
Data interlinking is a difficult task, particularly in a multilingual environment like the Web. In this paper, we evaluate the suitability of a Machine Translation approach to interlink RDF resources described in English and Chinese. We represent resources as text documents, and a similarity between documents is taken for similarity between resources. Documents are represented as vectors using two weighting schemes, then cosine similarity is computed. The experiment demonstrates that TF*IDF with a minimum amount of preprocessing steps can yield good results.
Semantic web, Cross-lingual link discovery, Cross-lingual instance linking, owl:sameAs
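The vector comparison mentioned in the abstract above (TF*IDF weighting followed by cosine similarity) can be sketched as follows. This is an illustrative sketch, not the authors' pipeline: whitespace tokenisation, the logarithmic IDF variant and the toy documents are assumptions, and the documents are assumed to have already been translated into one language.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Turn {name: text} into {name: {term: tf * idf}} sparse vectors."""
    tokenized = {name: text.lower().split() for name, text in documents.items()}
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = {}
    for name, tokens in tokenized.items():
        term_freq = Counter(tokens)
        vectors[name] = {term: count * math.log(n_docs / doc_freq[term])
                         for term, count in term_freq.items()}
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical documents, assumed to be already translated into English
documents = {
    "en1": "great wall china",
    "zh1": "great wall beijing",
    "en2": "eiffel tower paris",
}
vectors = tfidf_vectors(documents)
```

With these toy documents, `en1` is closer to `zh1` than to `en2`, with which it shares no term. A candidate link would then be the pair with the highest similarity.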
Tatiana Lesnikova, Interlinking RDF data in different languages, in: Christophe Roche, Rute Costa, Eva Coudyzer (eds), Proc. 4th workshop on Terminology and Ontology: Theories and applications (TOTh), Bruxelles (BE), 2014
Semantic web, Cross-lingual resource discovery, Multi-lingual instance matching, owl:sameAs
Tatiana Lesnikova, Interlinking cross-lingual RDF data sets, in: Proc. conference on ESWC PhD symposium, Montpellier (FR), (Philipp Cimiano, Óscar Corcho, Valentina Presutti, Laura Hollink, Sebastian Rudolph (eds), The semantic web: research and applications (Proc. 10th conference on European semantic web conference (ESWC)), Lecture notes in computer science 7882, 2012), pp671-675, 2013
Linked Open Data is an essential part of the Semantic Web. More and more data sets are published in natural languages comprising not only English but other languages as well. It becomes necessary to link the same entities distributed across different RDF data sets. This paper is an initial outline of the research to be conducted on cross-lingual RDF data set interlinking, and it presents several ideas on how to approach this problem.
Multilingual Mappings, Cross-Lingual Link Discovery, Cross-Lingual RDF Data Set Linkage
Tatiana Lesnikova, NLP for interlinking multilingual LOD, in: Proc. conference on ISWC Doctoral consortium, Sydney (NSW AU), (Lora Aroyo, Natalya Noy (eds), Proceedings of the ISWC Doctoral Consortium (Proc. conference on ISWC Doctoral Consortium), Sydney (NSW AU), 2013), pp32-39, 2013
Nowadays, there are many natural languages on the Web, and we can expect that they will stay there even with the development of the Semantic Web. Though the RDF model enables structuring information in a unified way, the resources can be described using different natural languages. To find information about the same resource across different languages, we need to link identical resources together. In this paper we present an instance-based approach for resource interlinking. We also show how a graph matching problem can be converted into a document matching problem for discovering cross-lingual mappings across RDF data sets.
Multilingual Mappings, Cross-Lingual Link Discovery, Cross-Lingual RDF Data Set Linkage