The "ontology matching" problem can be described as follows: given two ontologies, each describing a set of discrete entities (which can be classes, properties, rules, predicates, etc.), find the relationships, e.g., equivalence or subsumption, that hold between these entities. Such a set of correspondences is called an alignment. A variety of methods from the literature may be used for this task; most of them perform pair-wise comparison of entities from each of the ontologies and select the most similar pairs.
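This pair-wise scheme can be sketched minimally as follows, assuming entities are given by their labels; the string-based `label_similarity` and the 0.8 threshold are illustrative choices, not those of any specific matcher:

```python
# A minimal sketch of pair-wise matching over entity labels.
from difflib import SequenceMatcher

def label_similarity(a, b):
    """String similarity between two entity labels (one possible basic measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match(entities1, entities2, threshold=0.8):
    """Compare every pair of entities and keep the most similar pairs."""
    alignment = []
    for e1 in entities1:
        # for each entity of the first ontology, pick its best counterpart
        best = max(entities2, key=lambda e2: label_similarity(e1, e2))
        sim = label_similarity(e1, best)
        if sim >= threshold:
            alignment.append((e1, "=", best, sim))
    return alignment

print(match(["Paper", "Author"], ["Article", "Paper", "Writer"]))
```

Real matchers replace the label measure by structural, semantic or instance-based comparisons, but the overall select-the-most-similar-pairs loop remains the same.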
We have introduced original matchers and components of matchers. We have also contributed to the methodology of matcher evaluation [Euzenat 2004i, Euzenat 2011b], a very important activity.
Goal: Our work on ontology matching aims at understanding and improving matching algorithms.
Evaluating ontology matching algorithms requires confronting them with test ontologies and comparing the results. To assess the degree of achievement of current ontology alignment algorithms, we have co-organised, since 2004, a series of evaluation events [Euzenat 2004h, Euzenat 2005d, Euzenat 2006e, Euzenat 2007g, Caraciolo 2008a, Euzenat 2009c, Euzenat 2010c, Euzenat 2011d, Aguirre 2012a, Cuenca Grau 2013a, Dragisic 2014a, Cheatham 2016a, Achichi 2016a] and set up the Ontology Alignment Evaluation Initiative. This effort was supported by the Knowledge Web network of excellence [Euzenat 2004l, Stuckenschmidt 2005a, Shvaiko 2007a] and has received very good participation year after year. All the evaluation results are available online.
Year after year, we have seen the field as a whole evolve towards [Euzenat 2011b]:
We contributed to automating ontology matching evaluation in the framework of the SEALS project. This involves:
We have also provided a modular test generation framework for generating ontology matching tests from different seed ontologies and with different levels of difficulty [Roşoiu 2011a, Euzenat 2013a]. The test generation process can be tailored by composing different alterators on the seed ontology. We showed that we were able to reproduce the OAEI Benchmark both with the original seed ontology and with other ontologies. We also assessed experimentally the properties of these tests.
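The alterator-composition idea can be sketched as follows; the two alterators, the dict-based ontology representation (entity name to label) and the ratios are illustrative assumptions, not the framework's actual operators:

```python
# Sketch of test generation by composing alterators on a seed ontology.
import random

def rename_labels(onto, ratio, rng):
    """Replace a given ratio of entity labels by random strings."""
    out = dict(onto)
    for name in rng.sample(sorted(out), int(ratio * len(out))):
        out[name] = "".join(rng.choice("abcdefgh") for _ in range(8))
    return out

def remove_entities(onto, ratio, rng):
    """Drop a given ratio of entities altogether."""
    out = dict(onto)
    for name in rng.sample(sorted(out), int(ratio * len(out))):
        del out[name]
    return out

def generate_test(seed, alterators, seed_value=42):
    """Apply a pipeline of (alterator, ratio) pairs to a seed ontology."""
    rng = random.Random(seed_value)
    onto = seed
    for alterator, ratio in alterators:
        onto = alterator(onto, ratio, rng)
    return onto

seed = {"Paper": "paper", "Author": "author", "Review": "review", "Topic": "topic"}
# harder tests are obtained by composing more, or more aggressive, alterators
hard = generate_test(seed, [(rename_labels, 0.5), (remove_entities, 0.25)])
```

Varying the ratios and the order of alterators yields test suites of graded difficulty from a single seed, which is the point of the modular design.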
In 2015, we handed over the organisation of OAEI to Ernesto Jiménez Ruiz (University of Oxford).
In order to evaluate ontology matching algorithms, it is necessary to confront them with test ontologies and to compare the results. The most prominent criteria are precision and recall, originating from information retrieval. However, one alignment may be very close to the expected result and another quite remote from it while both have the same precision and recall. This is due to the inability of precision and recall to measure the proximity of the results. To overcome this problem, we have proposed a framework for generalizing precision and recall [Ehrig 2005a].
This framework replaces the intersection of the retrieved and expected sets by some measure of similarity between these sets. This similarity must satisfy some constraints (preserving the results obtained by precision and recall, being bounded by the size of the largest set) that guarantee that classical precision/recall is one of its instances. We have instantiated the framework with three different measures and have shown on a motivating example that the proposed measures overcome the rigidity of classical precision and recall.
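The substitution can be illustrated as follows: the overlap |A ∩ R| is replaced by a similarity omega(A, R) between the computed alignment A and the reference R. The proximity function used here (full credit for an identical correspondence, partial credit for the right entities with the wrong relation) is only one illustrative instance of the framework:

```python
# Sketch of generalised (relaxed) precision and recall.

def proximity(c1, c2):
    """Proximity between two correspondences (1 = identical, 0 = unrelated)."""
    (e1, r1, f1), (e2, r2, f2) = c1, c2
    if (e1, f1) != (e2, f2):
        return 0.0
    return 1.0 if r1 == r2 else 0.5   # partial credit for the wrong relation

def omega(A, R):
    """Similarity between alignments; reduces to |A ∩ R| when proximity is 0/1."""
    return sum(max((proximity(a, r) for r in R), default=0.0) for a in A)

def precision(A, R):
    return omega(A, R) / len(A)

def recall(A, R):
    return omega(A, R) / len(R)

R = [("Paper", "=", "Article"), ("Author", "=", "Writer")]
A = [("Paper", "<", "Article"), ("Author", "=", "Writer")]
# Classical precision would be 0.5; the relaxed measure rewards the near miss.
```

With a strict 0/1 proximity, omega is exactly the size of the intersection, so classical precision and recall are recovered as an instance, as required by the framework's constraints.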
Besides relaxing precision and recall, alignment evaluation should be semantically accurate, e.g., two semantically equivalent alignments should receive the same evaluation. This is not ensured by syntactic measures like precision and recall. To solve this problem, and thanks to the semantics we gave to alignments, we have been able to provide a definition of semantic precision and recall that can be fully seen as degrees of correctness and completeness of the alignments [Euzenat 2007a].
However, these measures can still be gamed. We have further analysed the limits of this semantic evaluation measure. From this study, we proposed two new sets of evaluation measures [David 2008b]. The first is a semantic extension of relaxed precision and recall. The second consists of bounding the alignment space so that ideal semantic precision and recall become applicable.
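A toy illustration of the semantic variant, assuming a minimal "deductive closure" limited to transitivity of subsumption correspondences (the real definition [Euzenat 2007a] relies on the full alignment semantics):

```python
# Toy semantic precision and recall over closed sets of correspondences.

def closure(alignment):
    """Close a set of (e1, '<', e2) correspondences under transitivity."""
    closed = set(alignment)
    changed = True
    while changed:
        changed = False
        for (a, _, b) in list(closed):
            for (c, _, d) in list(closed):
                if b == c and (a, "<", d) not in closed:
                    closed.add((a, "<", d))
                    changed = True
    return closed

def semantic_precision(A, R):
    cA, cR = closure(A), closure(R)
    return len(cA & cR) / len(cA)

def semantic_recall(A, R):
    cA, cR = closure(A), closure(R)
    return len(cA & cR) / len(cR)

R = {("a", "<", "b"), ("b", "<", "c"), ("a", "<", "c")}
A = {("a", "<", "b"), ("b", "<", "c")}
# Syntactic recall of A is only 2/3, yet A entails all of R.
```

Here A and R are semantically equivalent, so the semantic measures both evaluate to 1, whereas syntactic recall penalises A for omitting an entailed correspondence.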
|References on evaluation|
We have participated in the creation of a benchmark for multilingual ontology matching, the MultiFarm dataset [Meilicke 2012a]. This dataset is composed of a set of ontologies translated into different languages and the corresponding alignments between these ontologies. It is based on the OntoFarm dataset, which has been used successfully for several years in the Ontology Alignment Evaluation Initiative. By translating the ontologies of the OntoFarm set into eight different languages -- Chinese, Czech, Dutch, French, German, Portuguese, Russian, and Spanish -- we created a comprehensive set of realistic test cases. This new dataset has been used in the OAEI 2012 campaign.
Finally, in the context of the Cameleon project, we have been working on the creation of multilingual comparable corpora, using as seed a set of multilingual aligned ontologies. These resources will be exploited in the process of populating and enriching ontologies as well as in the process of cross-lingual ontology alignment.
In order to be able to align ontologies written in OWL-Lite, we developed an algorithm (OLA), adapted from a method for measuring object-based similarity [Euzenat 2003h, Euzenat 2004c]. OLA relies on a universal measure for comparing the entities of two ontologies that combines in a homogeneous way all the knowledge used in entity descriptions: it deals successfully with external data types, internal structure of classes as given by their properties and constraints, external structure of classes as given by their relationships to other classes and the availability of individuals. This is an improvement over methods that take advantage of only a subpart of the language features.
The proposed method does not merely compose individual entity similarity methods linearly: it uses an integrated similarity definition that makes them interact during computation. OLA defines the distance between entities of two ontologies as a system of equations that has to be solved in order to extract an alignment.
One-to-many relationships and circularity in entity descriptions constitute the key difficulties in this context. These are dealt with, respectively, through local matching of entity sets and through iterative computation of the recursively dependent similarities, which produces successive approximations of the target solution. In so doing, OLA copes with the unavoidable circularities that occur within ontologies.
These equations are parameterized by a number of weights corresponding to the respective importance of the different components of the ontologies. We introduced a preprocessing step which considers the ontologies to match and evaluates the availability of the corresponding features in order to choose the corresponding weights [Euzenat 2005e]. We also introduced the adaptation of the aggregation function to missing features and to term-based comparison of labels [Euzenat 2004d]. These enhancements have been instrumental in providing good results for OLA at the OAEI-2005 and OAEI-2007 campaigns.
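The fixed-point computation underlying this kind of similarity can be sketched as follows: the similarity of two entities depends on the similarity of their neighbours, so the system is solved iteratively from an initial label-based estimate. The toy ontologies, the single weight and the label similarities are illustrative assumptions, not OLA's actual parameters or aggregation function:

```python
# Sketch of iterative resolution of recursively dependent similarities.

def iterate_similarity(onto1, onto2, label_sim, weight=0.5, steps=20):
    """onto1/onto2 map each entity to the list of its neighbour entities."""
    sim = dict(label_sim)
    for _ in range(steps):
        new = {}
        for a in onto1:
            for b in onto2:
                if onto1[a] and onto2[b]:
                    # structural part: each neighbour of a is matched to its
                    # most similar neighbour of b (local matching)
                    struct = sum(max(sim[x, y] for y in onto2[b])
                                 for x in onto1[a]) / len(onto1[a])
                    new[a, b] = (1 - weight) * label_sim[a, b] + weight * struct
                else:
                    new[a, b] = label_sim[a, b]
        sim = new
    return sim

onto1 = {"Paper": ["Author"], "Author": []}
onto2 = {"Article": ["Writer"], "Writer": []}
label_sim = {("Paper", "Article"): 0.2, ("Paper", "Writer"): 0.0,
             ("Author", "Article"): 0.0, ("Author", "Writer"): 0.6}
sim = iterate_similarity(onto1, onto2, label_sim)
# the structural part lifts sim(Paper, Article) above its label similarity
```

Because each iteration only reads the previous approximation, circular dependencies between entity similarities are unproblematic: the values converge towards the solution of the equation system.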
The algorithm has now been fully implemented as an instantiation of the Alignment API and tested in the Ontology Alignment Evaluations (see above).
This work has been developed in collaboration with the University of Montréal (Petko Valtchev and Jean-François Djouffak-Kenge).
In addition, we developed work on linguistically-grounded similarity between ontology concepts. While similarity only considers subsumption relations to assess how two objects are alike, relatedness takes into account a broader range of relations (e.g., part-of). In particular, we have presented a framework which maps the feature-based model of similarity into the information-theoretic domain. It introduces a new way to compute information content directly from the ontology structure, taking into account the whole set of semantic relations defined in an ontology. The proposed framework makes it possible to rewrite existing similarity measures so that they can be augmented to compute semantic relatedness. On top of this framework, a new measure called FaITH (Feature and Information THeoretic) has been devised. Extensive experimental evaluations confirmed the suitability of the framework [Pirrò 2010b].
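To illustrate the idea of computing information content intrinsically from the ontology rather than from corpus frequencies, here is a Seco-style intrinsic IC based only on subsumption descendants; this is a well-known simpler scheme, not the framework's own computation, which exploits the whole set of semantic relations:

```python
# Illustrative intrinsic information content from ontology structure.
import math

def information_content(descendants, total):
    """IC of a concept from its number of descendants (Seco-style intrinsic IC)."""
    return 1.0 - math.log(descendants + 1) / math.log(total)

# In a toy 100-concept ontology: a leaf concept is maximally informative,
# a concept subsuming half the ontology much less so.
leaf_ic = information_content(0, 100)
broad_ic = information_content(49, 100)
```

Once every concept has an IC, IC-based similarity measures (Resnik, Lin, etc.) can be computed without any external corpus, which is what makes such rewritings attractive.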
Another aspect that has been investigated is how to compute similarity between sentences exploiting not only noun definitions but also other parts of speech, e.g., verbs [Pirrò 2010a]. In this respect, WordNet has been used as a source of linguistic knowledge. Ongoing research concerns how to extend the similarity framework to expressive knowledge representation languages such as description logics.
Context-based matching finds correspondences between entities from two ontologies by relating them to other resources [Euzenat 2013c]. We designed a general view of context-based matching by analysing existing such matchers [Locoro 2014a]. This view is instantiated in a path-driven approach that (a) anchors the ontologies to external ontologies, (b) finds sequences of entities (paths) that relate the entities to match within and across these resources, and (c) uses algebras of relations for combining the relations obtained along these paths. The parameters governing such a system were identified and made explicit.
We conducted experiments with different parameter configurations in order to assess their influence. In particular, the experiments confirm that restricting the set of ontologies reduces the time taken at the expense of recall and F-measure, and that increasing path length within ontologies increases recall and F-measure. In addition, algebras of relations (see below) allow for a finer analysis, which shows that increasing path length provides more correct or non-precise correspondences, but only marginally increases incorrect ones [Locoro 2014a].
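Combining the relations found along a path can be sketched with a composition table; the tiny algebra used here, over {=, <, >, ?} with '?' standing for an unknown relation, is an illustrative toy, not the actual algebra used in the system:

```python
# Sketch of composing relations along a path with an algebra of relations.

COMPOSE = {
    ("=", "="): "=", ("=", "<"): "<", ("=", ">"): ">",
    ("<", "="): "<", ("<", "<"): "<", ("<", ">"): "?",
    (">", "="): ">", (">", "<"): "?", (">", ">"): ">",
}

def relation_along_path(relations):
    """Compose the relations found along a path between two entities."""
    result = "="   # identity is the neutral element of composition
    for r in relations:
        result = COMPOSE.get((result, r), "?")
        if result == "?":
            break   # no information can be recovered from this path
    return result

# a < x, x = y, y < b across intermediate ontologies entails a < b,
# whereas a < x followed by x > b yields no usable relation
print(relation_along_path(["<", "=", "<"]), relation_along_path(["<", ">"]))
```

This is what allows a finer analysis of results than equivalence-only matching: a path may justify a subsumption correspondence, or honestly report that it carries no information.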
This work started within the NeOn project in cooperation with the Open university (Martha Sabou and Mathieu d'Aquin). It is continued in collaboration with Angela Locoro (Università degli studi di Genova).
Ontology patterns are abstractions of typical configurations within ontologies. They can be instantiated in particular ontologies. We have developed correspondence patterns which abstract typical correspondences between ontologies. They can be expressed in an expressive alignment language like EDOAL.
We have shown how correspondence patterns can be used for guiding the matching process [Zamazal 2009a], for normalising ontologies before alignment, or for transforming ontologies. We have introduced the notion of an ontology transformation service. This service is supported by ontology transformation patterns, consisting of corresponding ontology patterns, capturing alternative modelling choices, and an alignment between them. The transformation process consists of two steps: pattern detection and ontology transformation. Pattern detection is based on SPARQL [Scharffe 2009b] and transformation is based on an ontology alignment representation with specific detailed information about the transformation [Zamazal 2009b].
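The detection step can be illustrated by matching a basic graph pattern against an ontology seen as triples, which is essentially what the SPARQL-based detection does; the matcher below, the pattern (subclasses of a class carrying a property) and the data are all illustrative:

```python
# Toy basic-graph-pattern matching, mimicking SPARQL-based pattern detection.

def find_bindings(triples, pattern):
    """Match a graph pattern (variables start with '?') against triples."""
    bindings = [{}]
    for (ps, pp, po) in pattern:
        new = []
        for b in bindings:
            for (s, p, o) in triples:
                candidate = dict(b)
                ok = True
                for (x, v) in ((ps, s), (pp, p), (po, o)):
                    if x.startswith("?"):
                        if candidate.setdefault(x, v) != v:
                            ok = False   # variable already bound differently
                    elif x != v:
                        ok = False       # constant does not match
                if ok:
                    new.append(candidate)
        bindings = new
    return bindings

triples = [("Accepted", "subClassOf", "Paper"),
           ("Rejected", "subClassOf", "Paper"),
           ("Paper", "hasProperty", "decision")]
pattern = [("?sub", "subClassOf", "?class"),
           ("?class", "hasProperty", "?prop")]
matches = find_bindings(triples, pattern)
```

Each binding returned is one occurrence of the pattern in the ontology; the transformation step then rewrites the matched fragment according to the alternative modelling choice.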
We also plan to apply such ontology matching techniques to data interlinking.
|References on knowledge patterns|
Measuring similarity or distances between ontologies is important in various tasks with different purposes. In particular, it is useful to know quickly whether two ontologies are close or remote before deciding to match them. To that end, a distance between ontologies must be efficiently computable.
We have distinguished two kinds of ontology measures: ontology space measures which are strictly based on the comparison of the content of ontologies and alignment space measures which are based on how ontologies are related through alignments.
We have studied the constraints applying to ontology space measures [Euzenat 2008b] and reviewed several possible ontology distances. We then evaluated some of them experimentally [David 2008a], carrying out experiments with 12 measures in the ontology space against 111 ontologies. This allowed us to identify a triple-based distance of our own, associated with a minimum-weight maximal graph matching, as the most accurate measure, and measures based on the vector space model of information retrieval as the most efficient ones.
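A measure of the efficient, vector-space kind can be sketched as follows: each ontology is reduced to a bag of label tokens and the distance is the cosine distance between the two frequency vectors. The tokenisation and the toy data are illustrative assumptions:

```python
# Sketch of a vector-space-model distance between ontologies.
import math
from collections import Counter

def cosine_distance(labels1, labels2):
    """1 - cosine similarity between token-frequency vectors of two ontologies."""
    v1 = Counter(t for l in labels1 for t in l.lower().split())
    v2 = Counter(t for l in labels2 for t in l.lower().split())
    dot = sum(v1[t] * v2[t] for t in v1)
    norm = math.sqrt(sum(c * c for c in v1.values())) * \
           math.sqrt(sum(c * c for c in v2.values()))
    return 1.0 - dot / norm if norm else 1.0

d = cosine_distance(["accepted paper", "paper author"],
                    ["journal paper", "paper reviewer"])
```

Such measures ignore the graph structure entirely, which is precisely why they are cheap to compute and hence usable for quickly screening which ontology pairs are worth matching.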
We have introduced two sets of alignment space measures relying either on the existence of paths between ontologies (path-based measures) or on the ontology entities that are preserved by the alignments (preservation-based measures). This reflects the possibility of performing actions such as instance import or query translation. Our experiments have shown that preservation-based measures are correlated with the best ontology space measures. Moreover, they degrade linearly with the alteration of alignments, witnessing their robustness [Euzenat 2009b, David 2010b].
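One simple way to flesh out the preservation-based idea: the smaller the fraction of entities covered by the alignment between two ontologies, the more distant they are. This coverage-based formulation is an illustrative sketch, not the published definition:

```python
# Sketch of a preservation-based distance in the alignment space.

def preservation_distance(entities1, entities2, alignment):
    """Distance based on the proportion of entities covered by the alignment."""
    matched1 = {e1 for (e1, _, e2) in alignment if e1 in entities1}
    matched2 = {e2 for (e1, _, e2) in alignment if e2 in entities2}
    coverage = (len(matched1) + len(matched2)) / (len(entities1) + len(entities2))
    return 1.0 - coverage

d = preservation_distance({"Paper", "Author", "Review"},
                          {"Article", "Writer"},
                          [("Paper", "=", "Article")])
```

Unlike ontology space measures, this requires no access to the ontologies' content beyond their entity sets, only to the alignments relating them.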
All these measures have been implemented in the OntoSim library, which has been used in our experiments.
Part of this work is developed in collaboration with the University of Yeungnam (Jason Jung), the Prague University of Economics (Ondrej Zamazal) and the Open University (Mathieu d'Aquin).
|References on ontology distances and similarities|
|< Ontology alignment||Index||References on matching and alignments||Data interlinking >|