MOSIG Master 2ND YEAR Research
YEAR 2010/2011

MASTER TOPIC PROPOSAL

ADVISOR: Jérôme Euzenat and Cássia Trojahn dos Santos

TEL: 476 61 53 66 and 476 61 53 52 66

EMAIL: Jerome:Euzenat#inrialpes:fr and Cassia:Trojahn#inrialpes:fr

TEAM AND LAB: Exmo team, INRIA & LIG

MASTER PROFILE: Artificial intelligence and the web

TITLE:

Ontology alignment hardness and test generation

Reference number: Proposal n°772

The goal of the semantic web is to take advantage of formalised knowledge (in languages like RDF) at the scale of the worldwide web. In particular, it is based on ontologies which define concepts used for representing knowledge on the web, e.g., for annotating a picture, specifying a web service interface or expressing the relation between two persons. However, it is likely that different information sources and different actors in different contexts will use different ontologies. It is thus necessary to find correspondences between concepts of these ontologies.

The operation of finding correspondences is called ontology matching [1]. It takes two ontologies as input and outputs a set of correspondences between entities of these ontologies. A correspondence is defined by the two related entities, which can be classes, instances, properties, formulas as well as combinations of those, the relation between these entities (equivalence, subsumption, incompatibility, etc.) and, if possible, a confidence measure in this correspondence. The resulting set of correspondences, called an alignment, is used for importing data from one ontology to another or for translating queries.
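
As an illustration, a correspondence as described above could be represented by the following minimal sketch. The class and field names, the relation symbols, the example URIs and the choice of a confidence in [0, 1] are assumptions made for this example, not the format of any particular tool.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Relation(Enum):
    """Relations named in the text; the symbols are illustrative only."""
    EQUIVALENCE = "="
    SUBSUMES = ">"
    SUBSUMED_BY = "<"
    INCOMPATIBLE = "%"


@dataclass(frozen=True)
class Correspondence:
    """Two related entities, their relation and an optional confidence."""
    entity1: str                        # identifier of an entity in the first ontology
    entity2: str                        # identifier of an entity in the second ontology
    relation: Relation = Relation.EQUIVALENCE
    confidence: Optional[float] = None  # assumed to lie in [0, 1] when given

    def __post_init__(self):
        if self.confidence is not None and not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")


# An alignment is a set of correspondences (hypothetical entity URIs).
alignment = {
    Correspondence("http://example.org/onto1#Book",
                   "http://example.org/onto2#Volume",
                   Relation.EQUIVALENCE, 0.9),
    Correspondence("http://example.org/onto1#Author",
                   "http://example.org/onto2#Person",
                   Relation.SUBSUMED_BY),
}
```

Making the class immutable lets correspondences be stored in a set, which matches the view of an alignment as a set of correspondences.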

There are many different alignment systems, which is why it is useful to know their conditions of application and their limitations. An international initiative, to which we contribute, evaluates ontology matchers (Ontology Alignment Evaluation Initiative [2]). For that purpose, we have developed a set of tests for ontology matchers.

These tests provide participants with pairs of ontologies to match; the alignment they return is compared with a reference alignment. The ontology pairs are automatically and systematically generated from one seed ontology about bibliographic references by altering it, i.e., suppressing particular categories of information, in order to evaluate how much the matching algorithms rely on this information. For instance, test #250 deletes names (replaced by random strings), comments and attributes. Hence, the data set does not aim at being realistic but at simulating a variety of situations. The benefit of this procedure is that the reference alignment can be generated automatically and is beyond dispute.
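
The text does not say how a returned alignment is compared with the reference. One common choice is precision and recall, sketched below under the assumption that an alignment is simply a set of (entity1, entity2) pairs; the function name and the sample data are hypothetical.

```python
def precision_recall(found, reference):
    """Compare a returned alignment with a reference alignment.

    Both arguments are collections of hashable correspondences,
    here plain (entity1, entity2) pairs.
    """
    found, reference = set(found), set(reference)
    correct = len(found & reference)
    # Fraction of returned correspondences that are in the reference.
    precision = correct / len(found) if found else 0.0
    # Fraction of reference correspondences that were returned.
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical data: 4 reference correspondences, 3 returned, 2 of them correct.
reference = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
found = {("a1", "b1"), ("a2", "b2"), ("a3", "b9")}
precision, recall = precision_recall(found, reference)
```

Here precision is 2/3 (two of the three returned correspondences are correct) and recall is 1/2 (two of the four reference correspondences were found).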

However, this data set has several drawbacks:

its generation technology is not satisfactory;
it depends on the seed ontology and cannot be changed every year;
it does not generate realistic tests (it erases all properties, not 20% of them);
it is not based on some intrinsic notion of test case hardness.

Expected results

The goal of this topic is twofold.

On the one hand, we would like to develop a more elaborate test generator (manipulating the ontologies instead of XML files for instance), allowing the generation of alterations randomised on a percentage of the ontology entities and usable with any seed ontology.
On the other hand, we would like to define a notion of test hardness for the ontology matching problem, enabling the generation of tests with a specific difficulty. Such notions of hardness are already used in other domains, e.g., in constraint satisfaction problems or SAT [3]. It will be necessary to study the literature on this topic and to propose adaptations to the domain of ontology matching.
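
The first goal, alterations randomised on a percentage of the entities, can be sketched on a toy model. The sketch assumes an ontology reduced to a list of entity names and a single alteration (replacing names by random strings); the function names are hypothetical and a real generator would manipulate actual ontologies. It returns the reference alignment together with the altered names.

```python
import random
import string


def random_name(rng, length=8):
    """Return a random lowercase string used as a meaningless name."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def scramble_names(entities, fraction, seed=0):
    """Replace the names of a given fraction of entities by random strings.

    Returns the altered list of names and the reference alignment,
    i.e. the list of (original name, altered name) pairs.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    rng = random.Random(seed)                 # seeded for reproducible tests
    count = round(fraction * len(entities))   # number of entities to alter
    chosen = set(rng.sample(range(len(entities)), count))
    used = set(entities)                      # avoid clashes with existing names
    altered, reference = [], []
    for index, name in enumerate(entities):
        new_name = name
        if index in chosen:
            new_name = random_name(rng)
            while new_name in used:
                new_name = random_name(rng)
            used.add(new_name)
        altered.append(new_name)
        reference.append((name, new_name))
    return altered, reference


# Hypothetical seed ontology of 10 entities; alter 20% of them.
names = ["Book", "Article", "Author", "Editor", "Journal",
         "Publisher", "Chapter", "Thesis", "Report", "Proceedings"]
altered, reference = scramble_names(names, 0.2, seed=1)
```

Because the generator records each substitution as it makes it, the reference alignment is obtained for free, which is the property of the existing benchmark that the text wants to preserve.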

Ideally, the test generator should be integrated into our Alignment server [4] in order to collect the results of ontology matchers and compare them.

References

[1] Jérôme Euzenat, Pavel Shvaiko, Ontology matching, Springer, Berlin (DE), 2007
[2] http://oaei.ontologymatching.org
[3] Empirical hardness models for SAT http://www.cs.ubc.ca/labs/beta/Projects/Empirical-Hardness-Models/
[4] https://gitlab.inria.fr/moex/alignapi

MOSIG Master 2ND YEAR Research
YEAR 2010/2011

MASTER TOPIC PROPOSAL

ADVISORS: Jérôme Euzenat and Cássia Trojahn dos Santos

TEL: 476 61 53 66 and 476 61 53 52 66

EMAIL: Jerome:Euzenat#inrialpes:fr and Cassia:Trojahn#inrialpes:fr

LAB AND TEAM: Exmo team, INRIA & LIG

PROJECT PROFILE: Web and artificial intelligence track

TITLE:

Hardness and generation of ontology alignment tests

Reference number: Proposal n°772

The semantic web is an evolution of the web that brings formalised knowledge into play. This knowledge is described by means of ontologies that specify the vocabulary and the constraints applying to it. In different contexts, different actors will use different ontologies. Hence, in order to make sense of information sources annotated by, or expressed in, different ontologies, it is necessary to relate these ontologies to one another.

One way to relate these ontologies is to find the correspondences that hold between the entities of two ontologies [1]. This activity is called ontology alignment. It takes two ontologies as input and outputs a set of correspondences between the entities of each ontology. A correspondence is defined by the two related entities (which can be classes, instances, properties, terms, as well as complex combinations of these), the relation between these entities (equivalence, subsumption, incompatibility, etc.) and, if possible, a confidence measure in this correspondence.

There are many different alignment systems, which is why it is useful to know their conditions of application and their limitations. An international initiative, to which we contribute, is dedicated to this task (Ontology Alignment Evaluation Initiative [2]). Within it, we have developed a set of tests for alignment systems.

These tests consist in aligning an ontology of bibliographic references with various variations of that same ontology. The ontologies to be matched are systematically generated from the reference ontology by suppressing particular categories of information, in order to evaluate how the algorithms behave in the absence of this information. For instance, test #250 deletes names (replaced by random strings), comments and attributes. The test set thus does not aim at being realistic but at evaluating a variety of situations. The benefit of this procedure is that the reference alignment can be generated at the same time as the ontology to be matched, and is beyond dispute.

However, this test set suffers from several problems:

its generation technology is unreliable;
it depends on the ontology and cannot be changed every year;
it does not generate realistic test sets (it deletes all attributes, not 20% of them);
it contains no intrinsic notion of test set difficulty.

Expected results

The goal of this internship is twofold. On the one hand, we would like to develop a more elaborate test generator (manipulating ontologies rather than XML files), able to generate random alterations on a percentage of the objects of the ontology and usable with any ontology. On the other hand, we want to define a notion of problem hardness that makes it possible to randomly generate test sets of a fixed difficulty and to compare test results according to their hardness. This notion of hardness is already used, for example in constraint satisfaction problems, to generate problems [3]. The work will involve studying the literature on this topic and proposing adaptations to the domain of ontology alignment.

Ideally, the generation software will be integrated into our alignment server [4] so as to collect the results of runs of alignment algorithms and compare them with one another.

http://exmo.inria.fr/training/M2R-2010-hardness.html

$Id: M2R-2010-hardness.html,v 1.6 2021/08/27 16:47:35 euzenat Exp $