Transformations and properties

A transformation (τ) is an algorithmic means of generating a representation (τ(o)) from another one (o, not necessarily in the same language). We focus on transformations obtained by composing more elementary transformations of which we only know the input, output and assumed properties. These elementary transformations may have been generated from an alignment or by any other means.

A transformation system is characterised by a set of elementary transformations and a set of composition operators. A transformation flow is a composition of elementary transformation instances whose inputs and outputs are connected by channels. A transformation flow is itself a transformation. More precisely, our work concerns syntactic transformations of XML ("eXtensible Markup Language") structures encoding knowledge representation languages. We take advantage of the XSLT transformation language ("XSL Transformations" [Clark 1999]) recommended by the W3C, for which we put forward a compound transformation language (see "Transformation flow language and processor").

The design of information systems as transformation flows requires the ability to express such flows and to determine their properties. A property is a boolean predicate about a transformation; for example, "preserving information" is such a predicate, true or false of a transformation, and it is satisfied if there exists an algorithmic means to recover o from τ(o).
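
As an illustration only (this sketch and its names, such as Transformation and preserves_information, are ours and not part of the project's transformation language), elementary transformations can be modelled as composable functions, with "information preservation" witnessed by an explicit inverse:

    # Minimal sketch (ours): transformations as composable functions; the
    # "information preservation" property is witnessed by an explicit inverse
    # recovering the source representation o from tau(o).
    from dataclasses import dataclass
    from typing import Callable, Optional

    @dataclass
    class Transformation:
        apply: Callable[[object], object]                       # o -> tau(o)
        inverse: Optional[Callable[[object], object]] = None    # witness of preservation

        def compose(self, other: "Transformation") -> "Transformation":
            """Apply self first, then other; the composition of two preserving
            transformations is again preserving."""
            inv = None
            if self.inverse and other.inverse:
                inv = lambda o: self.inverse(other.inverse(o))
            return Transformation(lambda o: other.apply(self.apply(o)), inv)

    def preserves_information(t: Transformation, samples) -> bool:
        """True if the declared inverse recovers every sample input."""
        return t.inverse is not None and all(t.inverse(t.apply(o)) == o for o in samples)

    # Renaming a tag is preserving; dropping an attribute is not.
    rename = Transformation(lambda o: {**o, "tag": o["tag"].upper()},
                            lambda o: {**o, "tag": o["tag"].lower()})
    drop = Transformation(lambda o: {k: v for k, v in o.items() if k != "lang"})

    doc = {"tag": "note", "lang": "fr"}
    print(preserves_information(rename, [doc]))                # True
    print(preserves_information(rename.compose(drop), [doc]))  # False: no inverse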

We consider more closely preservation properties, which express the preservation (or anti-preservation) of an order relation between the source representations (o) and the target representations (τ(o)). Several kinds of such properties can be distinguished.

Our goal is to study transformations on the basis of their properties rather than of the representations or the transformation structure. Properties are not restricted to semantic properties but cover various aspects, such as content or structure preservation, traceability, and confidentiality. We also consider properties of transformation systems (given a transformation system, is information preservation decidable and at what cost?). Hereafter we present some applications of this approach to particular properties; purely semantic properties are presented in Semantic interoperability. We try to characterise, for a given type of property, which transformations leave it invariant and what is the result of applying composition operators.

Goal: Characterising, for some kinds of properties, which transformations leave them invariant and what is the result of the composition of these transformations.

Semantic adaptation of multimedia documents

When a multimedia document is played on a platform with limited resources, e.g., a mobile phone that can only display one image at a time or an interactive display without a keyboard, it is necessary to adapt the document to the target device. In order to assess the meaning of adaptation, we have defined a semantic approach [Euzenat 2003b], which considers a model of a multimedia document to be one of its potential executions (an execution satisfying its specification). As a first approximation, adaptation reduces the set of models of a specification by selecting those satisfying the adaptation constraints (since it discards unwanted executions). Adapting amounts to finding this subset of models or, when it is empty, finding a compatible execution as close as possible to the initial one. For that purpose, we have proposed to express the set of possible interpretations by a resolved relation graph. Each relation of this graph can be a temporal, spatial, spatio-temporal relation, etc.

One of the first outcomes of this framework has been to show that current approaches to adaptation may distort the initial specification even when this is not necessary. When there is no model satisfying the adaptation constraints, the transformation must provide specifications whose models satisfy the constraints and are close to a model of the initial specification. A distance between models must thus be defined. We experimented on the temporal structure of multimedia documents, taking advantage of the conceptual neighbourhood structure of the set of qualitative relations [Euzenat 2003b].
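
As a rough illustration (ours; the neighbourhood fragment and the object names below are invented for the example, and only a handful of Allen relations are shown), such a distance can sum, over each pair of objects, the length of the shortest path in the conceptual neighbourhood graph between the relations assigned by the two specifications:

    # Illustrative sketch only: a fragment of the conceptual neighbourhood graph
    # over Allen relations (b=before, m=meets, o=overlaps, s=starts, d=during,
    # eq=equals). The distance between two qualitative specifications is the sum,
    # over object pairs, of the shortest-path length between assigned relations.
    from collections import deque

    NEIGHBOURS = {
        "b": {"m"}, "m": {"b", "o"}, "o": {"m", "s"},
        "s": {"o", "d", "eq"}, "d": {"s"}, "eq": {"s"},
    }

    def relation_distance(r1: str, r2: str) -> int:
        """Shortest path between two relations in the neighbourhood graph (BFS)."""
        queue, seen = deque([(r1, 0)]), {r1}
        while queue:
            r, d = queue.popleft()
            if r == r2:
                return d
            for n in NEIGHBOURS[r] - seen:
                seen.add(n)
                queue.append((n, d + 1))
        raise ValueError("relations not connected in this fragment")

    def specification_distance(spec1: dict, spec2: dict) -> int:
        """Sum relation distances over the object pairs shared by both specifications."""
        return sum(relation_distance(spec1[pair], spec2[pair])
                   for pair in spec1.keys() & spec2.keys())

    # Original specification vs. an adapted one where video and audio may no longer overlap.
    original = {("video", "audio"): "o", ("audio", "caption"): "m"}
    adapted  = {("video", "audio"): "m", ("audio", "caption"): "m"}
    print(specification_distance(original, adapted))   # 1: o and m are neighbours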

While we have developed the framework in general and shown how to use it with the Allen temporal algebra, we are further considering direct translations from the languages currently used for specifying multimedia documents. In particular, we have experimented with the adaptation of SMIL 2.0 documents. We have shown that it is possible to adapt SMIL documents in the temporal, spatial [Laborie 2006c], spatio-temporal [Laborie 2006b] and hypermedia [Laborie 2006d] dimensions. We have studied different qualitative spatial representations. Some are very expressive and precise, like the directional representation, but their adaptation cost is high; others, like the topological representation, can be adapted quickly but offer poor expressiveness. We have therefore introduced a new spatial representation that trades off expressiveness against speed, and we have compared each spatial representation experimentally on SMIL examples [Laborie 2006a]. Moreover, we have completed preliminary work to include the logical dimension, i.e., grouping objects together in a single element. Thus, our approach takes all multimedia dimensions into account [Laborie 2011a].

We have extended our adaptation approach with the capability to suppress multimedia objects [Laborie 2007a]. For example, a profile may indicate that only a few multimedia objects are allowed in a presentation. When multimedia objects are removed, we force the adapted document to satisfy properties such as presentation contiguity. Moreover, we have presented a method for summarising multimedia documents using a relevance degree for each multimedia object.
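
A deliberately simplified sketch of the selection step (ours, not the algorithm of [Laborie 2007a]; it ignores the contiguity constraint and uses invented object names): keep the most relevant objects until the number allowed by the profile is reached.

    # Simplified illustration only: keep the most relevant multimedia objects
    # until the number allowed by the target profile is reached.
    def summarise(objects: dict, max_objects: int) -> list:
        """objects maps an object identifier to its relevance degree."""
        ranked = sorted(objects, key=objects.get, reverse=True)
        return ranked[:max_objects]

    presentation = {"video": 0.9, "logo": 0.2, "caption": 0.7, "jingle": 0.4}
    print(summarise(presentation, 2))   # ['video', 'caption']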

A SMIL document cannot be adapted directly, so we introduced bijective translations from SMIL to the designed algebra. The adaptation can then take place within the algebra. However, after adaptation, the converse transformation does not necessarily yield a coherent document. We therefore designed a retrofitting procedure that we proved to yield a coherent document while preserving the neutrality and minimality requirements that characterise adaptation [Laborie 2004a]. In addition, we found that the generic distance used in our initial proposal was not necessarily the best in all situations of transgressive adaptation. We have thus provided a case analysis of SMIL objects which finds the best distance in all circumstances [Laborie 2008b].

We have implemented our approach in an interactive adaptation system for SMIL documents [Laborie 2008c]. A screencast of the prototype is available at http://exmo.inria.fr/people/laborie/Research.html. Moreover, we have studied the computation time of adaptation solutions. We have also considered relation graphs containing mixed quantitative and qualitative relations. To efficiently check the satisfiability of this kind of temporal constraint network, we have to deal with the infinite domains of the variables, which can generate an infinite number of candidates during backtracking. To solve this problem, we rely upon finite partitions of domains using bi-intervals (intervals of intervals) [Baget 2007a]. We have implemented sound and complete backtrack and forward-checking algorithms and shown that bi-intervals, used in a hybrid algorithm which also instantiates constraints, improve backtrack efficiency.

Finally, we have considered media adaptation [Laborie 2007b]. For that purpose, we propose to adapt media items by replacing incompatible ones with others found on the web. The adapted media items must convey the same message as the original ones while satisfying the target profile. We have presented a possible architecture to implement this and shown that search engines can already do it to a limited extent. Nonetheless, some results are unsatisfactory because media annotations lack semantics and are partial and heterogeneous. Hence, we have proposed to use semantic web technologies, such as RDF descriptions, ontologies, ontology merging and matching, in order to select better alternatives, thus improving this adaptation framework.

Evaluating transformations according to the maximal preservation of temporal constraints is not always the best strategy. It is necessary to describe temporal scenarios with richer structures (like those of rhetorical structure theory [Mann 1988] used in [Rutledge 2000]). We have considered the use of the author's discourse in the context of semantic adaptation and have shown that specifying rhetorical relations between multimedia objects, such as "exemplifies", may in turn identify implicit spatio-temporal relations between these objects [Laborie 2008b]. Hence, the author's discourse structure can be used to guide the adaptation process by providing adapted documents which are as close as possible to either the explicit document composition or the author's discourse structure. Moreover, for SMIL documents, we have shown that this discourse may be specified with RDF triples in the SMIL Metadata Module. Much work remains to be done on the exploitation of rhetorical representations.

We have proposed an extension of SPARQL that allows generating any kind of XML document from multiple RDF data sources and a given XML template [Alkhateeb 2008c]. Thanks to this extension, an XML template can itself contain SPARQL queries that import template instances. Such an approach makes it possible to reuse templates, to divide related information across several templates, and to avoid templates mixing languages. Moreover, reasoning capabilities can be exploited using RDFS, and document adaptation could be achieved using the SPARQL FILTER clause in order to restrict the answers to the set that satisfies the given profile.
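
The following sketch conveys the idea only; it uses plain Python string templates and the rdflib library rather than the syntax of the proposed extension, and the data and query are invented: an XML template is instantiated from the answers of a SPARQL query evaluated over RDF data.

    # Idea only, not the syntax of the extension of [Alkhateeb 2008c]: fill an
    # XML template from the answers of a SPARQL query evaluated with rdflib.
    from rdflib import Graph

    DATA = """
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    [] foaf:name "Alice" . [] foaf:name "Bob" .
    """

    TEMPLATE = "<people>{items}</people>"
    ITEM = "<person>{name}</person>"

    g = Graph()
    g.parse(data=DATA, format="turtle")

    results = g.query("""
        PREFIX foaf: <http://xmlns.com/foaf/0.1/>
        SELECT ?name WHERE { ?p foaf:name ?name } ORDER BY ?name
    """)

    items = "".join(ITEM.format(name=row.name) for row in results)
    print(TEMPLATE.format(items=items))
    # <people><person>Alice</person><person>Bob</person></people>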

This is joint work with Nabil Layaïda (WAM project). It has been the subject of the DEA of Victor Dias and of the PhD thesis of Sébastien Laborie.

References on multimedia adaptation

Self interpreted logics and RDF entailment

We are interested in various knowledge representation languages whose formulas can be represented by graphs and where the truth value of a formula in a model-theoretic interpretation can be expressed by some kind of graph homomorphism. Such languages naturally encompass conceptual graphs or RDF, but also propositional logics, constraint networks or positive Datalog.

Inspired by conceptual graphs, we have adapted the reasoning operation used in this model (basically a graph homomorphism) to obtain sound and complete inferences in the Semantic Web languages RDF [Baget 2003c] and RDFS. These inferences can thus benefit from the efficient graph homomorphism algorithms developed for conceptual graphs [Baget 2003b].
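
For simple RDF graphs (without RDFS vocabulary), the operation boils down to finding a map sending the blank nodes of the entailed graph to terms of the entailing graph while leaving other terms fixed; the following backtracking sketch (ours, with invented example terms) illustrates it:

    # Minimal sketch (ours) of simple RDF entailment as graph homomorphism:
    # G entails H iff the blank nodes of H can be mapped to terms of G so that
    # every triple of H, once mapped, is a triple of G (other terms map to themselves).
    def is_blank(term) -> bool:
        return isinstance(term, str) and term.startswith("_:")

    def match(term, target, mapping) -> bool:
        if is_blank(term):
            if term in mapping:
                return mapping[term] == target
            mapping[term] = target
            return True
        return term == target

    def entails(g: set, h: list, mapping=None) -> bool:
        mapping = mapping or {}
        if not h:
            return True
        (s, p, o), rest = h[0], h[1:]
        for (gs, gp, go) in g:
            trial = dict(mapping)
            if all(match(t, gt, trial) for t, gt in ((s, gs), (p, gp), (o, go))):
                if entails(g, rest, trial):
                    return True
        return False

    G = {("ex:inria", "ex:employs", "ex:jerome"),
         ("ex:jerome", "ex:leads", "ex:exmo")}
    H = [("_:x", "ex:leads", "ex:exmo"), ("ex:inria", "ex:employs", "_:x")]
    print(entails(G, H))   # True: _:x maps to ex:jerome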

Our work on graph-based knowledge representation languages, such as conceptual graphs and RDF, has led us to identify similarities, not only between the algorithms used to compute entailment (different kinds of graph homomorphisms), but also in the way their semantics are expressed. In order to identify the logics that can benefit from this property, we obtained the following results:

Any conjunctive logic having a countable set of formulas is thus equivalent to a self-interpreted logic where both truth values and entailment are expressed via a graph homomorphism. However, the third item above being non-constructive, such an equivalent logic cannot be automatically generated. We have instead adopted another approach, in which the semantics of a language is described via graph homomorphisms.

Consider a logic, i.e., a set of formulas, a set of interpretations, and a relation expressing which interpretations are models of a formula.

In that case, the projection considered is a sound and complete entailment mechanism for the logic involved.

Through this framework, we have characterised a class of logics, including RDF and conceptual graphs, whose semantics can be expressed via a graph homomorphism. The main interest of this approach is the ability to automatically generate a decision procedure for the entailment relation from the description of the semantics (as graph homomorphism computation). Proofs of soundness and completeness of entailment in both the conceptual graph and RDF formalisms have been unified in this framework [Baget 2004a, 2005a].

This work has interesting consequences with respect to EXMO's objectives. Suppose two ontologies o and o' expressed in logics described in our meta-language, and a set of equivalences, e.g., an alignment, between terms of o and o'. A merge of o and o' is an ontology o'' that contains the formulas of both o and o', that preserves logical consequence in o and o', and in which equivalent terms of o and o' are interpreted as the same object in each model. If we can generate this logic, and if the obtained o'' is self-interpreted, then we can generate the logical consequence algorithm for o''. We have developed a set of transformations ensuring that the obtained logic is self-interpreted, but two questions remain: (1) are these transformations sufficient to handle all cases, and (2) can we automatically compose these transformations to obtain o''?
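
A naive sketch (ours, not the set of transformations mentioned above, with invented terms) makes the merge operation concrete: terms declared equivalent by the alignment are collapsed to a single representative before taking the union of the formulas.

    # Naive illustration of merging modulo an alignment: terms declared equivalent
    # are collapsed to one representative, then the two sets of triples are united.
    def merge(o1: set, o2: set, alignment: dict) -> set:
        """alignment maps a term of o2 to its equivalent term in o1."""
        canon = lambda term: alignment.get(term, term)
        rename = lambda triple: tuple(canon(t) for t in triple)
        return set(o1) | {rename(t) for t in o2}

    o1 = {("ex:Person", "rdfs:subClassOf", "ex:Agent")}
    o2 = {("ex2:Human", "rdfs:subClassOf", "ex2:LegalAgent")}
    alignment = {"ex2:Human": "ex:Person"}
    print(merge(o1, o2, alignment))
    # the two triples, with ex2:Human replaced by its representative ex:Person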

Path RDF as an RDF query language

Querying RDF graphs can be reduced to computing entailment. Entailment, or semantic consequence, in RDF, i.e., determining whether or not an RDF graph is a logical consequence of the RDF graphs representing the database or the semantic web, can be computed by a sort of homomorphism of directed labelled graphs known as conceptual graph projection. Though RDF itself can be used as a query language for an RDF knowledge base, the need for more expressive queries has led to the definition of the SPARQL query language. SPARQL queries are defined on top of graph patterns that are basically RDF (and more precisely GRDF) graphs [Alkhateeb 2012a].

Another way to query RDF graphs, which has been used in graph databases, is to query for paths, expressed by regular expressions, holding between nodes (the former approach allows full graph branching and cycling in queries, while the latter allows paths of undetermined length).

However, some queries that can be expressed in one approach cannot be expressed in the other: a query whose homomorphic image in the database is not a path cannot be expressed by a regular expression, while RDF semantics is not meant to express paths of unknown length. The two approaches thus do not capture the same set of queries.

To benefit from both approaches, we have defined PRDF (for Path RDF), an extension of RDF that allows regular expressions over relations as labels of the arcs of RDF graphs [Alkhateeb 2007b, c]. Such a label expresses that there is a path combining relations in a particular way between two resources.
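
The intended meaning can be illustrated with a small evaluator (ours, not the PRDF projection algorithm; the graph and expression below are invented and anticipate the transportation example used further down): given a start node, it returns the nodes reachable through arcs whose labels spell a word of the regular expression.

    # Sketch (ours) of evaluating a regular expression over predicates on an
    # RDF-like graph: expressions are predicates, sequences, alternatives or
    # Kleene stars; evaluation returns the nodes reachable along matching paths.
    def eval_path(graph: set, expr, nodes: set) -> set:
        """graph is a set of (subject, predicate, object) triples."""
        if isinstance(expr, str):                      # a single predicate
            return {o for (s, p, o) in graph if p == expr and s in nodes}
        op, *args = expr
        if op == "seq":                                # e1 / e2 / ...
            for sub in args:
                nodes = eval_path(graph, sub, nodes)
            return nodes
        if op == "alt":                                # e1 | e2 | ...
            return set().union(*(eval_path(graph, sub, nodes) for sub in args))
        if op == "star":                               # e*
            reached, frontier = set(nodes), nodes
            while frontier:
                frontier = eval_path(graph, args[0], frontier) - reached
                reached |= frontier
            return reached
        raise ValueError(op)

    G = {("Grenoble", "ex:bus", "Lyon"), ("Lyon", "ex:train", "Paris"),
         ("Paris", "ex:train", "Lille")}
    # Is Lille reachable from Grenoble by any combination of bus and train?
    expr = ("star", ("alt", "ex:bus", "ex:train"))
    print("Lille" in eval_path(G, expr, {"Grenoble"}))   # True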

We extended the RDF triple syntax by allowing non-atomic regular expressions, i.e., regular expressions generated inductively over URIrefs, in the place of the predicate. We extended the semantics accordingly, defining when two resources of an interpretation are related by a regular expression. Then we gave a sound and complete inference mechanism computing the PRDF graph projection over RDF graphs [Alkhateeb 2009a].

This is the PhD thesis of Faisal Alkhateeb [Alkhateeb 2008b].

Constrained Path RDF as an RDFS query language

However, PRDF graphs do not allow expressing constraints on the nodes of a path, e.g., "one of the intermediate stops must provide a wireless connection". To express such constraints, we have proposed an extension of PRDF, called CPRDF (for Constrained Path RDF) [Alkhateeb 2007e, Alkhateeb 2008b].

For these two extensions of RDF, we have provided an abstract syntax and an extension of RDF semantics. We have characterised query answering (the query is a PRDF or a CPRDF graph, the knowledge base is an RDF graph) as a particular case of PRDF or CPRDF entailment that can be computed using some kind of graph homomorphism. Query answering thus remains an NP-hard problem in all these languages.

Finally, we use these PRDF or CPRDF graphs as graph patterns in SPARQL (the query language for RDF recommended by the W3C), defining the PSPARQL and CPSPARQL extensions of that query language. PSPARQL (and CPSPARQL) extends SPARQL by replacing its basic graph patterns, which are RDF graphs with variables, by PRDF (respectively CPRDF) graph patterns. For instance, one can ask whether there is a means of transportation between two cities by any combination of bus and train. We have defined the syntax and semantics of PSPARQL and CPSPARQL, shown how to use PRDF projection for answering their queries, and provided the necessary algorithms for computing the answer set of a given query. We have implemented these algorithms in a PSPARQL query evaluator. The efficiency of this approach has been demonstrated using the Lehigh University Benchmark for generating RDFS graphs, and in 2012 it was still the most robust and fastest path query evaluator [Arenas 2012].

The RDF Schema language (RDFS) is an extension of RDF designed to describe the relationships between resources and properties, e.g., ClassA rdfs:subClassOf ClassB. A naive approach for querying an RDFS graph G with SPARQL, or even CPSPARQL, is to compute the closure graph of G through the RDFS deduction rules. To avoid the computation of this closure graph, which may be costly in space and time, we have proposed a new approach for evaluating queries over the core fragment of RDFS [Muñoz 2007]. This approach relies on rewriting the given (CP)SPARQL query Q into a CPSPARQL query Q' such that the evaluation of Q over closure(G) is equivalent to the evaluation of Q' over G.
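
The rewriting idea can be illustrated with today's SPARQL 1.1 property paths, which play the role that CPRDF paths played at the time (the sketch below uses rdflib and an invented example schema, not the CPSPARQL rewriting itself): instead of querying the closure of G for the instances of a class, the rdf:type pattern is rewritten with rdfs:subClassOf*.

    # Illustration of the rewriting idea with SPARQL 1.1 property paths: the
    # rewritten query follows rdfs:subClassOf* on G instead of querying closure(G).
    from rdflib import Graph

    g = Graph()
    g.parse(format="turtle", data="""
    @prefix ex: <http://example.org/> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    ex:Bus rdfs:subClassOf ex:Vehicle .
    ex:b1 a ex:Bus .
    """)

    REWRITTEN = """
    PREFIX ex: <http://example.org/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?x WHERE { ?x rdf:type/rdfs:subClassOf* ex:Vehicle }
    """
    print([str(row.x) for row in g.query(REWRITTEN)])
    # ['http://example.org/b1'], without computing the RDFS closure of g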

Finally, we have proposed to use PSPARQL as a basis for a new language for processing alignments [Euzenat 2008d, Euzenat 2008g]. More precisely, we have proposed that processing the expressive alignments generated by patterns [Scharffe 2008a] requires a mix of the rule language SPARQL++ and PSPARQL.

Abstract interpretation of RDF queries and μ-calculus

Continuing our previous work on path-based RDF querying and WAM's work on the μ-calculus interpretation of XPath, we are using the same techniques for interpreting SPARQL queries over RDF.

We study query containment, i.e., determining whether, for any graph, the answers to a query are contained in those of another query. This problem is very important for query optimisation purposes, and will be even more so in the case of distributed queries. In the SPARQL context, it can equally be useful for distributing federated queries or for implementing schema-based access control.

We have reduced SPARQL query containment to satisfiability in the μ-calculus. To that end, we proposed an encoding of RDF graphs as labelled transition systems, and of SPARQL queries and ontologies as propositional μ-calculus formulas. These encodings translate query evaluation into graph traversal through the modalities of the logic. We have proved the correctness of the encoding [Wudage Chekol 2012e].
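
Schematically, and in our own simplified formulation rather than the exact encoding of [Wudage Chekol 2012e]:

    \[
      q \sqsubseteq q'
        \;\Longleftrightarrow\; \forall G,\ \mathrm{ans}(q, G) \subseteq \mathrm{ans}(q', G)
        \;\Longleftrightarrow\; \varphi_G \wedge \varphi_q \wedge \neg\varphi_{q'}
          \ \text{is unsatisfiable,}
    \]

where φG axiomatises the labelled transition systems encoding RDF graphs and φq, φq' are the μ-calculus encodings of the two queries.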

It is then possible to use solvers of this logic to test query containment of SPARQL queries under RDFS [Wudage Chekol 2012a] and OWL schema [Wudage Chekol 2012b] constraints, with paths [Wudage Chekol 2011a] or under particular entailment regimes [Wudage Chekol 2012a]. We have also implemented the proposed techniques on top of a general μ-calculus solver.

In order to experimentally assess implementation strengths and limitations, we provided a first SPARQL containment test benchmark [Wudage Chekol 2013a]. We studied the query demographics of DBpedia logs to design benchmarks relevant to query containment solvers. We tested the available solvers, on their domains of applicability, against three different benchmark suites and found that (i) the tested solutions are overall functionally correct, (ii) in spite of its complexity, SPARQL query containment is practicable for acyclic queries, and (iii) state-of-the-art solvers are at an early stage both in terms of capabilities and implementation.

The benchmarks, results and software are available at http://sparql-qc-bench.inrialpes.fr.

This is the PhD thesis of Melisachew Wudage Chekol [Wudage Chekol 2012e], co-supervised with Nabil Layaïda (WAM).

References on semantic web queries

Lattice representation of information

This study takes as a starting point other work on the communication of information, such as information flow [Barwise 1997] or the use of lattices as in formal concept analysis [Ganter 1999].

The languages that we use, and particularly those based on terms, can be seen as lattices. A lattice is a set (L) partially ordered by ≤ in which any pair of elements has a least upper bound for ≤. Here the order can be understood as specifying that an element is more informative than another (smaller for the order). The absence of information is the maximal element for the order, and an element is smaller than another if the latter is an approximation of the former (it contains less information but does not contradict it). Thus, considering two systems exchanging information, it is possible to express that a system knows some information from another system, that it knows it partially, or that it does not know it.

In this framework, a transformation τ maps one lattice (L) to another (L'). A transformation is information anti-preserving if its adjoint transformation (defined on an element of L' as the least upper bound of the elements of L mapped to it) maps the image of any element back to an element at least as great as the original (it may return a greater element but never a smaller one).
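
In symbols (our reconstruction of the definitions of this paragraph), with τ: L → L' and adjoint τ*: L' → L:

    \[
      \tau^{*}(y) \;=\; \bigvee \{\, x \in L \mid \tau(x) = y \,\},
      \qquad
      \tau \ \text{is information anti-preserving iff}\quad
      \forall x \in L,\ \ x \;\leq\; \tau^{*}(\tau(x)).
    \]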

The particular case in which the transformations are endomorphic (τ: L → L) can be seen as the domain of interpretation of a modal logic. In such a case, the interpretation of a formula δ is the set of representations (the elements of the lattice) in which it is true, and the interpretation of the Ki modality (known by i, which is tied to a transformation τi) is the set of representations which, once transformed by τi, satisfy the formula to be known ([Kiδ] = {x ∈ L | τi(x) ∈ [δ]}). Because information increases monotonically with ≤, the interpretation of a formula is a subset of L closed for ≤ [Brunet 2001a].

The axioms of this structure happen to be those of the modal logic IS4 (which, due to the specificity of negation on subsets closed for ≤, satisfies neither the law of the excluded middle nor Axiom 5: ¬Kiδ ⇒ Ki¬Kiδ). One of the advantages of this work is to enable the expression of queries about the knowledge available after transformation, e.g., ∃δ; Kijδ ∧ ¬Kiδ ∧ ¬Kjδ, i.e., "is it possible, by gathering the knowledge of i and j, to infer information not available to either of them?", together with ∀δ (Kiδ ⇒ Kijδ) ∧ (Kjδ ⇒ Kijδ). These techniques also apply when one wants to ensure that some information is not deducible from the information that is communicated.

The kind of transformation used has to be restricted in order to apply this work to information systems (but the terms can be XML documents).

This is part of the doctoral work of Olivier Brunet [Brunet 2002b].

