MOSIG Master 2ND YEAR Research
YEAR 2013/2014


ADVISOR: Nabil Layaïda, Pierre Genevès, and Jérôme Euzenat

TEL: +33 (0)476 61 52 81

EMAIL: Nabil:Layaida#inria:fr, Pierre:Geneves#inria:fr, Jerome:Euzenat#inria:fr

TEAM AND LAB: Tyrex and Exmo team, INRIA & LIG

MASTER PROFILE: Artificial intelligence and the web

Reference number: Proposal n°1571


SPARQL query rewriting with paths

The goal of the semantic web is to take advantage of formalised knowledge (in languages like RDF) at the scale of the worldwide web. In order to access the published data, it offers a query language named SPARQL. SPARQL has many points in common with SQL except that, instead of dealing with relational tables, it deals with RDF graphs. Hence, its basic components are triples that should match an edge in an RDF graph.

Here is a sample SPARQL query that finds the pairs of cities connected by either a plane or a train trip followed by a bus trip:

SELECT ?x ?y
  { ?x plane ?z . UNION ?x train ?z . }
  ?z bus ?y .

Many studies have been carried out in order to both extend the expressiveness of the language and reduce the complexity of query evaluation.

Concerning expressiveness, SPARQL is being extended in various ways: paths, aggregation, negation, federation, subqueries, and entailment regimes. Obviously, these features augment SPARQL with additional expressivity at the expense of a higher complexity. Path-based languages have been provided and extend the expressiveness of SPARQL at no extra computational cost [Alkhateeb 2009]. Path are now introduced in the new version of SPARQL (1.1). They allow for expressing queries more succintly. For instance, the example above can be rewritten as:

SELECT ?x ?y WHERE { ?x (plane|train)/bus ?y . }

Concerning complexity, the SELECT operator is equivalent to projection in relational algebra. This operator is a source of extra complexity in query evaluation: NP for queries composed of basic graph patterns, if this operator can be suppressed, the complexity reduces to just PTime [Schmidt 2010]. In general, the problem of SPARQL query evaluation is PSPACE-complete [Pérez 2009]. So considering the possibility of rewriting SPARQL queries with projection into SPARQL queries without projection, i.e., eliminating the need for non-distinguished variables (such as ?z in the initial example) is interesting because it would guarantee a lower complexity.

When analysing experimentaly the distributions of SPARQL queries, we observe that such query are very often relatively simple [Chekol 2012; 2013]. In particular, we observed that 87% of the queries posed to are non cyclic. This allows for using optimised engines for some operations (query containment).

This raises the following questions:

The goal of this master topic is to study these questions.

Expected results


[Alkhateeb 2009] Faisal Alkhateeb, Jean-François Baget, Jérôme Euzenat. Extending SPARQL with regular expression patterns (for querying RDF). Journal of Web Semantics 7(2):57-73, 2009.
[Chekol 2012] Melisachew Wudage Chekol. Static analysis of semantic web queries. Thèse d'informatique, Université de Grenoble, 2012
[Chekol 2013] Melisachew Wudage Chekol, Jérôme Euzenat, Pierre Genevès, Nabil Layaïda. Evaluating and Benchmarking SPARQL Query Containment Solvers. Proc. 12th ISWC, Sydney (AU), 2013
[Pérez 2009] Jorge Pérez, Marcelo Arenas, Claudio Gutierrez. Semantics and complexity of SPARQL. ACM Transactions on Database Systems 34(3):16, 2009
[Schmidt 2010] Michael Schmidt, Michael Meier, Georg Lausen. Foundations of SPARQL Query Optimization. In Proc. ICDT, New York (NY, US), pp4-33, 2010

$Id: M2R-2013-query.html,v 1.6 2017/01/13 19:59:25 euzenat Exp $