Faisal Alkhateeb, Querying RDF(S) with regular expressions, Thèse d'informatique, Université Joseph Fourier, Grenoble (FR), June 2008
RDF is a knowledge representation language dedicated to the annotation of resources within the Semantic Web. Though RDF itself can be used as a query language for an RDF knowledge base (using RDF semantic consequence), the need for added expressivity in queries has led to define the SPARQL query language. SPARQL queries are defined on top of graph patterns that are basically RDF graphs with variables. SPARQL queries remain limited as they do not allow queries with unbounded sequences of relations (e.g. "does there exist a trip from town A to town B using only trains or buses?"). We show that it is possible to extend the RDF syntax and semantics defining the PRDF language (for Path RDF) such that SPARQL can overcome this limitation by simply replacing the basic graph patterns with PRDF graphs, effectively mixing RDF reasoning with database-inspired regular paths. We further extend PRDF to CPRDF (for Constrained Path RDF) to allow expressing constraints on the nodes of traversed paths (e.g. "Moreover, one of the correspondences must provide a wireless connection."). We have provided sound and complete algorithms for answering queries (the query is a PRDF or a CPRDF graph, the knowledge base is an RDF graph) based upon a kind of graph homomorphism, along with a detailed complexity analysis. Finally, we use PRDF or CPRDF graphs to generalize SPARQL graph patterns, defining the PSPARQL and CPSPARQL extensions, and provide experimental tests using a complete implementation of these two query languages.
Knowledge Representation Languages, RDF(S), Querying Semantic Web, SPARQL, Graph Homomorphism, Regular Languages, Regular Expressions, SPARQL Extensions, PRDF, PSPARQL, CPRDF, CPSPARQL