2nd year Master of science in informatics, specialty Artificial intelligence and the web

Semantic web: from XML to OWL

Sihem Amer-Yahia (Sihem : Amer-Yahia # imag : fr)
Jérôme Euzenat (Jerome : Euzenat # inria : fr)
Pierre Genevès (Pierre : Geneves # inria : fr)
36h, 6 ECTS
Marks are given after the exams. All documents are allowed.



The web has been constantly evolving from a distributed hypertext system to a very large information processing machine. As fast as it is, this evolution is grounded on theoretical principles borrowed from several fields of computer science such as programming languages, databases, structured documentation, logic and artificial intelligence. The smooth operation of the past and future web at a large scale relies on these foundations. The goal of this course is to present them, the problems that they solve as well as those that they uncover. It considers three milestones of this evolution: XML, the social web and the semantic web.

The first part introduces the foundations of XML technologies: the XML language for document markup, DTDs as a type system for XML documents, the XML query languages (XPath and XQuery) and the XML transformation language XSLT. We will consider the major results obtained on each of these languages as well as the open questions. Then we introduce the challenges that these technologies raise for theoretical computer science. This covers the formal methods used for grounding these technologies (tree automata, tree logics, their algorithms and complexity) as well as their application to XML query typing and static analysis of XML transformation languages.
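As a small illustration of the XML querying mentioned above, the following sketch evaluates a few path expressions with Python's standard `xml.etree.ElementTree` module, which supports only a limited subset of XPath. The sample document and the helper name `select_titles` are assumptions made up for this example; they are not course material.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for illustration only.
DOC = """
<library>
  <book year="2011"><title>Web Data Management</title></book>
  <book year="2007"><title>Ontology matching</title></book>
  <article year="2006"><title>A System for the Static Analysis of XPath</title></article>
</library>
"""

def select_titles(xml_text, path):
    """Parse xml_text and return the text of every node selected by path."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.findall(path)]

# Child steps: titles of <book> elements directly under the root.
book_titles = select_titles(DOC, "./book/title")

# Attribute predicate: only books whose year attribute equals 2007.
titles_2007 = select_titles(DOC, "./book[@year='2007']/title")

# Descendant step: every <title> element anywhere in the document.
all_titles = select_titles(DOC, ".//title")

print(book_titles)
print(titles_2007)
print(len(all_titles))
```

Full XPath and XQuery offer more axes, functions and predicates than this subset; the sketch only shows the basic idea of navigating a document tree with path expressions.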

The second part summarizes the data models and algorithms required to extract, manage and access massive amounts of social content. The course examples are drawn from real-world applications such as URL search and recommendation on Delicious, group recommendation in MovieLens and extracting travel itineraries from Flickr photos. The goals of this part are to acquire knowledge of scalable algorithms for processing large volumes of social data and extracting value from that data, and to learn how to run and interpret large-scale user studies.

The third part introduces the semantics of knowledge representation on the web. The semantic web extends the web with richer and more precise information because it is expressed in a formal language using a vocabulary defined in an ontology (a structured vocabulary of concepts and properties defined in a logic). Ontologies are used for describing web resource content and reasoning about these resources formally. We introduce the semantic web languages (RDF, RDFS, OWL) and show their relations with knowledge representation formalisms (conceptual graphs, description logics) and XML. This provides tools for reasoning with ontologies and, in particular, for evaluating queries. However, the distributed nature of the web leads to heterogeneous ontologies which must be matched before they can be used together. We discuss ontology matching and explain how to semantically interpret the relations between ontologies. Finally, this is applied to networks of peers using knowledge together.
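To give a flavour of what reasoning with an ontology means, here is a minimal sketch, assuming an RDF graph is represented as a plain Python set of (subject, predicate, object) tuples. It saturates the graph with only two RDFS entailment rules (rdfs9 and rdfs11 in the RDF Semantics specification); the `ex:` names and the function `rdfs_closure` are hypothetical, and a real reasoner handles many more rules.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

def rdfs_closure(triples):
    """Saturate a set of triples under two RDFS rules until a fixed point."""
    closed = set(triples)
    while True:
        new = set()
        for (s, p, o) in closed:
            if p != SUBCLASS:
                continue
            for (s2, p2, o2) in closed:
                # rdfs11: subClassOf is transitive.
                if p2 == SUBCLASS and s2 == o:
                    new.add((s, SUBCLASS, o2))
                # rdfs9: an instance of a class is an instance of its superclasses.
                if p2 == TYPE and o2 == s:
                    new.add((s2, TYPE, o))
        if new <= closed:
            return closed
        closed |= new

# Hypothetical ontology and data, invented for illustration only.
graph = {
    ("ex:Student", SUBCLASS, "ex:Person"),
    ("ex:Person", SUBCLASS, "ex:Agent"),
    ("ex:alice", TYPE, "ex:Student"),
}

closure = rdfs_closure(graph)
for triple in sorted(closure):
    print(triple)
```

A query asking for all instances of `ex:Agent` finds `ex:alice` in the saturated graph although the original data never states it; this is the sense in which queries are evaluated modulo an ontology.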

This year the course will start with the Semantic web part.

Place and time

Lectures are usually held in room H202 on Wednesdays from 9h45 to 12h45.

Planning (2016-2017)

The planning can be consulted on the official timetable in ADE (unfortunately, I am unable to provide you with a link...).

Date   Topic                                                                  Room       Lecturer
28/9   Semantic web languages (Data: URI, RDF, closure, interpolation lemma)  H202       JE
5/10   Semantic web languages (Ontologies: RDFS and OWL)                      H202       JE
12/10  Querying RDF (SPARQL)                                                  E100-info  JE
19/10  Querying data through ontologies (NSPARQL, PSPARQL, DL-Lite)           H202       JE
26/10  Alignment semantics and networked ontologies + Mid-term exam           H202       JE
9/11   Introduction to the social web                                         H202       SAY
15/11  Search and recommendation in the social web                            H202       SAY
23/11  Core XML (XML, Schemas, Parsing)                                       H202       PG
30/11  Programming with XML (Streaming Validation, XPath, XQuery)             H202       PG
7/12   Foundations of XML Types (Tree Grammars, Tree Automata)                H202       PG
14/12  Tree Logics (FO, MSO)                                                  H202       PG
4/1    Tree Logics continued (μ-calculus)                                     H202       PG
01/02  Final exam (9h-12h)                                                    ?          ?

Outline and documents

First part: Semantic web

This part of the course is now collected into a single Lecture notes volume. These notes are always evolving, so avoid printing them until just before the exams. It is easier to download (and update) the PDF and browse through it. It is divided into three parts corresponding to the main sessions.

  1. Graphs and ontologies (old handout + Semantics of knowledge representation languages: old lecture notes, optional)
    1. Resource description framework
    2. Ontology languages
  2. Queries (old handout (2008), support paper)
    1. Querying RDF with SPARQL
    2. Extending SPARQL
    3. Querying modulo ontologies
  3. Networks of ontologies (slides (2008))
    1. Networks of ontologies and alignments
    2. Alignment semantics
    3. Distributed query evaluation

Dependencies between lecture topics

Second part: Social networks

Third part: XML processing

Slides are available from: http://www.pierresoft.com/pierre.geneves/teaching.htm.


  1. Serge Abiteboul, Ioana Manolescu, Philippe Rigaux, Marie-Christine Rousset, Pierre Senellart, Web Data Management, Cambridge university press (UK), 2011
  2. Philippe Adjiman, Philippe Chatalic, Francois Goasdoué, Marie-Christine Rousset, Laurent Simon, Distributed Reasoning in a Peer-to-Peer Setting: Application to the Semantic Web, Journal of Artificial Intelligence Research (JAIR) Volume 25, 2006.
  3. Philippe Adjiman, Francois Goasdoué, Marie-Christine Rousset, Some RDFS in the Semantic Web. Journal of Data Semantics (JoDS), 2007
  4. Gediminas Adomavicius and Alexander Tuzhilin. Towards the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering 17(6), June 2005.
  5. Sihem Amer-Yahia et al. Efficient network aware search in collaborative tagging sites. PVLDB 1(1):710-721, 2008
  6. Sihem Amer-Yahia et al. Group Recommendation: Semantics and Efficiency. PVLDB 2(1):754-765, 2009
  7. Sihem Amer-Yahia et al. It takes variety to make a world: diversification in recommender systems. EDBT 2009.
  8. Sihem Amer-Yahia et al. Getting recommender systems to think outside the box. RecSys 2009.
  9. Sihem Amer-Yahia et al. Efficient Computation of Diverse Query Results. ICDE 2008.
  10. Grigoris Antoniou, Frank van Harmelen, A semantic web primer, The MIT press, 2004 (rev. 2008)
  11. Senjuti Basu Roy, Sihem Amer-Yahia, Ashish Chawla, Gautam Das, Cong Yu. Constructing and exploring composite items. SIGMOD Conference 2010: 843-854
  12. Senjuti Basu Roy, Sihem Amer-Yahia, Gautam Das, Cong Yu. Interactive Itinerary Planning. ICDE 2011.
  13. Hubert Comon, Max Dauchet, Rémi Gilleron, Christof Löding, Florent Jacquemard, Denis Lugiez, Sophie Tison, Marc Tommasi, Tree Automata Techniques and Applications (http://tata.gforge.inria.fr/), October 12th, 2007
  14. Mahashweta Das, Saravanan Thirumuruganathan, Sihem Amer-Yahia, Gautam Das, Cong Yu, Who Tags What? An Analysis Framework, PVLDB 5(11):1567-1578, 2012
  15. Mahashweta Das, Sihem Amer-Yahia, Gautam Das, Cong Yu, MRI: Meaningful Interpretations of Collaborative Ratings PVLDB 4(11):1063-1074, 2011
  16. Jérôme Euzenat, Pavel Shvaiko, Ontology matching, Springer Verlag, Heidelberg (DE), 2007; 2nd edition, 2013
  17. Ronald Fagin, Amnon Lotem, Moni Naor. Optimal Aggregation Algorithms for Middleware. PODS 2001
  18. Pierre Geneves, Nabil Layaida and Alan Schmitt, Efficient Static Analysis of XML Paths and Types, Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation (PLDI), San Diego (CA US), pp342-351, 2007.
  19. Pierre Geneves and Nabil Layaida, A System for the Static Analysis of XPath, ACM Transactions on Information Systems 24(4):475-502, 2006.
  20. Pascal Hitzler, Markus Krötzsch, Sebastian Rudolph, Foundations of semantic web technologies, Chapman & Hall/CRC, 2009
  21. Ian Horrocks, OWL 2: the next generation, 2011
  22. H. Hosoya, Foundations of XML Processing, April 2, 2007
  23. Modal μ-Calculi, Chapter of "Handbook of Modal Logic"
  24. Julia Stoyanovich, Sihem Amer-Yahia, Cameron Marlow, Cong Yu. Leveraging Tagging to Model User Interests in del.icio.us. AAAI Social Information Processing workshop 2008.

Previous exams

In previous years, we had a single 3h exam at the end of the course. Starting in 2010-2011, we have two exams. The aim is to make sure that students know what is expected of them. In addition, here are some past exams.

Here are some questions from an exam given at EPFL in 2009 and their corrections (in English) for the XML part only.

Here is the exam of 2008-2009 (in French) and its correction (in English) for the semantic web part only.

Here is the exam of 2009-2010 (in French or English) and its correction (in English) for the semantic web part only.

Here is the exam of 2010-2011 (in French or English) and its correction (in English) for the semantic web part only.

Here is the exam of 2012-2013 (in English) for the semantic web and social network part and its correction (in English) for the semantic web part only.

Here is the exam of 2013-2014 (in English) for the semantic web and social network part and its correction (in English) for the semantic web part only.

Here is the exam of 2014-2015 (in English) for the semantic web and social network part and its correction (in English) for the semantic web part only.

Here is the midterm exam of 2015-2016 and its correction for the semantic web part (in English).

Here is the midterm exam of 2016-2017 and its correction for the semantic web part (in English).

$Id: index.html,v 1.108 2017/01/11 16:22:27 euzenat Exp $