Ivan’s private site

March 28, 2011

Empirical study of real-world SPARQL queries

Filed under: Semantic Web, Work Related (Ivan Herman @ 12:21)

A nice paper I just heard at the USEWOD2011 Workshop at the WWW2011 conference: “Empirical study of real-world SPARQL queries”, by M.A. Gallego and his colleagues from the University of Valladolid, Spain. They analysed the SPARQL queries that various clients issued against DBpedia and the Semantic Web Dogfood dataset, to see whether general patterns emerge that RDF triple store and SPARQL implementers could take into account. This is a workshop paper, i.e., work in progress, so the results must be taken with a pinch of salt. Still, a few observations stand out. DESCRIBE and CONSTRUCT queries are very rarely used (not a big surprise). OPTIONAL and UNION are used quite a lot, so optimising them matters. Most queries are dead simple, yet around half of them rely on FILTER (albeit with only one variable).
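Figures such as "X% of the queries use OPTIONAL" come down to counting, per query, which features appear at least once. The Python below is a minimal sketch of that counting under my own assumptions. It is not the methodology of Gallego et al. The feature list, the helper names (`strip_noise`, `features_of`, `feature_shares`) and the sample log are all hypothetical. Keyword matching on regex-cleaned text is also only a crude stand-in for running every query through a real SPARQL parser.

```python
import re
from collections import Counter

# Hypothetical feature list: the SPARQL keywords discussed in the post.
FEATURES = ["SELECT", "ASK", "CONSTRUCT", "DESCRIBE", "OPTIONAL", "UNION", "FILTER"]

def strip_noise(query: str) -> str:
    """Blank out literals, IRIs and comments so keywords inside them are not counted.
    Crude heuristic: triple-quoted literals and other edge cases are not handled."""
    query = re.sub(r'"(?:[^"\\]|\\.)*"', '""', query)  # double-quoted literals
    query = re.sub(r"'(?:[^'\\]|\\.)*'", "''", query)  # single-quoted literals
    query = re.sub(r"<[^>\s]*>", "<>", query)          # IRIs such as <http://...>
    query = re.sub(r"#[^\n]*", "", query)              # line comments
    return query

def features_of(query: str) -> set[str]:
    """Return the set of FEATURES keywords that occur in one query.
    SPARQL keywords are case-insensitive, so the cleaned text is upper-cased.
    The lookarounds skip variables (?optional) and prefixed names (ex:filter)."""
    cleaned = strip_noise(query).upper()
    return {f for f in FEATURES if re.search(rf"(?<![?$:\w]){f}(?![\w:])", cleaned)}

def feature_shares(queries: list[str]) -> dict[str, float]:
    """Percentage of queries in which each feature appears at least once."""
    counts: Counter[str] = Counter()
    for q in queries:
        counts.update(features_of(q))
    n = len(queries)
    return {f: (100.0 * counts[f] / n if n else 0.0) for f in FEATURES}

# Usage on a tiny, made-up query log.
log = [
    "SELECT ?s WHERE { ?s a <http://example.org/Person> }",
    "SELECT ?s ?n WHERE { ?s ?p ?o OPTIONAL { ?s <http://example.org/name> ?n } }",
    'SELECT ?s WHERE { ?s ?p ?o FILTER (?o = "union") }',
    "ASK { { ?s ?p ?o } UNION { ?o ?p ?s } }",
]
shares = feature_shares(log)
print(shares["OPTIONAL"], shares["UNION"], shares["FILTER"])  # 25.0 25.0 25.0
```

The word "union" inside the string literal of the third query is not counted as a UNION, which is why the literals are blanked out first. A serious analysis of a query log would parse each query instead. Parsing also surfaces syntax errors and nesting, such as an OPTIONAL inside a UNION.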

What I find most interesting, however, is that some of these figures were radically different between the two datasets. For example, 16% of the DBpedia queries used OPTIONAL, compared with only 0.41% of the Dogfood queries. This tells me that it is extremely difficult to optimise data stores in general: the characteristics of the dataset, and indeed of the application area, have to play an important role. I would expect, for instance, SPARQL queries in the health care domain to be much more complex. What the relevant dimensions of optimisation are is not clear, but the kind of research Gallego and his colleagues are doing might shed some light on this. Kudos for starting this discussion!
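To put "radically different" into numbers: taking the two OPTIONAL shares quoted above at face value, their ratio is

$$\frac{16\%}{0.41\%} \approx 39,$$

i.e., relative to the number of queries, OPTIONAL was roughly 39 times more common in the DBpedia workload than in the Dogfood one.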



  1. Adding to your conclusion, I would say this paper shows that query optimization in RDF stores has to be implemented in a flexible way: The query engine has to analyze the query workload automatically and adapt its optimization strategies dynamically.


    Comment by Olaf Hartig — March 29, 2011 @ 3:13

  2. […] Herman – a regular contributor in our monthly SemanticLink podcast – posted his thoughts on an interesting new paper by M.A. Gallego and others entitled “Empirical Study of […]

    Pingback by Studying Real-World SPARQL Queries - semanticweb.com — March 29, 2011 @ 22:31

  3. […] that is obviously very personal, maybe the most important takeaway is actually close to the blog I wrote yesterday on the empirical study of SPARQL queries. And this is the general fact that we are at the point […]

    Pingback by LDOW2011 Workshop « Microformats & the semanantic web — March 30, 2011 @ 4:06

  4. […] Empirical study of real-world SPARQL queries (ivan-herman.name) […]

    Pingback by Save the Data « DECISION STATS — April 3, 2011 @ 22:08


