As mentioned in my previous blog post on LDOW2012, Hannes Mühleisen and Chris Bizer, as well as Peter Mika and Tim Potter, have published findings on structured data in HTML based on the analysis of Web crawl results. Both Hannes’ and Peter’s papers are now online. Hannes and Chris based their results on CommonCrawl, whereas Peter and Tim rely on Bing.
Although there is some controversy over the usability of these crawls and the interpretation of their results (see Martin Hepp’s mail, the answer by Peter Mika, and the resulting thread on the mailing list), I think what is really important is the big picture that emerges from both sets of results: no one can reasonably dispute the importance of structured data in HTML any more. Although I vividly remember a time when this was a matter of bitter discussion, I think we can put this issue behind us now. I do not think I can summarize it better than Peter did in another of his emails:
…both studies confirm that the Semantic Web, and in particular metadata in HTML, is taking on in major ways thanks to the efforts of Facebook, the sponsors of schema.org and many other individuals and organizations. Comparing to our previous numbers, for example we see a five-fold increase in RDFa usage with 25% of webpages containing RDFa data (including OGP), and over 7% of web pages containing microdata. These are incredibly impressive numbers, which illustrate that this part of the Semantic Web has gone mainstream.