Ivan’s private site

September 3, 2007

Yet another RDFa processor…

Filed under: Code,Python,Semantic Web,Work Related — Ivan Herman @ 17:30

The summer months were quite relaxed, so at some point I decided to write an RDFa processor (in Python). I know, I could have used Elias Torres’ parser (also included in RDFLib), but my goal was a bit different. At the time, the RDFa task force was having long technical discussions on the details of the main RDFa parsing/processing rules, and I wanted to test whether those rules, as described at that moment, were correct and implementable (they were). And, while I was at it, I decided to finish the implementation properly to make it generally usable.

The result is a Python package (it can also be downloaded as a compressed tar file) which uses RDFLib to build up the graph as well as for the final serialization. To the best of my knowledge the parser follows the latest (not yet published :-( ) version of RDFa, and I definitely plan to keep it that way in the future. There is also a “distiller” that can be used on-line. The implementation (mainly for the distiller) is not complete: indeed, I should work on proper error handling rather than relying on Python’s xml minidom package simply throwing an exception in the user’s face for, say, invalid XHTML…
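To make the error handling point concrete, here is a minimal sketch of what catching the minidom failure could look like. It is not code from the package; the function name `parse_xhtml` and the wording of the message are assumptions made for illustration, and it only covers well-formedness errors reported by the underlying expat parser.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def parse_xhtml(source):
    """Parse an XHTML string.

    Returns (dom, None) on success, or (None, message) if the input
    is not well-formed XML. Hypothetical helper, for illustration only.
    """
    try:
        return minidom.parseString(source), None
    except ExpatError as err:
        # ExpatError records where the first well-formedness error occurred.
        message = "Not well-formed XHTML (line %d, column %d): %s" % (
            err.lineno, err.offset, err)
        return None, message


dom, error = parse_xhtml("<html><body><p>broken</body></html>")
if error is not None:
    print(error)
```

A caller such as the distiller could then show the message to the user instead of a raw traceback.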

I also decided to test it on something more complicated, so I created an RDFa version of my foaf data. I now have an XHTML file with my foaf data that can be used (either via the distiller or directly using Python) to generate my RDF/XML foaf file. It shows one of the real advantages of RDFa: the foaf data mixes quite a number of different vocabularies, but that is absolutely no problem for something like RDFa. In any case, I do not intend to edit my foaf data in RDF/XML any more…
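As a rough illustration of why mixing vocabularies is unproblematic, the sketch below walks a small XHTML fragment with minidom and expands prefixed `property` values using the `xmlns:` declarations in scope. It is a deliberately simplified toy, not the package's implementation and not the full RDFa processing rules (it ignores `rel`, `rev`, `href`, `content`, datatypes and more). The sample document, the person's name and the function name `collect_literals` are all made up for the example.

```python
from xml.dom import minidom

# Hypothetical sample: two vocabularies (FOAF and Dublin Core) in one page.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:foaf="http://xmlns.com/foaf/0.1/"
      xmlns:dc="http://purl.org/dc/elements/1.1/">
  <body about="#me">
    <span property="foaf:name">Alice Example</span>
    <span property="dc:title">My FOAF page</span>
  </body>
</html>"""


def collect_literals(node, prefixes=None, subject=""):
    """Yield (subject, predicate URI, literal text) for each property attribute."""
    # Copy the mapping so declarations on this element only affect its subtree.
    prefixes = dict(prefixes or {})
    attrs = node.attributes
    for i in range(attrs.length):
        attr = attrs.item(i)
        if attr.name.startswith("xmlns:"):
            prefixes[attr.name[len("xmlns:"):]] = attr.value
    # An about attribute sets the subject for this element and its descendants.
    if node.hasAttribute("about"):
        subject = node.getAttribute("about")
    if node.hasAttribute("property"):
        prefix, _, local = node.getAttribute("property").partition(":")
        if prefix in prefixes:
            text = "".join(child.data for child in node.childNodes
                           if child.nodeType == child.TEXT_NODE)
            yield subject, prefixes[prefix] + local, text
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            yield from collect_literals(child, prefixes, subject)


triples = list(collect_literals(minidom.parseString(SAMPLE).documentElement))
for triple in triples:
    print(triple)
```

Each vocabulary only needs its own prefix declaration; the predicates end up as full URIs, so they can sit side by side in the same graph.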



  1. Congrats! Great job.

    Indeed, a proper error handling (and explanation of errors) might be a good idea …

    When I run my XHTML/RDFa FOAF page (http://sw-app.org/mic.xhtml) through it, currently a nasty error pops up … might be too much for an RDFa novice 😉


    Comment by Michael Hausenblas — September 3, 2007 @ 19:33

  2. so presumably it fails with HTML[45], and 99% of the content on the web.

    i often wonder why people dont use HPricot, jQuery and similar for RDFa parsing, thus piggyback on parsers that work with realworld content. i guess they like tools that fail..

    Comment by carmen — September 4, 2007 @ 2:52

  3. Michael: yes, I will have to find some time for more proper error handling… As for the specific error issue: at present, the CURIE syntax is not used in RDFa. The rdf typing that you want to achieve should use the instanceof attribute. As I said, the new RDFa syntax has yet to be published (hopefully in the coming weeks).

    Comment by Ivan Herman — September 4, 2007 @ 8:52

  4. Carmen: no, it is not that I like tools that fail. But, as I said in my blog, the primary purpose for writing this code was to be a proof of concept on the core RDFa processing and, for me, using a proper XML DOM was the quickest way of doing that. I would be thrilled if somebody (you?) wrote a version using some other tools…

    Comment by Ivan Herman — September 4, 2007 @ 8:56

  5. The distiller works great. (Tried it with some documents from eurlex.nu). Are you planning on adding any code samples or tutorials for your RDFa parser? I had a look but can’t find any.

    Comment by Peter Krantz — April 17, 2008 @ 23:39

  6. Peter: indeed, I do not have any. For the basic usage, the RDFa documents (primer, etc) should do; I should indeed write some words on how the distiller could be incorporated into a Python application. On my to-do list…



    Comment by Ivan Herman — April 18, 2008 @ 9:12
