Ivan’s private site

July 5, 2008

Low hanging… dogfood?

Filed under: Semantic Web,Work Related — Ivan Herman @ 10:52

This should really be a comment on Péter’s comment on my previous blog post, but it has grown into a separate topic, so I decided to give it a post of its own. Besides, it is a bit too long for a comment…

To summarize, the JWS journal has a pre-print service that runs, as its back end, the openacademia software developed by Péter and his friends. This also means that the JWS data should be accessible in RDF, probably following the SWC ontology (although I have not found a pointer on the JWS site).

But, if so, don’t we have a low-hanging, hm, dogfood here for the SW community? We begin to have most of the recent SW publications in RDF somewhere on the net. Beyond the JWS papers, the Semantic Web Conference Corpus site not only includes the RDF data for ISWC, ESWC, ASWC, and some related workshops, but it also has a SPARQL endpoint. I know that Daniel Schwabe is working on getting the WWW2008 conference material into a similar format and, hopefully, we can have the material for the WWW200X conferences available somewhere on the Web. I maintain a list of books on a wiki (well, hopefully, the community maintains it…) but I also keep the same list on Bibsonomy, and the list is therefore available in RDF, too (again, using the SWC ontology). And there might be other resources that I do not know about.

So… the easy thing to do is to integrate all this RDF data via some SPARQL endpoint. Because the data is already in RDF, that does not cost anything (although I am not 100% sure all the data follow the same vocabulary, so querying might be a bit tricky). But what I would love to see is a general service with a nice user interface on top of the data. I want to be able to search easily through the data without writing SPARQL queries or diving into the RDF graph directly with an RDF browser. Scale can be tricky, though. A few weeks ago David Huynh created a nice Exhibit page for the ESWC2008 data. It really looks great and helps a lot in searching the data. However… as an experiment I copied his file and added a few more datasets from the SW Corpus. Well… it turned out to be too much for Exhibit (I may have made a mistake somewhere, of course, but I do not believe Exhibit is good enough for that amount of data). That is, a more dedicated interface should be created to provide this service for end users (maybe along the lines of openacademia?).
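To make the "integrate via some SPARQL endpoint" idea a bit more concrete, here is a minimal sketch of what a client request could look like. It only builds the request URL and sends nothing over the network. The endpoint address is a placeholder, and the use of dc:title and foaf:maker is an assumption: as noted above, the datasets may well not share one vocabulary.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint: a stand-in for whichever service would host the merged data.
ENDPOINT = "http://example.org/sparql"

# Assumption: papers carry dc:title and foaf:maker. The real datasets may use
# other properties, in which case the graph pattern has to be adapted.
QUERY = """\
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?paper ?title ?author
WHERE {
  ?paper dc:title ?title ;
         foaf:maker ?author .
  FILTER regex(?title, "ontology", "i")
}
LIMIT 20
"""


def build_request_url(endpoint: str, query: str) -> str:
    """Return a GET URL that carries the query in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(ENDPOINT, QUERY)
print(url)
```

A user interface of the kind I am asking for would generate such queries behind the scenes, so that nobody has to type them by hand.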

And, of course, it is easy to have nice ideas on how to add new features with all the data around… For example, the book wiki page has references to Chris Bizer’s bookmashup data via the ISBN numbers. We could use DBpedia and Geonames to access information on conference cities, FOAF data on authors and editors… We could use some good service (like MOAT) to have a uniform tagging system for the papers’ topics, or use Ed Summers’ Library of Congress Subject Headings in SKOS… In other words, this could become a nice LOD application, too! (Hm, maybe it is not such a low-hanging dogfood after all?)

What I would really like is to get a comment on this blog saying “you uninformed fool, this already exists here and here!”. I would humbly stand corrected, and would happily use the service. Anyone with this comment?



  1. Hi Ivan,

    you uninformed fool, this has already been done for the ESWC2008 conference and can be accessed here http://data.semanticweb.org/conference/eswc/2008/html

    1. The conference location is interlinked with DBpedia and Geonames. See http://beckr.org/marbles?lang=en&uri=http%3A%2F%2Fdata.semanticweb.org%2Fconference%2Feswc%2F2008

    2. Papers are interlinked with Revyu and the Semantic Web Community Wiki. See for instance

    3. Paper topics are interlinked with DBpedia. See URL above.

    4. Authors are interlinked with their FOAF profiles (not all, but a fair amount of them). See

    The same is currently being done for the WWW2008 conference. The data is not yet available as Linked Data, but there is already a dump at http://data.semanticweb.org/dumps/conferences/www-2008-complete.rdf

    There is also a PHP script for converting conference metadata from the EasyChair conference management system to Linked Data which will be published shortly on the Dogfood page, so that other conferences and workshops can do the same if they like.



    Comment by Chris Bizer — July 5, 2008 @ 11:56

  2. I really think scholarship is a perfect use case for the semantic web, and that you’re pushing in the right direction. There’s also the RDF-based publishing platform Ambra, which is used by PLOS One.

    But more broadly, on the dogfood question, I’m still waiting to see when semweb bloggers are going to exploit the potential of RDFa. Having SPARQL plug-ins for WordPress is one thing, but it doesn’t really go very far for end users. Maybe a jQuery plugin that could layer additional SPARQL-derived information on top of content links? That could also ultimately add a lot of value to scholarly content.

    Comment by Bruce D'Arcus — July 5, 2008 @ 15:33

  3. It is nice we have more and more ways to publish our publications 🙂 in RDF.
    I would like to point you to the JeromeDL solution developed at DERI, as another low hanging dog food 🙂
    DERI has a pre-print server for our scientific publications (http://library.deri.ie/) and for our library of physical books (http://books.deri.ie/).
    Everything is RDF (both legacy semantics and social annotations) – check out the RDF buttons 🙂 and with the upcoming version of JeromeDL 2.5 (to be delivered by mid-autumn 2008) we will also feature a SPARQL endpoint (due to the transition from Sesame 1.0 to Rdf2Go/Sesame 2.0), OAI-ORE and LOD support.

    I would appreciate any comments on what more the community would like to see in the next releases of JeromeDL.

    Comment by Sebastian — July 6, 2008 @ 17:20

  4. Hi Ivan,

    A bit of background about openacademia.org:

    It started with us getting tired of having to put together a joint publication list for our group at the VU. Before openacademia, this typically went by each of us mailing our BibTeX to our secretary, who then did the merging with varying success. openacademia first replaced this process by having two components: one for turning BibTeX into RDF [1] and a smusher for merging publication metadata. Then we realized we might as well use this data to also publish the joint list on the group’s website, and once we were at it, why not create an RSS feed as well? openacademia got a third component, with which you can execute a SPARQL query on the store; it outputs an RSS 1.0 feed which contains the metadata and also preserves the ordering of the query results (so that, for example, you can subscribe to the latest publications). This is pretty cool: now you can add this to your favorite feed reader and watch what your immediate colleagues are writing.
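For readers who have not met the term, "smushing" roughly means folding records that describe the same thing into one. The sketch below is only an illustration of that idea, not openacademia's actual code: it treats two entries as the same publication when their titles match after normalizing case and whitespace (an assumed rule), and the sample entries are made up.

```python
def normalize(title: str) -> str:
    """Lower-case the title and collapse runs of whitespace."""
    return " ".join(title.lower().split())


def smush(records):
    """Merge records whose normalized titles are equal.

    Assumed rule for illustration only: the first record seen supplies the
    title, author lists are unioned in order of appearance, and any other
    field keeps the first value seen.
    """
    merged = {}
    for rec in records:
        key = normalize(rec["title"])
        target = merged.setdefault(key, {"title": rec["title"], "authors": []})
        for author in rec.get("authors", []):
            if author not in target["authors"]:
                target["authors"].append(author)
        for field, value in rec.items():
            if field not in ("title", "authors"):
                target.setdefault(field, value)
    return list(merged.values())


# Made-up entries, as if two group members had mailed overlapping BibTeX files.
SAMPLE = [
    {"title": "An Example Paper", "authors": ["A. Author"], "year": "2008"},
    {"title": "an  example paper", "authors": ["A. Author", "B. Author"],
     "booktitle": "Example Workshop"},
    {"title": "Another Paper", "authors": ["C. Author"]},
]

result = smush(SAMPLE)
for publication in result:
    print(publication)
```

A real smusher working on RDF would more likely key on identifying properties than on titles alone, since distinct papers can share a title.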

    And then why not give the same functionality to individual researchers? If you have a BibTeX file you can use openacademia to generate HTML from just that single file. Yep, the HTML will contain RDFa. You can also do this dynamically, so that whenever you update your web-accessible BibTeX file, your homepage updates as well.

    But this is not all… since we had an RDF crawler, we threw that in as well… so all you have to do to make yourself part of the openacademia universe is to add to your FOAF profile. Alternatively, you can also point to the service at [1] and say

    <rdfs:seeAlso rdf:resource="http://www.cs.vu.nl/cgi-bin/mcaklein/bib2swrc2.pl?url=http%3A%2F%2Fwww.example.org%2Fmypubs.bib"/>

    but slightly more techie…
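The only fiddly part of that snippet is the percent-encoded `url` parameter. As a small sketch, the target address can be built like this; the service address and the example BibTeX location are the ones shown above, while the helper name `see_also_target` is made up for illustration.

```python
from urllib.parse import quote

# Conversion service address, taken from the snippet above.
SERVICE = "http://www.cs.vu.nl/cgi-bin/mcaklein/bib2swrc2.pl"


def see_also_target(bibtex_url: str) -> str:
    """Return the service URL with the BibTeX location percent-encoded.

    safe="" makes quote() also encode ':' and '/', which is what the
    snippet above does.
    """
    return SERVICE + "?url=" + quote(bibtex_url, safe="")


print(see_also_target("http://www.example.org/mypubs.bib"))
```

The printed value is what goes into the rdfs:seeAlso element of the FOAF profile.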

    If our crawler finds it, it will be added to the central openacademia repository. Btw, you can also use FOAF’s Group class to define your group, so that others can subscribe to your group (and you can independently change who is in the group and who is not). At the VU we went all the way to generate FOAF group definitions from our LDAP database. Btw, you can also subscribe to the latest publications of all your friends, based on foaf:knows relations in your profile.

    And lastly we built the search interface that is openacademia.org… a simple search/browsing/visualization interface for the central repository. There are a few other nice pieces such as the automated tagging of publications using term extraction or the importing of comments from Technorati. This latter one is actually kind of fun, because it gives an easy way to discover what people are blogging about your pubs. Yes, you can subscribe to publications by yourself, recently commented on by others…

    Some people have already asked if all this is LOD compatible… the answer is that, as it is, no: the service at [1] generates URIs like


    where key is the publication key from the BibTeX file. This is problematic because most people host their BibTeX files along with their homepage and they have no access to configuring the Apache server of their university, research institute etc. Obviously, JWS could do something like


    However, you have to convince a publisher… conferences might be easier targets.

    And now the sad part about openacademia: we have no time any more to work on it. I’m 110% busy (but loving it). Hoping someone will pick it up. It’s open source and nicely modularized, everything is a web service, and we went to some extremes to use standards wherever we could. (The search result display you see on the search interface is generated by XSLT on the RSS feed, on the client side… that’s why you can choose the presentation template, e.g. list view, full view, google view…)

    Looking for someone to pick up the ball and start running with it 😉

    [1] http://www.cs.vu.nl/~mcaklein/bib2rdf/

    Comment by Peter Mika — July 6, 2008 @ 19:52
