November 14, 2007

ISWC2007 — first day

It is always difficult to write about a conference… One has to make a selection of sessions, there is jet-lag involved (big time:-), hallway conversations instead of sessions, etc. This means that one will clearly unfairly forget some papers and presentations. Well, my apologies, but that is the way it is…

That being said: the keynote of Brewster Kahle on the Open Library project was certainly interesting. For those who may not know, these guys have no less of a goal than to make most of the culture of this World (books, music, video, etc.) available on the Web, for eternity, and free of charge. A fascinating goal, though they are clearly facing huge hurdles, the least of those being technical. The really tough problems, as Brewster emphasized several times, are legal: copyright and similar issues…

I must admit that such projects always make me a little bit, well, suspicious. Clearly, Brewster and his team are making huge efforts to be as international as possible, they work with people in Egypt, China, countries in Africa and South America, etc… nevertheless, I am always a bit afraid of other cultures being left behind by such an endeavour. With the best will of all involved, the result will be highly influenced by those who really have the means and the possibilities to participate; i.e., with the way the World is shaped today, it will be dominated by an Anglo-Saxon, primarily American, view of the World. What about small languages and cultures? Of course, this is already a problem: who has read the poetry of, say, Miklós Radnóti? A Hungarian poet, whose war-time poetry is one of the most dramatic accounts of life in work camps during the war… but he wrote in Hungarian. Difficult to translate properly (the wiki page refers to some translations, but, well…). On the other hand, imagine a world where this open library is the reference point for, say, literature around the World, including for Hungarians, merely because this will be the reference on the Web? One can already see a similar effect with Wikipedia; in spite of the local language versions, the English version dominates, and, well, not only “if it is on Wikipedia, it must be true” but also “if it is not on Wikipedia, it does not exist”…

Well, enough whining. I am really hopeful that these guys will prove me wrong. However, another point is more technical and exciting; let me quote here directly from the Linking Open Data Wiki page:

Brewster Kahle from The Internet Archive gave a most inspiring talk on “Universal Access to Human Knowledge”. He proposed a challenge to the SemWeb community to work with them to interlink their Open Library project into the SemWeb. It is a gold mine of data for the LOD group. TomHeath & I (DavidPeterson) had a quick chat with Brewster and he is extremely interested in this work and opening a line of communication. Even to the point of putting a new /RDF/ style link into their URIs for books (ex: http://archive.org/details/owlandpussycat00leariala). They already have a mass of metadata.

And it seems that this discussion has already started on the LOD mailing list. Yes, it would be a formidable addition to the Open Data set!

I quite liked the RDFSync paper of Morbidoni et al. It is one of those papers which have a relatively simple and nevertheless powerful message: one can partition an RDF graph into Minimum Self-contained Graphs (MSGs); each of those partitions can be individually digitally signed and checksummed (using Jeremy Carroll’s algorithm); the full graph can therefore be represented (uniquely!) by a lexicographically ordered set of such checksums. Why is that good? Because if copies of RDF graphs are to be synchronized, one can compare those individual checksums, and move over only those parts of the graphs where the checksums are different (i.e., the underlying partition is different). When data are duplicated, for example, the win can be huge. As I said, a basically simple and nevertheless very powerful approach. Check it out. It is worth it.
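
To make the idea a bit more tangible, here is a rough sketch in Python. It is my own illustration, not the code from the paper: the function names (partition, checksum, sync_plan) are made up, triples are plain tuples of strings, a SHA-256 over the sorted triples merely stands in for Jeremy Carroll's canonicalization algorithm, and it assumes that blank nodes carry the same labels in both copies (which is exactly what one cannot assume in real life).

```python
import hashlib
from collections import defaultdict


def is_blank(term):
    # Assumption: blank nodes are written with the usual "_:" prefix.
    return term.startswith("_:")


def partition(triples):
    """Group triples that are connected through blank nodes.

    A triple without blank nodes forms a partition of its own.
    """
    triples = set(triples)
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Blank nodes appearing in the same triple belong to the same partition.
    for s, p, o in triples:
        blanks = [t for t in (s, o) if is_blank(t)]
        if len(blanks) == 2:
            parent[find(blanks[0])] = find(blanks[1])

    parts = defaultdict(list)
    for triple in triples:
        s, p, o = triple
        blanks = [t for t in (s, o) if is_blank(t)]
        key = find(blanks[0]) if blanks else triple
        parts[key].append(triple)
    return list(parts.values())


def checksum(part):
    # Stand-in for a proper canonical hash: sort the triples, then hash.
    canonical = "\n".join(" ".join(t) for t in sorted(part))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def sync_plan(source, target):
    """Return (partitions to send to the target, partitions the target drops)."""
    src = {checksum(part): part for part in partition(source)}
    dst = {checksum(part): part for part in partition(target)}
    to_send = [src[c] for c in sorted(src.keys() - dst.keys())]
    to_drop = [dst[c] for c in sorted(dst.keys() - src.keys())]
    return to_send, to_drop


source = [
    ("ex:a", "ex:knows", "_:b1"),
    ("_:b1", "ex:name", '"Ann"'),
    ("ex:a", "ex:age", '"30"'),
    ("ex:c", "ex:p", "ex:d"),
]
target = [
    ("ex:a", "ex:knows", "_:b1"),
    ("_:b1", "ex:name", '"Ann"'),
    ("ex:a", "ex:age", '"29"'),
    ("ex:c", "ex:p", "ex:d"),
]

to_send, to_drop = sync_plan(source, target)
print(to_send)  # only the partition holding the changed "age" triple
print(to_drop)
```

In this toy example three of the four triples never have to travel: only the partition whose checksum differs is exchanged.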

I also quite liked the paper of Alani et al., reporting on a study of a pilot project with various governmental organizations in the UK to use SW technologies in their operations. This is a typical use case paper: what are the (possibly non-technical) hurdles to overcome, how to create a value proposition without scaring these organizations away, etc. I think it is a good reference paper for everyone who intends to use SW technologies in practical projects with the participation of non-SW-savvy partners.

There were, obviously, a number of papers where I got the feeling “there is something to check out here at some point”, but I could not necessarily follow all the technical details during the presentations. This was the case, for example, for the paper of Zeginis et al. on computing deltas of RDF Models, or the details of another interesting case study of Srinivas et al. on how to choose clinical trial candidates using complex medical ontologies, patient records and (highly non-trivial) inferencing. The paper of Tamilin et al. on Heterogeneous Ontology Environments was also interesting; I must admit it is the first time I saw a formalism on distributed DL processing. All these leave me some homework…

See what the next day will bring!


