July 4, 2009

Dagstuhl Workshop on Semantic Web

I have just come back from the Workshop “Semantic Web: Reflections and Future Directions”, held in Dagstuhl, Germany. Organized by John Domingue, Rudi Studer, Jim Hendler, and Dieter Fensel, the workshop positioned itself as the “second release” of a similar workshop that was held at the same place 10 years ago.

The first two days of the workshop were more traditional, in the sense that they consisted of a series of presentations and panels. This was the “reflection” part of the workshop: looking back at 10 years of history as well as a peek into the current state of the art. It was interesting but, for my taste, a bit too long; the programme of the two days could have been compressed into one or, say, one and a half days. That would have given more time to the “future directions” part, ie, discussions in break out groups on various topics. I enjoyed those a lot: free flowing discussions on various topics, helping to exchange ideas, experiences, pointers to other work and results, and crystallizing possible future R&D issues. These discussions took place in a very pleasant, relaxed atmosphere among people who mostly knew one another already, ie, we could really concentrate on issues. Each group formulated a number of research goals for the years to come; some groups also came up with more practical steps and goals.

As far as I know, the workshop organizers plan to collect all those research issues in some more coherent form, so we should watch this space. In what follows I just collect some issues that I took away from the workshop without the goal of being exhaustive; indeed, there were 6-7 parallel break out groups.

Issues around Web scale. This is clearly one of the major topics of the day. What happens when one has to deal with data containing billions of triples, when the data (ie, the triples) are “dirty”, ie, inconsistent, faulty, etc? Think of the Linked Open Data cloud, of data coming from sensor networks, mobiles, etc. Do we have to re-think all the notions that the Semantic Web inherited from the logic world, ie, completeness, the meaning and consequences of consistency, what it means to get results for a query, etc? This is one area where opinions tend to diverge a lot. Some would prefer to completely put aside the traditional logic approaches (rules, description logic, ontologies, OWL, etc), while others argue that the advances in computing, in reasoning engines and methods are (and are expected to be) such that these methods should still be just as usable as before. As always, I hate any black-and-white statements… I do not think dismissing an area of technology is the right way but, also, other avenues or new viewpoints should be explored, too (e.g., how to react to inconsistencies, or trying to get possibly incomplete results but whatever can be obtained within, say, 2 minutes, that sort of thing). Which approach is used depends very much on the application. Anyway… Web scale is a major issue, everybody agrees on that!
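To make the “whatever can be obtained within, say, 2 minutes” idea a bit more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: triples are plain tuples of strings, and the names `bounded_query`, `matches`, `DATA` and the `clock` parameter are hypothetical, not part of any Semantic Web toolkit. The point is simply that the caller gets a partial answer together with a flag saying whether the answer is complete.

```python
import time
from typing import Callable, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
# A pattern uses None as a wildcard for subject, predicate, or object.
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


def matches(triple: Triple, pattern: Pattern) -> bool:
    """True if every non-wildcard position of the pattern equals the triple's."""
    return all(p is None or p == t for t, p in zip(triple, pattern))


def bounded_query(
    triples: Iterable[Triple],
    pattern: Pattern,
    budget_seconds: float,
    clock: Callable[[], float] = time.monotonic,  # injectable, for testing
) -> Tuple[List[Triple], bool]:
    """Scan the triples until the time budget runs out.

    Returns (results, complete). 'complete' is False when the scan was cut
    short, ie, the results may be missing matches.
    """
    deadline = clock() + budget_seconds
    results: List[Triple] = []
    for triple in triples:
        if clock() >= deadline:
            return results, False  # out of time: hand back what we have
        if matches(triple, pattern):
            results.append(triple)
    return results, True


# Hypothetical toy data; a real store would hold billions of triples.
DATA: List[Triple] = [
    ("ex:doc1", "dc:title", "A"),
    ("ex:doc2", "dc:title", "B"),
    ("ex:doc3", "dc:title", "C"),
    ("ex:doc1", "dc:creator", "X"),
]

if __name__ == "__main__":
    titles, complete = bounded_query(DATA, (None, "dc:title", None), 120.0)
    print(titles, "complete" if complete else "possibly incomplete")
```

Whether such an “incomplete but timely” answer is acceptable is, again, very much dependent on the application.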

Interaction. This is one of the break out groups that I did not attend, unfortunately. And it is obviously a hugely important direction of future R&D. Many Semantic Web applications today are such that their user interface is just standard because all Semantic Web related work happens behind the scenes, usually on the server side. However, in the long term, there is a clear need for programs that could somehow directly show the data in some friendly way, programs that adapt themselves to the nature of the data. Not only for experts, but also for laypeople. Such environments may not only include extensions of current browsers but, eg, full desktop environments. Sort of intelligent, data-oriented user interfaces. A major research problem (user interface methodology is always a major problem, whether related to Semantic Web or not…), but also a hugely exciting research and development opportunity!

Vocabularies. There was a separate group on the management of vocabularies, which has identified a number of R&D issues: how does one describe a vocabulary, its interdependence with other vocabularies, how does one rank vocabularies… These are all fundamental questions to solve to be able to find vocabularies for a specific purpose, to make specialized search. There are also issues around archiving, providing stable URIs; last but not least (and this goes way beyond vocabularies only) major legal issues on what type of attribution, copyright or other legal machinery is to be used with vocabularies (it was good to have Tom Heath, who could tell us a bit about the datacommons’ approach). As an example of the many technological problems arising, the break-out group coined the term “cherry picking of terms”. Although OWL has a mechanism for import, the practice of the RDF world is to use (ie, “cherry pick”) vocabulary terms (predicates, classes, etc) from various different vocabularies without necessarily taking the whole vocabulary, and certainly without using the owl:imports predicate (think of the routine usage of dc:title without importing the full Dublin Core vocabulary). How would a reasoner treat those? It may be a little bit easier to use a more rule based approach (like OWL RL) although it is not obvious how to cherry pick just the right amount of information on, say, a predicate. But Ian Horrocks also drew my attention to formal ontology modularization work that might be very relevant here; item added to my “to-be-read” list…
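The “cherry picking” situation can be illustrated with a small sketch in Python. This is only a toy under explicit assumptions: the graph is a list of plain string triples, the helper names `namespace_of` and `cherry_picked_namespaces` are hypothetical, and the sketch assumes that the IRI used with owl:imports is the same as the vocabulary's namespace (which does not always hold in practice). It lists the vocabularies whose terms are used without the vocabulary ever being imported.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]

OWL_IMPORTS = "http://www.w3.org/2002/07/owl#imports"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DC = "http://purl.org/dc/elements/1.1/"
FOAF = "http://xmlns.com/foaf/0.1/"


def namespace_of(uri: str) -> str:
    """Crude namespace split: everything up to the last '#' or '/'."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return uri[: cut + 1]


def cherry_picked_namespaces(triples: Iterable[Triple]) -> Set[str]:
    """Namespaces whose terms are used although the vocabulary is not imported."""
    triples = list(triples)
    # Assumption: the object of owl:imports is the vocabulary's namespace IRI.
    imported = {o for _, p, o in triples if p == OWL_IMPORTS}
    used: Set[str] = set()
    for _, p, o in triples:
        if p == OWL_IMPORTS:
            continue
        used.add(namespace_of(p))          # predicates picked from a vocabulary
        if p == RDF_TYPE:
            used.add(namespace_of(o))      # classes picked from a vocabulary
    # RDF and OWL themselves are not counted as cherry-picked vocabularies.
    builtin = {namespace_of(RDF_TYPE), namespace_of(OWL_IMPORTS)}
    return used - imported - builtin


# Hypothetical example data: dc:title is used but Dublin Core is never imported.
GRAPH = [
    ("http://example.org/doc", DC + "title", "Dagstuhl report"),
    ("http://example.org/doc", FOAF + "maker", "http://example.org/me"),
    ("http://example.org/onto", OWL_IMPORTS, FOAF),
]

if __name__ == "__main__":
    print(cherry_picked_namespaces(GRAPH))
```

For the toy graph above, only the Dublin Core namespace is reported, since FOAF is explicitly imported. A reasoner facing such a graph has no axioms for dc:title unless it decides, by some policy of its own, to fetch them.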

Provenance (and trust). This is one of the issues that popped up in all other break out groups; in consequence a separate one was formed on the second day of discussions. It is indeed one of the questions that anyone who talks about the Semantic Web gets; in my personal view, having a clear “story” to tell about provenance is essential for the further deployment of this technology. The discussion in the group was really interesting because this issue raises a number of other questions, like the overall relationship of cryptographic techniques and the Semantic Web, what it means to have trust in context, what the relationships are to temporal or uncertainty reasoning, etc, etc, etc. It was also interesting for me to hear about other work, like the Open Provenance Model, although some of it was not necessarily done by Semantic Web people (but, eg, by the database community). We agreed that a Wiki page will be created (probably at RPI, set up by Deb McGuinness) to collect information on this subject, and forming a W3C Incubator Group might also be on the cards to provide a more thorough state-of-the-art overview. A long list of additional items for my “to-be-read” pile is coming…

And, of course, it was also good to meet a bunch of people and discuss things at lunch or dinner. This type of interaction is really fruitful. And there was also intensive twittering going on (using the #swdag2009 tag, pointing to a bunch of other resources) although this time I did not twitter too much because I had problems with my wireless card :-(

It was a good meeting; thanks to the organizers. It would be good not to wait another 10 years for the next incarnation of this event…


