Ivan’s private site

October 30, 2009

ISWC2009 4-5

Filed under: Semantic Web,Work Related — Ivan Herman @ 0:59

Fourth day

Shame on me, but I missed the morning keynote… I was a bit late arriving at the conference site and I got stuck in a conversation at breakfast. Things happen…

The most notable event in the morning, at least for me, was the SPARQL WG panel. All members of the Working Group (me included) were on the panel and the room was full. I mean really full: people were standing at the back. I regard that as a success in itself; it shows not only the overall importance of SPARQL, but also the real interest in the new version, ie, SPARQL 1.1 (in case you have missed it, the first working draft was published just a few days ago). Lee Feigenbaum (co-chair of the group) gave a quick overview of the new features and then the questions came.

The difficulty of the SPARQL 1.1 work is that it has to find a balance between what is realistic to standardize in a relatively short time frame and what would be good to see in a new query language. As a consequence, there are features that the community has discussed but that have not made it into the document, or only in a simplified form. That came up during the discussion, but I had the impression that the audience, by and large, understood this balance. Actually, for some, the set of new features was already too much for an efficient implementation. I have the feeling that the WG will have to publish a separate conformance document (a bit like OWL 2 has), because there is a certain confusion about whether a conforming SPARQL implementation will have to implement, say, update or inference regimes. That clearly came up through the questions. Anyway, remember one email address (yes, it is a bit of a mouthful): public-rdf-dawg-comments@w3.org. This is where comments on SPARQL 1.1 have to be sent!

I chaired a session on the use track in the afternoon. The paper of Daniel Elenius et al on reasoning about resources (for military exercises) was interesting to me because it was based on reasoning with relatively large OWL ontologies plus rules. The OWL ‘side’ was not very complex (Daniel referred to DLP; today I would probably say OWL 2 RL) but it was extended with extra rules. What this shows is that once RIF is finished and published, the combination of OWL with RIF may become very important for tons of practical applications. (As an aside, a nice little joke from Daniel: what is the system used by the military today when planning for exercises? The system is called BOGSAT. It stands for ‘Bunch Of Guys Sitting Around a Table’…)

Roland Stuhmer gave a presentation in a very different style on how user events (clicks, combinations of clicks, etc) can be collected, categorized, integrated into an application, and analyzed with some rules for, eg, targeted ads. The system is based on harvesting not only the structure of the Web page, but also the annotations appearing in the Web page via RDFa. The result is an RDF structure describing the events that can be sent to a server, analyzed locally, distributed, etc. A nice usage of RDFa, and it also shows how important it is to have a Javascript API that can retrieve the RDF triples from the RDFa structure attached to a specific node. (B.t.w., the old graphics standards of the 80’s and 90’s, called GKS or PHIGS, had notions of combined event structures with different event types. I do not remember all the details any more, but it may be worth looking at those again in a modern setting.)

Personally, the highlight of the day was the presentation of the semantic web challenge finalists. I was a member of the jury, which meant that I had to review the submissions in advance, and we had two very enjoyable discussions with the rest of the jury on the submissions. We had made the first selection the day before, and this time all finalists gave their presentations and demos. And it was a tough task to choose (that is why we had such long discussions:-) because, well, the submissions were great overall. I do not really want to analyze each of the entries; I do not think it would be appropriate for me in this position. But the winning entry of the challenge, namely TrialX, really made a great impression on me. In short, the application is a consumer-centric tool through which patients can find matching clinical trials in which they want to participate; it also helps those who organize those trials, etc. It is some sort of a matchmaking tool using all kinds of medical ontologies and vocabularies, public health record data and the like. We should realize the importance of this: here is a great Semantic Web application, winner of the challenge, which is really an application, not only a demonstration, already deployed on the Web (soon as an iPhone app, too), and which, to be a bit dramatic, may save lives (and possibly already has). What else do we want as a proof that this technology is not only an academic exercise any more?

Fifth day

Only a partial day for me, as far as the conference goes, because I had to fly out before the end… But I could listen to the last keynote of the conference, ie, that of Nova Spivack.

Not surprisingly, Nova talked about Twine-2, a.k.a. T2. I did not really know what T2 was to be, I only heard that Twine, ie, T1, is moribund. As Nova acknowledged, it is too complicated, it is too hard for users to really figure it out; in fact, most of the users used it for search. Which is not the strongest feature of T1 in the first place.

So T2 is (well, will be) all about semantically backed search. It semantically indexes the Web, with an attempt to extract semantic information from the pages. The user interface would then be some sort of, essentially, faceted interface that would automatically classify the search hit results into different tabs; the user can use these tabs, drill down along other categories, etc. So far nothing radically new, though the user interface Nova showed was indeed very clean and nice. All this is done, internally, via vocabularies/ontologies, using RDF, RDFS, or OWL.

The interesting aspect of T2 (at least as far as I am concerned) is the incorporation of collective knowledge. First of all, T2 will include a system whereby users can add vocabularies that T2 will use in categorization. Users can get back those ontologies in OWL/RDF, they can improve them, etc. The other tool they will provide is a means to help semantically index pages that are, by themselves, not semantically annotated. This can be done via a Firefox extension; users can identify parts of the web pages (I presume, essentially, the DOM nodes) and associate these with classes of specific ontologies. The extension produces an XSLT transformation that can be sent back to the T2 system. Some social mechanism should of course be set up (eg, webmasters annotating their own pages should get a higher priority than third party annotators) but, essentially, it is some sort of a GRDDL transformation by proxy: T2 will have information on how to find a transformation to semantically index specific pages without requiring the modification of the pages themselves (in contrast to GRDDL, where such a transformation has to be referred to from the page itself).
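To make the "GRDDL by proxy" idea a bit more tangible, here is a minimal sketch in Python. It is only my reading of the idea, not T2's actual design: the registry, the `ProxyTransform` class, and the example URLs and class names are all hypothetical, and a plain mapping from element ids to ontology classes stands in for the XSLT transformation that the Firefox extension would produce.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]


@dataclass
class ProxyTransform:
    """Hypothetical stand-in for an XSLT: maps element ids to ontology classes."""
    author: str
    by_webmaster: bool
    id_to_class: Dict[str, str]


# The "proxy" part: transformations are stored next to the index,
# keyed by page URL, instead of being referenced from the page itself.
REGISTRY: Dict[str, List[ProxyTransform]] = {}


def register(url: str, transform: ProxyTransform) -> None:
    REGISTRY.setdefault(url, []).append(transform)


def best_transform(url: str) -> Optional[ProxyTransform]:
    # Assumed social rule: a webmaster's annotation beats a third party's.
    return max(REGISTRY.get(url, []), key=lambda t: t.by_webmaster, default=None)


def index_page(url: str, xhtml: str) -> List[Triple]:
    """Type each annotated element of an unmodified page with its ontology class."""
    transform = best_transform(url)
    if transform is None:
        return []
    triples: List[Triple] = []
    for element in ET.fromstring(xhtml).iter():
        element_id = element.get("id")
        if element_id in transform.id_to_class:
            triples.append(
                (f"{url}#{element_id}", "rdf:type", transform.id_to_class[element_id])
            )
    return triples


PAGE_URL = "http://example.org/page"
PAGE = '<html><body><div id="talk">Keynote</div></body></html>'

register(PAGE_URL, ProxyTransform("third-party", False, {"talk": "ex:Event"}))
register(PAGE_URL, ProxyTransform("site-owner", True, {"talk": "ex:Presentation"}))

print(index_page(PAGE_URL, PAGE))
```

The page itself is never touched; everything needed to index it lives in the registry, which is exactly where this differs from plain GRDDL.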

Of course, the system was a bit controversial in this community; indeed, it was not clear whether T2 would make use of the semantic information that does exist in pages already (microformats, RDFa, …), let alone the Linked Open Data information that is already out there. When asked, Nova did not seem to give a clear answer though, to be fair, he did not specifically say no, and he also said that the semantic index might be put back to the public in the form of linked data. To be decided. It is also not fully clear whether those proxy-GRDDL transformations would be available for the community at large (hopefully the answer is yes…). It will be interesting to see how it plays out (T2 comes out in beta sometime in early 2010). Certainly a project to keep an eye on.

From a slightly more general point of view it is also interesting to note that two out of the three Semantic Challenge winners are also semantic search engines with different user interfaces (though sig.ma and VisiNav definitely do use the LOD cloud, no question there…). Definitely an area on the move!

I had the time and, frankly, the energy to really listen to only one more paper in the regular track, namely the paper on functions of RDF language elements, by Bernhard Schandl. A nice idea: imagine a traditional spreadsheet, where each cell is a collection of resources from an RDF Graph, or a function that can manipulate those resources (extract information, produce a new set of resources, etc). Just like in a spreadsheet, if you modify the underlying graph, ie, the resources in a cell, everything is automatically recalculated. Because, just like in a spreadsheet, a function can refer to the result of another function in another cell, one can do fairly complicated transformations and information extraction quite easily. Neat idea, to be tried out from their site.
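For those who prefer code to prose, the spreadsheet analogy can be sketched in a few lines of Python. This is a toy illustration of the idea as I understood it from the talk, not the tool's actual API: the `Sheet` class, the two helper functions, and the example triples are all made up, and cells are simply recomputed on every read rather than through any clever dependency tracking.

```python
from typing import Callable, Dict, Set, Tuple

Triple = Tuple[str, str, str]


class Sheet:
    """Hypothetical 'triple spreadsheet': each cell is a function over an RDF graph."""

    def __init__(self, graph: Set[Triple]) -> None:
        self.graph = graph  # shared reference, so later graph edits are visible
        self.formulas: Dict[str, Callable[["Sheet"], Set[str]]] = {}

    def set_cell(self, name: str, formula: Callable[["Sheet"], Set[str]]) -> None:
        self.formulas[name] = formula

    def value(self, name: str) -> Set[str]:
        # Recomputed on each read, so the result always reflects the current graph.
        return self.formulas[name](self)


def subjects_of_type(cls: str) -> Callable[[Sheet], Set[str]]:
    """Cell formula: all resources typed with the given class."""
    return lambda sheet: {
        s for (s, p, o) in sheet.graph if p == "rdf:type" and o == cls
    }


def objects_of(cell: str, predicate: str) -> Callable[[Sheet], Set[str]]:
    """Cell formula: follow a predicate from the resources held in another cell."""
    return lambda sheet: {
        o for (s, p, o) in sheet.graph
        if p == predicate and s in sheet.value(cell)
    }


graph: Set[Triple] = {
    ("ex:ivan", "rdf:type", "foaf:Person"),
    ("ex:ivan", "foaf:name", "Ivan"),
    ("ex:bernhard", "rdf:type", "foaf:Person"),
    ("ex:bernhard", "foaf:name", "Bernhard"),
}

sheet = Sheet(graph)
sheet.set_cell("A1", subjects_of_type("foaf:Person"))
sheet.set_cell("B1", objects_of("A1", "foaf:name"))  # B1 refers to A1
print(sorted(sheet.value("B1")))  # ['Bernhard', 'Ivan']

# Modify the underlying graph; the dependent cell follows automatically.
graph.add(("ex:nova", "rdf:type", "foaf:Person"))
graph.add(("ex:nova", "foaf:name", "Nova"))
print(sorted(sheet.value("B1")))  # ['Bernhard', 'Ivan', 'Nova']
```

Cell B1 never mentions the graph change explicitly; it only refers to A1, which is the whole charm of the spreadsheet model.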

That is it for ISWC2009. I obviously missed a lot of papers, partly because social life and hallway conversations sometimes had the upper hand, and sometimes simply because there were too many parallel sessions. But it was definitely an enriching week… See you all, hopefully, at ISWC2010, in Shanghai!


1 Comment

  1. Ivan, thanks for blogging about Tripcel. For your convenience, here is a more direct link: http://www.ifs.univie.ac.at/schandl/2009/06/tripcel/
    Best, Bernhard

    Comment by Bernhard Schandl — November 1, 2009 @ 21:24
