Ivan’s private site

June 19, 2009

SemTech2009 impressions

The first and possibly most important aspect of SemTech 2009 is that… it happened! I must admit that back in April-May, when the conference’s web site did not yet include any news of the program, I was a bit concerned that the general economic malaise would kill this year’s conference. O.k., I might have been paranoid, but I think some level of concern was indeed legitimate. And… not only did the conference happen as planned, but the numbers were essentially the same as last year’s (over 1000). I think that by itself is an important sign of the interest in Semantic Technologies. Kudos to the organizers!

A general trend was reaffirmed this year: by now, Semantic Web technologies are the obvious reference points for almost all presentations, products, etc, shown at the event. RDF(S), RDFa, OWL, SPARQL, etc, have become household names; newer specs like SKOS or POWDER may not be as widely referred to yet, but I am sure that will come, too. Linked Data (and, more specifically, the Linked Open Data cloud) was almost ubiquitous this year, whereas I do not believe it was even mentioned last year. That is a huge change (although I am still waiting for real “user facing” applications of LOD to show up; some, like Talis’ system deployed at UK universities, were presented, but not as part of the regular conference). All that being said, I somehow seem to have missed more sessions than last year, which makes my impressions patchier. There were several journal interviews that I could not escape, hallway discussions that were great but made me miss a presentation here and there… I guess this is what happens when you have such a number of people around!

Tom Tague (from Open Calais) gave a very nice opening keynote. His talk was actually not on Open Calais (he did that in 2008), but rather on his experience in talking to different people who tried to start up new ventures in the Semantic Web area (a quote from his talk: “in 80% of the discussions I did not understand what the vendors wanted, and I walked away with my cheque book intact… Simplify!”). The main areas that he looked at were tools, social, advertising, search, publishing, user interface. One of the remarks I liked was on search: in his view (and I think I agree with that) Semantic Technologies may not be really interesting for general search (where statistical, i.e., brute-force methods work well) but rather for specialized, area-specific search tools (things like GoPubMed or applications deployed at, eg, Eli Lilly or experimented with at Elsevier come to my mind as good examples). Similarly, these technologies are not necessarily of interest for general, “robotic” publication tools like Google News, but for high quality publishing, with possible editorial oversight (reducing costs and difficulties).

(He also had a nice text on one of his slides: “Web2.0: Take Web 1.0, add a liberal dash of social, generous amounts of user generated content, atomize your content assets and stir until fully confused”:-)

Tom Gruber talked about his newest project: SIRI. A super-duper personal assistant running on an iPhone with a conversational (voice directed) interface. The group behind it integrates a bunch of info on the Web (the “usual” stuff like restaurants and travel sites), categorizes it, and hides the complexities behind a sexy user interface. The problem I have is that I just do not see how this would scale. I see one of the major promises of the Semantic Web as getting data in RDF out there so that such, essentially mash-up, applications would become much easier to create and maintain. Until then, it is really tedious… On a more personal note, I am not sure I would like the voice conversational interface. I know that I have never used the voice commands on my phone, for example; I do not feel comfortable with it. But, well, that is probably only me…

Chime Ogbuji gave a really nice presentation on the system they have developed at the Cleveland Clinic. A great combination of RDF, OWL, and SPARQL. The interesting aspect (for me) was the usage of a medical expert system called Cyc, which is used to convert the doctor’s question in natural language (insofar as a question full of medical jargon can be considered “natural”:-) into, essentially, a SPARQL query. The medical ontologies are used to direct this conversion process, and the triple store can then be queried with the generated query. Impressive work. (Part of it was documented in a W3C use case, but this presentation had a different emphasis.)

Unfortunately, I had to skip Peter Mika’s presentation on the SearchMonkey experiences; I will have to look at his slides… But, as a last minute addition to the program, the organizers succeeded in getting Othar Hansson and Kavi Goel to talk about Google’s rich snippets. I already blogged about this a few weeks ago, but this presentation made the goal of the project much more understandable. Essentially, by recognizing specific microformat or RDFa vocabularies, they can improve the user experience by adding extra information to the search result. It is interesting to observe the difference between Yahoo! and Google in this respect: both of them use microformats/RDFa for the same general goal but, whereas Yahoo! relies on the community providing applications and on users personalizing their own search result pages, Google controls the output in a generic way that does not require further user actions. It will be interesting to see how these differences influence people’s usage patterns. There was some discussion of Google’s choice of vocabularies; the presenters made it quite clear that they are perfectly happy to use other vocabularies (eg, vCard or FOAF) if they become pervasive, and this is a discussion that Google plans to engage in with the community. There is of course a chicken-and-egg issue there (if a vocabulary is known by Google, then it will be more widely used, too), and this is clearly an area to discuss further. But these are details. The very fact that both Yahoo! and Google look at microformats and RDFa is what counts! Who would have thought so just about a year ago?

I was not particularly impressed by the Semantic Search panel. I had the impression that the participants did not really know what they should say and talk about:-(

A nice presentation by Jeffrey Smitz from Boeing on a system called SPARQL Server Pages. Essentially, the user can use structures similar to, say, a PHP page, ie, a mixture of HTML tags and server “calls”, except that these “calls” refer to SPARQL queries against a triple store on the server. Their system also includes some rule based OWL reasoning on the server side, although I am not sure I got all the details. All in all, the system seemed a bit complex, but the general approach is interesting! And it is nice to see that a company like Boeing seems to make good use of RDF+OWL+SPARQL; it would be good to know more…

I missed Zepheira’s presentation on Freemix, which is a shame, but, well, it happens. I did play with Freemix before travelling to San Jose, though; I called it “Exhibit for the masses”. And this, I think, is a fair characterization. David Huynh’s Exhibit is a really nice tool, but it is not easy to use. On the other hand, it took me about 2 minutes to make a visualization of a JSON data set I had used for an Exhibit page elsewhere…

Andraz Tori talked about Common Tag, a small vocabulary that, for example, can be used when marking up texts with tags (something that engines like Zemanta or Open Calais do). Bringing the RDF and tagging worlds together is really important; I am very curious how successful this initiative will be…

The keynote on the last day came from the New York Times (by Evan Sandhaus and Robert Larson). It was quite interesting to see how a reputable journal like the NYT has developed a tradition of indexing, abstracting, and cataloging articles, and how these are archived and searched. Impressive. It is also great that the NYT Annotated Corpus has been released to the research community. I did not know about that and, I presume, this must be a great resource for a lot of people active in the area of, say, natural language processing. Finally, they announced their intention to release their thesaurus in a Semantic Web format, to add a “blob” to the Linked Data Cloud. They still have to work out the details (and expect feedback from the community), but I would hope they publish a SKOS thesaurus and might even annotate the news items on their web site using this thesaurus in RDFa. But something in this space will happen, that is for sure! Other reputable newspapers, like Le Monde, the Guardian, NRC Handelsblad, El País: will you follow?

I also had my share of talking: I gave an intro tutorial to SW, gave an overview of what is happening at W3C (quite a lot this year, including the finalization of POWDER, OWL 2, and SKOS!), and participated in an OWL 2 panel (with Mike Smith, Zhe Wu, Deb McGuinness, and Ian Horrocks). I was quite happy with the tutorial and the way the panel went; the audience for the talk could have been a bit larger. But, well…

It was a long week, long trips, not much sleep… but well worth it!



  1. Hi Ivan,

    I really like the idea that linked data technology should make it easier for us to create our own SIRI-like apps. SIRI is built on the APIs of other services; centralized documentation of all those APIs would be very cool. I guess part of their added value is the reliability that comes from them working out QOS agreements with those vendors, but still, I’d rather have the set of building blocks than a finished toy that may or may not fit my needs.

    >Unfortunately, I had to skip Peter Mika’s presentation on
    >the SearchMonkey experiences, I will have to look at his slides

    I looked around on http://www.semantic-conference.com but couldn’t find any links to slides. Are these publicly available?



    Comment by Bob DuCharme — June 20, 2009 @ 15:02

  2. Hi Bob,

    Unfortunately, the slides are not public (yet?). Participants received a CD with the slides; that is what I have. Maybe if many people ask, the slides will be made public (although I can see a problem in having to ask permission from all the speakers, which is always more difficult after the event). 😦

    Comment by Ivan Herman — June 20, 2009 @ 15:15

    The organizers told me that attendance was actually up by more than 20%, to about 1,300. Not bad given the economy. I quite agree that the search panel was a bit of a damp squib. I am not sure that the next generation of compelling applications will come out of search at all, though I am a big fan of what Yahoo is trying to do. Google seems to be caught in the trap of its own success: it can only take seriously things it can see doing at web scale, while LOD (Linked Open Data) and other semantic apps may be more local and specific opportunities. The Semantic MediaWiki sessions were useful to me.

    Comment by Steven Forth — June 21, 2009 @ 13:28

  4. […] SemTech2009 impressions (ivan-herman.name) […]

    Pingback by Looking back at the Semantic Technology Conference, and the rest of my week in the Valley | Paul Miller - The Cloud of Data — June 24, 2009 @ 15:34

  5. […] SemTech2009 impressions (ivan-herman.name) […]

    Pingback by How Open is ‘Open’ ? | Paul Miller - The Cloud of Data — July 2, 2009 @ 14:24
