The last day of a long week (I also had a W3C AC meeting before the conference…).
The day began with a keynote by Dick Hardt on what he calls “Identity 2.0”. (Everybody seems to live in 2.0 these days. I wonder whether I should not call myself “Ivan 2.0”, just to sound fashionable:-). His presentation was very enjoyable and well prepared, but a bit too abstract. I have the slight impression that he reused a presentation originally prepared for a non-technical audience, which means that it stayed at an abstract level without giving any details on the technology underneath. Pity. Anyway, his vision is to use a centralized “security agent” that knows all the necessary data about you, about your different identities, etc., which would automatically check claims with the issuers of other identity elements (e.g., did I really go to school at the place I claim?) and would also send out identity information to relying parties. Without more details on how this would be done, where that security agent would live, etc., all this sounded a bit scary to me. On the other hand, Dick Hardt is (as far as I know) one of the editors of the OpenID spec, so he has obviously had to think about all the pitfalls such an approach would have, both socially and technically. This is why it was a real pity that he did not take the time to go deeper into the technical details. Oh well…
The paper on Yago by Fabian Suchanek et al. was pretty interesting. By analyzing Wikipedia category entries, reinforced by an analysis of the terms using WordNet, they semi-automatically create a large knowledge base with about 900,000 entities and around 6 million facts about those entities. It has a query interface, but can also be downloaded from their site. The system also includes a rule-like inference engine through which new facts can be derived. All that is great and impressive, but they developed it in isolation, using non-Semantic-Web technologies. I.e., they did not use RDF (though they do have triples, in fact), nor OWL, Rules, or SPARQL… As it stands, the system is completely disjoint from the rest of the Semantic Web. The only technical reason I heard at the presentation is that they had difficulties time-stamping their facts, so they needed an alternative structure. I do not want to minimize the problems around timed RDF statements, and it is of course their right to use whatever technologies they want for their research, but it would be a pity if it stayed that way. Luckily, this may not be the case. I asked this question after the presentation and Fabian said they would combine this somehow with dbpedia; and I actually found a reference on Chris Bizer’s latest slides to exactly this. My understanding is that Fabian and Chris found some way of binding Yago to RDF during the conference. If so, Yago may become an impressive addition to the available Semantic Web knowledge bases!
(Note on 2007-06-07: Fabian Suchanek contacted me, drawing my attention to two misunderstandings. I am happy to copy them here:
- “The YAGO model (as described in the paper) is basically RDFS plus some additional semantics. YAGO can also be downloaded in RDF. ”
- “The reason why we did not use OWL was that OWL does not support acyclic transitive relations – which are important for YAGO. Thus, we basically use RDFS and add acyclic transitive relations, but this is just the semantics.”
And yes, since then, Yago has been incorporated into dbpedia! Yey!)
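To make the triple idea concrete, here is a small sketch of how Yago-style facts can be held as subject–predicate–object triples, together with the kind of transitive query a rule-like inference engine can answer. The entity and predicate names below are purely illustrative, not actual Yago identifiers:

```python
# A tiny fact base of (subject, predicate, object) triples.
# All names here are made up for illustration, not real Yago data.
facts = {
    ("Albert_Einstein", "type", "physicist"),
    ("physicist", "subClassOf", "scientist"),
    ("scientist", "subClassOf", "person"),
    ("Albert_Einstein", "bornIn", "Ulm"),
}

def types_of(entity, facts):
    """All classes of an entity, following subClassOf transitively."""
    found = {o for (s, p, o) in facts if s == entity and p == "type"}
    frontier = set(found)
    while frontier:
        cls = frontier.pop()
        for (s, p, o) in facts:
            if s == cls and p == "subClassOf" and o not in found:
                found.add(o)
                frontier.add(o)
    return found

print(types_of("Albert_Einstein", facts))
# a set containing 'physicist', 'scientist', and 'person'
```

The same structure maps directly onto RDF triples, which is why binding Yago to RDF (as the note above describes) is such a natural fit.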
Another paper I found particularly interesting was “Analysis of Topological Characteristics of Huge Online Social Networking Services”, by Yong-Yeol Ahn et al. It is not really a Semantic Web paper (although it was part of the Semantic Web track) but goes more in the direction of the Web Science discussion that took place on the first day. They retrieve connection data from social sites like MySpace, Orkut, or Cyworld (the latter is a Korean social contact system) and build a huge graph from the “who knows whom” relationships. They then use social network analysis to analyze the graphs. They found, for example, that in all these graphs they could spot a number of nodes with a huge number of links, way beyond the number one would expect from “normal” social contacts (they referred to these nodes as “super users”). In answer to a question, it turned out that these super users are persons, and not the various types of organizations or associations that regularly appear on these sites. They could also analyze how the graphs evolved over time. Beyond the specific results, I think this line of research is really interesting and important. By the way, the next WWW conference in Beijing (WWW2008) will have a “Social networks and Web 2.0” track; worth keeping an eye on that one.
Yong-Yeol also showed a nice cartoon on social spaces, an image that Sandro also discovered the other day:
As I said before: it was a long but interesting and fruitful week (and an even longer trip, because I spent a week in Boston before coming to Banff). And I still have to go through the conference proceedings to see what else of interest I could not attend. Time to go back home tomorrow…