Ivan’s private site

July 29, 2007

From Wikipedia URI-s to DBpedia URI…

Filed under: Semantic Web,Work Related — Ivan Herman @ 10:49

I hit a small issue the other day, which the DBpedia folks clarified. I guess others have hit, or will hit, a similar issue, so it may be worth documenting.

The question I had: what URI should I use for J.S. Bach’s Well-Tempered Clavier? This is a non-trivial question if one wants to use, say, the Music Ontology in cataloging one’s music: indeed there is nothing like, say, ISBN-s in classical music. I am not sure who suggested in the past that using the corresponding Wikipedia URI might be the best option we have. Well, now that we have DBpedia, I thought that the corresponding DBpedia URI (the non-informational resource one, that is) should be a much better choice. But what is the URI?

I typed in “Well tempered clavier” in my Wikipedia search box, and this led directly to

http://en.wikipedia.org/wiki/Well_Tempered_Clavier

which displayed the corresponding article. Fine, so I thought that a simple replacement, yielding

http://dbpedia.org/resource/Well_Tempered_Clavier

would give me the URI. Wrong, 404…

The issue (and that is where I was wrong) is that Wikipedia uses a funny, non-HTTP based redirection. It displays the right content for the search result and keeps a kind of phony URI in the browser’s address bar; it just puts a small note into the article saying “(Redirected from XYZ)”. DBpedia does not keep track of all possible search terms, so a URI minted from the redirected page (“redirected” in the Wikipedia sense) is not the right answer.

What to do? Here is what Richard Cyganiak proposed as a general approach:

  1. Get your search term, have your article displayed
  2. If the Wikipedia in use is not the English one, click on the English link in the left sidebar (as Richard put it: if there is no English link, you are out of luck… 😦)
  3. If the page is redirected (look for the note like the one I referred to), hit the “Article” button on the top of the page. This will redisplay the same article content, but with the “canonical” URI in the browser’s address bar.
  4. Take that URI, and replace the leading http://en.wikipedia.org/wiki by http://dbpedia.org/resource

Voilà! You then get to

http://dbpedia.org/resource/Well-Tempered_Clavier

which can be used as a non-informational resource for this piece of music (note the difference in a hyphen in the URI!).
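Step 4 of the recipe is a plain prefix swap, so it is easy to script. Here is a minimal Python sketch of it; the function and constant names are my own invention for illustration, not anything DBpedia provides. It assumes the URI has already been made canonical by hand (steps 1 to 3), because it cannot detect a Wikipedia-style redirect by itself.

```python
# Prefixes taken from step 4 of the recipe above.
WIKIPEDIA_PREFIX = "http://en.wikipedia.org/wiki/"
DBPEDIA_PREFIX = "http://dbpedia.org/resource/"


def wikipedia_to_dbpedia(uri: str) -> str:
    """Swap the English Wikipedia prefix for the DBpedia resource prefix.

    Hypothetical helper: it only rewrites the string. It does not check
    whether the Wikipedia page is a redirect, so the caller must pass the
    canonical article URI.
    """
    if not uri.startswith(WIKIPEDIA_PREFIX):
        # Non-English or non-article URIs are out of scope (see step 2).
        raise ValueError("not an English Wikipedia article URI: " + uri)
    return DBPEDIA_PREFIX + uri[len(WIKIPEDIA_PREFIX):]


print(wikipedia_to_dbpedia("http://en.wikipedia.org/wiki/Well-Tempered_Clavier"))
```

Feeding it the redirected URI (the one with the underscore instead of the hyphen) would happily produce the DBpedia URI that gave me the 404, which is exactly why step 3 has to come first.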

Thanks to Richard, Sören, and Georgi for enlightening me…

7 Comments

  1. Shouldn’t one of the conclusions from this episode also be to add these alternative urls as sameAs uris to dbpedia?

    Comment by Valentin — July 29, 2007 @ 11:55

  2. I do not think so. Wikipedia generates a separate (somewhat “fake”) URI for each search term. Keeping to this example, if I type “well tempered”, Wikipedia will display the same information with the URI:

    http://en.wikipedia.org/wiki/Well_Tempered

    In other words, if DBPedia went down that line, it would be forced to generate aliases for all possible search terms that lead to the same page, which would lead to an explosion of URI aliases. I am not sure this would be realistic to expect nor that it would be good…

    Comment by Ivan Herman — July 29, 2007 @ 13:35

  3. Not exactly: the first page (http://en.wikipedia.org/wiki/Well_Tempered_Clavier) is an explicit redirect, i.e. editors have recorded that this title should lead to that article, whereas the second page (http://en.wikipedia.org/wiki/Well_Tempered) leads you to a generic search page, because it is not part of the database. dbpedia could check for that difference, and I think Valentin is pretty right.

    By the way, your first link links to dbpedia though it says Wikipedia.

    Comment by denny — July 29, 2007 @ 18:21

  4. Hm. There may be even more to the various Wikipedia actions than what I know… In Firefox, I use the search engine box next to the address bar, with the Wikipedia search target as implemented by Firefox. If I type in “well tempered” there as a search item, I do get to the article page with the URI

    http://en.wikipedia.org/wiki/Well_tempered

    displayed in the address bar. I.e., it bypasses the generic search page and skips one step… Which means that, as an ordinary user, I do not know which URI aliases Wikipedia really has and which are generated on the fly.

    Well, I let the DBpedia folks look at that. I still believe that adding all URI-s (whatever they are) with something like sameAs might be an overkill…

    Comment by Ivan Herman — July 29, 2007 @ 20:13

  5. Valentin and Denny are right, we should have these alternative redirected Wikipedia URIs in DBpedia. But I don’t think that they should be sameAs. The redirected URIs are often old cruft, old names that didn’t meet the naming guidelines of Wikipedia and were therefore changed. Wikipedia keeps the old names working, to keep links from breaking, but their use is discouraged in the guidelines. In DBpedia, we should capture the fact that the use of one name is discouraged while the other is preferred. And owl:sameAs doesn’t do this. I’m not sure how we should model this.

    Comment by Richard Cyganiak — July 30, 2007 @ 0:15

  6. Actually the “redirected” URIs of Wikipedia are “synonym URIs”, dealing with URIs the same way people who build thesauri deal with preferred (USE) and synonym (UF) terms (or descriptors). In a thesaurus-based search engine, you use the preferred terms for indexing, and a search using synonyms brings back resources indexed on the matching preferred terms. So what we need here is something like a notion of “prefURI” and “altURI”.

    Comment by Bernard Vatant — July 30, 2007 @ 10:36

  7. Ivan, thank you for the tip! That could potentially have been an issue for me had you not written about it.

    As for potential solutions (in the “synonym URI” case), I think that having only one URI for a concept is very good from a usability perspective (to limit the risk of people using different URIs for the same thing). Though formally solvable through ontology mechanisms, multiple URIs would complicate many integration scenarios. To me DBPedia is a great step towards avoiding such matters.

    Perhaps dbpedia could link to an information page much like this one from their 404 page to generically limit the confusion the present state may cause.

    Comment by Niklas Lindström — July 31, 2007 @ 18:21

