Martin Hepp drew my attention to one of his upcoming publications; some related thoughts…
The issues around URI-s come up regularly on the various SW-related mailing lists and discussion fora, sometimes leading to passionate discussions. That is all right; it is indeed a complicated issue. But, somehow, the question of where to find the URI-s for various concepts does not always get enough attention (at least in my view). I remember that a while ago, when Frederick, Yves, and some others were working on the Music Ontology, my question was: all that is fine, but what is the authoritative URI for, say, Beethoven’s 7th symphony?
Of course, in some areas, communities are working on such naming schemes for their own constituencies. LSID-s are a prime example in the Life Science domain. Online catalogs of digital libraries (see my earlier blog referring to RDA-s, for example) might provide us with another rich source of stable URI-s. As yet another example, the lingvoj site of Bernard Vatant (just updated a few days ago) might establish itself as a set of stable URI-s for spoken languages (i.e., the URI http://www.lingvoj.org/lang/hu might become the URI for Hungarian). A number of similar datasets are appearing, for example, through the Open Linked Data project, and they could, eventually, play similar roles.
Yes but, in the meantime, what happens to the vast number of other “things”? What is the answer to my 7th symphony question? An idea I have heard before: why not use the Wikipedia URI-s for that purpose? That sounds like a good idea indeed. However, for that to work, a number of questions should be answered. E.g., how stable are those URI-s? How reliable are they? This is where Martin et al.’s paper comes in. They perform a series of statistical measurements and analyses on the evolution of Wikipedia entries (relying on data from this year). Their measurements indicate, for example, that the Wikipedia URI-s are indeed stable enough. To be more precise, their measurements show that this year around 93% of the URI-s on Wikipedia had a stable meaning (i.e., the text of the corresponding article may have changed in some details, but the URI can still be considered as referring to the same notion). Given the large number of articles, this seems pretty o.k. to me… The paper also contains other statistical details (on the subject of the articles, for example), as well as further references but, succinctly, that is probably the most important result. I am sure that further analysis of Wikipedia is still necessary (and I am also sure it will happen); this paper is certainly an interesting one among those!
So, should we rely on Wikipedia for the 7th symphony? Almost. If we go in this direction, my choice would be to use DBpedia instead. DBpedia being a dump of Wikipedia, it inherits all the stability results that Martin et al. describe. Also, the current DBpedia setup makes a clear distinction between the URI of a non-information resource (the symphony itself) and the URI of its RDF representation (the document describing it), an issue that Martin et al.’s paper raises as a problem for Wikipedia URI-s. Last, but certainly not least, the RDF graphs in DBpedia are linked to an increasing number of other datasets via the Open Linked Data setup, which applications may also exploit. I.e., a suitable URI for the 7th symphony might be:
Of course, this is not a silver bullet. One can raise lots of criticisms about which topics are (or are not) treated in Wikipedia. To continue my example, Beethoven’s works are fairly well covered by Wikipedia articles, but this is less true for, say, Robert Schumann. New, more systematic vocabularies might appear, in which case we may have URI aliases on our hands. Etc. However… do we have another, existing choice for today? I would be curious to hear…
(Note that the URI alias issue might be solved by automatically adding owl:sameAs statements wherever appropriate. For example, the lingvoj data already includes such a link for each language, linking to… the corresponding DBpedia URI.)
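To make the owl:sameAs idea a bit more tangible, here is a minimal Python sketch. It is only an illustration, not how lingvoj or DBpedia actually generate their data. It assumes that a DBpedia resource URI is formed as http://dbpedia.org/resource/ followed by the English Wikipedia article title, copied verbatim (real DBpedia escaping rules may differ). The helper names dbpedia_uri and same_as_triple are hypothetical, and so is the article title Hungarian_language used in the usage lines; it was chosen for illustration only.

```python
from urllib.parse import urlsplit

# Assumption: DBpedia resource URIs reuse the English Wikipedia article title
# verbatim after this prefix (no re-escaping is attempted here).
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def dbpedia_uri(wikipedia_url: str) -> str:
    """Hypothetical helper: map an English Wikipedia article URL to a DBpedia URI."""
    parts = urlsplit(wikipedia_url)  # the fragment (#...) is kept apart from the path
    if parts.netloc != "en.wikipedia.org" or not parts.path.startswith("/wiki/"):
        raise ValueError(f"not an English Wikipedia article URL: {wikipedia_url}")
    title = parts.path[len("/wiki/"):]
    return DBPEDIA_RESOURCE + title


def same_as_triple(alias: str, canonical: str) -> str:
    """Hypothetical helper: one N-Triples statement <alias> owl:sameAs <canonical> ."""
    return f"<{alias}> <{OWL_SAME_AS}> <{canonical}> ."


# Usage: link the lingvoj URI for Hungarian to an (illustrative) DBpedia URI.
hungarian = dbpedia_uri("http://en.wikipedia.org/wiki/Hungarian_language")
print(same_as_triple("http://www.lingvoj.org/lang/hu", hungarian))
```

Emitting plain N-Triples strings keeps the sketch dependency-free; a real dataset would more likely be produced with an RDF library and published alongside the data it links.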
M. Hepp, K. Siorpaes, and D. Bachlechner, “Harvesting Wiki Consensus Using Wikipedia Entries as Vocabulary for Knowledge Management,” IEEE Internet Computing, vol. 11, pp. 54–65, 2007. Also available online at Martin’s site.