Experts developing Web sites for various cultures and languages know that it is far better to build such features into Web pages from the start, i.e., at the time of the core design, rather than to “add” them once the site is done. What is valid for Web sites is also valid for data deployed on the Web, and that is especially true for Linked Data, whose mantra is to combine data and datasets from all over the place.
Why do I say all this? I had the pleasure of participating, earlier this week, in the MultilingualWeb Workshop in Rome, Italy. One of the topics of the workshop was Linked (Open) Data and its multilingual (and, also, multicultural) aspects. There were a number of presentations at a dedicated session (the presentations are online, linked from the Workshop Page; just scroll down and look for a session entitled “Machines”), and there was also a separate break-out session (the slides are not yet online, but they should be soon). There are also a number of interesting projects and issues in this area beyond those presented at the event, for example the lemon model or the (related) Monnet EU project.
All these projects are great. However, the overall situation in the Linked Data world is, in this respect, not that great, at least in my view. If one looks at the various Linked Data (or Semantic Web) related mailing lists, discussion fora, workshops, etc., multilingual or multicultural issues are almost never discussed. I have not made any systematic analysis of the various datasets on the LOD cloud, but I have the impression that only a few of them are prepared for multilingual use (e.g., by providing alternative labels and other metadata in different languages). URIs are defined in English, and most of the vocabularies we use are documented in only one language, so they may be hard to use for non-English speakers. Worse, vocabularies may not even be properly prepared for multicultural use (just consider the complexity of personal names, which is hardly ever properly reflected in vocabularies). And this is where we hit the same problem as for Web sites. With all its successes, we are still at the beginning of the deployment of Linked Data. Our community should have much more frequent discussions on how to handle this issue now, because after a while it may be too late.
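To make the point about alternative labels in different languages a bit more concrete, here is a minimal sketch in Python of what a consumer of a multilingual dataset might do with language-tagged labels. Everything in it is an assumption made for illustration: the function name `pick_label`, the representation of a label as a `(text, language tag)` pair, and the sample labels are hypothetical, and the matching is a deliberately simplified lookup rather than a full implementation of any language-matching standard.

```python
from typing import Optional, Sequence, Tuple

# A label is a (text, language tag) pair; the tag is None for untagged labels.
# This representation is an assumption for illustration only.
Label = Tuple[str, Optional[str]]


def pick_label(labels: Sequence[Label], preferred: Sequence[str]) -> Optional[str]:
    """Return the label that best fits the user's language preferences."""
    for wanted in preferred:
        wanted = wanted.lower()
        # 1. Exact (case-insensitive) tag match, e.g. "pt-br" for "pt-BR".
        for text, lang in labels:
            if lang is not None and lang.lower() == wanted:
                return text
        # 2. Match on the primary subtag only, e.g. "it" for "it-CH".
        primary = wanted.split("-")[0]
        for text, lang in labels:
            if lang is not None and lang.lower().split("-")[0] == primary:
                return text
    # 3. Nothing matched: prefer an untagged label, else the first one.
    for text, lang in labels:
        if lang is None:
            return text
    return labels[0][0] if labels else None


# Hypothetical labels for a single resource.
rome = [("Rome", "en"), ("Roma", "it"), ("Róma", "hu"), ("ローマ", "ja")]

print(pick_label(rome, ["hu", "en"]))  # a Hungarian speaker's preference list
print(pick_label(rome, ["it-CH"]))     # regional variant falls back to "it"
```

A dataset that only carries a single English label leaves a function like this nothing to choose from, which is exactly why the labels need to be there from the start.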
By the way, one of the outcomes of the break-out session at the Workshop was that a W3C Community Group should be created soon to produce some best practices for Multilingual Linked Open Data. There is already some work done in the area: look at the page set up by José Emilio Labra Gayo, Dimitris Kontokostas, and Sören Auer, which may very well be the starting point. Watch this space!
It is hard. But it will be harder if we miss this particular boat.