Ivan’s private site

January 4, 2014

Data vs. Publishing: my change of responsibilities…

Fairly Lake Botanical Garden, Shenzhen, China

There was an official announcement, as well as some references, on the fact that the structure of data-related work has changed at W3C. A new activity has been created, called the “Data Activity”, that subsumes what used to be called the Semantic Web Activity. “Subsumes” is an important term here: W3C does not abandon the Semantic Web work (I emphasize that because I did get such reactions); instead, the existing and possible future work simply continues within a new structure. The renaming is simply a sign that W3C also has to pay attention to the fact that there are many different data formats used on the Web, not all of which follow the principles and technologies of the Semantic Web, and those other formats and approaches also have technological and standardization needs that W3C might be in a position to help with. It is not the purpose of this blog, however, to look at the details; the interested reader may consult the official announcements (or consider Tim Finin’s formula: Data Activity ⊃ Semantic Web ∪ eGovernment 🙂)

There is a much less important but more personal aspect of the change, though: I will not be the leader of this new Data Activity (my colleague and friend, Phil Archer, will do that). Before anybody tries to find some complicated explanation (e.g., that I was fired): the reason is much simpler. About a year ago I got interested in a fairly different area, namely Digital Publishing. What used to be, back then, a so-called “headlight” project at W3C, i.e., an exploration into a new area, turned into an Activity of its own last summer, with me as the lead. There is a good reason for that: after all, digital publishing (e.g., e-books) may represent one of the largest usage areas of the core W3C technologies (i.e., HTML5, CSS, or SVG) right after browsers; indeed, for those of you who do not realize it (I did not know it just a year and a half ago either…), an e-book is “just” a frozen and packaged Web site, using many of the technologies defined by W3C. A major user area, thus, but one whose requirements may be special and not yet properly represented at W3C. Hence the new Activity.

However, this development at W3C had its price for me: I had to choose. Heading both the Digital Publishing and the Data Activities was not an option. I have led W3C’s Semantic Web Activity for ca. 7 years; 7 years that were rich in events and results (the forward march of Linked Open Data, a much more general presence and acceptance of the technology, specifications like OWL 2, RDFa, RDB2RDF, PROV, SKOS, SPARQL 1.1, with RDF 1.1 just around the corner now…). I had my role in many of these, although I was merely a coordinator for the work done by other amazing individuals. But I had to choose, and I decided to go towards new horizons (in view of my age, probably for the last time in my professional life); hence my choice of Digital Publishing. As simple as that…

But this does not mean I am completely “out”. First of all, I will still actively participate in some of the Data Activity groups (e.g., in the “CSV on the Web” WG), and I have a continuing interest in many of the issues there. But, maybe more importantly, there are some major overlapping areas between Digital Publishing and Data on the Web. For example, publishing also means scientific, scholarly publishing, and this particular area is increasingly aware of the fact that publishing data, as part of reporting on a particular scientific endeavor, is becoming as important as publishing a traditional paper. And this raises tons of issues on data formats, linked data, metadata, access, provenance, etc. Another example: the traditional publishing industry makes increasingly heavy use of metadata. There is a recognition among publishers that well-chosen and well-curated metadata for books is a major business asset that may make a publication win or lose. Many (overlapping…) vocabularies, and relationships to libraries, archival facilities, etc., come to the fore. Via this metadata the world of publishing may become a major player in the Linked Data cloud. A final example may be annotation: while many aspects of the annotation work are inherently bound to the Semantic Web (see, e.g., the work of the W3C Community Group on Annotation), it is also considered to be one of the most important areas for future development in, say, educational publishing.

I can, hopefully, contribute to these overlapping areas with my experience from the Semantic Web. So no, I am not entirely gone, just changing hats! Or, as in the picture, acting (also) as a bridge…


March 16, 2013

Multilingual Linked Open Data?

Filed under: Semantic Web,Work Related — Ivan Herman @ 14:13

Logo of the EU Multilingual Web Project

Experts developing Web sites for various cultures and languages know that it is way better to include such features into Web pages from the start, i.e., at the time of the core design, rather than to “add” them once the site is done. What is valid for Web sites is also valid for data deployed on the Web, and that is especially true for Linked Data, whose mantra is to combine data and datasets from all over the place.

Why do I say all this? I had the pleasure of participating, earlier this week, in the MultilingualWeb Workshop in Rome, Italy. One of the topics of the workshop was Linked (Open) Data and its multilingual (and, also, multicultural) aspects. There were a number of presentations in a dedicated session (the presentations are online, linked from the Workshop Page; just scroll down and look for a session entitled “Machines”), and there was also a separate break-out session (the slides are not yet on-line, but they should be soon). There are also a number of interesting projects and issues in this area beyond those presented at the event: for example, the lemon model or the (related) Monnet EU project.

All these projects are great. However, the overall situation in the Linked Data world is, in this respect, not that great, at least in my view. If one looks at the various Linked Data (or Semantic Web) related mailing lists, discussion fora, workshops, etc., multilingual or multicultural issues are almost never discussed. I have not made any systematic analysis of the various datasets in the LOD cloud, but I have the impression that only a few of them are prepared for multilingual use (e.g., by providing alternative labels and other metadata in different languages). URIs are defined in English, and most of the vocabularies we use are documented in only one language; they may be hard to use for non-English speakers. Worse, vocabularies may not even be properly prepared for multicultural use (just consider the complexity of personal names, which is hardly ever properly reflected in vocabularies). And this is where we hit the same problem as for Web sites: with all its successes, we are still at the beginning of the deployment of Linked Data. Our community should have much more frequent discussions on how to handle this issue now, because after a while it may be too late.
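
To make the “alternative labels” point a bit more concrete, here is a minimal sketch (plain Python, with hypothetical URIs and data) of the kind of language-tagged labelling that multilingual datasets need: each term carries a set of language-tagged labels, and a user interface falls back to English, or to any available label, when the requested language is missing. The silent fallback at the end is exactly the experience a non-English speaker gets from most current vocabularies.

```python
# Hypothetical vocabulary terms with language-tagged labels
# (think skos:prefLabel with different @language tags).
labels = {
    "http://example.org/vocab#Person": {
        "en": "Person",
        "hu": "Személy",
        "zh": "人",
    },
    "http://example.org/vocab#birthName": {
        "en": "birth name",  # only documented in English, as is typical today
    },
}

def label(uri, lang="en"):
    """Return the label of `uri` in `lang`, falling back to English,
    then to any available language, then to the bare URI."""
    entry = labels.get(uri, {})
    if lang in entry:
        return entry[lang]
    if "en" in entry:
        return entry["en"]
    return next(iter(entry.values()), uri)

print(label("http://example.org/vocab#Person", "hu"))     # Személy
print(label("http://example.org/vocab#birthName", "zh"))  # birth name (fallback)
```

The second call illustrates the problem: a Chinese-language interface asking for the second term gets English back, not because the data model cannot express Chinese labels, but because nobody provided them.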

B.t.w., one of the outcomes of the break-out session at the Workshop was that a W3C Community Group should be created soon to produce some best practices for Multilingual Linked Open Data. There is already some work done in the area; look at the page set up by José Emilio Labra Gayo, Dimitris Kontokostas, and Sören Auer, which may very well be the starting point. Watch this space!

It is hard. But it will be harder if we miss this particular boat.

November 26, 2012

Nice RDFa 1.1 example…

Filed under: Semantic Web,Work Related — Ivan Herman @ 23:20

Cover page for Ghosh’s novel, the Sea of Poppies

I know I had seen this before, but I ran into it again: the WorldCat.org site (a must for book lovers…) has a nice structure using RDFa 1.1. Let us take an example page for a book, say, one of the latest books of Amitav Ghosh, the “Sea of Poppies”. The page itself has all kinds of data; what is interesting here is that the formal, bibliographical data is also encoded in RDFa 1.1. Running, for example, an RDF distiller on the page you get the bibliographical data. Here is an excerpt in JSON-LD:

    {
        "@context": {
            "library": "http://purl.org/library/",
            "oclc": "http://www.worldcat.org/oclc/",
            "skos": "http://www.w3.org/2004/02/skos/core#",
            "schema": "http://schema.org/",
            . . .
        },
        "@id": "oclc:216941700",
        "@type": "schema:Book",
        "schema:about": [
            {
                "@id": "http://id.worldcat.org/fast/1122346",
                "@type": "skos:Concept",
                "schema:name": {
                    "@value": "Social classes",
                    "@language": "en"
                }
            },
            . . .
        ],
        "schema:bookEdition": {
            "@value": "1st American ed.",
            "@language": "en"
        },
        "schema:inLanguage": {
            "@value": "en",
            "@language": "en"
        },
        "library:placeOfPublication": {
            "@type": "schema:Place",
            "schema:name": {
                "@value": "New York :",
                "@language": "en"
            }
        },
        . . .
    }

Note that WorldCat.org uses the schema.org vocabulary where appropriate, but mixes it with a number of other vocabularies; this is exactly where the power of RDFa lies! Great for bibliographic applications that can use this type of data, possibly mixed with data coming from other libraries…
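
A small sketch of what the "@context" in the excerpt above actually does: prefixed names (“CURIEs”) such as "schema:Book" expand to full URIs using the prefix table, and this is the mechanism that lets WorldCat mix schema.org with library-specific vocabularies in a single description. (Real JSON-LD expansion is considerably richer, with terms, @vocab, @base, etc.; this plain-Python sketch only handles the simple prefix:suffix case, with the prefix table copied from the excerpt.)

```python
# Prefix table, taken from the "@context" of the WorldCat excerpt.
context = {
    "library": "http://purl.org/library/",
    "oclc": "http://www.worldcat.org/oclc/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "schema": "http://schema.org/",
}

def expand(curie, ctx):
    """Expand a prefix:suffix pair against the context; return the
    input unchanged if there is no known prefix (e.g., a full URI)."""
    prefix, sep, suffix = curie.partition(":")
    if sep and prefix in ctx:
        return ctx[prefix] + suffix
    return curie

print(expand("schema:Book", context))     # http://schema.org/Book
print(expand("oclc:216941700", context))  # http://www.worldcat.org/oclc/216941700
```

So "@type": "schema:Book" in the excerpt is just a compact way of saying the full http://schema.org/Book, while "library:placeOfPublication" points into a completely different vocabulary; the consuming application does not have to care which is which.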

By the way, I was reminded to look at the site by a recent document just published by the Library of Congress: “Bibliographic Framework as a Web of Data: Linked Data Model and Supporting Services”. It is still a draft, and there are quite a few discussions around it in the library community, but the overall picture is what counts: the library community may (let us be optimistic: will!) become one of the major actors in the Linked Data world, as well as a major user of structured data on the Web, most probably via RDFa. Yay!

April 17, 2012

Linked Data on the Web Workshop, Lyon

(See the Workshop’s home page for details.)

The LDOW20** series has become more than workshops; each is really a small conference. I did not count the number of participants (the meeting room had a fairly odd shape, which made it a bit difficult), but I think it was well over a hundred. Nice to see…

The usual caveat applies to my notes below: I am selective here with some papers, which is no judgement on any other paper at the workshop. These are just some of my thoughts jotted down…

Giuseppe Rizzo made a presentation on the tools we now have to tag texts and thereby make these resources usable in linked data (“NERD meets NIF: Lifting NLP Extraction Results to the Linked Data Cloud”), i.e., the Zemanta or OpenCalais services of this world. As these services become more and more important, having a clear view of what they can do, how one can use them individually or together, etc., is essential. Their project, called NERD, will become an important resource for this community; bookmark that page:-)

Jun Zhao made a presentation (“Towards Interoperable Provenance Publication on the Linked Data Web”) essentially on the work of the W3C Provenance Working Group. I was pleased to see and listen to this presentation: I believe the outcome of that group is very important for this community and, having played a role in the creation of that group, I am anxious to see it succeed. B.t.w., a new round of publications from that group should happen very soon, watch the news…

Another presentation, namely Arnaud Le Hors’ on “Using read/write Linked Data for Application Integration — Towards a Linked Data Basic Profile”, was also closely related to W3C work. Arnaud and his colleagues (at IBM) came to this community after a long journey working on application integration; think, e.g., of systems managing software updates and error management. These systems are fundamentally data oriented, and IBM has embarked on a Linked Data based approach (after having tried others). The particularity of this approach is to stay very “low” level, insofar as they use only the basic HTTP protocol for reading and writing RDF data. This approach seems to strike a chord with a number of other companies (Elsevier, EMC, Oracle, Nokia), and their work forms the basis of a new W3C Working Group that should start this coming summer. This work may become a significant element of the palette of technologies around Linked Data.

Luca Costabello talked about Access Control, Linked Data, and Mobile (“Linked Data Access Goes Mobile: Context-Aware Authorization for Graph Stores”). Although Luca emphasized that their solution is not a complete solution for Linked Data access control issues in general, it may become an important contribution in that area nevertheless. Their approach is to modify SPARQL queries “on-the-fly” by including access control clauses; for that purpose, an access control ontology (S4AC) has been developed and used. One issue is: how would that work with a purely HTTP level read/write Linked Data Web, like the one Arnaud is talking about? Answer: we do not know yet:-)
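
To give a feel for the query-rewriting idea, here is an illustrative (and much simplified) sketch: before a SELECT query reaches the store, it is restricted to the named graphs a given user may see. In the paper, the accessible graphs are derived from an access-control ontology (S4AC); here the policy is just a hypothetical Python dictionary, and the rewriting is plain string surgery rather than proper SPARQL algebra manipulation.

```python
# Hypothetical policy: which named graphs each user may read.
# (A real system would derive this from an access-control ontology
# and deny by default; here an unknown user simply gets no graphs.)
policy = {
    "alice": ["http://example.org/graph/public",
              "http://example.org/graph/team-a"],
    "bob":   ["http://example.org/graph/public"],
}

def authorize(query, user):
    """Rewrite a SELECT query on the fly by prepending FROM clauses,
    so that it only sees the graphs the user is allowed to read."""
    graphs = policy.get(user, [])
    from_clauses = "".join(f"FROM <{g}>\n" for g in graphs)
    head, sep, tail = query.partition("WHERE")
    if not sep:
        raise ValueError("only SELECT ... WHERE queries are handled in this sketch")
    return head + from_clauses + sep + tail

q = "SELECT ?s ?p ?o WHERE { ?s ?p ?o }"
print(authorize(q, "bob"))
```

The appeal of the approach is that the store itself stays unmodified: access control lives entirely in the rewriting layer. It also shows why the question about Arnaud’s HTTP-only world is a real one; there, no query ever passes through such a layer.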

Igor Popov concentrated on user interface issues (“Interacting with the Web of Data through a Web of Inter-connected Lenses”): how to develop a framework whereby data-oriented applications can cooperate quickly, so that everyday users can explore data, switching easily to applications that are well adapted to a particular dataset, without being forced to use complicated programming or overly “geeky” tools. This is still alpha-level work, but their site-in-development, called Mashpoint, is a place to watch. There is (still) not enough work on user-facing data exploration tools; I was pleased to see this one…

What are the dynamics of Linked Data? How does it change? This is the question Tobias Käfer and his friends are trying to answer (“Towards a Dynamic Linked Data Observatory”). For that, data is necessary, and Tobias’ presentation was on how to determine which collection of resources to regularly watch and measure. The plan is to produce a snapshot of the data once a week for a year; the hope is that, based on this collected data, we will learn more about the overall evolution of linked data. I am really curious to see the results. One more reason to be at LDOW2013:-)

Tobias’ presentation has an important connection to the last presentation of the day, made by Axel Polleres (“OWL: Yet to arrive on the Web of Data?”), insofar as it was based on an analysis of the Linked Data out there. The issue has been around, with lots of controversy, for a while: what level of OWL should/could be used for Linked Data? OWL 2 as a whole seems to be too complex for the amount of data we are talking about, both in terms of program efficiency and in terms of conceptual complexity for end users. OWL 2 has defined a much simpler profile, called OWL 2 RL, which does have some traction but may still be too complex, e.g., for implementations. Axel and his friends analyzed the usage of OWL statements out there, and also established some criteria on what type of rules should be used to make OWL processing really efficient; their result is another profile called OWL LD. It is largely a subset of OWL 2 RL, though it does adopt some datatypes that OWL 2 RL does not have.

There are some features left out of OWL 2 RL whose omission I am not fully convinced by; after all, their measurement was based on data from 2011, and it is difficult to say how much time it takes for new OWL 2 features to really catch on. I think that keys and property chains should/could be really useful on Linked Data, and can be managed by rule engines, too. So the jury is still out on this, but it would be good to find a way to stabilize this at some point and see the LD crowd look at OWL (i.e., this subset of OWL) more positively. Of course, another approach would be to concentrate on an easy way to encode Rules in RDF, which might make this discussion moot in a certain sense; one of the things we have not succeeded in doing yet:-(
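
To make the “rules” discussion concrete, here is a toy forward-chaining reasoner over RDF-like triples, implementing just two RDFS/OWL-RL-style rules: the transitivity of rdfs:subClassOf, and type propagation along it. Real OWL 2 RL has dozens of such rules (and real engines do not use this naive quadratic loop); the example data is hypothetical, but the shape of the computation, iterating rules to a fixpoint over a set of ground triples, is exactly what makes rule-based profiles attractive for Linked Data volumes.

```python
SUBCLASS, TYPE = "rdfs:subClassOf", "rdf:type"

# Hypothetical ground facts plus a tiny class hierarchy.
triples = {
    ("ex:Novel", SUBCLASS, "ex:Book"),
    ("ex:Book", SUBCLASS, "ex:CreativeWork"),
    ("ex:seaOfPoppies", TYPE, "ex:Novel"),
}

def closure(facts):
    """Apply the two rules until no new triple can be derived."""
    facts = set(facts)
    while True:
        new = set()
        for (s, p, o) in facts:
            for (s2, p2, o2) in facts:
                # (A sub B), (B sub C)  =>  (A sub C)
                if p == SUBCLASS and p2 == SUBCLASS and o == s2:
                    new.add((s, SUBCLASS, o2))
                # (x type A), (A sub B)  =>  (x type B)
                if p == TYPE and p2 == SUBCLASS and o == s2:
                    new.add((s, TYPE, o2))
        if new <= facts:          # fixpoint reached
            return facts
        facts |= new

inferred = closure(triples)
print(("ex:seaOfPoppies", TYPE, "ex:CreativeWork") in inferred)  # True
```

Each rule is a simple pattern over triples, which is why rule engines scale where tableau-based OWL reasoners struggle; the whole debate about OWL 2 RL vs. OWL LD is essentially about which such rules are worth the implementation and runtime cost.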

The day ended with a panel, in which I also participated; I will let others judge whether the panel was good or not. However, the panel was preceded by a presentation by Chris on the current deployment of RDFa and microdata, which was really interesting. (His slides will be on the workshop’s page soon.) The deployment of RDFa, microdata, and microformats has become really strong now; structured data in HTML is a well-established approach out there. RDFa and microdata now cover half of the cases, the other half being microformats, which seems to indicate a clear shift towards RDFa/microdata, i.e., a more syntax-oriented approach (with a clear mapping to RDF). Microdata is used almost exclusively with schema.org vocabularies (which is to be expected), whereas RDFa makes use of a larger palette of other vocabularies. All this was to be expected, but it is nice to see it reflected in collected data.

It was a great event. Chris, Tim, and Tom: thanks!

January 24, 2012

Nice reading on Semantic Search

I had a great time reading a paper on Semantic Search[1]. Although the paper is on the details of a specific Semantic Web search engine (DERI’s SWSE), I was reading it as somebody not really familiar with all the intricate details of such a search engine’s setup and operation (i.e., I would not dare to give an opinion on whether the choices taken by this group are better or worse than the ones taken by the developers of other engines), wanting to gain a good image of what is happening in general. And, for that purpose, this paper was really interesting and instructive. It is long (ca. 50 pages), i.e., I did not even try to understand everything on my first reading, but it did give a great overall impression of what is going on.

One of the “associations” I had, maybe somewhat surprisingly, was with another paper I read lately, namely a report on basic profiles for Linked Data[2]. In that paper Nally et al. look at what “subsets” of current Semantic Web specifications could be defined, as “profiles”, for the purpose of publishing and using Linked Data. This was also a general topic at a W3C Workshop on Linked Data Patterns at the end of last year (see also the final report of the event), and it is not a secret that W3C is considering setting up a relevant Working Group in the near future. Well, the experience of an engine like SWSE might come in very handy here. For example, SWSE uses a subset of the OWL 2 RL profile for inferencing; that may be a good input for a possible Linked Data profile (although the differences are really minor, if one looks at the appendix of the paper that lists the rule sets the engine uses). The idea of “Authoritative Reasoning” is also interesting and possibly relevant; that approach makes a lot of pragmatic sense, and I wonder whether this is not something that should be, somehow, documented for general use. And I am sure there is more: in general, analyzing the experiences of major Semantic Web search engines in handling Linked Data might provide a great set of inputs for such pragmatic work.

I was also wondering about a very different issue. A great deal of work had to be done in SWSE on the proper handling of owl:sameAs. On the other hand, one of the recurring discussions on various mailing lists and elsewhere is whether the usage of this property is semantically o.k. or not (see, e.g., [3]). A possible alternative would be to define (beyond owl:sameAs) a set of properties borrowed from the SKOS Recommendation, like closeMatch, exactMatch, broadMatch, etc. It is almost trivial to generalize these SKOS properties for the general case but, reading this paper, I was wondering: what effect would such predicates have on search? Would they make it more complicated or, in fact, would such predicates make the life of search engines easier by providing “hints” that could be used for the user interface? Or both? Or is it already too late, because the usage of owl:sameAs is already so prevalent that it is not worth touching that stuff? I do not have a clear answer at this moment…
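
The owl:sameAs handling that engines like SWSE have to do amounts, at its core, to computing equivalence classes of URIs and “smushing” data onto a canonical representative. A minimal union-find sketch (with hypothetical sameAs links; a real engine adds authority checks, persistence, and much more):

```python
parent = {}

def find(x):
    """Return the canonical representative of x's equivalence class."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]   # path halving keeps chains short
        x = parent[x]
    return x

def same_as(a, b):
    """Record an owl:sameAs link by merging the two classes."""
    parent[find(a)] = find(b)

# Hypothetical links between three URIs for the same city:
same_as("http://dbpedia.org/resource/Bonn", "http://example.org/city/bonn")
same_as("http://example.org/city/bonn", "http://sws.geonames.org/2946447/")

# All three URIs now share one canonical representative.
print(find("http://dbpedia.org/resource/Bonn") ==
      find("http://sws.geonames.org/2946447/"))  # True
```

This also sharpens the question above: a weaker SKOS-style link (closeMatch, broadMatch) would not license this kind of merge at all, so from the engine’s point of view such predicates are less work to honor, but they only become useful “hints” if the engine does something visible with them in the interface.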

Thanks to the authors!

  1. A. Hogan, et al., “Searching and Browsing Linked Data with SWSE: The Semantic Web Search Engine”, Journal of Web Semantics, vol. 9, no. 4, pp. 365–401, 2011.
  2. M. Nally and S. Speicher, “Toward a Basic Profile for Linked Data”, IBM developerWorks, 2011.
  3. H. Halpin, et al., “When owl:sameAs Isn’t the Same: An Analysis of Identity in Linked Data”, Proceedings of the International Semantic Web Conference, pp. 305–320, 2010.


November 2, 2011

Some notes on ISWC2011…

The 10th International Semantic Web Conference (ISWC2011) took place in Bonn last week. Others have already blogged on the conference in a more systematic way (see, for example, Juan Sequeda’s series on semanticweb.com); there is no reason to repeat that. Just a few more personal impressions, with the obvious caveat that I may have missed interesting papers or presentations, and that the ones I picked here are also the result of my personal bias… So, in no particular order:

Zhishi.me is the outcome of the work of a group from the APEX lab in Shanghai and Southeast University: it is, in some ways, the Chinese DBpedia. “In some ways” because it is actually a mixture of three different Chinese, community-driven encyclopedias, namely the Chinese Wikipedia, Baidu Baike, and Hudong Baike. I am not sure of the exact numbers, but the combined dataset is probably a bit bigger than DBpedia. The goal of Zhishi.me is to act as a “seed” and a hub for Chinese linked open data contributions, just like DBpedia did and does for the LOD in general.

It is great stuff indeed. I do have one concern (which, hopefully, is only a matter of presentation, i.e., may be a misunderstanding on my side). Although zhishi.me is linked to non-Chinese datasets (DBpedia and others), the paper talks about a “Chinese Linked Open Data (COLD)”, as if this were something different, something separate. As a non-English speaker myself I can fully appreciate the issues of language and culture differences, but I would nevertheless hate to see the Chinese community develop a parallel LOD, instead of being an integral part of the LOD as a whole. Again, I hope this is just a misunderstanding!

There were a number of ontology or RDF graph visualization presentations: for example, from the University of Southampton team (“Connecting the Dots”), on the first results of an exploration done by Magnus Stuhr and his friends in Norway, called LODWheel (the latter was actually at the COLD2011 Workshop), or another one from a mixed team, led by Enrico Motta, on a visualization plugin for the NeOn toolkit called KC-Viz. I have downloaded the latter and have played a bit with it already, but I have not had the time to form a really informed conclusion on it yet. Nevertheless, KC-Viz was interesting for me for a different reason. The basic idea of the tool is to attach some sort of importance metric to each node in the class hierarchy and direct the visualization based on that metric. It was reminiscent of some work I did in my previous life on graph visualization; the metric was different, the graph was only a tree, and the visualization approach was different, but nevertheless, there was a similar feel to it… Gosh, that was a long time ago!

The paper of John Howse et al. on visualizing ontologies was also interesting. Interesting because different: the idea is a systematic usage of Euler diagrams to visualize class hierarchies, combined with some sort of visual language for the presentation of property restrictions. In my experience, property restrictions are a very difficult (maybe the most difficult?) OWL concept to understand without a logic background; any tool, visual or otherwise, that helps in teaching and explaining this can be very important. Whether John’s visual language is the one, I am not sure yet, but it may well be. I will consider using it the next time I give a tutorial…

I was impressed by the paper of Gong Cheng and his friends from Nanjing, “Empirical Study of Vocabulary Relatedness…”. Analyzing the results of a search engine (in this case Falcons) to draw conclusions on the nature, the usage, the mutual relationships, etc., of vocabularies is very important indeed. We need empirical results, bound to real-life usage. This is not the first work in this direction (see, for example, the work of Ghazvinia et al., from ISWC2009), but there is still much to do. Which reminds me of some much smaller scale work Giovanni, Péter and I did on determining the top vocabulary prefixes for the purpose of the RDFa 1.1 initial context (we used to call it the default profile back then). I should probably try to talk to the Nanjing team to merge with their results!

I think the vision paper of Marcus Cobden and his friends (again at the COLD2011 Workshop) on a “Research Agenda for Linked Closed Data” is worth noting. Although not necessarily earthshaking, the fact that we can and should speak about Linked Closed Data alongside Linked Open Data is important if we want the Semantic Web to be adopted and used by the enterprise world as well. One of the main issues, which is not addressed frequently enough (although there have been some papers published here and there), is access control. Who has the right to access data? Who has the right to access a particular ontology or rule set that may lead to the deduction of new relationships? What are the licensing requirements, and how do we express them? I do not think our community has a full answer to these. B.t.w., W3C is organizing a Workshop concentrating on the enterprise usage of Linked Data in December…

Speaking about research agendas… I really liked Frank van Harmelen’s keynote on the second day of the conference. His approach was fresh, and the question he asked was different: essentially, after 10 or more years of research in the Semantic Web area, can we derive some “higher level” laws that describe and govern this area of research? I will not repeat all the laws that he proposed; it is better to look at the HTML version of his slides on his Web site. The ones that are worth repeating again and again are that “Factual knowledge is a graph”, “Terminological knowledge is a hierarchy”, and “Terminological knowledge is much smaller than the factual knowledge”. Why are these important? To quote from his keynote slides:

  1. traditionally, KR has focussed on small and very intricate sets of axioms: a bunch of universally quantified complex sentences
  2. but now it turns out that much of our knowledge comes in the form of very large but shallow sets of axioms.
  3. lots of the knowledge is in the ground facts, (not in the quantified formula’s)

Which is important to remember when planning future work and activities. “Reasoning”, usually, happens on a huge set of ground facts in a graph, with a shallow hierarchy of terminology…

I was a little bit disappointed by the Linked Science Workshop; probably because I had the wrong expectations. I was expecting a workshop looking at how Linked Data in general can help in the renewal of the scientific publication process as a whole (a bit along the lines of the Force11 work on improving the future of scholarly communication). Instead, the workshop was more about how different scientific fields use linked data in their work. Somehow the event was unfocussed for me…

As in some previous years, I was again part of the jury for the Semantic Web Challenge. It was interesting how our own expectations have changed over the years. What was really a wow! a few years ago has become so natural that we are not excited any more. This is of course a good thing, as it shows that the field is maturing further, but we may need some sort of Semantic Web Super-Challenge to be really excited again. That being said, the winners of the challenge really did impressive work; I do not want to give the impression of being negative about them… It is just that I was missing that “wow”.

Finally, I was at one session of the industrial track, which was a bit disappointing. If we wanted to show the research community that Semantic Web technologies are really used by industry, then the session did not really do a good job of that. With one exception, and a huge one at that: the presentation by Yahoo! (beware, the link is to a PowerPoint slide deck). It seems that Yahoo! is building an internal infrastructure based on what they call a “Web of Objects”, by regrouping pieces of knowledge in a graph-like fashion. By using internal vocabularies (a superset of schema.org) and the underlying graph infrastructure, they aim at regrouping similar or identical knowledge pieces harvested on the Web. I am sure we will hear more about this.

Yes, it was a full week…


April 9, 2011

Announcement on rNews

Filed under: Semantic Web,Work Related — Ivan Herman @ 6:38
Semantic Web Bus / Bandwagon

Image by dullhunk via Flickr

A few days ago IPTC published a press release on rNews: “Standard draft for embedding metadata in online news”. This is, potentially, a huge thing for Linked Data and the Semantic Web. Without going into too many technical details (no reason to repeat what is on the IPTC pages on rNews; you can look it up there), what this means is that, potentially, all major online news services on the globe, from the Associated Press to the AFP, or from the New York Times to the Süddeutsche Zeitung, will have their news items enriched with metadata, and this metadata will be expressed in RDFa. In other words, the news items will be usable, by extracting the RDF, in any Semantic Web application, can be mashed up with other types of data easily, etc. In short, news items will become a major part of the Semantic Web landscape, with the extra specificity of being an extremely dynamic set of data that is renewed every day. That is exciting!

Of course, it will take some time to get there, but we should realize that IPTC is the major standard-setting body in the news publishing world. I.e., rNews has a major chance of being widely adopted. It is time for the Semantic Web community to pay attention…


September 28, 2010

ICT2010 Event Brussels, 2nd day: eGov (#ict2010eu for twitter…)

The main event today, as far as I am concerned, was the Governmental Linked Data session that some of us organized under the auspices of the Open Knowledge Foundation. The idea was to talk about the goals, dreams, and problems of Governmental Linked Data to the non-initiated (and the non-converted:-). I believe (although one is never objective about one’s own child) that the session went really well. There were ca. 140 people in the audience which, frankly, exceeded my expectations. Josema gave a nice overview of his “dreams”, i.e., of the goals and promises of this whole move; this was followed by Jonathan’s dreams which were, of course, largely identical to Josema’s, but he also gave some data and facts about what is happening in Europe these days (e.g., in the area of data catalogues). He also referred to the upcoming European data catalogue project (PublicData.eu), which will be a great asset when it comes. Jeni talked not only about her dreams but also about some of the practical experiences in deploying this stuff; as somebody deeply involved in the UK governmental project, i.e., as a person in the trenches, so to say, Jeni was really a great person to talk about that. The fourth and last speaker was Andreas, showing some existing applications on linked governmental data, and also talking about his dream of an application that would, e.g., help in the discussion of problematic societal issues like the Stuttgart 21 project. (Actually, Andreas had the temerity to use the Internet for live demos; with the absolutely awful quality of the network at the conference I would not have dared to do so!) There was also a lively discussion with questions after the presentations, both as part of the official session as well as after it. It is difficult to say how many people we “reached”, of course, but I think we were successful in getting the idea of Governmental Linked Data more accepted by a wider audience. (B.t.w., there is also a page with all the slide references.)
It was interesting that, later in the day, I had a chat with a colleague who claimed that by now the very idea of linked data, and of governmental linked data in particular, is widely accepted as the way to go, though, of course, lots of details still have to be fleshed out. I may not be as upbeat as he is but, well, it may just be my usual pessimism…

Other than this session, I also listened to several sessions on the Future Internet. There is now a new funding round on this topic (with a deadline in mid-January), so it obviously drew quite some attention, in spite of the fact that it is quite difficult to grasp what this thing is all about. The goals described by various speakers put an emphasis on the societal aspects of upcoming work: on trying to understand what the profound, societal consequences of the ubiquitous internet presence are, what social changes it will bring, how we can understand, via interdisciplinary work, these evolutions, etc. These are all really exciting questions, although also very difficult ones. What bothered me a little bit was that all this sounded very familiar: it was the same set of goals outlined by the Web Science Initiative, these days the Web Science Trust; just make a global change of “internet” to “Web”, and you get the same! This was all the more disturbing because, when asked about other organizations doing similar work, the representative of the Commission referred to “a UK project called Web Science Initiative, you know, started by Wendy Hall and Tim Berners-Lee…”, i.e., they completely missed the fact that the WST is not a UK thing… Missing communication here?

I ranted yesterday about some of the oddities of the conference organization. Sorry, I have to add some more: we (the organizers of the session) sent them the detailed program of the session a few weeks ago. They did put it up on the Web in… Microsoft Word format. What would it have cost them to convert that at least into PDF (or to ask us to do it, if necessary), let alone to turn it into HTML? At a time when everybody is talking about mobile devices and the mobile internet, they put up a piece of information that no mobile phone, for example, can read… (B.t.w., they distributed the program of the conference on a USB stick, which is fine, but with a bunch of programs running on Windows only… When will such organizers learn that there are people out there using Linux or a Mac? Sigh…)

B.t.w.: if you have not realized it yet, the #ict2010eu twitter feed contains a huge number of entries, a bunch of them related to our session…

July 12, 2010

Experiences of LOD publication

Filed under: Semantic Web,Work Related — Ivan Herman @ 10:39

Frank van Harmelen’s tweet drew my attention to a paper by Jan Hannemann and Jürgen Kett on Linked Data for Libraries. I hope Jan and Jürgen will not be upset if I copy some quotes from their paper, but I thought it worthwhile to give more publicity to some of their experiences in deploying linked data at the German National Library. Reproduced here without change, though somewhat shortened:

  • Setting up a service is not trivial. […] the essential software solutions (tools) involved have not reached full maturity yet. […] documentation may be lacking the required depth. […] multiple software components need to be setup to work together  […] which requires appropriate expertise.[…]
  • Data modeling can be complex. When publishing data on the web, it is advantageous to use existing, registered ontologies. Unfortunately, these ontologies do not always match the data representation of each individual library […] the definitions of individual properties can vary considerably. […] There is no simple answer to the question which is the right thing to do.[…]
  • Open data exchange mentality does not exist everywhere. Even before linked data, libraries have exchanged and aligned their data sets. The results of such projects could be prime information sources for connecting linked data sets. Sadly, not all institutions involved share the open exchange mentality, and shared ownership may make it difficult to publish these results.
  • Best practices are seen as rules. Linked open data is based largely on best practices rather than rules. However, this pragmatic aspect is not seen as essential in all areas of the linked data community. Deviations from perceived standards tend to be criticized, which can cause institutions new to the semantic web to doubt their decisions – even if they make sense for the organization in question. Libraries should not be deterred by such feedback and rather see this as a motivation to contribute their own experiences and knowledge to the community. Guidelines and best practices should be carefully considered in the context of each institution’s needs, especially in this early forming phase of the semantic cloud.[…]
  • Properly modeled data is very useful. Once the data modeling is completed and the data made available, it can be used by others. A colleague at the Technical University of Braunschweig has shown that with properly modeled data, this can result in very useful applications: within a day, he imported our data into a database, added a web interface and had thus created a searchable access to our data.
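That last point resonates with me. Just as a toy illustration (my own sketch, with made-up example data; it has, of course, nothing to do with the actual code of the colleague in Braunschweig), even a few lines of Python are enough to parse some literal-valued N-Triples and provide a crude searchable access to them:

```python
import re

# A handful of (fictitious) N-Triples statements about books, with
# literal objects only; real library data would be far richer.
NT = """\
<http://example.org/book/1> <http://purl.org/dc/terms/title> "Faust" .
<http://example.org/book/1> <http://purl.org/dc/terms/creator> "Goethe" .
<http://example.org/book/2> <http://purl.org/dc/terms/title> "Die Leiden des jungen Werthers" .
<http://example.org/book/2> <http://purl.org/dc/terms/creator> "Goethe" .
"""

# Matches lines of the form: <subject> <predicate> "literal" .
TRIPLE = re.compile(r'<([^>]*)> <([^>]*)> "([^"]*)" \.')

def load(nt):
    """Parse literal-valued N-Triples into a list of (s, p, o) tuples."""
    return [m.groups() for m in TRIPLE.finditer(nt)]

def search(triples, text):
    """Return the (sorted) subjects whose object literal contains the text."""
    return sorted({s for (s, p, o) in triples if text in o})

triples = load(NT)
print(search(triples, "Goethe"))
# -> ['http://example.org/book/1', 'http://example.org/book/2']
```

Obviously, a real deployment would use a proper RDF store and a SPARQL endpoint rather than regular expressions; the point is merely that once the data is cleanly modeled, getting to a “searchable access” is the easy part.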
