Ivan’s private site

April 1, 2011

2nd Last Call for RDFa 1.1

Filed under: Semantic Web,Work Related — Ivan Herman @ 2:58
Tags: , ,

The W3C RDFa Working Group published a “Last Call” for RDFa 1.1 back at the end of October last year. This was meant to be a “feature freeze” version and was asking for public comments. Well, the group received quite a number of those. Lots of small things, requiring changes of the documents in many places to make them more precise even in various corner cases, and some more significant ones. In some ways, it shows that the W3C process works, ensuring quite an influence of the community on the final shape of the documents. Because of the many changes the group decided to re-issue a Last Call (yes, the jargon is a bit misleading here…), aimed at a last check before the document goes to its next phase on the road of becoming a standard. Almost all the changes are minor for users, though important for, e.g., implementers to ensure interoperability. “Almost all”, because there is one new and, I believe, very important though controversial new feature, namely the so-called default profiles.

I have already blogged about profiles when they were first published back in April last year. In short, profile documents provide an indirection mechanism to define prefixes and terms for an RDFa source: publishers may collect all the prefixes they deem important for a specific application and authors, instead of being required to define a whole set of prefixes in the RDFa file itself, can just refer to the profile file to have them all at their disposal. I think the profile feature was the feature stirring the biggest interest in the RDFa 1.1 work: they are undeniably useful, and undeniably controversial… Indeed, in theory at least, profiles represent yet another HTTP round when extracting RDF from and RDFa file, which is never a good thing. But a good caching mechanism or other implementation tricks can greatly alleviate the pain… (B.t.w., the group has also created some guidelines for profile publishers to help implementers.)

This draft goes one step further by introducing default profiles. These are profiles just like any other, but they are defined with fixed URI-s (namely http://www.w3.org/profile/rdfa-1.1 for RDFa 1.1 in general, and, additionally, http://www.w3.org/profile/html-rdfa-1.1 for the various HTML variants) and the user does not have to declare them in an RDFa source. Which means that a very simple HTML+RDFa file of the sort:

    <p about ="xsd:maxExclusive" rel="rdf:type" resource="owl:DatatypeProperty">
      An OWL Axiom: "xsd:maxExclusive" is a Datatype Property in OWL.

(note the missing prefix declarations!) will produce that RDF triple that you might expect. Can’t be simpler, can it?

Why? Why was it necessary to introduce this? Well, the experience shows that many HTML+RDFa authors forget to declare the prefixes. One can look, for example, at the pages that include Facebook’s Open Graph Protocol RDFa statements: although I do not have an exact numbers, I would suspect that around 50% of these pages do not have them. That means that, strictly speaking, those statements cannot be interpreted as RDF triples. The Semantic Web community may ask, try to convince, beg, etc., the HTML authors (or the underlying tools) to do “the right thing”, and we certainly should continue doing so, but we also have to accept this reality. A default profile mechanism can alleviate that, thereby greatly extending the amount of triples that can become part of a Web of Data. And even for seasoned RDF(a) users not having to declare anything for some of the common prefixes is a plus.

Of course, the big, nay, the BIG issue is: what prefixes and terms would those default profiles declare? What is the decision procedure? At this time, we do not have a final answer yet. It is quite obvious that all the vocabularies defined by W3C Recommendations and official Notes and that have a fixed prefix (most of them do) should be part of the list. We may want to add Member Submissions to this list. If you look at the default profile, these are already there in the first table (i.e., the code example above is safe). The HTML variant would add all the traditional @rel values, like license, next, previous, etc.

But what else? At the moment, the profiles include a set of prefixes and terms that are just there for testing purposes (although they do indicate a tendency), so do not take the default profile as the final content. For the HTML @rel values, we would, most probably, rely on any policy that the HTML5 Working Group will define eventually; the role of the HTML default profile will simply be to reflect those. That seems quite straightforward However, the issues of default prefixes is clearly different. For those, the Working Group is contemplating two different approaches

  1. Set up some sort of a registration mechanism, not unlike the xpointer registry. This would also include some accompanying mailing lists where objections can be raised against the inclusion of a specific prefix, etc.
  2. Try to get some information from search engines on the Semantic Web (Sindice, Yahoo!, anyone else?) that may provide with a list of, say, the top 20 prefixes as used on the Semantic Web. Such a list would reflect the real usage of vocabularies and prefixes. (We still have to see whether this is an information these engines can provide or not.)

At this moment it is not yet clear which way is realistic. Personally, I am more in favour of the second approach (if technically feasible), but the end result may be different; this is a policy that W3C will have to set up.

Apart from the content, another issue is the change mode and frequency of the default profile. First of all, the set of default prefixes can only grow. I.e., once a prefix has made it on the default profile, it has to stay there with an unchanged URI. That is obviously important to ensure stability. I.e., new prefixes coming to the fore by virtue of being used by the community can be added to the set, but no prefix can be removed. As for the frequency: a balance has to be found between stability, i.e., that RDFa processors can rely (e.g., for caching) on a not-too-frequent change of the default profiles, and relevance, i.e., that new vocabularies could find their way into the set of default prefixes. Again my personal feeling is that an update of the profiles once every 6 months, or even once a year, might strike a good balance here. To be decided.

As before, all comments are welcome but, again as before, I would prefer if you sent those comments to the RDFa WG’s mailing list rather than commenting this blog: public-rdfa-wg@w3.org (see also the archives).

Finally: I have worked on a new version of my RDFa distiller to include all the 1.1 features. This version of the distiller is now public, so you can try out the different new features. Of course, it is still not a final release, there are bugs, so…

Enhanced by Zemanta

Create a free website or blog at WordPress.com.

%d bloggers like this: