Ivan’s private site

November 9, 2008

Open Archives Initiative’s aggregation vocabulary

A few days ago I read through the OAI’s aggregation vocabulary (officially: “Object Reuse and Exchange”). It is one of those very targeted, relatively small RDF vocabularies that may nevertheless become important in practice. Instead of describing it in my own words, let me be lazy 🙂 and copy from the introduction of the ORE Primer:

In the physical world we create, use, and refer to aggregations of things all the time. We collect pictures in a photo album, read journals that are collections of articles, and burn CDs of our favorite songs. In this physical world these aggregations are frequently tangible – we can hold the photo album, journal, and CD […].

This practice of aggregating extends to the Web. We accumulate URL’s in bookmarks or favorites lists in our browser, collect photos into sets in popular sites like Flickr […]. Despite our frequent use of these aggregations, their existence on the Web is quite ephemeral. One reason for this is that there is no standard way to identify an aggregation. We often use the URI of one page of an aggregation to identify the whole aggregation. For example, we use the URI of the first page of a multi-page Web document to identify the whole document, or we use the URI of the HTML page that provides access to a Flickr set to identify the entire set of images. But those URIs really just identify those specific pages, and not the union of pages that makes up the whole document, or the union of all images in a Flickr set, respectively. In essence, the problem is that there is no standard way to describe the constituents or boundary of an aggregation, and this is what OAI-ORE aims to provide.

ORE therefore introduces a strict separation between an Aggregation (which is meant to be a non-information resource, in TAG parlance) and a Resource Map, which is an information resource describing one specific Aggregation. In other words, the core idea of the ORE spec is to follow the Linked Data principles (in line with the Cool URIs for the Semantic Web note) when describing abstract aggregations.
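The separation between the two kinds of resources can be expressed as a couple of simple checks. The following is a minimal Python sketch, not part of any ORE toolkit: the function name `check_resource_map` and the representation of triples as plain tuples of URI strings are hypothetical, and the two rules it enforces (a Resource Map describes exactly one Aggregation, and the two have different URIs) reflect a reading of the ORE data model rather than a quotation of it.

```python
# Hypothetical checker for two constraints of the ORE data model (as read here):
#   1. a Resource Map describes exactly one Aggregation (via ore:describes);
#   2. the Aggregation and the Resource Map have different URIs.
# Triples are plain (subject, predicate, object) tuples of URI strings.
ORE = "http://www.openarchives.org/ore/terms/"


def check_resource_map(rem_uri, triples):
    """Return the URI of the Aggregation described by rem_uri.

    Raises ValueError if either constraint is violated.
    """
    described = {o for s, p, o in triples
                 if s == rem_uri and p == ORE + "describes"}
    if len(described) != 1:
        raise ValueError(
            f"{rem_uri} must describe exactly one Aggregation, found {len(described)}")
    aggregation = described.pop()
    if aggregation == rem_uri:
        raise ValueError("the Aggregation needs a URI distinct from its Resource Map")
    return aggregation


rem = "http://www.w3.org/2008/Talks/0822-Ghent-IH/"
print(check_resource_map(rem, [(rem, ORE + "describes", rem + "#talk")]))
```

With a hash URI such as `…/#talk` for the Aggregation, the second rule is satisfied automatically, which is one reason the Cool URIs approach fits ORE so naturally.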

I have tried ORE out in practice. A typical example is my talks. When I give a presentation (like the one in Ghent a few months ago), I publish the slides in different formats (ODP, PDF, and HTML in this case). In this sense the talk, which is an “abstract” thing in its own right, is an aggregation of the three concrete slide sets. The following statements describe the situation in the ORE sense:

@prefix ore: <http://www.openarchives.org/ore/terms/> .
@prefix cc:  <http://creativecommons.org/ns#> .
@prefix dc:  <http://purl.org/dc/elements/1.1/> .   # assumed binding

# The Resource Map: an information resource describing the talk
<http://www.w3.org/2008/Talks/0822-Ghent-IH/> a ore:ResourceMap ;
     dc:title "Detailed introduction into RDF and the Semantic Web" ;
     ore:describes <http://www.w3.org/2008/Talks/0822-Ghent-IH/#talk> .

# The Aggregation: the "abstract" talk itself
<http://www.w3.org/2008/Talks/0822-Ghent-IH/#talk> a cc:Work, ore:Aggregation ;
     # cc:license <…> ;   (license URI elided)
     # ...
     ore:aggregates <http://www.w3.org/2008/Talks/0822-Ghent-IH/HTML/>,
                    <http://www.w3.org/2008/Talks/0822-Ghent-IH/Slides.odp>,
                    <http://www.w3.org/2008/Talks/0822-Ghent-IH/Slides.pdf> .

<http://www.w3.org/2008/Talks/0822-Ghent-IH/Slides.pdf> a ore:AggregatedResource ;
     dc:format "application/pdf" .
# ...
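To see what the Turtle above amounts to as plain triples, here is a small Python sketch that generates the same core statements and prints them as N-Triples using only the standard library. The helper names (`ore_triples`, `term`, `to_ntriples`, `Literal`) are hypothetical, binding `dc:` to Dublin Core Elements 1.1 is an assumption, and the elided parts of the excerpt (the license and the other omitted statements) are deliberately left out.

```python
# Hypothetical sketch: emit the core ORE statements about the Ghent talk as
# N-Triples. The rdf, ore and cc namespaces are the standard ones; binding dc:
# to Dublin Core Elements 1.1 is an assumption. Elided parts of the Turtle
# excerpt (license, other properties) are not generated; ore:AggregatedResource
# is applied to all three parts, while the excerpt shows it for the PDF only.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ORE = "http://www.openarchives.org/ore/terms/"
CC = "http://creativecommons.org/ns#"
DC = "http://purl.org/dc/elements/1.1/"

BASE = "http://www.w3.org/2008/Talks/0822-Ghent-IH/"
REM = BASE              # the Resource Map: an information resource
AGG = BASE + "#talk"    # the Aggregation: the "abstract" talk
PARTS = [BASE + "HTML/", BASE + "Slides.odp", BASE + "Slides.pdf"]


class Literal(str):
    """Marks a plain string literal, as opposed to a URI."""


def ore_triples():
    """Yield (subject, predicate, object) tuples for the talk."""
    yield REM, RDF_TYPE, ORE + "ResourceMap"
    yield REM, DC + "title", Literal("Detailed introduction into RDF and the Semantic Web")
    yield REM, ORE + "describes", AGG
    yield AGG, RDF_TYPE, CC + "Work"
    yield AGG, RDF_TYPE, ORE + "Aggregation"
    for part in PARTS:
        yield AGG, ORE + "aggregates", part
        yield part, RDF_TYPE, ORE + "AggregatedResource"
    yield BASE + "Slides.pdf", DC + "format", Literal("application/pdf")


def term(value):
    """Render a URI as <...> or a Literal as a quoted, escaped string."""
    if isinstance(value, Literal):
        escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                        .replace("\n", "\\n"))
        return f'"{escaped}"'
    return f"<{value}>"


def to_ntriples(triples):
    """Serialize triples to N-Triples, one statement per line."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


print(to_ntriples(ore_triples()))
```

In practice one would rather use an RDF library or an RDFa processor; the point here is only that the whole ORE description boils down to a handful of `rdf:type`, `ore:describes` and `ore:aggregates` statements.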

The …/0822-Ghent-IH/#talk URI identifies the “abstract” thing, i.e., the talk. Because it is a hash URI, dereferencing it means retrieving …/0822-Ghent-IH/ (the client drops the fragment before sending the request), which is the “resource map” in ORE terms: an information resource that returns HTML or RDF, depending on the requested format. The RDF above is actually encoded in RDFa, with Apache set up to deliver the different formats. The resource map portion is conveniently placed in the head of the HTML file, and the body describes the actual aggregation, i.e., the talk. (The full RDF content encoded in the HTML file can also be accessed directly, in either RDF/XML or Turtle; it contains additional information on the talk that is not directly relevant to this post.)
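The two steps involved, dropping the fragment and picking a representation, can be sketched in Python with the standard library's `urllib.parse.urldefrag`. The `negotiate` helper is hypothetical, its Accept-header handling is deliberately simplified, and the set of offered media types (including `text/turtle`) is an assumption rather than a description of the actual server setup.

```python
from urllib.parse import urldefrag

# Hypothetical sketch of dereferencing the talk URI. The list of offered media
# types and the simplified Accept parsing are assumptions; this is not the
# actual Apache configuration used for the W3C talk pages.
OFFERED = ["text/html", "application/rdf+xml", "text/turtle"]


def document_to_fetch(uri):
    """Clients drop the fragment before the HTTP request: #talk -> the ReM."""
    return urldefrag(uri).url


def negotiate(accept_header, offered=OFFERED):
    """Return the offered media type with the highest q-value, or None (406).

    Simplified: matches exact types and */* only; ties keep the earlier entry.
    """
    if not accept_header.strip():
        return offered[0]  # no preference expressed: default to HTML
    best, best_q = None, 0.0
    for item in accept_header.split(","):
        media_type, *params = [p.strip() for p in item.split(";")]
        q = 1.0
        for param in params:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if media_type == "*/*":
            match = offered[0]
        elif media_type in offered:
            match = media_type
        else:
            continue
        if q > best_q:  # strict '>' keeps the earlier entry on ties
            best, best_q = match, q
    return best


talk = "http://www.w3.org/2008/Talks/0822-Ghent-IH/#talk"
print(document_to_fetch(talk))
print(negotiate("text/html,application/xhtml+xml;q=0.9,*/*;q=0.8"))
print(negotiate("application/rdf+xml"))
```

A browser sending its usual Accept header ends up with the HTML page, while an RDF-aware client asking for `application/rdf+xml` gets RDF/XML from the same resource map URI; the `#talk` fragment itself never reaches the server.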

Time will tell whether this vocabulary will catch on; some of its design decisions may also prompt further discussion. But it is certainly an interesting and potentially important addition to the overall vocabulary landscape!

P.S.: I used my Ghent presentation as the example because that is where I first heard about this vocabulary in more detail, thanks to a short tutorial given by Herbert Van de Sompel of Los Alamos National Laboratory, one of the co-editors of the ORE spec.
