Ivan’s private site

January 18, 2014

Some W3C Documents in EPUB3

Filed under: Code,Digital Publishing,Python,Work Related — Ivan Herman @ 13:04

I have been having fun the past few months, when I had some time, with a tool to convert official W3C publications (primarily Recommendations) into EPUB3. Apart from the fact that this helped me to dive into some details of the EPUB3 Specification, I think the result might actually be useful. Indeed, it often happens that a W3C Recommendation consists, in fact, of several different publications. This means that just archiving one single file is not enough if, for example, you want to have those documents offline. On the other hand, EPUB3 is perfect for this; one creates an eBook that contains all constituent publications as “chapters”. Yep, EPUB3 as a complex archiving tool :-)

The Python tool (which is available in github) has now reached a fairly stable state, and it works well for documents that have been produced by Robin Berjon’s great respec tool. I have generated, and put up on the Web, two books for now:

  1. RDFa 1.1, a Recommendation that was published last August (in fact, there was an earlier version of an RDFa 1.1 EPUB book, but that was done largely manually; this one is much better).
  2. JSON-LD, a Recommendation published this week (i.e., 16th of January).

(Needless to say, these books have no formal standing; the authoritative versions are the official documents published as a W3C Technical Report.)

There is also a draft version of a much larger book on RDF 1.1, consisting of all the RDF 1.1 specifications to come, including all the various serializations (among them RDFa and JSON-LD). I say “draft”, because those documents are not yet final (i.e., not yet Recommendations); a final version (with, for example, all the cross-links properly set) will be at that URI when RDF 1.1 becomes a Recommendation (probably in February).
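To make the “several publications as chapters” idea a bit more concrete, here is a minimal sketch (not the code of the tool mentioned above) of how the reading order could be expressed with the `manifest` and `spine` elements of an EPUB3 package document. The file names are hypothetical, and a real package document also needs metadata and a navigation document, which are left out here.

```python
import xml.etree.ElementTree as ET

# Namespace of the EPUB package document
OPF_NS = "http://www.idpf.org/2007/opf"


def build_manifest_and_spine(chapter_files):
    """Return (manifest, spine) elements listing the chapters in reading order."""
    manifest = ET.Element(f"{{{OPF_NS}}}manifest")
    spine = ET.Element(f"{{{OPF_NS}}}spine")
    for index, href in enumerate(chapter_files, start=1):
        item_id = f"chapter{index}"
        # Each constituent publication is declared once in the manifest...
        ET.SubElement(manifest, f"{{{OPF_NS}}}item", {
            "id": item_id,
            "href": href,
            "media-type": "application/xhtml+xml",
        })
        # ...and referenced from the spine, which fixes the reading order.
        ET.SubElement(spine, f"{{{OPF_NS}}}itemref", {"idref": item_id})
    return manifest, spine


# Hypothetical file names, one per constituent publication
chapters = ["rdfa-core.xhtml", "xhtml-rdfa.xhtml", "html-rdfa.xhtml"]
manifest, spine = build_manifest_and_spine(chapters)
spine_order = [itemref.get("idref") for itemref in spine]
manifest_hrefs = [item.get("href") for item in manifest]
```

The point of the sketch is only that each publication becomes one manifest entry, and that the spine decides the order in which a reader presents them.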


January 4, 2014

Data vs. Publishing: my change of responsibilities…

Fairy Lake Botanical Garden, Shenzhen, China

There was an official announcement, as well as some references, on the fact that the structure of data related work has changed at W3C. A new activity has been created, called the “Data Activity”, that subsumes what used to be called the Semantic Web Activity. “Subsumes” is an important term here: W3C does not abandon the Semantic Web work (I emphasize that because I did get such reactions); instead, the existing and possible future work is simply continuing within a new structure. The renaming is simply a sign that W3C also has to pay attention to the fact that there are many different data formats used on the Web, that not all of them follow the principles and technologies of the Semantic Web, and that those other formats and approaches also have technological and standardization needs that W3C might be in a position to help with. It is not the purpose of this blog, however, to look at the details; the interested reader may consult the official announcements (or consider Tim Finin’s formula: Data Activity ⊃ Semantic Web ∪ eGovernment 🙂).

There is a much less important but more personal aspect of the change, though: I will not be the leader of this new Data Activity (my colleague and friend, Phil Archer, will do that). Before anybody tries to find some complicated explanation (e.g., that I was fired): the reason is much simpler. About a year ago I got interested in a fairly different area, namely Digital Publishing. What used to be, back then, a so-called “headlight” project at W3C, i.e., an exploration into a new area, turned into an Activity of its own, with me as the lead, last summer. There is a good reason for that: after all, digital publishing (e.g., e-books) may represent one of the largest usage areas of the core W3C technologies (i.e., HTML5, CSS, or SVG) right after browsers; indeed, for those of you who do not realize that (I did not know that just a year and a half ago either…), an e-book is “just” a frozen and packaged Web site, using many of the technologies defined by W3C. It is thus a major user area, but one whose requirements may be special and not yet properly represented at W3C. Hence the new Activity.

However, this development at W3C had its price for me: I had to choose. Heading both the Digital Publishing and the Data Activities was not an option. I have led W3C’s Semantic Web Activity for about 7 years; 7 years that were rich in events and results (the forward march of Linked Open Data, a much more general presence and acceptance of the technology, specifications like OWL 2, RDFa, RDB2RDF, PROV, SKOS, SPARQL 1.1, with RDF 1.1 just around the corner now…). I had my role in many of these, although I was merely a coordinator for the work done by other amazing individuals. But I had to choose, and I decided to go towards new horizons (in view of my age, probably for the last time in my professional life); hence my choice for Digital Publishing. As simple as that…

But this does not mean I am completely “out”. First of all, I will still actively participate in some of the data activity groups (e.g., in the “CSV on the Web WG”), and have a continuing interest in many of the issues there. But, maybe more importantly, there are some major overlapping areas between Digital Publishing and Data on the Web. For example, publishing also means scientific, scholarly publishing, and this particular area is increasingly aware of the fact that publishing data, as part of reporting on a particular scientific endeavor, becomes as important as publishing a traditional paper. And this raises tons of issues on data formats, linked data, metadata, access, provenance, etc. Another example: the traditional publishing industry makes increasingly heavy use of metadata. There is a recognition among publishers that well-chosen and well-curated metadata for books is a major business asset that may make a publication win or lose. There are many (overlapping…) vocabularies, and relationships to libraries, archival facilities, etc., come to the fore. Via this metadata the world of publishing may become a major player in the Linked Data cloud. A final example may be annotation: while many aspects of the annotation work are inherently bound to the Semantic Web (see, e.g., the work of the W3C Community Group on Annotation), it is also considered to be one of the most important areas for future development in, say, the educational publishing area.

I can, hopefully, contribute to these overlapping areas with my experience from the Semantic Web. So no, I am not entirely gone, just changed hats! Or, as on the picture, acting (also) as a bridge…

August 11, 2013

The value of community driven content (OSM vs. Google Map)

Filed under: Links,Social aspects,Work Related — Ivan Herman @ 12:51

This is just a nice little example which might be worth noting for those who do not know Open Street Map (I am also a relatively new user of it).

I had a nice walk in Marseille yesterday, which included going down from the big cathedral on the top of the hill (“Notre Dame de La Garde“) to the seaside. There is a not-very-well-known path behind the church that one can take which is, for my taste anyway, a gorgeous way of doing it.

The path of course appears on Google’s Map: look at the small path going from the church to the “Rue du Bois Sacré”. However, look at the same area using Open Street Map: not only is the path there, but it gives a bunch of details. Indeed, it is not really a simple path: it is a long series of steps, i.e., do not try to ride even a bike there :-( And because it is a hot city, it is also good to know that there is a small public fountain along the path (and, indeed, it is there and it works!)…

It is not really Google’s fault. They probably got the material from some sort of an official mapping system (they could not get their camera cars or bikes up there…) and there is no way a company, even as huge as Google, can cover such details. But a community-driven site can: people can add such details easily. (Actually, there was part of the path that was missing, and I will add it soon using my GPS readings.) Therein lies the power:-)

March 16, 2013

Multilingual Linked Open Data?

Filed under: Semantic Web,Work Related — Ivan Herman @ 14:13

Experts developing Web sites for various cultures and languages know that it is way better to include such features into Web pages at the start, i.e., at the time of the core design, rather than to “add” them once the site is done. What is valid for Web sites is also valid for data deployed on the Web, and that is especially true for Linked Data whose mantra is to combine data and datasets from all over the place.

Why do I say all this? I had the pleasure to participate, earlier this week, in the MultilingualWeb Workshop in Rome, Italy. One of the topics of the workshop was Linked (Open) Data and its multilingual (and, also, multicultural) aspects. There were a number of presentations at a dedicated session (the presentations are online, linked from the Workshop Page; just scroll down and look for a session entitled “Machines”), and there was also a separate break-out session (the slides are not yet online, but they should be soon). There are also a number of interesting projects and issues in this area beyond those presented at the event; for example, the lemon model or the (related) Monnet EU project.

All these projects are great. However, the overall situation in the Linked Data world is, in this respect, not that great, at least in my view. If one looks at the various Linked Data (or Semantic Web) related mailing lists, discussion fora, workshops, etc., multilingual or multicultural issues are almost never discussed. I did not make any systematic analysis of the various datasets on the LOD cloud, but I have the impression that only a few of them are prepared for multilingual use (e.g., by providing alternative labels and other metadata in different languages). URI-s are defined in English, and most of the vocabularies we use are documented in only one language; they may be hard to use for non-English speakers. Worse, vocabularies may not even be properly prepared for multicultural use (just consider the complexity of personal names, which is hardly ever properly reflected in vocabularies). And this is where we hit the same problem as for Web sites: with all its successes, we are still at the beginning of the deployment of Linked Data, so our community should have much more frequent discussions on how to handle this issue now, because after a while it may be too late.

B.t.w., one of the outcomes of the break-out session at the Workshop was that a W3C Community Group should be created soon to produce some best practices for Multilingual Linked Open Data. There is already some work done in the area: look at the page set up by José Emilio Labra Gayo, Dimitris Kontokostas, and Sören Auer; this may very well be the starting point. Watch this space!

It is hard. But it will be harder if we miss this particular boat.

March 1, 2013

RDFa 1.1, microdata, and turtle-in-HTML now in the core distribution of RDFLib

This has been in the works for a while, but it is done now: the latest version (3.4.0) of the Python RDFLib library has just been released, and it includes an RDFa 1.1, a microdata, and a turtle-in-HTML parser. In other words, the user can add structured data to an HTML file, and that will be parsed into RDF and added to an RDFLib Graph structure. This is a significant step, and thanks go to Gunnar Aastrand Grimnes, who helped me add those parsers to the main distribution.

I wrote a blog post last summer on some of the technical details of those parsers; although there have been updates since then, essentially following the minor changes that the RDFa Working Group has defined for RDFa, as well as changes/updates on the microdata->RDF algorithm, the general approach described in that blog post remains valid, and it is not necessary to repeat it here. For further details on these different formats, some of the useful links are:


February 22, 2013

Browsers and eBook Readers

Filed under: Digital Publishing,Work Related — Ivan Herman @ 23:02

eBook Readers Galore (Photo credit: libraryman)

My last week was all about digital publishing: first, I was at the W3C Workshop on eBooks and the Open Web Platform, which I helped to organize. If I extrapolate from the discussions at the W3C Workshop, there are good prospects that this topic will become more important at the W3C, and that it will also keep us busy (in addition to my role on the Semantic Web). By the way, the minutes of the W3C Workshop (both for the 1st and the 2nd days) and the presentations are public; a somewhat more detailed workshop report should also be available soon.

The Workshop was followed by O’Reilly’s Tools of Change (TOC) conference: a first time for me. And it was extremely interesting to find myself in a new environment where I had never been before. I saw some great keynotes (e.g., Mark Waid’s on “Reinventing Comics And Graphic Novels For Digital”, or Maria Popova’s, from Brain Pickings), and learned a lot at some of the sessions (for example, at Bill Rosenblatt’s session on some of the legal aspects surrounding eBooks).

My interest in this whole area is, primarily, in how digital publishing in general, and electronic books in particular, relate to technologies developed at W3C. For those of you who may not realize that: if an electronic book uses the ePUB standard (and more and more books do) then the book is, in fact, a “frozen” Web site (depending on the ePUB version, based either on XHTML1 or on HTML5). Technically, it is a zip file containing all the files necessary to render the content, plus some ePUB specific files to manage the table of contents, to help readers display the content more quickly, etc. (Actually, as far as I know, most of the ePUB readers are based on the same core technology as many of the Web browsers, namely WebKit.) The strong relationship between publishing in general, and eBooks in particular, and the Web was emphasized several times at the conference, especially by the keynote of Jeff Jaffe, the CEO of W3C.
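The “zip file plus some ePUB specific files” description can be illustrated with a few lines of Python. This is only a sketch under stated assumptions: the archive built here is a toy (its package file and chapter are placeholders, so it is not a valid book), and the function names are made up for the illustration. What it does show is the standard layout: a `mimetype` entry, and a `META-INF/container.xml` file that points to the package document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/package.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def make_toy_epub():
    """Build a toy, in-memory archive with the layout of an ePUB file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The 'mimetype' entry comes first and is stored uncompressed.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", CONTAINER_XML)
        # Placeholders only: a real book has a full package document and content.
        archive.writestr("OEBPS/package.opf", "<package/>")
        archive.writestr("OEBPS/chapter1.xhtml", "<html/>")
    return buffer.getvalue()


def find_package_path(epub_bytes):
    """Return the path of the package document named in container.xml."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as archive:
        root = ET.fromstring(archive.read("META-INF/container.xml"))
    rootfile = root.find(f"{{{CONTAINER_NS}}}rootfiles/{{{CONTAINER_NS}}}rootfile")
    return rootfile.get("full-path")


toy_epub = make_toy_epub()
package_path = find_package_path(toy_epub)
entry_names = zipfile.ZipFile(io.BytesIO(toy_epub)).namelist()
```

Everything else in the archive (XHTML, CSS, images) is ordinary Web content, which is why the “frozen Web site” image fits.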

But then… if so, why do we need separate eBook readers, either in hardware or in software? (Let us put aside for now the issue of DRM, vendor lock-in, etc.; these are of course reasons, but let us hope the business will evolve towards a more open environment where those issues will be less relevant.) Do we really need separate ePUB reader software on, say, my iPad, or should we simply rely on the browsers taking care of ePUB files either directly or through some extensions? (There is, for example, a project called Readium to add such capabilities to Chrome.) The answer is not obvious; there are proponents of both approaches. My 2 cents here: it is not a core technology issue, but a user experience and interface one. Reading a book, electronic or otherwise, is a different intellectual activity than reading an average Web page. Here are some differences that I feel are important, and I am sure there are more, many more:

  • A book must be available off-line; this is, actually, its natural state. This difference is obvious, but worth noting: for example, the user interface for books has to be able to list what is and what is not available at a given moment (all readers have some sort of an imitation of a traditional bookshelf).
  • The amount of “information” you want to absorb is different. A typical Web page is not terribly long; even the more detailed Wikipedia articles, when printed, are rarely longer than 4-5 pages. Compare that to an average book that may be hundreds or even thousands of pages. What this means in practice is that, whereas a Web page is usually read, understood, “absorbed” in one go, reading a single book may take several days or weeks. This has all kinds of consequences for how one navigates, uses traditional bookmarks (not the ones browsers usually provide, i.e., to store URL-s, but what used to be bookmarks in the past), tables of contents, indexes, glossaries, etc. These features are essential for books but much less so for an average Web page.
  • Modern Web pages have more and more interactive features, and they are related to various social sites like Twitter or Facebook; very often these pages are Web applications with very complex features (think of gmail, for example). Obviously, browsers have to be prepared for a high level of interactivity and have to be optimized to offer an optimal user experience. Books are much less interactive. Newer generations of books may include some level of interactivity, which is important for, say, the educational book market or for children’s books, but it is a far cry from what Web sites do. Also, some readers (like Kobo’s) try to include some level of Social Web facilities (sharing information about books with friends, that sort of thing); to be honest, I never found those social features interesting or important (o.k., I may just be old-skool). Reading a book for me remains a linear reading activity, whether it is fiction, poetry, history, or politics. I want my eBook reader to optimize for that, and avoid distractions.
  • There are some features that a good eBook reader should offer and that browsers traditionally do not. A prime example is annotation facilities. Many people like to scribble on their books, underline full sentences, highlight words; I still have not found any tools to do that properly in a Web browser, although all the eBook readers that I have tested so far have such functionality. This is a typical user interface difference that comes from different demands. (Another example that comes to my mind is quick access to a dictionary, to an encyclopedia, etc.)
  • Some sort of a payment/rights management system must be part of the reader. I personally consider the current DRM system, as used in the eBook world, fundamentally broken insofar as it may drive people away from this market. However, I recognise that something should be available that allows authors of books to get some reward for their work. Whether that is some sort of watermarking, social DRM, or whatever, I do not know, but something is needed, and the reader environment has to handle this.

I realize, of course, that this is a continuum: with ePUB3 we have the ability to make eBooks much more interactive, possibly with scripts, multimedia, etc.; in effect, electronic books are becoming more and more like Web applications. I.e., some of these differences may disappear or become less important. Nevertheless, I believe there will always be a difference in user expectations, and in the emphasis that a piece of software (or hardware) may have. eBook readers are not browsers, although electronic books are, in fact, part of the Web just like other types of Web content. Is it a sign that we may need a more diverse landscape for accessing the Web than we have today?

December 24, 2012

Mountain Lion Installation woes…

Filed under: Links,Mac,Work Related — Ivan Herman @ 15:38

Mountain Lion. Philadelphia Zoo (Photo credit: Wikipedia)

It is December and, just as last year, it is time for an upgrade of OS X. Last year it was Lion (and I did write down my experiences back then); this time it is Mountain Lion. I decided to make a short note of my experiences because, maybe, by sharing those I will save somebody else some time and energy. In general, I have not hit any major issues, I must say, just nuisances, but it did take me some time to get around those…

1. The installation process itself was fairly straightforward except that… it was nerve-racking at times. While installing, the screen duly had a progress bar with a text underneath, saying something like “the remaining time is 25 minutes”, “the remaining time is 5 minutes”, “the remaining time is less than a minute”, then… it got stuck. Stuck for a long time. Nothing moved, the progress bar was full. And then an even stranger thing happened: it said something like “the remaining time is -20 minutes”. WTF? Because I have experienced quite a few crashes in the 30 years that I have been in this business, of course I got nervous. Should I reboot? What will happen then? Is my disk fully destroyed now?

Luckily, I had the instinct not to do anything but take my iPad and look up the Web. And sure thing: there are reports elsewhere saying that the progress bar implementation of the installer, including the time estimate, is buggy, and that I should just wait and things would turn out to be all right. And they did indeed, after around 30 extra minutes. Phew!

2. Everything installed, get to login… and it seems that there is still some installation and/or file adaptation to do at that time, because it took about 4-5 minutes after having typed in my password before any of my windows showed up. Again, WTF? I became wiser, and just waited, and things got back to normal. Note that, since then, everything is fine when I wake up the machine, although I have not rebooted it yet to see if a login would again lead to such a delay.

3. I knew that, in Mountain Lion, Apple decided to remove the simple system preference flag to start up a local apache automatically (having the local apache running is essential for me: I have a partial copy of a Web site on my machine to test pages before they go public). Although I never understood why this decision had been taken, I was prepared; there are a number of sites giving advice on what to do (e.g., the one I looked at), as well as an extra small preference that one can install.

What I did not count on is that the installation would wipe out the old apache configuration file (i.e. /etc/apache2/httpd.conf). (I do not think the Lion installation did that, at least I do not remember.) To make things even more difficult, that directory is not accessible through the time machine (why?) so I had to reproduce my changes. It took me a certain time because I adapted that file for my needs three years ago and I forgot all about it, of course. Advice: make a copy of that file before upgrading!

4. I need some command line tools like gcc or cvs. That means I had to install a new version of Xcode; I counted on that. However… cvs was still not there after installation. Sigh… did they remove cvs as an obsolete tool? But no, gcc was not available either.

As usual, the Web and Google are your friends; I found a note with an explanation. It turns out that Apple no longer installs the “developer” command line tools by default. That includes compilers, make, cvs, and the like. You have to install them explicitly: start up Xcode, and then look for Xcode→Preferences→Downloads→Components and click on the install button next to the command line tools. (Again the same question: why this arbitrary decision?)

5. I was pleased to see that the Note application is now available, and is supposed to synchronise with the note application on my iPhone and iPad. I knew that, and I was looking forward to that. On Lion, the notes were bound to the email accounts and appeared in the Mail application; I always found that setup odd.

But… things are not that simple because Apple again made some inexplicable decision. On Lion, I could assign notes to the various email accounts I had, I could do the same on, say, my iPhone, and things worked properly. Not so in Mountain Lion; indeed (as I understood after some google-ing…) Apple has discontinued this synchronisation except for iCloud. I.e., you have to regroup all your notes under the iCloud account (if you have one, that is) to achieve a smooth synchronisation with your mobile devices. It is not that bad in the end, because you can define folders for notes and use those for your own categorisation; but, until I realised all that and got everything running, I again lost quite some time, had some dead ends, etc. Sigh…

6. I also had some small woes with the latest Safari. For reasons that again I do not understand, there is no longer a preference in Safari to set the right font size. The only way to do that is through a CSS style sheet (see also a relevant note I found). Although my personal problem was that the default character size was way too big for my taste, as the author of the note rightfully said, not having the possibility to adapt the size easily can be a major accessibility issue for some.

Frankly… I love my Mac, and I still find it vastly superior in usability to other machines. It is, nevertheless, disappointing to see Apple making such arbitrary decisions and making the transition to a new system unnecessarily tedious. This should not happen.

(By the way, this just reinforced me in my selfish decision not to upgrade to a new system right away. Having waited half a year meant that all my issues were solved relatively easily by looking at notes published by others…)

November 26, 2012

Nice RDFa 1.1 example…

Filed under: Semantic Web,Work Related — Ivan Herman @ 23:20

I know I had seen that before, but I ran into this again: the WorldCat.org site (a must for book lovers…) has a nice structure using RDFa 1.1. Let us take an example page for a book, say, one of the latest books of Amitav Ghosh, the “Sea of poppies”. The page itself has all kinds of data; what is interesting here is that the formal, bibliographical data is also encoded in RDFa 1.1. Running, for example, an RDF distiller on the page you get the bibliographical data. Here is an excerpt (in JSON-LD):

    {
        "@context": {
            "library": "http://purl.org/library/",
            "oclc": "http://www.worldcat.org/oclc/",
            "skos": "http://www.w3.org/2004/02/skos/core#",
            "schema": "http://schema.org/",
            . . .
        },
        "@id": "oclc:216941700",
        "@type": "schema:Book",
        "schema:about": [
            {
                "@id": "http://id.worldcat.org/fast/1122346",
                "@type": "skos:Concept",
                "schema:name": {
                    "@value": "Social classes",
                    "@language": "en"
                }
            },
            . . .
        ],
        "schema:bookEdition": {
            "@value": "1st American ed.",
            "@language": "en"
        },
        "schema:inLanguage": {
            "@value": "en",
            "@language": "en"
        },
        "library:placeOfPublication": {
            "@type": "schema:Place",
            "schema:name": {
                "@value": "New York :",
                "@language": "en"
            }
        },
        . . .
    }

Note that WorldCat.org uses the schema.org vocabulary, where appropriate, but mixes it with a number of other vocabularies; exactly where the power of RDFa lies! Great for bibliographic applications that can use this type of data, possibly mixed with data coming from other libraries…
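To see how the vocabularies are mixed, it may help to expand the compact names of the excerpt into full URIs using the prefixes of its `@context`. The sketch below does only that simple prefix expansion (it is not a JSON-LD processor, and the `expand` function is a name made up for the illustration); the prefixes and terms are the ones in the excerpt.

```python
# Prefixes copied from the "@context" of the excerpt above
CONTEXT = {
    "library": "http://purl.org/library/",
    "oclc": "http://www.worldcat.org/oclc/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "schema": "http://schema.org/",
}


def expand(term, context):
    """Expand 'prefix:suffix' into a full URI if the prefix is declared."""
    prefix, separator, suffix = term.partition(":")
    # Leave absolute URIs (e.g. 'http://...') and unknown prefixes untouched.
    if separator and prefix in context and not suffix.startswith("//"):
        return context[prefix] + suffix
    return term


terms = [
    "schema:Book",
    "oclc:216941700",
    "skos:Concept",
    "library:placeOfPublication",
    "http://id.worldcat.org/fast/1122346",
]
expanded = [expand(term, CONTEXT) for term in terms]
```

After expansion, `schema:Book` and `skos:Concept` end up in two different namespaces, schema.org and SKOS respectively, side by side in the same description.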

By the way, I was reminded to look at the site by a recent document just published by the Library of Congress: “Bibliographic Framework as a Web of Data: Linked Data Model and Supporting Services”. It is still a draft, and there are quite some discussions around it in the library community, but the overall picture is what counts: the library community may (let us be optimistic: will!) become one of the major actors in the Linked Data world, as well as users of structured data on the Web, most probably RDFa. Yay!

November 6, 2012

RDFa 1.1 and microdata now part of the main branch of RDFLib

Filed under: Code,Python,Semantic Web,Work Related — Ivan Herman @ 21:34

A while ago I wrote about the fact that I had adapted my RDFa and microdata distillers to RDFLib. Although I did some work on it since then, nothing really spectacular happened (e.g., I have updated the microdata part to the latest version of the microdata->RDF conversion note, and I have also gone through the tedious exercise of making the modules usable for Python3).

Nevertheless, a significant milestone has been reached now, not by me but rather by Gunnar Aastrand Grimnes, who “maintains” RDFLib: the separate branch for RDFa and microdata has now been merged with the master branch of RDFLib on github. So here we are; whenever the next official release of RDFLib comes, these parsers will be part of it…

August 31, 2012

RDFa, microdata, turtle-in-HTML, and RDFLib

For those of us programming in Python, RDFLib is certainly one of the RDF packages of choice. Several years ago, when I developed a distiller for RDFa 1.0, some good souls picked the code up and added it to RDFLib as one of the parser formats. However, years have gone by, and have seen the development of RDFa 1.1, of microdata, and also the specification of directly embedding Turtle into HTML. It is time to bring all these into RDFLib…

Some time ago I developed both a new version of the RDFa distiller, adapted for the RDFa 1.1 standard, as well as a microdata to RDF distiller, based on the Interest Group note on converting microdata to RDF. Both of these were packages and applications on top of RDFLib. Which is fine because they can be used with the deployed RDFLib installations out there. But, ideally, these should be retrofitted into the core of RDFLib; I have used the last few quiet days of the vacation period in August to do just that (thanks to Niklas Lindström and Gunnar Grimnes for some email discussion and for helping me through the hoops of RDFLib-on-github). The results are in a separate branch of the RDFLib github repository, under the name structured_data_parsers. Using these parsers, here is what one can do:

from rdflib import Graph

g = Graph()
# parse an SVG+RDFa 1.1 file and store the results in 'g':
g.parse(URI_of_SVG_file, format="rdfa1.1")
# parse an HTML+microdata file and store the results in 'g':
g.parse(URI_of_HTML_file, format="microdata")
# parse an HTML file for any structured content and store the results in 'g':
g.parse(URI_of_HTML_file, format="html")

The third option is interesting (thanks to Dan Brickley who suggested it): this will parse an HTML file for any structured data, let that be in microdata, RDFa 1.1, or in Turtle embedded in a <script type="text/turtle">...</script> tag.
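For the Turtle case, the first step of such a “universal” extraction is simply to locate the `<script type="text/turtle">` elements in the HTML source. The following is a minimal sketch of that step using only the Python standard library; it is not the code of the RDFLib parser, the class name is made up for the illustration, and the sample HTML is invented. The extracted text would then have to be handed over to a Turtle parser.

```python
from html.parser import HTMLParser


class TurtleScriptExtractor(HTMLParser):
    """Collect the text content of <script type="text/turtle"> elements."""

    def __init__(self):
        super().__init__()
        self._in_turtle = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "text/turtle":
            self._in_turtle = True
            self.blocks.append("")

    def handle_data(self, data):
        # Script content may arrive in several chunks; concatenate them.
        if self._in_turtle:
            self.blocks[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_turtle = False


# Invented sample page with one embedded Turtle block
SAMPLE_HTML = """<html><head>
<script type="text/turtle">
@prefix dc: <http://purl.org/dc/terms/> .
<http://example.org/doc> dc:title "A test" .
</script>
</head><body><p>Hello</p></body></html>"""

extractor = TurtleScriptExtractor()
extractor.feed(SAMPLE_HTML)
extractor.close()
turtle_blocks = [block.strip() for block in extractor.blocks]
```

Because the parser treats the content of a `script` element as raw text, the angle brackets of the Turtle URIs are not mistaken for HTML tags.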

The core of the RDFa 1.1 parser has gone through very thorough testing, using the extensive test suite on rdfa.info. This is less true for microdata, because there is not yet an extensive test suite for that one (but the code is also simpler). On the other hand, any restructuring like that may introduce some extra bugs. I would very much appreciate it if interested geeks in the community could install and test it, and forward me the bugs that are still undeniably there… Note that the microdata->RDF mapping specification may still undergo some changes in the coming few weeks/months (primarily catching up with some development around schema.org); I hope to adapt the code to the changes quickly.

I have also made some arbitrary decisions here, which are minor, but arbitrary nevertheless. Any feedback on those is welcome:

  • I decided not to remove the old, 1.0 parser from this branch. Although the new version of the RDFa 1.1 parser can switch into 1.0 mode if the necessary switches are in the code (e.g., @version or an RDFa 1.0 specific DTD), in the absence of those 1.1 will be used. As, unfortunately, 1.1 is not 100% backward compatible with 1.0, this may create some issues with deployed applications. This also means that the format="rdfa" argument will refer to 1.0 and not to 1.1. Am I too cautious here?
  • The format argument in parse can also hold media types. Some of those are fairly obvious: e.g., application/svg+xml will map on the new parser with RDFa 1.1, for example. But what should be the default mapping for text/html? At present, it maps to the “universal” extractor (i.e., extracting everything).
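The two decisions above amount to a small lookup from format names and media types to parsers. Here is a sketch of that mapping; the table, the parser labels on the right-hand side, and the function name are all hypothetical and only mirror the choices described in the list, they are not RDFLib's actual plugin registry.

```python
# Hypothetical lookup table mirroring the decisions described above;
# the labels on the right are made up for this illustration.
FORMAT_TO_PARSER = {
    "rdfa": "rdfa1.0",                  # the old 1.0 parser stays the default for "rdfa"
    "rdfa1.1": "rdfa1.1",
    "microdata": "microdata",
    "html": "html",                     # the "universal" extractor
    "application/svg+xml": "rdfa1.1",   # SVG is handled by the new RDFa 1.1 parser
    "text/html": "html",                # current default: extract everything
}


def resolve_parser(format_name):
    """Return the parser label for a format name or media type."""
    try:
        return FORMAT_TO_PARSER[format_name.lower()]
    except KeyError:
        raise ValueError(f"no parser registered for {format_name!r}") from None


svg_parser = resolve_parser("application/svg+xml")
html_parser = resolve_parser("text/html")
legacy_parser = resolve_parser("rdfa")
```

Changing the default for `text/html`, or letting `rdfa` point to 1.1, would be a one-line change in such a table, which is why feedback on these two choices is cheap to act upon.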

Of course, at some point, this branch will be merged with the main branch of RDFLib meaning that, eventually, this will be part of the core distribution. I cannot say at this point when this will happen, because I am not involved in the day-to-day management of the RDFLib development.

I hope this will be useful…
