This has been in the works for a while, but it is done now: the latest version (3.4.0) of the Python RDFLib library has just been released, and it includes RDFa 1.1, microdata, and Turtle-in-HTML parsers. In other words, a user can add structured data to an HTML file, and that data will be parsed into RDF and added to an RDFLib Graph structure. This is a significant step, and my thanks go to Gunnar Aastrand Grimnes, who helped me add those parsers to the main distribution.
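To give a rough idea of what this looks like in practice, here is a minimal sketch that feeds a small RDFa snippet to an RDFLib Graph. The sample HTML, the helper name `parse_html_structured_data`, and the `http://example.org/` base are made up for illustration. The format name `"rdfa"` (and, by analogy, `"microdata"`) is an assumption about how the parsers are registered, and the parsers also need html5lib to be installed. If the parser is not available in the installed RDFLib release, the helper simply returns `None`.

```python
# Hypothetical sample document: one RDFa 1.1 statement about an example resource.
SAMPLE_HTML = """<html>
  <head><title>Example</title></head>
  <body prefix="dc: http://purl.org/dc/terms/">
    <p about="http://example.org/post" property="dc:title">RDFLib 3.4.0 released</p>
  </body>
</html>"""


def parse_html_structured_data(html, fmt="rdfa", base="http://example.org/"):
    """Parse structured data embedded in `html` into an RDFLib Graph.

    Returns a sorted list of (subject, predicate, object) strings, or None
    when RDFLib or the requested parser is not usable in this environment.
    The format name in `fmt` is an assumption, not a documented guarantee.
    """
    try:
        from rdflib import Graph
    except ImportError:
        return None  # RDFLib is not installed

    graph = Graph()
    try:
        # `data` passes the document as a string; `publicID` sets the base URI.
        graph.parse(data=html, format=fmt, publicID=base)
    except Exception:
        # Treat any failure (parser plugin not registered, html5lib missing,
        # and so on) as "parser unavailable" in this sketch.
        return None

    return sorted((str(s), str(p), str(o)) for s, p, o in graph)


if __name__ == "__main__":
    triples = parse_html_structured_data(SAMPLE_HTML)
    if triples is None:
        print("The RDFa parser is not available in this RDFLib installation.")
    else:
        for triple in triples:
            print(triple)
```

Passing a different `fmt` value would, under the same assumption about registered names, select the microdata parser instead.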
I wrote a blog post last summer on some of the technical details of those parsers. There have been updates since then, essentially following the minor changes that the RDFa Working Group has defined for RDFa as well as changes to the microdata-to-RDF algorithm, but the general approach described in that post remains valid, and it is not necessary to repeat it here. For further details on these different formats, some useful links are:
- For RDFa, there is a new version of an RDFa 1.1 Primer in preparation. It is probably worth keeping an eye on the editor’s draft of the primer. The primer has the links to the official recommendations if one wants to look up the gory details. Alternatively, look at the RDFa community page!
- For microdata, the official specification is of course available; the conversion to RDF is the subject of a separate W3C Note.
- For Turtle-in-HTML, you can look at the latest version of the Turtle spec.