OWL 2 has just been published as a Proposed Recommendation (yay!) which means, in laymen’s term, that the technical work is done, and it is up to the membership of W3C to accept it as a full blown Recommendation.
As I already blogged before, I did some implementation work on a specific piece of OWL 2, namely the OWL 2 RL Profile. (I have also blogged about OWL 2 RL and its importance before, nothing to repeat here.) The implementation itself is not really optimized, and it would probably not stand a chance for any large scale deployment (the reader may want to look at the OWL 2 implementation report for other alternatives). But I can hope that the resulting service can be useful in getting a feel for what OWL 2 RL can give you: by just adding a few triples into the text box you can see what OWL 2 RL means. This is, by the way, an implementation of the OWL 2 RL rule set, which means that it can also accepts triples that are not mandated by the Direct Semantics of OWL 2 (a.k.a. OWL 2 DL). Put it another way, it is an implementation of a small portion of OWL 2 Full.
The core of my implementation turned out to be really easy straightforward: a forward chaining structure directly encoded in Python. I use RDFLib to handle the RDF triples and the triple store. Each triple in the RDF Graph is considered, compared to the premises of the rules; if there is a match then new triples are added to the Graph. (Well, most of the rules contain several triples to match with, and the usual approach is to pick one and explore the Graph deeper check against additional matches. Which one to pick is important, it may affect the overall speed, though.) If, through such a cycle, no additional triples are added to the Graph then we are done, the “deductive closure” of the Graph has been calculated. The rules of OWL 2 RL have been carefully chosen so that no new resources are added to the Graph (only new triples), ie, this process eventually stops.
The rules themselves are usually simple. Although it is possible and probably more efficient to encode the whole process using some sort of a rule engine (I know of implementations based on, eg, Jena’s rules or Jess), one can simply encode the rules using the usual conditional constructs of the programming language. The number of rules is relatively high but nothing that a good screen editor would not manage with copy-paste. There were only a few rules that required a somewhat more careful coding (usually to take care of lists) or many searches through the graph like, for examples, the rule for property chains (see rule prp-spo2 in the rule set). It is also important to note that the higher number of rules does really not affect the efficiency of the final system; if no triple matches a rule then, well, it just does not fire. No side effect of the mere existence of an unused rule.
So is it all easy and rosy? Not quite. First of all, this implementation is of course simplistic in so far as it generates all possible deducted triples that include a number of trivial triples (like
?x owl:sameAs ?x for all possible resources). That means that the resulting graph becomes fairly big even if the (optional) axiomatic triples are not added. If the OWL 2 RL process is bound to a query engine (eg, the new version of SPARQL will, hopefully, give a precise specification of what it means to have OWL 2 RL reasoning on the data set prior to a SPARQL query) then many of these trivial triples could be generated at query time only, thereby avoiding an extra load on the database. Well, that is one place where a proof-of-concept and simple implementation like mine looses against a more professional one:-)
The second issue was the contrast between RDF triples and “generalized” RDF triples, ie, triples where literals can appear in subject positions and bnodes can appear as properties. OWL 2 explicitly says that it works with generalized triples and the OWL 2 RL rule set also shows why that is necessary. Indeed, consider the following set of triples:
ex:X rdfs:subClassOf [ a owl:Restriction; owl:onProperty [ owl:inverseOf ex:p ]; owl:allValuesFrom ex:A ].
This is a fairly standard “idiom” even for simple ontologies; one wants to restrict, so to say, the subjects instead of the objects using an OWL property restriction. In other words that restriction combined with
ex:x rdf:type ex:X . ex:y ex:p ex:x .
ex:y rdf:type ex:A .
Well, this deduction would not occur through the rule set if non-generalized RDF triples were used. Indeed, the inverse of
ex:p is a blank node, ie, using it in a triple is not legal; but using that blank node to denote a property is necessary for the full chain of deductions. In other words, to get that deduction to work properly using RDF and rules, the author of the vocabulary would have to give an explicit URI to the inverse of
ex:p. Possible, but slightly unnatural. If generalized triples are used, then the OWL 2 RL rules yield the proper result.
It turns out that, in my case, having bnodes as properties was not really an issue, because RDFLib could handle that directly (is that a bug in RDFLib?). But similar, though slightly more complex or even pathological examples can be constructed involving literals in subject positions, and that was a problem because RDFLib refused to handle those triples. What I had to do was to exchange all literals in the graph against a new bnode, perform all the deductions using those, and exchange the bnodes “back” against their original literals at the end. (This mechanism is not my invention; it is actually described by the RDF Semantics document, in the section on Datatype entailment rules.) B.t.w., the triples returned by the system are all “legal” triples, generalized triples play a role during the deduction only (and illegal triples are filtered out at output).
Literals with datatypes were also a source of problems. This is probably where I spent most of my implementation time (I must thank Michael Schneider who, while developing the test cases for OWL 2 RDF Based Semantics, was constantly pushing me to handle those damn datatypes properly…). Indeed, the underlying RDFLib system is fairly lax on checking the typed literals against their definition by the XSD specification (eg, issues like minimum or maximum values were not checked…). As a consequence, I had to re-implement the lexical to value conversion for all datatypes. Once I found out how to do that (I had dive a bit into the internals of RDFLib but, luckily, Python is an interpretative language…) it became a relatively straightforward, repetitive, and slightly time consuming work. Actually, using bnodes instead of “real” literals made it easier to implement datatype subsumptions, too (eg, the fact that, say, an
xsd:byte is also a
xsd:integer). This became important so that the rules would work properly on property restrictions involving datatypes.
Bottom line: even for a simple implementation literals, mainly literals with datatypes, are the biggest headache. The rest is really easy. (This is hardly the discovery of the year, but is nevertheless good to remember…)
I was, actually, carried away a bit once I got a hold on how to handle datatypes, so I also implemented a small “extension” to OWL 2 RL by adding datatype restrictions (one of the really nice new features of OWL 2 but which is not mandated for OWL 2 RL). Imagine you have the following vocabulary item:
ex:RE a owl:Restriction ; owl:onProperty ex:p ; owl:someValuesFrom [ a rdfs:Datatype ; owl:onDatatype xsd:integer ; owl:withRestrictions ( [ xsd:minInclusive "1"^^xsd:integer ] [ xsd:maxInclusive "6"^^xsd:integer ] ) ] .
which defines a restriction on the property
ex:p so that some its values should be integers in the
[1,6] interval. This means that
ex:q ex:p "2"^^xsd:integer.
ex:q rdf:type ex:RE .
And this could be done by a slight extension of OWL 2 RL; no new rules, just adding the datatype restrictions to the datatypes. Nifty…
That is it. I had fun, and maybe it will be useful to others. The package can also be downloaded and used with RDFLib, by the way…