Integrity Constraints for OWL

We’ve previously talked about our NIST-funded work to support integrity constraints in OWL. We plan to release a prototype integrity validator in Pellet very soon; it interprets OWL ontologies under the Closed World Assumption (CWA) to detect constraint violations. So it’s time to explain what integrity constraints do and how OWL axioms can be interpreted as integrity constraints; in other words, how Pellet now supports using OWL as an expressive schema language for Semantic Web data.

Why Integrity Constraints?

Using OWL as an expressive schema language, i.e., giving OWL an integrity constraint semantics, addresses a general use case that neither OWL nor OWL2 can handle under its standard semantics. In fact, people new to OWL often misunderstand this point: they assume OWL is an expressive constraint language that can validate instance data the way relational databases or XML tools do. People have a similar expectation for RDFS, too.

After all, it sounds like RDF Schema should be to RDF what XML Schema is to XML. But because RDF and OWL adopt the Open World Assumption (OWA), the axioms in an ontology are used to infer new knowledge rather than to trigger an inconsistency.

The simplest way to demonstrate this misconception in the context of RDFS is the following example:


:isManufacturedBy rdfs:range :Manufacturer .
:product1 :isManufacturedBy :ACME .

where the range restriction causes the inference that ACME is a Manufacturer, rather than a range violation, even though ACME is not explicitly declared to be a Manufacturer instance. The same kind of unexpected inference occurs with OWL constructs; for example, with cardinality restrictions, missing values do not cause an inconsistency, and extra values cause those values to be inferred owl:sameAs one another.
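A minimal sketch, in plain Python with no RDF library, of the two readings of the range axiom above. Triples are modeled as string tuples, and the helper names `apply_range_rule` and `range_violations` are hypothetical, chosen only for illustration; this is not how Pellet or any RDF store is implemented.

```python
# Minimal sketch (hypothetical helper names): the same rdfs:range axiom read
# under the Open World Assumption (to infer) and under the Closed World
# Assumption (as an integrity constraint that detects a violation).

RDF_TYPE = "rdf:type"

# The range axiom from the example, as (property, class).
range_axiom = (":isManufacturedBy", ":Manufacturer")

# The instance data: :ACME is never explicitly typed as :Manufacturer.
data = {(":product1", ":isManufacturedBy", ":ACME")}


def apply_range_rule(triples, axiom):
    """OWA/RDFS reading: infer rdf:type for every object of the property."""
    prop, cls = axiom
    inferred = {(o, RDF_TYPE, cls) for (s, p, o) in triples if p == prop}
    return triples | inferred


def range_violations(triples, axiom):
    """CWA reading: every object of the property must be *known* to be typed."""
    prop, cls = axiom
    return sorted(o for (s, p, o) in triples
                  if p == prop and (o, RDF_TYPE, cls) not in triples)


print(sorted(apply_range_rule(data, range_axiom) - data))
# [(':ACME', 'rdf:type', ':Manufacturer')]
print(range_violations(data, range_axiom))
# [':ACME']
```

Under the OWA reading the axiom only ever adds the missing type, so a validator built on it can never report this error; the CWA reading reports `:ACME` because its type is not known.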

The key point isn’t that people don’t understand OWL; it’s that their expectations point to real use cases for some kinds of applications. For example, you can use OWL axioms as integrity constraints to validate the structure of messages exchanged in Supply Chain Management (SCM) systems. Similar use cases arise with an Enterprise Service Bus (ESB) that exchanges messages encoded in RDF/OWL rather than XML in order to perform “semantic validation.” Likewise, Linked Data apps need OWL as a kind of schema language for RDF instance data. The OWL2 working group worked very hard to preserve the natural affinity between OWL and RDF while evolving OWL2, but that affinity has been difficult to realize in practice, in part because OWL lacks an integrity constraint semantics.

Interpreting OWL axioms as constraints

Our approach is to give OWL axioms an alternative semantics under which they are interpreted with the CWA and a weak form of the Unique Name Assumption (UNA). Under the CWA, an assertion is assumed to be false unless it is known to be true. Under weak UNA, two individuals that are not inferred to be the same are assumed to be distinct. The user or application then specifies which ontologies should be interpreted with standard OWL semantics and which with the alternative semantics, or specifies that a given ontology should be treated as a set of constraints on some RDF data.
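To make the CWA and weak UNA concrete, here is a minimal sketch of checking a hypothetical constraint “every product has exactly one manufacturer” (an exact-cardinality restriction on `:isManufacturedBy`). The individuals `:AcmeCorp`, `:product2`, `:product3` and the function names are invented for illustration, and owl:sameAs facts are passed in as already-inferred pairs rather than computed by a reasoner.

```python
# Minimal sketch (hypothetical names and data): checking the constraint
# "each product has exactly one manufacturer" under CWA and weak UNA.


def count_distinct(values, same_as_pairs):
    """Weak UNA: values not known to be owl:sameAs are counted as distinct."""
    parent = {}

    def find(x):
        # Union-find lookup: follow parent links to the class representative.
        parent.setdefault(x, x)
        while parent[x] != x:
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        parent[find(a)] = find(b)  # merge the two owl:sameAs classes
    return len({find(v) for v in values})


def cardinality_violations(subjects, values_of, same_as_pairs, min_card, max_card):
    """CWA: only the values actually present count; missing ones are absent."""
    violations = []
    for s in subjects:
        n = count_distinct(values_of.get(s, []), same_as_pairs)
        if n < min_card or n > max_card:
            violations.append((s, n))
    return violations


products = [":product1", ":product2", ":product3"]
manufacturer_of = {
    ":product1": [":ACME"],               # exactly one value: satisfied
    ":product2": [":ACME", ":AcmeCorp"],  # two names, not known to be sameAs
    # :product3 has no manufacturer at all
}

print(cardinality_violations(products, manufacturer_of, [], 1, 1))
# [(':product2', 2), (':product3', 0)]
print(cardinality_violations(products, manufacturer_of, [(":ACME", ":AcmeCorp")], 1, 1))
# [(':product3', 0)]
```

Under standard OWL semantics neither case would be an error: `:product3` would simply have an unknown manufacturer, and the two manufacturer names for `:product2` would be inferred owl:sameAs.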

The alternative semantics we use is based on the way classical Datalog systems support integrity constraints: constraints are written in First Order Logic and then translated to Horn rules that use Negation as Failure (NAF). Since OWL is essentially a fragment of First Order Logic, this approach works nicely for translating constraints written as OWL axioms.
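For the range constraint from the earlier example, the Datalog-style translation can be sketched as follows. This is a schematic illustration of the general technique, not the exact rules Pellet generates: the constraint is read as a first-order formula, its negation characterizes a violation, and classical negation is then replaced by NAF, written as not.

```latex
% Constraint: every object of isManufacturedBy is a Manufacturer
\forall x\,\forall y\;\bigl(\mathit{isManufacturedBy}(x,y) \rightarrow \mathit{Manufacturer}(y)\bigr)

% It is violated exactly when its negation holds
\exists x\,\exists y\;\bigl(\mathit{isManufacturedBy}(x,y) \wedge \neg\,\mathit{Manufacturer}(y)\bigr)

% Horn rule with negation as failure that detects the violation
\mathit{violation} \leftarrow \mathit{isManufacturedBy}(x,y),\ \mathbf{not}\ \mathit{Manufacturer}(y)
```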

In our case, it turns out we can translate to SPARQL queries rather than Horn rules. An OWL axiom is automatically translated to one or more queries, and if at least one of those queries returns a result, the constraint is violated. I will explain the technical details of our approach in another post, but here is the SPARQL query generated when the range restriction above is declared as a constraint:


ASK WHERE {
   ?x  :isManufacturedBy  ?y .
   NOT { ?y  rdf:type :Manufacturer . }
}

Of course, NOT is not a SPARQL keyword, but you can fake it with the well-known OPTIONAL/FILTER/!BOUND pattern for encoding NAF in SPARQL.
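To show what the OPTIONAL/FILTER/!BOUND pattern actually computes, here is a minimal Python sketch that evaluates it by hand over a tiny in-memory graph: a left join keeps every `?x :isManufacturedBy ?y` solution, extends it only when `?y` is typed `:Manufacturer`, and then discards every solution in which `?t` got bound. The SPARQL 1.0 form of the query is given in the leading comment; the graph data (including `:product2` and `:Initech`) and the function names are hypothetical, and a real SPARQL engine is far more general.

```python
# Minimal sketch (hypothetical data and names): evaluating the NAF pattern by
# hand. In SPARQL 1.0 the NOT block in the query above can be written as:
#
#   ASK WHERE {
#     ?x :isManufacturedBy ?y .
#     OPTIONAL { ?y rdf:type ?t . FILTER (?t = :Manufacturer) }
#     FILTER (!BOUND(?t))
#   }

graph = {
    (":product1", ":isManufacturedBy", ":ACME"),
    (":product2", ":isManufacturedBy", ":Initech"),
    (":Initech", "rdf:type", ":Manufacturer"),
}


def match(pattern, triples):
    """Yield bindings for one triple pattern; terms starting with '?' are variables."""
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break  # same variable bound to two different values
            elif term != value:
                break  # constant does not match
        else:
            yield binding


def optional_manufacturer_type(solutions, triples):
    """OPTIONAL { ?y rdf:type ?t . FILTER (?t = :Manufacturer) } as a left join."""
    for sol in solutions:
        pattern = (sol["?y"], "rdf:type", "?t")
        matches = [b for b in match(pattern, triples) if b["?t"] == ":Manufacturer"]
        if matches:
            for b in matches:
                yield {**sol, **b}
        else:
            yield sol  # no compatible match: keep the solution, ?t stays unbound


def ask_violation(triples):
    """Return (ASK result, surviving solutions) for the constraint query."""
    base = match(("?x", ":isManufacturedBy", "?y"), triples)
    survivors = [s for s in optional_manufacturer_type(base, triples)
                 if "?t" not in s]  # FILTER (!BOUND(?t))
    return bool(survivors), survivors


print(ask_violation(graph))
# (True, [{'?x': ':product1', '?y': ':ACME'}])
```

Only `:product1` survives the filter, because nothing says `:ACME` is a Manufacturer; that surviving solution is exactly what makes the ASK query report a violation.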

This translation works for all OWL axioms and constructs, including everything in OWL2.

This means that users and developers don’t need to learn a different constraint language or write complex SPARQL queries by hand, which is impractical because these queries get complicated for anything beyond trivial examples. Instead, they can build OWL ontologies with their favorite OWL editor, or reuse existing OWL ontologies, and simply declare that the axioms in those ontologies should be treated as integrity constraints.
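As a rough illustration of why generating these queries automatically beats hand-writing them, here is a hypothetical Python sketch that turns three simple axiom forms (subclass, domain, and range axioms between named classes) into SPARQL 1.0 ASK queries that return true exactly when the constraint is violated, using the OPTIONAL/FILTER/!BOUND encoding of NAF. The tuple format for axioms, the function names, and the classes `:Product` and `:Artifact` are assumptions made for this example; it does not handle complex class expressions, cardinality, or other OWL constructs, and it is not Pellet’s translator.

```python
# Hypothetical sketch: translating a few simple OWL axiom forms (named classes
# only) into SPARQL 1.0 ASK queries that succeed exactly when the constraint
# is violated. This is NOT Pellet's translator.


def not_instance_of(var, cls, flag="?t"):
    """NAF for 'var rdf:type cls' via the OPTIONAL/FILTER/!BOUND pattern."""
    return (f"OPTIONAL {{ {var} rdf:type {flag} . FILTER ({flag} = {cls}) }}\n"
            f"  FILTER (!BOUND({flag}))")


def violation_query(axiom):
    """Build the violation query for one axiom given as a (kind, arg, arg) tuple."""
    kind, *args = axiom
    if kind == "SubClassOf":              # every instance of sub must be a sup
        sub, sup = args
        body = f"?x rdf:type {sub} .\n  {not_instance_of('?x', sup)}"
    elif kind == "ObjectPropertyDomain":  # every subject of prop must be a cls
        prop, cls = args
        body = f"?x {prop} ?y .\n  {not_instance_of('?x', cls)}"
    elif kind == "ObjectPropertyRange":   # every object of prop must be a cls
        prop, cls = args
        body = f"?x {prop} ?y .\n  {not_instance_of('?y', cls)}"
    else:
        raise NotImplementedError(kind)
    return f"ASK WHERE {{\n  {body}\n}}"


print(violation_query(("ObjectPropertyRange", ":isManufacturedBy", ":Manufacturer")))
```

For the range axiom this prints the standard-SPARQL version of the NOT query shown earlier; the domain and subclass cases reuse the same NAF helper on a different variable or triple pattern, which is the kind of repetitive detail that is easy to get wrong by hand.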

So our new IC validator for Pellet means that you can now use Pellet and OWL to both create new knowledge and validate SemWeb instance data. That’s code and specification reuse for the win. That it’s also a very concrete bridge technology between RDF and OWL is icing on the cake.

What's Next?

We are wrapping up the prototype and expect to release a version of it soon. In future weblog posts, I’ll describe how constraint validation interacts with reasoning results and delve into the details of our technical approach.


Colophon

This is Thinking Clearly, a weblog by Clark & Parsia, LLC.