Integrity Constraints, Reasoning, and a Preview Release

In a previous post, Evren introduced some of the work we’ve been doing lately to turn OWL into an expressive schema or data validation language; in other words, using OWL to specify and implement integrity constraints for RDF and other data.

Simple Integrity Constraints & Reasoning

In this post I want to give a simple example to motivate the integration of OWL reasoning with integrity constraint checking. Consider the case encountered when instance data is expressed in terms of the most specific concepts in an ontology:


:Citizen a owl:Class .
:Man a owl:Class .
:Woman a owl:Class .
:ssn a owl:DatatypeProperty .
:Citizen owl:disjointUnionOf ( :Man :Woman ) .

So this ontology has three concepts: citizen, man, and woman. Every citizen is either a man or a woman, but not both. There is also a property, :ssn, for Social Security Numbers.

The instance data is


:Marge a :Woman .
:Homer a :Man ;
  :ssn "123-45-6789" .


So there is a woman, Marge, and a man, Homer, who has a Social Security Number. In this example, how can we say that every citizen should have exactly one Social Security Number? Like this:


:Citizen rdfs:subClassOf [
  a owl:Restriction ;
  owl:onProperty :ssn ;
  owl:cardinality 1 ] .

But without reasoner integration, this constraint won’t apply to either Marge or Homer, because it refers to citizens, not to men or women. And it should refer to citizens: duplicating the constraint for each subclass would be both inaccurate and error-prone (DRY, after all).

Since our integrity constraint checker uses OWL reasoning, however, it will infer that Marge and Homer are both citizens:


:Marge a :Citizen .
:Homer a :Citizen .

Thus, when it applies the integrity constraint (that citizens must have SSNs) it will produce a validation error: Marge is a citizen, which we know by reasoning, but the data does not contain Marge’s SSN.
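To make the difference concrete, here is a minimal Python sketch of the same example. It is a toy closed-world check written for illustration, not how our validator or Pellet is implemented. The names SUBCLASS_OF, TYPES, SSN, inferred_types, and validate are all hypothetical, and the subclass axioms are written out by hand as the ones entailed by the disjoint union above.

```python
# Illustrative sketch only: a toy closed-world check, not Pellet's implementation.

# Subclass axioms entailed by ":Citizen owl:disjointUnionOf ( :Man :Woman )".
SUBCLASS_OF = {"Man": {"Citizen"}, "Woman": {"Citizen"}}

# Instance data from the post: asserted types and property values.
TYPES = {"Marge": {"Woman"}, "Homer": {"Man"}}
SSN = {"Homer": ["123-45-6789"]}


def inferred_types(asserted):
    """Return the asserted classes plus all of their superclasses."""
    result = set(asserted)
    frontier = list(asserted)
    while frontier:
        cls = frontier.pop()
        for parent in SUBCLASS_OF.get(cls, ()):
            if parent not in result:
                result.add(parent)
                frontier.append(parent)
    return result


def validate(use_reasoning):
    """Return the individuals violating 'every Citizen has exactly one ssn'."""
    violations = []
    for individual, asserted in sorted(TYPES.items()):
        types = inferred_types(asserted) if use_reasoning else asserted
        # Closed-world reading: a missing ssn is an error, not an unknown.
        if "Citizen" in types and len(SSN.get(individual, [])) != 1:
            violations.append(individual)
    return violations


print(validate(use_reasoning=False))  # nobody is asserted to be a Citizen
print(validate(use_reasoning=True))   # Marge is inferred to be a Citizen
```

Without reasoning the check finds nothing to validate, because no individual is explicitly typed as a citizen. With the inferred types it reports Marge.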

Bad Workarounds

Integration with reasoning allows us to do better data modeling, since it saves us from repeating the constraints for all of the most specific classes (men, women) when it’s simpler to put the constraint on a more general class (citizen). And this is more accurate data modeling, too, since the requirement for SSNs is not dependent on a person’s gender.

The obvious workaround, if you don’t have reasoning integrated with constraint checking, is to write integrity constraints in terms of the most specific concepts. That almost always means an unnecessary proliferation of constraints (violating DRY), and it may be inaccurate modeling, too.

This workaround makes constraint authoring and maintenance much more difficult. By integrating with the reasoner, we support a more natural and scalable approach to integrity constraint specification, one that leverages the ontology by placing constraints appropriately high in the class hierarchy.

Preview Release

If you want to play with a preview release of our OWL-based Integrity Constraint validator, you can download it now. Note that it’s not licensed under open source terms. It will likely be released as open source in a future version of Pellet, but we’re not doing that today. Installation and usage information, along with a pointer to the forum for questions and bug reports (please!), is included in README.txt. Details of the evaluation license terms are in LICENSE.txt.


Colophon

This is Thinking Clearly, a weblog by Clark & Parsia, LLC.
