Pellet ICV 0.4 Release: Using OWL Integrity Constraints to Validate SKOS

Today we released a new version of Pellet Integrity Constraint Validator. This release includes support for some additional constraint types—such as property disjointness and functional data properties—as well as assorted bug fixes.

If you missed the previous posts about integrity constraints (ICs), let’s briefly recap: ICs allow us to validate RDF instance data using the closed world assumption. ICs in Pellet ICV are expressed as regular OWL axioms, but the validator interprets them differently and auto-generates SPARQL queries to check the constraints.

That means you can use OWL as a schema and data validation language for Linked Data. See, the SemWeb layer cake was a pretty good idea after all!

In the new release we include some examples of using ICs to validate the integrity conditions expressed in SKOS, motivated by Paul Herman’s posts on this issue. The SKOS spec defines many "integrity conditions" (Sections 4.4, 5.4, 8.4, 9.4, and 10.4). These integrity conditions are formally part of the SKOS data model. However, in the spec they are expressed only in natural language, in order to allow for different kinds of implementation. We think that diversity is a good thing, but expressing the integrity conditions (or even just a subset of them) in a formal language would have been a good thing, too.

The 0.4 release includes an OWL integrity constraint ontology that does just that. In the rest of this post, I’ll show you how we can use this ontology and Pellet ICV to validate SKOS data. (Note: Pellet ICV is just a prototype. We will include the ICV engine in a future release of PelletDb which will be very scalable with respect to integrity constraint evaluation.)

Even though most of the SKOS conditions can be expressed syntactically as OWL 2 axioms, not all of those axioms are allowed by OWL 2. For example, let’s look at condition S27:

S27: skos:related is disjoint with the property skos:broaderTransitive.

We use a property disjointness axiom from OWL 2 to express this condition:

  skos:related owl:propertyDisjointWith skos:broaderTransitive

However, OWL 2 does not allow us to define disjointness on transitive properties (this is one of several restrictions on transitive properties that exist to avoid undecidability in reasoning).

But for ICs, we are not bound by these restrictions. Since ICs are used only for detecting violations and not for generating new inferences, such axioms are allowed as ICs. In our implementation, when Pellet ICV sees the above axiom, it will automatically translate it to the following SPARQL query and execute that query using Pellet:


PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?x0
WHERE {
  ?x0 skos:related           ?x1 ;
      skos:broaderTransitive ?x1 .
}
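To make the translation step concrete, here is a minimal Python sketch of how a property disjointness axiom could be turned into a violation query. This is not Pellet ICV's actual code: the helper name disjointness_query is hypothetical, and the real translator handles many more axiom types.

```python
def disjointness_query(prop_a: str, prop_b: str) -> str:
    """Build a SPARQL query whose answers are violations of
    'prop_a owl:propertyDisjointWith prop_b'.

    Hypothetical helper for illustration; not part of Pellet ICV.
    """
    return (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT ?x0\n"
        "WHERE {\n"
        f"  ?x0 {prop_a} ?x1 ;\n"
        f"      {prop_b} ?x1 .\n"
        "}"
    )


# The axiom for S27 yields a query of the same shape as the one above.
query = disjointness_query("skos:related", "skos:broaderTransitive")
print(query)
```

The point of the sketch is only that the constraint author writes one short axiom, and the query that looks for counterexamples is derived mechanically from it.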

If there are answers to this query, that means the corresponding constraint is violated. It is easy to see that this query will have an answer when executed over the data shown in Example 27:


<A> skos:broader <B> ; skos:related <C> .
<B> skos:broader <C> .

The reasoner will draw the inference <A> skos:broaderTransitive <C>, since skos:broader is a subproperty of skos:broaderTransitive, which is (surprise) transitive. As a result, Pellet ICV will tell us that this data violates a constraint. In fact, Pellet ICV will use Pellet’s explanation mechanism to help us debug and repair integrity constraint violations. More about that in a future release.

There are two obvious advantages to OWL ICs:

First, expressing ICs in OWL is very straightforward and often close to how the natural language description reads, much more so than the auto-generated SPARQL query. Using a controlled natural language such as Attempto Controlled English (ACE), we can visualize or create such ICs in natural language.

Second, using OWL ICs allows us to get around the restrictions on OWL axioms while still enabling us to make use of OWL inference. In other words, an inferred fact can satisfy (or violate) a constraint just as well as an explicitly stated fact. Without inference, running the generated SPARQL query over an RDF graph containing only the asserted triples above would return no answers, and the violation would go undetected.
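The difference inference makes can be seen in a small, self-contained Python sketch. It is a toy stand-in for the reasoner, not Pellet: it assumes triples are plain tuples of strings, and it hard-codes only the two facts used in the S27 example (skos:broader is a subproperty of skos:broaderTransitive, and skos:broaderTransitive is transitive). The function names are hypothetical.

```python
# Asserted triples from the example data, written as plain string tuples.
asserted = {
    ("A", "skos:broader", "B"),
    ("A", "skos:related", "C"),
    ("B", "skos:broader", "C"),
}


def infer_broader_transitive(triples):
    """Return the triples plus inferred skos:broaderTransitive triples.

    Every skos:broader edge counts as a skos:broaderTransitive edge
    (subproperty), and the edges are closed under transitivity.
    """
    edges = {(s, o) for s, p, o in triples
             if p in ("skos:broader", "skos:broaderTransitive")}
    changed = True
    while changed:
        new = {(a, d) for a, b in edges for c, d in edges if b == c}
        changed = not new <= edges
        edges |= new
    return set(triples) | {(s, "skos:broaderTransitive", o) for s, o in edges}


def s27_violations(triples):
    """Pairs linked by both skos:related and skos:broaderTransitive."""
    related = {(s, o) for s, p, o in triples if p == "skos:related"}
    broader_t = {(s, o) for s, p, o in triples
                 if p == "skos:broaderTransitive"}
    return related & broader_t


print(s27_violations(asserted))                            # asserted only
print(s27_violations(infer_broader_transitive(asserted)))  # with inference
```

Over the asserted triples alone the check finds nothing; once the skos:broaderTransitive triples are inferred, the pair (A, C) shows up as a violation.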

There are several conditions in SKOS concerning annotation properties. For example, S13 requires the annotation properties skos:prefLabel, skos:altLabel, and skos:hiddenLabel to be pairwise disjoint. This isn’t allowed by OWL 2, but it works perfectly fine as an IC in our tool. Pellet supports queries containing annotation properties and also respects subproperty definitions between annotation properties, so everything will work fine.
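As an illustration of what the S13 check amounts to, here is a short Python sketch that looks for a label attached to the same resource through two different label properties. The data, the tuple representation of literals, and the function name are made up for the example; in Pellet ICV the check is done by a generated SPARQL query, not by code like this.

```python
from itertools import combinations

LABEL_PROPS = ("skos:prefLabel", "skos:altLabel", "skos:hiddenLabel")

# Hypothetical data: (subject, property, (lexical form, language tag)).
triples = [
    ("ex:love", "skos:prefLabel", ("love", "en")),
    ("ex:love", "skos:altLabel", ("adoration", "en")),
    ("ex:love", "skos:hiddenLabel", ("love", "en")),  # clashes with prefLabel
]


def s13_violations(triples):
    """Return (subject, label, prop_a, prop_b) for every label attached to
    the same subject through two different label properties."""
    values = {p: {(s, o) for s, q, o in triples if q == p}
              for p in LABEL_PROPS}
    found = []
    for a, b in combinations(LABEL_PROPS, 2):
        for s, o in sorted(values[a] & values[b]):
            found.append((s, o, a, b))
    return found


print(s13_violations(triples))
```

Each pair of properties is checked the same way as in the S27 case, which is why a single pairwise disjointness axiom is enough to express the whole condition.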

There is one particularly pesky SKOS constraint (S14) that cannot be expressed as an OWL IC:

S14: A resource has no more than one value of skos:prefLabel per language tag.

We can approximate this constraint by saying skos:prefLabel is a functional property. This approximation flags any resource that has more than one preferred label, regardless of language. So this encoding would catch all the violations, but it’s over-constraining: it will also report violations for multi-language SKOS data that is correct according to SKOS. Pellet ICV can handle ICs expressed as SWRL rules, so one alternative here would be to encode this constraint as a rule, using a built-in function extension to handle language tags. Writing a SWRL rule is not much different from manually writing a SPARQL query for validation, which is not ideal but can be useful in corner cases.
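The gap between the functional-property approximation and the real S14 condition is easy to see on a small example. The Python sketch below uses made-up data and hypothetical function names; it only illustrates the two checks and says nothing about how Pellet ICV or a SWRL built-in would implement them.

```python
from collections import defaultdict

# Hypothetical skos:prefLabel data: (subject, lexical form, language tag).
pref_labels = [
    ("ex:love", "love", "en"),
    ("ex:love", "amour", "fr"),   # fine under S14: different language
    ("ex:hate", "hate", "en"),
    ("ex:hate", "hatred", "en"),  # S14 violation: two English labels
]


def functional_violations(labels):
    """Approximation: skos:prefLabel as a functional property, so any
    subject with more than one preferred label is reported."""
    by_subject = defaultdict(set)
    for subject, text, lang in labels:
        by_subject[subject].add((text, lang))
    return sorted(s for s, vals in by_subject.items() if len(vals) > 1)


def s14_violations(labels):
    """S14 proper: more than one preferred label per (subject, language)."""
    by_key = defaultdict(set)
    for subject, text, lang in labels:
        by_key[(subject, lang)].add(text)
    return sorted(k for k, vals in by_key.items() if len(vals) > 1)


print(functional_violations(pref_labels))  # reports ex:hate and ex:love
print(s14_violations(pref_labels))         # reports only (ex:hate, en)
```

The functional check reports ex:love even though its English and French labels are perfectly valid, while the per-language check reports only the resource with two English labels.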

You can find a complete listing of SKOS integrity constraints encoded as OWL ICs in the latest Pellet ICV release as well as data files encoding examples from the SKOS reference. As usual, please send your questions and comments about Pellet ICV to the Pellet mailing list.


Colophon

This is Thinking Clearly, a weblog by Clark & Parsia, LLC.
