Semantic Versioning and OWL Ontologies

I begin with a bald, if not exactly bold, claim: an OWL ontology is like a public API for your data. Better: an OWL ontology is a public, machine-readable contract between producers and consumers about the meaning of data. I’m going to assume this is reasonable and move on to some implications for managing OWL ontologies, viz., how to version them.

The main implication is that OWL ontologies should be versioned in much the same way as APIs are (or, better: as they should be) versioned in your organization.

Now I’m not talking about using an ontology to precisely, clearly describe an actual API; that is, we’re not talking about using OWL as an API documentation system. (That may be a good thing to do, but it’s not what I’m talking about here.)

How to Think About Versioning OWL Ontologies

We’re talking about treating an ontology as if it were an API and then determining how to apply Semantic Versioning to the task of versioning the ontology as an ontology.

What are the implications of this approach?

First, you should probably keep doing what you ordinarily do to manage, version, and control IT artifacts. If you aren’t doing anything for governance, change management, revision control, etc., then stop reading this right now, buy a gun, and shoot yourself in the foot for real.

Why should you (mostly) keep doing what you’re already doing generally?

Because OWL ontologies (and this goes for RDF data, SPARQL queries, RIF rules, etc.) are a lot more like than unlike other, non-semantic IT artifacts (code, data models, business logic, etc.). The same strategies you apply to them should work for OWL ontologies, too.

After all, ontologies are text files in the end. We should expect the same management tools & techniques to work. And for the most part they do.

But there’s a more subtle, psychological reason. One way to mis-manage novelty is to let it run riot. When introducing semantic technology into an organization, you should, as a matter of adoption strategy, try to change as few things at a time as possible. Using known-good techniques helps constrain the novelty and gives everyone a chance to internalize it and respond to it appropriately.

Second, OWL ontologies should be semantically versioned, which means two things:

  • make the ontology’s version identifier structured & meaningful, i.e., encode some meaning in the string of characters that makes up the version identifier; and

  • change the version identifier according to well-understood, public, and reasonable rules.

Which suggests, of course, that a version identifier, plus a strategy for changing version identifiers, is a simple signaling mechanism intended to make multi-party coordination games cheaper and less disruptive for the participants. Consumers and producers of an ontology, no less and no more than of an API, are engaging in a multi-party coordination game in which costs should be kept as low as possible. Semantic versioning is one such cost control mechanism.

The rest of this article, then, is figuring out how to apply the Semantic Versioning rules to OWL ontologies: since they’re not really APIs, some adaptation is required.

How to Version OWL Ontologies

SemVer has 9 rules, the first 6 of which apply directly to versioning OWL ontologies. Let’s quickly review these easy ones:

(1) We can handle the first easily: it just says that to use SemVer, you must declare a public API. Right. Hard to do otherwise with an OWL ontology.

(2) A version identifier is of the form, “X.Y.Z”, where X, Y, and Z are integers, which increase in the normal way. Further, X is the major field; Y is the minor field; and Z is the patch field. Semantic versioning just means declaring public rules and conditions for when each field is incremented, based on the kind and impact of changes in the versioned artifact, i.e., in an OWL ontology.
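
To make rule (2) concrete, here is a minimal Python sketch of parsing and comparing such identifiers. The names Version and parse_version are ours, invented for illustration; nothing in SemVer or OWL prescribes them, and the sketch deliberately ignores the special versions of rule (3). The only point is that the three fields compare as integers, not as strings.

```python
import re
from typing import NamedTuple


class Version(NamedTuple):
    """An X.Y.Z version identifier (hypothetical helper, not part of SemVer)."""
    major: int
    minor: int
    patch: int


# Three dot-separated runs of ASCII digits, and nothing else.
_PATTERN = re.compile(r"(\d+)\.(\d+)\.(\d+)", re.ASCII)


def parse_version(text: str) -> Version:
    """Parse 'X.Y.Z' into integer fields, rejecting anything else."""
    match = _PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not an X.Y.Z version identifier: {text!r}")
    return Version(*(int(field) for field in match.groups()))


# Tuples compare field by field, so 4.52.1 is later than 4.9.1,
# even though the raw strings sort the other way round.
assert parse_version("4.52.1") > parse_version("4.9.1")
assert "4.52.1" < "4.9.1"
```

Comparing parsed fields rather than raw strings is what lets “increase in the normal way” mean what everyone expects once a field passes 9.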

(3) You can make a special version identifier for, say, beta releases or release candidates, etc. by appending some alphanumeric stuff to the patch field. This applies directly to OWL with no changes.

(4) You never change a public ontology, in any way whatever, without changing its version identifier. The version identifier is a signal to consumers about what is in the new version. If the ontology changes publicly, the version identifier must change, too. Again, ontologies are just like code with respect to this rule.

(5) and (6) There are two special cases: before the ontology is finalized, use a 0 in the major field. That signals to consumers that anything may change at any time. The other special case is version 1.0.0, which defines the public ontology. Before 1.0.0, we’re not finished. After 1.0.0, the following rules about how version identifiers signal the kinds of changes that have taken place between revisions must be followed.

The last three are the hardest: what conditions constitute changes to major, minor, and patch fields? These are harder because OWL ontologies are in some sense quite different from programming language APIs. We’ve so far been riding the high of their similarity, but now we have to deal substantively with their dissimilarity.

We want to end up with a versioning scheme that sends this set of signals:

  • if there’s a patch change, consumers can safely ignore that version
  • if there’s a major change, consumers should not ignore that version
  • if there’s a minor change, consumers need to investigate further

Admittedly, the minor change ambiguity is not ideal, but for now we can’t seem to do any better.
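
From the consumer’s side, the three signals above amount to a small decision procedure. The following Python sketch is one way to write it down; the function name consumer_action and the labels "ignore", "investigate", and "review" are our own shorthand, not anything SemVer defines. It also folds in the special case from rule (5): a 0 in the major field means anything may have changed.

```python
def consumer_action(old: str, new: str) -> str:
    """Map a change of version identifier to what a consumer should do.

    Both arguments are plain 'X.Y.Z' strings. The returned labels are
    illustrative shorthand, not part of any specification.
    """
    old_major, old_minor, _old_patch = (int(f) for f in old.split("."))
    new_major, new_minor, _new_patch = (int(f) for f in new.split("."))

    if new_major == 0:
        # Before 1.0.0 the ontology is not finalized: anything may change.
        return "review"
    if new_major != old_major:
        return "review"        # major change: should not be ignored
    if new_minor != old_minor:
        return "investigate"   # minor change: needs a closer look
    return "ignore"            # patch change, or no change at all
```

For example, a consumer pinned to 1.4.2 can skip 1.4.3, should look into 1.5.0, and must not skip 2.0.0.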

This is a kind of Goldilocks or binning problem: what counts as a big, medium, or small change to an OWL ontology? Someone’s always unhappy, no matter what solution one offers to this kind of problem.

But let’s try to state some principles:

  • inferences that are legal in an ontology are important, both the direct ones in the ontology and the indirect ones in some other ontology or system
  • OWL is monotonic, so in some sense deletions are more important than additions
  • few changes are absolutely safe with respect to inferences, but there are some OWL bits that are non-logical, i.e., can’t affect inferences

Why do we focus on inferences, i.e., semantically significant changes? OWL provides a variety of mechanisms that help us handle purely syntactic changes, including declaring two classes or individuals to be the same. We’re also drawing an analogy here: changing the asserted (i.e., explicit) axioms in an ontology is a bit like changing the implementation of some interface rather than its explicit, public, contracted behavior. We want to focus on the significant, effectual changes, not on every change. Versioning an OWL ontology too tightly on its asserted axioms gives producers too little flexibility and forces consumers to check in more often than they otherwise would. In short, our intuition is that doing it that way costs more and gains nothing.

Based on these principles, let’s state some more specific, broad ideas:

  • a major change is anything that removes direct, valid inferences
  • a minor change is anything that may remove indirect inferences, that may break SPARQL queries (or RIF rules or other OWL axioms, etc.) written against the ontology, or that may add direct or indirect inferences
  • a patch change is anything that is non-logical
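
On the producer’s side, these three ideas can be sketched as a rule for computing the next version identifier. The Python below is only an illustration under our own assumptions: Impact and bump are hypothetical names, and deciding which impact a given change actually has is the hard part that the code leaves to a human (or a reasoner). Resetting the lower fields to 0 on a minor or major increment follows the usual SemVer convention.

```python
from enum import IntEnum
from typing import Iterable, Tuple


class Impact(IntEnum):
    """Severity of one change, ordered so that max() picks the worst."""
    PATCH = 1  # non-logical change
    MINOR = 2  # may remove indirect inferences, may break queries, or adds inferences
    MAJOR = 3  # removes direct, valid inferences


def bump(version: Tuple[int, int, int],
         impacts: Iterable[Impact]) -> Tuple[int, int, int]:
    """Return the next (major, minor, patch) for a batch of changes."""
    major, minor, patch = version
    impacts = list(impacts)
    if not impacts:
        return version  # nothing changed, so the identifier stays put
    worst = max(impacts)
    if worst == Impact.MAJOR:
        return (major + 1, 0, 0)
    if worst == Impact.MINOR:
        return (major, minor + 1, 0)
    return (major, minor, patch + 1)
```

A release that mixes a label fix with a new subclass axiom is a minor release: the most serious change in the batch decides which field moves.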

We treat additions and deletions differently. OWL is monotonic, which just means that no addition to an OWL ontology can ever cause there to be fewer legal inferences. Additions can only ever leave the set of inferences the same or make it larger. To lose inferences, you have to remove something from an OWL ontology. Thus, we treat deletions as more serious than additions; hence, they should always trigger a major field change in the version identifier.
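
A toy model makes the asymmetry easy to see. The Python sketch below is nowhere near an OWL reasoner: it assumes the only kind of axiom is “A is a subclass of B” and the only inference rule is transitivity. The class names other than foo:Person are made up for the example. Within that tiny fragment, adding an axiom never loses an inference, while removing one can take several inferences with it.

```python
from typing import Set, Tuple

Axiom = Tuple[str, str]  # (subclass, superclass)


def entailed(axioms: Set[Axiom]) -> Set[Axiom]:
    """Transitive closure of the asserted subclass axioms (toy semantics)."""
    closure = set(axioms)
    while True:
        derived = {(a, d)
                   for (a, b) in closure
                   for (c, d) in closure
                   if b == c}
        if derived <= closure:
            return closure  # fixed point reached: nothing new to derive
        closure |= derived


base = {("foo:Student", "foo:Person"), ("foo:Person", "foo:Agent")}
added = base | {("foo:Agent", "foo:Thing")}        # an addition
removed = base - {("foo:Person", "foo:Agent")}     # a deletion

# Adding an axiom keeps every old inference (monotonicity).
assert entailed(base) <= entailed(added)

# Removing one axiom loses it and everything that depended on it.
lost = entailed(base) - entailed(removed)
assert lost == {("foo:Person", "foo:Agent"), ("foo:Student", "foo:Agent")}
```

Note that deleting a single asserted axiom cost two inferences here: the axiom itself and the derived fact that every foo:Student is a foo:Agent.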

There’s a wrinkle. Some ontology changes that seem trivial aren’t. For example, what if we rename an OWL class from foo:Person to foo:Human? Surely that’s not so crucial? It isn’t, but only if no consumer of the ontology has any axioms using foo:Person and there are no SPARQL queries, RIF rules, or RDFS statements about foo:Person. Hence, we have to analyze changes in ontologies, and their impact, very carefully, so that we can signal them properly when updating a version identifier. (This example might also motivate judicious use of owl:equivalentClass to help minimize breakages; we’ll leave this for another post.)
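
Here is a rough Python sketch of the kind of impact analysis the rename calls for. Everything in it is an assumption made for illustration: the helper names, the two sample queries, the foo:name property, and above all the crude trick of spotting terms with a regular expression instead of a real SPARQL parser. It simply reports which consumer queries mention a term that the new version no longer declares.

```python
import re
from typing import Dict, List, Set

# Crude stand-in for real SPARQL parsing: any prefixed name in the foo: namespace.
FOO_TERM = re.compile(r"\bfoo:\w+")


def terms_used(query: str) -> Set[str]:
    """Return the foo: terms that a query text mentions."""
    return set(FOO_TERM.findall(query))


def broken_queries(queries: Dict[str, str],
                   old_terms: Set[str],
                   new_terms: Set[str]) -> Dict[str, List[str]]:
    """Map each affected query name to the removed terms it still uses."""
    removed = old_terms - new_terms
    report = {}
    for name, text in queries.items():
        hits = terms_used(text) & removed
        if hits:
            report[name] = sorted(hits)
    return report


old_terms = {"foo:Person", "foo:name"}
new_terms = {"foo:Human", "foo:name"}   # foo:Person renamed to foo:Human
queries = {
    "people": "SELECT ?x WHERE { ?x a foo:Person . }",
    "names": "SELECT ?n WHERE { ?x foo:name ?n . }",
}

report = broken_queries(queries, old_terms, new_terms)
assert report == {"people": ["foo:Person"]}
```

The producer rarely has every consumer’s queries in hand, of course, which is exactly why the version identifier has to carry the warning.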

We also have to distinguish between the ontology being versioned (“direct”) and any other ontology (“indirect”) that imports (i.e., consumes) it. We do this because even trivial-seeming changes (changing the name of a class, changing a URI for a class or individual) can break axioms in some other ontology that imports ours. If every such breakage counted, almost every change would require a major field increment (which offends us as programmers), so we say that a change calls for a major increment only if it reduces the ontology’s own (“direct”) inferences. We do this partly for aesthetic reasons: a version id like 52.4.1 offends our programmer sensibilities far more than 4.52.1. We’re more likely to get the latter than the former with this scheme.

Finally, some OWL features are non-logical, that is, they aren’t semantically significant, i.e., they can’t affect inferences in a valid OWL system. Chief among these are things like RDF comments, OWL comments, the value of RDFS labels, and the contents of axiom annotations.
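
A patch-level check along these lines can be sketched in a few lines of Python. We assume, for illustration only, that an ontology is a set of (subject, predicate, object) triples written with prefixed names, and that rdfs:label and rdfs:comment are the only non-logical predicates in play; a real ontology would need a longer list and a real parser. The helper names are ours.

```python
from typing import FrozenSet, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Assumed set of non-logical predicates for this sketch; extend as needed.
ANNOTATION_PREDICATES: FrozenSet[str] = frozenset({"rdfs:label", "rdfs:comment"})


def logical_part(triples: Set[Triple]) -> Set[Triple]:
    """Drop the annotation triples, which cannot affect inferences."""
    return {t for t in triples if t[1] not in ANNOTATION_PREDICATES}


def is_non_logical_change(old: Set[Triple], new: Set[Triple]) -> bool:
    """True when the two versions differ, but only in their annotations."""
    return old != new and logical_part(old) == logical_part(new)


v1 = {
    ("foo:Person", "rdf:type", "owl:Class"),
    ("foo:Person", "rdfs:label", "Persn"),
}
v2 = {
    ("foo:Person", "rdf:type", "owl:Class"),
    ("foo:Person", "rdfs:label", "Person"),   # typo in the label fixed
}
v3 = v2 | {("foo:Person", "rdfs:subClassOf", "foo:Agent")}

assert is_non_logical_change(v1, v2)        # label fix only: patch
assert not is_non_logical_change(v2, v3)    # new axiom: not a patch
```

Fixing the misspelled label changes the file but not what can be inferred from it, so under this scheme only the patch field moves.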

How to Implement This in OWL 2

We’ll deal with that in a follow-up post; OWL 2 adds some versioning support which we’ll use to implement this scheme. We’ll also discuss how this scheme relates to URLs, URIs, etc.

Colophon

This is Thinking Clearly, a weblog by Clark & Parsia, LLC—read more about this site.
