Local Ontology Repositories with Pellet

Customers often tell us that they want to use Pellet without it accessing the network. Sometimes they want to avoid network problems by caching locally; sometimes they’re conforming to a local security policy; often, people just like hacking on local copies before publishing their ontologies on the Web. Regardless of motivation, they need to avoid the network access used to fetch the contents of an ontology’s imports closure. In this post I outline how a user can set up a local ontology repository that will be used by Pellet’s Jena loader.

The most common use case is a user hand-editing a collection of local ontologies which use HTTP URLs. Until the ontologies are ready to be published, there is no content (or, even worse, outdated content) at those URLs. The problem is that it’s cumbersome to change all the URLs to file: URLs, only to change them back when publishing.

Consider two simple ontologies. First,


@prefix owl: <http://www.w3.org/2002/07/owl#> .

<http://example.org/PeopleOntology> a owl:Ontology .
<http://example.org/PeopleOntology#Person> a owl:Class .

And, second,


@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

<http://example.org/FriendOntology> a owl:Ontology ;
                       owl:imports <http://example.org/PeopleOntology> .
<http://example.org/FriendOntology#Friend> a owl:Class ;
                       rdfs:subClassOf <http://example.org/PeopleOntology#Person> .

We want to use Pellet to iteratively check the inferred class hierarchy as we develop the ontologies. To do this with the command line tools, we would normally issue the following command:

:; pellet classify http://example.org/FriendOntology

But if we try this as-is, we’ll get an error. We need Pellet to recognize that, while these ontologies are destined to be published on the Web, for now they are in local files named people.ttl and friend.ttl. To do this, we use a Jena LocationMapper configuration file. We can set up the file, named location-mapping.ttl, with the following Turtle content:


@prefix lm: <http://jena.hpl.hp.com/2004/08/location-mapping#> .

[] lm:mapping
  [ lm:name "http://example.org/PeopleOntology" ; lm:altName "file:people.ttl"  ] ,
  [ lm:name "http://example.org/FriendOntology" ; lm:altName "file:friend.ttl"  ] .
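With more than a handful of ontologies, writing these entries by hand gets tedious. As a rough sketch, a small shell function can generate the file from URL and file name pairs. The name make_location_mapping is hypothetical (it is not part of Pellet or Jena), and the function assumes at least one pair and no spaces or quotes in the URLs or file names.

```shell
# make_location_mapping: hypothetical helper, not part of Pellet or Jena.
# Reads "ONTOLOGY_URL LOCAL_FILE" pairs (whitespace separated, one per line)
# on standard input and prints a location-mapping file in Turtle.
# Assumes at least one pair, and no spaces or quotes in URLs or file names.
make_location_mapping() {
  awk '
    BEGIN {
      print "@prefix lm: <http://jena.hpl.hp.com/2004/08/location-mapping#> ."
      print ""
      print "[] lm:mapping"
    }
    NF == 2 {
      if (count++ > 0) print " ,"      # close the previous entry
      printf "  [ lm:name \"%s\" ; lm:altName \"file:%s\" ]", $1, $2
    }
    END { print " ." }                 # terminate the Turtle statement
  '
}

# Example:
#   printf '%s\n' \
#     'http://example.org/PeopleOntology people.ttl' \
#     'http://example.org/FriendOntology friend.ttl' \
#     | make_location_mapping > location-mapping.ttl
```

For the two example ontologies, this should produce a file equivalent to the one above.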

The only other change we need is to explicitly tell Pellet to use the Jena loader; it uses the OWLAPI loader by default. The command line looks like

:; pellet classify --loader Jena http://example.org/FriendOntology

With the location mapping configuration file in place, we no longer get an error but instead see the class hierarchy we expect, based on the content of the local files.

The second common use case is a user working with an ontology they’ve found on the Web and which has an arbitrarily large imports closure. This user wants to avoid network accesses to fetch ontologies. There are three steps to addressing this: first, we need to identify all of the ontology URLs in the imports closure; then, we need to store them in our local repository; finally, we need to create an adequate mapping file.

To illustrate this example, we’ll use the LKIF-Core ontology. This ontology is interesting because it has a moderate number of ontologies in its imports closure. We could use a tool like Protégé 4 to identify the ontologies in the imports closure; but we’re going to assume that Pellet is the only ontology tool available. To find all the network resources fetched, we can take advantage of some debug logging available in Jena. Jena uses log4j, so we need to create a log4j configuration file, called lm-log4j.properties, to echo the interesting content to standard error.


log4j.rootLogger=WARN, stderr
log4j.appender.stderr=org.apache.log4j.ConsoleAppender
log4j.appender.stderr.target=System.err
log4j.appender.stderr.layout=org.apache.log4j.SimpleLayout
log4j.logger.com.hp.hpl.jena.util.FileManager=DEBUG

Once created, we set the log4j.configuration system property to reference the file. If you’re using the shell script included with Pellet-2.0 RC5 or newer, you can do this with an environment variable as follows:

:; export pellet_java_args="-Dlog4j.configuration=file:lm-log4j.properties"

Then proceed as before

:; pellet consistency --loader Jena http://www.estrellaproject.org/lkif-core/lkif-core.owl

There will be a lot of DEBUG messages, but it’s easy to narrow in on the useful details with a simple grep command, such as

:; pellet consistency --loader Jena http://www.estrellaproject.org/lkif-core/lkif-core.owl 2>&1 | grep 'Not mapped'

The output should be something like the following, enumerating all of the URLs which are being retrieved:


DEBUG - Not mapped: http://www.estrellaproject.org/lkif-core/lkif-core.owl
DEBUG - Not mapped: http://www.estrellaproject.org/lkif-core/norm.owl
DEBUG - Not mapped: http://www.estrellaproject.org/lkif-core/legal-role.owl
DEBUG - Not mapped: http://www.estrellaproject.org/lkif-core/legal-action.owl
DEBUG - Not mapped: http://www.estrellaproject.org/lkif-core/role.owl
DEBUG - Not mapped: http://www.estrellaproject.org/lkif-core/expression.owl
DEBUG - Not mapped: http://www.estrellaproject.org/lkif-core/action.owl
DEBUG - Not mapped: http://www.estrellaproject.org/lkif-core/process.owl
DEBUG - Not mapped: http://www.estrellaproject.org/lkif-core/relative-places.owl
DEBUG - Not mapped: http://www.estrellaproject.org/lkif-core/time.owl
DEBUG - Not mapped: http://www.estrellaproject.org/lkif-core/mereology.owl
DEBUG - Not mapped: http://www.estrellaproject.org/lkif-core/lkif-top.owl
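To turn a listing like this into a plain list of URLs to download, the log prefix can be stripped with sed. The following is a minimal sketch: unmapped_urls is a hypothetical helper name, it assumes the log lines look exactly like those above, and the use of curl in the commented example is an assumption (any HTTP client would do).

```shell
# unmapped_urls: hypothetical helper, not part of Pellet or Jena.
# Reads log output on standard input and prints each distinct URL that
# appears in a "DEBUG - Not mapped: URL" line, sorted alphabetically.
unmapped_urls() {
  sed -n 's/^DEBUG - Not mapped: //p' | sort -u
}

# Example (curl is an assumption; it saves each file under its remote name):
#   pellet consistency --loader Jena http://www.estrellaproject.org/lkif-core/lkif-core.owl 2>&1 \
#     | unmapped_urls \
#     | while read -r url; do curl -sSfO "$url"; done
```

Note that sort -u reorders the URLs and drops duplicates, so the output will not follow the order in which Jena fetched them.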

After downloading each of these files and saving them locally, we can create a location mapping file as above, with one map entry per ontology. That said, the location mapping configuration file supports more sophisticated mapping, and this is a great time to take advantage of prefix based mapping. The following content in location-mapping.ttl should be sufficient:


@prefix lm: <http://jena.hpl.hp.com/2004/08/location-mapping#> .

[] lm:mapping
   [ lm:prefix "http://www.estrellaproject.org/lkif-core/" ; lm:altPrefix "file:./" ] .
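The effect we expect from a prefix entry can be pictured as a simple string rewrite: any URL that starts with the prefix has that prefix swapped for the alternative one. The sketch below only illustrates that expected effect for a single entry; it is not Jena’s implementation, and map_by_prefix is a hypothetical name.

```shell
# map_by_prefix: hypothetical illustration of a single prefix mapping entry.
# Usage: map_by_prefix URL PREFIX ALT_PREFIX
# Prints URL with PREFIX replaced by ALT_PREFIX if URL starts with PREFIX,
# otherwise prints URL unchanged.
map_by_prefix() {
  url=$1
  prefix=$2
  alt=$3
  case $url in
    "$prefix"*) printf '%s%s\n' "$alt" "${url#"$prefix"}" ;;
    *)          printf '%s\n' "$url" ;;
  esac
}

# Example:
#   map_by_prefix http://www.estrellaproject.org/lkif-core/norm.owl \
#     http://www.estrellaproject.org/lkif-core/ file:./
```

With the entry above, http://www.estrellaproject.org/lkif-core/norm.owl would be read from file:./norm.owl, which is why the downloaded files need to keep their original names.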

With this in place and all the files in the working directory, if we rerun the previous command, grep doesn’t find any matches. To disable the debug output, unset the environment variable:

:; unset pellet_java_args

Then proceed as before

:; pellet consistency --loader Jena http://www.estrellaproject.org/lkif-core/lkif-core.owl

We’ve used the location mapping configuration to completely avoid network access.

A few additional details are worth noting. First, Jena does some searching for the location mapping configuration file, but the easiest approach is to keep it in the working directory. Alternatively, it can be explicitly named using the LocationMap system property. This approach can be attractive if you work on multiple ontology projects and would like them to share a single local repository. E.g., you might use

:; export pellet_java_args="-DLocationMap=file:///etc/my-repository.ttl"

Second, in Pellet 2.0 RC5 this functionality is only available if Pellet’s Jena loader is used. We’ve got a ticket open to duplicate the functionality in the OWLAPI loader and hope to have it in place before the final Pellet 2.0 release.

Feel free to comment on this functionality or any other aspect of Pellet’s behavior on the pellet-users mailing list. See you there.

Update: There has been some public discussion, such as this thread on public-owl-wg@w3.org, about tools using XML Catalogs to provide a standardized map description format similar to the one provided by the location mapper configuration file used here. We think that any mechanism that is sane and supported by OWL tools in an interoperable way is a good thing. Translation between the Jena format and XML Catalogs looks straightforward, so you needn’t worry about backwards compatibility issues if Pellet supports XML Catalogs in the future.


Colophon

This is Thinking Clearly, a weblog by Clark & Parsia, LLC.
