Why Reasoning Matters: Consistency Checking

Reasoning only matters (to us, to the market) if it’s useful. Significance—technical and economic—is, ideally, a function of utility and perceived utility. This is the first in a series of posts that will, I hope, increase the perceived utility of formal, automated, logic-based reasoning.

We think reasoning is useful in a great number of ways, for a great number of use cases. But its proponents haven’t always done a great job of communicating that utility to others, in part because, like any non-trivial field, reasoning is complicated. Lots of squiggles and symbols and off-putting bits. Like machine learning or computer vision or…Perl.

Let’s talk, first, about data integration. Non-trivial cases require something more than the standard ploy (i.e., Linked Data):

  1. make an RDF Schema
  2. coin or re-use a URI scheme
  3. convert n sources into RDF
  4. dump that RDF into an RDF database
  5. query the database (e.g., build a new front-end)
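
To make that recipe concrete, here is a minimal sketch of the five steps in plain Python. It is only an illustration: the URI scheme, the two sources, and the "skill" property are all made up for the example, and a set of tuples stands in for a real RDF database.

```python
# Toy in-memory triple store; a real system would use an RDF library and database.
BASE = "http://example.org/id/"  # hypothetical URI scheme (step 2)


def to_triples(source_name, records):
    """Step 3: convert one source's records into (subject, predicate, object) triples."""
    triples = set()
    for rec in records:
        subject = BASE + source_name + "/" + str(rec["id"])
        for key, value in rec.items():
            if key != "id":
                triples.add((subject, BASE + "schema#" + key, value))
    return triples


def query(store, predicate, obj):
    """Step 5: return the sorted subjects that have the given predicate and object."""
    return sorted(s for (s, p, o) in store if p == predicate and o == obj)


# Two hypothetical sources that share one schema property, "skill" (step 1).
hr = [{"id": 1, "skill": "propulsion"}, {"id": 2, "skill": "optics"}]
projects = [{"id": 7, "skill": "propulsion"}]

store = set()  # step 4: dump everything into one store
store |= to_triples("hr", hr)
store |= to_triples("projects", projects)

experts = query(store, BASE + "schema#skill", "propulsion")
print(experts)
```

Note that nothing in this pipeline can ever reject the data: whatever the sources say simply ends up in the store.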

We’ve done that for NASA, for example, quite successfully: we built an expertise location service called POPS, as well as a NASA data center analysis tool called BIANCA. But while that approach works, those projects didn’t involve any really hard modeling, mapping, or integration bits.

For really hard bits, like schema and mapping alignments, partial alignments, dynamic mappings, query routing, and so on, you need more help from the computer. Reasoning gives you that help, particularly when the problems are complex (very large or many schemas to be aligned, or partial mappings or alignments, etc).

In these applications, especially where data volumes are large, you want to follow standard engineering principles with regard to failure and edge case detection: that is, you want to fail early and often. Expressing a Global View on n schemas as an OWL ontology means that certain kinds of conceptual errors in mapping and integration can’t happen in a live, production system. Using consistency checking at design-time, the computer checks that all concepts in the Global View can actually be instantiated, that is, that each of them is logically satisfiable.
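
Here is a rough sketch of what "can this concept be instantiated?" means, using only subclass and disjointness axioms. This is not how Pellet works internally, and it covers only a tiny fragment of OWL; the class names and the helper functions are hypothetical, chosen for illustration.

```python
def superclasses(cls, subclass_of):
    """Return cls plus every class reachable by following subclass axioms."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(subclass_of.get(current, ()))
    return seen


def unsatisfiable_classes(subclass_of, disjoint_pairs):
    """List classes whose superclasses include two classes declared disjoint."""
    classes = set(subclass_of)
    for parents in subclass_of.values():
        classes.update(parents)
    bad = []
    for cls in sorted(classes):
        ancestors = superclasses(cls, subclass_of)
        if any(a in ancestors and b in ancestors for a, b in disjoint_pairs):
            bad.append(cls)
    return bad


# A made-up Global View: a mapping error places one class under two disjoint parents.
subclass_of = {
    "Contractor": ["Person"],
    "CivilServant": ["Person"],
    "DetailedContractor": ["Contractor", "CivilServant"],
}
disjoint_pairs = [("Contractor", "CivilServant")]

result = unsatisfiable_classes(subclass_of, disjoint_pairs)
print(result)
```

In this toy model, DetailedContractor can never have an instance, and the check says so at design-time, before any data is loaded.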

You just can’t do that in RDF or RDFS, since it’s not generally possible to express a contradiction in those languages. Everything is always consistent in an RDF or Linked Data application. But that’s not always the way the world works. For some applications, that feature is really a bug.
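
A small sketch of the difference, again in toy Python rather than real RDF or OWL tooling: the same type assertions are checked once with no disjointness axioms (roughly the RDFS situation) and once with a single disjointness axiom (the kind of statement OWL lets you make). The identifiers and the function name are invented for the example.

```python
def inconsistencies(type_assertions, disjoint_pairs):
    """Return (individual, class_a, class_b) for each individual typed with two disjoint classes."""
    types_of = {}
    for individual, cls in type_assertions:
        types_of.setdefault(individual, set()).add(cls)
    clashes = []
    for individual in sorted(types_of):
        for a, b in disjoint_pairs:
            if a in types_of[individual] and b in types_of[individual]:
                clashes.append((individual, a, b))
    return clashes


# Hypothetical integrated data: two sources disagree about what widget42 is.
data = [
    ("ex:widget42", "ex:Hardware"),
    ("ex:widget42", "ex:Software"),
    ("ex:probe1", "ex:Hardware"),
]

# No disjointness axioms available: nothing can ever clash.
rdfs_view = inconsistencies(data, [])

# One disjointness axiom: the conflicting assertions become a detectable contradiction.
owl_view = inconsistencies(data, [("ex:Hardware", "ex:Software")])

print(rdfs_view)
print(owl_view)
```

The data is identical in both runs; only the ability to state that two classes cannot overlap makes the conflict visible.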

Consistency checking with Pellet is analogous to static type checking at compile time in some programming languages. That sort of feature eliminates certain classes of failures.

In complex integration apps, eliminating that class of error not only makes the system more robust at run-time, but it also increases the confidence one can have about the answers to queries against the data. A pure Linked Data solution cannot eliminate that class of errors and, thus, cannot increase confidence that query answers actually make sense.

As Jim Hendler likes to say, there is no single, univocal notion or standard of truth on the Web or on the Semantic Web. Yes, of course. But for some apps and data sets, there is such a notion, and it’s incredibly useful that tools like Pellet can detect and enforce those modeling choices and constraints.

Significance—technical and economic—is a function of utility and perceived utility. This post gives you some good reasons to perceive the utility of reasoning differently; in future posts, I’ll give more good reasons around things like explanation, automated debugging, and other reasoning services.


Colophon

This is Thinking Clearly, a weblog by Clark & Parsia, LLC—read more about this site.