Our Approach to Modeling, Fidelity, and KR

For some people, the point of the Semantic Web is distributed, web-friendly knowledge management and knowledge representation. Generally we’re in that camp. But that camp breaks down into several factions, and it’s useful to be clear about which faction we’re in.

There is a spectrum that runs from Maximum Fidelity to Maximum Scalability. Given our roots in Description Logic, we lie somewhere in between these two poles. Notice that I have intentionally avoided calling these “extremes”; they are endpoints, and perfectly respectable, useful ones, depending on who you are and what you’re trying to achieve.

The Max Fidelity folks want to model some world-chunk in as fine-grained and faithful a manner as possible. This often means that they are at least first order logic fans, and sometimes higher-order logic users. They debate edge cases, corner cases, and alternate and competing semantics and logics in an attempt to mirror reality ever more faithfully. The price they pay is, generally, computability. For some use cases, that price is perfectly acceptable. For other use cases, that price is entirely too high, since the most perfect representation of the world is useless if you can’t practically compute with it. At least, that’s how Max Fidelity often looks to us.

At the far end of the spectrum we have Max Scalability folks, for whom the point of the Semantic Web is rather more the “Web” than the “Semantic” part; we might playfully call them the “semantic WEB” crowd, in order to reflect their ideal ratio. Here the point isn’t to model perfectly but, rather, to do something with lots and lots of data, ideally Webfuls of data. This means, in the argot of current tech choices, that they tend to be RDF and Linked Data fans and users, since that’s just about the only approach to doing anything at all interesting with Webfuls of data. The price they pay, of course, is expressivity. For some use cases, that’s just fine, since you don’t always need a lot or even much semantic fidelity to get the job done. Sometimes we build applications for customers that take this approach. But, as above, for other use cases, this is simply a killer, because without enough or the right semantics, you don’t get the right kind of help from the machine in figuring out complex stuff.

So what do we have so far? First, we have a notional (and idealized) spectrum that runs from Webfuls of data to, roughly, at least first order logic. Second, we obviously have tons of interesting use cases at (probably) every point along this spectrum. And, third, we have the suggestion that we aim for some kind of sweet spot in the middle, where “sweet spot” and “in the middle” are not absolute notions, but are interest-relative and goal-specific, and where the interests and goals we care about are, surprise, surprise, ours.

(In other words, I’ve set up a little fantasy where we are the Heroes, where we naturally occupy the “sweet spot”, but then, since I’m not a complete jerk, I’ve ironized or called into question that very fantasy in an effort to suggest that we, just like everyone else, try to spin things to make ourselves look smart, cool, and useful.)

And (will miracles never cease?) that’s just about where Description Logic fits along such an idealized spectrum. Technically, description logics are, for the most part, decidable fragments of first order logic, which means that we try to balance Fidelity and Scalability in a way where we can get some of both.

The Max Fidelity folks are forever poking us with sticks to the effect that we can’t model world-chunks nearly as faithfully as they can. Well, no crap, of course we can’t! Then the Max Scalability folks poke us with different sticks to the effect that we can’t scale to Webfuls of data—again, no duh!

And then we poke back at both camps—hey, they started it!—to the effect that we can model far better than Max Scalers and we can scale far further than Max Fideliters (yes, I just made that word up…Rock!)…

Finally, a word about how this positioning issue plays out in our approach to modeling. In short, we model such that we get the right inferences, since getting the inferences is typically what our kind of applications (analysis and decision support apps, in short) are all about. So that means some edge or corner cases, even if they fit into DL, get ignored, dropped, or even distorted when there’s no point, given requirements analysis, to fidelity for its own sake. And it means, on the flip side, that we don’t worry too much that inference over Webfuls of data is not realistically achievable anytime soon. Fast enough for the customer’s data is sufficient scalability in most cases for us.
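
To make “getting the right inferences” a bit more concrete, here is a toy sketch in Python. It is emphatically not a Description Logic reasoner and not how any real tool works; it only walks a hand-written subclass hierarchy to show the difference between what was asserted and what follows from it. The class names, the individual `alice`, and both function names are made up for illustration.

```python
# Toy illustration only: hypothetical names, not a real DL reasoner.

# Asserted subclass axioms (child class -> set of parent classes).
SUBCLASS_OF = {
    "Manager": {"Employee"},
    "Employee": {"Person"},
}

# Asserted instance types (individual -> set of classes).
ASSERTED_TYPES = {"alice": {"Manager"}}


def superclasses(cls, axioms):
    """Return every class reachable from cls by following subclass axioms."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in axioms.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def inferred_types(individual, types, axioms):
    """Asserted types plus everything they entail through the hierarchy."""
    result = set(types.get(individual, ()))
    for cls in list(result):
        result |= superclasses(cls, axioms)
    return result
```

Nobody ever said that `alice` is a `Person`, but `inferred_types("alice", ASSERTED_TYPES, SUBCLASS_OF)` includes it anyway. That gap between the asserted and the entailed is the kind of help from the machine we model for, and a real reasoner does far more of it than this sketch does.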


Colophon

This is Thinking Clearly, a weblog by Clark & Parsia, LLC.
