We’ve recently completed work on a prototype for distributed query answering across disconnected RDF data sources (as well as relational databases and web services). The system uses Pellet to reformulate a high-level SPARQL query, written against a data model expressed in OWL, into many subqueries. Reasoning happens during query reformulation, as does SWRL rule firing; the distributed data sources simply answer the subqueries routed to them by Pellet, based on descriptions of the data sources and their relations. Pellet then combines the answers into the final query result for the user.
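To give a feel for what reformulation looks like, here is a hedged sketch. The prefixes, classes, properties, and sources below are invented for illustration and are not the actual ones from the prototype; the point is just that one high-level query against the OWL data model gets split into subqueries, each answerable by a single source, whose results are then joined.

```sparql
# Hypothetical high-level query against the OWL data model:
# find employees and the projects they work on.
PREFIX ex: <http://example.org/model#>
SELECT ?person ?project
WHERE {
  ?person a ex:Employee .
  ?person ex:worksOn ?project .
}

# The reformulator might route one subquery to an HR source
# that only holds ex:Employee instances:
#   SELECT ?person WHERE { ?person a ex:Employee . }
# and another to a project-tracking source that holds
# ex:worksOn triples:
#   SELECT ?person ?project WHERE { ?person ex:worksOn ?project . }
# The answers are then joined on ?person to produce the final result.
```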
The query reformulation method is sound and complete for some subsets of OWL 2 DL, while maintaining good data complexity for query answering (for the curious: LOGSPACE). Distributed query answering addresses two main use cases: scalability and integration. In the former, we spread the data across many systems to achieve massive scale; in the latter, the data is already spread across many systems and the requirement is to query across it sensibly.
There is also a touch point with the linked data effort: the new voiD vocabulary for describing datasets turns out to be very useful for describing the distributed data sources that we query over, including their interrelations.
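For a sense of what such a description looks like, here is a small voiD sketch in Turtle. The dataset names and endpoint URLs are invented for illustration; the voiD terms (`void:Dataset`, `void:Linkset`, `void:sparqlEndpoint`, `void:linkPredicate`) are from the actual vocabulary.

```turtle
@prefix void: <http://rdfs.org/ns/void#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix ex:   <http://example.org/datasets#> .

# Two hypothetical data sources, each exposed at its own endpoint.
ex:hr a void:Dataset ;
    void:sparqlEndpoint <http://hr.example.org/sparql> .

ex:projects a void:Dataset ;
    void:sparqlEndpoint <http://projects.example.org/sparql> .

# A linkset recording how the two datasets are interrelated:
# resources in ex:hr are linked to resources in ex:projects
# via owl:sameAs.
ex:hr-to-projects a void:Linkset ;
    void:subjectsTarget ex:hr ;
    void:objectsTarget  ex:projects ;
    void:linkPredicate  owl:sameAs .
```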
This kind of unanticipated and uncoordinated cooperation between different parts of the Semantic Web community is a bit of real evidence that standards-based approaches can pay big dividends. We neither knew of nor participated in the voiD effort, and its proponents certainly didn’t participate in building our distributed query answering system, yet this stuff just works together very nicely.