9.2: OBDA Design Choices


Whether or not you needed the motivation, what the two cases described in the previous section have in common is that they try to ‘cut out’ several processes that were hitherto done manually by automating them, which is one of the core aims of computing. In essence, the aim is to link the knowledge layer to the data layer that contains gigabytes or even terabytes of data, and still obtain inferences in some way. From a knowledge engineering viewpoint, this is “logic-based knowledge representation \(+\) lots of data”, where the latter is relegated to secondary storage rather than kept in the OWL file. From a database perspective, this is seen as “databases \(+\) background knowledge”, where the latter happens to have been represented in an OWL file. The following example illustrates how the knowledge can make a difference.

Example \(\PageIndex{1}\):

Suppose we have the following:

\(\texttt{Professor(Mkhize)}\) //explicit data represented and stored in a database

\(\texttt{Professor}\sqsubseteq\texttt{Employee}\) // knowledge represented in the ontology

Then what is the answer to the query “list all employees”? In a database-only setting, the system will tell you that there are no employees, i.e., it returns \(\{\}\). In the ontology-only setting, it would not know whether there are any. With the ontology \(+\) database, it will infer Employee(Mkhize) and return {Mkhize} as the answer, which is what one may have expected intuitively when reading the example.
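The contrast between the database-only and the ontology \(+\) database settings can be sketched in a few lines of Python. This is a toy illustration only, not an actual OBDA system: the set-based encoding and the names `facts`, `subclass_of`, `query_database_only`, and `query_with_ontology` are assumptions made for this sketch.

```python
# Toy encoding (an assumption for illustration, not a real OBDA system).
facts = {("Professor", "Mkhize")}          # data layer: Professor(Mkhize)
subclass_of = {("Professor", "Employee")}  # knowledge layer: Professor ⊑ Employee


def query_database_only(cls):
    """Return only the individuals explicitly asserted to be in cls."""
    return {ind for (c, ind) in facts if c == cls}


def subclasses(cls):
    """Return cls together with all its direct and indirect subclasses."""
    result = {cls}
    changed = True
    while changed:
        changed = False
        for sub, sup in subclass_of:
            if sup in result and sub not in result:
                result.add(sub)
                changed = True
    return result


def query_with_ontology(cls):
    """Return individuals asserted to be in cls or in any of its subclasses."""
    wanted = subclasses(cls)
    return {ind for (c, ind) in facts if c in wanted}


print(query_database_only("Employee"))  # set()
print(query_with_ontology("Employee"))  # {'Mkhize'}
```

The first call ignores the subclass axiom and so finds no employees; the second takes it into account and returns Mkhize.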

The question then becomes: how to combine the two? For OWL 2 DL, query answering is decidable with respect to data complexity, although its exact complexity is open, and with respect to query complexity even decidability is open (recall Table 4.2.1). Put differently: this is bad news. There are two ways to restrict things so that the eventual algorithms are ‘well-behaved’ with respect to scalability:

1. Restrict the TBox somehow, i.e., decrease the expressivity of the language (at the cost of what one can model); within this option, there are two variants:

v1. Incorporate the relevant parts of the TBox into the query and evaluate the query over a completed ABox;

v2. Rewrite the query as needed, incorporate the TBox into the ABox, and then evaluate the query.

2. Restrict the queries one can ask in such a way that they are only those whose answers do not depend on the choice of model of the combination of the TBox \(+\) ABox.

First of all, when we consider limiting the TBox, we end up with a minimalist language such as OWL 2 QL or one of the DL-Lite flavors underpinning OWL 2 QL. What, then, do v1 and v2 mean for our example above? This is illustrated in the following example.

Example \(\PageIndex{2}\):

Consider again the query “list all employees” with the following:

\(\texttt{Professor(Mkhize)}\) //explicit data represented and stored in a database

\(\texttt{Professor}\sqsubseteq\texttt{Employee}\) // knowledge represented in the ontology

Option v1 will first notice Employee’s subclass, Professor. It will then rewrite the query so that it asks for all instances of Employee or of Professor. Evaluating the rewritten query over the data, it returns {Mkhize} as the answer.

Option v2 will first notice Professor(Mkhize) and see Professor’s superclass, Employee. It will extend the database with a new assertion, Employee(Mkhize). Then it will try to answer the original query: it now finds Employee(Mkhize) and returns {Mkhize} as the answer.
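The two processes can be put side by side in a minimal Python sketch. It assumes a toy setting in which the TBox contains only atomic subclass axioms; the function names (`rewrite_query`, `complete_abox`, `answer_v1`, `answer_v2`) are hypothetical, and real rewriting and completion algorithms for OWL 2 QL handle far more than this.

```python
# Toy TBox and ABox (an assumed encoding, for illustration only).
subclass_of = {("Professor", "Employee")}  # Professor ⊑ Employee
abox = {("Professor", "Mkhize")}           # Professor(Mkhize)


def rewrite_query(cls):
    """v1: expand the queried class into itself plus all its subclasses."""
    classes = {cls}
    changed = True
    while changed:
        changed = False
        for sub, sup in subclass_of:
            if sup in classes and sub not in classes:
                classes.add(sub)
                changed = True
    return classes


def answer_v1(cls, data):
    """v1: evaluate the rewritten query; the data are left untouched."""
    classes = rewrite_query(cls)
    return {ind for (c, ind) in data if c in classes}


def complete_abox(data):
    """v2: return a copy of the data extended with all entailed assertions."""
    completed = set(data)
    changed = True
    while changed:
        changed = False
        for sub, sup in subclass_of:
            for c, ind in list(completed):
                if c == sub and (sup, ind) not in completed:
                    completed.add((sup, ind))
                    changed = True
    return completed


def answer_v2(cls, data):
    """v2: evaluate the original query over the completed data."""
    return {ind for (c, ind) in complete_abox(data) if c == cls}


print(sorted(rewrite_query("Employee")))  # ['Employee', 'Professor']
print(sorted(complete_abox(abox)))        # [('Employee', 'Mkhize'), ('Professor', 'Mkhize')]
print(answer_v1("Employee", abox))        # {'Mkhize'}
print(answer_v2("Employee", abox))        # {'Mkhize'}
```

Both variants return the same answer; they differ in whether the work goes into the query (v1) or into the data (v2).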

Does this difference in process matter? Of course it does. Each variant has advantages and disadvantages, as shown in Table 8.2.1: having to recompute the whole extended database can be time-consuming, but so can using the knowledge of the TBox to rewrite the query. Option 1 with variant v1 is used more often, and we’ll see the technicalities of it in the remainder of the chapter¹.

	v1 (query rewriting)	v2 (data completion)
Queries	rewriting is exponential in \(\|Query\|\)	data only grows polynomially in \(\|ABox\|\)
Updates	applies to original data	needs to rematerialize the data completion

Table 8.2.1: Comparing v1 and v2 of the ‘restrict your ontology language’ option.
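The ‘Updates’ row can be made concrete with a small Python sketch. It assumes a toy encoding with a single subclass axiom; the individual Dlamini and the helper `materialize` are hypothetical and introduced only for this illustration.

```python
# Toy TBox: Professor ⊑ Employee (assumed encoding, illustration only).
subclass_of = {("Professor", "Employee")}


def materialize(data):
    """v2 helper: return the data extended with all entailed assertions."""
    completed = set(data)
    changed = True
    while changed:
        changed = False
        for sub, sup in subclass_of:
            for c, ind in list(completed):
                if c == sub and (sup, ind) not in completed:
                    completed.add((sup, ind))
                    changed = True
    return completed


data = {("Professor", "Mkhize")}
materialized = materialize(data)    # v2: completion computed before the update

data.add(("Professor", "Dlamini"))  # hypothetical update to the original data

# v1: the rewritten query (Employee or Professor) runs on the original data,
# so it sees the update immediately.
v1_answer = {ind for (c, ind) in data if c in {"Employee", "Professor"}}

# v2: the old completion is stale until it is recomputed.
v2_stale = {ind for (c, ind) in materialized if c == "Employee"}
v2_fresh = {ind for (c, ind) in materialize(data) if c == "Employee"}

print(sorted(v1_answer))  # ['Dlamini', 'Mkhize']
print(sorted(v2_stale))   # ['Mkhize']
print(sorted(v2_fresh))   # ['Dlamini', 'Mkhize']
```

In this sketch the completion is simply recomputed from scratch after the update, which is the cost the table refers to.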

Option 2 tends to be used more in the database world, for physical database design, data structures, query optimization, and materialized views [TW11].

Footnotes

¹More details about the options can be found in [CGL⁺07, KLT⁺10, LTW09].