# 9: Ontology-Based Data Access

## Introduction

There is a myriad of advanced topics in ontology engineering, most of which require an understanding of both the logical foundations and the modeling and engineering, although each subtopic may put more emphasis on one aspect than on another. A textbook at an introductory level cannot possibly cover all the specialized subtopics. Those included in this Block III aim to give an impression of the many possible directions with very distinct flavors and interests. Other topics could have been chosen as well, and it was not easy to make a selection. For instance, machine learning is currently popular, and it is being used in ontology engineering, yet it has not been included. Likewise, ontology mapping and alignment has a set of theories, methods, and tools of interest that draw on various disciplines and topics (graph matching, similarity measures, language technologies). Perhaps readers would like to learn more about the various applications of ontologies in ontology-driven information systems, and so be motivated by demonstrations of more concrete benefits of ontologies in IT and computing. I do have reasons for including the ones that have been included, though.

Ontology-Based Data Access could be seen as an application scenario of ontologies, yet it is also intricately linked with ontology engineering because of the representation limitations imposed to achieve scalability, the sort of automated reasoning one does with it, the handling of the ABox, and the querying of ontologies, which can be done but hasn’t been mentioned at all so far. That is, it adds new theory, methods, and tools to an ontology engineer’s ‘knapsack’. It principally provides an answer to the question:

• How can I have a very large ABox in my knowledge base and still have good performance?

The second topic (in Chapter ??) is of an entirely different nature compared to OBDA and brings to the fore two ontology development issues that have so far been ignored as well:

• What to do if one wants, say, the AWO not in English, or in several languages, such as naming the class Isilwane or Dier rather than Animal, and how to manage it all with several natural languages?
• How can one interact with domain experts in natural language, so that they can provide knowledge and verify that what has been represented in the ontology is what they want to have in there, without them having to learn logic?

That is, there is an interaction between ontologies and natural language, which oftentimes cannot be ignored.

A different topic is the tension between the expressivity of the logic and what one would like to, or needs to, represent. Indeed, we have come across the DOL framework (Section 4.3: OWL in Context "The Distributed Ontology, Model, and Specification Language DOL"), but that neither covers all possibilities (yet) nor makes it immediately clear how to represent advanced features. For instance, what if the knowledge is not ‘crisp’, i.e., either true or false, but may be true to a degree? Or if one has used machine learning and induced some $$x$$, then that will be probabilistically true and it may be nicer to represent that uncertainty aspect in the ontology as well. Also, one of the BFO 2.x versions squeezed notions of time into the labels of the object properties (recall Section 6.1: Foundational Ontologies "Several Foundational Ontologies"), but OWL is not temporal, so, logically, those labels have no effect whatsoever. Language extensions to some fragment of OWL and to DLs have been proposed, as there are requests for features to represent such knowledge and reason over it. The main question this strand of research tries to answer is:

• In what way(s) can ontology languages deal with, or be extended with, language features, such as time and vagueness, so that one can also use those extensions in automated reasoning (cf. workarounds with labels)?

That is, this topic ties modeling something even more precisely, and thereby obtaining a better-quality ontology, to the language features themselves: not tinkering with workarounds, but actually having those features available.

As mentioned in the introduction of the book, one can read these chapters in any order, as they do not depend on each other. They are short chapters and could perhaps have been combined into one large chapter with several sections, but that did not look nice aesthetically. Also, it is easier for lectures to cover a whole chapter at once and cover one topic at a time (though note that each topic can easily fill more than one lecture).

Blocks I and II were rather theoretical in the sense that we have not seen many practical application infrastructures with ontologies. This is set to change in this chapter. We shall look at both theoretical foundations of ontology-based data access (OBDA) and one of its realizations, and you will set up an OBDA system yourself as an exercise. Also, this chapter will provide some technical details of the EPNet example of food in the Mediterranean [CLM+16] that was briefly described in Section 1.3: What is the Usefulness of an Ontology? "Ontologies as Part of a Solution to Other Problems" as an example of ontologies for data integration in the humanities.

From an education perspective, there are several ‘starting points’ for introducing OBDA, because how one looks at it depends on one’s background. The short description is that it links an OWL file (the ‘ontology’ in the TBox) to lots of data in a relational database (the ‘ABox’) by means of a newly introduced mapping layer, which subsequently can be used for automated reasoner-enhanced ‘intelligent’ queries. The scare quotes on ‘ontology’ have to do with the fact that, practically, the ‘ontology’ is a logic-based simple conceptual data model formalized in $$±$$OWL 2 QL. The scare quotes on ‘intelligent’ refer to the fact that with the knowledge represented in the ontology/conceptual data model and an OBDA-enabled reasoner, one can, in some cases, pose more advanced queries to the database more easily than with just a plain relational database and SQL (although at the same time, one cannot use all of SQL).
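To make the short description a little more tangible, here is a minimal sketch in Python of the three ingredients: a toy TBox, a mapping layer, and a relational database standing in for the ABox. It is not how any particular OBDA system is implemented. The class names, the `sighting` table, and all function names are hypothetical and chosen only for illustration, and the ‘reasoning’ is limited to following subclass axioms, which is far less than what an OWL 2 QL reasoner does.

```python
import sqlite3

# Toy TBox: direct subclass axioms (child -> parent). Hypothetical names.
SUBCLASS_OF = {
    "Lion": "Carnivore",
    "Carnivore": "Animal",
    "Impala": "Herbivore",
    "Herbivore": "Animal",
}

# Hypothetical mapping layer: class name -> SQL query returning its instances.
# Only the leaf classes are mapped; the others have no data of their own.
MAPPINGS = {
    "Lion": "SELECT name FROM sighting WHERE species = 'lion'",
    "Impala": "SELECT name FROM sighting WHERE species = 'impala'",
}


def subclasses(cls):
    """Return cls together with all its (direct and indirect) subclasses."""
    result = {cls}
    changed = True
    while changed:
        changed = False
        for child, parent in SUBCLASS_OF.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


def rewrite(cls):
    """Rewrite 'give me all instances of cls' into a single SQL UNION query."""
    parts = [MAPPINGS[c] for c in sorted(subclasses(cls)) if c in MAPPINGS]
    return " UNION ".join(parts)


def answer(conn, cls):
    """Answer the class query over the database via the rewritten SQL."""
    sql = rewrite(cls)
    if not sql:  # no mapped class contributes any data
        return []
    return sorted(row[0] for row in conn.execute(sql))


def demo_connection():
    """Create a small in-memory database that plays the role of the ABox."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sighting (name TEXT, species TEXT)")
    conn.executemany(
        "INSERT INTO sighting VALUES (?, ?)",
        [("Simba", "lion"), ("Nala", "lion"), ("Imp1", "impala")],
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(rewrite("Animal"))
    print(answer(conn, "Animal"))
```

The point of the sketch is that the user asks for `Animal` without knowing that the database only stores lions and impalas: the subclass knowledge in the TBox is used to rewrite the query, and the data itself stays in the database and is never loaded into the ontology.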

In any case, in this chapter we depart from the “we don’t really care about expressivity and scalability” of Block II and move to a setting that is driven by the need for scalability of ontology-driven information systems. We start with some motivations for why one should care (Section 8.1: Introduction: Motivations). There are many ways to address the issues of query formulation and management, and several design choices are available even within the OBDA approach alone; they are described in Section 8.2: OBDA Design Choices. Sections 8.3: An OBDA Architecture and 8.4: Principal Components describe one of the architectures and its principal components.