2.3: What is an Ontology?


Note

You may prefer to read this section again later on in the course, when we are well into Block II. Try to read it now anyway, but if it’s not clear upon the first read, then don’t worry, as it will become clearer as we go along.


Figure 1.2.1: Screenshot of the lion, which eats only herbivores and at least some impala, in the Protégé ontology development environment.
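
In Description Logic notation, which is covered later in the course, the statement in the caption of Figure 1.2.1 could be sketched roughly as follows. The names Lion, Herbivore, Impala and eats are assumed from the caption; they are not read off the screenshot itself.

\[ \mathsf{Lion} \sqsubseteq \forall \mathsf{eats}.\mathsf{Herbivore} \sqcap \exists \mathsf{eats}.\mathsf{Impala} \]

Here the universal quantifier captures “only herbivores” and the existential quantifier captures “at least some impala”.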

The Definition Game

To arrive at some answer(s) as to what an ontology is, let us first compare it with some artefacts you are already familiar with: relational databases and conceptual data models such as EER and UML. An important distinction between conceptual data models and ontologies is that a conceptual data model provides an application-specific, implementation-independent representation of the data that will be handled by the prospective application, whereas (domain) ontologies provide an application-independent representation of a specific subject domain, i.e., in principle, regardless of the particular application, or, phrased positively: (re)usable by multiple applications. From this distinction follow further differences regarding their contents (in theory at least), to which we shall return in Block II. Looking at actual ontologies and conceptual data models, the former are normally formalised in a logic language, whereas conceptual modelling is more about drawing the boxes and lines informally¹; the two are also used differently and serve different purposes.

A comparison between relational databases and ontologies as knowledge bases reveals that, unlike RDBMSs, ontologies (knowledge bases) represent the knowledge explicitly, include rules, use automated reasoning (beyond plain queries) to infer implicit knowledge and to detect inconsistencies in the knowledge base, and usually operate under the Open World Assumption².
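
The difference between the Open World Assumption and the Closed World Assumption can be made concrete with a toy example. The following is a minimal sketch, not how any particular database or reasoner is implemented: the facts, the function names and the three-valued `Answer` type are all hypothetical and chosen only for illustration.

```python
from enum import Enum


class Answer(Enum):
    YES = "yes"
    NO = "no"
    UNKNOWN = "unknown"


# Hypothetical facts, stored as (subject, predicate, object) triples.
FACTS = frozenset({
    ("lion", "eats", "impala"),
    ("impala", "eats", "grass"),
})


def ask_closed_world(fact, facts=FACTS):
    """Closed World Assumption: whatever is not stated is taken to be false."""
    return Answer.YES if fact in facts else Answer.NO


def ask_open_world(fact, facts=FACTS, known_false=frozenset()):
    """Open World Assumption: whatever is not stated is merely unknown.

    A fact is answered with NO only if it is explicitly known to be false.
    """
    if fact in facts:
        return Answer.YES
    if fact in known_false:
        return Answer.NO
    return Answer.UNKNOWN


if __name__ == "__main__":
    query = ("lion", "eats", "grass")
    print(ask_closed_world(query))  # Answer.NO
    print(ask_open_world(query))    # Answer.UNKNOWN
```

The same missing fact yields “no” in the database-style setting and “unknown” in the ontology-style setting, which is why additional knowledge (here the hypothetical `known_false` set) is needed before an open-world system will give a negative answer.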

This brief, informal comparison gives a vague idea of what an ontology might be, or at least what it is not, but it does not get us closer to a definition of what an ontology is. An approach to the issue of definitions was taken in the 2007 Ontolog Communiqué³, whose participants and authors collected, in a diagram, the ‘things that have to do with an ontology’; this is depicted in Figure 1.2.2. It is intended as a “Template for discourse” about ontologies, and there is a brief⁴ and a longer⁵ explanation of the text in the labelled ovals. The “semantic” side has to do with the meaning represented in the ontology and the “pragmatic” side has to do with the practicalities of using ontologies.

Figure 1.2.2: The OntologySummit2007’s “Dimension map”.

Let us now look at attempts to put that into words in a definition. Ontologists know intuitively what an ontology is, but putting that into words such that it can also survive philosophers’ scrutiny is no trivial matter. The consequence is that, at the time of writing, there is no unanimously agreed-upon definition of what an ontology is. The descriptions have been improving over the past 20 years, though. We mention several of them here, as some are better than others and you may come across them in the scientific literature. The most quoted (but problematic!) definition is the following one by Tom Gruber:

Definition 1.2.1:

An ontology is a specification of a conceptualization. ([Gru93])

You may see this quote especially in older scientific literature on ontologies, but it has been superseded by other, more precise ones, for Gruber’s definition is unsatisfactory for several reasons: what is a “conceptualization” exactly, and what is a “specification”? Using two nebulous terms to describe a third one does not clarify matters. A proposed refinement to address these two questions is the following one:

Definition 1.2.2:

An ontology is a formal, explicit specification of a shared conceptualization. ([SBF98])

However, this still leaves us with the questions as to what a “conceptualization” is and what a “formal, explicit specification” is, and why and how “shared”? Is it shared enough when, say, you and I agree on the knowledge represented in the ontology, or do we need a third one or a whole group to support it? A comprehensive definition is given in Guarino’s landmark paper on ontologies [Gua98] (revisited in [GOS09]):

Definition 1.2.3:

An ontology is a logical theory accounting for the intended meaning of a formal vocabulary, i.e. its ontological commitment to a particular conceptualization of the world. The intended models of a logical language using such a vocabulary are constrained by its ontological commitment. An ontology indirectly reflects this commitment (and the underlying conceptualization) by approximating these intended models. ([Gua98])

A broader scope is also described in [Gua09], and a more recent overview of definitions of “an ontology” versus Ontology in philosophy can be found in [GOS09], which refines Definitions 1.2.2 and 1.2.3 in a step-wise and more precise fashion. It is still not free of debate [Neu17], though, and it is a bit of a mouthful as a definition. A simpler definition is given by the developers of the World Wide Web Consortium’s standardised ontology language OWL⁶:

Definition 1.2.4:

An ontology is equivalent to a Description Logic knowledge base. ([HPSvH03])

That last definition has a different issue: it is unduly restrictive, because 1) it surely is possible to have an ontology that is represented in another logic language (OBO format, Common Logic, etc.), and 2) formalising a thesaurus as a “Description Logic knowledge base” (or: in OWL) would then make it a simple ‘lightweight ontology’ (e.g., the NCI thesaurus as cancer ‘ontology’), and a conceptual data model in EER or UML that is translated into OWL would become an ‘application ontology’ or ‘operational ontology’ merely by virtue of being formalised in OWL. But, as we saw above, there are differences between conceptual data models and ontologies.

For better or worse, currently, and in the context of the most prominent application area of ontologies, the Semantic Web, the tendency is toward it being equivalent to a logical theory, and a Description Logics knowledge base in particular (Definition 1.2.4). Ontologists at least frown when someone calls ‘a thesaurus in OWL’ or ‘an ER diagram in OWL’ ontologies, but even aside from that, the blurring of the distinctions between the different artefacts is problematic for various reasons (discussed in later chapters). One should note that representing something in OWL does not make it an ontology, just as something that is represented in a language other than OWL may well be an ontology.

Some Philosophical Notes on Ontologies

The previous section mentioned that the definition would have to survive the philosophers’ scrutiny. But why so? The reason for that is that ‘ontologies’ in computer science did not come out of nowhere. Philosophers are in the picture because the term ‘ontology’ is taken from philosophy, where it has a millennia-old history, and one uses insights emanating from philosophy when developing good ontologies. When we refer to that philosophical notion, we use Ontology, with a capital ‘O’, and it does not have a plural. Orthogonal to the definition game, there are discussions about what is actually represented in an ontology, i.e., its contents, from a philosophical perspective.

One debate is about ontology as a representation of a conceptualisation (roughly: things you are thinking of) versus as a representation of reality. Practically, whether that is a relevant topic may depend on the subject domain for which you would be developing an ontology. If you represent formally the knowledge about, say, malaria infections, you had better represent the (best approximation of) reality, being the current state of scientific knowledge, not some divergent political or religious opinion about it, because the wrong representation can lead to wrong inferences, and therewith to wrong treatments that are either ineffective or even harmful. Conversely, there are subject domains where it does not really matter much whether you represent reality or a conceptualisation thereof, or something independent of whether that exists in reality or not, or even something that certainly does not exist in reality. Such discussions were commonplace in computing and applications of ontologies some 10-15 years ago, but have quieted down in recent years. One such debate can be found in writing in [Mer10a, Mer10b, SC10]. Merrill [Mer10a] provides several useful clarifications.

Empiricist Doctrine:

First, there is an “Empiricist Doctrine” where “the terms of science... are to be taken to refer to actually existing entities in the real world”, such as Jacaranda tree, HIV infection and so forth, which are considered mind-independent, because HIV infections occurred also without humans thinking of them, knowing how they worked, and naming those events HIV infections. This is in contrast with the “conceptualist view according to which such terms refer to concepts (which are taken to be psychological or abstract formal entities of one sort or another)”, with concepts considered to be mind-dependent entities. Prototypical examples of such mind-dependent entities are Phlogiston and Unicorn: there are no objects in the world as we know it that are phlogiston or unicorns, only our outdated theories and fairy tale stories, respectively, about them.

Universalist Doctrine:

Second, there is the “Universalist Doctrine”, which asserts “that the so-called ‘general terms’ of science” (HIV infection etc.) “are to be understood as referring directly to universals”, with universals being “a class of mind independent entities, usually contrasted with individuals, postulated to ground and explain relations of qualitative identity and resemblance among individuals. Individuals are said to be similar in virtue of sharing universals.” [MR05]. However, philosophers do not agree on whether universals exist, nor, if they do exist, on what kind of things they are.

This brings the inquiring person to metaphysics, which, perhaps, is not necessarily crucial in building ontologies that are to serve information systems; e.g., it need not be relevant for developing an ontology about viruses whilst adhering to the empiricist doctrine. The philosophically inclined reader may wish to go a step further and read about interactions between Ontology and metaphysics by, e.g., [Var12].

There are other aspects of philosophy that can have an effect on what is represented in an ontology and how. For instance, it can help during the modelling stage, such as by recognising that there is a difference between what you are and the role(s) you play, and between participating in an event and being part of an event. It can also help clarify assumptions you may have about the world that may trickle into the ontology, like whether you’re convinced that the vase and the clay it is made of are the same thing or two different things. We will return to this topic in Chapter 6.

Good, Not So Good, and Bad Ontologies

Just like one can write good and bad code, one can have good and bad ontologies. Their goodness, or badness, is a bit more elaborate than with software code, however. Bad software code can be unmaintainable spaghetti code or have bugs or not even compile. For ontologies, the equivalent to ‘not compile’ is when there is a violation of the syntax. We’ll get into the syntax in Block I. The equivalent to ‘bugs’ is two-fold, as it is for software code: there can be errors in the sense that, say, a class cannot have any instances due to conflicting constraints, and there can be semantic errors in that what has been represented is logically correct, but entirely unintended; for instance, a class, say, Student somehow turns up as a subclass of Table, which it obviously should not.
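
The two kinds of ‘bugs’ can be illustrated with a deliberately tiny checker. This is a sketch only: it handles nothing but named subclass and disjointness statements, whereas the automated reasoners used with real ontologies cover far richer logics. Apart from Student and Table, all class names below are hypothetical, as are the function names.

```python
def superclasses(cls, subclass_of):
    """Return every direct and inferred superclass of cls.

    subclass_of maps a class name to the set of its asserted superclasses;
    inferred superclasses follow from the transitivity of subclassing.
    """
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def unsatisfiable_classes(subclass_of, disjoint_pairs):
    """Return the classes that can never have an instance.

    A class is flagged when it is (or inherits from) both members of a
    pair of classes that were declared disjoint.
    """
    classes = set(subclass_of)
    for parents in subclass_of.values():
        classes |= set(parents)
    flagged = set()
    for cls in classes:
        ancestors = superclasses(cls, subclass_of) | {cls}
        if any(a in ancestors and b in ancestors for a, b in disjoint_pairs):
            flagged.add(cls)
    return flagged


# Hypothetical axioms containing one modelling slip of each kind.
SUBCLASS_OF = {
    "Student": {"RegisteredEntry"},
    "RegisteredEntry": {"Table"},        # slip: leads to Student being a Table
    "Carnivore": {"Animal"},
    "Herbivore": {"Animal"},
    "Lion": {"Carnivore", "Herbivore"},  # slip: conflicting constraints
}
DISJOINT = [("Carnivore", "Herbivore")]

if __name__ == "__main__":
    print("Table" in superclasses("Student", SUBCLASS_OF))  # True
    print(unsatisfiable_classes(SUBCLASS_OF, DISJOINT))     # {'Lion'}
```

The first check surfaces an inference that is logically fine but unintended (Student as a subclass of Table); the second surfaces a class that cannot have instances because of conflicting constraints.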

There are further intricate issues that make one ontology better than another. Some structuring choices are excluded because of ontological constraints. Let us take the example of green apples. One could formalise this by saying that there are apples that have the attribute green, or by saying that there are green objects that have an apple-shape. Logic does not care about this distinction, but, at least intuitively, apples having the colour green seems more reasonable than green objects having an apple-shape. There are reasons for that: Apple carries an identity condition, so one can identify the object (it is a ‘sortal’), whereas Green does not (it is a value of the attribute hasColor that a thing has). Ontology helps explain such distinctions, as we shall see in Chapter 6.
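
The two ways of formalising green apples can be sketched in Description Logic notation as follows. Only hasColor is named in the text; GreenApple, GreenObject, hasShape and AppleShape are hypothetical names used for illustration, and Green is treated here as a class of colour values.

\[ \mathsf{GreenApple} \equiv \mathsf{Apple} \sqcap \exists \mathsf{hasColor}.\mathsf{Green} \]

\[ \mathsf{GreenApple} \equiv \mathsf{GreenObject} \sqcap \exists \mathsf{hasShape}.\mathsf{AppleShape} \]

Both are well-formed axioms, which is the sense in which logic does not care; the preference for the first rests on the ontological argument about identity conditions, not on the syntax.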

Finally, with the interplay between the logic one uses to represent the knowledge in an ontology and the meaning of the entities in the subject domain, we can show schematically a notion of good and bad ontologies. Consider Figure 1.2.3. We have a good ontology when what we want to represent has been represented in the ontology, and what is actually represented is very close to, and only slightly more than, what was intended; that is, we have a high precision and maximum coverage. We have a less good ontology when the ontology represents quite a bit more than it should; that is, we have a low precision and maximum coverage. Things can go wrong when we have maximum precision but only limited coverage: the ontology does not contain all that it should and hence would be a bad ontology, because it cannot do what it should in our ontology-driven information system. Things are even worse if we have both a low precision and limited coverage: then it contains stuff we don’t want in there and does not contain stuff that should be in there.


Figure 1.2.3: Good, less good, bad, and even worse ontologies. The pink circle denotes the subject domain (say, African Wildlife), the green circle denotes what’s in the ontology (say, the AWO).
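
The four situations in Figure 1.2.3 can be mimicked with plain sets. The text gives no formulas for precision and coverage, so the ratios below are an assumed, simplified reading: precision is the share of what is represented that was intended, and coverage is the share of what was intended that is represented. The set contents are hypothetical placeholders for whatever one takes the circles to contain.

```python
def precision(intended, represented):
    """Share of the represented items that were actually intended."""
    if not represented:
        return 0.0
    return len(intended & represented) / len(represented)


def coverage(intended, represented):
    """Share of the intended items that ended up being represented."""
    if not intended:
        return 0.0
    return len(intended & represented) / len(intended)


# Hypothetical subject domain (the pink circle in the figure).
INTENDED = {"lion", "impala", "giraffe", "elephant"}

# Hypothetical ontologies (the green circle), one per panel of the figure.
ONTOLOGIES = {
    "good": INTENDED | {"polar_bear"},
    "less good": INTENDED | {"polar_bear", "table", "chair", "car"},
    "bad": {"lion", "impala"},
    "even worse": {"lion", "table", "chair", "car"},
}

if __name__ == "__main__":
    for label, represented in ONTOLOGIES.items():
        p = precision(INTENDED, represented)
        c = coverage(INTENDED, represented)
        print(f"{label}: precision={p:.2f}, coverage={c:.2f}")
```

With these placeholder sets, the ‘good’ case has precision 0.80 and coverage 1.00, the ‘less good’ case 0.50 and 1.00, the ‘bad’ case 1.00 and 0.50, and the ‘even worse’ case 0.25 and 0.25.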

Footnotes

¹though one surely can provide them with logic-based reconstructions (e.g., [ACK⁺07, BCDG05, Kee13])

²vs. Closed World Assumption in a relational database setting. We return to the OWA and CWA in a later chapter.

³ontolog.cim3.net/cgi-bin/wiki...007_Communique

⁴ontolog.cim3.net/cgi-bin/wiki.../DimensionsMap

⁵ontolog.cim3.net/cgi-bin/wiki...007_Communique

⁶Note: we will go into some detail of OWL, Description Logics, and knowledge bases in Chapters 3 and 4.