11.1: Uncertainty and Vagueness

Last updated
Save as PDF

Page ID: 6464

\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \) \( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)\(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\) \(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\)\(\newcommand{\AA}{\unicode[.8,0]{x212B}}\)

This advanced ontology engineering topic concerns how to cope with uncertainty and vagueness in ontology languages and their reasoners and what we can gain from all the extra effort. At the time of writing, this elective topic is mainly focused on theory and research, and a few proof-of-concept tools exist. Let’s first clarify these two terms upfront:

Uncertainty: statements are true or false, but due to lack of knowledge we can only estimate to which probability / possibility / necessity degree they are true or false;
Vagueness: statements involve concepts for which there is no exact definition (such as tall, small, close, far, cheap, expensive), which are then true to some degree, taken from a truth space.

Consider, e.g., information retrieval: to which degree is a web site, a page, a paragraph, an image, or a video segment relevant to the information need and an acceptable answer to what the user was searching for? In the context of ontology alignment, one would want to know (automatically) to which degree the focal concepts of two or more ontologies represent the same thing, or are ‘sufficiently’ overlapping. In an electronic health record system, one may want to classify patients based on their symptoms, such as throwing up often, having a high blood pressure, and yellow-ish eye color. Or compute the probability that a person is HIV positive is 23% and has been exposed to TB is 85%, or the probability that birds fly. How can ontology-driven software agents do the negotiation for your holiday travel plans that are specified imprecisely, alike “I am looking for a package holiday of preferably less than R15000, but really no more than R20000 , for about 12 days in a cold country that has snow”? One may want to classify, say, ripe apples, find the set of all individuals that mostly buy low calorie food, and patients that are possibly septic when having the properties of “infection and [temperature \(>\) 39C OR temperature \(<\) 36C, respiratory rate \(>\) 20 breaths/minute OR PaCO2 \(<\) 32 mmHg]”. Of course, one can combine the two notions as well, e.g.: “It is probable to degree 0.915 that it will be hot in June in the Northern hemisphere”.

The main problem to solve, then, is what and how to incorporate such vague or uncertain knowledge in OWL and its reasoners (or another logic, as one desires). The two principal approaches regarding uncertainty probabilistic and possible languages, ontologies, and reasoning services, where the former way of dealing with uncertainty receives a lot more attention than the latter¹. The two principal approaches regarding vagueness and the semantic web are fuzzy and rough extensions, where fuzzy used to receive more attention compared to the rough approach, but theories for the latter are catching up.

Fuzzy Ontologies

In fuzzy logic², statements are true to some degree which is taken from a truth space, which is usually [0, 1]. This sounds easier than it is, especially the deductions that follow from them. Consider first the following example.

Example \(\PageIndex{1}\)

Take the statement that Hotel EnjoyYourStay is close to the train station to degree 0.83. A query on a hotel booking site may be “Find me the top-k cheapest hotels close to the train station”, more formally:

q(h) ← hasLocation(h, hl) ∧ hasLocation(train, cl) ∧ close(hl, cl) ∧ cheap(h)

Then what is the meaning—and logically: interpretation—of the query when filled in with the values alike close(EnjoyYourStay,train) ∧ cheap(200)? That is, how does that work out in the model (in the context of model-theoretic semantics)? We have to change the interpretation function a little to accommodate for the fuzzy concepts. The interpretation is a function \(I\) that maps atoms into a value between \(0\) and \(1\) inclusive, i.e., \(I(A)\in [0, 1]\). Given that, then if the knowledge base states I(close(EnjoyYourStay,train)) = 0.83 and I(cheap(200)) = 0.2, then what is the result of 0.83 ∧ 0.2? More generally, what is the result of \(n ∧ m\), for \(n, m\in [0, 1]\)?

Many-valued formulae in fuzzy logic have the form \(\phi\geq\mathcal{l}\) or \(\phi\leq u\) where \(l, u\in [0, 1]\), meaning that the degree of truth is at least \(\mathcal{l}\) and at most u, respectively. Figure 10.1.1 visualizes the intuition of the graded degrees of truth. The top-left chart shows the membership function of expensive: a value of 25 is definitely not expensive, i.e., it would evaluate to 0, then it slowly starts climbing toward expensive, where at 187.5, expensive(someObject)= 0.5, and climbing further up to 400, when, whatever it is, it is definitely expensive (evaluating to 1); this function has a so-called right shoulder. One can do the same with two opposite concepts, like hot and cold. There are no values, as what is perceived to be hot (cold) weather in Iceland is different from Ireland, that’s again different from Italy and yet again from the Ivory Coast. An example of a fuzzy function passed the revue in Example 9.1.1 for ‘mesoscopic small’ for the hunting spear. More fuzzy functions exist, such as the trapezoidal function with both a left and a right shoulder, also depicted in Figure 10.1.1, which, perhaps, could refer to the optimal temperature of a cup of coffee—not too cold and not too hot.

Screenshot (118).png

Figure 10.1.1: Some examples of fuzzy functions; see text for explanation.

Returning to Example 10.1.1, it noted that the interpretation has to be modified cf. the ‘standard’ way we have seen in Block I in order to take care of a truth value between 0 and 1 inclusive, rather than either 0 or 1. It is a mapping \(I : Atoms → [0, 1]\), which are extended to formulae as follows (the principal list is shown):

\[\mathcal{I} (\neg\phi) = \mathcal{I} (\phi)\in 0\]

\[\mathcal{I} (\exists x\phi) = sup_{c\in\Delta^{\mathcal{I}}} \mathcal{I}^{c}_{x} (\phi)\]

\[\mathcal{I} (\forall x\phi) = inf_{c\in\Delta^{\mathcal{I}}} \mathcal{I}^{c}_{x} (\phi)\]

\[\mathcal{I} (\phi ∧\psi) = \mathcal{I} (\phi)\otimes\mathcal{I}(\psi)\]

\[\mathcal{I} (\phi ∨\psi) = \mathcal{I} (\phi)\oplus\mathcal{I}(\psi)\]

\[\mathcal{I} (\phi\to\psi) = \mathcal{I} (\phi)\Rightarrow\mathcal{I}(\psi)\]

\[\mathcal{I} (\neg\phi) = \ominus \mathcal{I} (\phi)\]

where \(\mathcal{I}^{c}_{x}\) is as \(\mathcal{I}\) except that var \(x\) is mapped to individual \(c\), the \(\otimes ,\oplus ,\Rightarrow ,\) and \(\ominus\) are combination functions: triangular norms (or t-norms), triangular conorms (or s-norms), implication functions, and negation functions, respectively. Also, they extend the classical Boolean conjunction, disjunction, implication, and negation, respectively, to the many-valued case. In addition, it poses the notion of degree of subsumption between two fuzzy sets \(A\) and \(B\), which is defined as \(inf_{x\in X} A(x)\Rightarrow B(x)\) and such that if \(A(x)\leq B(x)\) for all \(x\in [0, 1]\) then \(A\sqsubseteq B\) evaluates to 1. Finally, \(\mathcal{I}\models\phi\geq l\) (resp. \(\mathcal{I}\models\phi\leq u)\) iff \(\mathcal{I} (\phi)\geq l\) (respectively, \( I(\phi)\leq u)\).

For notations of specific fuzzy DLs and fuzzy OWL extension, basically, the language specifications are the usual DL/OWL notation, with the above-mentioned modifications and additions. For details and examples, you are suggested to start with [Str08], which provides a good introductory overview and has a very long list of references to start delving into the topics, and a more technical take on it is provided in [LS08]. An alternative approach is taken in [BS11] (that also describes many examples), where all the fuzzy knowledge is put in the annotations of an OWL 2 file through a Protégé plugin³ and then it is either ignored by a standard reasoner or parsed and handled by the fuzzyDL⁴ or DeLorean⁵ reasoner.

Fuzzy DLs can be classified according to the DL or OWL language that they extend, the allowed fuzzy constructs and the underlying fuzzy logics (notably, G¨odel, Lukasiewicz, Zadeh), and their reasoning services. Regarding the latter, because we have those special new fuzzy features in the language, we get new reasoning services with it. They are:

Consistency, Subsumption, Equivalence
Graded instantiation: Check if individual \(a\) is an instance of class \(C\) to degree at least \(n\), i.e., \(KB\models\langle a : C, n\rangle\)
Best Truth Value Bound problem: determine tightest bound \(n\in [0, 1]\) of an axiom \(\alpha\), i.e. \(glb(KB, \alpha) = sup \{n, | KB\models \langle\alpha\geq n\rangle\}\) (likewise for \(lub\), lowest upper bound)
Best Satisfiability Bound problem: \(glb(KB, C)\) determined by the max value of \(x\) s.t. \((\mathcal{R},\mathcal{T} ,\mathcal{A}\cup \{a : C\geq x\})\) (among all models, determine the max degree of truth that concept C may have over all individuals \(x\in\Delta^{\mathcal{I}}\))
\(glb(KB, C\sqsubseteq D)\) is the minimal value of \(x\) such that \(KB = (\mathcal{R},\mathcal{T} ,\mathcal{A}\cup\{ a : C\sqcap\neg D\geq 1 − x\})\) is satisfiable, where \(a\) is a new individual; Therefore, the greatest lower bound problem can be reduced to the minimal satisfiability problem of a fuzzy knowledge base

There are several fuzzy reasoners, such as the aforementioned FuzzyDL for fuzzy \(\mathcal{SHIF(D)}\) the DeLorean reasoners.

How does this work out practically? Given some fuzzy \(\mathcal{SHIF(D), SHOIN (D), SROIQ(D)}\) that is serialized in OWL syntax, then one can declare the fuzzy concepts with either modifiers (e.g., very) or ‘concrete’ fuzzy concepts (e.g., Young) using data properties, where both additions have explicit membership functions. Jumping over the technicalities here, it enables one to specify fuzzy concepts as follows, using ‘minor’ and ‘young’ as examples. Just by numbers, \(_{\leq 18}(x)\) over \(\mathbb{N}\), evaluates to true if \(x\leq 18\), false otherwise (i.e., \(cr(0, 18)\)). Let’s define ‘minor’ as \(\texttt{Minor}\equiv\texttt{Person}\sqcap\exists\texttt{Age.}_{\leq 18}\). Being a minor and being young is not exactly the same thing, so let’s add something new for the latter: \(\texttt{Young : Natural} → [0, 1]\) is declared as a fuzzy datatype predicate denoting the degree of youngness, which then allows one to define ‘young’ as, say, \(\texttt{Young(x) = ls(x, 10, 30)}\), where ls is the left shoulder function (like the cold line in Figure 10.1.1). Then a young person may be defined as \(\texttt{YoungPerson}\equiv\texttt{Person}\sqcap\exists\texttt{Age.Young}\). What does the ontology entail with these axioms asserted? It entails, e.g.:

\(O\models\texttt{Minor}\sqsubseteq\texttt{YoungPerson}\geq\texttt{0.6}\),

\(O\models\texttt{YoungPerson}\sqsubseteq\texttt{Minor}\geq\texttt{0.4}\)

The values of 0.6 and 0.4 follow from the fuzzy functions, were ‘minor’ covered the age range of 0 to 18 and the young covered 0-30 age range that was going downhill with being ‘young’ from the age of 10.

Example 9.1.1 describes the specification of a ‘small bladed hunting spear’ in a fuzzy way.

Rough Ontologies

Rough ontologies are not an endpoint of themselves, but a means to an end—if one is lucky. If a concept turns out to be a rough concept, i.e., there are some individuals of which we don’t know whether they are instances of that concept, it means that the concept is underspecified at least with respect to the properties one has represented in the ontology. More precision is good, from an ontological viewpoint at least, and rough ontologies can assist the modeler by providing an iterative way to add more properties so as to reduce the amount of uncertainty in an observable way. That is, adding more properties to the concept should lead to fewer individuals of which we don’t know whether they are instances of that concept. If adding or removing a property does not make a difference, then that property is less important than others for that specific concept. Thus is, it can assist in making one’s ontology more precise, be this regarding the representation only, or to press the domain expert for more subject domain details.

Screenshot (119).png

Figure 10.1.2: A rough set and associated notions (Source: based on [PS07]).

Rough DLs were first introduced in [Kee10b]. As ‘roughness’ is not commonly included in undergraduate education, a brief summary (based on [Kee10b]) is provided first, before porting it to DLs and demonstrating where it solves some representation and reasoning problems.

Rough Sets

We consider here the typical rough set model of Pawlak, which is illustrated diagrammatically in Figure 10.1.2. Formally, \(I = (U, A)\) is an information system, where \(U\) is a non-empty finite set of objects and \(A\) a finite non-empty set of attributes such that for every \(a\in A\), we have the function \(a : U\mapsto V_{a}\) where \(v_{a}\) is the set of values that attribute \(a\) can have. For any subset of attributes \(P\subseteq A\), the equivalence relation IND\((P)\) is then defined as follows:

\[\texttt{IND}(P) = \{(x, y)\in U\times U | \forall a\in P, a(x) = a(y)\}\]

which generates a partition of \(U\), denoted with \(U/\)IND\((P)\), or \(U/P\) for short. If \((x, y)\in\)IND\((P)\), then \(x\) and \(y\) are indistinguishable with respect to the attributes in \(P\), which is referred to as p-indistinguishable.

From those objects in \(U\), the aim is to represent set \(X\) such that \(X\subseteq U\), using \(P\) (with \(P\subseteq A\)). That set \(X\) may not be crisp, i.e., it may include or exclude objects which are indistinguishable on the basis of the attributes in \(P\), i.e., we do not know given the attributes under consideration. This can be approximated by using a lower and an upper approximation, which are defined as:

\[\underline{P} X = \{x | [x]_{P}\subseteq X\}\]

\[\overline{P} X = \{x | [x]_{P}\cap X\neq ∅\}\]

where \([x]_{P}\) denotes the equivalence classes of the p-indistinguishability relation. The lower approximation (10.1.9) is the set of objects that are positively members of set \(X\) (more precisely: it is the union of all equivalence classes in \([x]_{P}\)). The upper approximation is the set of objects that are possibly in \(X\) and its complement, \(U −\overline{P} X\), is the negative region that is the union of all equivalence classes of sets of objects that are definitely not in \(X\) (i.e., \(\neg X\)). Then, “with every rough set we associate two crisp sets, called lower and upper approximation” [PS07], which is denoted as a tuple \(X = \langle\underline{X},\overline{X}\rangle\). Finally, the difference between the lower and upper approximation, \(B_{P} X = \overline{P}X −\underline{P} X\), is called the boundary region: this region contains the objects that neither can be classified as member of \(X\) nor that they are not in \(X\). It follows that if \(B_{P} X = ∅\) then \(X\) is a crisp set with respect to \(P\) and when \(B_{P} X\neq ∅\) then \(X\) is rough w.r.t. \(P\), i.e., then there are indistinguishable objects.

Given the boundary region, one can compute the accuracy of approximation, \(\alpha_{P} X\), which indicates how well a rough set approximates the target set. There are several ways to compute it, e.g., \(\alpha_{P} X = \dfrac {|\underline{P} X|} {|\overline{P} X|}\) or \(\alpha_{P} X = 1− \dfrac{|B_{P} X|} {|U|}\) (this is a separate topic not further discussed here). Note that \(\underline{P} X\subseteq X\subseteq\overline{P} X\). There are further notions that we do not need in this summary, but are ontologically interesting: the reduct and core, which are the set of sufficient conditions (attributes) and the set of necessary conditions, respectively (still with respect to \(P\)).

Transferring Rough Sets into Ontologies

In most ontology languages, there are more constructors than just attributes and ‘attributes’ may be, at least, represented with a role (object property) \(R\in\mathcal{R}\) or value attributions (data property) \(D\in\mathcal{D}\), which has to be accounted for: rough set’s \(P\) is thus to be taken from \(\mathcal{R}\cup\mathcal{D}\) where those attributes have the rough concept declared as domain. In addition, it requires an appropriate model-theoretic semantics for \(\underline{C}\) and \(\overline{C}\) as well as a ‘rough concept’, denoted here with “\(\wr C\)” to simplify notation. The semantics of the approximations can be transferred in a straightforward manner, where \(E\) denotes indistinguishability (equivalence) relation (which is reflexive, symmetric, and transitive):

\[\underline{C} = \{x | \forall y : (x, y)\in E → y\in C\}\]

\[\overline{C} = \{x | \exists y : (x, y)\in E ∧ y\in C\}\]

Then there is rough sets’ tuple notation, \(X = \langle\underline{X},\overline{X}\rangle\) to transfer into DL, but a \(\wr C = \langle\underline{C},\overline{C}\rangle\) is a rather unusual notation. Instead, one can use also two new binary relationships, dubbed lapr and uapr, to relate any rough concept and its associated approximations, which are typed as follows:

\[\forall\phi,\psi .lapr(\phi,\psi ) → \wr C(\phi) ∧ \underline{C}(\psi)\]

\[\forall\phi,\psi .uapr(\phi,\psi) → \wr C(\phi) ∧ \overline{C}(\psi)\]

Note that they quantify over sets, not objects that are member of the respective sets, therewith making explicit the knowledge about the three sets and how they relate. Finally, we make explicit that \(\wr C\) is identified by the combination of its lower and upper approximation, which the flowing axioms ensure:

\( \forall\phi. \wr C(\phi) →\exists\psi .lapr(\phi,\psi ),\)

\(\forall\phi . \wr C(\phi) →\exists\psi .uapr(\phi,\psi),\)

\[\forall\phi,\psi,\varphi .lapr(\phi,\psi) ∧ lapr(\phi, \varphi) →\psi = \varphi,\]

\(\forall\phi,\psi,\varphi .uapr(\phi,\psi) ∧ uapr(\phi, \varphi) →\psi = \varphi,\)

\(\forall\phi_{1},\phi_{2},\psi_{1},\psi_{2} .lapr(\phi_{1},\psi_{1}) ∧ uapr(\phi_{1},\psi_{2})∧ lapr(\phi_{2},\psi_{1}) ∧ uapr(\phi_{2},\psi_{2}) → \phi_{1} = \phi_{2}.\)

They say that: 1) there must be exactly one lower and one upper approximation for each rough concept, and 2) there is one rough concept for each combination of lower and upper approximation (i.e., if an approximation differs, it is a different rough concept).

Practically within an OWL-only setting, the \(\overline{\texttt{C}}\) and \(\underline{\texttt{C}}\) is reduced to, in OWL 2 DL functional syntax:

\(\texttt{EquivalentClasses(}\overline{\texttt{C}}\texttt{ ObjectSomeValuesFrom(a:Ind a:C))}\)

\(\texttt{EquivalentClasses(}\underline{\texttt{C}}\texttt{ ObjectAllValuesFrom(a:Ind a:C))}\)

where Ind denotes the indistinguishability relation. Then Eq. 10.1.15 is approximated by adding object properties uapr, lapr that have \(\wr\texttt{C}\) as domain, and an exactly 1 cardinality constraint:

\(\texttt{ObjectPropertyDomain(a:upar a:}\wr\texttt{C)}\)

\(\texttt{ObjectPropertyDomain(a:lapr a:}\wr\texttt{C)}\)

\(\texttt{ObjectExactCardinality(1 a:uapr a:}\overline{\texttt{C}})\)

\(\texttt{ObjectExactCardinality(1 a:lapr a:}\underline{\texttt{C}})\)

One has to add this for each rough concept and its approximations. For instance, the promiscuous bacteria of [Kee10c]:

\(\texttt{PromiscuousBacterium}\equiv\texttt{Organism}\sqcap\exists\:\texttt{Percentage.real}_{>10}\sqcap\geq 6\:\texttt{hasHGTCluster.FlexibleHGTGeneCluster }\)

\(\texttt{PromiscuousBacterium}\sqsubseteq = 1\:\texttt{lapr.PromBactLapr}\)

\[\texttt{PromiscuousBacterium}\sqsubseteq = 1\:\texttt{uapr.PromBactUapr}\]

\(\texttt{PromBactLapr}\equiv\forall\:\texttt{Ind.PromBact}\)

\(\texttt{PromBactUapr}\equiv\exists\:\texttt{Ind.PromBact}\)

Each such concept will have to be tested against the instances in the ABox: is the PromiscuousBacterium indeed a rough concept? This was experimented with in two ways: ‘natively’ in OWL files as well as in the OBDA setting (Figure 8.4.1 was taken from that experiment). Practically, at the time at least, putting the instances in the OWL file was a ‘bad’ idea, mainly because the reasoners are not optimised to deal efficiently with data properties, value restrictions, and instances (or: computing the rough concepts took quite some time even with just 17 individuals). The OBDA setting did work, but was at the time cumbersome to set up; this has improved in the meantime (see also Chapter 8).

One can add rough subsumption to rough concepts [Kee11a] as well, which is beyond the current introductory scope.

Footnotes

¹We will not cover probabilistic ontologies in this chapter; a recent introductory overview is described in [Luk17]. Some pointers to reasoners are: Pronto, which is an extension to the Pellet reasoner (pellet.owldl.com/pronto/), PR-OWL (http://www.pr-owl.org/ and others, such as a Probabilistic Ontology Mapping Tool (OMEN), and combinations with Bayesian networks (BayesOWL, OntoBayes).

²In this section, I assume the reader recalls something of fuzzy logic or fuzzy sets, and that the occasional examples suffice as refresher.

³available at http://www.umbertostraccia.it/cs/software/FuzzyOWL/ in July 2018, not the URL listed in the paper.

⁴www.umbertostraccia.it/cs/sof...L/fuzzyDL.html

⁵http://webdiis.unizar.es/~fbobillo/delorean.php