# Complexity Recap

This appendix is expected to be relevant only to those who have no idea of, or too little recollection of, computational complexity, or who have come across it many years ago and may like a brief refresher.

Theory of computation concerns itself with, among other things, languages and its dual, problems. A problem is the question of deciding whether a given string is a member of some particular language; more precisely: if $$\Sigma$$ is an alphabet, $$L$$ is a language over $$\Sigma$$, then the problem $$L$$is “given a string $$w\in\Sigma^{*}$$ , decide whether or not $$w$$ is in $$L$$”. The usage of ‘problem’ and ‘language’ is interchangeable. When we focus on strings for their own sake (e.g., in the set $$\{o^{n}1^{n} | n\geq 1\}$$), then we tend to think of the set of strings as a language. When we focus on the ‘thing’ that is encoded as a string (e.g., a particular graph, a logical expression, how satisfiable a class is), we tend to think of the set of strings as a problem. Within the context of ontologies, we typically talk of the representation languages and reasoning problems.

There are several classes of languages; see Figure 13.1. The regular free languages have their counterpart with finite automata; the context-free languages with push-down automata; the recursive languages is the class of languages accepted by a Turing machine (TM) that always halts; the recursively enumerable languages is the class of languages defined by a TM; the non-recursively enumerable languages is the class of languages for which there is no TM (e.g., the diagonalization language). The recursive languages, and, to a lesser extent, the recursively enumerable languages, are by far the most interesting ones for ontologies.

Turing machines are used as a convenient abstraction of actual computers for the notion of computation. A TM that always halts = algorithm, i.e., the TM halts on all inputs in finite time, either accepting or rejecting; hence, the recursive languages are decidable problems/languages. Problems/languages that are not recursive are called undecidable, and they do not have an algorithm; if they are in the class of recursively enumerable languages (but not recursive), then they have a procedure that runs on an arbitrary TM that may give you an answer but may very well never

Figure 13.1: Graphical depiction of the main categories of languages.

halt; see also Figure 13.2. First order predicate logic in its full glory is undecidable. Description logics are decidable fragments of first order predicate logic1, i.e., they are recursive languages and (can) have algorithms for the usual problems (standard reasoning services).

Figure 13.2: Graphical depiction of the main categories of languages; the rectangle denotes a Turing Machine; $$w$$ is a string and $$L$$ is a language.

Not all algorithms are alike, however, and some take up more time (by the CPU) or space (in the form of memory size) to compute the answer than others. So, we want to know for a given problem, the answer to “how much time [/space] does it take to compute the answer, as a function of the size of the input?”. If the computation takes many years with the top-of-the-range hardware, then it is still not particularly interesting to implement (from a computer science viewpoint, that is). To structure these matters, we use the notion of a complexity class. There are very many of them, but we only refer to a few in the context of ontologies. For instance, it may take a polynomial amount of time to compute class subsumption for an OWL 2 EL-formalized ontology and exponential time to compute how satisfiable an EER diagram (represented in the DL $$\mathcal{DLR}_{\texttt{ifd}}$$) is and the bigger the diagram (more precisely: the logical theory), correspondingly the longer it takes. The intuition is depicted in Figure 13.3: for small ontologies, there is but a minor difference in performance, but one really starts to notice it with larger logical theories.

Figure 13.3: General idea of time complexity of an algorithm, as function of the size of the input. e.g.: All basic arithmetic operations can be computed in polynomial time; evaluating a position in generalized chess and checkers on an $$n\times n$$ board costs exponential time.

Looking ahead at the complexity classes relevant for OWL, we list here a description of the meaning of them (copied from the OWL 2 Profiles Standard page [MGH+09]):

- Decidability open means that it is not known whether this reasoning problem is decidable at all.

- Decidable, but complexity open means that decidability of this reasoning problem is known, but not its exact computational complexity. If available, known lower bounds are given in parenthesis; for example, (NP-Hard) means that this problem is at least as hard as any other problem in NP.

- X-complete for X one of the complexity classes explained below indicates that tight complexity bounds are known—that is, the problem is known to be both in the complexity class X (i.e., an algorithm is known that only uses time/space in X) and hard for X (i.e., it is at least as hard as any other problem in X). The following is a brief sketch of the classes used in Table 4.2.3, from the most complex one down to the simplest ones.

- 2NEXPTIME is the class of problems solvable by a nondeterministic algorithm in time that is at most double exponential in the size of the input (i.e., roughly $$2^{2^{n}}$$ , for $$n$$ the size of the input).

- NEXPTIME is the class of problems solvable by a nondeterministic algorithm in time that is at most exponential in the size of the input (i.e., roughly $$2^{n}$$ , for $$n$$ the size of the input).

- PSPACE is the class of problems solvable by a deterministic algorithm using space that is at most polynomial in the size of the input (i.e., roughly $$n^{c}$$ , for $$n$$ the size of the input and $$c$$ a constant).

- NP is the class of problems solvable by a nondeterministic algorithm using time that is at most polynomial in the size of the input (i.e., roughly $$n^{c}$$ , for $$n$$ the size of the input and $$c$$ a constant).

- PTIME is the class of problems solvable by a deterministic algorithm using time that is at most polynomial in the size of the input (i.e., roughly $$n^{c}$$ , for $$n$$ the size of the input and $$c$$ a constant). PTIME is often referred to as tractable, whereas the problems in the classes above are often referred to as intractable.

- LOGSPACE is the class of problems solvable by a deterministic algorithm using space that is at most logarithmic in the size of the input (i.e., roughly $$log(n)$$, for $$n$$ the size of the input and $$c$$ a constant). NLOGSPACE is the nondeterministic version of this class.

- AC0 is a proper subclass of LOGSPACE and defined not via Turing Machines, but via circuits: AC0 is the class of problems definable using a family of circuits of constant depth and polynomial size, which can be generated by a deterministic Turing machine in logarithmic time (in the size of the input). Intuitively, AC0 allows us to use polynomially many processors but the run-time must be constant. A typical example of an AC0 problem is the evaluation of first-order queries over databases (or model checking of first-order sentences over finite models), where only the database (first-order model) is regarded as the input and the query (first-order sentence) is assumed to be fixed. The undirected graph reachability problem is known to be in LogSpace, but not in AC0 .

## Footnotes

1More precisely: there is at least one that turned out to be undecidable ($$\mathcal{DLR_{US}}$$ ), but this is an exception to the rule.