
# 7.5: Other Semi-Automated Approaches

Other (semi-)automated approaches to bottom-up ontology development include machine learning techniques, deploying so-called ‘non-standard’ DL reasoning services, and converting diagrams from biology into (candidate) ontology terms and relations.

A short overview of machine learning techniques for ontology development, with relevant references, can be found in [dFE10], who also outline where such inductive methods can be used: classifying instances, learning new relationships among individuals, building probabilistic ontologies, probabilistic mapping for the ontology matching task, (semi-)automating the ontology population task, refining ontologies, and reasoning over inconsistent or noisy knowledge bases. Several ‘hybrids’ exist, such as linking Bayesian networks with probabilistic ontologies [dCL06] and improving data mining with an ontology [ZYS+05].

Other options are to resort to a hybrid of Formal Concept Analysis with OWL [BGSS07], least common subsumer [BST07, Tur08, PT11], and similar techniques. The notions of least common subsumer and most specific concept, and the motivations for where and how they may be useful, are described in [PT11]. Both are non-standard reasoning services that help with ontology development, and they are defined in terms of DL knowledge bases as follows.

Definition $$\PageIndex{1}$$:

(least common subsumer ([PT11])). Let $$\mathcal{L}$$ be a Description Logic language and $$\mathcal{K} = (\mathcal{T}, \mathcal{A})$$ be a knowledge base represented in DL $$\mathcal{L}$$ (an $$\mathcal{L}$$-KB). The least common subsumer (lcs) with respect to $$\mathcal{T}$$ of a collection of concepts $$C_{1}, \ldots, C_{n}$$ is the $$\mathcal{L}$$-concept description $$C$$ such that:

1. $$C_{i}\sqsubseteq _{\mathcal{T}} C$$ for all $$1\leq i\leq n$$, and
2. for each $$\mathcal{L}$$-concept description $$D$$ it holds that: if $$C_{i}\sqsubseteq _{\mathcal{T}} D$$ for all $$1\leq i\leq n$$, then $$C\sqsubseteq _{\mathcal{T}} D$$.
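To make the two conditions concrete, the following is a minimal sketch for a deliberately restricted setting: concept descriptions built only from atomic names, conjunction, and existential restrictions (the DL $$\mathcal{EL}$$), with an empty TBox. Under those assumptions, a least common subsumer can be obtained by keeping the shared atomic names and recursively pairing existential restrictions over the same role. The class `Concept`, the functions `subsumed_by` and `lcs`, and the example concept and role names are illustrative assumptions, not taken from [PT11]. The result may contain redundant conjuncts, so it is equivalent to the lcs rather than a minimal representation of it.

```python
from dataclasses import dataclass
from functools import reduce
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Concept:
    """EL concept description: a conjunction of atomic names and
    existential restrictions, each given as a (role, filler) pair."""
    names: FrozenSet[str] = frozenset()
    exists: Tuple[Tuple[str, "Concept"], ...] = ()


def subsumed_by(c: Concept, d: Concept) -> bool:
    """Structural check of c ⊑ d with respect to the empty TBox."""
    if not d.names <= c.names:
        return False
    # Every restriction required by d must be matched by one in c.
    return all(
        any(s == r and subsumed_by(cf, df) for s, cf in c.exists)
        for r, df in d.exists
    )


def lcs(c: Concept, d: Concept) -> Concept:
    """Binary lcs: shared names plus pairwise lcs of same-role fillers."""
    names = c.names & d.names
    exists = tuple(
        (r, lcs(cf, df))
        for r, cf in c.exists
        for s, df in d.exists
        if r == s
    )
    return Concept(names, exists)


def lcs_all(*concepts: Concept) -> Concept:
    """lcs of a collection C_1, ..., C_n, folded pairwise."""
    return reduce(lcs, concepts)


# Hypothetical example concepts.
c1 = Concept(
    frozenset({"Molecule", "Protein"}),
    (("locatedIn", Concept(frozenset({"Cell", "Eukaryotic"}))),),
)
c2 = Concept(
    frozenset({"Molecule", "Lipid"}),
    (("locatedIn", Concept(frozenset({"Cell", "Prokaryotic"}))),),
)
common = lcs_all(c1, c2)  # Molecule ⊓ ∃locatedIn.Cell
```

Here `subsumed_by(c1, common)` and `subsumed_by(c2, common)` both hold, which corresponds to condition 1. Condition 2 is what makes `common` the *least* such description. With a non-empty TBox or a more expressive language, computing the lcs is considerably more involved, and it need not exist.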

Definition $$\PageIndex{2}$$:

(most specific concept ([PT11])). Let $$\mathcal{L}$$ be a Description Logic language and $$\mathcal{K} = (\mathcal{T}, \mathcal{A})$$ be a knowledge base represented in DL $$\mathcal{L}$$ (an $$\mathcal{L}$$-KB). The most specific concept (msc) with respect to $$\mathcal{K}$$ of an individual $$a$$ from $$\mathcal{A}$$ is the $$\mathcal{L}$$-concept description $$C$$ such that:

1. $$\mathcal{K}\models C(a)$$, and
2. for each $$\mathcal{L}$$-concept description $$D$$ it holds that: $$\mathcal{K}\models D(a)$$ implies $$C\sqsubseteq_{\mathcal{T}} D$$.
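As an illustration of this definition, here is a minimal sketch under strong simplifying assumptions: the language $$\mathcal{L}$$ only allows conjunctions of atomic concept names, the TBox is an acyclic taxonomy of told subclass axioms between atomic names, and the ABox contains only concept assertions (no role assertions). In that setting, the msc of an individual is the conjunction of its asserted concepts, from which any concept that is a superclass of another asserted concept can be dropped as redundant. The dictionaries `TBOX` and `ABOX`, the individual `a1`, and all concept names are hypothetical.

```python
from typing import Dict, Set

# Hypothetical TBox: atomic concept -> its direct told superclasses.
TBOX: Dict[str, Set[str]] = {
    "Kinase": {"Enzyme"},
    "Enzyme": {"Protein"},
    "MembraneProtein": {"Protein"},
    "Protein": {"Molecule"},
}

# Hypothetical ABox: individual -> asserted atomic concepts.
ABOX: Dict[str, Set[str]] = {
    "a1": {"Kinase", "Protein", "MembraneProtein"},
}


def ancestors(name: str, tbox: Dict[str, Set[str]]) -> Set[str]:
    """All strict superclasses of `name` (assumes an acyclic taxonomy)."""
    seen: Set[str] = set()
    stack = list(tbox.get(name, ()))
    while stack:
        sup = stack.pop()
        if sup not in seen:
            seen.add(sup)
            stack.extend(tbox.get(sup, ()))
    return seen


def msc(individual: str, tbox: Dict[str, Set[str]],
        abox: Dict[str, Set[str]]) -> Set[str]:
    """Conjuncts of the msc: asserted concepts minus implied superclasses."""
    asserted = abox[individual]
    redundant: Set[str] = set()
    for name in asserted:
        redundant |= ancestors(name, tbox)
    return asserted - redundant


result = msc("a1", TBOX, ABOX)  # read as the conjunction of its members
```

For `a1`, the assertion `Protein` is implied by `Kinase` and is therefore dropped, leaving the conjunction of `Kinase` and `MembraneProtein`. Once role assertions or more expressive constructors are allowed, the msc becomes harder to compute and may not exist in the given language.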

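The least common subsumer thus yields the most specific concept description that subsumes all concepts in a given collection (their most specific common superclass), and the most specific concept yields the most specific concept description of which a given individual is an instance.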

One could exploit biological models to find candidate terms and relations when those models have been created with software. This allows for semi-automated approaches to formalize the graphical vocabulary used in textbooks and drawing tools, and to subsequently use an algorithm to populate the TBox with the knowledge taken from the drawings. This is possible because such software has typical icons for categories of things, like a red oval meaning Protein, a yellow rectangle meaning Cell Process, and a pale green arrow with a grey hexagon in the middle meaning Protein Modification. Each individual diagram can thus be analyzed, with the named shapes at least categorized as subclasses of such main classes and relations asserted for the arrows between the shapes. This has been attempted for STELLA and PathwayStudio models [Kee05, Kee12b]. Related efforts convert models represented in the Systems Biology Markup Language (SBML) into OWL [HDG+11].
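The idea of mapping icons to main classes and arrows to relations can be sketched as follows. This is only an illustration under stated assumptions: the input format (lists of labelled shapes and arrows) is hypothetical and does not correspond to the actual export format of STELLA or PathwayStudio, the relation name `proteinModification` and the labels `ProteinA`, `ProteinB`, and `ProcessX` are made up, and rendering an arrow as an existential restriction is one possible modelling choice rather than the method of [Kee05, Kee12b]. The output uses a Manchester-like textual notation for candidate axioms.

```python
from typing import List, Tuple

# Icon vocabulary from the text, mapped to main classes.
ICON_TO_CLASS = {
    "red oval": "Protein",
    "yellow rectangle": "CellProcess",
}

# Arrow icon mapped to a hypothetical relation name.
ARROW_TO_RELATION = {
    "pale green arrow with grey hexagon": "proteinModification",
}


def diagram_to_axioms(
    shapes: List[Tuple[str, str]],
    arrows: List[Tuple[str, str, str]],
) -> List[str]:
    """Turn (label, icon) shapes and (source, icon, target) arrows
    into candidate TBox axioms, skipping anything unrecognized."""
    axioms = []
    for label, icon in shapes:
        parent = ICON_TO_CLASS.get(icon)
        if parent is not None:
            axioms.append(f"{label} SubClassOf {parent}")
    known = {label for label, _ in shapes}
    for source, icon, target in arrows:
        relation = ARROW_TO_RELATION.get(icon)
        if relation is not None and source in known and target in known:
            axioms.append(f"{source} SubClassOf {relation} some {target}")
    return axioms


# Hypothetical diagram content.
shapes = [
    ("ProteinA", "red oval"),
    ("ProteinB", "red oval"),
    ("ProcessX", "yellow rectangle"),
]
arrows = [("ProteinA", "pale green arrow with grey hexagon", "ProteinB")]

axioms = diagram_to_axioms(shapes, arrows)
for axiom in axioms:
    print(axiom)
```

The resulting axioms are candidates only. A domain expert would still need to check whether each shape label denotes a class, and whether an existential restriction is the intended reading of each arrow.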