8.2: From Spreadsheets to OWL

Last updated
Save as PDF

Page ID: 6444

\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \) \( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)\(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\) \(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\)\(\newcommand{\AA}{\unicode[.8,0]{x212B}}\)

Spreadsheets are normally intended to store data. There are two ways to leverage the structure of spreadsheet content in the process of developing an ontology. The first option is based on the ‘standard’ way of using a spreadsheet. Normally, the first row contains column headings that are essentially the ‘attributes’ or classes of something, and the rest of the columns or rows are the data. This can be likewise for the rows in the first column. This gives two opportunities for bottom-up development: extract those column/row headings and take that vocabulary to insert in the TBox. For instance, a row 1 that contains in columns A-D \(\langle\texttt{FlowerType, Colour, Height, FloweringSeason}\rangle\) gives a clear indication what one could put in the ontology. It is a bit less structured than databases and their corresponding conceptual models, and there are no reverse engineering algorithms to discover the constraints, so still an analysis has to be carried out as to how one represents, say, Color in the ontology. Subsequently, and having recorded how the column headings have been represented in the ontology, one could load the data into the ABox accordingly, if desired.

The second option of using a spreadsheet is that it can be seen as an easier interface to declare knowledge compared to adding axioms in an ODE such as Protégé, especially for domain experts. The idea works as follows. We have seen several axioms that adhere to a particular pattern, such as \(\texttt{C}\sqsubseteq\texttt{D}\) and \(\texttt{C}\sqsubseteq\exists\texttt{R.D}\), where the former could be called, say, “named class subsumption” and the latter “all-some”. This can be converted into ‘logical macros’, i.e., a non-logician-friendly interface where only the vocabulary is entered into specific fields, and some script behind the scenes does the rest to insert it in the ontology. This is illustrated in the following example.

Example \(\PageIndex{1}\):

Consider Figure 7.3.1. On the left-hand side is a small spreadsheet, with in column A some data that one would want to have converted into classes in the TBox and to be asserted to be subclasses of those values-to-be-turned-intoclasses in column B. That is, it is a table representation of the axiom type \(\texttt{C}\sqsubseteq\texttt{D}\). The script to do that may be, say, a JSON script to process the spreadsheet that, in turn, uses the OLW API to write into the OWL file. Such a script is shown on the right-hand side of the figure. Upon running it, it will add \(\texttt{Lion}\sqsubseteq\texttt{Animal}\) etc. to the ontology, if not already present.

The principle is similar for the data in columns D and E, but then for the “allsome” axiom type \(\texttt{C}\sqsubseteq\exists\texttt{R.D}\). The values in column D will be the class that will have a property declared and in column E what the class in column D is doing (eating, in this case). Looking at it differently: the table consisting of columns D and E amounts to the eats relation and is intended to be converted into the two axioms \(\texttt{Lion}\sqsubseteq\exists\texttt{eats.Impala}\) and \(\texttt{Giraffe}\sqsubseteq\exists\texttt{eats.Twig}\).

One such tool with which one can do this is cellfie² that uses the M² DSL for the transformation [OHWM10]. It still requires one to declare what the axiom pattern should be in the rule line, which could be seen as disadvantage but also as having the advantage to be more flexible. For instance, a table could have three columns, so that a domain expert can add arbitrary object properties in a row, like in row 2 \(\langle\texttt{lion, eats, impala}\rangle\) and in row 3 \(\langle\texttt{lion, drinks, water}\rangle\) and then declare in the rule that the second column has to become an object property in the ontology.

If there are many axioms to add, such an approach likely also will be faster when knowledge has to be added in batch compared to clicking around in the ODE.

Footnotes

²https://github.com/protegeproject/cellfie-plugin