0.01: Implementing abstraction
- Page ID
- 81525
\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)
\( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)
\( \newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\)
( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\)
\( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)
\( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\)
\( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)
\( \newcommand{\Span}{\mathrm{span}}\)
\( \newcommand{\id}{\mathrm{id}}\)
\( \newcommand{\Span}{\mathrm{span}}\)
\( \newcommand{\kernel}{\mathrm{null}\,}\)
\( \newcommand{\range}{\mathrm{range}\,}\)
\( \newcommand{\RealPart}{\mathrm{Re}}\)
\( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)
\( \newcommand{\Argument}{\mathrm{Arg}}\)
\( \newcommand{\norm}[1]{\| #1 \|}\)
\( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)
\( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\AA}{\unicode[.8,0]{x212B}}\)
\( \newcommand{\vectorA}[1]{\vec{#1}} % arrow\)
\( \newcommand{\vectorAt}[1]{\vec{\text{#1}}} % arrow\)
\( \newcommand{\vectorB}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)
\( \newcommand{\vectorC}[1]{\textbf{#1}} \)
\( \newcommand{\vectorD}[1]{\overrightarrow{#1}} \)
\( \newcommand{\vectorDt}[1]{\overrightarrow{\text{#1}}} \)
\( \newcommand{\vectE}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash{\mathbf {#1}}}} \)
\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)
\( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)
\(\newcommand{\avec}{\mathbf a}\) \(\newcommand{\bvec}{\mathbf b}\) \(\newcommand{\cvec}{\mathbf c}\) \(\newcommand{\dvec}{\mathbf d}\) \(\newcommand{\dtil}{\widetilde{\mathbf d}}\) \(\newcommand{\evec}{\mathbf e}\) \(\newcommand{\fvec}{\mathbf f}\) \(\newcommand{\nvec}{\mathbf n}\) \(\newcommand{\pvec}{\mathbf p}\) \(\newcommand{\qvec}{\mathbf q}\) \(\newcommand{\svec}{\mathbf s}\) \(\newcommand{\tvec}{\mathbf t}\) \(\newcommand{\uvec}{\mathbf u}\) \(\newcommand{\vvec}{\mathbf v}\) \(\newcommand{\wvec}{\mathbf w}\) \(\newcommand{\xvec}{\mathbf x}\) \(\newcommand{\yvec}{\mathbf y}\) \(\newcommand{\zvec}{\mathbf z}\) \(\newcommand{\rvec}{\mathbf r}\) \(\newcommand{\mvec}{\mathbf m}\) \(\newcommand{\zerovec}{\mathbf 0}\) \(\newcommand{\onevec}{\mathbf 1}\) \(\newcommand{\real}{\mathbb R}\) \(\newcommand{\twovec}[2]{\left[\begin{array}{r}#1 \\ #2 \end{array}\right]}\) \(\newcommand{\ctwovec}[2]{\left[\begin{array}{c}#1 \\ #2 \end{array}\right]}\) \(\newcommand{\threevec}[3]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \end{array}\right]}\) \(\newcommand{\cthreevec}[3]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \end{array}\right]}\) \(\newcommand{\fourvec}[4]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}\) \(\newcommand{\cfourvec}[4]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}\) \(\newcommand{\fivevec}[5]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}\) \(\newcommand{\cfivevec}[5]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}\) \(\newcommand{\mattwo}[4]{\left[\begin{array}{rr}#1 \amp #2 \\ #3 \amp #4 \\ \end{array}\right]}\) \(\newcommand{\laspan}[1]{\text{Span}\{#1\}}\) \(\newcommand{\bcal}{\cal B}\) \(\newcommand{\ccal}{\cal C}\) \(\newcommand{\scal}{\cal S}\) \(\newcommand{\wcal}{\cal W}\) \(\newcommand{\ecal}{\cal E}\) \(\newcommand{\coords}[2]{\left\{#1\right\}_{#2}}\) \(\newcommand{\gray}[1]{\color{gray}{#1}}\) \(\newcommand{\lgray}[1]{\color{lightgray}{#1}}\) \(\newcommand{\rank}{\operatorname{rank}}\) \(\newcommand{\row}{\text{Row}}\) \(\newcommand{\col}{\text{Col}}\) \(\renewcommand{\row}{\text{Row}}\) \(\newcommand{\nul}{\text{Nul}}\) \(\newcommand{\var}{\text{Var}}\) \(\newcommand{\corr}{\text{corr}}\) \(\newcommand{\len}[1]{\left|#1\right|}\) \(\newcommand{\bbar}{\overline{\bvec}}\) \(\newcommand{\bhat}{\widehat{\bvec}}\) \(\newcommand{\bperp}{\bvec^\perp}\) \(\newcommand{\xhat}{\widehat{\xvec}}\) \(\newcommand{\vhat}{\widehat{\vvec}}\) \(\newcommand{\uhat}{\widehat{\uvec}}\) \(\newcommand{\what}{\widehat{\wvec}}\) \(\newcommand{\Sighat}{\widehat{\Sigma}}\) \(\newcommand{\lt}{<}\) \(\newcommand{\gt}{>}\) \(\newcommand{\amp}{&}\) \(\definecolor{fillinmathshade}{gray}{0.9}\)Implementing abstraction
In general, abstraction is implemented by what is generically termed an Application Programming Interface (API). API is a somewhat nebulous term that means different things in the context of various programming endeavours. Fundamentally, a programmer designs a set of functions and documents their interface and functionality with the principle that the actual implementation providing the API is opaque.
For example, many large web applications provide an API accessible via HTTP. Accessing data via this method surely triggers many complicated series of remote procedure calls, database queries and data transfers, all of which are opaque to the end user who simply receives the contracted data.
Those familiar with object-oriented languages such as Java, Python or C++ would be familiar with the abstraction provided by classes. Methods provide the interface to the class, but abstract the implementation.
Implementing abstraction with C
A common method used in the Linux kernel and other large C code bases, which lack a built-in concept of object-orientation, is function pointers. Learning to read this idiom is key to navigating most large C code bases. By understanding how to read the abstractions provided within the code an understanding of internal API designs can be built.
#include <stdio.h> /* The API to implement */ struct greet_api { int (*say_hello)(char *name); int (*say_goodbye)(void); }; /* Our implementation of the hello function */ int say_hello_fn(char *name) { printf("Hello %s\n", name); return 0; } /* Our implementation of the goodbye function */ int say_goodbye_fn(void) { printf("Goodbye\n"); return 0; } /* A struct implementing the API */ struct greet_api greet_api = { .say_hello = say_hello_fn, .say_goodbye = say_goodbye_fn }; /* main() doesn't need to know anything about how the * say_hello/goodbye works, it just knows that it does */ int main(int argc, char *argv[]) { greet_api.say_hello(argv[1]); greet_api.say_goodbye(); printf("%p, %p, %p\n", greet_api.say_hello, say_hello_fn, &say_hello_fn); exit(0); }
Code such as the above is the simplest example of constructs used repeatedly throughout the Linux Kernel and other C programs. Let's have a look at some specific elements.
We start out with a structure that defines the API
(struct greet_api
). The
functions whose names are encased in parentheses with a pointer
marker describe a function
pointer[1]. The function pointer describes the
prototype of the function it must point to;
pointing it at a function without the correct return type or
parameters will generate a compiler warning at least; if left in
code will likely lead to incorrect operation or crashes.
We then have our implementation of the API. Often for
more complex functionality you will see an idiom where API
implementation functions will only be a wrapper around other
functions that are conventionally prepended with one or or two
underscores[2]
(i.e. say_hello_fn()
would call
another function
_say_hello_function()
). This
has several uses; generally it relates to having simpler and
smaller parts of the API (marshalling or checking arguments, for
example) separate from more complex implementation, which often
eases the path to significant changes in the internal workings
whilst ensuring the API remains constant. Our implementation is
very simple, however, and doesn't even need it's own support
functions. In various projects, single-, double- or even
triple-underscore function prefixes will mean different things,
but universally it is a visual warning that the function is not
supposed to be called directly from "beyond" the API.
Second to last, we fill out the function pointers in
struct greet_api greet_api
.
The name of the function is a pointer; therefore there is no
need to take the address of the function
(i.e. &say_hello_fn
).
Finally we can call the API functions through the
structure in main
.
You will see this idiom constantly when navigating the
source code. The tiny example below is taken from
include/linux/virtio.h
in the
Linux kernel source to illustrate:
include/linux/virtio.h
/** * virtio_driver - operations for a virtio I/O driver * @driver: underlying device driver (populate name and owner). * @id_table: the ids serviced by this driver. * @feature_table: an array of feature numbers supported by this driver. * @feature_table_size: number of entries in the feature table array. * @probe: the function to call when a device is found. Returns 0 or -errno. * @remove: the function to call when a device is removed. * @config_changed: optional function to call when the device configuration * changes; may be called in interrupt context. */ struct virtio_driver { struct device_driver driver; const struct virtio_device_id *id_table; const unsigned int *feature_table; unsigned int feature_table_size; int (*probe)(struct virtio_device *dev); void (*scan)(struct virtio_device *dev); void (*remove)(struct virtio_device *dev); void (*config_changed)(struct virtio_device *dev); #ifdef CONFIG_PM int (*freeze)(struct virtio_device *dev); int (*restore)(struct virtio_device *dev); #endif };
It's only necessary to vaguely understand that this structure is a description of a virtual I/O device. We can see the user of this API (the device driver author) is expected to provide a number of functions that will be called under various conditions during system operation (when probing for new hardware, when hardware is removed, etc.). It also contains a range of data; structures which should be filled with relevant data.
Starting with descriptors like this is usually the easiest way to begin understanding the various layers of kernel code.
Libraries
Libraries have two roles which illustrate abstraction.
Allow programmers to reuse commonly accessed code.
Act as a black box implementing functionality for the programmer.
For example, a library implementing access to the raw data in JPEG files has both the advantage that the many programs that wish to access image files can all use the same library and the programmers building these programs do not need to worry about the exact details of the JPEG file format, but can concentrate their efforts on what their program wants to do with the image.
The standard library of a UNIX platform is generically
referred to as libc
. It
provides the basic interface to the system: fundamental calls
such as read()
,
write()
and
printf()
. This API is
described in its entirety by a specification called
POSIX
. It is freely
available online and describes the many calls that make up the
standard UNIX API.
Most UNIX platforms broadly follow the POSIX standard, though often differ in small but sometimes important ways (hence the complexity of the various GNU autotools, which often try to abstract away these differences for you). Linux has many interfaces that are not specified by POSIX; writing applications that use them exclusively will make your application less portable.
Libraries are a fundamental abstraction with many details. Later chapters will describe how libraries work in much greater detail.
[1] Often you will see that the names of the parameters are omitted, and only the type of the parameter is specified. This allows the implementer to specify their own parameter names avoiding warnings from the compiler.
[2] A double-underscore function
__foo
may conversationally be
referred to as "dunder foo".