2.1: Introduction


    There is no single data structure that offers optimal performance in every case. To choose the best structure for a particular task, we need to be able to judge how long a particular solution will take to run. Or, more accurately, we need to be able to judge how long two solutions will take to run, and choose the better of the two. We don't need to know how many minutes and seconds they will take, but we do need some way to compare algorithms against one another.

    Asymptotic complexity is a way of expressing the main component of the cost of an algorithm, using idealized units of computational work rather than machine-specific times. Consider, for example, the algorithm for sorting a deck of cards, which proceeds by repeatedly searching through the deck for the lowest card. The asymptotic complexity of this algorithm is the square of the number of cards in the deck. This quadratic behavior is the main term in the complexity formula; it says that, e.g., if you double the size of the deck, then the work is roughly quadrupled.
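
    To make this concrete, here is a minimal sketch of the repeated-scan card sort (essentially selection sort) in Python; the function name and the use of a plain list of numbers to stand in for cards are illustrative choices, not part of the original text:

        def selection_sort(cards):
            """Sort in place by repeatedly scanning for the lowest remaining card."""
            n = len(cards)
            for i in range(n - 1):
                # Scan the unsorted tail for the position of the lowest card.
                lowest = i
                for j in range(i + 1, n):
                    if cards[j] < cards[lowest]:
                        lowest = j
                # Move that card to the front of the unsorted region.
                cards[i], cards[lowest] = cards[lowest], cards[i]
            return cards

    Each pass scans the whole unsorted remainder of the deck, which is what gives the algorithm its quadratic cost.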

    The exact formula for the cost is more complex, and it contains more details than we need to understand the essential complexity of the algorithm. With our deck of cards, in the worst case, the deck would start out reverse-sorted, so our scans would have to go all the way to the end. The first scan would involve scanning \(52\) cards, the next would take \(51\), etc. So the cost formula is \(52+51+\dots +2\). Generally, letting \(N\) be the number of cards, the formula is \(2+3+\dots +N\), which equals \(\dfrac {N(N+1)}{2}-1=\dfrac {N^{2}+N}{2}-1=\dfrac {1}{2}N^{2}+\dfrac {1}{2}N-1\). But the \(N^{2}\) term dominates the expression, and this is what is key for comparing algorithm costs. (This is in fact an expensive algorithm; the best sorting algorithms run in sub-quadratic time.)
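
    As a quick sanity check on this closed form, we can compare the sum directly against the formula; the little harness below is just an illustration (the function name cards_scanned is invented here):

        def cards_scanned(n):
            """Total cards examined by the scans: 2 + 3 + ... + n."""
            return sum(range(2, n + 1))

        for n in (10, 52, 100):
            closed_form = n * (n + 1) // 2 - 1
            assert cards_scanned(n) == closed_form
            print(n, closed_form)   # e.g. 52 cards -> 1377 cards examined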

    Asymptotically speaking, in the limit as \(N\) tends towards infinity, \(2+3+\dots +N\) gets closer and closer to the pure quadratic function \(\dfrac {1}{2}N^{2}\). And what difference does the constant factor of \(\dfrac {1}{2}\) make, at this level of abstraction? So the behavior is said to be \(O(N^{2})\).
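
    Spelling out the limit behind this claim, the ratio of the exact cost to the pure quadratic tends to one:

    \( \lim_{N\to \infty }\dfrac {2+3+\dots +N}{\frac {1}{2}N^{2}}=\lim_{N\to \infty }\dfrac {\frac {1}{2}N^{2}+\frac {1}{2}N-1}{\frac {1}{2}N^{2}}=\lim_{N\to \infty }\left( 1+\dfrac {1}{N}-\dfrac {2}{N^{2}}\right) =1 \)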

    Now let us consider how we would go about comparing the complexity of two algorithms. Let \(f(n)\) be the cost, in the worst case, of one algorithm, expressed as a function of the input size \(n\), and \(g(n)\) be the cost function for the other algorithm. E.g., for sorting algorithms, \(f(10)\) and \(g(10)\) would be the maximum number of steps that the algorithms would take on a list of \(10\) items. If, for all values of \(n\geq 0\), \(f(n)\) is less than or equal to \(g(n)\), then the algorithm with complexity function \(f\) is never slower than the one with complexity function \(g\). But, generally speaking, our concern for computational cost is for the cases with large inputs, so the comparison of \(f(n)\) and \(g(n)\) for small values of \(n\) is less significant than the "long term" comparison of \(f(n)\) and \(g(n)\), for \(n\) larger than some threshold.
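
    To illustrate why the threshold matters, consider two hypothetical cost functions (invented here for illustration, not taken from the text): a quadratic \(f(n)=\frac {1}{2}n^{2}\) and a linear one with a large constant factor, \(g(n)=100n\). The quadratic algorithm is cheaper for small inputs, but past a threshold the linear one wins for every larger input:

        def f(n):
            # Hypothetical quadratic cost: 0.5 * n^2
            return 0.5 * n * n

        def g(n):
            # Hypothetical linear cost with a large constant factor
            return 100 * n

        # Find the first input size at which the quadratic becomes more expensive.
        threshold = next(n for n in range(1, 10**6) if f(n) > g(n))
        print(threshold)  # 201: f(n) <= g(n) holds only for n <= 200

    Past \(n=200\), \(f(n)\) exceeds \(g(n)\) for every larger \(n\), which is exactly the "long term" comparison described above.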

    Note that we have been speaking about bounds on the performance of algorithms, rather than giving exact speeds. The actual number of steps required to sort our deck of cards (with our naive quadratic algorithm) will depend upon the order in which the cards start out. The actual time to perform each of our steps will depend upon our processor speed, the condition of our processor cache, etc. It's all very complicated in the concrete details, and moreover not relevant to the essence of the algorithm.


    This page titled 2.1: Introduction is shared under a CC BY-SA license and was authored, remixed, and/or curated by Wikibooks - Data Structures (Wikipedia).
