8.1: ScapegoatTree - A Binary Search Tree with Partial Rebuilding


A ScapegoatTree is a BinarySearchTree that, in addition to keeping track of the number, $$\mathtt{n}$$, of nodes in the tree, also keeps a counter, $$\mathtt{q}$$, that maintains an upper-bound on the number of nodes.

    int q;


At all times, $$\mathtt{n}$$ and $$\mathtt{q}$$ obey the following inequalities:

$\mathtt{q}/2 \le \mathtt{n} \le \mathtt{q} \enspace . \nonumber$

In addition, a ScapegoatTree has logarithmic height; at all times, the height of the scapegoat tree does not exceed

$\log_{3/2} \mathtt{q} \le \log_{3/2} 2\mathtt{n} < \log_{3/2} \mathtt{n} + 2\enspace .\label{scapegoat-height}$

Even with this constraint, a ScapegoatTree can look surprisingly unbalanced. The tree in Figure $$\PageIndex{1}$$ has $$\mathtt{q}=\mathtt{n}=10$$ and height $$5<\log_{3/2}10 \approx 5.679$$.

Implementing the $$\mathtt{find(x)}$$ operation in a ScapegoatTree is done using the standard algorithm for searching in a BinarySearchTree (see Section 6.2). This takes time proportional to the height of the tree which, by ($$\ref{scapegoat-height}$$), is $$O(\log \mathtt{n})$$.
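Since $$\mathtt{find(x)}$$ is just the ordinary binary-search-tree search, it can be sketched in a few lines. The int-keyed Node layout below is an illustrative assumption, not the book's generic class:

```java
// Minimal sketch of the standard BST search used by find(x).
// The int-keyed Node layout is an assumption for illustration.
class FindDemo {
    static class Node {
        int x;
        Node left, right;
        Node(int x) { this.x = x; }
    }

    // Walk down from the root, branching by comparison; the cost is
    // proportional to the height of the tree, hence O(log n) here.
    static boolean find(Node u, int x) {
        while (u != null) {
            if (x < u.x)      u = u.left;
            else if (x > u.x) u = u.right;
            else              return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Node r = new Node(4);
        r.left = new Node(2);
        r.right = new Node(6);
        r.left.right = new Node(3);
        System.out.println(find(r, 3));  // true
        System.out.println(find(r, 5));  // false
    }
}
```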

To implement the $$\mathtt{add(x)}$$ operation, we first increment $$\mathtt{n}$$ and $$\mathtt{q}$$ and then use the usual algorithm for adding $$\mathtt{x}$$ to a binary search tree; we search for $$\mathtt{x}$$ and then add a new leaf $$\mathtt{u}$$ with $$\texttt{u.x}=\mathtt{x}$$. At this point, we may get lucky and the depth of $$\mathtt{u}$$ might not exceed $$\log_{3/2}\mathtt{q}$$. If so, then we leave well enough alone and don't do anything else.

Unfortunately, it will sometimes happen that $$\mathtt{depth(u)} > \log_{3/2}\mathtt{q}$$. In this case, we need to reduce the height. This isn't a big job; there is only one node, namely $$\mathtt{u}$$, whose depth exceeds $$\log_{3/2}\mathtt{q}$$. To fix $$\mathtt{u}$$, we walk from $$\mathtt{u}$$ back up to the root looking for a scapegoat, $$\mathtt{w}$$. The scapegoat, $$\mathtt{w}$$, is a very unbalanced node. It has the property that

$\frac{\texttt{size(w.child)}}{\mathtt{size(w)}} > \frac{2}{3} \enspace ,\label{scapegoat}$

where $$\texttt{w.child}$$ is the child of $$\mathtt{w}$$ on the path from the root to $$\mathtt{u}$$. We'll very shortly prove that a scapegoat exists. For now, we can take it for granted. Once we've found the scapegoat $$\mathtt{w}$$, we completely destroy the subtree rooted at $$\mathtt{w}$$ and rebuild it into a perfectly balanced binary search tree. We know, from ($$\ref{scapegoat}$$), that, even before the addition of $$\mathtt{u}$$, $$\mathtt{w}$$'s subtree was not a complete binary tree. Therefore, when we rebuild $$\mathtt{w}$$, the height decreases by at least 1 so that the height of the ScapegoatTree is once again at most $$\log_{3/2}\mathtt{q}$$.
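The rebuilding step itself can be sketched as follows: an in-order traversal packs the subtree's nodes into an array, and recursing on the middle element produces a perfectly balanced tree. The method names and int-keyed Node layout below are assumptions for illustration, not the book's exact code:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of rebuilding a subtree into a perfectly balanced BST.
// Names and Node layout are illustrative assumptions.
class RebuildDemo {
    static class Node {
        int x;
        Node left, right;
        Node(int x) { this.x = x; }
    }

    // Collect the subtree's nodes in sorted (in-order) order.
    static void packIntoList(Node u, List<Node> a) {
        if (u == null) return;
        packIntoList(u.left, a);
        a.add(u);
        packIntoList(u.right, a);
    }

    // Pick the middle node as the root so that the two subtrees
    // have sizes differing by at most one.
    static Node buildBalanced(List<Node> a, int i, int len) {
        if (len <= 0) return null;
        int m = len / 2;
        Node u = a.get(i + m);
        u.left = buildBalanced(a, i, m);
        u.right = buildBalanced(a, i + m + 1, len - m - 1);
        return u;
    }

    // Rebuild w's subtree in O(size(w)) time; returns the new root.
    static Node rebuild(Node w) {
        List<Node> a = new ArrayList<>();
        packIntoList(w, a);
        return buildBalanced(a, 0, a.size());
    }

    static int height(Node u) {
        return u == null ? -1 : 1 + Math.max(height(u.left), height(u.right));
    }

    public static void main(String[] args) {
        // A degenerate chain 0 -> 1 -> ... -> 6 of height 6.
        Node root = new Node(0);
        Node u = root;
        for (int i = 1; i < 7; i++) { u.right = new Node(i); u = u.right; }
        root = rebuild(root);
        System.out.println(height(root));  // 2: perfectly balanced, 7 nodes
    }
}
```

Both passes touch each node of the subtree a constant number of times, which is why a call to rebuild costs $$O(\mathtt{size(w)})$$ time.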

    boolean add(T x) {
        // first do basic insertion keeping track of depth
        Node<T> u = newNode(x);
        int d = addWithDepth(u); // insert u, update n and q; returns u's depth, or -1 if x was already present
        if (d > log32(q)) {
            // depth exceeded, find scapegoat
            Node<T> w = u.parent;
            while (3*size(w) <= 2*size(w.parent))
                w = w.parent;
            rebuild(w.parent);
        }
        return d >= 0;
    }
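The code above relies on helpers not shown in this excerpt: $$\mathtt{addWithDepth(u)}$$ performs the ordinary BST insertion of $$\mathtt{u}$$, updates $$\mathtt{n}$$ and $$\mathtt{q}$$, and returns the new leaf's depth; $$\mathtt{size(u)}$$ counts the nodes in $$\mathtt{u}$$'s subtree; and $$\mathtt{log32(q)}$$ evaluates the depth threshold $$\log_{3/2}\mathtt{q}$$. Plausible sketches of the latter two follow; these are assumptions, and the book's exact implementations may differ in rounding and layout:

```java
// Hypothetical sketches of two helpers used by add(x); the book's
// actual implementations may differ.
class ScapegoatHelpers {
    static class Node {
        Node left, right;
    }

    // Subtree size, recomputed on demand: this costs O(size(u)) time,
    // which is exactly what the amortized analysis charges for.
    static int size(Node u) {
        return u == null ? 0 : 1 + size(u.left) + size(u.right);
    }

    // Depth threshold: log_{3/2}(q), truncated to an integer.
    static int log32(int q) {
        return (int) (Math.log(q) / Math.log(1.5));
    }

    public static void main(String[] args) {
        Node r = new Node();
        r.left = new Node();
        r.left.right = new Node();
        System.out.println(size(r));    // 3
        System.out.println(log32(10));  // 5, since log_{3/2}(10) = 5.67...
    }
}
```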


If we ignore the cost of finding the scapegoat $$\mathtt{w}$$ and rebuilding the subtree rooted at $$\mathtt{w}$$, then the running time of $$\mathtt{add(x)}$$ is dominated by the initial search, which takes $$O(\log \mathtt{q}) = O(\log \mathtt{n})$$ time. We will account for the cost of finding the scapegoat and rebuilding using amortized analysis in the next section.

The implementation of $$\mathtt{remove(x)}$$ in a ScapegoatTree is very simple. We search for $$\mathtt{x}$$ and remove it using the usual algorithm for removing a node from a BinarySearchTree. (Note that this can never increase the height of the tree.) Next, we decrement $$\mathtt{n}$$, but leave $$\mathtt{q}$$ unchanged. Finally, we check if $$\mathtt{q} > 2\mathtt{n}$$ and, if so, then we rebuild the entire tree into a perfectly balanced binary search tree and set $$\mathtt{q}=\mathtt{n}$$.

    boolean remove(T x) {
        if (super.remove(x)) {
            if (2*n < q) {
                rebuild(r);
                q = n;
            }
            return true;
        }
        return false;
    }


Again, if we ignore the cost of rebuilding, the running time of the $$\mathtt{remove(x)}$$ operation is proportional to the height of the tree, and is therefore $$O(\log \mathtt{n})$$.

$$\PageIndex{1}$$ Analysis of Correctness and Running-Time

In this section, we analyze the correctness and amortized running time of operations on a ScapegoatTree. We first prove the correctness by showing that, when the $$\mathtt{add(x)}$$ operation results in a node that violates Condition ($$\ref{scapegoat-height}$$), then we can always find a scapegoat:

Lemma $$\PageIndex{1}$$.

Let $$\mathtt{u}$$ be a node of depth $$h>\log_{3/2} \mathtt{q}$$ in a ScapegoatTree. Then there exists a node $$\mathtt{w}$$ on the path from $$\mathtt{u}$$ to the root such that

$\frac{\mathtt{size(w)}}{\mathtt{size(parent(w))}} > 2/3 \enspace . \nonumber$

Proof. Suppose, for the sake of contradiction, that this is not the case, so that

$\frac{\mathtt{size(w)}}{\mathtt{size(parent(w))}} \le 2/3 \nonumber$

for all nodes $$\mathtt{w}$$ on the path from $$\mathtt{u}$$ to the root. Denote the path from the root to $$\mathtt{u}$$ as $$\mathtt{r}=\mathtt{u}_0,\ldots,\mathtt{u}_h=\mathtt{u}$$. Then, we have $$\mathtt{size(u}_0\mathtt{)}=\mathtt{n}$$, $$\mathtt{size(u}_1\mathtt{)}\le\frac{2}{3}\mathtt{n}$$, $$\mathtt{size(u}_2\mathtt{)}\le\frac{4}{9}\mathtt{n}$$ and, more generally,

$\mathtt{size(u}_i\mathtt{)}\le\left(\frac{2}{3}\right)^i\mathtt{n} \enspace . \nonumber$

But this gives a contradiction: since $$\mathtt{size(u)}\ge 1$$, we would have

$1 \le \mathtt{size(u)} \le \left(\frac{2}{3}\right)^{h} \mathtt{n}<\left(\frac{2}{3}\right)^{\log _{3 / 2} \mathtt{q}} \mathtt{n} \leq\left(\frac{2}{3}\right)^{\log _{3 / 2} \mathtt{n}} \mathtt{n}=\left(\frac{1}{\mathtt{n}}\right) \mathtt{n} = 1 \enspace . \nonumber$
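As a quick numeric sanity check of the identity used in the last step, $$\left(\frac{2}{3}\right)^{\log_{3/2}\mathtt{n}} = \frac{1}{\mathtt{n}}$$, multiplying it by $$\mathtt{n}$$ should give exactly 1 (up to floating-point rounding):

```java
// Numeric check of the identity (2/3)^{log_{3/2} n} = 1/n used in
// the final step of the proof of Lemma 1.
class IdentityCheck {
    // Returns (2/3)^{log_{3/2} n} * n, which should equal 1 up to rounding.
    static double residual(int n) {
        double exp = Math.log(n) / Math.log(1.5);  // log_{3/2} n
        return Math.pow(2.0 / 3.0, exp) * n;
    }

    public static void main(String[] args) {
        for (int n : new int[]{2, 10, 1000})
            System.out.println(Math.abs(residual(n) - 1) < 1e-9);  // true
    }
}
```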

Next, we analyze the parts of the running time that are not yet accounted for. There are two parts: The cost of calls to $$\mathtt{size(u)}$$ when searching for scapegoat nodes, and the cost of calls to $$\mathtt{rebuild(w)}$$ when we find a scapegoat $$\mathtt{w}$$. The cost of calls to $$\mathtt{size(u)}$$ can be related to the cost of calls to $$\mathtt{rebuild(w)}$$, as follows:

Lemma $$\PageIndex{2}$$.

During a call to $$\mathtt{add(x)}$$ in a ScapegoatTree, the cost of finding the scapegoat $$\mathtt{w}$$ and rebuilding the subtree rooted at $$\mathtt{w}$$ is $$O(\mathtt{size(w)})$$.

Proof. The cost of rebuilding the scapegoat node $$\mathtt{w}$$, once we find it, is $$O(\mathtt{size(w)})$$. When searching for the scapegoat node, we call $$\mathtt{size(u)}$$ on a sequence of nodes $$\mathtt{u}_0,\ldots,\mathtt{u}_k$$ until we find the scapegoat $$\mathtt{u}_k=\mathtt{w}$$. However, since $$\mathtt{u}_k$$ is the first node in this sequence that is a scapegoat, we know that

$\mathtt{size(u}_{i}\mathtt{)} < \frac{2}{3}\mathtt{size(u}_{i+1}\mathtt{)} \nonumber$

for all $$i\in\{0,\ldots,k-2\}$$. Therefore, the cost of all calls to $$\mathtt{size(u)}$$ is

\begin{align} O\left( \sum_{i=0}^k \mathtt{size(u}_{k-i}\mathtt{)} \right) &=O\left(\mathtt{size(u}_k\mathtt{)}+\sum_{i=0}^{k-1} \mathtt{size(u}_{k-i-1}\mathtt{)}\right) \nonumber\\ &=O\left(\mathtt{size(u}_k\mathtt{)}+ \sum_{i=0}^{k-1}\left(\frac{2}{3}\right)^i\mathtt{size(u}_{k}\mathtt{)}\right) \nonumber\\ &=O\left(\mathtt{size(u}_k\mathtt{)}\left(1+\sum_{i=0}^{k-1} \left(\frac{2}{3}\right)^i\right)\right) \nonumber\\ &=O(\mathtt{size(u}_k\mathtt{)}) = O(\mathtt{size(w)}) \enspace , \nonumber\\ \end{align}\nonumber

where the last line follows from the fact that the sum is a geometrically decreasing series.

All that remains is to prove an upper-bound on the cost of all calls to $$\mathtt{rebuild(u)}$$ during a sequence of $$m$$ operations:

Lemma $$\PageIndex{3}$$.

Starting with an empty ScapegoatTree, any sequence of $$m$$ $$\mathtt{add(x)}$$ and $$\mathtt{remove(x)}$$ operations causes a total of at most $$O(m\log m)$$ time to be used by $$\mathtt{rebuild(u)}$$ operations.

Proof. To prove this, we will use a credit scheme. We imagine that each node stores a number of credits. Each credit can pay for some constant, $$c$$, units of time spent rebuilding. The scheme gives out a total of $$O(m\log m)$$ credits and every call to $$\mathtt{rebuild(u)}$$ is paid for with credits stored at $$\mathtt{u}$$.

During an insertion or deletion, we give one credit to each node on the path to the inserted or deleted node, $$\mathtt{u}$$. In this way we hand out at most $$\log_{3/2}\mathtt{q}\le \log_{3/2}2m$$ credits per operation. During a deletion we also store an additional credit "on the side." Thus, in total we give out at most $$O(m\log m)$$ credits. All that remains is to show that these credits are sufficient to pay for all calls to $$\mathtt{rebuild(u)}$$.

If we call $$\mathtt{rebuild(u)}$$ during an insertion, it is because $$\mathtt{u}$$ is a scapegoat. Suppose, without loss of generality, that

$\frac{\texttt{size(u.left)}}{\mathtt{size(u)}} > \frac{2}{3} \enspace . \nonumber$

Using the fact that

$\mathtt{size(u)} = 1 + \texttt{size(u.left)} + \texttt{size(u.right)} \nonumber$

we deduce that

$\frac{1}{2}\texttt{size(u.left)} > \texttt{size(u.right)} \enspace \nonumber$

and therefore

$\texttt{size(u.left)} - \texttt{size(u.right)}>\frac{1}{2}\texttt{size(u.left)} > \frac{1}{3}\mathtt{size(u)} \enspace . \nonumber$

Now, the last time a subtree containing $$\mathtt{u}$$ was rebuilt (or when $$\mathtt{u}$$ was inserted, if a subtree containing $$\mathtt{u}$$ was never rebuilt), we had

$\texttt{size(u.left)} - \texttt{size(u.right)} \le 1 \enspace . \nonumber$

Therefore, the number of $$\mathtt{add(x)}$$ or $$\mathtt{remove(x)}$$ operations that have affected $$\texttt{u.left}$$ or $$\texttt{u.right}$$ since then is at least

$\frac{1}{3}\mathtt{size(u)} - 1 \enspace , \nonumber$

and there are therefore at least this many credits stored at $$\mathtt{u}$$ that are available to pay for the $$O(\mathtt{size(u)})$$ time it takes to call $$\mathtt{rebuild(u)}$$.

If we call $$\mathtt{rebuild(u)}$$ during a deletion, it is because $$\mathtt{q} > 2\mathtt{n}$$. In this case, we have $$\mathtt{q}-\mathtt{n}> \mathtt{n}$$ credits stored "on the side," and we use these to pay for the $$O(\mathtt{n})$$ time it takes to rebuild the root. This completes the proof.

$$\PageIndex{2}$$ Summary

The following theorem summarizes the performance of the ScapegoatTree data structure:

Theorem $$\PageIndex{1}$$.

A ScapegoatTree implements the SSet interface. Ignoring the cost of $$\mathtt{rebuild(u)}$$ operations, a ScapegoatTree supports the operations $$\mathtt{add(x)}$$, $$\mathtt{remove(x)}$$, and $$\mathtt{find(x)}$$ in $$O(\log \mathtt{n})$$ time per operation.

Furthermore, beginning with an empty ScapegoatTree, any sequence of $$m$$ $$\mathtt{add(x)}$$ and $$\mathtt{remove(x)}$$ operations results in a total of $$O(m\log m)$$ time spent during all calls to $$\mathtt{rebuild(u)}$$.
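The pieces described in this section can be tied together in a compact, add-only sketch over int keys. Everything below is illustrative: the class, field, and method names are assumptions, sizes are recomputed on demand rather than cached, and $$\mathtt{remove(x)}$$ is omitted for brevity; it is not the book's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Compact, simplified, add-only sketch of a scapegoat tree over int
// keys, following the rules described in this section. Illustrative
// only; names and layout are assumptions, not the book's code.
class MiniScapegoatTree {
    static class Node {
        int x;
        Node left, right, parent;
        Node(int x) { this.x = x; }
    }

    Node root;
    int n, q;  // node count and its upper bound; q/2 <= n <= q

    static int size(Node u) {
        return u == null ? 0 : 1 + size(u.left) + size(u.right);
    }

    static int log32(int q) {  // depth threshold log_{3/2}(q)
        return (int) (Math.log(q) / Math.log(1.5));
    }

    boolean add(int x) {
        // Standard BST insertion, tracking the new leaf's depth d.
        Node u = new Node(x), p = null, c = root;
        int d = 0;
        while (c != null) {
            p = c;
            if (x < c.x) c = c.left;
            else if (x > c.x) c = c.right;
            else return false;  // already present
            d++;
        }
        u.parent = p;
        if (p == null) root = u;
        else if (x < p.x) p.left = u;
        else p.right = u;
        n++; q++;
        if (d > log32(q)) {
            // Depth exceeded: walk up until the child/parent size
            // ratio exceeds 2/3, then rebuild the scapegoat's subtree.
            Node w = u.parent;
            while (3 * size(w) <= 2 * size(w.parent))
                w = w.parent;
            rebuildSubtree(w.parent);
        }
        return true;
    }

    void rebuildSubtree(Node w) {
        Node p = w.parent;
        List<Node> a = new ArrayList<>();
        pack(w, a);
        Node b = buildBalanced(a, 0, a.size());
        b.parent = p;
        if (p == null) root = b;
        else if (p.left == w) p.left = b;
        else p.right = b;
    }

    static void pack(Node u, List<Node> a) {  // in-order flatten
        if (u == null) return;
        pack(u.left, a); a.add(u); pack(u.right, a);
    }

    static Node buildBalanced(List<Node> a, int i, int len) {
        if (len <= 0) return null;
        int m = len / 2;
        Node u = a.get(i + m);
        u.left = buildBalanced(a, i, m);
        if (u.left != null) u.left.parent = u;
        u.right = buildBalanced(a, i + m + 1, len - m - 1);
        if (u.right != null) u.right.parent = u;
        return u;
    }

    static int height(Node u) {
        return u == null ? -1 : 1 + Math.max(height(u.left), height(u.right));
    }

    public static void main(String[] args) {
        MiniScapegoatTree t = new MiniScapegoatTree();
        for (int i = 0; i < 100; i++) t.add(i);  // adversarial sorted order
        // Height stays within the logarithmic bound of this section.
        System.out.println(height(t.root) <= Math.log(t.q) / Math.log(1.5));
    }
}
```

Note that this sketch recomputes $$\mathtt{size(u)}$$ during the scapegoat walk, exactly the cost that Lemma 2 bounds by $$O(\mathtt{size(w)})$$.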

This page titled 8.1: ScapegoatTree - A Binary Search Tree with Partial Rebuilding is shared under a CC BY license and was authored, remixed, and/or curated by Pat Morin (Athabasca University Press).