# 8.1: ScapegoatTree - A Binary Search Tree with Partial Rebuilding


A ScapegoatTree is a BinarySearchTree that, in addition to keeping track of the number, $$\mathtt{n}$$, of nodes in the tree, also keeps a counter, $$\mathtt{q}$$, that maintains an upper bound on the number of nodes.

    int q;


At all times, $$\mathtt{n}$$ and $$\mathtt{q}$$ obey the following inequalities:

$\mathtt{q}/2 \le \mathtt{n} \le \mathtt{q} \enspace . \nonumber$

In addition, a ScapegoatTree has logarithmic height; at all times, the height of the scapegoat tree does not exceed

$\log_{3/2} \mathtt{q} \le \log_{3/2} 2\mathtt{n} < \log_{3/2} \mathtt{n} + 2\enspace .\label{scapegoat-height}$
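Spelling out the two steps of this bound: the first inequality holds because the invariant above gives $$\mathtt{q}\le 2\mathtt{n}$$, and the second holds because

$\log_{3/2} 2\mathtt{n} = \log_{3/2} \mathtt{n} + \log_{3/2} 2 < \log_{3/2} \mathtt{n} + 2 \enspace , \nonumber$

since $$\log_{3/2} 2 \approx 1.71$$.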

Even with this constraint, a ScapegoatTree can look surprisingly unbalanced. The tree in Figure $$\PageIndex{1}$$ has $$\mathtt{q}=\mathtt{n}=10$$ and height $$5<\log_{3/2}10 \approx 5.679$$.

The $$\mathtt{find(x)}$$ operation in a ScapegoatTree is implemented using the standard algorithm for searching in a BinarySearchTree (see Section 6.2). This takes time proportional to the height of the tree which, by ($$\ref{scapegoat-height}$$), is $$O(\log \mathtt{n})$$.
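As a concrete illustration, here is a minimal, self-contained sketch of that search on a stripped-down node type. The class and method names here are illustrative stand-ins, not the book's actual implementation.

```java
// A minimal sketch of the standard BST search used by find(x).
class BstSearchSketch {
    static class Node {
        int x;
        Node left, right;
        Node(int x) { this.x = x; }
    }

    // Helper used only to build an example tree.
    static Node insert(Node u, int x) {
        if (u == null) return new Node(x);
        if (x < u.x) u.left = insert(u.left, x);
        else if (x > u.x) u.right = insert(u.right, x);
        return u;
    }

    // The usual search: walk down from the root, going left or right
    // according to the comparison.  Time is proportional to the height
    // of the tree, which a ScapegoatTree keeps at O(log n).
    static boolean find(Node u, int x) {
        while (u != null) {
            if (x < u.x) u = u.left;
            else if (x > u.x) u = u.right;
            else return true;
        }
        return false;
    }
}
```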

To implement the $$\mathtt{add(x)}$$ operation, we first increment $$\mathtt{n}$$ and $$\mathtt{q}$$ and then use the usual algorithm for adding $$\mathtt{x}$$ to a binary search tree; we search for $$\mathtt{x}$$ and then add a new leaf $$\mathtt{u}$$ with $$\texttt{u.x}=\mathtt{x}$$. At this point, we may get lucky and the depth of $$\mathtt{u}$$ might not exceed $$\log_{3/2}\mathtt{q}$$. If so, then we leave well enough alone and don't do anything else.

Unfortunately, it will sometimes happen that $$\mathtt{depth(u)} > \log_{3/2}\mathtt{q}$$. In this case, we need to reduce the height. This isn't a big job; there is only one node, namely $$\mathtt{u}$$, whose depth exceeds $$\log_{3/2}\mathtt{q}$$. To fix $$\mathtt{u}$$, we walk from $$\mathtt{u}$$ back up to the root looking for a scapegoat, $$\mathtt{w}$$. The scapegoat, $$\mathtt{w}$$, is a very unbalanced node. It has the property that

$\frac{\texttt{size(w.child)}}{\mathtt{size(w)}} > \frac{2}{3} \enspace ,\label{scapegoat}$

where $$\texttt{w.child}$$ is the child of $$\mathtt{w}$$ on the path from the root to $$\mathtt{u}$$. We'll very shortly prove that a scapegoat exists. For now, we can take it for granted. Once we've found the scapegoat $$\mathtt{w}$$, we completely destroy the subtree rooted at $$\mathtt{w}$$ and rebuild it into a perfectly balanced binary search tree. We know, from ($$\ref{scapegoat}$$), that, even before the addition of $$\mathtt{u}$$, $$\mathtt{w}$$'s subtree was not a complete binary tree. Therefore, when we rebuild $$\mathtt{w}$$, the height of $$\mathtt{w}$$'s subtree decreases by at least 1, so that the height of the ScapegoatTree is once again at most $$\log_{3/2}\mathtt{q}$$.
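One straightforward way to perform such a rebuild (a sketch with illustrative names and `int` keys, not the book's code) is to flatten the subtree in-order into a list and then rebuild it by recursively choosing the middle element as the root, which takes $$O(\mathtt{size(w)})$$ time:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: rebuild a subtree into a perfectly balanced BST by flattening
// it in-order, then recursively picking the middle element as the root.
class RebuildSketch {
    static class Node {
        int x;
        Node left, right;
        Node(int x) { this.x = x; }
    }

    // Collect the subtree's values in sorted (in-order) order.
    static void pack(Node u, List<Integer> out) {
        if (u == null) return;
        pack(u.left, out);
        out.add(u.x);
        pack(u.right, out);
    }

    // Build a perfectly balanced BST from the sorted values a[i..j-1].
    static Node buildBalanced(List<Integer> a, int i, int j) {
        if (i >= j) return null;
        int m = (i + j) / 2;          // middle element becomes the root
        Node u = new Node(a.get(m));
        u.left = buildBalanced(a, i, m);
        u.right = buildBalanced(a, m + 1, j);
        return u;
    }

    // Replace the subtree rooted at w with its balanced rebuild.
    static Node rebuild(Node w) {
        List<Integer> a = new ArrayList<>();
        pack(w, a);
        return buildBalanced(a, 0, a.size());
    }

    // Height of a subtree (-1 for an empty subtree), for checking.
    static int height(Node u) {
        if (u == null) return -1;
        return 1 + Math.max(height(u.left), height(u.right));
    }
}
```

In the real data structure the rebuilt subtree must also be reattached to $$\mathtt{w}$$'s old parent and any parent pointers fixed; that bookkeeping is omitted here.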

    boolean add(T x) {
        // first do basic insertion keeping track of depth
        Node<T> u = newNode(x);
        int d = addWithDepth(u);
        if (d > log32(q)) {
            // depth exceeded, find scapegoat
            Node<T> w = u.parent;
            while (3*size(w) <= 2*size(w.parent))
                w = w.parent;
            rebuild(w.parent);
        }
        return d >= 0;
    }


If we ignore the cost of finding the scapegoat $$\mathtt{w}$$ and rebuilding the subtree rooted at $$\mathtt{w}$$, then the running time of $$\mathtt{add(x)}$$ is dominated by the initial search, which takes $$O(\log \mathtt{q}) = O(\log \mathtt{n})$$ time. We will account for the cost of finding the scapegoat and rebuilding using amortized analysis in the next section.
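The code for $$\mathtt{add(x)}$$ relies on two helpers that are not shown here, $$\mathtt{log32(q)}$$ and $$\mathtt{addWithDepth(u)}$$. A plausible sketch of each, simplified to `int` keys (these are assumptions about their behavior, not the book's exact code):

```java
// Hypothetical sketches of the two helpers used by add(x).
// log32(q) computes floor(log_{3/2} q), the height bound; addWithDepth(u)
// attaches u as a new leaf and reports its depth, or -1 if u.x is
// already present (which is why add(x) returns d >= 0).
class ScapegoatHelpersSketch {
    static class Node {
        int x;
        Node left, right, parent;
        Node(int x) { this.x = x; }
    }

    Node r;   // root of the tree

    static int log32(int q) {
        return (int) Math.floor(Math.log(q) / Math.log(1.5));
    }

    int addWithDepth(Node u) {
        if (r == null) { r = u; return 0; }
        Node w = r;
        int d = 1;   // depth at which u would hang off the current node
        while (true) {
            if (u.x < w.x) {
                if (w.left == null) { w.left = u; u.parent = w; return d; }
                w = w.left;
            } else if (u.x > w.x) {
                if (w.right == null) { w.right = u; u.parent = w; return d; }
                w = w.right;
            } else {
                return -1;   // duplicate key: nothing added
            }
            d++;
        }
    }
}
```

Note that $$\mathtt{log32(10)}$$ evaluates to 5, matching the height bound $$5 < \log_{3/2}10 \approx 5.679$$ quoted for the example tree.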

The implementation of $$\mathtt{remove(x)}$$ in a ScapegoatTree is very simple. We search for $$\mathtt{x}$$ and remove it using the usual algorithm for removing a node from a BinarySearchTree. (Note that this can never increase the height of the tree.) Next, we decrement $$\mathtt{n}$$, but leave $$\mathtt{q}$$ unchanged. Finally, we check if $$\mathtt{q} > 2\mathtt{n}$$ and, if so, then we rebuild the entire tree into a perfectly balanced binary search tree and set $$\mathtt{q}=\mathtt{n}$$.

    boolean remove(T x) {
        if (super.remove(x)) {
            if (2*n < q) {
                rebuild(r);
                q = n;
            }
            return true;
        }
        return false;
    }


Again, if we ignore the cost of rebuilding, the running time of the $$\mathtt{remove(x)}$$ operation is proportional to the height of the tree, and is therefore $$O(\log \mathtt{n})$$.

## $$\PageIndex{1}$$ Analysis of Correctness and Running-Time

In this section, we analyze the correctness and amortized running time of operations on a ScapegoatTree. We first prove the correctness by showing that, when the $$\mathtt{add(x)}$$ operation results in a node that violates Condition ($$\ref{scapegoat-height}$$), then we can always find a scapegoat:

Lemma $$\PageIndex{1}$$.

Let $$\mathtt{u}$$ be a node of depth $$h>\log_{3/2} \mathtt{q}$$ in a ScapegoatTree. Then there exists a node $$\mathtt{w}$$ on the path from $$\mathtt{u}$$ to the root such that

$\frac{\mathtt{size(w)}}{\mathtt{size(parent(w))}} > 2/3 \enspace . \nonumber$

Proof. Suppose, for the sake of contradiction, that this is not the case, and

$\frac{\mathtt{size(w)}}{\mathtt{size(parent(w))}} \le 2/3 \nonumber$

for all nodes $$\mathtt{w}$$ on the path from $$\mathtt{u}$$ to the root. Denote the path from the root to $$\mathtt{u}$$ as $$\mathtt{r}=\mathtt{u}_0,\ldots,\mathtt{u}_h=\mathtt{u}$$. Then, we have $$\mathtt{size(u}_0\mathtt{)}=\mathtt{n}$$, $$\mathtt{size(u}_1\mathtt{)}\le\frac{2}{3}\mathtt{n}$$, $$\mathtt{size(u}_2\mathtt{)}\le\frac{4}{9}\mathtt{n}$$ and, more generally,

$\mathtt{size(u}_i\mathtt{)}\le\left(\frac{2}{3}\right)^i\mathtt{n} \enspace . \nonumber$

But this gives a contradiction, since $$\mathtt{size(u)}\ge 1$$, hence

$1 \le \mathtt{size(u)} \le \left(\frac{2}{3}\right)^{h} \mathtt{n}<\left(\frac{2}{3}\right)^{\log _{3 / 2} \mathtt{q}} \mathtt{n} \leq\left(\frac{2}{3}\right)^{\log _{3 / 2} \mathtt{n}} \mathtt{n}=\left(\frac{1}{\mathtt{n}}\right) \mathtt{n} = 1 \enspace . \nonumber$

Here the final equality uses the identity $$(2/3)^{\log_{3/2}\mathtt{n}} = (3/2)^{-\log_{3/2}\mathtt{n}} = 1/\mathtt{n}$$, and the strict inequality, which holds because $$h > \log_{3/2}\mathtt{q}$$, yields the contradiction $$1 < 1$$.

Next, we analyze the parts of the running time that are not yet accounted for. There are two parts: The cost of calls to $$\mathtt{size(u)}$$ when searching for scapegoat nodes, and the cost of calls to $$\mathtt{rebuild(w)}$$ when we find a scapegoat $$\mathtt{w}$$. The cost of calls to $$\mathtt{size(u)}$$ can be related to the cost of calls to $$\mathtt{rebuild(w)}$$, as follows:

Lemma $$\PageIndex{2}$$.

During a call to $$\mathtt{add(x)}$$ in a ScapegoatTree, the cost of finding the scapegoat $$\mathtt{w}$$ and rebuilding the subtree rooted at $$\mathtt{w}$$ is $$O(\mathtt{size(w)})$$.

Proof. The cost of rebuilding the scapegoat node $$\mathtt{w}$$, once we find it, is $$O(\mathtt{size(w)})$$. When searching for the scapegoat node, we call $$\mathtt{size(u)}$$ on a sequence of nodes $$\mathtt{u}_0,\ldots,\mathtt{u}_k$$ until we find the scapegoat $$\mathtt{u}_k=\mathtt{w}$$. However, since $$\mathtt{u}_k$$ is the first node in this sequence that is a scapegoat, we know that

$\mathtt{size(u}_{i}\mathtt{)} \le \frac{2}{3}\mathtt{size(u}_{i+1}\mathtt{)} \nonumber$

for all $$i\in\{0,\ldots,k-2\}$$. Therefore, the cost of all calls to $$\mathtt{size(u)}$$ is

\begin{align} O\left( \sum_{i=0}^k \mathtt{size(u}_{k-i}\mathtt{)} \right) &=O\left(\mathtt{size(u}_k\mathtt{)}+\sum_{i=0}^{k-1} \mathtt{size(u}_{k-i-1}\mathtt{)}\right) \nonumber\\ &=O\left(\mathtt{size(u}_k\mathtt{)}+ \sum_{i=0}^{k-1}\left(\frac{2}{3}\right)^i\mathtt{size(u}_{k}\mathtt{)}\right) \nonumber\\ &=O\left(\mathtt{size(u}_k\mathtt{)}\left(1+\sum_{i=0}^{k-1} \left(\frac{2}{3}\right)^i\right)\right) \nonumber\\ &=O(\mathtt{size(u}_k\mathtt{)}) = O(\mathtt{size(w)}) \enspace , \nonumber\\ \end{align}\nonumber

where the last line follows from the fact that the sum is a geometrically decreasing series.

All that remains is to prove an upper bound on the cost of all calls to $$\mathtt{rebuild(u)}$$ during a sequence of $$m$$ operations:

Lemma $$\PageIndex{3}$$.

Starting with an empty ScapegoatTree, any sequence of $$m$$ $$\mathtt{add(x)}$$ and $$\mathtt{remove(x)}$$ operations causes at most $$O(m\log m)$$ time to be used by $$\mathtt{rebuild(u)}$$ operations.

Proof. To prove this, we will use a credit scheme. We imagine that each node stores a number of credits. Each credit can pay for some constant, $$c$$, units of time spent rebuilding. The scheme gives out a total of $$O(m\log m)$$ credits and every call to $$\mathtt{rebuild(u)}$$ is paid for with credits stored at $$\mathtt{u}$$.

During an insertion or deletion, we give one credit to each node on the path to the inserted or deleted node, $$\mathtt{u}$$. In this way we hand out at most $$\log_{3/2}\mathtt{q}\le \log_{3/2}m$$ credits per operation ($$\mathtt{q}$$ is incremented only by $$\mathtt{add(x)}$$, so $$\mathtt{q}\le m$$). During a deletion we also store an additional credit "on the side." Thus, in total we give out at most $$O(m\log m)$$ credits. All that remains is to show that these credits are sufficient to pay for all calls to $$\mathtt{rebuild(u)}$$.

If we call $$\mathtt{rebuild(u)}$$ during an insertion, it is because $$\mathtt{u}$$ is a scapegoat. Suppose, without loss of generality, that

$\frac{\texttt{size(u.left)}}{\mathtt{size(u)}} > \frac{2}{3} \enspace . \nonumber$

Using the fact that

$\mathtt{size(u)} = 1 + \texttt{size(u.left)} + \texttt{size(u.right)} \nonumber$

we deduce that

$\frac{1}{2}\texttt{size(u.left)} > \texttt{size(u.right)} \enspace \nonumber$

and therefore

$\texttt{size(u.left)} - \texttt{size(u.right)}>\frac{1}{2}\texttt{size(u.left)} > \frac{1}{3}\mathtt{size(u)} \enspace . \nonumber$

Now, the last time a subtree containing $$\mathtt{u}$$ was rebuilt (or when $$\mathtt{u}$$ was inserted, if a subtree containing $$\mathtt{u}$$ was never rebuilt), we had

$\texttt{size(u.left)} - \texttt{size(u.right)} \le 1 \enspace . \nonumber$

Therefore, the number of $$\mathtt{add(x)}$$ or $$\mathtt{remove(x)}$$ operations that have affected $$\texttt{u.left}$$ or $$\texttt{u.right}$$ since then is at least

$\frac{1}{3}\mathtt{size(u)} - 1 \enspace , \nonumber$

and there are therefore at least this many credits stored at $$\mathtt{u}$$ that are available to pay for the $$O(\mathtt{size(u)})$$ time it takes to call $$\mathtt{rebuild(u)}$$.

If we call $$\mathtt{rebuild(u)}$$ during a deletion, it is because $$\mathtt{q} > 2\mathtt{n}$$. In this case, we have $$\mathtt{q}-\mathtt{n}> \mathtt{n}$$ credits stored "on the side," and we use these to pay for the $$O(\mathtt{n})$$ time it takes to rebuild the root. This completes the proof.

## $$\PageIndex{2}$$ Summary

The following theorem summarizes the performance of the ScapegoatTree data structure:

Theorem $$\PageIndex{1}$$.

A ScapegoatTree implements the SSet interface. Ignoring the cost of $$\mathtt{rebuild(u)}$$ operations, a ScapegoatTree supports the operations $$\mathtt{add(x)}$$, $$\mathtt{remove(x)}$$, and $$\mathtt{find(x)}$$ in $$O(\log \mathtt{n})$$ time per operation.

Furthermore, beginning with an empty ScapegoatTree, any sequence of $$m$$ $$\mathtt{add(x)}$$ and $$\mathtt{remove(x)}$$ operations results in a total of $$O(m\log m)$$ time spent during all calls to $$\mathtt{rebuild(u)}$$.

8.1: ScapegoatTree - A Binary Search Tree with Partial Rebuilding is shared under a CC BY license and was authored, remixed, and/or curated by Pat Morin.