13.2: XFastTrie - Searching in Doubly-Logarithmic Time

Last updated
Save as PDF

Page ID: 8491

\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \) \( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)\(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\) \(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\)\(\newcommand{\AA}{\unicode[.8,0]{x212B}}\)

The performance of the BinaryTrie structure is not very impressive. The number of elements, \(\mathtt{n}\), stored in the structure is at most \(2^{\mathtt{w}}\), so \(\log \mathtt{n}\le \mathtt{w}\). In other words, any of the comparison-based SSet structures described in other parts of this book are at least as efficient as a BinaryTrie, and are not restricted to only storing integers.

Next we describe the XFastTrie, which is just a BinaryTrie with \(\mathtt{w+1}\) hash tables--one for each level of the trie. These hash tables are used to speed up the \(\mathtt{find(x)}\) operation to \(O(\log \mathtt{w})\) time. Recall that the \(\mathtt{find(x)}\) operation in a BinaryTrie is almost complete once we reach a node, \(\mathtt{u}\), where the search path for \(\mathtt{x}\) would like to proceed to \(\texttt{u.right}\) (or \(\texttt{u.left}\)) but \(\mathtt{u}\) has no right (respectively, left) child. At this point, the search uses \(\texttt{u.jump}\) to jump to a leaf, \(\mathtt{v}\), of the BinaryTrie and either return \(\mathtt{v}\) or its successor in the linked list of leaves. An XFastTrie speeds up the search process by using binary search on the levels of the trie to locate the node \(\mathtt{u}\).

To use binary search, we need a way to determine if the node \(\mathtt{u}\) we are looking for is above a particular level, \(\mathtt{i}\), of if \(\mathtt{u}\) is at or below level \(\mathtt{i}\). This information is given by the highest-order \(\mathtt{i}\) bits in the binary representation of \(\mathtt{x}\); these bits determine the search path that \(\mathtt{x}\) takes from the root to level \(\mathtt{i}\). For an example, refer to Figure \(\PageIndex{1}\); in this figure the last node, \(\mathtt{u}\), on search path for 14 (whose binary representation is 1110) is the node labelled \(11{\star\star}\) at level 2 because there is no node labelled \(111{\star}\) at level 3. Thus, we can label each node at level \(\mathtt{i}\) with an \(\mathtt{i}\)-bit integer. Then, the node \(\mathtt{u}\) we are searching for would be at or below level \(\mathtt{i}\) if and only if there is a node at level \(\mathtt{i}\) whose label matches the highest-order \(\mathtt{i}\) bits of \(\mathtt{x}\).

Figure \(\PageIndex{1}\): Since there is no node labelled \(111\star\), the search path for 14 (1110) ends at the node labelled \(11{\star\star}\).

In an XFastTrie, we store, for each \(\mathtt{i}\in\{0,\ldots,\mathtt{w}\}\), all the nodes at level \(\mathtt{i}\) in a USet, \(\mathtt{t[i]}\), that is implemented as a hash table (Chapter 5). Using this USet allows us to check in constant expected time if there is a node at level \(\mathtt{i}\) whose label matches the highest-order \(\mathtt{i}\) bits of \(\mathtt{x}\). In fact, we can even find this node using \(\mathtt{t[i]\texttt{.}find(x\text{>>>}(w-i))}\)

The hash tables \(\mathtt{t[0]},\ldots,\mathtt{t[w]}\) allow us to use binary search to find \(\mathtt{u}\). Initially, we know that \(\mathtt{u}\) is at some level \(\mathtt{i}\) with \(0\le \mathtt{i}< \mathtt{w}+1\). We therefore initialize \(\mathtt{l}=0\) and \(\mathtt{h}=\mathtt{w}+1\) and repeatedly look at the hash table \(\mathtt{t[i]}\), where \(\mathtt{i}=\lfloor(\mathtt{l+h})/2\rfloor\). If \(\mathtt{t[i]}\) contains a node whose label matches \(\mathtt{x}\)'s highest-order \(\mathtt{i}\) bits then we set \(\mathtt{l=i}\) ( \(\mathtt{u}\) is at or below level \(\mathtt{i}\)); otherwise we set \(\mathtt{h=i}\) ( \(\mathtt{u}\) is above level \(\mathtt{i}\)). This process terminates when \(\mathtt{h-l}\le 1\), in which case we determine that \(\mathtt{u}\) is at level \(\mathtt{l}\). We then complete the \(\mathtt{find(x)}\) operation using \(\texttt{u.jump}\) and the doubly-linked list of leaves.

    T find(T x) {
        int l = 0, h = w+1, ix = it.intValue(x);
        Node v, u = r, q = newNode();
        while (h-l > 1) {
            int i = (l+h)/2;
            q.prefix = ix >>> w-i;
            if ((v = t[i].find(q)) == null) {
                h = i;
            } else {
                u = v;
                l = i;
            }
        }
        if (l == w) return u.x;
        Node pred = (((ix >>> w-l-1) & 1) == 1) 
                 ? u.jump : u.jump.child[0];
        return (pred.child[next] == dummy) 
                     ? null : pred.child[next].x;
    }

Each iteration of the \(\mathtt{while}\) loop in the above method decreases \(\mathtt{h-l}\) by roughly a factor of two, so this loop finds \(\mathtt{u}\) after \(O(\log \mathtt{w})\) iterations. Each iteration performs a constant amount of work and one \(\mathtt{find(x)}\) operation in a USet, which takes a constant expected amount of time. The remaining work takes only constant time, so the \(\mathtt{find(x)}\) method in an XFastTrie takes only \(O(\log\mathtt{w})\) expected time.

The \(\mathtt{add(x)}\) and \(\mathtt{remove(x)}\) methods for an XFastTrie are almost identical to the same methods in a BinaryTrie. The only modifications are for managing the hash tables \(\mathtt{t[0]}\),..., \(\mathtt{t[w]}\). During the \(\mathtt{add(x)}\) operation, when a new node is created at level \(\mathtt{i}\), this node is added to \(\mathtt{t[i]}\). During a \(\mathtt{remove(x)}\) operation, when a node is removed form level \(\mathtt{i}\), this node is removed from \(\mathtt{t[i]}\). Since adding and removing from a hash table take constant expected time, this does not increase the running times of \(\mathtt{add(x)}\) and \(\mathtt{remove(x)}\) by more than a constant factor. We omit a code listing for \(\mathtt{add(x)}\) and \(\mathtt{remove(x)}\) since the code is almost identical to the (long) code listing already provided for the same methods in a BinaryTrie.

The following theorem summarizes the performance of an XFastTrie:

Theorem \(\PageIndex{1}\).

An XFastTrie implements the SSet interface for \(\mathtt{w}\)-bit integers. An XFastTrie supports the operations

\(\mathtt{add(x)}\) and \(\mathtt{remove(x)}\) in \(O(\mathtt{w})\) expected time per operation and
\(\mathtt{find(x)}\) in \(O(\log \mathtt{w})\) expected time per operation.

The space used by an XFastTrie that stores \(\mathtt{n}\) values is \(O(\mathtt{n}\cdot\mathtt{w})\).