5.8: B Trees

Last updated
Save as PDF

Page ID: 47907

\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)

\( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)

\( \newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\)

( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\)

\( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)

\( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\)

\( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)

\( \newcommand{\Span}{\mathrm{span}}\)

\( \newcommand{\id}{\mathrm{id}}\)

\( \newcommand{\Span}{\mathrm{span}}\)

\( \newcommand{\kernel}{\mathrm{null}\,}\)

\( \newcommand{\range}{\mathrm{range}\,}\)

\( \newcommand{\RealPart}{\mathrm{Re}}\)

\( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)

\( \newcommand{\Argument}{\mathrm{Arg}}\)

\( \newcommand{\norm}[1]{\| #1 \|}\)

\( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)

\( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\AA}{\unicode[.8,0]{x212B}}\)

\( \newcommand{\vectorA}[1]{\vec{#1}} % arrow\)

\( \newcommand{\vectorAt}[1]{\vec{\text{#1}}} % arrow\)

\( \newcommand{\vectorB}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)

\( \newcommand{\vectorC}[1]{\textbf{#1}} \)

\( \newcommand{\vectorD}[1]{\overrightarrow{#1}} \)

\( \newcommand{\vectorDt}[1]{\overrightarrow{\text{#1}}} \)

\( \newcommand{\vectE}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash{\mathbf {#1}}}} \)

\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)

\( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)

\(\newcommand{\avec}{\mathbf a}\) \(\newcommand{\bvec}{\mathbf b}\) \(\newcommand{\cvec}{\mathbf c}\) \(\newcommand{\dvec}{\mathbf d}\) \(\newcommand{\dtil}{\widetilde{\mathbf d}}\) \(\newcommand{\evec}{\mathbf e}\) \(\newcommand{\fvec}{\mathbf f}\) \(\newcommand{\nvec}{\mathbf n}\) \(\newcommand{\pvec}{\mathbf p}\) \(\newcommand{\qvec}{\mathbf q}\) \(\newcommand{\svec}{\mathbf s}\) \(\newcommand{\tvec}{\mathbf t}\) \(\newcommand{\uvec}{\mathbf u}\) \(\newcommand{\vvec}{\mathbf v}\) \(\newcommand{\wvec}{\mathbf w}\) \(\newcommand{\xvec}{\mathbf x}\) \(\newcommand{\yvec}{\mathbf y}\) \(\newcommand{\zvec}{\mathbf z}\) \(\newcommand{\rvec}{\mathbf r}\) \(\newcommand{\mvec}{\mathbf m}\) \(\newcommand{\zerovec}{\mathbf 0}\) \(\newcommand{\onevec}{\mathbf 1}\) \(\newcommand{\real}{\mathbb R}\) \(\newcommand{\twovec}[2]{\left[\begin{array}{r}#1 \\ #2 \end{array}\right]}\) \(\newcommand{\ctwovec}[2]{\left[\begin{array}{c}#1 \\ #2 \end{array}\right]}\) \(\newcommand{\threevec}[3]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \end{array}\right]}\) \(\newcommand{\cthreevec}[3]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \end{array}\right]}\) \(\newcommand{\fourvec}[4]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}\) \(\newcommand{\cfourvec}[4]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}\) \(\newcommand{\fivevec}[5]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}\) \(\newcommand{\cfivevec}[5]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}\) \(\newcommand{\mattwo}[4]{\left[\begin{array}{rr}#1 \amp #2 \\ #3 \amp #4 \\ \end{array}\right]}\) \(\newcommand{\laspan}[1]{\text{Span}\{#1\}}\) \(\newcommand{\bcal}{\cal B}\) \(\newcommand{\ccal}{\cal C}\) \(\newcommand{\scal}{\cal S}\) \(\newcommand{\wcal}{\cal W}\) \(\newcommand{\ecal}{\cal E}\) \(\newcommand{\coords}[2]{\left\{#1\right\}_{#2}}\) \(\newcommand{\gray}[1]{\color{gray}{#1}}\) \(\newcommand{\lgray}[1]{\color{lightgray}{#1}}\) \(\newcommand{\rank}{\operatorname{rank}}\) \(\newcommand{\row}{\text{Row}}\) \(\newcommand{\col}{\text{Col}}\) \(\renewcommand{\row}{\text{Row}}\) \(\newcommand{\nul}{\text{Nul}}\) \(\newcommand{\var}{\text{Var}}\) \(\newcommand{\corr}{\text{corr}}\) \(\newcommand{\len}[1]{\left|#1\right|}\) \(\newcommand{\bbar}{\overline{\bvec}}\) \(\newcommand{\bhat}{\widehat{\bvec}}\) \(\newcommand{\bperp}{\bvec^\perp}\) \(\newcommand{\xhat}{\widehat{\xvec}}\) \(\newcommand{\vhat}{\widehat{\vvec}}\) \(\newcommand{\uhat}{\widehat{\uvec}}\) \(\newcommand{\what}{\widehat{\wvec}}\) \(\newcommand{\Sighat}{\widehat{\Sigma}}\) \(\newcommand{\lt}{<}\) \(\newcommand{\gt}{>}\) \(\newcommand{\amp}{&}\) \(\definecolor{fillinmathshade}{gray}{0.9}\)

Summary

Whereas binary trees have nodes that have two children, with the left child and all of its descendants less than the "value" of the node, and the right child and all of its children more than the "value" of the node, a B-tree is a generalization of this.
The generalization is that instead of one value, the node has a list of values, and the list is of size n ( n > 2 ). n is chosen to optimize storage , so that a node corresponds in size to a block for instance. This is in the days before ssd drives, but searching binary nodes stored on ssd ram would still be slower than searching ssd ram for a block of values, loading into normal ram and cpu cache, and searching the loaded list.
At the start of the list, the left child of the first element of the list has a value less than the first element , and so do all its children. To the right of the first element, is a child which has values more than the first element's value, as do all of its children, but also less than the value of the second element. Induction can be used , and this holds so for the child between element 1 and 2, 2 and 3, ... so on until n-1 and nth node.
To insert into a non-full B tree node, is to do a insertion into a sorted list.
In a B+ tree, insertions can only be done in leaf nodes, and non-leaf nodes hold copies of a demarcating value between adjacent child nodes e.g. the left most value of an element's right child's list of nodes.
Whenever a list becomes full e.g. there are n nodes, the node is "split", and this means making two new nodes, and passing the demarcating value upto the parent.

B Trees were described originally as generalizations of binary search trees , where a binary tree is a 2-node B-Tree, the 2 standing for two children, with 2-1 = 1 key separating the 2 children. Hence a 3-node has 2 values separating 3 children, and a N node has N children separated by N-1 keys.

A classical B-Tree can have N-node internal nodes, and empty 2-nodes as leaf nodes, or more conveniently, the children can either be a value or a pointer to the next N-node, so it is a union.

The main idea with B-trees is that one starts with a root N-node , which is able to hold N-1 entries, but on the Nth entry, the number of keys for the node is exhausted, and the node can be split into two half sized N/2 sized N nodes, separated by a single key K, which is equal to the right node's leftmost key, so any entry with key K2 equal or greater than K goes in the right node, and anything less than K goes in the left. When the root node is split, a new root node is created with one key, and a left child and a right child. Since there are N children but only N-1 entries, the leftmost child is stored as a separate pointer. If the leftmost pointer splits, then the left half becomes the new leftmost pointer, and the right half and separating key is inserted into the front of the entries.

An alternative is the B+ tree which is the most commonly used in database systems, because only values are stored in leaf nodes, whereas internal nodes only store keys and pointers to other nodes, putting a limit on the size of the datum value as the size of a pointer. This often allows internal nodes with more entries able to fit a certain block size, e.g. 4K is a common physical disc block size. Hence , if a B+ tree internal node is aligned to a physical disc block, then the main rate limiting factor of reading a block of a large index from disc because it isn't cached in a memory list of blocks is reduced to one block read.

A B+ tree has bigger internal nodes, so is wider and shorter in theory than an equivalent B tree which must fit all nodes within a given physical block size, hence overall it is a faster index due to greater fan out and less height to reach keys on average.

Apparently, this fan out is so important, compression can also be applied to the blocks to increase the number of entries fitting within a given underlying layer's block size (the underlying layer is often a filesystem block).

Most database systems use the B+ tree algorithm, including postgresql, mysql, derbydb, firebird, many Xbase index types, etc.

Many filesystems also use a B+ tree to manage their block layout (e.g. xfs, NTFS, etc).

Transwiki has a java implementation of a B+ Tree which uses traditional arrays as key list and value list.

Below is an example of a B Tree with test driver, and a B+ tree with a test driver. The memory / disc management is not included, but a usable hacked example can be found at Hashing Memory Checking Example.

This B+ tree implementation was written out of the B Tree, and the difference from the transwiki B+ tree is that it tries to use the semantics of SortedMap and SortedSet already present in the standard Java collections library.

Hence , the flat leaf block list of this B+ implementation can't contain blocks that don't contain any data, because the ordering depends on the first key of the entries, so a leaf block needs to be created with its first entry.

A B tree java example

package btreemap;

import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

/** can't work without setting a comparator */
public class BTreeMap<K, V> implements SortedMap<K, V> {

	private static final int NODE_SIZE = 100;
	
	

	@Override
	public Comparator<? super K> comparator() {
		// TODO Auto-generated method stub

		return comparator;
	}

	Comparator< ? super K> defaultComparator = new
	Comparator< K>() {

		@Override
		public int compare(K o1, K o2) {
			// TODO Auto-generated method stub
			Comparable c1 = (Comparable)o1;
			Comparable c2 = (Comparable)o2;
			return c1.compareTo(c2);
		}
	};
	Comparator<? super K> comparator = defaultComparator;
	BTBlock<K, V> root = new BTBlock<K, V>(NODE_SIZE, comparator);
	/**
	 * 
	 * @param comparator
	 *      - this is mandatory for the tree to work
	 */
	public void setComparator(Comparator<? super K> comparator) {
		this.comparator = comparator;
		root = new BTBlock<K, V>(NODE_SIZE, comparator);
	}

	/**
	 * 
	 * 
	 * 
	 * @param <K>
	 * @param <V>
	 *      the entry is associated with a right child block.
	 * 
	 */
	static class BlockEntry<K, V> {
		K k;
		V v;
		BTBlock left;

		BlockEntry() {

		}

		BlockEntry(K k, V v) {
			left = null;
			this.k = k;
			this.v = v;
		}
	}

	/**
	 * 
	 * - this represents the result of splitting a full block into
	 *     a left block, and a right block, and a median key, the right
	 *     block and the median key held in a BlockEntry structure as above.
	 * @param <K>
	 * @param <V>
	 * @param <V>g
	 */
	static class SplitRootEntry<K, V> {
		BTBlock<K, V> right;
		BlockEntry<K, V> entry;

		SplitRootEntry(BlockEntry<K, V> entry, BTBlock<K, V> right) {
			this.right = right;
			this.entry = entry;
		}

		SplitRootEntry() {
			super();
		}
	}

	/**
	 * this is used to return a result of a possible split , during recursive
	 * calling.
	 * 
	 * 
	 * 
	 * @param <K>
	 * @param <V>
	 */
	static class resultAndSplit<K, V> {

		/**
		 * null , if there is no split.
		 */
		SplitRootEntry<K, V> splitNode;
		V v;

		resultAndSplit(V v) {
			this.v = v;
		}

		resultAndSplit(SplitRootEntry<K, V> entry, V v) {
			this.v = v;
			this.splitNode = entry;
		}
	}

	/**
	 * used to represent the insertion point after searching a block if compare
	 * is zero, then a match was found, and pos is the insertion entry if
	 * compare < 0 and pos == 0 , then the search ended up less than the
	 * leftmost entry else compare > 0 , and the search will be to the immediate
	 * right of pos.
	 * 
	 * 
	 * 
	 */
	static class PosAndCompare {
		int pos = 0;
		int compare = 0;
	}

	static class BTBlock<K, V> {
		List<BlockEntry<K, V>> entries;
		BTBlock<K, V> rightBlock = null;
		private int maxSz = 0;

		Comparator<? super K> comparator;

		Comparator<? super K> comparator() {
			return comparator;
		}

		public BTBlock(int size, Comparator<? super K> c) {
			entries = new ArrayList<BlockEntry<K, V>>();
			maxSz = size;
			this.comparator = c;
		}

		/**
		 * PosAndCompare usage: if compare is zero, then a match was found, and
		 * pos is the insertion entry if compare < 0 and pos == 0 , then the
		 * search ended up less than the leftmost entry else compare > 0 , and
		 * the search will be to the immediate right of pos.
		 * 
		 * 
		 * 
		 */
		// private void blockSearch(K k, PosAndCompare pc) {
		// for (int i = 0; i < entries.size(); ++i) {
		// pc.compare = comparator.compare(k, entries.get(i).k);
		// if (pc.compare == 0) {
		// pc.pos = i;
		// return;
		// }
		// if (pc.compare < 0 && i == 0) {
		// pc.pos = 0;
		// return;
		// }
		//
		// if (pc.compare < 0) {
		// pc.pos = i - 1;
		// pc.compare = 1;
		// return;
		// }
		//
		// }
		// pc.pos = entries.size() - 1;
		// pc.compare = 1;
		//
		// // binary search, it's hard to get it right !
		// // int left = 0;
		// // int right = entries.size();
		// //
		// // while (left <= right && left < entries.size()) {
		// // // pc.pos = (right - left) / 2 + left;
		// // pc.pos = (left + right) / 2;
		// // pc.compare = comparator().compare(k, entries.get(pc.pos).k);
		// // if (pc.compare < 0) {
		// // right = pc.pos - 1;
		// // } else if (pc.compare > 0) {
		// // left = pc.pos + 1;
		// // } else {
		// // return;
		// // }
		// // }
		// //
		// // BlockEntry<K, V> e = new BlockEntry<K, V>(k, null);
		// // pc.pos = Collections.binarySearch(entries, e, cmp);
		//
		// }

		Comparator<BlockEntry<K, V>> cmp = new Comparator<BlockEntry<K, V>>() {

			@Override
			public int compare(BlockEntry<K, V> o1, BlockEntry<K, V> o2) {
				// TODO Auto-generated method stub
				return comparator.compare(o1.k, o2.k);

			}
		};

		resultAndSplit<K, V> put(K k, V v) {
			V v2;
			if (entries.size() == 0) {
				entries.add(new BlockEntry<K, V>(k, v));
				return new resultAndSplit<K, V>(v);
			} else {

				// PosAndCompare pc = new PosAndCompare();
				BlockEntry<K, V> e = new BlockEntry<K, V>(k, v);
				int res = Collections.binarySearch(entries, e, cmp);
				int index = -res - 1;
				// blockSearch(k, pc);

				// a match
				if (res >= 0) {
					v2 = entries.get(res).v;
					entries.get(res).v = v;
					return new resultAndSplit<K, V>(v2);
				}

				// follow leftBlock if search is to left of first entry

				if (index == entries.size() && rightBlock != null) {

					resultAndSplit<K, V> result = rightBlock.put(k, v);
					if (result.splitNode != null) {
						rightBlock = result.splitNode.right;
						entries.add(result.splitNode.entry);
					}
				} else if (index == entries.size() && rightBlock == null
						&& entries.size() == maxSz) {
					rightBlock = new BTBlock<K, V>(this.maxSz, comparator);
					resultAndSplit<K, V> result = rightBlock.put(k, v);

				} else if (index < entries.size()
						&& entries.get(index).left != null) {
					// follow right block if it exists
					resultAndSplit<K, V> result = entries.get(index).left.put(
							k, v);
					if (result.splitNode != null) {
						entries.get(index).left = result.splitNode.right;
						// add to the left
						entries.add(index, result.splitNode.entry);

					}

				} else {
					// add to the left
					entries.add(index, e);

				}

				// check if overflowed block , split if it has.
				if (entries.size() > maxSz) {
					int mid = entries.size() / 2;

					// copy right half to new entries list.
					List<BlockEntry<K, V>> leftEntries = new ArrayList<BlockEntry<K, V>>();
					for (int i = 0; i < mid; ++i) {
						leftEntries.add(entries.get(i));
					}

					BlockEntry<K, V> centreEntry = entries.get(mid);

					BTBlock<K, V> leftBlock = new BTBlock<K, V>(maxSz,
							comparator);

					leftBlock.entries = leftEntries;

					// the median entry's left block is the new left block's
					// leftmost block
					leftBlock.rightBlock = centreEntry.left;
					// the new right block becomes the right block
					centreEntry.left = leftBlock;

					// reduce the old block's entries into its left half of
					// entries.
					ArrayList<BlockEntry<K, V>> newEntries = new ArrayList<BlockEntry<K, V>>();
					for (int i = mid + 1; i < entries.size(); ++i)
						newEntries.add(entries.get(i));
					this.entries = newEntries;
					// create a return object, with the reduced old block as the
					// left block
					// and the median entry with the new right block attached

					SplitRootEntry<K, V> split = new SplitRootEntry<K, V>(
							centreEntry, this);

					// the keyed value didn't exist before , so null
					return new resultAndSplit<K, V>(split, null);

				}
				return new resultAndSplit<K, V>(v);
			}

		}

		V get(K k) {

			if (entries.size() == 0)
				return null;

			BlockEntry<K, V> e = new BlockEntry<K, V>(k, null);
			int res = Collections.binarySearch(entries, e, cmp);
			int index = -res - 1;

			if (res >= 0) {
				return entries.get(res).v;
			}

			if (index == entries.size() && rightBlock != null) {
				return rightBlock.get(k);
			} else if (index < entries.size()
					&& entries.get(index).left != null) {
				return (V) entries.get(index).left.get(k);
			} else
				return null;
		}

		void getRange(SortedMap map, K k1, K k2) {
			BlockEntry<K, V> e = new BlockEntry<K, V>(k1, null);
			int res = Collections.binarySearch(entries, e, cmp);
			int index = -res - 1;
			BlockEntry<K, V> e2 = new BlockEntry<K, V>(k2, null);
			int res2 = Collections.binarySearch(entries, e2, cmp);
			int index2 = -res2 - 1;

			int from = res >= 0 ? res : index;
			int to = res2 >= 0 ? res2 : index2;

			for (int i = from; i <= to; ++i) {

				if (i < entries.size() && (i > from || res < 0)
						&& entries.get(i).left != null) {
					entries.get(i).left.getRange(map, k1, k2);
					// if (pc2.pos == pc.pos)
					// break;
				}
				if (i < to || res2 >= 0)
					map.put(entries.get(i).k, entries.get(i).v);

				if (i == entries.size() && rightBlock != null) {
					rightBlock.getRange(map, k1, k2);
				}

			}

		}

		K headKey() {
			if (rightBlock != null) {
				return rightBlock.headKey();
			}
			return entries.get(0).k;
		}

		K tailKey() {
			int i = entries.size() - 1;
			if (entries.get(i).left != null) {
				return (K) entries.get(i).left.tailKey();
			}
			return entries.get(i).k;
		}

		void show(int n) {
			showTabs(n);
			for (int i = 0; i < entries.size(); ++i) {
				BlockEntry<K, V> e = entries.get(i);
				System.err.print("#" + i + ":(" + e.k + ":" + e.v + ") ");
			}
			System.err.println();
			showTabs(n);

			if (rightBlock != null) {
				System.err.print("Left Block\n");
				rightBlock.show(n + 1);
			} else {
				System.err.println("No Left Block");
			}

			for (int i = 0; i < entries.size(); ++i) {
				BlockEntry<K, V> e = entries.get(i);
				showTabs(n);
				if (e.left != null) {

					System.err.println("block right of #" + i);
					e.left.show(n + 1);
				} else {
					System.err.println("No block right of #" + i);
				}
			}
			showTabs(n);
			System.err.println("End of Block Info\n\n");
		}

		private void showTabs(int n) {
			// TODO Auto-generated method stub
			for (int i = 0; i < n; ++i) {
				System.err.print(" ");
			}
		}

	}

	@Override
	public SortedMap<K, V> subMap(K fromKey, K toKey) {
		TreeMap<K, V> map = new TreeMap<K, V>();
		root.getRange(map, fromKey, toKey);
		return map;
	}

	@Override
	public SortedMap<K, V> headMap(K toKey) {
		// TODO Auto-generated method stub
		return subMap(root.headKey(), toKey);
	};

	@Override
	public SortedMap<K, V> tailMap(K fromKey) {
		// TODO Auto-generated method stub
		return subMap(fromKey, root.tailKey());
	}

	@Override
	public K firstKey() {
		// TODO Auto-generated method stub
		return root.headKey();
	}

	@Override
	public K lastKey() {
		// TODO Auto-generated method stub
		return root.tailKey();
	}

	@Override
	public int size() {
		// TODO Auto-generated method stub
		return 0;
	}

	@Override
	public boolean isEmpty() {
		// TODO Auto-generated method stub
		return false;
	}

	@Override
	public boolean containsKey(Object key) {
		// TODO Auto-generated method stub
		return get(key) != null;
	}

	@Override
	public boolean containsValue(Object value) {
		// TODO Auto-generated method stub
		return false;
	}

	@Override
	public V get(Object key) {
		// TODO Auto-generated method stub
		return root.get((K) key);
	}

	@Override
	public V put(K key, V value) {
		resultAndSplit<K, V> b = root.put(key, value);
		if (b.splitNode != null) {
			root = new BTBlock<K, V>(root.maxSz, root.comparator);
			root.rightBlock = b.splitNode.right;
			root.entries.add(b.splitNode.entry);
		}
		return b.v;
	}

	@Override
	public V remove(Object key) {
		// TODO Auto-generated method stub
		return null;
	}

	@Override
	public void putAll(Map<? extends K, ? extends V> m) {
		// TODO Auto-generated method stub

	}

	@Override
	public void clear() {
		// TODO Auto-generated method stub

	}

	@Override
	public Set<K> keySet() {
		// TODO Auto-generated method stub
		return null;
	}

	@Override
	public Collection<V> values() {
		// TODO Auto-generated method stub
		return null;
	}

	@Override
	public Set<java.util.Map.Entry<K, V>> entrySet() {
		// TODO Auto-generated method stub
		return null;
	}

}

package btreemap;

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

public class TestBtree {

	private static final int N = 50000;

	public static void main(String[] args) {
		BTreeMap<Integer, Integer> map = new BTreeMap<Integer , Integer>();
		Random r = new Random();
		
		ArrayList<Integer> t = new ArrayList<Integer>();
		
		Comparator<Integer> comparator = new Comparator<Integer>() {

			@Override
			public int compare(Integer o1, Integer o2) {
				// TODO Auto-generated method stub
				return o1.intValue() - o2.intValue();
			}
			
		};
		map.setComparator(comparator);
		List<Integer> testVals = new ArrayList<Integer>();
		for (int i =0 ; i < N ; ++i) {
			testVals.add(i);
		}
		for (int i = 0; i < N; ++i ) {
			int x = r.nextInt(testVals.size());
			 x = testVals.remove(x);
			//int x=i;
			t.add(x);
			map.put(x, x);
		}
		
		System.err.println("output " + N + " vals");
		
		map.root.show(0);
		
		for (int i = 0; i < N; ++i) {
			int x = t.get(i);
			
			if ( x != map.get(x)) 
				System.err.println("Expecting " + x + " got " + map.get(x));
			
			
		}
		System.err.println("Checked " + N + " entries");
	}
}

A B+ tree java example

Experiments include timing the run (e.g. time java -cp . btreemap.BPlusTreeTest1), using an external blocksize + 1 sized leaf block size, so that this is basically the underlying entries TreeMap only, vs, say, 400 internal node size, and 200 external node size. Other experiments include using a SkipListMap instead of a TreeMap.

package btreemap;

import java.util.Collection;
import java.util.Comparator;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;
/**
 * a B+ tree, where leaf blocks contain key-value pairs and 
 * internal blocks have keyed entries pointing to other internal blocks
 * or leaf blocks whose keys are greater than or equal to the associated key.
 * 
 * @author syan
 *
 * @param <K> key type implements Comparable
 * @param <V> value type
 */
public class BPlusTreeMap<K, V> implements SortedMap<K, V> {

	private int maxInternalBlockSize;
	BPlusAnyBlock<K, V> root;
	private int maxExternalSize;

	BPlusTreeMap(int maxInternalSize, int maxExternalSize) {
		this.maxInternalBlockSize = maxInternalSize;
		this.maxExternalSize = maxExternalSize;

	}

	static class SplitOrValue<K, V> {
		V v;
		K k;
		BPlusAnyBlock<K, V> left, right;
		boolean split;

		public boolean isSplit() {
			return split;
		}

		public void setSplit(boolean split) {
			this.split = split;
		}

		public SplitOrValue(V value) {
			v = value;
			setSplit(false);
		}

		public SplitOrValue(BPlusAnyBlock<K, V> left2,
				BPlusAnyBlock<K, V> bPlusBlock) {
			left = left2;
			right = bPlusBlock;
			k = right.firstKey();
			setSplit(true);
			// System.err.printf("\n\n** split occured %s**\n\n", bPlusBlock
			// .getClass().getSimpleName());
		}

	}

	static abstract class BPlusAnyBlock<K, V> {
		public abstract SplitOrValue<K, V> put(K k, V v);

		abstract SplitOrValue<K, V> splitBlock();

		public abstract V get(K k);

		public abstract boolean isEmpty();

		public abstract K firstKey();

	}

	SortedSet<BPlusLeafBlock<K, V>> blockList = getLeafBlockSet();

	SortedSet<BPlusLeafBlock<K, V>> getLeafBlockSet() {
		// return new ConcurrentSkipListSet<BPlusLeafBlock<K, V>>();
		return new TreeSet<BPlusLeafBlock<K, V>>();
	}

	static class BPlusLeafBlock<K, V> extends BPlusAnyBlock<K, V> implements
			Comparable<BPlusLeafBlock<K, V>> {
		SortedMap<K, V> entries = getEntryCollection();

		static <K, V> SortedMap<K, V> getEntryCollection() {
			return new TreeMap<K, V>();
		}

		int maxSize;
		private BPlusTreeMap<K, V> owner;

		public boolean isEmpty() {
			return entries.isEmpty();
		}

		public BPlusLeafBlock(BPlusTreeMap<K, V> bPlusTreeMap,
				SortedMap<K, V> rightEntries) {
			this.owner = bPlusTreeMap;
			maxSize = owner.maxExternalSize;
			entries = rightEntries;
		}

		public SplitOrValue<K, V> put(K k, V v) {
			V v2 = entries.put(k, v);

			if (entries.size() >= maxSize)
				return splitBlock();
			else
				return new SplitOrValue<K, V>(v2);

		}

		public SplitOrValue<K, V> splitBlock() {

			SortedMap<K, V> leftEntries = getEntryCollection();
			SortedMap<K, V> rightEntries = getEntryCollection();

			int i = 0;

			for (Entry<K, V> e : entries.entrySet()) {
				// System.out.println(this.getClass().getSimpleName() +
				// " split entry " + e.getKey());
				if (++i <= maxSize / 2)
					leftEntries.put(e.getKey(), e.getValue());
				else
					rightEntries.put(e.getKey(), e.getValue());
			}
			BPlusLeafBlock<K, V> right = createBlock(rightEntries);
			// System.out.println("finished block split");
			// System.out.println("\nleft block");
			// for (K ik : leftEntries.keySet()) {
			// System.out.print(ik + " ");
			// }
			// System.out.println("\nright block");
			// for (K ik : right.entries.keySet()) {
			// System.out.print(ik + " ");
			// }
			// System.out.println("\n");
			this.entries = leftEntries;

			return new SplitOrValue<K, V>(this, right);
		}

		private BPlusLeafBlock<K, V> createBlock(SortedMap<K, V> rightEntries) {
			return owner.createLeafBlock(rightEntries);
		}

		@Override
		public V get(K k) {
			return entries.get(k);
		}

		@Override
		public int compareTo(BPlusLeafBlock<K, V> o) {

			return ((Comparable<K>) entries.firstKey()).compareTo(o.entries
					.firstKey());
		}

		@Override
		public K firstKey() {
			return entries.firstKey();
		}

	}

	static class BPlusBranchBlock<K, V> extends BPlusAnyBlock<K, V> {
		SortedMap<K, BPlusAnyBlock<K, V>> entries = createInternalBlockEntries();
		int maxSize;

		private BPlusAnyBlock<K, V> left;

		public boolean isEmpty() {
			return entries.isEmpty();
		}

		public BPlusBranchBlock(int maxSize2) {
			this.maxSize = maxSize2;

		}

		public SplitOrValue<K, V> put(K k, V v) {
			BPlusAnyBlock<K, V> b = getBlock(k);

			SplitOrValue<K, V> sv = b.put(k, v);

			if (sv.isSplit()) {

				entries.put(sv.k, sv.right);

				if (entries.size() >= maxSize)
					sv = splitBlock();
				else
					sv = new SplitOrValue<K, V>(null);
			}

			return sv;
		}

		BPlusAnyBlock<K, V> getBlock(K k) {
			assert (entries.size() > 0);
			BPlusAnyBlock<K, V> b = entries.get(k);
			if (b == null) {
				// headMap returns less than k
				SortedMap<K, BPlusAnyBlock<K, V>> head = entries.headMap(k);
				if (head.isEmpty()) {
					b = left;

					// System.out.println("for key " + k
					// + " getting from leftmost block");
					// showEntries();

				} else {
					b = entries.get(head.lastKey());

					// System.out.println("for key " + k
					// + " getting from block with key " + head.lastKey());
					// showEntries();
				}
			}
			assert (b != null);
			return b;
		}

		public void showEntries() {
			System.out.print("entries = ");
			for (K k : entries.keySet()) {
				System.out.print(k + " ");
			}
			System.out.println();
		}

		
		public SplitOrValue<K, V> splitBlock() {

			Set<Entry<K, BPlusAnyBlock<K, V>>> ee = entries.entrySet();
			int i = 0;
			BPlusBranchBlock<K, V> right = new BPlusBranchBlock<K, V>(maxSize);
			SortedMap<K, BPlusAnyBlock<K, V>> leftEntries = createInternalBlockEntries();

			for (Entry<K, BPlusAnyBlock<K, V>> e : ee) {
				// System.out.print("split check " + e.getKey() + ":"
				// );
				if (++i <= maxSize / 2)
					leftEntries.put(e.getKey(), e.getValue());
				else
					right.entries.put(e.getKey(), e.getValue());

			}
			// System.out.println("\n");
			this.entries = leftEntries;

			return new SplitOrValue<K, V>(this, right);
		}

		private SortedMap<K, BPlusAnyBlock<K, V>> createInternalBlockEntries() {
			return new TreeMap<K, BPlusAnyBlock<K, V>>();
		}

		@Override
		public V get(K k) {
			BPlusAnyBlock<K, V> b = getBlock(k);

			return b.get(k);
		}

		@Override
		public K firstKey() {
			return entries.firstKey();
		}

	}

	@Override
	public SortedMap<K, V> subMap(K fromKey, K toKey) {
		TreeMap<K, V> map = new TreeMap<K, V>();
		BPlusLeafBlock<K, V> b1 = getLeafBlock(fromKey);

		BPlusLeafBlock<K, V> b2 = getLeafBlock(toKey);

		SortedSet<BPlusLeafBlock<K, V>> range = blockList.subSet(b1, b2);

		for (BPlusLeafBlock<K, V> b : range) {
			SortedMap<K, V> m = b.entries.subMap(fromKey, toKey);
			map.putAll(m);
		}

		return map;
	}

	private BPlusLeafBlock<K, V> getLeafBlock(K key) {
		BPlusAnyBlock<K, V> b1;
		b1 = root;
		while (b1 instanceof BPlusBranchBlock<?, ?>) {
			b1 = ((BPlusBranchBlock<K, V>) b1).getBlock(key);

		}
		return (BPlusLeafBlock<K, V>) b1;

	}

	public BPlusLeafBlock<K, V> createLeafBlock(SortedMap<K, V> rightEntries) {
		BPlusLeafBlock<K, V> b = new BPlusLeafBlock<K, V>(this, rightEntries);
		blockList.add(b);
		return b;
	}

	@Override
	public SortedMap<K, V> headMap(K toKey) {
		return subMap(firstKey(), toKey);
	};

	@Override
	public SortedMap<K, V> tailMap(K fromKey) {
		return subMap(fromKey, lastKey());
	}

	@Override
	public K firstKey() {
		return blockList.first().entries.firstKey();
	}

	@Override
	public K lastKey() {
		return blockList.last().entries.lastKey();
	}

	@Override
	public int size() {
		return (int) getLongSize();
	}

	private long getLongSize() {
		long i = 0;
		for (BPlusLeafBlock<K, V> b : blockList) {
			i += b.entries.size();
		}
		return i;
	}

	@Override
	public boolean isEmpty() {
		return root.isEmpty();
	}

	@Override
	public boolean containsKey(Object key) {
		return get(key) != null;
	}

	@Override
	public boolean containsValue(Object value) {
		return false;
	}

	@Override
	public V get(Object key) {
		return (V) root.get((K) key);
	}

	@Override
	public V put(K key, V value) {
		if (root == null) {
			SortedMap<K, V> entries = BPlusLeafBlock.getEntryCollection();
			entries.put(key, value);
			root = createLeafBlock(entries);
			return null;
		}
		SplitOrValue<K, V> result = root.put(key, value);
		if (result.isSplit()) {
			BPlusBranchBlock<K, V> root = new BPlusBranchBlock<K, V>(
					maxInternalBlockSize);
			root.left = result.left;
			root.entries.put(result.k, result.right);
			this.root = root;
		}
		return result.v;
	}

	@Override
	public V remove(Object key) {
		return null;
	}

	@Override
	public void putAll(Map<? extends K, ? extends V> m) {
		for (K k : m.keySet()) {
			put(k, m.get(k));
		}
	}

	@Override
	public void clear() {

	}

	@Override
	public Set<K> keySet() {
		TreeSet<K> kk = new TreeSet<K>();
		for (BPlusLeafBlock<K, V> b : blockList) {
			kk.addAll(b.entries.keySet());
		}
		return kk;
	}

	@Override
	public Collection<V> values() {
		// TODO Auto-generated method stub
		return null;
	}

	@Override
	public Set<java.util.Map.Entry<K, V>> entrySet() {
		// TODO Auto-generated method stub
		return null;
	}

	@Override
	public Comparator<? super K> comparator() {
		// TODO Auto-generated method stub
		return null;
	}

	public void showLeaves() {

		for (BPlusLeafBlock<K, V> b : blockList) {
			System.out.println("Block");
			for (Entry<K, V> e : b.entries.entrySet()) {
				System.out.print(e.getKey() + ":" + e.getValue() + " ");
			}
			System.out.println();
		}
	}
}

package btreemap;

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

/** driver program to test B+ tree */

public class BPlusTreeTest1 {

	private static final int N = 1200000;

	public static void main(String[] args) {
		BPlusTreeMap<Integer, Integer> map = new BPlusTreeMap<Integer, Integer>(
				400, 200 );// 5000002);
		Random r = new Random();

		ArrayList<Integer> t = new ArrayList<Integer>();

		Comparator<Integer> comparator = new Comparator<Integer>() {

			@Override
			public int compare(Integer o1, Integer o2) {
				// TODO Auto-generated method stub
				return o1.intValue() - o2.intValue();
			}

		};

		List<Integer> testVals = new ArrayList<Integer>();
		for (int i = 0; i < N; ++i) {
			testVals.add(i);
		}

		for (int i = 0; i < N; ++i) {
			int x = r.nextInt(N);
			int z = testVals.set(x, testVals.get(0));
			testVals.set(0, z);

		}

		for (int i = 0; i < N; ++i) {
			map.put(testVals.get(i), testVals.get(i));
			showProgress("put", testVals, i);
		}
		System.err.println("output " + N + " vals");

		try {
			for (int i = 0; i < N; ++i) {

				showProgress("get", testVals, i);
				int x = testVals.get(i);
				if (x != map.get(x))
					System.err.println("Expecting " + x + " got " + map.get(x));

			}
			System.err.println("\nChecked " + N + " entries");
		} catch (Exception e) {
			// TODO: handle exception
			e.printStackTrace();
			map.showLeaves();
		}
	}

	private static void showProgress(String label, List<Integer> testVals, int i) {
		if (i % (N / 1000) == 0) {
			System.out.printf("%s %d:%d; ", label, i, testVals.get(i));
			if (i % (N / 10100) == 0)
				System.out.println();
		}
	}
}

Search

Text Color

Text Size

Margin Size

Font Type

Summary

A B tree java example

A B+ tree java example