Current location - Education and Training Encyclopedia - Graduation thesis - Differences between B-tree and B+ tree in data structure
Differences between B-tree and B+ tree in data structure
First, the origin of B-tree

B-tree was first put forward by German computer scientist rudolf bayer and others in the paper "Organization and Maintenance of Large Ordered Indexes" in 1972. But I looked at the original text and found that the author did not explain why it was called B-trees, so I simply interpreted B of B-tree as balanced or binary, which was not particularly rigorous. Maybe the author is named after Bayer's initials. ...

Second, what does the B tree look like?

It's clearer to look at the picture directly. As shown in the figure, B-tree is actually a balanced multi-branch search tree, that is, when m >, M branches can be opened (m >; =2), we call it M-order B-tree. In order to reflect the conscience of this blog, unlike other places where you can see a second-order B-tree, here we have specially drawn a fifth-order B-tree.

Generally speaking, an M-order B-tree satisfies the following conditions:

Each node can have at most m subtrees.

The root node has at least two nodes (or in extreme cases, a tree has one root node, and single-celled organisms are roots, leaves and trees).

Non-root non-leaf nodes have at least Ceil(m/2) subtrees (Ceil stands for rounding up, and each node in the graph has at least 3 subtrees, that is, at least 3 forks).

The information in non-leaf nodes includes [n, A0, K 1, A 1, K2, A2, …, KN, An], where n represents the number of keywords saved in the node, k is the keyword, and Ki.

Every path length from root to leaf is the same, that is to say, leaf nodes are at the same level, and these nodes have no information. In fact, these nodes mean that the specified value cannot be found, that is, the pointer to these nodes is empty.

The query process of B-tree is similar to binary sort tree. Each node is compared in turn from the root node. Because the keywords in each node and the left and right subtrees are ordered, the specified keywords can be quickly found by comparing the keywords in the nodes or along the pointer. If the search fails, the leaf node will be returned, that is, a null pointer.

For example, look up k in the alphabet in the graph.

Starting from the root node p, the position of k is before p, and it enters the left pointer.

In the left subtree, C, F, J and M are compared in turn, and it is found that K is between J and M.

Continue to visit the subtree along the pointer between j and m, and compare them in turn, and find that the first keyword k is the value of the specified search.

Third, the Plus version-B+tree

As an enhanced version of B-tree, the difference between B+ tree and B-tree lies in:

Nodes of n subtrees contain n keywords (some people think it is n- 1 keyword).

All leaf nodes contain all keywords and pointers to records containing these keywords, and the leaf nodes themselves are connected in the order of keywords from small to large.

Non-leaf nodes can be regarded as index parts, and nodes only contain the largest (or smallest) keywords in their subtrees (root nodes).

The search process of B+ tree is similar to that of B- tree, except that when searching, if the keyword on the non-leaf node is equal to the given value, it will not be terminated, but will continue along the pointer until the position of the leaf node. So in the B+ tree, no matter whether the search is successful or not, every search takes a path from the root to the leaf node.