Here we describe
the persistent vector attributed to Rich Hickey and influenced by Phil
Bagwell’s Ideal Hash Trees. The persistent vector gives practically constant
time for lookup operations, append and update operations; significantly better
than that for our persistent lists.
Such vectors are sometimes known as bitmapped vector tries. Tries are specialized trees in which all the data is stored in its leaves. Further, the internal nodes of a trie have more than the two branches used in balanced binary search trees. In this implementation each node has 32 branches producing a very shallow tree structure. A lookup operation uses parts of the key to select the correct branch.
A vector uses
the index as the key for lookup and other operations. We must split up the
index in some way to traverse the branches of the trie. One simple scheme uses digit
partitioning, splitting a key into its individual digits. However, this
approach requires integer division and integer modulo operations which are too
time-consuming.
Bit partitioned tries are a subset of digit partitioned tries. Conceptually they operate the same way as digit partitioned tries. Instead of splitting the key into digits, we split it into chunks of 5 bits for our 32-way branching. The expensive integer arithmetic is replaced with more efficient bit shift and logical and operations.
To illustrate, consider that we have 2 bit partitioning/4-way branching as shown in the following figure. To look up a value in a trie with up to 1024 elements, the root of the tree is a pointer to a size-4 array with each element a pointer to similar sub-nodes. The lowest level node will have pointers to the actual data values.
No comments:
Post a Comment