Friday, March 25, 2022

Kotlin: Vector type

 Kotlin: Vector type

xxxxx


Here we describe the persistent vector attributed to Rich Hickey and influenced by Phil Bagwell’s Ideal Hash Trees. The persistent vector gives practically constant time for lookup operations, append and update operations; significantly better than that for our persistent lists.

Such vectors are sometimes known as bitmapped vector tries. Tries are specialized trees in which all the data is stored in its leaves. Further, the internal nodes of a trie have more than the two branches used in balanced binary search trees. In this implementation each node has 32 branches producing a very shallow tree structure. A lookup operation uses parts of the key to select the correct branch.

A vector uses the index as the key for lookup and other operations. We must split up the index in some way to traverse the branches of the trie. One simple scheme uses digit partitioning, splitting a key into its individual digits. However, this approach requires integer division and integer modulo operations which are too time-consuming.

Bit partitioned tries are a subset of digit partitioned tries. Conceptually they operate the same way as digit partitioned tries. Instead of splitting the key into digits, we split it into chunks of 5 bits for our 32-way branching. The expensive integer arithmetic is replaced with more efficient bit shift and logical and operations.

To illustrate, consider that we have 2 bit partitioning/4-way branching as shown in the following figure. To look up a value in a trie with up to 1024 elements, the root of the tree is a pointer to a size-4 array with each element a pointer to similar sub-nodes. The lowest level node will have pointers to the actual data values.


We interpret an index as divided into chunks of 2 bits. Hence the decimal index 637 has the binary representation 10-01-11-11-01. The chunks are mapped to levels in the tree. Fetching the element at index 637 involves indexing the root node at binary 10 to obtain the node at the next level. This node is then indexed at binary 01 to obtain the node at level 3, and so on. In general the cost of performing a lookup is proportional to the height of the tree. In the 32-way branching implementation the tree has a very wide and shallow structure producing constant time lookup or, more precisely, O(log32 n) where n is the number of elements. Tries with at most 6 levels are capable of referring to one billion elements.

The following example illustrates defining a number of extension functions on specialized vectors. Extension function maximum finds the greatest value in the vector. Extension function multiply creates a new vector comprising the product of each of the original values and the given scalar. Extension function manhattan computes the distance between two equal-sized vectors based on a strictly horizontal and/or vertical path.

fun Vector<Double>.maximum(): Double =
    this.foldLeft(this[0]){max, d -> Math.max(max, d)}

fun Vector<Double>.multiply(scalar: Double): Vector<Double> =
    this.map{d -> scalar * d}

fun Vector<Double>.manhattanNorm(vec: Vector<Double>): Double {
    return if (this.size() != vec.size())
        throw IllegalArgumentException("manhattanDistance: vectors differ in size")
    else {
        var sum: Double = 0.0
        val size: Int = this.size()
        for (index in (0 until size)) {
            val dist: Double = this[index] - vec[index]
            sum += abs(dist)
        }
        sum
    }
}   // manhattanNorm



val vec1: Vector<Double> = VectorF.of(1.0, 2.0, 3.0, 4.0, 5.0)
val vec2: Vector<Double> = VectorF.of(1.0, -2.0, 4.0, 7.0, 9.0)

assertEquals(5.0, vec1.maximum())
assertEquals(VectorF.of(2.0, 4.0, 6.0, 8.0, 10.0), vec1.multiply(2.0))
assertEquals(12.0, vec1.manhattanNorm(vec2))

The next example declares a simple matrix class in which its elements are stored in column major order using our vector class. Note the init block to check the dimensions and the support function makeVector to create a vector of some given size and a function for delivering the values.

class MatrixException(message: String) : Exception(message)

class Matrix(val rows: Int, val columns: Int, val storage: Vector<Double>) {
    constructor(rows: Int, columns: Int, value: Double = 0.0): this(rows, columns, VectorF.replicate(rows * columns, value))
    constructor(rows: Int, columns: Int, f: (Int, Int) -> Double): this(rows, columns, makeVector(rows, columns, f))

    operator fun get(row: Int, col: Int): Double {
        if (row < 0 || row >= rows)
            throw MatrixException("get: incorrect row index")
        if (col < 0 || col >= columns)
            throw MatrixException("get: incorrect column index")
        return storage[rows * col + row]
    }

    companion object {
        fun makeVector(rows: Int, columns: Int, f: (Int, Int) -> Double): Vector<Double> {
            var vec: Vector<Double> = VectorF.empty()
            for (col in 1..columns) {
                for (row in 1..rows)
                    vec = vec.append(f(row, col))
            }
            return vec
        }
    }   // companion object

    init {
        if (storage.size() != rows * columns)
            throw MatrixException("incorrect sizes")
    }   // init

}   // Matrix



    // Create the matrix ((1.0, 2.0), (3.0, 4.0), (5.0, 6.0))
val mat1: Matrix = Matrix(3, 2, VectorF.of(1.0, 3.0, 5.0, 2.0, 4.0, 6.0))
    // Create the matrix ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
val mat2: Matrix = Matrix(3, 3){row, col -> if (row == col) 1.0 else 0.0}

assertEquals(2.0, mat1[0, 1])
assertEquals(6.0, mat1[2, 1])
assertEquals(1.0, mat2[2, 2])



The code for the Dogs library can be found at:

https://github.com/KenBarclay/TBA

No comments: