Sunday, March 27, 2022

Kotlin: Map type

 Kotlin: Map type

A map is an unordered collection of key/value pairs. Inserting into a map requires two values: the key and the corresponding value. Indexing the map with that same key retrieves the associated value. The usual operations for a map include an operation to add a new key/value pair, an operation to delete a key/value pair given the key, and an operation to determine if the key is present in the map.

The Dogs library supports two types of maps: the immutable and persistent Map implemented as a binary search tree; and the immutable and persistent Map based on a hash-array mapped trie. We will discus both in this blog.



Maps

A Map is a collection of key-value pairs. It is sometimes known as a dictionary or an association list. The keys in a Map are unique and each key is associated with a value that need not be unique. The Map data type has two type parameters, for the key type and for the value type. So a Map<String, Int> maps Strings to Ints, while a Map<Int, List<String>> maps an Int to a List of Strings.

The usual operations for a Map include an operation to add a new key/value pair, an operation to delete a key/value pair given the key, and an operation to determine if the key is present in the MapThere is no significance to the order in which the elements are added to the MapThe Map data type is realized as a balanced binary search treeOf course, a balanced binary tree implies an ordering of the elements but this will be invisible to the user. The key type must be Comparable so that the ordering can be maintained.



Immutable and Persistent Maps

Like the other data types in the Dogs library, Maps are immutable and persistent. Immutable Maps have many advantages over their mutable siblings. They are thread-safe and can be passed to untrusted libraries without any side effects. A persistent Map, like a persistent List, is a data structure that always preserves the previous version of itself when it is modified. It does so by sharing parts of its structure.

The following figure shows the effect of inserting the key value Kim into the balanced binary tree xs forming the new tree ys. The resulting tree is no longer balanced since the subtree rooted at Ken has a depth of two for its right subtree and a depth of zero for its left subtree. A rebalancing is achieved by a rotation around the node labelled Ken.


The next figure illustrates a typical balanced insertion. Since the insertion into xs will first apply to the right subtreee of the root node labelled John then the left subtree of John can be shared with the new tree ys. Inserting key Kim into the subtree labelled Ken will cause the rebalancing and this new tree structure becomes part of the new tree ys.







Working with Immutable Maps

The following example demonstrates how to construct a Map and then perform the operations containsKey and lookUpKey. The Map values are the name of the states in the USA and uses their abbreviation for the key.

val states: Map<String, String> = MapF.of(
    "AL" to "Alabama",
    "AZ" to "Arizona",
    "CA" to "California",
    "FL" to "Florida",
    "KS" to "Kansas",
    "NV" to "Nevada",
    "SC" to "South Carolina",
    "TX" to "Texas"
)

assertEquals(true, states.containsKey("FL"))
assertEquals(false, states.containsKey("WY"))

assertEquals(OptionF.some("California"), states.lookUpKey("CA"))
assertEquals(OptionF.none(), states.lookUpKey("UT"))

In the next example the Map pairs the name of a color with its hexadecimal code:

val colors: Map<String, String> =
    MapF.empty<String, String>()
        .insert("Red" to "#FF0000")
        .insert("Blue" to "#0000FF")
        .insert("DarkBlue" to "#00008B")
        .insert("Yellow" to "#FFFF00")
        .insert("White" to "#FFFFFF")

assertEquals(true, colors.containsKey("White"))
assertEquals(OptionF.some("#FFFF00"), colors.lookUpKey("Yellow"))
assertEquals(OptionF.none(), colors.lookUpKey("Green"))

In the following example the Map has a String for the key and an Int for the corresponding value. The code demonstrates the effect of inserting a new key/value pair into an existing MapIf the key is already present in the Map, then the value is replaced.

val people: Map<String, Int> = MapF.of(
    "Ken" to 25,
    "John" to 31,
    "Jessie" to 22,
    "Andrew" to 35,
    "Gordon" to 35
)
val updated: Map<String, Int> = people.insert("Andrew" to 40)

assertEquals(OptionF.some(22), people.lookUpKey("Jessie"))
assertEquals(OptionF.some(22), updated.lookUpKey("Jessie"))

assertEquals(OptionF.some(35), people.lookUpKey("Andrew"))
assertEquals(OptionF.some(40), updated.lookUpKey("Andrew"))

In the next example we show the effect of the member function delete. When the key is not a member of the Map then the original Map is returned.

val people: Map<String, Int> = MapF.of(
    "Ken" to 25,
    "John" to 31,
    "Jessie" to 22,
    "Andrew" to 35,
    "Gordon" to 35
)
val removed: Map<String, Int> = people.delete("Ken").delete("John")

assertEquals(true, people.containsKey("Ken"))
assertEquals(false, removed.containsKey("Ken"))

assertEquals(OptionF.some(22), people.lookUpKey("Jessie"))
assertEquals(OptionF.some(22), removed.lookUpKey("Jessie"))

In Dogs we implemented the data structures List and Stream, both of which could be folded. When writing code that needs to process data contained in one of these structures, we often do not care about the shape of the structure. Folding is one such operation that can be applied to them. We can fold over the values of a Map with the following example:

val people: Map<String, Int> = MapF.of(
    "Ken" to 25,
    "John" to 31,
    "Jessie" to 22,
    "Andrew" to 35,
    "Gordon" to 35
)

assertEquals(148, people.foldLeft(0){acc -> {age -> acc + age}})
assertEquals(0, MapF.empty<String, Int>().foldLeft(0){acc -> {age -> acc + age}})




Hash-Array Mapped Tries

The prime candidate data structure for efficient Map collections is the Hash-Array Mapped Trie (HAMT) by Bagwell. A general trie is a lookup structure for finite strings that acts like a deterministic finite automaton. In a HAMT, the strings are the bits of the hash codes of the elements stored in the trie. Depending on the branching factor we may use different sizes of chunks from the hash code for indexing into the (sparse) array of child nodes.

Figures (a), (b) and (c) graphically illustrate the structure of a small HAMT collection with branching factor 32 after step-by-step inserting the objects A, B, C, and D. HAMTs encode the hash codes of the elements in their tree structure (cf. (d)). The prefix tree structure grows lazily upon insertion until the new element can be distinguished unambiguously from all other elements by its hash code prefix.


The index numbers in the left top corner of each node refer to the positions of elements in an imaginary sparse array. This array is actually implemented as a 32-bit bitmap and a completely filled array with length equal to the node’s arity.

Immutable HAMTs perform path-copying on updates: the edited node and all its parent nodes are reallocated. The resulting new root node satisfies the immutability property by resembling the updated path-copied branch and, for the remainder, references to unmodified branches.

The Dogs implementation of an HAMT has an interface similar to that of the Map shown in the examples above. It includes, for example, the operations containsKey, lookUpKey, insert and foldLeft.

Map is an object that maps keys to values. A Map cannot contain duplicate keys; each key can map to at most one value. However, we come across scenarios wherein more than one value has to be mapped to a key. In such scenarios where multiple values need to be mapped to a single key, we end up creating a Map with a single key but a collection of values: the MultiMap.

The Dewey Decimal Classification (DDC), colloquially known as Dewey Decimal System, is a proprietary library classification system which allows new books to be added to a library in their appropriate location based on subject.

We define a library catalog based on the Dewey classification with the Book class and a typealias for a MultiMap with the Dewey code for the key and a List of Books associated with that classification:

data class Book(val isbn: String, val title: String, val author: String)
typealias Dewey = Map<String, List<Book>>

Using the HAMT Map we associate a collection of Books with the Dewey classification. In the following listing the first assert determines the number of algebra books, while the second assert finds how many books are authored by Saumont in the programming category:

val dewey: Dewey = MapF.of(
    "005" to ListF.of(      // Programming
        Book("9781617295362", "The Joy of Kotlin", "Pierre-Yves Saumont"),
        Book("9781617293290", "Kotlin in Action", "Dmitry Jemerov"),
        Book("9781530075614", "Kotlin for Android Developers", "Antonio Leiva"),
        Book("9781617292736", "Functional Programming in Java", "Pierre-Yves Saumont")
    ),
    "512" to ListF.of(      // Algebra
        Book("9789083136608", "Linear Algebra", "Michael Cohen"),
        Book("9780471433347", "Abstract Algebra", "David Dummit")
    ),
    "520" to ListF.of(      // Astronomy
        Book("9780008389321", "The Universe", "Andrew Cohen")
    )
)

assertEquals(2, dewey["512"].size())    // number of algebra books
assertEquals(2, dewey["005"].filter{book -> (book.author == "Pierre-Yves Saumont")}.size())



Performance

A benchmark comparing our immutable persistent Map, immutable persistent HAMT Map, Kotlin's mutable MutableMap and the PersistentMap class from the Immutable Collections Library for Kotlin at [https://github.com/Kotlin/kotlinx.collections.immutable]. The timings measure (in ms) inserting 10000 elements into the collections:

Insert 10000 elements into our immutable/persistent Map:    9.2
Insert 10000 elements into our immutable/persistent HAMT Map:    8.2
Insert 10000 elements into Kotlin's MutableMap type:    1.1
Insert 10000 elements into kotlinx PersistentMap type:        14.6

A second benchmark compares looking up the values for these 10000 keys in our immutable persistent Map, immutable persistent HAMT Map, Kotlin's mutable MutableMap and the PersistentMap class from the Immutable Collections Library for Kotlin at [https://github.com/Kotlin/kotlinx.collections.immutable]. The timings measure (in ms) searching 10000 elements in the collections:

Lookup 10000 elements in our immutable/persistent Map:    324.8
Lookup 10000 elements in our immutable/persistent HAMT Map:    3.1
Lookup 10000 elements in Kotlin's MutableMap type:    1.4
Lookup 10000 elements in kotlinx PersistentMap type:        3.9



The code for the Dogs library can be found at:

https://github.com/KenBarclay/TBA



No comments: