Monday, August 17, 2020

Kotlin: Parser Combinator

 Kotlin: Parser Combinator

A parser combinator is a higher-order function that accepts several parsers as input and returns a new parser as its output. In this context, a parser is a function accepting strings as input and returning some structure as output, typically a parse tree. Parser combinators enable a recursive descent parsing strategy that facilitates modular piecewise construction and testing. This parsing technique is called combinatory parsing.

In a programming language such as Kotlin that has first-class functions, parser combinators can be used to combine basic parsers to construct parsers for more complex rules. For example, a production rule of a context-free grammar (CFG) may have one or more alternatives and each alternative may consist of a sequence of non-terminal(s) and/or terminal(s). If a simple parser is available for each of these alternatives, a parser combinator can be used to combine each of these parsers, returning a new parser which can recognise any or all of the alternatives.

This blog describes a parser combinator written in Kotlin. The features of Kotlin made it possible to mirror much of the original Haskell code from which it was developed.



ParsecT

ParsecT is the main parser and the central data type in this library. The class is parametrized and defined as:

class ParsecT<TOK, U, A, Z>

where TOK is the type of elements in the input stream (commonly Char), U is the user state, A is the return type, and Z is the value type when an instance is invoked. The class is represented by a single function type property and the member functions deliver a new instance with a suitably modified function property. Hence the class represents a lazy object which only executes when it is finally invoked.

The most convenient way to define various parsers is to define a custom type synonym for your parser. The library defines a typealias entitled Parser:

typealias Parser<A> = ParsecT<Char, Unit, A, Unit>

where the return type A is still a parameter and the input stream is a sequence of Chars.

The object declaration:

object ParsecTF

includes a set of related functions that either operate on one or more ParsecT parameters or delivers a ParsecT instance.



Object declarations CombinatorF, CharacterF and StringF

The object declaration CombinatorF includes a number of functions used to combine parsers. For example, function many parses none or more occurrences of a given parser. Its signature is:

fun <TOK, U, A, Z> many(parsec: ParsecT<TOK, U, A, Z>): ParsecT<TOK, U, List<A>, Z>

Note that the return type for the generated parser is a List<A> where the given parser returns the type A.

The object declaration CharacterF includes a number of val bindings and function declarations for parsing individual characters. The function declarations in CharacterF are like function oneOf which returns a parser that succeeds if the current input character is in the sequence of character parameters. Its signature is:

fun oneOf(vararg chars: Char): Parser<Char>

The val bindings are predefined character parsers. For example, the parser colon will recognize a ':' character in the input. The parser dollar will recognize a '$' character in the input. The parser digit will recognize a single decimal digit in the input.

The object declaration StringF includes the function parse to execute a parser against some input given as a String. The declaration is:

fun <A> parse(parsec: Parser<A>, name: String, text: String): Either<ParseError, A>

The parsec parameter is the parser to execute, the text is the input the parser operates on and name is used to describe the source. The name might be the name of the file from which the text is obtained.

The return type is either a parse error or the return value from the parser. If the parser fails then the return is a Left instance wrapped around a ParseError. If the parser succeeds then the return is a Right wrapped around the parser return.



The Basics

Most of the val bindings in CharacterF are defined with the member function satisfy:

fun satisfy(predicate: (Char) -> Boolean): Parser<Char>

This looks at the next character from the current input, and if the predicate function returns true for this character, it 'pops' it from the input and returns it. In this way, the current position in the input string is tracked behind the scenes. A sample use of this function is:

val pA: Parser<Char> = satisfy{ch: Char -> (ch == 'A')}
assertEquals(right('A'),  parse(pA, "", "ABC"))
assertTrue(parse(pA, "", "BC").isLeft())
println(parse(pA, "", "BC"))

with the final print producing:

Left(ParseError(sourcePosition=(1, 1), messages=<[SysUnExpect: B]>))

with the error on the first character (of the first line) with the unexpected character 'B'.



CharacterF.character

This function returns a parser that matches the current character in the text that we are parsing to whatever character you provide it. Its signature is:

fun character(ch: Char): Parser<Char>

Sample use of this function is:

val pCharacter: Parser<Char> = character('H')
assertEquals(right('H'),    parse(pCharacter, "", "Hello"))
assertTrue(parse(pCharacter, "", "Goodbye").isLeft())

Here, running character('H') returns a parser that will match the single character 'H'. If we use it in parsing our string, which begins with an H, it is perfectly happy. If we try looking for any letter that isn't 'H', we fail. The result is always of type Either<ParseError, Char>; we get back a Right result if the rule was successful, and Left error if the rule fails.



CharacterF.string

This function returns a parser that attempts to match the given string of characters:

val pString: Parser<String> = string("Hello")
assertEquals(right("Hello"),    parse(pString, "", "Hello world"))
assertTrue(parse(pString, "", "Help").isLeft())

The parser will consume characters from the input one by one until all characters match or one of them is not as expected. Since both of the above attempts begin with 'Hel', the error complains about an unexpected 'p'. This consuming of characters will become significant when multiple rules are chained together.



CharacterF.oneOf

Sometimes we want to match a range of characters; that's where CharacterF.oneOf comes in handy. Similar to CharacterF.character, except you provide this function a list of characters you're OK with matching:

val pVowels: Parser<Char> = oneOf('a', 'e', 'i', 'o', 'u')
assertEquals(right('o'),    parse(pVowels, "", "object"))
assertTrue(parse(pVowels, "", "function").isLeft())

Here, we can see that the parser will consume any single character that is a vowel.

Parsec comes with pre-prepared rules which do just that, for example CharacterF.anyChar which will consume one of anything:

val pSymbols: Parser<Char> = anyCharacter
assertEquals(right('H'),    parse(pSymbols, "", "Hello"))
assertEquals(right('+'),    parse(pSymbols, "", "+-*/"))

The rule CharacterF.letter will happily consume any lower or uppercase letter, CharacterF.lower will consume any lowercase letter, CharacterF.digit will consume any single decimal digit, and CharacterF.alphaNum any letter or digit.



CombinatorF.many

There are times when you want to parse more than just one letter. CombinatorF.many tries to run whatever rule it is given as an argument repeatedly until it fails. Even if the rule matches no times, many will return without error, but just give back an empty result. Let's see how it's used:

assertEquals(right(ListF.of('h', 'h', 'h')),                    parse(many(character('h')), "", "hhheee"))
assertEquals(right(ListF.of()),                                  parse(many(character('e')), "", "hhheee"))
assertEquals(right(ListF.of('h', 'h', 'h', 'e', 'e', 'e')),    parse(many(letter), "", "hhheee"))

As we see,CombinatorF.many can't fail, it will match the rule it's given zero times and just return nothing. It'll go as far as it can and give you back everything that matched. CombinatorF.many1 is similar except that the rule provided must match at least once for it to return successfully.



ParsecT.fmap

The ParsecT data type is a functor by supporting the member function fmap which, given any types A and B lets you apply the function (A) -> B to a ParsecT<TOK, U, A, Z> delivering a ParsecT<TOK, U, B, Z>. Thus we can use fmap to transform the return value of a parser:

val pTab: Parser<Char> = tab
val pSpaces: Parser<String> = pTab.fmap{ch: Char -> "    "}
assertEquals(right("    "),     parse(pSpaces, "", "\t"))
assertTrue(parse(pSpaces, "", "A").isLeft())

Combining fmap, many1 and the single character parser digit we can parse a series of decimal digits into an integer:

val pInteger: Parser<Int> = many1(digit).fmap{digs: List<Char> ->
    digs.foldLeft(0){acc: Int -> {ch: Char -> 10 * acc + (ch.toInt() - '0'.toInt())}}
}
assertEquals(right(123),    parse(pInteger, "", "123"))
assertTrue(parse(pInteger, "", "ABC").isLeft())



Combining rules

The simplest way to combine parsers is to execute them in succession. ParsecT is a monad, and monadic bind is exactly what we use for sequencing our parsers. In the following example we sequence three character parsers to recognize ABC:

val pA: Parser<Char> = character('A')
val pB: Parser<Char> = character('B')
val pC: Parser<Char> = character('C')
val pSequence: Parser<Triple<Char, Char, Char>> =
        pA.bind{chA: Char ->
            pB.bind{chB: Char ->
                pC.bind{chC: Char ->
                    inj(Triple(chA, chB, chC))
                }
            }
        }

assertEquals(right(Triple('A', 'B', 'C')),  parse(pSequence, "", "ABC"))

The object declaration ParsecTF includes the function mapM3 which supports an alternative syntax for sequential execution of monadic parsers:

val pA: Parser<Char> = character('A')
val pB: Parser<Char> = character('B')
val pC: Parser<Char> = character('C')
val pSequence: Parser<Triple<Char, Char, Char>> =
        mapM3(pA, pB, pC){chA: Char -> {chB: Char -> {chC: Char -> Triple(chA, chB, chC)}}}

assertEquals(right(Triple('A', 'B', 'C')),  parse(pSequence, "", "ABC"))



Kotlin package declaration parsing

In this section we develop a simple parser for a Kotlin package declaration. A typical package declaration is:

package com.adt.kotlin.parsec

comprising the package keyword, one or more spaces and a qualified name. First we define the parsers for the keyword and the dot separator:

val pPackageKeyword: Parser<String> = StringF.lexeme(string("package"))
val pDot: Parser<Char> = character('.')

In the original parsec documentation, one of the concepts mentioned is the idea of 'lexeme' parsing. This is a style in which every token parser should also consume and ignore any trailing whitespace. This is a simple convention which with a bit of care allows skipping whitespace exactly once wherever it needs to be skipped. The function lexeme performs this task, skipping any whitespace following the package keyword.

An identifier must start with a letter or underscore, and then followed by zero or more letters, underscores or digits in any combination:

val leading: Parser<Char> = satisfy{ch: Char -> Character.isJavaIdentifierStart(ch)}
val trailing: Parser<Char> = satisfy{ch: Char -> Character.isJavaIdentifierPart(ch)}
val pIdentifier: Parser<String> =
        leading.bind{leadingChar: Char ->
            many(trailing).bind{trailingChars: List<Char> ->
                inj(ListF.cons(leadingChar, trailingChars).charsToString())
            }
        }   // pIdentifier

A qualified identifier comprises one or more identifiers separated by a dot.

val pQualifiedName: Parser<QualifiedName> =
        CombinatorF.sepBy1(pIdentifier, pDot).fmap{names: List<String> -> QualifiedName(names)}

Function sepBy1 takes two arguments, a rule to parse items, and a rule to parse separators and returns a list of whatever the rule returns. Using fmap we convert this into a QualifiedName:

data class QualifiedName(val names: List<String>)

The parser for a package declaration is then:

val pPackageDeclaration: Parser<PackageDeclaration> =
        pPackageKeyword.bind{_: String ->
            pQualifiedName.bind{qualifiedName: QualifiedName ->
                inj(PackageDeclaration(qualifiedName))
            }
        }   // pPackageDeclaration

where:

data class PackageDeclaration(val qualifiedName: QualifiedName)

We demonstrate it with:

assertEquals(right(PackageDeclaration(QualifiedName(ListF.of("test", "kotlin")))),
    parse(pPackageDeclaration, "", "package test.kotlin"))
assertTrue(parse(pPackageDeclaration, "", "package test.*").isLeft())
assertTrue(parse(pPackageDeclaration, "", "package").isLeft())
assertTrue(parse(pPackageDeclaration, "", "import test.kotlin").isLeft())



The code for Parsec can be found at:

https://github.com/KenBarclay/Parsec

Friday, June 12, 2020

Kotlin: Non Empty List type

Kotlin: Non Empty List type

A non empty list is one with at least one element, but is otherwise identical to the traditional immutable list in complexity and API. Sometimes you have a function that expects a list with at least one element in the function call. We cannot enforce the user to provide a non empty list using either the Kotlin List interface or the List class from the custom Dogs library. Here we present the NonEmptyList class that guarantees to have one or more elements.



Motivation

Consider an order-taking workflow for a manufacturing company. An order might include the customer name, the billing and shipping addresses and a number of order lines. We might consider a List to represent the order lines but a List may be empty and the integrity of a business rule may be violated if we permit an order with zero order lines.

Equally, suppose we have some function that operates on a series of one or more values given as a parameter. An obvious choice would be use a List for the parameter type and make a check within the function body that the list length exceeds zero.

We can use the Kotlin type system to represent what is valid or invalid so that the compiler can check it for us, instead of relying on runtime checks or code comments to ensure the rules are maintained. For this function a more appropriate parameter type would be NonEmptyList which, as its name suggests, contains one or more elements.

When validating user input we might consider using the Validation class from the custom library (see later). The Validation class has two concrete sub-classes Failure and Success. The Failure class carries a number of error types. Since it is meaningless to have a Failure with no errors then we use the type Failure<NonEmptyList<Error>> where Error is some error type. We discussed this in the Validation type blog.



Immutable NonEmptyList

The immutable NonEmptyList type is defined as:

class NonEmptyList<out A> internal constructor(val hd: A, val tl: List<A>)

Again, since the constructor is tagged internal, a series of factory constructor functions are provided:

assertEquals(4,         cons(1, cons(2, cons(3, singleton(4)))).size())
assertEquals(1,         cons(1, cons(2, cons(3, singleton(4)))).head())
assertEquals(4,         cons(1, cons(2, cons(3, singleton(4)))).last())
assertEquals(9,         of(1, 2, 3, 4, 5, 6, 7, 8, 9).size())
assertEquals(10,        closedRange(1, 10).size())

Unlike the List class, the member function head of class NonEmptyList is a safe operation that guarantees not to throw an exception. Further, some member functions of NonEmptyList such as map deliver a NonEmptyList result, while others, such as filter, deliver a List result. This is necessary since the predicate to the filter function may not match any element of the NonEmptyList. Here are some examples that deliver Lists:

assertEquals(ListF.of(3, 4, 5),     NonEmptyListF.of(1, 2, 3, 4, 5).drop(2))
assertEquals(ListF.of(),            NonEmptyListF.of(1, 2, 3, 4, 5).drop(6))
assertEquals(ListF.of(2, 4),        NonEmptyListF.of(1, 2, 3, 4, 5).filter(isEven))
assertEquals(ListF.of(),            NonEmptyListF.of(1, 2, 3, 4, 5).filter{n -> (n > 10)})

Here are some of the many functions over NonEmptyLists:

assertEquals(3,         NonEmptyListF.of(1, 2, 3, 4, 5)[2])
assertEquals(true,      NonEmptyListF.of(1, 2, 3, 4, 5).contains(4))
assertEquals(15,        NonEmptyListF.of(1, 2, 3, 4, 5).foldLeft(0){acc -> {el -> acc + el}})
assertEquals(
    NonEmptyListF.of(false, true, false, true, false),
    NonEmptyListF.of(1, 2, 3, 4, 5).map(isEven)
)
assertEquals(
    NonEmptyListF.of(Pair(1, 'a'), Pair(2, 'b'), Pair(3, 'c')),
    NonEmptyListF.of(1, 2, 3, 4, 5).zip(NonEmptyListF.of('a', 'b', 'c'))
)


Case Study

An important attribute of applicatives is that they compose. Since the NonEmptyList class is an applicative then it supports the fmap3 function which maps a function of three parameters across the elements of three NonEmptyLists. Suppose we have the function volume which computes the volume of a cuboid:

fun volume(length: Int, width: Int, height: Int): Int = length * width * height

Consider that we wish apply this function but each of the dimensions are in NonEmptyLists:

val lengths: NonEmptyList<Int> = of(2, 3, 4, 5)
val widths: NonEmptyList<Int> = of(6, 7)
val heights: NonEmptyList<Int> = of(7, 8, 9)


The fmap3 function has the signature:

fun <A, B, C, D> fmap3(la: NonEmptyList<A>, lb: NonEmptyList<B>, 
        lc: NonEmptyList<C>, f: (A) -> (B) -> (C) -> D): NonEmptyList<D>


and use it with:

val volumes: NonEmptyList<Int> = NonEmptyListF.fmap3(lengths, widths, heights, ::volume)

assertEquals(
    NonEmptyListF.of(84, 96, 108, 98, 112, 126, 126, 144, 162, 147, 168, 189, 168, 192, 216, 196, 224, 252, 210, 240, 270, 245, 280, 315),
    volumes
)


where the values 84, 96, 108, 98, ... are produced from 2 * 6 * 7, 2 * 6 * 8, 2 * 6 * 9, 2 * 7 * 7, ...



The code for the Dogs library can be found at:

https://github.com/KenBarclay/TBA

Tuesday, June 2, 2020

Kotlin: List type

Kotlin: List type

Ordinary data structures are ephemeral in the sense that making a change to the structure destroys the old version, leaving only the new one. This is the case for the List and Map types in the Kotlin standard library. These imperative data structures operate through in-place mutation of data ─ the data structure is mutable. In our everyday usage of Kotlin data structures we mutate lists, arrays and any other type to realise the algorithm that we implement.


An important property of functional data structures is that they always persist ─ updating a functional data structure does not destroy the existing version, but rather creates a new version that coexists with the old version. Persistence is achieved by copying the affected nodes of a data structure and making all changes in the copy rather than in the original. Because they are never modified directly, all nodes that are unaffected by an update can be shared between the old and new versions of the structure without worrying that a change in one version will inadvertently be visible to the other. Functional data structures are immutable and persistent.



Motivation

Functional languages exploit persistent data structures, many of them based on the seminal book by Chris Okasaki. Persistent data structures are being embraced by imperative programming languages. Persistent data structures should be part of every Kotlin programmer’s toolset, especially one interested in concurrency and parallelism.

Immutability is important for a number of reasons:
  • immutable data makes the code more predictable
  • immutable data is easier to work with
  • immutable data encourages a transformational approach
With immutability there can be no side effects. If there are no side effects it is much easier to reason about the correctness of the code. The code is then more predictable than its wholly imperative equivalent.

If data is immutable many common tasks become much easier: the code is easier to write and to maintain. Fewer unit tests are required and you only have to check that the function works in isolation. Concurrency is simpler too since you are not worrying about update conflicts.

Immutable data promotes a different programming approach. More emphasis is given to transforming the data instead of in-place mutation of the data. A greater emphasis on transformation can lead to more elegant, more modular and more scalable solutions.

Every data structure has its own unique performance characteristics. A naïve implementation of a persistent data structure would deliver a very poor performance. Major studies have been done designing efficient persistent data structures. In many cases they closely match the performance of their mutable cousins.



Immutable Lists

The simple list is ubiquitous throughout programming and is probably the most-used data structure. The list is a collection of references to other objects. The list can grow dynamically. The list is defined generically so that it can handle values identically without depending on their type. Since the type of objects maintained by this collection is arbitrary, the elements of a list can be another list or map or etc. This way we can create useful data structures of arbitrary complexity.

The immutable List type is defined recursively in Dogs as an algebraic data type:

sealed class List<out A> {
    object Nil : List<Nothing>()
    class Cons<A>(val hd: A, val tl : List<A>) : List<A>()
}


Every list is constructed with just the two value constructors Nil and Cons. Nil is a synonym for the empty list while Cons makes a list by putting an element in front of an existing list. Every list is either Nil, if empty, or has the form Cons(x, xs), where x is the head of the list and xs is the tail of the list. The tail is another list.

The following figure is a sample list using an empty list (represented by the circle) and Cons cells (represented by the sub-divided rectangles). Each Cons cell stores a single value for its head and a reference to the remainder for its tail.

This sample list would be constructed by creating an empty list with nil() then cons-ing new elements on to the head with cons(2, nil()). The completed list is assembled with:

cons(1, cons(2, cons(3, cons(4, nil()))))

The type names Nil and Cons are accessible to application code but the constructors are tagged as internal. Hence we use the factory constructor functions nil and cons. Here are some examples of assembled lists:

assertEquals(4,         cons(1, cons(2, cons(3, cons(4, nil())))).size())
assertEquals(1,         cons(1, cons(2, cons(3, cons(4, nil())))).head())
assertEquals(false,     cons(1, cons(2, cons(3, cons(4, nil())))).isEmpty())
assertEquals(4,         ListF.of(1, 2, 3, 4).size())
assertEquals(10,        ListF.closedRange(1, 10).size())


We can grow a List with the append function:

assertEquals(5,         ListF.of(1, 2, 3, 4).append(5).last())
assertEquals(7,         ListF.of(1, 2, 3, 4).append(ListF.of(5, 6, 7)).last())


Prepending an element on to the front of a Kotlin ImmutableList with list.add(0, element) is more expensive than adding to the end with list.add(element). By the same argument, adding to the end of the custom immutable List with append is more expensive than prepending with cons. The function append is best avoided when working with large lists.



Processing immutable lists

The custom immutable List class has the same functionality as that provided by the Kotlin extension functions on the Java List. Many are named the same. For example the custom immutable List class includes the member functions contains, count, drop, dropWhile, filter, get, indexOf, isEmpty, last, map, take, takeWhile and zip among others. Most bear the same signature as their Kotlin counterparts.

The following example demonstrates how easy it is to define one function in terms of those provided. Given two indices from and to, the segment is the list containing the elements between from and to inclusive:

fun <A> segment(from: Int, to: Int, ls: List<A>): List<A> =
    ls.drop(from).take(to - from + 1)

assertEquals(ListF.empty(),     segment(3, 5, ListF.empty<Int>()))
assertEquals(ListF.empty(),     segment(3, 5, ListF.of(1, 2, 3)))
assertEquals(ListF.of(1, 2, 3), segment(3, 5, ListF.of(1, 1, 1, 1, 2, 3, 3, 1, 1, 4, 5, 5, 5, 5)))

Every list is either empty represented by Nil, or is non-empty represented by Cons( ... ). In fact, every list is constructed by repeatedly applying Cons on to an empty list Nil. If we want to define a function over a list we distinguish between empty and non-empty cases. This leads to a constructor pattern over lists which will pattern match against either Nil or Cons.

Function headOption delivers the first element in the list. If the list is empty the function returns None otherwise it returns the head of the list wrapped in a Some. The implementation exploits the pattern matching of sealed classes.

The extension function reverseMap builds a new list by applying the function to all elements of the receiver, collecting the result in reverse order. The local function recReverseMap pattern matches against the empty and non-empty list. As the elements are transformed by the function f they are prepended on to the accumulating parameter acc.

fun <A> headOption(ls: List<A>): Option<A> =
    when (ls) {
        is Nil -> none()
        is Cons -> some(ls.head())
    }

fun <A, B> List<A>.reverseMap(f: (A) -> B): List<B> {
    tailrec
    fun recReverseMap(f: (A) -> B, xs: List<A>, acc: List<B>): List<B> {
        return when(xs) {
            is Nil -> acc
            is Cons -> recReverseMap(f, xs.tail(), cons(f(xs.head()), acc))
        }
    }   // recReverseMap

    return recReverseMap(f, this, ListF.empty())
}   // reverseMap

assertEquals(none(),                            headOption(ListF.empty<Int>()))
assertEquals(some("Ken"),                       headOption(ListF.of("Ken", "John", "Jessie", "Irene")))
assertEquals(ListF.of(16, 9, 4, 1),             ListF.of(1, 2, 3, 4).reverseMap{n -> n * n})
assertEquals(ListF.of("JESSIE", "JOHN", "KEN"),
        ListF.of("Ken", "John", "Jessie").reverseMap{str -> str.toUpperCase()})

Zipping is the process of combining two lists into one list by merging corresponding elements into pairs. The resulting list has the same length as the shorter of the two. Here are some examples:

assertEquals(ListF.of(Pair(1, 4), Pair(2, 5), Pair(3, 6)),    ListF.of(1, 2, 3).zip(ListF.of(4, 5, 6)))
assertEquals(ListF.of(Pair(1, 4), Pair(2, 5)),                ListF.of(1, 2).zip(ListF.of(4, 5, 6)))
assertEquals(ListF.of(Pair(1, 4), Pair(2, 5)),                ListF.of(1, 2, 3).zip(ListF.of(4, 5)))
assertEquals(ListF.empty(),                                   ListF.of(1, 2, 3).zip(ListF.empty<Int>()))
assertEquals(ListF.empty(),                                   ListF.empty<Int>().zip(ListF.of(4, 5, 6)))


In the next illustration the zip function is used to define isOrdered which determines if a List of Ints is in ascending order. Zipping a list with its tail produces pairs of adjacent elements. Function forAll returns true if all the elements match the predicate.

fun isOrdered(list: List<Int>): Boolean =
    list.zip(list.tail()).forAll{pair -> (pair.first <= pair.second)}

assertEquals(true,      isOrdered(ListF.of(1, 2, 3, 4, 5)))
assertEquals(true,      isOrdered(ListF.of(1, 2, 2, 3, 4)))
assertEquals(false,     isOrdered(ListF.of(1, 2, 5, 4, 3)))




Immutable Lists and persistence

Our list data structures are immutable. Member functions of the custom class List and the functions defined in ListF working with our list data structure do not modify the state of the structure. They can only return a new structure if they represent some change operation. In this setting the lists we create are immutable. A further consequence of this approach is that the structures are shared.

In the following figure presents three lists named xs, ys and zs. The list ys is obtained by dropping the first three elements from the xs list sharing its final element. The figure also shows how the list zs also shares part of the original list xsBy comparison, the drop function from the Kotlin standard library implements this by making a copy of the sub-list.

Here is how the member function drop is implemented:

fun drop(n: Int): List<A> {
    tailrec
    fun recDrop(m: Int, xs: List<A>): List<A> {
        return if (m <= 0)
            xs
        else when(xs) {
            is Nil -> xs
            is Cons -> recDrop(m - 1, xs.tail())
        }
    }   // recDrop

    return recDrop(n, this)
}   // drop

One of the benefits of immutable persistent data structures such as the custom List is the performance gains that can be provided by data sharing. Accessing the first element of a list is immediate. Removing the first element is equally fast. The extension function drop in the Kotlin standard library makes a new list from the original with the first n elements removed and copying the remainder into the new list.



Higher order functions over lists

The custom List class includes a number of higher order functions. One of these is member function map which applies a transformation function to all the elements in a List delivering a new List:

val square: (Int) -> Int = {n -> n * n}
val isEven: (Int) -> Boolean = {n -> (n % 2 == 0)}
val list: List<Int> = ListF.of(1, 2, 3, 4, 5)

assertEquals(ListF.of(1, 4, 9, 16, 25),                     list.map(square))
assertEquals(ListF.of(false, true, false, true, false),     list.map(isEven))
assertEquals(list.map(compose(isEven, square)),             list.map(square).map(isEven))

With immutability there can be no side effects. If there are no side effects it is much easier to reason about the correctness of the code. The code is then more predictable than its wholly imperative equivalent.

Immutable data promotes a different programming approach. More emphasis is given to transforming the data instead of in-place mutation of the data. A greater emphasis on transformation can lead to more elegant, more modular and more scalable solutions.

The use of transformation is shown in the following example which determines the names of the given capital cities that have a population that exceed one million. The two-stage solution is first to filter out those cities with populations over one million, then obtain their names. The val bindings for largeCities and nameLargeCities do precisely that. The largeCities binding references a new List as does the binding for nameLargestCities. We demonstrate in the next example how to avoid this.

class City(val name: String, val population: Int)

val capitals: List<City> = ListF.of(
    City("Amsterdam", 730000),
    City("Berlin", 3400000),
    City("Brussels", 450000),
    City("Lisbon", 670000),
    City("London", 7000000),
    City("Madrid", 3000000),
    City("Paris", 2200000),
    City("Stockholm", 720000)
)

fun namesOfCitiesOver1M(cities: List<City>): List<String> {
    val largeCities: List<City> = cities.filter{city -> (city.population > 1000000)}
    val nameLargeCities: List<String> = largeCities.map{city -> city.name}
    return nameLargeCities
}

assertEquals(ListF.of("Berlin", "London", "Madrid", "Paris"), namesOfCitiesOver1M(capitals))

The next example repeats the previous example. In the function citiesOver1M the two functions filter and map are uncurried versions of the member functions filter and map. The val binding largeCities is a concrete version of filter that transforms a List<City> into a List<City>. It is achieved by first currying filter then flipping its arguments so that the predicate function is the first parameter. We then partially apply it with the required predicate. We do the same for the val binding for cityNames. The namesOfCitiesOver1M binding then composes these two by pipelining the filtering through the mapping. The pipelining is achieved by forward composing (pipe) the functions largeCitiesC and cityNamesC so that output of the former is input into the latter.

This is sometime known as the point-free programming style in which the function namesOfCitiesOver1M does not include any reference to its (List) argument, but is defined using combinators and function composition.

class City(val name: String, val population: Int)

val capitals: List<City> = ListF.of(
    City("Amsterdam", 730000),
    City("Berlin", 3400000),
    City("Brussels", 450000),
    City("Lisbon", 670000),
    City("London", 7000000),
    City("Madrid", 3000000),
    City("Paris", 2200000),
    City("Stockholm", 720000)
)

val filter: (List<City>, (City) -> Boolean) -> List<City> = List<City>::filter
val map: (List<City>, (City) -> String) -> List<String> = {ls: List<City>, f: (City) -> String -> ls.map(f)}
val largeCities: (List<City>) -> List<City> = flip(C2(filter))({city: City -> (city.population > 1000000)})
val cityNames: (List<City>) -> List<String> = flip(C2(map))({city: City -> city.name})
val namesOfCitiesOver1M: (List<City>) -> List<String> = largeCities pipe cityNames

assertEquals(ListF.of("Berlin", "London", "Madrid", "Paris"), namesOfCitiesOver1M(capitals))

Function composition as shown provides a powerful mechanism for structuring designs: programs are written as a pipeline of operations, passing the appropriate data structure between each operation. Here, we pipe a List from the largestCities operation through to the cityNames operation. A number of such operations would imply constructing temporary List objects and for large Lists could prove an expensive process. A later blog from this series will investigate Streams as an alternative structure.

It is common to want to reduce a list to a single value. For example, we might wish to sum or multiply all the elements in a list. This is the basis for a particular kind of higher order function called folding. Folding is used to fold a function (such as a summation) into a list of values. Here are some simple examples:

assertEquals(10,        ListF.of(1, 2, 3, 4).foldLeft(0){x -> {y -> x + y}})
assertEquals(0,          ListF.empty<Int>().foldLeft(0){x -> {y -> x + y}})
assertEquals(500500,    ListF.closedRange(1, 1000).foldLeft(0){x -> {y -> x + y}})
assertEquals(ListF.of(1, 2, 3, 4),    ListF.of(1, 2, 3, 4).foldLeft(ListF.empty()){list -> {elem -> list.append(elem)}})


Folding is a surprisingly fundamental operation. Its generality allows us to define many standard list processing operations. In the following example the reverse operation is defined in terms of a foldLeft:

fun <A> reverse(list: List<A>): List<A> =
    list.foldLeft(ListF.empty(), flip(C2(ListF::cons)))

assertEquals(ListF.of(4, 3, 2, 1),                  reverse(ListF.of(1, 2, 3, 4)))
assertEquals(ListF.of("Jessie", "John", "Ken"),     reverse(ListF.of("Ken", "John", "Jessie")))




Higher Order Abstractions

Like the Option type, the List type is a functor (functions fmap, distribute, etc), an applicative functor (functions ap, product2, etc) and a monad (functions bind or flatMap). The following example demonstrates using function fmap on a List. Function fmap is defined in terms of the member function map described above.

assertEquals(ListF.of(1, 4, 9, 16),     ListF.of(1, 2, 3, 4).fmap{n -> n * n})

val replicate: (Int) -> (Int) -> List<Int> = {n -> {x -> ListF.replicate(n, x)}}
assertEquals(
    ListF.of(ListF.of(1, 1, 1), ListF.of(2, 2, 2), ListF.of(3, 3, 3), ListF.of(4, 4, 4)),
    ListF.of(1, 2, 3, 4).fmap(replicate(3))
)

Function fmap is accompanied with the infix equivalent dollar. This allows us the to use the transformation function as the first parameter:

val numbers: List<Int> = ListF.of(1, 2, 3, 4)

assertEquals(ListF.of(2, 3, 4, 5),      {n: Int -> n + 1} dollar numbers)
assertEquals(ListF.of(true, false, true, false),    {n: Int -> (isEven(n))} dollar ({n: Int -> n + 1} dollar numbers))

The following demonstrates how function ap draws every function from the parameter list and applies it to every value in the receiver list:

fun mul(m: Int, n: Int): Int = m * n
fun add(m: Int, n: Int): Int = m + n
fun power(m: Int, n: Int): Int {
    var pow: Int = 1
    for (k in 1..n)
        pow = pow * m
    return pow
}
val mul0: (Int) -> Int = C2(::mul)(0)
val add100: (Int) -> Int = C2(::add)(100)
val pow2: (Int) -> Int = flip(C2(::power))(2)

assertEquals(ListF.of(0, 0, 0, 101, 102, 103, 1, 4, 9),    ListF.of(1, 2, 3).ap(ListF.of(mul0, add100, pow2)))

The infix function fmap and appliedOver support presenting the logic in the applicative style:

val numbers4: List<Int> = ListF.closedRange(1, 4)
val numbers6: List<Int> = ListF.closedRange(1, 6)

assertEquals(
    ListF.of(2, 3, 4, 5, 6, 7, 3, 4, 5, 6, 7, 8, 4, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 10),
    {m: Int -> {n: Int -> m + n}} fmap numbers4 appliedOver numbers6
)

This short example shows the monadic bind operation used with lists. The bind operation is also known as flatMap:

assertEquals(ListF.of(-1, 1, -2, 2, -3, 3),     ListF.of(1, 2, 3).bind{n -> ListF.of(-n, n)})



Foldables

Foldable type class instances can be defined for data structures that can be folded to a summary value. Most collection types have foldLeft and foldRight functions. A left fold on lists was presented earlier.

Foldable provides us with a host of useful functions defined on top of foldLeft and foldRight. Many of these are facsimiles of familiar functions from the class: find, exists, toList and so on.

In addition to these familiar functions, Dogs provides two functions that make use of monoids: function fold combines all elements in the list using their monoid; function foldMap maps a user-supplied function over the list and combines the results using a monoid:

assertEquals(15, ListF.of(1, 2, 3, 4, 5).fold(intAddMonoid))
assertEquals(13, ListF.of("Ken", "John", "Jessie").foldMap(intAddMonoid){str: String -> str.length})
assertEquals("1223", ListF.of(1, 22, 3).foldMap(stringMonoid){n: Int -> n.toString()})

The intAddMonoid is a monoid over Ints with zero as its identity element and addition as the binary operation. The first assert adds the Ints in the list  to a start value zero. Function foldMap maps each element of the structure to a monoid, and combines the results. In the second assert we add the lengths of each string in the list. In the final assert each Int is converted to its string form then the strings are concatenated.



Traversables

Traversable structures are collections of elements that can be operated upon with an effectful visitor operation. The visitor function performs a side-effect on each element and composes those side effects whilst retaining the original structure. Since Kotlin does not support higher-kinded types we have various traverse operations named after the effectful type to be used. For example, the function traverseOption has the signature:

fun <A, B> List<A>.traverseOption(f: (A) -> Option<B>): Option<List<B>>

and traverses the List, applying the visitor function to every element and transforming each to an effectful Option type.

assertEquals(
    some(ListF.of(false, true, false, true)),
    ListF.of(1, 2, 3, 4).traverseOption{ n: Int -> some(isEven(n)) }
)

Sometimes you will be given a traversable that has effectful values already, such as a List<Option<A>>. Since the values themselves are effects, traversing with identity will turn the traversable inside out. This is result of the sequenceOption operation:

assertEquals(
    some(ListF.of(1, 2, 3, 4)),
    ListF.of(some(1), some(2), some(3), some(4)).sequenceOption()
)
assertEquals(
    none(),
    ListF.of(some(1), none(), some(3), some(4)).sequenceOption()
)

Here is the sequence operation with values of type Either:

assertEquals(
    right(ListF.of(1, 2, 3, 4)),
    ListF.of<Either<String, Int>>(right(1), right(2), right(3), right(4)).sequenceEither()
)
assertEquals(
    left("one"),
    ListF.of<Either<String, Int>>(left("one"), right(2), left("three"), right(4)).sequenceEither()
)

We can now traverse structures that contain strings parsing them into integers and accumulating failures with Either:

fun parseInt(str: String): Either<String, Int> =
    try {
        right(Integer.parseInt(str))
    } catch (ex: Exception) {
        left("parseInt: string $str not a parsable integer")
    }

assertEquals(
    right(ListF.of(12, 34, 56)),
    ListF.of("12", "34", "56").traverseEither{ str: String -> parseInt(str) }
)
assertEquals(
    left("parseInt: string ab not a parsable integer"),
    ListF.of("12", "ab", "56").traverseEither{ str: String -> parseInt(str) }
)



Case Study

We have seen the point-free style of programming as one in which functions do not identify the parameters on which they operate. Instead the definitions merely compose other function combinators. In this example we show a much more elaborate example. The solution is obtained by following the types when composing the functions. The solution is not optimal but demonstrates how such a development is followed.

The maximum segment sum problem computes the maximum of the sums of all segments in a sequence of integers. A segment is a contiguous sub-sequence of the integers. The sequence -1, 2, -3, 5, -2, 1, 3, -2, -2, -3, 6 has maximum sum 7, the sum of the segment 5, -2, 1, 3.

The problem is specified by:

maximumSegmentSum = segments pipe mapSum pipe maximum

where segments returns a list of all segments of a list with the type List<List<Int>>. The function mapSum computes the sum of each of the segments delivering a List<Int>. Function maximum then finds the greatest in this list.

The function segments can be defined in a number of ways including:

segments = tails pipe mapInits pipe concat

where tails returns all the tail segments of a list. For example the tails of the list [1, 2, 3] is the nested list [[1, 2, 3], [2, 3], [3], []]. Function mapInits is then applied to this List<List<Int>>. Function inits is the counterpart to function tails. When mapInits is applied to the nested list produced from tails we get [[[], [1], [1, 2], [1, 2, 3]], [[], [2], [2, 3]], [[], [3]], []]. Finally, function concat flattens the nested list by one level producing [[], [1], [1, 2], [1, 2, 3], [], [2], [2, 3], [], [3]].

val maximum: (List<Int>) -> Int = {list -> list.foldRight1{m -> {n -> max(m, n)}}}
val mapSum: (List<List<Int>>) -> List<Int> = {list -> list.map(List<Int>::sum)}
val mapInits: (List<List<Int>>) -> List<List<List<Int>>> = {list -> list.map(List<Int>::inits)}
val tails: (List<Int>) -> List<List<Int>> = {list -> list.tails()}
val concat: (List<List<List<Int>>>) -> List<List<Int>> = {list -> shallowFlatten(list)}
val segments: (List<Int>) -> List<List<Int>> = tails pipe mapInits pipe concat
val maximumSegmentSum: (List<Int>) -> Int = segments pipe mapSum pipe maximum

assertEquals(7,     maximumSegmentSum(ListF.of(-1, 2, -3, 5, -2, 1, 3, -2, -2, -3, 6)))



Performance

Linked lists are extremely efficient for sequentially accessing elements from the head end or for adding elements to that head end. They are not designed for accessing elements by index or for some bulk operation like reversing a list.

Immutable and persistent linked lists are missing from Kotlin. As we have shown they are easy to implement and it would benefit Kotlin if they were provided as standard. They are much better for some use cases than what Kotlin offers. For example, adding an element in front of a non-modifiable list (kotlin.collections.List interface) is painful and extremely inefficient. Adding an element to the front raises many awkward questions: how should we do it?; should we use a mutable list?; what if we need to share the list or reuse it later?; and should we make defensive copies?

Using a non-modifiable list as if it were immutable, or using a mutable list when you need an immutable one, can be more problematic than using an immutable, data sharing linked lists. In the following the first two asserts reveal that the MutableList member function subList returns a view of the portion of mut and any change to either mut or sub is reflected in the other. This is a consequence of MutableList having an ArrayList implementation. The solution would be to make a defensive copy of sub (which is slow) before changing the first element.

val mut: MutableList<Int> = mutableListOf(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
val sub: MutableList<Int> = mut.subList(0, 4)
assertEquals(mutableListOf(0, 1, 2, 3), sub)

sub[0] = 99
assertEquals(mutableListOf(99, 1, 2, 3, 4, 5, 6, 7, 8, 9), mut)



fun List<Int>.subList(from: Int, to: Int): List<Int> =
    this.drop(from).take(to - from)

val list: List<Int> = ListF.of(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
val subList: List<Int> = list.subList(0, 4)
assertEquals(ListF.of(0, 1, 2, 3), subList)

val subList99: List<Int> = subList.update(0, 99)
assertEquals(ListF.of(0, 1, 2, 3), subList)
assertEquals(ListF.of(99, 1, 2, 3), subList99)
assertEquals(ListF.of(0, 1, 2, 3, 4, 5, 6, 7, 8, 9), list)

The final four asserts demonstrate that immutable persistent linked lists are much safer and much more performant.

A benchmark comparing our immutable persistent linked list, Kotlin's non-modifiable list (read-only interface), Kotlin's mutable list and the PersistentList class from the Immutable Collections Library for Kotlin at [https://github.com/Kotlin/kotlinx.collections.immutable]. The timings measure (in ms) inserting 10000 new elements to the start of the collection:

Insert 10000 elements to the start of our immutable/persistent List:    0.5
Insert 10000 elements to the start of Kotlin's MutableList type:    10.8
Insert 10000 elements to the start of Kotlin's List type:                        435.3
Insert 10000 elements to the start of kotlinx PersistentList type:        291.2

The table reveals how slow is Kotlin's non-modifiable List type.

In the next table we show the timings to filter all the even valued integers from integer lists containing 1, 2, 3, ..., 10000. Our linked list is slower by a factor of 2.

Filter all even-valued integers from our immutable/persistent List: 0.71
Filter all even-valued integers from Kotlin's MutableList type: 0.33
Filter all even-valued integers from Kotlin's List type: 0.33
Filter all even-valued integers from kotlinx PersistentList type: 0.29

The following table presents the timings to insert 10000 new elements to the end of a collection. Clearly, linked lists are not suited for operations of this type..

Insert 10000 elements to the end of our immutable/persistent List:    1547.8
Insert 10000 elements to the end of Kotlin's MutableList type:        0.4
Insert 10000 elements to the end of Kotlin's List type:                        451.8
Insert 10000 elements to the end of kotlinx PersistentList type:        2.5



The code for the Dogs library can be found at:

https://github.com/KenBarclay/TBA