Monday, February 28, 2022

Kotlin: Validation type

Kotlin: Validation type

Any system will have errors and how we handle them is important. Consistent and transparent error handling is critical to any production ready system. Here we explore the functional approach to error handling, developing techniques to capture errors elegantly without contaminating your code with ugly conditionals and try/catch clauses. We consider the Validation class included in the custom library Dogs.



Motivation

Validation can be found in different forms when error(s) are detected. Validation can return immediately when the first error (or exception) has been encountered; the validation result may or may not contain the validation error or exception message. This scenario is called fast failing validation, in which the validation does not validate all the business rules and only zero or one message is returned, and the process shall be cut short upon first error. This simple form of validation is sometimes considered insufficient, as a full validation is not carried out with accumulated errors. See the Either type blog on fast failing.

Validating all the business rules, and accumulating errors, is very different from fast failing validation. Applicative functors are proposed, and they have effectively solved accumulation problems. The Validation data type has an instance of applicative that accumulates on the error side.

The Validation data type is isomorphic to Either. Validation is a container type with two type parameters: A Validation<E, A> instance can contain either an instance of E, or an instance of A (a disjoint type). Validation has exactly two sub types, Failure and Success. If a Validation<E, A> object contains an instance of E, then the Validation is a Failure. Otherwise it contains an instance of A and is a Success. The Validation sealed class is:

sealed class Validation<out E, out A> {

    class Failure<out E, out A>internal constructor(val value: E) : Validation<E, A>()
    class Success<out E, out A>internal constructor(val value: A) : Validation<E, A>()

}

As before, the constructors for Failure and Success are internal and we create instances with the factory functions:

val failed: Validation<String, Int> = failure("bug")
val succeeded: Validation<String, Int> = success(25)

assertEquals(true,      failed.isFailure())
assertEquals(false,     succeeded.isFailure())

Validation is a general purpose type for use with error handling:

fun parseInt(text: String): Validation<String, Int> =
    try {
        success(Integer.parseInt(text))
    } catch (ex: NumberFormatException) {
        failure("parseInt: bad number format: $text")
    }

assertEquals(success(123),                                  parseInt("123"))
assertEquals(failure("parseInt: bad number format: ken"),   parseInt("ken"))



Pattern matching

Like in the Option class the ValidationFailure and Success type names can be used in application code. The type names can be used in user-defined functions. The function swap supports mapping over both arguments at the same time. Its signature is shown in the following code.

fun <E, A> Validation<E, A>.swap(): Validation<A, E> {
    return when(this) {
        is Failure -> success(this.value)
        is Success -> failure(this.value)
    }
}   // swap

val failed: Validation<String, Int> = failure("bug")
val succeeded: Validation<String, Int> = success(25)

assertEquals(success("bug"),    failed.swap())
assertEquals(failure(25),       succeeded.swap())


The function swap flips the Failure/Success values.



Some Validation operations

The Validation class includes many of the functions we saw with the Either class. Validation class functions include exists, fold and map. The map function has the signature:

fun <B> map(f: (A) -> B): Validation<E, B>

and applies function if this is a Success.

val failed: Validation<String, Int> = failure("bug")
val succeeded: Validation<String, Int> = success(25)

assertEquals(failure("bug"),    failed.map(isEven))
assertEquals(success(false),    succeeded.map(isEven))
assertEquals(success(true),     succeeded.map(isOdd))


The Validation type is also an applicative functorThe Validation type as an applicative functor includes the extension function ap:

fun <E, A, B> Validation<E, A>.ap(se: Semigroup<E>, f: Validation<E, (A) -> B>): Validation<E, B>

which applies the wrapped function to the receiver Validation. If two or more errors are encountered they are combined using the semigroup instance.

val failed: Validation<String, Int> = failure("bug")
val succeeded: Validation<String, Int> = success(25)

assertEquals(failure("bug"),      failed.ap(stringSemigroup, success{n: Int -> isEven(n)}))
assertEquals(success(false),      succeeded.ap(stringSemigroup, success{n: Int -> isEven(n)}))


When mapping functions over the Validation functor with fmap/map we have provided a unary function for the mapping. What do we get if we provide a curried binary function? The answer is we get a unary function wrapped in a Validation as shown for the value binding vf and vs.

val failed: Validation<String, Int> = failure("bug")
val err: Validation<String, Int> = failure("error")
val succeeded: Validation<String, Int> = success(25)

val vf: Validation<String, (Int) -> Int> = {m: Int -> {n: Int -> m + n}} dollar failed
val vs: Validation<String, (Int) -> Int> = {m: Int -> {n: Int -> m + n}} dollar succeeded

assertEquals(success(50),           succeeded.ap(stringSemigroup, vs))
assertEquals(failure("errorbug"),   err.ap(stringSemigroup, vf))

Suppose we need to validate a person's name and age where the values are supplied through two text values. The classes we wish to construct from the text are:

data class Name(val name: String)
data class Age(val age: Int)
data class Person(val name: Name, val age: Age)


A name is valid provided it is non-empty and an age is valid if it is non-negative. The checks are made with the functions makeName and makeAge:

fun makeName(name: String): Validation<String, Name> =
    if (name == "") failure("Name is empty") else success(Name(name))

fun makeAge(age: Int): Validation<String, Age> =
    if (age < 0) failure("Age out of range") else success(Age(age))


We are using Validation with the result of these two validation functions. To combine the results we use the applicative fmap2 function, a binary version of fmap:

fun makePerson(name: String, age: Int): Validation<String, Person> =
    fmap2(stringSemigroup, makeName(name), makeAge(age)){name: Name -> {age: Age -> Person(name, age)}}

In the first assert we have an invalid age. In the second assert we have an invalid name and an invalid age. The string semigroup concatenates the two failure messages. The final assertion makes a valid Person instance.

assertEquals(failure("Age out of range"),               makePerson("Ken", -5))
assertEquals(failure("Age out of rangeName is empty"),  makePerson("", -5))
assertEquals(success(Person(Name("Ken"), Age(25))),     makePerson("Ken", 25))



Case Study

We repeat here the second case study introduced in the Either type blog. In this a user form comprises text fields for the user last name, the first name and the email. The requirements are that the last name is fully capitalized, the first name is capitalized on the initial letter, and that the email be correctly structured. After accepting the three fields, objects for the classes LastNameFirstName and Email are constructed then used to create an instance of the Person class.

The standard technique for the construction of objects that need to honor a set of constraints is the smart constructor idiom. This is illustrated for the class LastName, using Validation for the error handling. Since it is meaningless to have a Failure with no errors then we use the type Failure<NonEmptyList<Error>, A> where Error is some error type. We start with:

typealias Error = String
typealias ValidationNelError<A> = ValidationNel<Error, A>

Typically, Error would be a sealed data class with sub-types for the various error types. The type name ValidationNel is provided by Dogs and is an alias for Validation<NonEmptyList<E>, A>. The application alias ValidationNelError provides a compact representation for Validation<NonEmptyList<Error>, A>.

The class LastName is:

data class LastName(val lastName: String) {

    companion object {

        fun create(lastName: String): ValidationNelError<LastName> {
            return if (lastName.length < 2)
                failureNel("Last name too short: $lastName")
            else if (!regex.matches(lastName))
                failureNel("Last name malformed: $lastName")
            else
                successNel(LastName(lastName))
        }   // create

        private val regex: Regex = Regex("[A-Z][A-Z]*")
    }

}   // LastName

The same scheme is used for the classes FirstName and Email.

Since the Either class is a monad we used the monadic bind to implement the smart constructor for the Person class. However, the Validation class is not a monad and we have no bind function. The issue is that the applicative functor implied by Validation being a monad does not equal the applicative functor defined on Validation.

We also know that the monad fails fast and can only identify the first error. Applicatives allow us to compose independent operations and evaluate each one. Even if an intermediate evaluation fails. This allows us to collect error messages instead of returning only the first error that occurred. A classic example where this is useful is the validation of user input. We would like to return a list of all invalid inputs rather than aborting the evaluation after the first error. We see this in the Person class:

data class Person(val lastName: LastName, val firstName: FirstName, val email: Email) {

    companion object {

        fun create(lastName: String, firstName: String, email: String): ValidationNelError<Person> {
            val vLastName: ValidationNelError<LastName> = LastName.create(lastName)
            val vFirstName: ValidationNelError<FirstName> = FirstName.create(firstName)
            val vEmail: ValidationNelError<Email> = Email.create(email)

            val ctor: (LastName) -> (FirstName) -> (Email) -> Person = C3(::Person)
            return ctor dollar vLastName appliedOver vFirstName appliedOver vEmail
        }   // create

    }

}   // Person

and in the following three examples:

assertEquals(
    success(Person(LastName("BARCLAY"), FirstName("Ken"), Email("me@gmail.com"))),
    Person.create("BARCLAY", "Ken", "me@gmail.com")
)
assertEquals(
    failure(NonEmptyListF.singleton("First name malformed: kenneth")),
    Person.create("BARCLAY", "kenneth", "me@gmail.com")
)
assertEquals(
    failure(NonEmptyListF.of("First name malformed: kenneth", "Last name malformed: Barclay")),
    Person.create("Barclay", "kenneth", "me@gmail.com")
)

Observe how the third example identifies the malformed first name and the malformed last name.



The code for the Dogs library can be found at:

https://github.com/KenBarclay/TBA


Sunday, February 27, 2022

Kotlin: Either type

Kotlin: Either type

Just as its english counterpart describes, Either can represent one value or another. Scenarios where this might be the return value from a function where you may get the successful result value or you might get an error value. The class is included in the custom library Dogs.



The Option type allows us to represent failures and exceptions with ordinary values and provide functions that abstract out common patterns of  error handling and recovery. The functions bind and ap over the Option type did just this. One issue with the Option type is that it does not report what is wrong in an exceptional case. All we have is None, indicating that there is no value that can be delivered.

Either is a container type with two type parameters: An Either<A, B> instance can contain either an instance of A, or an instance of B (a disjoint type)Either has exactly two sub types, Left and Right. If an Either<A, B> object contains an instance of A, then the Either is a Left. Otherwise it contains an instance of B and is a Right. The Either sealed class is:

sealed class Either<out A, out B> {

    class Left<out A, out B>internal constructor(val value: A) : Either<A, B>()

    class Right<out A, out B>internal constructor(val value: B) : Either<A, B>()

}

As before, the constructors for Left and Right are internal and we create instances with the factory functions:

assertEquals(true,      left<String, Int>("ken").isLeft())
assertEquals(false,     right<String, Int>(25).isLeft())

Either is a general purpose type for use whenever you need to deal with a result that can be one of two possible values. Nevertheless, error handling is a popular use case for it, and by convention Left represents the error case and Right the success value:

fun parseInt(text: String): Either<String, Int> =
    try {
        right(Integer.parseInt(text))
    } catch (ex: NumberFormatException) {
        left("parseInt: bad number format: $text")
    }

assertEquals(right(123),                                parseInt("123"))
assertEquals(left("parseInt: bad number format: ken"),  parseInt("ken"))



Pattern matching

Like the Option class the Either, Left and Right type names can be used in application code. The type names can be used in user-defined functions. The function bimap supports mapping over both arguments at the same time. Its signature is shown in the following code.

fun <A, B, C, D> bimap(either: Either<A, B>, f: (A) -> C, g: (B) -> D): Either<C, D> =
    when (either) {
        is Either.Left -> left(f(either.value))
        is Either.Right -> right(g(either.value))
    }   // bimap

class DomainError(val text: String?) : Exception(text)

val currentDate: Either<Exception, Calendar> =     // simple definition
    right(GregorianCalendar(2020, 4, 1))

val res: Either<Exception, Long> = bimap(currentDate,
    {ex -> DomainError(ex.message)}, 
    {date -> date.timeInMillis}
)
assertEquals(right(1588287600000),   res)


The function f is applied to the wrapped value if it is a Left instance and the function g is applied to the wrapped value if it is a Right instance. The val binding for currentDate wraps a calendar value in a Right. Applying the bimap function to it we make a DomainError if the first parameter is a Left. If the first parameter is a Right we convert the wrapped calendar to its milliseconds.



Some Either operations

The Either class includes many of the functions we saw with the Option class. Either class functions include exists, fold, getOrElse and map. The fold function has the signature:

fun <C> fold(fa: (A) -> C, fb: (B) -> C): C

and applies function fa if this is a Left or function fb if this is a Right. A consequence is that bimap is a special case of fold as in:

class DomainError(val text: String?) : Exception(text)

val currentDate: Either<Exception, Calendar> =     // simple definition
        right(GregorianCalendar(2020, 4, 1))

val res: Either<Exception, Long> = currentDate.fold(
    {ex -> left(DomainError(ex.message))},
    {date -> right(date.timeInMillis)}
)
assertEquals(right(1588287600000),   res)


The Either type is right-biased, so functions such as map and bind apply only to the Right case. This right-bias makes Either convenient in, for example, a monadic context.

assertEquals(left("Ken"),  left<String, Int>("Ken").map{n -> 2 * n})
assertEquals(right(4),     right<String, Int>(2).map{n -> 2 * n})




Case Study

A Project represents some work hosted on a repository service such as GitHub. Each Project records its URL and the list of contributors. Our purpose is to identify those projects with no contributors and in need of support, and at the same time a list of all contributors. Here is how we process our project list:

class Project(val url: URL, val contributors: List<String>)

val git: List<Project> = ListF.of(
    Project(URL("http://github.com/project/ai"),           ListF.of()),
    Project(URL("http://github.com/project/algol"),        ListF.of("EdsgerD", "JohnB", "PeterN", "KenB")),
    Project(URL("http://github.com/project/antlr"),        ListF.of("TerranceP")),
    Project(URL("http://github.com/project/data"),         ListF.of("KenB", "JohnS")),
    Project(URL("http://github.com/project/ml"),           ListF.of()),
    Project(URL("http://github.com/project/system"),       ListF.of("BrianK", "DennisR"))
)

val checked: List<Either<URL, List<String>>> =
    git.map{project ->
        if (project.contributors.isEmpty())
            left<URL, List<String>>(project.url)
        else
            right<URL, List<String>>(project.contributors)
    }

val support: List<Option<URL>> = checked.bind{either -> ListF.singleton(either.fold({ url -> some(url)}, {none()}))}
val supportReq: List<Option<URL>> = support.filter{option: Option<URL> -> (option.isDefined())}
val supportRequired: Option<List<URL>> = supportReq.sequenceOption()

assertEquals(
    ListF.of(URL("http://github.com/project/ai"), URL("http://github.com/project/ml")),
    supportRequired.getOrElse(ListF.empty())
)

val supporters: List<String> = checked.bind{either -> either.fold({ListF.empty<String>()}, {names -> names})}.removeDuplicates()

assertEquals(
    ListF.of("EdsgerD", "JohnB", "PeterN", "KenB", "TerranceP", "JohnS", "BrianK", "DennisR"),
    supporters
)


We create in checked a list of Either values, with the Left instances representing unsupported projects and the Right instances containing the contributors. The sequence of val bindings support is a List of Options wrapping the URL; supportReq is also a List of Options but without any None instances; supportRequired is a List of URLs wrapped in an Option. The val binding supporters is the List of contributors.



Case Study

A user input form comprises text fields for the user last name, the first name and the email. The requirements are that the last name is fully capitalized, the first name is capitalized on the initial letter, and that the email be correctly structured. After accepting the three text fields for the classes, LastNameFirstName and Email are constructed then used to create an instance of the Person class.

The standard technique for the construction of objects that need to honor a set of constraints is the smart constructor idiom. This is illustrated for the class LastName, using Either for the error handling. We start with:

typealias Error = String

data class LastName(val lastName: String) {

    companion object {

        fun create(lastName: String): Either<Error, LastName> {
            return if (lastName.length < 2)
                left("Last name too short: $lastName")
            else if (!regex.matches(lastName))
                left("Last name malformed: $lastName")
            else
                right(LastName(lastName))
        }   // create

        private val regex: Regex = Regex("[A-Z][A-Z]*")
    }

}   // LastName

Typically, Error would be a sealed class with sub-types for the various error types. Also we might make the constructor for LastName private so that users must use function create.

The same scheme is used for the classes FirstName and Email.

Since the Either class is a monad we used the monadic bind to implement the smart constructor for the Person class:

data class Person(val lastName: LastName, val firstName: FirstName, val email: Email) {

    companion object {

        fun create(lastName: String, firstName: String, email: String): Either<Error, Person> {
            val eLastName: Either<Error, LastName> = LastName.create(lastName)
            val eFirstName: Either<Error, FirstName> = FirstName.create(firstName)
            val eEmail: Either<Error, Email> = Email.create(email)

            return eLastName.bind{lastName ->
                eFirstName.bind{firstName ->
                    eEmail.bind{email ->
                        inject(Person(lastName, firstName, email))
                    }
                }
            }
        }   // create

    }

}   // Person

We know that the monad fails fast and so can only identify the first error. We see this in the second and third assert:

assertEquals(
    right(Person(LastName("BARCLAY"), FirstName("Ken"), Email("me@gmail.com"))),
    Person.create("BARCLAY", "Ken", "me@gmail.com")
)
assertEquals(
    left("First name malformed: kenneth"),
    Person.create("BARCLAY", "kenneth", "me@gmail.com")
)
assertEquals(
    left("Last name malformed: Barclay"),
    Person.create("Barclay", "kenneth", "me@gmail.com")
)



The code for the Dogs library can be found at:

https://github.com/KenBarclay/TBA

Monday, August 17, 2020

Kotlin: Parser Combinator

 Kotlin: Parser Combinator

A parser combinator is a higher-order function that accepts several parsers as input and returns a new parser as its output. In this context, a parser is a function accepting strings as input and returning some structure as output, typically a parse tree. Parser combinators enable a recursive descent parsing strategy that facilitates modular piecewise construction and testing. This parsing technique is called combinatory parsing.

In a programming language such as Kotlin that has first-class functions, parser combinators can be used to combine basic parsers to construct parsers for more complex rules. For example, a production rule of a context-free grammar (CFG) may have one or more alternatives and each alternative may consist of a sequence of non-terminal(s) and/or terminal(s). If a simple parser is available for each of these alternatives, a parser combinator can be used to combine each of these parsers, returning a new parser which can recognise any or all of the alternatives.

This blog describes a parser combinator written in Kotlin. The features of Kotlin made it possible to mirror much of the original Haskell code from which it was developed.



ParsecT

ParsecT is the main parser and the central data type in this library. The class is parametrized and defined as:

class ParsecT<TOK, U, A, Z>

where TOK is the type of elements in the input stream (commonly Char), U is the user state, A is the return type, and Z is the value type when an instance is invoked. The class is represented by a single function type property and the member functions deliver a new instance with a suitably modified function property. Hence the class represents a lazy object which only executes when it is finally invoked.

The most convenient way to define various parsers is to define a custom type synonym for your parser. The library defines a typealias entitled Parser:

typealias Parser<A> = ParsecT<Char, Unit, A, Unit>

where the return type A is still a parameter and the input stream is a sequence of Chars.

The object declaration:

object ParsecTF

includes a set of related functions that either operate on one or more ParsecT parameters or delivers a ParsecT instance.



Object declarations CombinatorF, CharacterF and StringF

The object declaration CombinatorF includes a number of functions used to combine parsers. For example, function many parses none or more occurrences of a given parser. Its signature is:

fun <TOK, U, A, Z> many(parsec: ParsecT<TOK, U, A, Z>): ParsecT<TOK, U, List<A>, Z>

Note that the return type for the generated parser is a List<A> where the given parser returns the type A.

The object declaration CharacterF includes a number of val bindings and function declarations for parsing individual characters. The function declarations in CharacterF are like function oneOf which returns a parser that succeeds if the current input character is in the sequence of character parameters. Its signature is:

fun oneOf(vararg chars: Char): Parser<Char>

The val bindings are predefined character parsers. For example, the parser colon will recognize a ':' character in the input. The parser dollar will recognize a '$' character in the input. The parser digit will recognize a single decimal digit in the input.

The object declaration StringF includes the function parse to execute a parser against some input given as a String. The declaration is:

fun <A> parse(parsec: Parser<A>, name: String, text: String): Either<ParseError, A>

The parsec parameter is the parser to execute, the text is the input the parser operates on and name is used to describe the source. The name might be the name of the file from which the text is obtained.

The return type is either a parse error or the return value from the parser. If the parser fails then the return is a Left instance wrapped around a ParseError. If the parser succeeds then the return is a Right wrapped around the parser return.



The Basics

Most of the val bindings in CharacterF are defined with the member function satisfy:

fun satisfy(predicate: (Char) -> Boolean): Parser<Char>

This looks at the next character from the current input, and if the predicate function returns true for this character, it 'pops' it from the input and returns it. In this way, the current position in the input string is tracked behind the scenes. A sample use of this function is:

val pA: Parser<Char> = satisfy{ch: Char -> (ch == 'A')}
assertEquals(right('A'),  parse(pA, "", "ABC"))
assertTrue(parse(pA, "", "BC").isLeft())
println(parse(pA, "", "BC"))

with the final print producing:

Left(ParseError(sourcePosition=(1, 1), messages=<[SysUnExpect: B]>))

with the error on the first character (of the first line) with the unexpected character 'B'.



CharacterF.character

This function returns a parser that matches the current character in the text that we are parsing to whatever character you provide it. Its signature is:

fun character(ch: Char): Parser<Char>

Sample use of this function is:

val pCharacter: Parser<Char> = character('H')
assertEquals(right('H'),    parse(pCharacter, "", "Hello"))
assertTrue(parse(pCharacter, "", "Goodbye").isLeft())

Here, running character('H') returns a parser that will match the single character 'H'. If we use it in parsing our string, which begins with an H, it is perfectly happy. If we try looking for any letter that isn't 'H', we fail. The result is always of type Either<ParseError, Char>; we get back a Right result if the rule was successful, and Left error if the rule fails.



CharacterF.string

This function returns a parser that attempts to match the given string of characters:

val pString: Parser<String> = string("Hello")
assertEquals(right("Hello"),    parse(pString, "", "Hello world"))
assertTrue(parse(pString, "", "Help").isLeft())

The parser will consume characters from the input one by one until all characters match or one of them is not as expected. Since both of the above attempts begin with 'Hel', the error complains about an unexpected 'p'. This consuming of characters will become significant when multiple rules are chained together.



CharacterF.oneOf

Sometimes we want to match a range of characters; that's where CharacterF.oneOf comes in handy. Similar to CharacterF.character, except you provide this function a list of characters you're OK with matching:

val pVowels: Parser<Char> = oneOf('a', 'e', 'i', 'o', 'u')
assertEquals(right('o'),    parse(pVowels, "", "object"))
assertTrue(parse(pVowels, "", "function").isLeft())

Here, we can see that the parser will consume any single character that is a vowel.

Parsec comes with pre-prepared rules which do just that, for example CharacterF.anyChar which will consume one of anything:

val pSymbols: Parser<Char> = anyCharacter
assertEquals(right('H'),    parse(pSymbols, "", "Hello"))
assertEquals(right('+'),    parse(pSymbols, "", "+-*/"))

The rule CharacterF.letter will happily consume any lower or uppercase letter, CharacterF.lower will consume any lowercase letter, CharacterF.digit will consume any single decimal digit, and CharacterF.alphaNum any letter or digit.



CombinatorF.many

There are times when you want to parse more than just one letter. CombinatorF.many tries to run whatever rule it is given as an argument repeatedly until it fails. Even if the rule matches no times, many will return without error, but just give back an empty result. Let's see how it's used:

assertEquals(right(ListF.of('h', 'h', 'h')),                    parse(many(character('h')), "", "hhheee"))
assertEquals(right(ListF.of()),                                  parse(many(character('e')), "", "hhheee"))
assertEquals(right(ListF.of('h', 'h', 'h', 'e', 'e', 'e')),    parse(many(letter), "", "hhheee"))

As we see,CombinatorF.many can't fail, it will match the rule it's given zero times and just return nothing. It'll go as far as it can and give you back everything that matched. CombinatorF.many1 is similar except that the rule provided must match at least once for it to return successfully.



ParsecT.fmap

The ParsecT data type is a functor by supporting the member function fmap which, given any types A and B lets you apply the function (A) -> B to a ParsecT<TOK, U, A, Z> delivering a ParsecT<TOK, U, B, Z>. Thus we can use fmap to transform the return value of a parser:

val pTab: Parser<Char> = tab
val pSpaces: Parser<String> = pTab.fmap{ch: Char -> "    "}
assertEquals(right("    "),     parse(pSpaces, "", "\t"))
assertTrue(parse(pSpaces, "", "A").isLeft())

Combining fmap, many1 and the single character parser digit we can parse a series of decimal digits into an integer:

val pInteger: Parser<Int> = many1(digit).fmap{digs: List<Char> ->
    digs.foldLeft(0){acc: Int -> {ch: Char -> 10 * acc + (ch.toInt() - '0'.toInt())}}
}
assertEquals(right(123),    parse(pInteger, "", "123"))
assertTrue(parse(pInteger, "", "ABC").isLeft())



Combining rules

The simplest way to combine parsers is to execute them in succession. ParsecT is a monad, and monadic bind is exactly what we use for sequencing our parsers. In the following example we sequence three character parsers to recognize ABC:

val pA: Parser<Char> = character('A')
val pB: Parser<Char> = character('B')
val pC: Parser<Char> = character('C')
val pSequence: Parser<Triple<Char, Char, Char>> =
        pA.bind{chA: Char ->
            pB.bind{chB: Char ->
                pC.bind{chC: Char ->
                    inj(Triple(chA, chB, chC))
                }
            }
        }

assertEquals(right(Triple('A', 'B', 'C')),  parse(pSequence, "", "ABC"))

The object declaration ParsecTF includes the function mapM3 which supports an alternative syntax for sequential execution of monadic parsers:

val pA: Parser<Char> = character('A')
val pB: Parser<Char> = character('B')
val pC: Parser<Char> = character('C')
val pSequence: Parser<Triple<Char, Char, Char>> =
        mapM3(pA, pB, pC){chA: Char -> {chB: Char -> {chC: Char -> Triple(chA, chB, chC)}}}

assertEquals(right(Triple('A', 'B', 'C')),  parse(pSequence, "", "ABC"))



Kotlin package declaration parsing

In this section we develop a simple parser for a Kotlin package declaration. A typical package declaration is:

package com.adt.kotlin.parsec

comprising the package keyword, one or more spaces and a qualified name. First we define the parsers for the keyword and the dot separator:

val pPackageKeyword: Parser<String> = StringF.lexeme(string("package"))
val pDot: Parser<Char> = character('.')

In the original parsec documentation, one of the concepts mentioned is the idea of 'lexeme' parsing. This is a style in which every token parser should also consume and ignore any trailing whitespace. This is a simple convention which with a bit of care allows skipping whitespace exactly once wherever it needs to be skipped. The function lexeme performs this task, skipping any whitespace following the package keyword.

An identifier must start with a letter or underscore, and then followed by zero or more letters, underscores or digits in any combination:

val leading: Parser<Char> = satisfy{ch: Char -> Character.isJavaIdentifierStart(ch)}
val trailing: Parser<Char> = satisfy{ch: Char -> Character.isJavaIdentifierPart(ch)}
val pIdentifier: Parser<String> =
        leading.bind{leadingChar: Char ->
            many(trailing).bind{trailingChars: List<Char> ->
                inj(ListF.cons(leadingChar, trailingChars).charsToString())
            }
        }   // pIdentifier

A qualified identifier comprises one or more identifiers separated by a dot.

val pQualifiedName: Parser<QualifiedName> =
        CombinatorF.sepBy1(pIdentifier, pDot).fmap{names: List<String> -> QualifiedName(names)}

Function sepBy1 takes two arguments, a rule to parse items, and a rule to parse separators and returns a list of whatever the rule returns. Using fmap we convert this into a QualifiedName:

data class QualifiedName(val names: List<String>)

The parser for a package declaration is then:

val pPackageDeclaration: Parser<PackageDeclaration> =
        pPackageKeyword.bind{_: String ->
            pQualifiedName.bind{qualifiedName: QualifiedName ->
                inj(PackageDeclaration(qualifiedName))
            }
        }   // pPackageDeclaration

where:

data class PackageDeclaration(val qualifiedName: QualifiedName)

We demonstrate it with:

assertEquals(right(PackageDeclaration(QualifiedName(ListF.of("test", "kotlin")))),
    parse(pPackageDeclaration, "", "package test.kotlin"))
assertTrue(parse(pPackageDeclaration, "", "package test.*").isLeft())
assertTrue(parse(pPackageDeclaration, "", "package").isLeft())
assertTrue(parse(pPackageDeclaration, "", "import test.kotlin").isLeft())



The code for Parsec can be found at:

https://github.com/KenBarclay/Parsec