Thursday, May 21, 2020

Kotlin: Property Based Testing

Kotlin: Property Based Testing

In this blog we introduce property based testing, how it works, show where it might fit into your testing strategy, and walk you through building property based tests with the custom KwikCheck testing framework. In the follow-on blog we look at property based testing of immutable and persistent data structures and property based testing of stateful systems.


In traditional unit testing, you run your code with specific inputs and check that you get the expected outputs. This means you need to anticipate possible problematic inputs in advance. In the following illustration we unit test a newly developed sort function by trying it with a list of mixed values; a list where the values are already in sort order; a list with duplicates present; a singleton list and an empty list. With the correct results produced we declare our function correct.

sort([4, 2, 7, 3, 1]) == [1, 2, 3, 4, 7]
sort([1, 2, 3, 4, 7]) == [1, 2, 3, 4, 7]
sort([4, 2, 7, 2, 1]) == [1, 2, 2, 4, 7]
sort([7]) == [7]
sort([]) == []


One of the challenges of traditional unit testing is that it represents example based testing. Each unit test essentially establishes specific input data points, runs the code under test, and then checks the output. Writing many of these tests does not mean we will account for all possible behaviors. As the complexity of our software grows so does the probability that we will fail to account for a key scenario.

Property-based testing is a technique where your tests describe the properties and behavior you expect your code to have, and the testing framework tries to find inputs that violate those expectations. Rather than testing a function against specific inputs we try to reason about its behavior over a range of inputs.

Test properties are presented as logical propositions. For example, the stack data type is our standard last-in-first-out data store. Operation push injects a new element on to the stack top, while operation top delivers the item at the stack top. For all stack st and all element el the following proposition is a statement that affirms or denies the predicate:

top(push(el, st)) == el

We test this property embedded in a Unit test framework. The forAll function accepts a generator that produces a randomized stack containing randomized integer values, and a generator that delivers randomized integer values. The generated stack and the generated integer are captured in the lambda parameter to forAll as st and el respectively. Function prop expects a predicate for our logical proposition and wraps it into a Property instance. Function check then runs the test and delivers a CheckResult value which, since we are using a Unit test harness, we can complete with an assert.

@Test fun stackOperation() {
    val property = forAll(genStack(genInt), genInt) { st, el ->
        prop(st.push(el).top() == el)
    }
    val checkResult: CheckResult = property.check()
    //assertTrue(checkResult.isPassed())
    println(checkResult.summary())
}

However, for the purpose of this blog, it is more convenient to print the result from the function summary:

OK, passed 100 tests

which informs us that 100 (the default) tests were generated and that the property holds.

Property-based testing was popularized by the Haskell library QuickCheck. You specify properties of your code that should hold regardless of input. QuickCheck then randomly generates the input data to try to find a case for which the property fails. This form of testing became known as property based testing and is actually the principal form of testing in the Haskell ecosystem.

The KwikCheck test framework for Kotlin (as described here) is modeled after the QuickCheck framework. Property-based testing is generative testing. You do not supply specific example inputs with expected outputs as with unit tests. Instead, you define properties about the code and use the generative-testing engine to create randomized inputs to ensure the defined properties are correct. KwikCheck allows us to express these properties and generate the randomized inputs.

Automated test generation means, if you've got a function that operates on a particular input, you specify a property (or several properties) about that function, and then generators will give you variations on the input data to test the function, the edge cases, etc., and tell you whether the property holds. You can use the default generators for known types provided by KwikCheck, or define your own.

Test properties are presented as logical propositions. Examples include:

!P || !Q == !(P && Q)
(S1 + S2).endsWith(S2)
L.reverse().reverse() == L
L1.reverse().zip(L2.reverse()) == L1.zip(L2).reverse()     // UNTRUE

The first example is De Morgan's law which relates conjunctions and disjunctions of propositions through negation (where P and Q are boolean values). The second example claims that the concatenation of the strings S1 and S2 will end with the string S2. The third example specifies that if we reverse a list L then reverse the result it will deliver the original list. The fourth example, while untrue, can look correct especially if unit tested with a few examples. It involves zipping and reversing two lists.

Here is how KwikCheck expresses the second example above:

@Test fun endsWithOperation() {
    val genStr1: Gen<String> = genString
    val genStr2: Gen<String> = genString

    val prop: Property = forAll(genStr1, genStr2){str1, str2 ->
        prop((str1 + str2).endsWith(str2))
    }
    val checkResult: CheckResult = prop.check()
    println(checkResult.summary())
}

with the output:

OK, passed 100 tests

Here is the fourth example above:

@Test fun reverseZipOperation() {
    val genInt: Gen<Int> = choose(1, 10)
    val genIntList: Gen<List<Int>> = genList(genInt)
    val genAlpha: Gen<Char> = genAlphaUpperChar
    val genAlphaList: Gen<List<Char>> = genList(genAlpha)

    val property = forAll(genIntList, genAlphaList) { xs, ys ->
        prop(xs.reverse().zip(ys.reverse()) == xs.zip(ys).reverse())
    }
    val checkResult: CheckResult = property.check()
    println(checkResult.summary())
}


It is, of course, a failing test and KwikCheck displays the counter-example:

Falsified after 4 passed tests with (<[10]>, <[C, R]>)

which informs us that four generated tests were initially successful but when tested with the two lists shown the property failed. The example is using the immutable and persistent List class from the custom library KData noted at the end of this article.

Properties which are simple equations are conveniently represented by booleans. Many laws, however, only hold under certain conditions. KwikCheck provides an implication combinator to represent such conditionals. In the following the law that if m <= n then the max(m, n) is n:

@Test fun conditionalOperation() {
    val property = forAll(genInt, genInt){m, n ->
        (m <= n) implies prop(max(m, n) == n)
    }
    val checkResult: CheckResult = property.check()
    println(checkResult.summary())
}


The output is:

OK, passed 100 tests (105 discarded)

and reveals the condition was not met on 105 occasions.

We can also combine existing properties into new ones using the and and or combinators. In the following we incorrectly define function mul. Three separate properties are combined to test the result from applying function mul.

@Test fun combiningOperation() {
    fun mul(m: Int, n: Int): Int = m + n    // ERROR

    val property = forAll(choose(1, 10), choose(101, 110)){m, n ->
        val res = mul(m, n)
        prop(res >= m) and prop(res >= n) and prop(res < m + n)
    }
    val checkResult: CheckResult = property.check()
    println(checkResult.summary())
}

This fails immediately with m = 3 and n = 108:

Falsified after 0 passed tests with (3, 108)

Complex properties with many conditions can make it difficult to decide exactly what is wrong when a property fails. The next example repeats the previous example, labeling the individual properties. The output reveals the first test fails because the result does not sum.

@Test fun labellingOperation() {
    fun mul(m: Int, n: Int): Int = m + n    // ERROR

    val property = forAll(choose(1, 10), choose(101, 110)){m, n ->
        val res = mul(m, n)
        prop(res >= m).label("res >= m") and
                prop(res >= n).label("res >= n") and
                prop(res < m + n).label("result not sum")
    }
    val checkResult: CheckResult = property.check()
    println(checkResult.summary())
}

Output:

Falsified after 0 passed tests with (7, 109)
Labels of failing property: result not sum

KwikCheck supports collecting statistics about the test data that has been generated during property evaluation. This is useful for inspecting test case generation and ensuring all cases are tested and not simply trivial ones. The next example operates on an ordered list and the property we check is that if the list is ordered then the insert function maintains that ordering.

@Test fun collectSortedListOperation() {
    fun insert(n: Int, list: List<Int>): List<Int> =
        when (list) {
            is List.Nil -> ListF.singleton(n)
            is List.Cons -> if (n < list.head())
                ListF.cons(n, list)
            else
                ListF.cons(list.head(), insert(n, list.tail()))
        }   // insert

    fun isOrdered(list: List<Int>): Boolean =
        if (list.isEmpty())
            true
        else
            list.zip(list.drop(1)).forAll{pr -> (pr.first <= pr.second)}

    val property = forAll(genInt, genList(genInt)){n, list ->
        isOrdered(list) implies prop(isOrdered(insert(n, list))).collect(list.size())
    }
    val checkResult: CheckResult = property.check(maxDiscarded = 10000)
    println(checkResult.summary())
}

Here, the property is not tested with large lists since they are all randomly generated. Lists of size 0, 1, 2, 3 and 4 have been tested. The output suggests we need a special List generator that generates large ordered lists.

OK, passed 100 tests (3340 discarded)
Collected test data:
42% 0
31% 1
19% 2
7% 3
1% 4

The classify combinator does not change the meaning of a property but classifies some of the test cases. Here, those lists with size greater than 5 are classified as large and the others classified as small.

@Test fun classifyReverseListOperation() {
    val property = forAll(genList(genInt)) {list->
        prop(list.reverse().reverse() == list).classify(list.size() > 5, "large", "small")
    }
    val checkResult: CheckResult = property.check(maxDiscarded = 10000)
    println(checkResult.summary())
}

The output is:

OK, passed 100 tests

Collected test data:

73% large
27% small

The core of the KwikCheck library comprises the classes Gen<A> and Property. The class Gen<A> is the generator class for values of type A. The class Property represents an algebraic property that may be checked for its truth. Its most important member function is check which delivers a result.

Interestingly both the Gen and Property classes are immutable and have a single (different) function for their representation. Most member functions for these classes return a new instance with a composed function for its representation. The implementation functions for both Gen and Property do not execute until called by check in Property. Both classes use function representations in order to be lazy.

With KwikCheck you specify properties of your code that should hold regardless of input. KwikCheck then randomly generates the input data to try to find a case for which the property fails. In the following example we claim (incorrectly) that the largest example in a list of integers is the item at the front of the list.

@Test fun failingLargestOperation() {
    fun largest(list: List<Int>): Int = list.head()

    val property = forAll(genList(1, genInt)) { list ->
        prop(largest(list) == list.sort { a, b -> a.compareTo(b) }.last())
    }
    val checkResult: CheckResult = property.check()
    println(checkResult.summary())
}


with the output:

Falsified after 6 passed tests with <[3, 5, 0]>

We then correct this in the next example with a correct definition for function largest.

@Test fun successfulLargestOperation() {
    fun largest(list: List<Int>): Int {
        fun biggest(list: List<Int>, big: Int): Int =
            when (list) {
                is Nil -> big
                is Cons -> if (list.head() > big)
                    biggest(list.tail(), list.head())
                else
                    biggest(list.tail(), big)
            }   // biggest

        return biggest(list.tail(), list.head())
    }   // largest

    val property = forAll(genList(1, genInt)) { list ->
        prop(largest(list) == list.sort { a, b -> a.compareTo(b) }.last())
    }
    val checkResult: CheckResult = property.check()
    println(checkResult.summary())
}


with the output:

OK, passed 100 tests

KwikCheck randomly generates the input data to try to find a case for which the property fails. When such a case is found, we can request it tries to find a minimal test case by shrinking the input data. This minimal test case is then reported for easy inspection. It will try to find smaller inputs that also violate the property, in order to give the developer a clearer message about the nature of the failure. The List class member function removeDuplicates removes duplicates from a list retaining the first of each occurrence. In this example we claim that removing all duplicates returns the same list which, of course, is false:

@Test fun failingRemoveDuplicatesOperation() {
    val genIntList: Gen<List<Int>> = genList(genInt)
    val property = forAll(genIntList) {list ->
        prop(list.removeDuplicates() == list)
    }
    val checkResult: CheckResult = property.check()
    println(checkResult.summary())
}


with the output:

Falsified after 6 passed tests with <[3, 4, -4, 3, 3, -2]>

While large inputs may produce the errors, KwikCheck will attempt to shrink the input sequence to the smallest possible that will reproduce the error. The smaller the input, the easier it is to fix. The next example repeats the same false claim but requests that the list should be shrunk to a more manageable size.

@Test fun shrinkingRemoveDuplicatesOperation() {
    val genIntList: Gen<List<Int>> = genList(genInt)
    val shrinkIntList: Shrink<List<Int>> = shrinkList(shrinkInt)
    val property = forAll(genIntList, shrinkIntList) { list ->
        prop(list.removeDuplicates() == list)
    }
    val checkResult: CheckResult = property.check()
    println(checkResult.summary())
}


with output:

Falsified after 8 passed tests with <[0, 0]>

where it is now clear if we remove one of the duplicates the resulting list cannot be the same as this list with two zeros.

KwikCheck supports defining custom shrinking functions. We define a function that returns a Shrink<A> instance. The shrink factory method is used for this purpose. It expects a single function parameter with an A for its argument and returns a List<A>. An example is the library function shrinkPair:

fun <A, B> shrinkPair(sa: Shrink<A>, sb: Shrink<B>): Shrink<Pair<A, B>> =
    shrink{pair: Pair<A, B> ->
        sa.shrink(pair.first).map{a: A -> Pair(a, pair.second)}.append(sb.shrink(pair.second).map{b: B -> Pair(pair.first, b)})
    }


KwikCheck supports creating custom generators. The following example declares the domain model class Account representing a bank account with account number and account balance. The class is immutable and both member functions deliver new instances. Function credit increases the balance by the amount. Function debit only reduces the balance provided it does not become negative.

@Test fun domainModelGeneratorOperation() {
    class Account(val number: String, val balance: Int) {

        fun credit(amount: Int): Account = Account(number, balance + amount)

        fun debit(amount: Int): Account = Account(number, if (balance >= amount) balance - amount else balance)

    }

    val genAccount: Gen<Account> = genAlphaNumStr.bind{number ->
        genPosInt.bind{balance ->
            value(Account(number, balance))
        }
    }
    val property = forAll(genAccount, genPosInt) { account, amount ->
        prop(account.debit(amount).balance >= 0)
    }
    val checkResult: CheckResult = property.check()
    println(checkResult.summary())
}


with output:

OK, passed 100 tests

The property we are checking is that all debit operations leave the balance positive. We establish the value binding genAccount for a generator Gen<Account> that delivers a random Account instance. It is common to implement such generators using the monadic features of the Gen class. Using the generator genAlphaNumStr we bind what it delivers to the lambda parameter number. Similarly, using the genPosInt generator we bind what it delivers to the lambda parameter balance. Using these two random values we produce the required random Account.

Case Study

A common business use-case is validating a user login or a user registration. In the latter we might require the user to provide a name and an age. Both values might be supplied by entering values into text fields that populate a Form class:

data class Form(val name: String, val age: Int)

The validation logic is provided by the function:

fun validateForm(form: Form): ValidationNel<Error, User>

which accepts a Form as parameter and returns a ValidationNel value. The ValidationNel type is an alias for the class Validation:

typealias ValidationNel<E, A> = Validation<NonEmptyList<E>, A>

where the type parameter E represents failure and the type parameter A represents a successfully validated value. Failures are accumulated into a NonEmptyList when combining multiple Validation values using an applicative style.

If the Form is valid function validateForm returns a User object:

data class User(val name: String, val age: Int)

If the Form is invalid validateForm returns a non-empty list of Error values:

sealed class Error {
    data class NameTooLong(val text: String) : Error()
    data class NameTooShort(val text: String) : Error()
    data class AgeTooLow(val num: Int) : Error()
    data class AgeTooHigh(val num: Int) : Error()
}   // Error


To develop some property tests we will need to generate a random form. In the first test we produce a valid form by generating a valid name and a valid age. A valid user name is an alphanumeric string of 3 to 30 characters. A valid age is an integer between 18 and 100 inclusive.

val genValidName: Gen<String> = genAlphaNumStr(3, 30)
val genValidAge: Gen<Int> = choose(18, 100)
val genValidForm: Gen<Form> = form fmap genValidName ap genValidAge


An applicative style (see both the Option and Validation blogs) is used to generate the valid form from the valid name and valid age and where form is a curried variant of the Form class constructor:

val form: (String) -> (Int) -> Form = { name -> { age -> Form(name, age) }}

The test on valid data is:

val property = forAll(genValidForm){form ->
    val validation: ValidationNel<Error, User> = validateForm(form)
    prop(validation.isSuccess())
}
val checkResult: CheckResult = property.check()
println(checkResult.summary())


producing the usual OK, passed 100 tests.

We might also run tests with incorrect data. For example we might produce a form with an incorrect user name:

val genInvalidName: Gen<String> = genAlphaNumStr(50)
val genInvalidNameForm: Gen<Form> = form fmap genInvalidName ap genValidAge

Here the user name is any string of length 0 up to length 50. Both extremes are outwith our legal limits of 3 to 30 inclusive and function validateForm will fail.

val property = forAll(genInvalidNameForm){form ->
    val validation: ValidationNel<Error, User> = validateForm(form)
    val ok: Boolean = when (validation) {
        is FailureNel<Error, User> ->
            (validation.value.size() == 1) && isNameError(validation.value.head())
        is SuccessNel<Error, User> -> true
    }
    prop(ok)
}
val checkResult: CheckResult = property.check()
println(checkResult.summary())

The proposition we check is determined by the ok value. If the validation is a SuccessNel then the property is true. If the validation is a FailureNel then we assert the NonEmptyList has size 1 and that single entry is a name error:

fun isNameError(error: Error): Boolean =
    when (error) {
        is Error.NameTooLong -> true
        is Error.NameTooShort -> true
        else -> false
    }

A test on a form with a valid name but an invalid age follows the same pattern.

In the next test we provide a form with both the name and the age invalid:

val property = forAll(genInvalidNameAgeForm){form ->
    val validation: ValidationNel<Error, User> = validateForm(form)
    val ok: Boolean = when (validation) {
        is FailureNel<Error, User> ->
            (validation.value.size() <= 2) && validation.value.forAll{error -> isNameError(error) || isAgeError(error)}
        is SuccessNel<Error, User> -> true
    }
    prop(ok)
}
val checkResult: CheckResult = property.check()
println(checkResult.summary())

if the validation is a failure we may have accumulated one or two errors: a valid name but an invalid age; an invalid name but a valid age; or an invalid name and invalid age. If the validation is a FailureNel then we assert the list has size 1 or 2 and that the list elements are name or age errors.

A variation of this use case is one in which the users birth date is used instead of their age. The Form and the User have Calendar properties:

data class Form(val name: String, val birthDate: Calendar)
data class User(val name: String, val birthDate: Calendar)

The validation function is modified as:

fun validateForm(today: Calendar, form: Form): ValidationNel<Error, User> =
    user fmap validateName(form.name) ap validateAge(today, form.birthDate)

with today's date provided so we can validate the age from the birth date in the form.

The test appears below. Today's date is generated from the overloaded choose function with the start and end of the period. The randomly generated form is produced with a valid name and a valid birth date.

val property = forAll(choose(GregorianCalendar(1900, 1, 1), GregorianCalendar(2099, 12, 31))){today ->
    forAll(form fmap genValidName ap genValidBirthDate(today)){form ->
        val validation: ValidationNel<Error, User> = validateForm(today, form)
        prop(validation.isSuccess())
    }
}
val checkResult: CheckResult = property.check()
println(checkResult.summary())

To generate a valid birth date we must know today's date then generate a date that is 18 to 100 years earlier matching our initial requirements.

fun genValidBirthDate(today: Calendar): Gen<Calendar> =
    choose(18, 100).map{n ->
        val calendar: Calendar = today.clone() as Calendar
        val year: Int = calendar.get(YEAR)
        calendar.set(YEAR, year - n)
        calendar
    }   // genValidBirthDate



The KwikCheck sources are at []
The Dogs sources are at []

Ken Barclay

No comments: