Kotlin: Functional Domain Modeling #0
In this blog we start investigating how to use the Kotlin type system to accurately capture application domain models in code. We will see that types can act as documentation; documentation that does not get out of sync with the design because the latter is represented in the code itself.
Functions
In functional programming, the concept of a pure function is one that holds the following properties:
- The function is defined solely in terms of its parameters.
- The function should always return the same result for the same input parameters.
- The function should not cause any side effects by mutating any external state.
In other words, a pure function has no observable effect on the execution of the program other than to compute its result given its inputs; it has no side effects. One familiar pure function is the length function of a String. For any given string, the same length is always returned.
Functional programming (FP) is based on this simple premise: we construct our programs using only pure functions. This raises many issues for those with an OO background: how do we program without mutating state; how do we write programs that loop; how do we handle errors without throwing exceptions; how do we program with I/O. Over the course of these blogs we will show how to express all our programs without side effects. We will see how pure functions are easier to test, reuse and reason about.
Important in FP is the notion of higher order functions: functions that are parameters to other functions. We see that functions can also be the results of executing functions. In this way we create functions as values in our executing program, rather than defining them directly.
Through functions as parameters and functions as results we are able to fully exploit the machinery of these general functions, such as function composition. The resulting function definitions can be both more concise and more readable than traditional approaches.
This capability is important for a number of reasons. Having higher order functions means that you can eliminate entire categories of functions for a standard data structure by building general mechanisms that traverse the structures and apply higher order functions to each element. By enabling functions as return values, we have the opportunity to build dynamic, adaptable systems.
A common technique in FP is the concept of currying. Currying is the process of transforming a function of multiple parameters into a series of functions, each of which accepts a single parameter. We call a curried function with its single parameter as in f(a). This in turn delivers another function also requiring a single parameter as in f(a)(b). This continues until all the parameters have been provided. The Dogs library includes constructs to curry given functions.
Closely related to curried functions is the notion of partial function application. When we partially apply a curried function by providing some of the parameters the result is a function which may have the required signature that we can supply to a higher order function.
Functions are the building blocks that can be composed to build other functions. For example, consider the function that converts a String into an Int, and the function that converts an Int into a Boolean. Using function composition we can build larger functions out of smaller ones. We can compose these two functions to create one that converts a String into a Boolean. Whilst Kotlin does not support function composition out of the box, it provides enough features that we can create the support ourselves. This feature is provided by the Dogs library.
Managing side effects
In a banking application consider the function to open a new Account. The function may expect various parameters that define the properties of the Account such as the account number, the initial balance, the date the account is opened, etc. One would automatically expect the return type for open to be an Account type. But then the open function could fail because of validation errors. For example, the opening date may be before the current date. In the functional world we do not throw exceptions and expect users to catch them.
The Dogs library offers abstractions that helps address the issue of exceptions. You manage exceptions as effects that compose along with other abstractions in your domain model. An effect adds capabilities to your computation so you do not need to use side effects.
An effectful computation adds some capabilities to a computation. An effect is modeled with a type constructor that incorporates these additional capabilities. For example the open function might return the type Either<String, Account> which adds the effect of giving a String error in the event of failure otherwise the Account is wrapped in an Either.
Algebraic data types
An algebraic data type (ADT) is a kind of composite type: a type formed by combining other types. Two examples of algebraic types are product types (record types) and sum types (variant types). An ADT is a data types defined by one or more data constructors, each of which may contain zero or more type arguments. We say that the data type is the sum of its data constructors, and each data constructor is the product of its type arguments. These ideas map directly to constructs in Kotlin. The sum type is captured by Kotlin's sealed class, and the product type by a simple class declaration.
ADTs define the structure of data in your model. Sum types and product types provide the necessary abstractions for structuring the various data of our domain models. Sum types let you model the variations within a data type, product types help to cluster related data into larger abstractions.
An advantage of working with ADTs is that the compiler validates the various tags to ensure it contains valid combinations of data types. The compiler enforces the rules you have defined for an ADT.
Immutability
One essential characteristic of FP is that all data is immutable: an object whose state cannot be modified after it is created. This is in contrast to a mutable object which can be modified after it is created. This is typical of how a Java programmer would develop his code. FP encourages immutability and eschews in-place mutation.
In Kotlin once you define an immutable ADT you cannot modify any of its properties. To effect a change you create another abstraction from the original with the modified property. For example, we might represent a bank Account with a number an a balance property. The function credit would have an amount to be credited as parameter. The function would return another Account with the same number and a balance obtained from adding the original balance and the credit amount.
The data types provided by the Dogs library are all immutable. The library includes immutable Lists, immutable Options, immutable MultiMaps, etc. Imperative data structures are ephemeral in the sense that making a change to the structure destroys the old version leaving only the new version. A distinctive property of functional data structures is that they are persistent - updating a persistent functional data structure does not destroy the existing version, but creates a new version that coexists with the old one. Because nodes in the data structure are never modified, all nodes that are unaffected by an update can be shared between the old and the new version without worrying that a change in one will be visible to the other. Dogs data types are immutable and persistent.
Modeling Simple Values
As developers we have a tendency to focus on technical issues. However, the domain experts for whom we are developing an application think in terms of domain concepts such an order id, a product code, a customer name or a customer address. If we are to have a successful project it is important that we, as developers, fully embrace the domain experts requirements. To do that we must use the same vocabulary. So, instead of thinking in terms of Int and String, we think in terms of order id and address, even where the order id is an Int and the address is a String.
As our development proceeds it is important that a product code and a name are not mixed up. Just because they are both represented by Strings, say, they are not interchangeable. To make clear these types are distinct we employ a wrapper type that wraps the primitive representation. In an application, the classes ProductCode and Name would be wrapper classes around a String. Creating simple types like this ensures that they cannot be accidentally mixed.
However, wrapper classes introduce a runtime overhead due to additional heap allocations. Moreover, if the wrapped type is primitive, the performance hit is terrible, because primitive types are usually heavily optimized by the runtime, while their wrappers don't get any special treatment. To solve such issues, Kotlin introduces a special kind of class called the inline class. At runtime, instances of the inline class will be represented using this single property. This is the main feature of inline classes, which inspired the name inline: data of the class is inlined into its usages (similar to how the content of inlined functions is inlined to call sites).
Almost always simple types are constrained in some way, such as having to be in a certain range or match a certain pattern. It is very unusual to have an unbound Int or String in a real-world domain. For example, a customer last name may be represented by a string, but that does not mean that it should contain a tab character or a symbolic character such as #.
The standard technique for constructing objects that need to honour a set of constraints is known as the smart constructor idiom. You prohibit the user from invoking the standard constructor and instead provide a factory function that ensures the user gets a data type from which she can recover a valid instance or a reason for failure. This would be an example of an effectful computation. Thereafter, because the data is immutable, the value never needs to be checked again.
Modeling Aggregate Values
We said that product types help to aggregate related data into larger abstractions. The clustering is represented by a Kotlin class. The individual class properties might themselves be represented by other ADTs. Consider a banking application with a customer Account class that contains properties such as the customer Address and the Date the account was opened. If the Account, Address and Date properties are immutable, then to change a customer's address we need to make the change at the aggregate Account level, not at the level of the Address.
The aggregate acts as the consistency boundary: when one part of the aggregate is updated, others might also require updating to ensure consistency. For example, in an order taking system, an aggregate Order might have a number of OrderLines and might have a total price property. If one order line changes its unit price then the total must be updated to keep the Order data consistent.
The aggregate is also where invariants are enforced. In our order taking example, there might be a business rule that any Order must have at least one OrderLine. If we try to delete multiple OrderLines from an Order then we must ensure there is at least one OrderLine remaining.
Nested data structures are a pain to inspect and change. The pain increases with the depth of the data structures. The most common functional approach is to avoid in-place mutation and generate a new instance instead with the updated values. Kotlin's data classes provide syntactic sugar for this operation. Consider the classes:
data class Address(val streetNumber: Int, val streetName: String)
data class Person(val name: String, val age: Int, val address: Address)
val addr = Address(10, "High Street")
val newAddr = addr.copy(streetNumber = 11)
delivers a new Address instance with the streetNumber as 11 and the streetName as High Street.
We update the street number of for the address of a specific person using the same approach:
val per = Person("Ken", 25, addr)
val newPer = per.copy(address = per.address.copy(streetNumber = 11))
The greater the level of nesting of the objects, the more cluttered the syntax becomes. This situation demands a better abstraction and is provided by lenses. We can describe a lens as a group of functions that allows us to manipulate data inside a deeply nested class structure.
Enforcing invariants with the type system
An invariant is a condition that is always true. A business application might require that a customer last name must be all uppercase letters. That is an example of an invariant. Of course, the smart constructor is one technique to ensure invariants are honoured at the point of creation.
An Order comprises a number of OrderLines. A business requirement is that there must always be at least one OrderLine in an Order. We can capture this invariant in the type system if we use the class NonEmptyList rather than the more usual List class. The NonEmptyList class is provided by the custom Dogs library.
A customer Name might be presented as a product type comprising a LastName, a FirstName and a possible MiddleName. The Dogs library supports the Option type and we might use Option<MiddleName> to ensure the correct semantics.
Functional patterns in domain models
Functional design patterns are reusable abstractions. Object oriented design patterns are also reusable abstractions that are examples of best-practice. OO design patterns offer standard solutions for problems that occur repeatedly in programming. However the OO programmer has no code from which to develop a solution, but has to implement the patterns against the problem domain and will have to do this repeatedly each time the pattern is deployed. OO design patterns offer little reusability.
Functional design patterns offer much more reusability than their OO counterparts. Each functional pattern includes generic and reusable code which is invariant across the contexts where they are used. Context specific implementations are provided for instances where the pattern is applied which can be deployed in application code. Further, these patterns are accompanied with a set of laws. All instances of such patterns are expected to behave in a certain way and satisfy the laws associated with the respective implementations.
The KwikCheck test framework for Kotlin (as described here) is modeled after the QuickCheck framework. Property-based testing is generative testing. You do not supply specific example inputs with expected outputs as with unit tests. Instead, you define properties about the code and use the generative-testing engine to create randomized inputs to ensure the defined properties are correct. KwikCheck allows us to express these properties and generate the randomized inputs. We use KwikCheck to test the functional design pattern laws on the instances of the pattern. The KwikCheck test framework is built on the Dogs library.
We also use KwikCheck to verify the properties of our domain models. As an example, when we model the transfer of funds between two customer accounts in a banking application, a domain rule for this is the amount debited from one account must equal the amount credited to the other. We can think of this domain rule as a property that must be verified for a transfer operation.
We are familiar with the map function from the List class which applies a transformer function to each element in the List and delivers a new List of transformed values. Because List, Option, etc all share this behavior, we can document it this way as an extension function, in which F is a place-marker for the actual types for which it applies:
fun <A, B> F<A>.fmap(f: (A) -> B): F<B>
This is called a functor and abstracts the capability of mapping over a data type with a regular function. If F is List then we expect the List extension function:
fun <A, B> List<A>.fmap(f: (A) -> B): List<B>
In the Dogs library the data types Option, Either, List, Map and others are functors and have their implementation for function fmap.
The signature for fmap might suggest it would be a member function of a Functor interface which the data types Option, Either, List etc would implement. However, in such an interface, F would be a generic type so that F<A> and F<B> would give rise to what are called higher-kinded types that Kotlin does not support.
One immediate use for this abstraction is to discover other useful functions. If we have an F<Pair<A, B>> we can distribute the F over the Pair to get a Pair<F<A>, F<B>>. Again, with F the List type:
fun <A, B> List<Pair<A, B>>.distribute(): Pair<List<A>, List<B>> =
Pair(this.fmap{pr -> pr.first}, this.fmap{pr -> pr.second})
We get two Lists of the same length, one with all the As and one with all the Bs.
In a library application we might have the following functions:
fun classifiedBooks(dewey: String): List<Book> = ...
fun isWithdrawn(book: Book): Boolean = ...
Function classifiedBooks returns a List of Books that match the dewey decimal classification. Function isWithdrawn determines if the given Book has been withdrawn from service. We can now use fmap to implement a domain specific behavior:
fun withdrawnByClassification(dewey: String): List<Boolean> =
classifiedBooks(dewey).fmap(::isWithdrawn)
The functor is just one of many functional design patterns we include in the Dogs library.
A second functional design patterns is known as the monad and is defined for various types defined in the Dogs library. Monads allow the programmer to build up computations using sequential building blocks. The monad determines how combined computations form a new computation and frees the programmer from having to code the combination manually each time it is required. It is useful to think of a monad as a strategy for combining computations into more complex computations.
Again, we document this as an extension function bind, in which F is the place-marker for the actual types for which it applies:
fun <A, B> F<A>.bind(f: (A) -> F<B>): F<B>
It looks similar to the function fmap defined for the functor. The only difference is the signature of the function f that bind takes as parameter. For function fmap the function parameter returns a value, but for function bind it returns an instance of the type of the monad, here F. When you apply f to the receiver type F<A> in bind, you effectively end up with F<F<B>>. However, bind then flattens them into a single F<B>. Semantically, bind is equivalent to an fmap operation followed by a flatten operation. For this reason bind is often also known by the name flatMap.
The basic intuition behind the bind operation is that it allows us to combine two computations into one more complex computation and allows them to interact with one another. The receiver type F<A> represents the first computation. Significantly, the parameter type of bind is (A) -> F<B> which can, given the result of the first computation, produce a second computation to run. In other words fa.bind{a -> ... } is a computation which runs fa (some instance of type F<A>) and binds the value wrapped in fa to the function literal parameter a. The function body then decides what computation to run using the value for a.
The monad also includes a second function inject. Function inject lets us put a value into a monadic context. For example, for the List type inject is the same as creating a List with a single element. Often inject is simply the class constructor. Its signature is:
fun <A> inject(a: A): F<A>
The Dogs library supports these functions for the data types Option, Either, List, NonEmptyList, etc.
Consider the function divide which returns an optional integer Option<Int>. If the numerator is exactly divisible by the denominator the function succeeds and returns their quotient wrapped in a Some, otherwise it fails and returns a None. Some and None are the concrete implementations of the Option type and are created with the factory functions some and none.
fun divide(num: Int, den: Int): Option<Int> {
return if (num % den != 0) none() else some(num / den)
}
Consider now the function bindDivision which returns an optional pair of integers if the first two parameters are exactly divisible by the third parameter. The returned pair are quotients of the first and third parameter, and the second and third parameter.
fun bindDivision(a: Int, b: Int, c: Int): Option<Pair<Int, Int>> {
return divide(a, c).bind{ac: Int ->
divide(b, c).bind{bc: Int ->
inject(Pair(ac, bc))
}
}
}
Consider the implementation of bindDivision. The outer call to divide(a, c).bind{ ac -> ...} has an Option value produced from the expression divide(a, c). Should this be a Some value, then the function literal is called and its formal parameter ac is bound to the result from the Some value. In the inner call divide(b, c).bind{ bc -> ...} the inner function literal is called with the formal parameter bc bound to the Some value produced from divide(b, c). If the two calls to divide both deliver Some values then a final Some result is produced carrying the pair we seek. If a None value is delivered by either call to divide then a None value is the final result.
Functional languages use monads to turn complicated sequences of functions into succinct pipelines that abstract away control flow, and side-effects. The programmer composes a sequence of function calls (a pipeline) with several bind operators chained together in an expression. Each function call transforms its input plain-type value, and the bind operator handles the returned monadic value, which is fed into the next step in the sequence. The side effect of the divide function returning None is automatically handled. The bind function only executes the following operation if the value is a Some value, otherwise it just passes the None along.
In later blogs we shall explore other functional design patterns such as the applicative functor, the foldable, the traversable alongside others. These patterns forms the basis of effectful computation in programming. They offer abstractions for handling effects within the domain models we develop. We will demonstrate the support provided by the custom Dogs library.
The code for the Dogs library can be found at:
https://github.com/KenBarclay/TBA
https://github.com/KenBarclay/TBA
No comments:
Post a Comment