People learning about Haskell frequently run into the problem of "understanding Monads". This is not surprising, as this is a distinguishing feature of the language: Haskell code is often written in terms of monads. I want to give an introduction to monads and their main application, state monads, for people who are practically inclined, but explicitly not a practical introduction: my objective is to make these people understand monads, not to argue whether monads are useful. Before I even start explaining monads, I will explain lots of secondary stuff which is, in my opinion, a source of confusion about monads and functional programming in general. And of course, this introduction reflects my personal opinions and experiences at the time of writing. They might change, and they might not be correct. As always, I am thankful for suggestions and criticism.
This is not an introduction to functional programming. Basic knowledge about writing simple programs is required, specifically, what λ means and what a recursive function is. I also assume good familiarity with imperative languages. If you think you are a good programmer and you still don't understand most things explained here, you should probably think again. If something isn't clear, Wikipedia and DuckDuckGo are your friends.
C's records are called "structs". They are more low-level than the records of most other languages, but conceptually the same. A point on a plane can be defined via
struct point {
    float x;
    float y;
};
An approximation for the distance between two points can be defined as
float distance (a, b)
    struct point* a;
    struct point* b;
{
    union {
        float f;
        int l;
    } x_;
    float y_;
    int z;
    x_.f = b->x - a->x;
    y_ = b->y - a->y;
    x_.f = 1/(x_.f*x_.f + y_*y_);       /* 1 / (dx*dx + dy*dy) */
    y_ = x_.f * 0.5f;
    x_.l = 0x5f3759df - (x_.l >> 1);    /* the "fast inverse square root" bit trick */
    for (z = 0; z < 10; ++z)            /* Newton iterations */
        x_.f *= (1.5f - (y_ * x_.f * x_.f));
    return x_.f;                        /* approx. sqrt(dx*dx + dy*dy) */
}
Structs can also contain arrays, but only when they have fixed length; otherwise, one has to use pointers:
struct point_7d {
    float coordinates[7];
};
There is a language extension (a GNU C extension) which allows the use of zero-length arrays under certain conditions. A person without prior C knowledge may wonder about the use case of an array of length zero. Well, in C, array access is not bounds-checked: b[n] is just an abbreviation for *(b+n). Therefore, declaring an array of length zero while knowing that more elements are there is perfectly legal, and it often occurs that some file format has a header, which can be written as a struct, and which indicates the length of the rest of the file. The struct itself then only contains the header information, but the content of the file can still be accessed through this struct. As an example, the following program generates a bitmap file, using a struct with a zero-length array:
#include <stdint.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <strings.h>

#pragma pack(push,1)
struct bmp {
    uint16_t magic;
    uint32_t filesize;
    uint16_t reserved1, reserved2;
    uint32_t pixel_offset, dib_hdr_size;
    int32_t  width, height;
    uint16_t planes, bits_per_pixel;
    uint32_t compression;
    uint32_t image_size;
    uint32_t xppm, yppm, coltab, important_colors;
    uint32_t imagedata[0];
};
#pragma pack(pop)

int mmapfile (const char* filename, off_t s, void** r) {
    int ret = open(filename, O_CREAT|O_RDWR, S_IRWXU);
    if (ret < 0) exit(EXIT_FAILURE);
    if (ftruncate(ret, s) < 0) exit(EXIT_FAILURE);
    *r = mmap(NULL, s, PROT_READ|PROT_WRITE, MAP_SHARED, ret, 0);
    if (*r == MAP_FAILED) exit(EXIT_FAILURE);
    return ret;
}

struct bmp* bmp_file (const char* filename,
                      uint32_t width, uint32_t height) {
    size_t
        imgs = width*height*4,
        fs = sizeof(struct bmp)+imgs;
    struct bmp* b;
    mmapfile(filename, fs, (void**) &b);
    b->magic = 19778;
    b->filesize = fs;
    b->reserved1 = b->reserved2 =
        b->compression = b->xppm = b->yppm =
        b->coltab = b->important_colors = 0;
    b->pixel_offset = sizeof(struct bmp);
    b->dib_hdr_size = 40;
    b->width = width;
    b->height = height;
    b->planes = 1;
    b->bits_per_pixel = 32;
    b->image_size = imgs;
    return b;
}

int main (int argc, char** argv) {
    int i;
    if (argc < 2) exit(EXIT_FAILURE);
    struct bmp* b = bmp_file (argv[1], 100, 100);
    bzero(b->imagedata, b->image_size);
    for (i = 0; i < 100; ++i) {
        b->imagedata[i + 100*i] = ~0;
    }
    exit(EXIT_SUCCESS);
}
This code is not really portable, and it is arguable whether it is beautiful. I consider this use of array elements outside of the declared bounds a hack, and as such, it is beautiful: it is a creative solution to a common problem. However, it is not the kind of code one should be writing these days, except when you really, really need to squeeze out every processor cycle. Still, it illustrates a way of thinking: C's strength is that it does not enforce its abstractions. It lets you do whatever you want, without warning, so you get very efficient code. By today's measures, on the continuum of the abstraction-optimization tradeoff, C sits on the far right side.
Haskell, on the other hand, is on the far left side: Abstraction is extremely valuable, and should only be broken if there is no alternative, and even then some overhead will be accepted to keep it behaving in a clean way. It is a radically different way of thinking. Scott Meyers said that everything in C++ has a reason. The same holds for Haskell. It's just that Haskell sets other priorities. One can argue whether this is better or worse, whether it is practically relevant or purely academic; but this is not the topic of this post. To understand why monads were invented, you need to think this way; if you're not willing to think this way, you will not be able to understand monads.
Functions in Haskell should not have side-effects. Some of them do, for example Debug.Trace.trace, but it is only meant for debugging. The same Haskell expression should always yield the same result. This way, Haskell can use lazy evaluation (or call-by-need): an expression is evaluated only when its result is actually needed. Therefore, Haskell can handle infinite structures in a nice way. For example, the following is a standard motivating example: it defines the infinite list of Fibonacci numbers, plus a function that returns the n-th Fibonacci number.
fibs = 0 : 1 : addHd fibs
  where
    addHd (x : r@(y : _)) = (x + y) : addHd r

fib n = fibs !! n
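Thanks to laziness, only the prefix of fibs that is actually demanded gets computed. A quick check in GHCi (a sketch of a session, results shown as comments):
take 10 fibs   -- [0,1,1,2,3,5,8,13,21,34]
fib 30         -- 832040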
From an old-school software engineer's point of view, such a definition might look horrible. And in fact, most programs written this way are slower than their highly-optimized counterparts. On the other hand, the definition is easy to read and to write (once you have gotten to know Haskell): relying on lazy evaluation can make writing and understanding software easier. And today's software problems are seldom about efficiency; they are about maintainability. A further nice property is that we do not need a special language construct for things like if-then-else. We can define
ifthenelse :: Bool -> a -> a -> a
ifthenelse True a _ = a
ifthenelse False _ b = b
A seemingly equivalent C function would be
enum BOOL { FALSE, TRUE };

void* ifthenelse (b, yes, no)
    enum BOOL b;
    void *yes, *no;
{
    switch (b) {
    case FALSE:
        return no;
    case TRUE:
        return yes;
    }
}
The difference is that C has eager evaluation: Every argument is evaluated before a function is called. In Haskell,
ifthenelse True 2 (mod 1 0)
will yield 2
, while in C,
ifthenelse (TRUE, (void*)2, (void*)(1 % 0));
will yield a division-by-zero error (well, actually, most modern compilers will at least warn you about this, or optimize it out through dead-code detection, but you get the idea). Notice that most other functional programming languages allow side effects and also use eager evaluation, so they run into the same problem as C does in this case. Common Lisp (and most Lisp dialects in general) has a nice way of defining things like ifthenelse as macros. Anyway, Haskell does lazy evaluation, with all its benefits and drawbacks.
And one major drawback is that the presence of side-effects breaks lazy evaluation. For example, the following C program cannot be implemented in Haskell directly:
#include <stdio.h>

int i = 0;

int inc() {
    return i++;
}

int main (void) {
    int j;
    for (j = 0; j < 10; ++j)
        printf ("%d\n", inc());
}
Subsequent calls to inc return different results; the result depends on the hidden state i. Assume we had such a function in Haskell, and we defined
incs = inc : incs
Then incs !! n would force the n-th call to inc, and the contents of the list would depend on the order of the calls to (!!). There might be scenarios where this is desirable, but in most cases, this just breaks stuff. As I wrote previously, it is possible to destructively modify values in Haskell: on the one hand, Haskell has an FFI which lets us call C functions; on the other hand, Haskell has "unsafe" functions like unsafePerformIO. Their use is discouraged, but there are situations in which they are useful, for example for debugging, and for developing abstractions which only use mutable state in the background, where it cannot be observed through the frontend, like DiffArrays.
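To make this concrete, here is a sketch of such a debugging helper built on unsafePerformIO (the name myTrace is mine; the real Debug.Trace.trace is built along these lines, but with more care):
import System.IO.Unsafe (unsafePerformIO)

-- prints a message as a side effect and returns x unchanged; whether and when
-- the message appears depends on when the result happens to be evaluated
myTrace :: String -> a -> a
myTrace msg x = unsafePerformIO (putStrLn msg >> return x)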
The absence of side-effects makes certain programs impossible. For example, Quicksort, and in-situ sorting algorithms in general, cannot be realized without destructive modification of an underlying array. However, there are good purely functional substitutes for these algorithms (one is sketched below). More problematic is the absence of I/O: accessing and modifying files is stateful. But not having any kind of I/O limits the usability of a programming language. This is the problem state monads try to solve.
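The list-based "quicksort" often shown in Haskell tutorials is such a substitute: it keeps the divide-and-conquer idea, but copies lists instead of mutating an array in place, so it is not the real in-situ algorithm:
qsort :: Ord a => [a] -> [a]
qsort []       = []
qsort (p : xs) = qsort [x | x <- xs, x < p] ++ [p] ++ qsort [x | x <- xs, x >= p]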
For this section, let us talk only about pure functions, that is, functions that do not have side-effects. Let A, B, C be some types and f : A → B, g : B → C; then we can define f ↣ g := λx. g(f(x)), that is, (f ↣ g)(x) = g(f(x)). It is easy to show that ↣ is associative. Notice that the same does not hold for functions with side effects (I'll leave the finding of a counterexample as an exercise).
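In Haskell, ↣ can be written as a small operator (the ASCII name |> is my own choice; flip (.), or (>>>) from Control.Category, are the library equivalents):
-- forward composition: first f, then g
(|>) :: (a -> b) -> (b -> c) -> (a -> c)
(f |> g) x = g (f x)

-- associativity: ((f |> g) |> h) x and (f |> (g |> h)) x both reduce to h (g (f x))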
A common optimization, which came from but is not limited to functional programming, is tail call optimization (TCO). Functional programs use recursion a lot more than imperative programs. One possible definition of the Gaussian sum is
gsum 0 = 0
gsum n = n + gsum (n - 1)
The problem with recursion is that it consumes stack space. Therefore, for large values of n, this definition may exhaust the stack. However, if the recursive call is the last thing the function does, the compiler can reuse the current stack frame instead of pushing a new one. The above function would have to be rewritten to
gsum_tr_ 0 r = r
gsum_tr_ n r = gsum_tr_ (n - 1) (n + r)
gsum_tr n = gsum_tr_ n 0
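To see what the accumulator buys us, here is gsum_tr 3 unfolded by hand, just applying the definitions; at every step, nothing has to be remembered across the recursive call:
-- gsum_tr 3 = gsum_tr_ 3 0
--           = gsum_tr_ 2 3
--           = gsum_tr_ 1 5
--           = gsum_tr_ 0 6
--           = 6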
The function gsum_tr_
would be called tail recursive, and the
recursive call does not consume stack space. In some functional
languages, this is still the method of choice. In Haskell, it is less
relevant, because in many cases, rewriting a recursive function into a
tail recursive function is straightforward, and Haskell applies this
optimization. However, let us stick to the example gsum_tr_
for the
moment. A definition in Java would look like
public static int gsum_tr_ (int n, int r) {
    if (n == 0) return r;
    return gsum_tr_ (n-1, n+r);
}
The corresponding generated Java bytecode (output of javap -c with comments by me; I chose Java bytecode because it is stack-based) is:
public static int gsum_tr_(int, int);
Code:
0: iload_0 //load first argument (n)
1: ifne 6 //if n != 0 jump to byte 6
4: iload_1 //load second argument (r)
5: ireturn
6: iload_0 //load first argument (n)
7: iconst_1
8: isub // => n - 1
9: iload_0
10: iload_1
11: iadd // => n + r
12: invokestatic #2 // stack: n - 1, n + r
15: ireturn
Before the recursive call, the arguments are pushed to the stack, just to be popped again directly by the called function. Then the return value is taken from the recursive call directly. We might as well just modify the values directly, in-place:
public static int gsum_tr__ (int n, int r) {
    while (true) {
        if (n == 0) return r;
        r = n + r;
        n = n - 1;
    }
}
And get
public static int gsum_tr__(int, int);
Code:
0: iload_0 //load first argument (n)
1: ifne 6 //if n != 0 jump to byte 6
4: iload_1 //load second argument (r)
5: ireturn
6: iload_0
7: iload_1
8: iadd // => n + r
9: istore_1 // r = n + r
10: iload_0
11: iconst_1
12: isub // => n - 1
13: istore_0 // n = n - 1
14: goto 0 //loop
This is almost identical: an imperative loop with destructive updates. That is, under certain conditions, we can have destructive updates inside purely functional code. Whenever we know that the rest of the computation cannot depend on the former value, we can destructively overwrite it. More specifically, a sufficient condition is that the value occurs in the function body exactly once. This can be checked at compile time, and in fact, this condition is reflected in so-called uniqueness types in some functional languages (but not in Haskell).
I hope that by now you get an impression of how much thought has been put into the concepts of functional programming. The concepts are not trivial, but neither are the concepts of higher imperative programming languages. Make sure you understood everything so far.
We now want to embed stateful operations into purely functional code, not with uniqueness types, but with state monads. Firstly, let us define a type Ref a, which is a reference to some object of type a.
data Ref a = Pto a
Creating a new reference is so simple that it almost seems pointless to define a function for it. However, for instructional purposes, we define it:
ret :: a -> Ref a
ret x = Pto x
This definition is polymorphic. Let us look at the case a = Int
. Now
consider the following functions:
inc :: Int -> Ref Int
inc x = Pto $ x + 1
twice :: Int -> Ref Int
twice x = Pto $ 2 * x
These functions just read our pointer, and return a pointer to the increment/double of the former pointer. We furthermore need
(>=>) :: (a -> Ref b) -> (b -> Ref c) -> a -> Ref c
(>=>) f g x = case f x of { Pto y -> g y }
run :: Ref a -> a
run (Pto x) = x
Now we can write the following:
run ((inc >=> twice >=> twice >=> twice >=> inc >=> ret) 0)
Before you read on, try to find out what this expression returns. (Answer: it returns 9.) The nice thing is that the value of the reference is handed over directly to the next function as its first argument. Therefore, there is no reason to ever have more than this one reference, and an optimizing compiler can (and will) avoid using stack space and treat the operations on the reference as destructive updates, the same way it can be done with TCO. Notice that inside this expression, except for the first function call, we never actually mention the reference; we only say what is to be done with it.
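For reference, here is the chain unfolded step by step, just applying the definitions from above:
-- inc 0       = Pto 1
-- twice 1     = Pto 2
-- twice 2     = Pto 4
-- twice 4     = Pto 8
-- inc 8       = Pto 9
-- ret 9       = Pto 9
-- run (Pto 9) = 9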
Make sure you have understood this so far. If not, I failed at explaining, but it also makes no sense to read on.
The (>=>) combinator is somewhat similar to our previous ↣. In particular, one can show that it is associative. Furthermore, the function ret is a neutral element on both sides. These properties make our (>=>) a Kleisli combinator, and Ref a monad with respect to this Kleisli combinator. Any structure with such a combinator and a return function satisfying these axioms is a monad; monads are mathematical structures, much like groups, rings, fields and vector spaces: they are collections of objects which satisfy certain axioms. And just as fields can be used in cryptography and vector spaces can be used for computer graphics, monads can be used to describe and solve specific problems, and one such problem is the description of stateful operations in a purely functional context. But as with groups, fields and vector spaces, there are many different structures that obey the monad axioms.
Let us for a moment look at the group axioms: a group is a triple (G, ∘, e), where ∘ : G → G → G and e : G, such that
e ∘ g = g = g ∘ e for all g in G,
(f ∘ g) ∘ h = f ∘ (g ∘ h) for all f, g, h in G,
and every g in G has an inverse g⁻¹ with g ∘ g⁻¹ = e.
The monad axioms for a Kleisli combinator are quite similar:
return >=> g = g
f >=> return = f
(f >=> g) >=> h = f >=> (g >=> h)
We have a neutral element (left and right neutral), and the combinator is associative.
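For the Ref monad from above (with ret playing the role of return), these laws can be checked by unfolding the definitions. Here is the left-identity law as a sketch; the other two laws work the same way:
-- (ret >=> g) x
--   = case ret x of { Pto y -> g y }   -- definition of (>=>)
--   = case Pto x of { Pto y -> g y }   -- definition of ret
--   = g x                              -- pattern match succeeds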
I chose the Kleisli combinator because I think it is easier to understand in the case of state monads: in our case, the only thing it does is pass the value inside the reference returned by the left function as the first argument to the right function, making sure that the rest of the computation does not depend on the former value, so that value can be optimized out. However, using the Kleisli combinator has several disadvantages for actual programming. The usual definition uses the bind combinator (>>=). We can define the bind combinator in terms of the Kleisli combinator:
(>>=) :: Ref a -> (a -> Ref b) -> Ref b
(>>=) ma g = (id >=> g) ma
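As a quick check, the earlier pipeline can be rewritten with bind and still evaluates to 9 (a sketch only: in a real module, this local (>>=) would clash with the Prelude's and would have to be hidden or renamed):
example :: Int
example = run (ret 0 >>= inc >>= twice >>= twice >>= twice >>= inc)   -- 9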
But for the moment, let us stick with the Kleisli combinator. So far, we did not do anything except writing definitions and exploiting certain optimizations.
Now we want to write something to a file. The main function of a Haskell program has the type IO (), where IO is a monad. This monad references the "outside world" – and it is a monad, since there can only be one such world. This outside world is stateful. In C, we can write
#include <stdio.h>

void wr (FILE* d) {
    fprintf(d, "Hello!\n");
}

int main (int argc, char** argv) {
    FILE* d = fopen(argv[1], "a");
    wr(d);
    printf("%ld\n", ftell(d)); // 7
    wr(d);
    printf("%ld\n", ftell(d)); // 14
}
The problem is obvious: Every call to fprintf(3)
advances the file
pointer, so the ftell(3)
calls return different file positions – and
violate referential transparency. We could, however, rewrite this
program in the following way:
#include <stdio.h>

FILE* wr (FILE* d) {
    fprintf(d, "Hello!\n");
    return d;
}

int main (int argc, char** argv) {
    FILE* d1 = fopen(argv[1], "a");
    FILE* d2 = wr(d1);
    printf("%ld\n", ftell(d2)); // 7
    FILE* d3 = wr(d2);
    printf("%ld\n", ftell(d3)); // 14
}
This is how it would be done with uniqueness types: every modification of the file consumes a file handle, but produces a new one. Notice that we would have to encapsulate the printf(3) statements as well – they change the "world". Therefore, we can define a WORLD* type which is passed around instead, and think of this WORLD* type as a map from handles to resources:
#include <stdio.h>
#include <stdarg.h>

typedef int WORLD; // dummy: the "world" carries no actual data

struct wfret_k { WORLD* w; void* t; };

struct wfret_k wfopen (WORLD* w, const char* name, const char* mode) {
    struct wfret_k x;
    x.w = w;
    x.t = (void*) fopen(name, mode);
    return x;
}

struct wfret_k wfprintf (WORLD* w, FILE* d, const char* format, ...) {
    struct wfret_k x;
    va_list aq;
    va_start(aq, format);
    vfprintf(d, format, aq);
    va_end(aq);
    x.w = w;
    x.t = NULL;
    return x;
}

struct wfret_k wftell (WORLD* w, FILE* d) {
    struct wfret_k wt;
    wt.w = w;
    wt.t = (void*) ftell(d);
    return wt;
}

struct wfret_k wprintf (WORLD* w, const char* format, ...) {
    struct wfret_k wt;
    va_list aq;
    va_start(aq, format);
    vprintf(format, aq);
    va_end(aq);
    wt.w = w;
    wt.t = NULL;
    return wt;
}

struct wfret_k wr (WORLD* w, FILE* d) {
    return wfprintf (w, d, "Hello!\n");
}

int main (int argc, char** argv) {
    WORLD* w0 = NULL; // assume this was handed in from "outside"
    struct wfret_k o1 = wfopen(w0, argv[1], "a");
    WORLD* w1 = o1.w;
    FILE* f = o1.t;
    struct wfret_k o2 = wr(w1, f);
    WORLD* w2 = o2.w;
    struct wfret_k o3 = wftell(w2, f);
    WORLD* w3 = o3.w;
    struct wfret_k o4 = wprintf(w3, "%ld\n", (long) o3.t);
    WORLD* w4 = o4.w;
    struct wfret_k o5 = wr(w4, f);
    WORLD* w5 = o5.w;
    struct wfret_k o6 = wftell(w5, f);
    WORLD* w6 = o6.w;
    struct wfret_k o7 = wprintf(w6, "%ld\n", (long) o6.t);
}
Now, this is very cumbersome to do in plain C. Functional languages usually allow shadowing of variable names, and when working with uniqueness types, this is usually how it is done. At first sight, this WORLD* object does not contain anything, and is therefore useless. However, under uniqueness typing, every WORLD* can only be passed to another function once. Then it is gone: if we tried to use it a second time, the code would not compile. So even though this WORLD* object does not contain any data, it prevents some code from being compiled: it is a formal guarantee that can be checked at compile time. At runtime, we could just optimize it away – and that is fine, since the correctness was already checked at compile time.
The limits of C are now reached; in particular, we need currying from now on. We will continue with pseudocode to define a Kleisli combinator. Our wrapper functions return a "polymorphic" pair wfret_k (realized via void*, because C does not have polymorphism), consisting of the world itself and some return value.
We can now define a "modified" kleisli combinator (notice that this is pseudocode, an unholy mixture of C and Haskell):
m_kleisli :: (WORLD* → A → (WORLD*, B)) → (WORLD* → B → (WORLD*, C)) → (WORLD* → A → (WORLD*, C))
m_kleisli f g w0 a =
    let (w1, b) = f w0 a
    in  g w1 b
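In actual Haskell, the same combinator can be written down directly. Here World is just an opaque placeholder type standing in for WORLD* (a sketch, not how GHC really implements IO):
-- an opaque token representing the outside world; it carries no data
data World = World

-- the "modified" Kleisli combinator: thread the world through f, then through g
mKleisli :: (World -> a -> (World, b))
         -> (World -> b -> (World, c))
         -> (World -> a -> (World, c))
mKleisli f g w0 a =
    let (w1, b) = f w0 a
    in  g w1 b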
Notice that except for the calls into the libc, we never do anything but pass the worlds around – which was useful to assure linearity. However, if we take the Kleisli combinator as the given method of doing I/O, passing the world around is not really necessary anymore, because every function that does I/O will be a combination of smaller functions glued together with the Kleisli combinator, and it will sooner or later decompose into calls to our elementary wf-functions. Instead, we assure linearity by defining an abstract type IO a, which replaces our former (WORLD*, a):
(>=>) :: (a → IO b) → (b → IO c) → (a → IO c)
We would now still have to prove the monad laws, but this is a good exercise for you.
As mentioned earlier, it is possible to define the bind operator
(>>=) :: IO a → (a → IO b) → IO b
which is more common. In Haskell, we can therefore write
import Debug.Trace
import System.IO
import System.Environment

main :: IO ()
main = getArgs >>=
  \ args -> openFile (args !! 0) AppendMode >>=
  \ handle -> hPutStr handle "Hello!\n" >>=
  \ _ -> hPutStr handle "Hello!\n" >>=
  \ _ -> hClose handle
There is a shorthand notation that makes these things (sometimes) easier:
import Debug.Trace
import System.IO
import System.Environment

main :: IO ()
main = do args <- getArgs
          handle <- openFile (args !! 0) AppendMode
          hPutStr handle "Hello!\n"
          hPutStr handle "Hello!\n"
          hClose handle
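The do-notation is just syntactic sugar for the bind combinator. Roughly (ignoring let-bindings and failing pattern matches), the translation is:
-- do { x <- m ; rest }  =  m >>= \ x -> do { rest }
-- do { m ; rest }       =  m >>= \ _ -> do { rest }
-- do { m }              =  m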