People learning Haskell frequently run into the problem of "understanding monads". This is not surprising, as monads are a distinguishing feature of the language: Haskell code is often written in terms of monads. I want to give an introduction to monads and their main application, state monads, for people who are practically inclined, but it is explicitly not a practical introduction: my objective is to make these people understand monads, not to argue whether monads are useful. Before I even start explaining monads, I will explain a lot of secondary material which is, in my opinion, a source of confusion about monads and functional programming in general. And of course, this introduction reflects my personal opinions and experiences at the time of writing. They might change, and they might not be correct. As always, I am thankful for suggestions and criticism.
This is not an introduction to functional programming. Basic knowledge about writing simple programs is required, specifically, what λ means, and what a recursive function is. I also assume good familiarity with imperative languages. If you think you are a good programmer and you still don't understand most things explained here, you should probably think again. If something isn't clear, Wikipedia and DuckDuckGo are your friends.
C's records are called "structs". They are more low-level than the records of most other languages, but conceptually the same. A point on a plane can be defined via
struct point {
float x;
float y;
};
An approximation for the distance between two points can be defined as
float distance (a, b)
struct point* a;
struct point* b;
{ union {
float f;
int l; } x_;
float y_;
int z;
x_.f = b->x - a->x;
y_ = b->y - a->y;
x_.f = 1/(x_.f*x_.f + y_*y_);
y_ = x_.f * 0.5f;
x_.l = 0x5f3759df - (x_.l >> 1);
for (z = 0; z < 10; ++z)
x_.f *= (1.5f - (y_ * x_.f * x_.f));
return x_.f;
}
Structs can also contain arrays, but only when they have fixed length; otherwise, one has to use pointers:
struct point_7d {
float coordinates[7];
};
There is a language extension which allows the use of zero-length
arrays under certain conditions. A person without prior C knowledge
may wonder about the use case of an array of length zero. Well, in C,
array access is not bounds-checked: b[n]
is just an abbreviation for
*(b+n)
. Therefore, declaring an array of length zero while knowing
that more elements are there is perfectly legal, and it often occurs
that some file format has some header, which can be written as a
struct, and which indicates the length of the rest of the file. The
struct itself then only contains the header information, but the
content of the file can still be accessed through this struct. As an
example, the following program generates a bitmap file, using a struct
with a zero-length array:
#include <stdint.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <strings.h>
#pragma pack(push,1)
struct bmp {
uint16_t magic;
uint32_t filesize;
uint16_t reserved1, reserved2;
uint32_t pixel_offset, dib_hdr_size;
int32_t width, height;
uint16_t planes, bits_per_pixel;
uint32_t compression;
uint32_t image_size;
uint32_t xppm, yppm, coltab, important_colors;
uint32_t imagedata[0];
};
#pragma pack(pop)
int mmapfile (const char* filename, off_t s, void** r) {
int ret = open(filename, O_CREAT|O_RDWR, S_IRWXU);
if (ret < 0) exit(EXIT_FAILURE);
if (ftruncate(ret, s) < 0) exit(EXIT_FAILURE);
*r = mmap(NULL, s, PROT_READ|PROT_WRITE, MAP_SHARED, ret, 0);
if (*r == MAP_FAILED) exit(EXIT_FAILURE);
return ret;
}
struct bmp* bmp_file (const char* filename,
uint32_t width, uint32_t height) {
size_t
imgs = width*height*4,
fs = sizeof(struct bmp)+imgs;
struct bmp* b;
mmapfile(filename, fs, (void**) &b);
b->magic = 19778;
b->filesize = fs;
b->reserved1 = b->reserved2 =
b->compression = b->xppm = b->yppm =
b->coltab = b->important_colors = 0;
b->pixel_offset = sizeof(struct bmp);
b->dib_hdr_size = 40;
b->width = width;
b->height = height;
b->planes = 1;
b->bits_per_pixel = 32;
b->image_size = imgs;
return b;
}
int main (int argc, char** argv) {
int i;
if (argc < 2) exit(EXIT_FAILURE);
struct bmp* b = bmp_file (argv[1], 100, 100);
bzero(b->imagedata, b->image_size);
for (i = 0; i < 100; ++i) {
b->imagedata[i + 100*i] = ~0;
}
exit(EXIT_SUCCESS);
}
This code is not really portable, and it is arguable whether it is beautiful. I consider this use of array elements outside the declared bounds a hack, and as such, it is beautiful: it is a creative solution to a common problem. However, it is not the kind of code one should be writing these days, except when you really need to squeeze out every processor cycle you have. It does illustrate a way of thinking: C's strength is that it does not enforce its abstractions. It lets you do whatever you want, without warning, so you get very efficient code. By today's measures, on the continuum of the abstraction-optimization trade-off, it sits on the far right side.
Haskell, on the other hand, is on the far left side: Abstraction is extremely valuable, and should only be broken if there is no alternative, and even then some overhead will be accepted to keep it behaving in a clean way. It is a radically different way of thinking. Scott Meyers said that everything in C++ has a reason. The same holds for Haskell. It's just that Haskell sets other priorities. One can argue whether this is better or worse, whether it is practically relevant or purely academic; but this is not the topic of this post. To understand why monads were invented, you need to think this way; if you're not willing to think this way, you will not be able to understand monads.
Functions in Haskell should not have side effects. Some of them do,
for example, Debug.Trace.trace
does, but it is only meant for
debugging. The same Haskell expression should always yield the same
result. This way, Haskell can utilize lazy evaluation (or
call-by-need): an expression is evaluated only when its result is
actually used. Therefore, Haskell can handle infinite structures in a
nice way. For example, the following is a standard
motivating example; it defines the infinite list of Fibonacci numbers,
plus a function that then actually returns the n-th Fibonacci
number.
fibs = 0 : 1 : addHd fibs
where
addHd (x : r@(y : _)) = (x + y) : addHd r
fib n = fibs !! n
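To see lazy evaluation at work, the definition can be run as a self-contained program; this is just the code from above plus a main that forces the first few elements:

```haskell
-- The infinite list of Fibonacci numbers, as defined above.
fibs :: [Integer]
fibs = 0 : 1 : addHd fibs
  where
    addHd (x : r@(y : _)) = (x + y) : addHd r

fib :: Int -> Integer
fib n = fibs !! n

-- Only the first ten elements are ever computed here.
main :: IO ()
main = print (take 10 fibs)  -- [0,1,1,2,3,5,8,13,21,34]
```
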
From an old-school software engineer's point of view, this might look
horrible. And in fact, most programs written this way are slower than
their highly optimized counterparts. On the other hand, the definition
is easy to read and to write (once you have gotten to know Haskell):
relying on lazy evaluation can make writing and understanding software
easier. And today's software problems are seldom about efficiency,
they are about maintainability. A further nice property is that we do
not need to have a special case for constructs like if-then-else. We
can define
ifthenelse :: Bool -> a -> a -> a
ifthenelse True a _ = a
ifthenelse False _ b = b
A seemingly equivalent C function would be
enum BOOL { FALSE, TRUE };
void* ifthenelse (b, yes, no)
enum BOOL b;
void *yes, *no;
{
switch (b) {
case FALSE:
return no;
case TRUE:
return yes;
}
}
The difference is that C has eager evaluation: Every argument is evaluated before a function is called. In Haskell,
ifthenelse True 2 (mod 1 0)
will yield 2
, while in C,
ifthenelse (TRUE, (void*)2, (void*)(1 % 0));
will yield a division-by-zero error (well, actually, most modern
compilers will at least warn you about this, or optimize it out
through dead-code detection, but you get the idea). Notice that most
other functional programming languages allow side effects, and also
use eager evaluation and run into the same problem as C does in this
case. Common Lisp (and generally most Lisp dialects) has a nice way of
defining things like if-then-else
as macros. Anyway, Haskell does
lazy evaluation, with all its benefits and drawbacks.
And one major drawback is that the presence of side effects breaks lazy evaluation. For example, the following C program cannot be implemented directly in Haskell:
#include <stdio.h>
int i = 0;
int inc() {
return i++;
}
int main (void) {
int j;
for (j = 0; j < 10; ++j)
printf ("%d\n", inc());
}
Subsequent calls to inc
return different results, the results depend
on the hidden state i
. Assume we had such a function in Haskell,
and we would define
incs = inc : incs
Then incs !! n
would cause the n-th element of the list to be
evaluated. The contents of the list would depend on the order of the
calls to (!!)
. There might be scenarios where this is desirable, but
in most cases, this just breaks stuff. As I wrote previously, it is
possible to destructively modify values in Haskell: On the one hand,
Haskell has an FFI which lets us call C functions; on the other hand,
Haskell has "unsafe" functions like unsafePerformIO
. Their use is
discouraged, but there are situations in which it is useful, for
example for debugging, and for developing abstractions which are only
using mutable state in the background which cannot be accessed through
the frontend, like DiffArrays.
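The stateful C counter above can be mimicked purely by threading the state explicitly, so that the same call on the same state always yields the same result. A minimal sketch (the names inc and runInc are ours, for illustration):

```haskell
-- A pure version of the C counter: the state is an explicit argument,
-- and each call returns the value together with the new state.
inc :: Int -> (Int, Int)      -- old state -> (returned value, new state)
inc i = (i, i + 1)

-- Run inc n times starting from state 0, collecting the returned values.
runInc :: Int -> [Int]
runInc n = go n 0
  where
    go 0 _ = []
    go k s = let (v, s') = inc s in v : go (k - 1) s'

main :: IO ()
main = print (runInc 10)  -- [0,1,2,3,4,5,6,7,8,9]
```

This explicit state-passing is exactly the pattern that state monads abstract away.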
The absence of side effects makes certain programs impossible. For example, Quicksort, and in-situ sorting algorithms in general, cannot be realized without destructive modification of an underlying array. However, there are good purely functional substitutes for these algorithms. More problematic is the absence of I/O: accessing and modifying files is stateful. But not having any kind of I/O limits the usability of a programming language. This is the problem state monads try to solve.
For this section, let us talk only about pure functions, that is, functions that do not have side effects. Let A, B, C be some types and f:A→B, g:B→C; then we can define f↣g := λx. g(f(x)), that is, (f↣g)(x) = g(f(x)). It is easy to show that ↣ is associative. Notice that the same does not hold for functions with side effects (I leave finding a counterexample as an exercise).
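The associativity claim can be checked concretely on examples; a minimal Haskell sketch, where the operator name (|>>) is ours and stands in for ↣ (the Prelude spells the flipped version of this as (.)):

```haskell
-- Flipped composition: (f |>> g) x = g (f x), our rendering of f↣g.
(|>>) :: (a -> b) -> (b -> c) -> (a -> c)
(f |>> g) x = g (f x)

f, g, h :: Int -> Int
f = (+ 1)
g = (* 2)
h = subtract 3

-- Both groupings compute h (g (f 10)) = ((10 + 1) * 2) - 3 = 19.
main :: IO ()
main = print ( ((f |>> g) |>> h) 10
             , (f |>> (g |>> h)) 10 )  -- (19,19)
```
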
A common optimization, which came from but is not limited to functional programming, is tail call optimization (TCO). Functional programs use recursion a lot more than imperative programs. One possible definition of the Gaussian sum is
gsum 0 = 0
gsum n = n + gsum (n - 1)
The problem with recursion is that it consumes stack space. Therefore,
for large values of n
, this definition may exhaust the
stack. However, if the last expression is the unmodified recursive
call, the compiler can just optimize out the call that would otherwise
consume stack. The above function would have to be rewritten to
gsum_tr_ 0 r = r
gsum_tr_ n r = gsum_tr_ (n - 1) (n + r)
gsum_tr n = gsum_tr_ n 0
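As a sanity check, the tail-recursive version agrees with the closed form n(n+1)/2; this is a self-contained version of the code above plus a small main:

```haskell
-- Tail-recursive Gaussian sum: the accumulator r carries the partial sum.
gsum_tr_ :: Integer -> Integer -> Integer
gsum_tr_ 0 r = r
gsum_tr_ n r = gsum_tr_ (n - 1) (n + r)

gsum_tr :: Integer -> Integer
gsum_tr n = gsum_tr_ n 0

-- Compare against the closed form n*(n+1)/2.
main :: IO ()
main = print (gsum_tr 100, gsum_tr 100 == 100 * 101 `div` 2)  -- (5050,True)
```
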
The function gsum_tr_
would be called tail recursive, and the
recursive call does not consume stack space. In some functional
languages, this is still the method of choice. In Haskell, it is less
relevant, because in many cases, rewriting a recursive function into a
tail recursive function is straightforward, and Haskell applies this
optimization. However, let us stick to the example gsum_tr_
for the
moment. A definition in Java would look like
public static int gsum_tr_ (int n, int r) {
if (n == 0) return r;
return gsum_tr_ (n - 1, n + r);
}
The corresponding generated Java assembly (output of javap -c
with
comments by me, I chose Java assembly because it is stack based) is:
public static int gsum_tr_(int, int);
Code:
0: iload_0 //load first argument (n)
1: ifne 6 //if n == 0 jump to byte 6
4: iload_1 //load second argument (r)
5: ireturn
6: iload_0 //load first argument (n)
7: iconst_1
8: isub // => n  1
9: iload_0
10: iload_1
11: iadd // => n + r
12: invokestatic #2 // stack: n - 1, n + r
15: ireturn
Before the recursive call, the arguments are pushed to the stack, just to be popped again directly by the called function. Then the return value is taken from the recursive call directly. We might as well just modify the values directly, inplace:
public static int gsum_tr__ (int n, int r) {
while (true) {
if (n == 0) return r;
r = n + r;
n = n  1;
}
}
And get
public static int gsum_tr__(int, int);
Code:
0: iload_0
1: ifne 6
4: iload_1
5: ireturn
6: iload_0
7: iload_1
8: iadd
9: istore_1
10: iload_0
11: iconst_1
12: isub
13: istore_0
14: goto 0
Which is almost identical: an imperative loop with destructive updates. That is, under certain conditions, we can have destructive updates inside purely functional code. Whenever we know that the rest of the computation cannot depend on the former value, we can destructively overwrite it. More specifically, a sufficient condition is that the value occurs in the function body exactly once. This can be trivially checked at compile time, and in fact, this condition is reflected in so-called uniqueness types in some functional languages (but not in Haskell).
I hope that by now you get an impression of how much thought has been put into the concepts of functional programming. The concepts are not trivial, but neither are the concepts of higher imperative programming languages. Make sure you understood everything so far.
We now want to embed stateful operations into purely functional code,
without the use of uniqueness types, but with state
monads. Firstly, let us define a type Ref a
, which is a reference to
some object of type a
.
data Ref a = Pto a
Creating a new reference is so simple that it almost makes no sense. However, for instructional purposes, we define it:
ret :: a -> Ref a
ret x = Pto x
This definition is polymorphic. Let us look at the case a = Int
. Now
consider the following functions:
inc :: Int -> Ref Int
inc x = Pto $ x + 1
twice :: Int -> Ref Int
twice x = Pto $ 2 * x
These functions just read our pointer, and return a pointer to the increment/double of the former value. We furthermore need
(>=>) :: (a -> Ref b) -> (b -> Ref c) -> a -> Ref c
(>=>) f g x = case f x of { Pto y -> g y }
run :: Ref a -> a
run (Pto x) = x
Now we can write the following:
run ((inc >=> twice >=> twice >=> twice >=> inc >=> ret) 0)
Before you read on, try to find out what this expression returns. Answer: the expression returns 9. The nice thing is that the reference itself is handed over to the next function directly as its first argument. Therefore, there is no reason to ever have more than this one reference, and an optimizing compiler can avoid allocating stack space and regard the operations on the reference as destructive, the same way it is done with TCO. Notice that inside this definition, except for the first function call, we never actually talk about the reference, we only say what to do with it.
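Putting the pieces together, the whole Ref machinery fits in a few lines; this is just the code from above made self-contained, with the evaluation traced in a comment:

```haskell
data Ref a = Pto a

ret :: a -> Ref a
ret = Pto

inc, twice :: Int -> Ref Int
inc   x = Pto (x + 1)
twice x = Pto (2 * x)

-- Unwrap the result of the left function, feed it to the right one.
(>=>) :: (a -> Ref b) -> (b -> Ref c) -> a -> Ref c
(f >=> g) x = case f x of Pto y -> g y

run :: Ref a -> a
run (Pto x) = x

-- Trace: 0 -inc-> 1 -twice-> 2 -twice-> 4 -twice-> 8 -inc-> 9 -ret-> 9
main :: IO ()
main = print (run ((inc >=> twice >=> twice >=> twice >=> inc >=> ret) 0))  -- 9
```
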
Make sure you have understood this so far. If not, I failed at explaining, but it also makes no sense to read on.
The (>=>)
is somewhat similar to our previous ↣. In particular, one can
show that it is associative. Furthermore, the function ret
is a
neutral element on both sides. These properties make our (>=>)
a
Kleisli combinator, and Ref
a monad with respect to this Kleisli
combinator. Any structure with a combinator and a return function
that satisfies these axioms is a monad. Monads are mathematical
structures, much like groups, rings, fields and vector spaces: they
are collections of objects which satisfy certain axioms. And like
fields, which can be used in cryptography, and vector spaces, which
can be used for computer graphics, monads can be used to describe and
solve specific problems, and one such problem is the description of
stateful operations in a purely functional context. But as for groups,
fields and vector spaces, there are many different structures that
obey the monad axioms.
Let us for a moment look at the group axioms: A group is a triple (G, ∘, e), where ∘:G→G→G and e:G, such that ∘ is associative, e is a neutral element on both sides, and every element has an inverse.
The monad axioms for a Kleisli combinator are quite similar:
return >=> g = g
f >=> return = f
(f >=> g) >=> h = f >=> (g >=> h)
We have a neutral element (left and right neutral), and the combinator is associative.
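These laws can at least be spot-checked for Ref on sample inputs; this is not a proof, just a sanity check, and f, g, h here are arbitrary test functions of our choosing:

```haskell
data Ref a = Pto a deriving (Eq, Show)

ret :: a -> Ref a
ret = Pto

(>=>) :: (a -> Ref b) -> (b -> Ref c) -> a -> Ref c
(f >=> g) x = case f x of Pto y -> g y

-- Arbitrary test functions.
f, g, h :: Int -> Ref Int
f x = Pto (x + 1)
g x = Pto (2 * x)
h x = Pto (x - 3)

-- Check the three monad laws at the sample input 5.
main :: IO ()
main = print [ (ret >=> g) x == g x                        -- left identity
             , (f >=> ret) x == f x                        -- right identity
             , ((f >=> g) >=> h) x == (f >=> (g >=> h)) x  -- associativity
             ]
  where x = 5
```
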
I chose the Kleisli combinator because I think that it is easier to
understand in the case of state monads: In our case, the only thing it
does is passing the value of the returned reference of the left
function as a first argument to the right function, making sure that
the rest of the computation does not depend on the former value, so it
can be optimized out. However, using the Kleisli combinator has
several disadvantages for actual programming. The usual definition
uses the bind
combinator (>>=)
. We can define the bind
combinator in terms of the Kleisli combinator:
(>>=) :: Ref a -> (a -> Ref b) -> Ref b
(>>=) ma g = (id >=> g) ma
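Conversely, the Kleisli combinator can be recovered from bind; a small self-contained sketch, where the name kleisli is ours:

```haskell
import Prelude hiding ((>>=))

data Ref a = Pto a deriving (Eq, Show)

(>=>) :: (a -> Ref b) -> (b -> Ref c) -> a -> Ref c
(f >=> g) x = case f x of Pto y -> g y

-- bind defined from the Kleisli combinator, as in the text.
(>>=) :: Ref a -> (a -> Ref b) -> Ref b
ma >>= g = (id >=> g) ma

-- The converse direction: the Kleisli combinator from bind.
kleisli :: (a -> Ref b) -> (b -> Ref c) -> a -> Ref c
kleisli f g = \x -> f x >>= g

main :: IO ()
main = print ( Pto 5 >>= (\x -> Pto (x + 1))
             , kleisli (\x -> Pto (x + 1)) (\y -> Pto (2 * y)) 5 )
```
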
But for the moment, let us stick with the Kleisli combinator. So far, we did not do anything except writing definitions and exploiting certain optimizations.
Now we want to write something to a file. The main function of a
Haskell program is of the type IO ()
, which is a monad. This monad
references the "outside world" – and it is a monad, since there can
only be one such world. This outside world is stateful. In C, we can
write
#include <stdio.h>
void wr (FILE* d) {
fprintf(d, "Hello!\n");
}
int main (int argc, char** argv) {
FILE* d = fopen(argv[1], "a");
wr(d);
printf("%d\n", ftell(d)); // 7
wr(d);
printf("%d\n", ftell(d)); // 14
}
The problem is obvious: Every call to fprintf(3)
advances the file
pointer, so the ftell(3)
calls return different file positions – and
violate referential transparency. We could, however, rewrite this
program in the following way:
#include <stdio.h>
FILE* wr (FILE* d) {
fprintf(d, "Hello!\n");
return d;
}
int main (int argc, char** argv) {
FILE* d1 = fopen(argv[1], "a");
FILE* d2 = wr(d1);
printf("%d\n", ftell(d2)); // 7
FILE* d3 = wr(d2);
printf("%d\n", ftell(d3)); // 14
}
This is how it would be done with uniqueness types: Every modification
of the file consumes a file handle, but produces a new one. Notice
that we would have to encapsulate the printf(3)
statements as well –
they change the "world". Therefore, we can define a WORLD*
type
which is passed instead, and think of this WORLD*
type as a map from
handles to resources:
#include <stdio.h>
#include <stdarg.h>
typedef void WORLD; // dummy
struct wfret_k { WORLD* w; void* t; };
struct wfret_k wfopen (WORLD* w, const char* name, const char* mode) {
struct wfret_k x;
x.w = w;
x.t = (void*) fopen(name, mode);
return x;
}
struct wfret_k wfprintf (WORLD* w, FILE* d, const char* format, ...) {
struct wfret_k x;
va_list aq;
va_start(aq, format);
vfprintf(d, format, aq);
x.w = w;
x.t = NULL;
return x;
}
struct wfret_k wftell (WORLD* w, FILE* d) {
struct wfret_k wt;
wt.w = w;
wt.t = (void*) ftell(d);
return wt;
}
struct wfret_k wprintf (WORLD* w, const char* format, ...) {
struct wfret_k wt;
va_list aq;
va_start(aq, format);
vprintf(format, aq);
wt.w = w;
wt.t = NULL;
return wt;
}
struct wfret_k wr (WORLD* w, FILE* d) {
return wfprintf (w, d, "Hello!\n");
}
int main (int argc, char** argv) {
WORLD* w0; //assume this was "outside"
struct wfret_k o1 = wfopen(w0, argv[1], "a");
WORLD* w1 = o1.w;
FILE* f = o1.t;
struct wfret_k o2 = wr(w1, f);
WORLD* w2 = o2.w;
struct wfret_k o3 = wftell(w2, f);
WORLD* w3 = o3.w;
struct wfret_k o4 = wprintf(w3, "%d\n", o3.t);
WORLD* w4 = o4.w;
struct wfret_k o5 = wr(w4, f);
WORLD* w5 = o5.w;
struct wfret_k o6 = wftell(w5, f);
WORLD* w6 = o6.w;
struct wfret_k o7 = wprintf(w6, "%d\n", o6.t);
}
Now, this is very complicated to do in plain C. Functional languages
usually allow shadowing of variable names, and when working with
uniqueness types, this is usually the way it is done. At first sight,
this WORLD*
object does not contain anything, and is therefore
useless. However, in the scope of uniqueness types, every WORLD*
can
only be passed to another function once. Then it is gone: If we
tried to access it a second time, the code would not compile. Even
though this WORLD*
object does not contain any data, it prevents
some code from being compiled: It is a formal guarantee that can be
checked at compile time. At runtime, we could just optimize it out –
but that is fine, since we checked the correctness at compile time.
The limits of C are now reached; in particular, we need currying from now
on. We will continue with pseudocode to define a Kleisli
combinator. Our wrapper functions return a polymorphic pair wfret_k
(which is realized via void*
, because C does not have polymorphism),
consisting of the world itself, and some return value.
We can now define a "modified" Kleisli combinator (notice that this is pseudocode, an unholy mixture of C and Haskell):
m_kleisli :: (WORLD* → A → (WORLD*, B)) → (WORLD* → B → (WORLD*, C)) → (WORLD* → A → (WORLD*, C))
m_kleisli f g w0 a =
let (w1, b) = f w0 a
in g w1 b
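For illustration, the pseudocode can be rendered in actual Haskell with a dummy world token; all names here (World, mKleisli, step1, step2) are ours, not part of any library:

```haskell
-- A toy rendering of the world-passing Kleisli combinator above.
-- World is a dummy token standing in for the outside world.
data World = World

mKleisli :: (World -> a -> (World, b))
         -> (World -> b -> (World, c))
         -> (World -> a -> (World, c))
mKleisli f g w0 a =
  let (w1, b) = f w0 a
  in  g w1 b

-- Two toy "stateful" steps; only the value part actually changes here.
step1 :: World -> Int -> (World, Int)
step1 w x = (w, x + 1)

step2 :: World -> Int -> (World, String)
step2 w x = (w, show x)

main :: IO ()
main = print (snd (mKleisli step1 step2 World 41))  -- "42"
```
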
Notice that except for the calls to the libc, we never do anything but
pass the worlds around – which was useful to ensure linearity. However,
if we take the Kleisli combinator as the given method of doing I/O,
passing the world is not really necessary anymore, because all
functions that do I/O will be combinations of smaller functions using
the Kleisli combinator, and these will sooner or later decompose into
calls to our elementary wf
functions. Instead, we ensure linearity
by defining a type functor IO a
, which replaces our former
(WORLD*, a)
:
(>=>) :: (a → IO b) → (b → IO c) → (a → IO c)
We now would still have to prove the monad laws, but this is a good exercise for you.
As mentioned earlier, it is possible to define the bind operator
(>>=) :: IO a → (a → IO b) → IO b
which is more common. In Haskell, we can therefore write
import Debug.Trace
import System.IO
import System.Environment
main :: IO ()
main = getArgs >>=
\ args ->
openFile (args !! 0) AppendMode >>=
\ handle -> hPutStr handle "Hello!\n" >>=
\ _ -> hPutStr handle "Hello!\n" >>=
\ _ -> hClose handle
There is a shorthand notation that makes these things (sometimes) easier:
import Debug.Trace
import System.IO
import System.Environment
main :: IO ()
main = do args <- getArgs
handle <- openFile (args !! 0) AppendMode
hPutStr handle "Hello!\n"
hPutStr handle "Hello!\n"
hClose handle
Again, a talk for the best mathematics club in the multiverse. Again, the intended audience are pupils, so it is not a scientific talk, and the focus lies on breadth rather than depth.
We will use the concept of "classical logic" before we define it – to make it easier to understand afterwards. For now, classical logic shall mean "the stuff you use in mathematics competitions without knowing it".
The first question we ask ourselves is what a mathematical proposition actually is. There are some patterns that regularly occur.
Not all of {∀,∃,∧,∨,→,↔,¬} is really needed. For example, A↔B is usually defined as (A→B)∧(B→A). Classically, we can define A→B as ¬A∨B. Furthermore, there are other connectives, like the exclusive or ("XOR"), which holds if exactly one of the two propositions holds, and could be defined by (A∨B)∧¬(A∧B). Which subset we take axiomatically and which ones we define depends on the needs of the theory.
Of the characters mentioned so far, the theory of this talk requires the subset {∀,∃,∧,∨,→}. However, we need two more: ⊤, which is defined to be the proposition that is always true (the "verum"), and ⊥, which is always false (the "falsum"). We can define negation through → and ⊥:
¬A := A→⊥
The intuition is that ⊥ can only follow from something wrong. Of course, this subset is redundant. In classical logic, we can define everything by means of ∀, → and ⊥ (read ↔ᶜ as "is classically equivalent"):
∃xA ↔ᶜ ¬∀x¬A
A∧B ↔ᶜ ¬(A→B→⊥)
A∨B ↔ᶜ ¬(¬A∧¬B)
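The two propositional equivalences can be spot-checked with a truth table, reading propositions as Booleans (which is only justified classically); a small Haskell sketch, where imp and bottom are our names for classical implication and ⊥:

```haskell
-- Classical implication and falsum over Booleans.
imp :: Bool -> Bool -> Bool
imp a b = not a || b

bottom :: Bool
bottom = False

-- Check A∧B ↔ ¬(A→B→⊥) and A∨B ↔ ¬(¬A∧¬B) on all four valuations.
main :: IO ()
main = print
  [ and [ (a && b) == not (a `imp` (b `imp` bottom)) | a <- bs, b <- bs ]
  , and [ (a || b) == not (not a && not b)           | a <- bs, b <- bs ]
  ]
  where bs = [False, True]
```
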
We still lack a possibility of actually expressing something meaningful in any common mathematical theory. Let us, for the moment, stick with a simple but useful theory, the theory of Real Closed Fields¹ – it is an axiomatization of real numbers. Quantifiers are over ℝ. We need to add a few additional symbols:
0 and 1 are constant symbols (or sometimes "nullary function symbols"). + and · are binary function symbols. P and = are so-called relation symbols: their "return value" is not a number, but a proposition. A proposition of the form P(x) or of the form x=y is called an atom (or prime formula, or atomic formula): it cannot logically be "split". We leave out the nasty details of operator precedence and syntactical correctness, as well as the defined semantics.
With these symbols, we can express many propositions. For example, we can express the axioms of a commutative ring (which all hold in ℝ):
We regard these propositions as axioms. Notice that none of them contains implications (this will be relevant later). Furthermore, we have the following axioms:
For the further axioms, we introduce a notation: We define
x<y :⇔ P(y−x) (which is short for ∃z. z+x=y ∧ P(z))
We add infinitely many axioms, using a so-called axiom scheme: it is easy to enumerate all axioms of this axiom scheme, and it is easy to decide whether some proposition is in it.
A polynomial is what we intuitively would regard as one: a₀+a₁x+a₂x²+…+aₙxⁿ. Now, we add one axiom for every possible polynomial:
This axiom system is called GRCF. Now that we have axioms, we want to derive theorems from them. The question that bothered the early logicians was whether a computer can automatically decide whether a given proposition, encoded as a string, is true or not. This is not possible in general, though there are special cases where it is.
The next question is whether we can at least make it possible for a computer to check proofs. To be more clear, the question is whether there is a finite set of rules, which allows us to form exactly the true propositions. And this is possible.
Sets of rules are called calculi, and a calculus with this property is called sound and complete: sound means it only allows deriving correct propositions, and complete means that it allows deriving all correct ones. A very simple such calculus is the calculus of natural deduction. It is a small set of rules that is easy to learn and work with. The notation is of the form
input1 input2 …
───────────────
output
The first rule is essentially modus ponens:
A A→B
───────
B
Introduction of "and" and "or" and "exists" is also easy:
A B A B A[x:=t]
─── ─── ─── ───────
A∨B A∨B A∧B ∃xA
where A[x:=t] means A with t inserted for x in A.
Elimination of "and" and "forall" is also easy:
A∧B A∧B ∀xA
─── ─── ───────
A B A[x:=t]
The other rules are slightly more complicated, because they require the concept of a free assumption: An assumption is something we just propose, without proving it – it is a proposition that is on the top of a proof tree. New assumptions are free, but can be bound by other rules. We will denote bound assumptions in brackets. A proof is correct only when all free assumptions are axioms.
The simplest rule that binds a free assumption is the introduction rule for implications:
[A]

B
───
A→B
This rule means: if we have a proof with the free assumption A and we can derive B, then we are allowed to bind this assumption and derive A→B. This rule essentially says: If B follows from A, then A→B. Notice that one can also bind assumptions that do not occur at all:
B
─── (bind A)
A→B
For disjunction elimination we have
A∨B A→C B→C
───────────
C
The remaining rules have variable conditions. A variable in some proposition is called free, if it is not in the scope of ∃ or ∀.
∃xA ∀x(A→C)
────────────
C
if x does not occur freely in C.
A
───
∀xA
if x does not occur freely in any free assumption above A except for A itself. These rules are the rules of minimal logic: Everything derivable by these rules is said to hold minimally. Notice that this system does not have a concept of negation. However, we can derive some simple theorems from it.
[A] A→B
───────
B B→C
────────
C
─── (bind A)
A→C
This proves that A→B and B→C together imply A→C. We can even prove some basic facts about negation – in minimal logic, ⊥ has no special meaning:
A ¬A
────
⊥
However, there are things we cannot prove. For example, ((A∨B)∧¬A)→B holds intuitively, but it is not derivable in the calculus so far. Instead of adding new rules, we add axioms: for every proposition A, we add the axiom "ex falso quodlibet": ∀x∀y… ⊥→A. We can then derive
[(A∨B)∧¬A]
──────────
[A] ¬A
──────────
⊥ ⊥→B
────────────────
[(A∨B)∧¬A] B [B]
────────── ─── (bind A) ─── (bind B)
A∨B A→B B→B
────────────────────────────────
B
──────────── (bind (A∨B)∧¬A)
((A∨B)∧¬A)→B
This system is called intuitionistic logic, propositions derivable with it are said to hold intuitionistically. Usually, when people talk about constructive mathematics, they mean mathematics formalized with intuitionistic logic. Intuitionistic logic has the property that every object that can be shown to exist can be calculated.
One thing that cannot be derived is the Drinker's Problem: in every pub there is a person such that, if this person is drunk, all persons are drunk:
∃d. D(d)→∀q.D(q)
This is not derivable in intuitionistic logic. The usual proof is:
Assume all people are drunk. Then we are done. Assume at least one person is not drunk. Then this person satisfies the condition: (S)he is not drunk, and not all people are drunk.
This proof distinguishes between ∀q.D(q) and ¬∀q.D(q). And this is the thing that cannot be done in intuitionistic logic. This has a good reason: Consider a pub with infinitely many guests (of course, this breaks the example, but logically, we can express this). Let us say we enumerate the persons by natural numbers. Then the state of a pub can be expressed by a function d : ℕ→{0,1}, where 0 means "not drunk", and 1 means "drunk". For such a function, it is not generally possible to decide whether it will always be 1. The above proof has no computational content.
If we really want the full logical strength of ordinary mathematics, we need classical logic: we need to add stability (or duplex negatio affirmat), ¬¬A→A, for every proposition A.
Lemma: Tertium non datur holds classically, that is, for every proposition A with free variables x, y, z, ∀x∀y∀z….A∨¬A.
[A]
────
[¬(A∨¬A)] A∨¬A
───────────────
⊥
──── (bind A)
¬A
────
[¬(A∨¬A)] A∨¬A
───────────────────────
⊥
──────── (bind ¬(A∨¬A))
¬¬(A∨¬A) ¬¬(A∨¬A)→(A∨¬A)
────────────────────────────────────────
A∨¬A
Lemma: ¬∀q.D(q)→∃q.¬D(q)
[D(q)]
───────
¬∀q.D(q) ∀q.D(q)
──────────────────
⊥
───── (bind D(q))
¬D(q)
────────
∃q.¬D(q)
We can formalize the former proof:
[¬D(d)] [D(d)]
───────────────
⊥→∀q.D(q) ⊥
──────────────────────────
∀q.D(q)
──────────── (bind D(d))
D(d)→∀q.D(q)
───────────────
∀q.D(q) ¬∀q.D(q) ∃d.D(d)→∀q.D(q)
──────────── ──────── ────────────────────────
D(d)→∀q.D(q) ∃q.¬D(q) ∀d.¬D(d)→∃d.D(d)→∀q.D(q)
─────────────── ─────────────────────────────────
(∀q.D(q))∨(¬∀q.D(q)) ∃d.D(d)→∀q.D(q) ∃d.D(d)→∀q.D(q)
──────────────────────────────────────────────────────────
∃d.D(d)→∀q.D(q)
Another proof that shows the problem with non-constructive proofs:
Lemma: There exist two numbers a,b∈ℝ∖ℚ such that aᵇ∈ℚ
Proof: If (√2)^(√2)∈ℚ, we are done. If not, then ((√2)^(√2))^(√2)=2∈ℚ.
Constructive mathematics has nice properties, but classical mathematics is easier. There are theories for which classical derivability implies intuitionistic derivability. We will look at one special kind of theory which satisfies this property, the so-called geometric theories.
The continuation translation² P/A of a formula P for a formula A is recursively defined as:
P/⊥ is called the Gödel-Gentzen negative translation of P.
Let Γ be a set of axioms. Then Γ/A := {P/A | P∈Γ}.
Lemma: Let A not contain free variables that occur bound in C, and C not contain implications or universal quantifiers. Then C/A↔((C→A)→A) holds intuitionistically.
The proof goes by structural induction on C. It is long and technical. We only sketch it³. For atomic C, this follows by definition. For C=∃xB, we may assume – by induction – that B/A↔((B→A)→A). We have to prove that ((∀x.B/A→A)→A)↔((∃xB→A)→A).
The → direction:
[C]
───
[∃x.C→A] ∃xC
────────────
A C/A→(C→A)→A [C/A]
─── (bind C) ─────────────────
C→A (C→A)→A
──────────────────────────
A
───── (bind C/A)
C/A→A
────────
(∀x.C/A→A)→A ∀x.C/A→A
──────────────────────
A
────────── (bind ∃x.C→A)
(∃x.C→A)→A
The ← direction:
[C→A] [C]
──────────
A
───────── (bind C→A)
[∀x.C/A→A] ((C→A)→A)→C/A (C→A)→A
────────── ───────────────────────
C/A→A C/A
──────────────────────
A
─── (bind C)
∃xC C→A
─────────────────
A
────── (bind ∃xC)
(∃x.C→A)→A ∃xC→A
──────────────────────────
A
──────────── (bind ∀x.C/A→A)
(∀x.C/A→A)→A
Lemma: (¬¬B→B)/A is derivable in intuitionistic logic.
The proof is also technical, so we omit it. We can now, however, show that we can convert a classical proof of B from a theory Γ into an intuitionistic proof of B/A from Γ/A:
Definition: A geometric proposition is a proposition that does not contain → and ∀. A geometric implication is a formula of the form ∀x…(A→B), where A and B are geometric propositions. A geometric theory is a theory which only contains geometric implications.
Theorem: If Γ is a geometric theory, then every formula in Γ/A follows intuitionistically from Γ.
Theorem (Barr): Let B be a geometric implication, and Γ be a geometric theory, such that B is classically derivable from Γ. Then B is intuitionistically derivable from Γ.
Notice that GRCF only contains geometric implications or geometric propositions. A geometric proposition B can be replaced by the equivalent geometric implication ⊤→B. So GRCF is a geometric theory.
───────────
Footnotes:
¹We use the axiomatization from E. Palmgren
²See my diploma thesis
³For the long version, see my diploma thesis
It's time for another slightly political post.
During my lifetime so far, I have heard two statements several times: one is that communism and socialism have lost against capitalism, and the other is that capitalism will collapse sooner or later. And in fact, the breakdown of the Soviet Union suggests the first one, and the current financial crisis suggests the second one. In my opinion, however, both statements are not directly wrong, but they miss a certain point: liberal capitalism is not the presence, but the absence of a system. Similar to night being the absence of light, and cold being the absence of heat. Capitalism is more like economic anarchy.
Now, what can happen is that some model of social market economy collapses – because it is the presence of a system. It can happen that the patent system collapses – because it is the presence of a system. But saying capitalism can collapse is like saying darkness can go away: It is a mere metaphor for something filling the gap that we named.
And whenever a real system collapses, it defaults to the absence of a system, which is capitalism. This is why it seems like capitalism is the superior strategy to everything else: It is simply the thing that is there when nothing else is there.
This is, of course, not an argument against capitalism. Laissez faire capitalism is like evolution: Good things can evolve from it. But so can very bad things. It is up to us to find ways to cope with it, and make the best out of it.
Unfortunately, Common Lisp support in NixOS is worse than one would expect (and unfortunately I do not have the time to change this) – there is nothing comparable to the common-lisp-controller, as far as I can see. In NixOS, I do not want to install sbcl globally when I don't need it for anything else but developing my Lisp projects, which is something that should be done inside a nix-shell.
As probably many lispers, I like getting my packages through QuickLisp. According to the FAQ, it is possible to install packages into other directories, so it would be possible – in theory – to have different nix-shells with different QuickLisp package directories; but then I would probably install the same software multiple times, and the compile cache would not work as desired. The directory structure of QuickLisp looks like it should be possible to have different branches, but I do not know of such a mechanism, and for my use case, one QuickLisp directory is sufficient. It would, though, be nice to have something like Cabal2Nix for QuickLisp.
Anyway, for my use case, I need SLIME and the CFFI, and some other packages. One of my Nix expressions, for example, is
with import <nixpkgs> {};
runCommand "bash"
{
  buildInputs = [ rlwrap stdenv bash emacs php sbcl imagemagick gnome3.eog git geoip libressl lispPackages.cffi emacs24PackagesNg.markdown-mode ];
} ""
The problem is that the CFFI appears not to follow the Nix environment when loading dynamic libraries. However, the variable $NIX_LISP_LD_LIBRARY_PATH is set automatically, and it is sufficient to append it to $LD_LIBRARY_PATH:
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$NIX_LISP_LD_LIBRARY_PATH rlwrap sbcl
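The same path manipulation as a small standalone sketch, with placeholder values: inside the nix-shell, NIX_LISP_LD_LIBRARY_PATH is set for you, and /nix/store/abc-cffi/lib is a made-up path for illustration only.

```shell
# Placeholder values; in the real nix-shell these come from the environment.
LD_LIBRARY_PATH="/usr/lib"
NIX_LISP_LD_LIBRARY_PATH="/nix/store/abc-cffi/lib"   # hypothetical store path

# Append the Nix Lisp library path so that the CFFI finds its libraries.
LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$NIX_LISP_LD_LIBRARY_PATH"
export LD_LIBRARY_PATH
echo "$LD_LIBRARY_PATH"
```

One would then start the Lisp in this environment with rlwrap sbcl, as in the command above.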
Now, through my global installation of QuickLisp, I can install swank. In the QuickLisp directory, there will be the files for SLIME. My custom Emacs configuration, which I save in a file emacs-config.el in this nix-shell's directory (where the default.nix is stored), looks like
(add-to-list 'load-path "~/quicklisp/dists/quicklisp/software/slime-2.14")
(require 'slime-autoloads)
(setq inferior-lisp-program "sbcl")
(add-to-list 'slime-contribs 'slime-fancy)
Now I can run
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$NIX_LISP_LD_LIBRARY_PATH emacs -q -l emacs-config.el
and start SLIME as usual via M-x slime.
I added Disqus comments to my blog software. Because why not.
However, the comments are not shown by default, to protect your privacy. They are only shown on single-entry pages, and only when you actively pass &disqus=yes (or follow a link that does so). That is, as long as you do not click on the corresponding links, there should be no difference (well, except for the links, of course).
Hotels, hostels and cafés often have public wifi hotspots which require a login that is given to guests. The usual behavior then is that the first website one tries to open is redirected to some login page. Mostly, this login page requires JavaScript, cookies and all the other shitty stuff. And often, these wifi connections are unencrypted.
Many websites, apps and other things one has to use reconnect automatically to some host when they are disconnected. They can easily get confused, depending on the redirection mechanism which is used. Sometimes you are explicitly redirected through an HTTP proxy – that is, you at least get an explicit HTTP redirection header. Sometimes they just use a transparent proxy, so it appears as if the requested IP address itself answered. Sometimes the hotspot just answers the first DNS query with a wrong IP, which in turn can confuse nscd(8).
Except for advertisement (which I really cannot stand anyway – if I pay for a hotel, then I want to have wifi included, without additional ads), the only reason for doing this that I can think of is to identify people for legal reasons. But then, please just let everyone sign a contract on paper, and use WPA2-PSK. This is a solved problem. The configuration is easy, and it works.
If you really want to send messages to the user, at least write on your tickets which website they have to visit in advance, and reject all other connections beforehand. Yes, this might sound a little less user-friendly at first sight. But visiting a website is something every internet user should know how to do anyway, while leaving apps and programs in an indefinite state can really mess systems up.
With all the new social networks, Jabber is getting less and less popular, and it has become a lot harder to convince even people with computer experience to get a Jabber account and be online on it.
In fact, there are some problems with the XMPP protocol.
Probably the worst one is the fact that I haven't seen any server that correctly handles multiple clients and doesn't lose messages when used on a mobile phone. My server has TCP keepalives activated and pings in regular intervals – which increases the traffic, but also makes it significantly less probable to lose messages on connection loss. There are proposed solutions to this, which will hopefully be more widespread some time in the future; for now, this problem exists.
Another severe problem came up when Google dropped its XMPP federation support. I still don't quite understand why they did so, but keep this in mind: a Google account is not a full Jabber account anymore!
Yet another argument I often hear is that Jabber is not widespread. "None of my friends use it." – well, then convince them to use it. Be the first to have an account and spread it.
It's not like it costs anything or is complicated. It's not hard to have a Jabber account and be online from time to time even though you don't use it much. If you worry about the system resources and the traffic, I can assure you that your Facebook and WhatsApp and whatnot apps are probably worse.
It is not even about destroying Facebook, Microsoft, Google, etc. – it is about limiting their power. It is about having a fallback solution when they decide to become evil – so they have a pressure not to become evil.
It's your little part for the freedom of the internet, for your freedom. Is that really too much?