Panel 1: A father is shown standing behind his nerdy looking son. its 3 am. The son is sitting in front of his laptop connected to a printer. The fathers speech bubble: "What are you doing, son?". The sons speech bubble says: "Compiling printer drivers. Need to print 15 pages for a presentation at 8 am." --- Panel 2: Closeup of the father. Thought bubble: "Why doesn't he just use our Windows PC... What did we do wrong?"

External Content not shown. Further Information.

Popular Culture:

Nerd Culture:

Science, Software, Hardware:

Zeroth World:

Comics/Images/Audios/Videos/Games:

Quotes:

Panel 1-2: It is night, some stars are visible. A person is standing at a bridge handrail, saying: "*sigh*, so this is how it ends." -- Panel 3: The person sits down on the sidewalk and opens up a laptop, saying: "I will write a mail to the police. They have people who are prepared to see such things. Hm. But I have to make sure that it does not arrive before… it is finished." -- Panel 4: "Ok, let's see. I need at(1p). And then I will invoke mutt(1). Hm. What was the commandline again? Let's see…" -- Panel 5: It is dawn. The person says: "Hm. Ok, I have to set LANG=C. But what timezone will it use then? Hm. And was it RFC 2822 time or RFC 3339?" -- Panel 6: It is day. A city cleaner with a broom appears and sais: "Sir, I have to sweep here." The person says: "Ok, sorry.", grabs his laptop and stands up.

People learning about Haskell frequently run into the problem of "understanding Monads". This is not surprising, as this is a distinguishing feature of the language: Haskell-Code is often written in terms of monads. I want to give an introduction to monads and their main application, state monads, for people who are practically inclined, but explicitly not a practical introduction: My objective is to make these people understand monads, I do not want to argue whether monads are useful. Before I even start explaining monads, I will explain lots of secondary stuff, which is, in my opinion, source of confusion about monads and functional programming in general. And of course, this introduction is about my personal opinions and experiences at the time when I write this. They might change, they might not be correct. As always, I am thankful for suggestions and criticism.

This is not an introduction to functional programming. Basic knowledge about writing simple programs is required, specifically, what λ means, and what a recursive function is. I also assume good familiarity with imperative languages. If you think you are a good programmer, and you still don't understand most things explained here, you should probably think again. If something isn't clear, Wikipedia and DuckDuckGo are your friend.

Different ways of thinking

C's records are called "structs". They are more low-level than the records of most other languages, but conceptually the same. A point on a plane can be defined via

struct point {
  float x;
  float y;
};

An approximation for the distance between two points can be defined as


float distance (a, b)
 struct point* a;
 struct point* b;
{ union {
   float f;
   int l; } x_;
  float y_;
  int z;
  x_.f = b->x - a->x;
  y_ = b->y - a->y;
  x_.f = 1/(x_.f*x_.f + y_*y_);
  y_ = x_.f * 0.5f;
  x_.l = 0x5f3759df - (x_.l >> 1);
  for (z = 0; z < 10; ++z)
   x_.f *= (1.5f - (y_ * x_.f * x_.f));
 return x_.f;
}

Structs can also contain arrays, but only when they have fixed length; otherwise, one has to use pointers:

struct point_7d {
 float coordinates[7];
};

There is a language extension which allows the use of zero-length arrays under certain conditions. A person without prior C knowledge may wonder about the use case of an array of length zero. Well, in C, array access is insecure: b[n] is just an abbrevitation for *(b+n). Therefore, declaring an array of length zero while knowing that more elements are there is perfectly legal, and it often occurs that some file format has some header, which can be written as a struct, and which indicates the length of the rest of the file. The struct itself then only contains the header information, but the content of the file can still be accessed through this struct. As an example, the following program generates a bitmap file, using a struct with a zero-length array:

#include <stdint.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <strings.h>

#pragma pack(push,1)
struct bmp {
  uint16_t magic;
  uint32_t filesize;
  uint16_t reserved1, reserved2;
  uint32_t pixel_offset, dib_hdr_size;
  int32_t width, height;
  uint16_t planes, bits_per_pixel;
  uint32_t compression;
  uint32_t image_size;
  uint32_t xppm, yppm, coltab, important_colors;
  uint32_t imagedata[0];
};
#pragma pack(pop)

int mmapfile (const char* filename, off_t s, void** r) {
  int ret = open(filename, O_CREAT|O_RDWR, S_IRWXU);
  if (ret < 0) exit(EXIT_FAILURE);
  if (ftruncate(ret, s) < 0) exit(EXIT_FAILURE);
  *r = mmap(NULL, s, PROT_READ|PROT_WRITE, MAP_SHARED, ret, 0);
  if (*r == MAP_FAILED) exit(EXIT_FAILURE);
  return ret;
}

struct bmp* bmp_file (const char* filename,
              uint32_t width, uint32_t height) {
  size_t
    imgs = width*height*4,
    fs = sizeof(struct bmp)+imgs;
  struct bmp* b;

  mmapfile(filename, fs, (void**) &b);

  b->magic = 19778;
  b->filesize = fs;
  b->reserved1 = b->reserved2 =
    b->compression = b->xppm = b->yppm =
    b->coltab = b->important_colors = 0;
  b->pixel_offset = sizeof(struct bmp);
  b->dib_hdr_size = 40;
  b->width = width;
  b->height = height;
  b->planes = 1;
  b->bits_per_pixel = 32;
  b->image_size = imgs;
  return b;
}

int main (int argc, char** argv) {
  int i;
  if (argc < 2) exit(EXIT_FAILURE);
  struct bmp* b = bmp_file (argv[1], 100, 100);
  bzero(b->imagedata, b->image_size);
  for (i = 0; i < 100; ++i) {
    b->imagedata[i + 100*i] = ~0;
  }
  exit(EXIT_SUCCESS);
}

This code is not really portable, and it is arguable whether it is beautiful. I consider this use of array elements outside of the declared bounds as a hack, and as such, it is beautiful. It is a creative solution to a common problem. However, it is not the kind of code one should be writing these days, except when you really really need to squeeze out every processor cycle you have. However, it is a way of thinking: C's strength is that it does not enforce its abstractions. It lets you do whatever you want, without warning, so you get very efficient code. For today's measures, on the continuum of the abstraction-optimization tradeoff, it is on the far right side.

Haskell, on the other hand, is on the far left side: Abstraction is extremely valuable, and should only be broken if there is no alternative, and even then some overhead will be accepted to keep it behaving in a clean way. It is a radically different way of thinking. Scott Meyers said that everything in C++ has a reason. The same holds for Haskell. It's just that Haskell sets other priorities. One can argue whether this is better or worse, whether it is practically relevant or purely academic; but this is not the topic of this post. To understand why monads were invented, you need to think this way; if you're not willing to think this way, you will not be able to understand monads.

The problem

Functions in Haskell should not have side-effects. Some of them do, for example, Debug.Trace.trace does, but it is only meant for debugging. The same Haskell expression should always yield the same result. This way, Haskell can utilize lazy evaluation (or call-by-need): An expression is evaluated when its result is actually used. Therefore, Haskell can handle infinite structures in a nice way. For example, the following expression is a standard motivating example, it defines the infinite list of Fibonacci numbers, plus a function that then actually returns the n-th Fibonacci number.

fibs = 0 : 1 : addHd fibs
       where
         addHd (x : r@(y : _)) = (x + y) : addHd r

fib n = fibs !! n

From an old-school software engineer's point of view, this might look horrible. And in fact, most programs written this way are slower than their highly-optimized counterparts. On the other hand, the definition is easy to read and to write (as soon as you got to know Haskell): relying on lazy evaluation can make writing and understanding software easier. And today's software problems are seldom about efficiency, they are about maintainability. A further nice property is that we do not need to have a special case for constructs like if-then-else. We can define

ifthenelse :: Bool -> a -> a -> a
ifthenelse True  a _ = a
ifthenelse False _ b = b

A seemingly equivalent C function would be

enum BOOL { FALSE, TRUE };

void* ifthenelse (b, yes, no)
     enum BOOL b;
     void *yes, *no;
{
  switch (b) {
  case FALSE:
    return no;
  case TRUE:
    return yes;
  }
}

The difference is that C has eager evaluation: Every argument is evaluated before a function is called. In Haskell,

ifthenelse True 2 (mod 1 0)

will yield 2, while in C,

ifthenelse (TRUE, (void*)2, (void*)(1 % 0));

will yield an division-by-zero error (well, actually, most modern compilers will at least warn you about this, or optimize it out through dead-code-detection, but you get the idea). Notice that most other functional programming languages allow side effects, and also use eager evaluation and run into the same problem as C does in this case. Common Lisp (and generally most Lisp dialects) has a nice way of defining things like ifthenelse as macros. Anyway, Haskell does lazy evaluation, with all its benefits and drawbacks.

And one major drawback is that the presence of side-effects breaks lazy evaluation. For example, the following C program cannot be implemented in Haskell directly:

#include <stdio.h>

int i = 0;

int inc() {
  return i++;
}

int main (void) {
  int j;
  for (j = 0; j < 10; ++j)
    printf ("%d\n", inc());
}

Subsequent calls to inc return different results, the results depend on the hidden state i. Assume we had such a function in Haskell, and we would define

incs = inc : incs

Then inc !! n would yield the n-th element of the list to be called. The contents of the list would depend on the order of the calls to (!!). There might be scenarios where this is desirable, but in most cases, this just breaks stuff. As I wrote previously, it is possible to destructively modify values in Haskell: On the one hand, Haskell has an FFI which lets us call C functions; on the other hand, Haskell has "unsafe" functions like unsafePerformIO. Their use is discouraged, but there are situations in which it is useful, for example for debugging, and for developing abstractions which are only using mutable state in the background which cannot be accessed through the frontend, like DiffArrays.

The absence of side-effects makes certain programs impossible. For example, Quicksort, and in-situ-sorting-algorithms in general, cannot be realized without destructive modifications of an underlying array. However, there are good purely functional substitutes for these algorithms. More problematic is the absence of I/O: Accessing and modifying files is stateful. But not having any kind of I/O limits the usability of a programming language. This is the problem state monads try to solve.

Properties of pure functions

For this section, let us talk only about pure functions, that is, functions that do not have side-effects. Let A, B, C be some types and f:A→B, g:B→C, then we can define f↣g:=λₓg(f(x)), that is, (f↣g)(x)=g(f(x)). It is easy to show that ↣ is associative. Notice that the same does not hold for functions with side effects (I'll leave the finding of a counterexample as an exercise).

A common optimization, which came from but is not limited to functional programming, is tail call optimization (TCO). Functional programs use recursion a lot more than imperative programs. One possible definition of the Gaussian sum is

gsum 0 = 0
gsum n = n + gsum (n - 1)

The problem with recursion is that it consumes stack space. Therefore, for large values of n, this definition may exhaust the stack. However, if the last expression is the unmodified recursive call, the compiler can just optimize out the call that would otherwise consume stack. The above function would have to be rewritten to

gsum_tr_ 0 r = r
gsum_tr_ n r = gsum_tr_ (n - 1) (n + r)
gsum_tr n = gsum_tr_ n 0

The function gsum_tr_ would be called tail recursive, and the recursive call does not consume stack space. In some functional languages, this is still the method of choice. In Haskell, it is less relevant, because in many cases, rewriting a recursive function into a tail recursive function is straightforward, and Haskell applies this optimization. However, let us stick to the example gsum_tr_ for the moment. A definition in Java would look like

public static int gsum_tr_ (int n, int r) {
    if (n == 0) return r;
    return gsum_tr_ (n-1, n+r);
}

The corresponding generated Java Assembly (output of javap -c with comments by me, I chose Java Assembly because it is stack based) is:

public static int gsum_tr_(int, int);
Code:
   0: iload_0  //load first argument (n)
   1: ifne 6   //if n == 0 jump to byte 6
   4: iload_1  //load second argument (r)
   5: ireturn
   6: iload_0  //load first argument (n)
   7: iconst_1
   8: isub     // => n - 1
   9: iload_0
  10: iload_1
  11: iadd     // => n + r
  12: invokestatic  #2 // stack: n - 1, n + r
  15: ireturn

Before the recursive call, the arguments are pushed to the stack, just to be popped again directly by the called function. Then the return value is taken from the recursive call directly. We might as well just modify the values directly, in-place:

public static int gsum_tr__ (int n, int r) {
    while (true) {
        if (n == 0) return r;
        r = n + r;
        n = n - 1;
    }
}

And get

public static int gsum_tr__(int, int);
 Code:
   0: iload_0
   1: ifne 6
   4: iload_1
   5: ireturn
   6: iload_0
   7: iload_1
   8: iadd
   9: istore_1
  10: iload_0
  11: iconst_1
  12: isub
  13: istore_0
  14: goto 0

Which is almost identical: an imperative loop with destructive updates. That is, under certain conditions, we can have destructive updates inside purely functional code. Whenever we know that the computation cannot depend on the former value, we can destructively overwrite it. More specifically, a sufficient condition is that the value occurs in the function body exactly once. This can be trivially checked at compile time, and in fact, this condition is reflected in so-called uniqueness types in some functional languages (but not in Haskell).

Putting the pieces together

I hope that by now you get an impression of how much thought has been put into the concepts of functional programming. The concepts are not trivial, but neither are the concepts of higher imperative programming languages. Make sure you understood everything so far.

We now want to embed stateful operations into purely functional code, but without the use of uniqueness types, but using state monads. Firstly, let us define a type Ref a, which is a reference to some object of type a.

data Ref a = Pto a

Creating a new reference is so simple that it almost makes no sense. However, for instructional purposes, we define it:

ret :: a -> Ref a
ret x = Pto x

This definition is polymorphic. Let us look at the case a = Int. Now consider the following functions:

inc :: Int -> Ref Int
inc x = Pto $ x + 1

twice :: Int -> Ref Int
twice x = Pto $ 2 * x

These functions just read our pointer, and return a pointer to the increment/double of the former pointer. We furthermore need

(>=>) :: (a -> Ref b) -> (b -> Ref c) -> a -> Ref c
(>=>) f g x = case f x of { Pto y -> g y }

run :: Ref a -> a
run (Pto x) = x

Now we can write the following:

run ((inc >=> twice >=> twice >=> twice >=> inc >=> ret) 0)

Before you read on, try to find out what this function returns. Answer: hover The function returns 9 hover. The nice thing is that the reference itself is handed over to the next function directly as first argument. Therefore, there is no reason to ever have more than this one reference, and an optimizing compiler can (and will) not require stack space, and regard the operations on the reference as destructive, the same way it can be done with TCO. Notice that inside this definition, except for the first function call, we never actually talk about the reference, we only tell what to do with it.

Make sure you have understood this so far. If not, I failed at explaining, but it also makes no sense to read on.

The (>=>) is somewhat similar to our previous ↣. Especially, one can show that it is associative. Furthermore, the function ret is a neutral element on both sides. These properties make our (>=>) a Kleisli combinator, and Ref a monad with respect to this Kleisli combinator. Any structure with a combinator and a return function that satisfies these axioms is a monad, monads are mathematical structures, much like groups, rings, fields and vector spaces: They are collections of objects which satisfy certain axioms. And like fields, which can be used in cryptography, and vector spaces, which can be used for computer graphics, monads can be used to describe and solve specific problems, and one such problem is the description stateful operations in a purely functional context. But as for groups, fields and vector spaces, there are many different structures that obey the monad axioms.

Let us for a moment look at the group axioms: A group is a triple (G, ∘, e), where ∘:G→G→G e:G, such that

  • ∘ is associative: a∘(b∘c)=(a∘b)∘c.
  • e is a left-neutral element of ∘: e∘a=a.
  • Every a:G has a left-inverse a⁻¹:G, such that a⁻¹∘a=e.

The monad axioms for a Kleisli combinator are quite similar:

  • return >=> g = g
  • f >=> return = f
  • (f >=> g) >=> h = f >=> (g >=> h)

We have a neutral element (left and right neutral), and the combinator is associative.

I chose the Kleisli combinator because I think that it is easier to understand in the case of state monads: In our case, the only thing it does is passing the value of the returned reference of the left function as a first argument to the right function, making sure that the rest of the computation does not depend on the former value, so it can be optimized out. However, using the Kleisli combinator has several disadvantages for actual programming. The usual definition uses the bind combinator (>>=). We can define the bind combinator in terms of the Kleisli combinator:

(>>=) :: Ref a -> (a -> Ref b) -> Ref b
(>>=) ma g = (id >=> g) ma

But for the moment, let us stick with the Kleisli combinator. So far, we did not do anything except writing definitions and exploiting certain optimizations.

Haskell's IO monad

Now we want to write something to a file. The main-function of a Haskell program is of the type IO (), which is a monad. This monad references the "outside world" – and it is a monad, since there can only be one such world. This outside world is stateful. In C, we can write

#include <stdio.h>

void wr (FILE* d) {
  fprintf(d, "Hello!\n");
}

int main (int argc, char** argv) {
  FILE* d = fopen(argv[1], "a");
  wr(d);
  printf("%d\n", ftell(d)); // 7
  wr(d);
  printf("%d\n", ftell(d)); // 14
}

The problem is obvious: Every call to fprintf(3) advances the file pointer, so the ftell(3) calls return different file positions – and violate referential transparency. We could, however, rewrite this program in the following way:

#include <stdio.h>

FILE* wr (FILE* d) {
  fprintf(d, "Hello!\n");
  return d;
}

int main (int argc, char** argv) {
  FILE* d1 = fopen(argv[1], "a");
  FILE* d2 = wr(d1);
  printf("%d\n", ftell(d2)); // 7
  FILE* d3 = wr(d2);
  printf("%d\n", ftell(d3)); // 14
}

This is how it would be done with uniqueness types: Every modification of the file consumes a file handle, but produces a new one. Notice that we would have to encapsulate the printf(3) statements as well – they change the "world". Therefore, we can define a WORLD* type which passed instead, and think of this WORLD* type as a map from handles to resources:

#include <stdio.h>
#include <stdarg.h>

typedef WORLD; // dummy

struct wfret_k { WORLD* w; void* t; };

struct wfret_k wfopen (WORLD* w, const char* name, const char* mode) {
  struct wfret_k x;
  x.w = w;
  x.t = (void*) fopen(name, mode);
  return x;
}

struct wfret_k wfprintf (WORLD* w, FILE* d, const char* format, ...) {
  struct wfret_k x;
  va_list aq;
  va_start(aq, format);
  vfprintf(d, format, aq);
  x.w = w;
  x.t = NULL;
  return x;
}

struct wfret_k wftell (WORLD* w, FILE* d) {
  struct wfret_k wt;
  wt.w = w;
  wt.t = (void*) ftell(d);
  return wt;
}

    struct wfret_k wprintf (WORLD* w, const char* format, ...) {
  struct wfret_k wt;
  va_list aq;
  va_start(aq, format);
  vprintf(format, aq);
  wt.w = w;
  wt.t = NULL;
  return wt;
}

struct wfret_k wr (WORLD* w, FILE* d) {
  return wfprintf (w, d, "Hello!\n");
}

int main (int argc, char** argv) {
  WORLD* w0; //assume this was "outside"

  struct wfret_k o1 = wfopen(w0, argv[1], "a");
  WORLD* w1 = o1.w;
  FILE* f = o1.t;

  struct wfret_k o2 = wr(w1, f);
  WORLD* w2 = o2.w;
  struct wfret_k o3 = wftell(w2, f);
  WORLD* w3 = o3.w;

  struct wfret_k o4 = wprintf(w3, "%d\n", o3.t);
  WORLD* w4 = o4.w;

  struct wfret_k o5 = wr(w4, f);
  WORLD* w5 = o5.w;

  struct wfret_k o6 = wftell(w5, f);
  WORLD* w6 = o6.w;
  struct wfret_k o7 = wprintf(w6, "%d\n", o6.t);
}

Now, this is very complicated to do in plain C. Functional languages usually allow for overloading variable names, and when working with uniqueness types, this is usually the way it is done. At first sight, this WORLD* object does not contain anything, and is therefore useless. However, in the scope of uniqueness types, every WORLD* can only be passed to another function once. Then it is gone: If we tried to access it a second time, the code would not compile. Even though this WORLD* object does not contain any data, it prevents some code from being compiled: It is a formal guarantee that can be checked at compile time. At runtime, we could just optimize it out – but that is fine, since we checked the correctness at compile time.

The limits of C are now reached; especially, we need currying from now on. We will continue with pseudocode to define a Kleisli combinator. Our wrapper functions return a polymorphic pair wfret_k (which is realized via void*, because C does not have polymorphism), consisting of the world itself, and some return value.

We can now define a "modified" kleisli combinator (notice that this is pseudocode, an unholy mixture of C and Haskell):

m_kleisli :: (WORLD* → A → (WORLD*, B)) → (WORLD* → B → (WORLD*, C)) → (WORLD* → A → (WORLD*, C))
m_kleisli f g w0 a =
let (w1, b) = f w0 a
in g w1 b

Notice that except for the calls to the libc, we never do anything but passing the worlds – which was useful to assure linearity. However, if we take the Kleisli combinator as the given method of doing I/O, passing the world is not really necessary anymore, because all functions that do I/O will be combinations of smaller functions using the Kleisli combinator, and it will sooner or later decompose into calls to our elementary wf-functions. Instead, we assure linearity by defining a type functor IO a, which will replace our former (WORLD*, a):

(>=>) :: (a → IO b) → (b → IO c) → (a → IO c)

We now would still have to prove the monad laws, but this is a good exercise for you.

As mentioned earlier, it is possible to define the bind-operator

(>>=) :: IO a → (a → IO b) → IO b

which is more common. In Haskell, we can therefore write

import Debug.Trace
import System.IO
import System.Environment

main :: IO ()
main = getArgs >>=
       \ args ->
           openFile (args !! 0) AppendMode >>=
           \ handle -> hPutStr handle "Hello!\n" >>=
           \ _ -> hPutStr handle "Hello!\n" >>=
           \ _ -> hClose handle

There is a shorthand notation that makes these things (sometimes) easier:

import Debug.Trace
import System.IO
import System.Environment

main :: IO ()
main = do args <- getArgs
          handle <- openFile (args !! 0) AppendMode
          hPutStr handle "Hello!\n"
          hPutStr handle "Hello!\n"
          hClose handle

External Content not shown. Further Information.

Popular Culture:

Nerd Culture:

Science, Software, Hardware:

Zeroth World:

Comics/Images/Audios/Videos/Games:

Quotes:

Panel 1: A person with a surprised facial expression is depicted. Speech Bubble: "Fuck. Being social actually works ..." --- Panel 2: His cellphone i shown to have 1 new call!

Again, a talk for the best mathematics club in the multivere. Again the intended audience are pupils, so it is not a scientific talk, and the focus lies in breadth rather than depth.

We will use the concept of "classical logic" before we define it – to make it easier to understand afterwards. For now, classical logic shall mean "the stuff you use in mathematic competitions without knowing it".

The first question we ask ourselves is what a mathematical proposition actually is. There are some patterns that regularily occur:

  • Some proposition A holds for all values that a variable x can contain. We then write ∀xA and say "forall x A".
  • Some proposition A holds for some value that a variable x can contain. We then write ∃xA and say "there exists an x such that A".
  • Two propositions A and B both hold. We write A∧B and say "A and B".
  • At least one of two propositions A, B, holds. We write A∨B and say "A or B".
  • If one proposition A holds, then another proposition B also holds. We write A→B and say "A implies B".
  • One proposition A holds if and only if a proposition B holds. We write A↔B and say "A is equivalent to B".
  • Some proposition A does not hold at all. We write ¬A and say "not A".

Not all of {∀,∃,∧,∨,→,↔,¬} is really needed. For example, A↔B is usually defined as (A→B)∧(B→A). Classically, we can define A→B as ¬A∨B. Furthermore, there are different connectives, like the exclusive or ("XOR"), which holds if exactly one of the two proposition holds, and could be defined by A∨B∧¬(A∧B). Which subset we use axiomatically and which ones we define depends on the need of the theory.

The theory of this talk requires of the characters mentioned so far the subset {∀,∃,∧,∨,→}. However, we need two more: ⊤, which is defined to be the proposition that is always true (the "verum"), and ⊥, which is always false (the "falsum"). We can define negation through → and ⊥:

¬A := A→⊥

The intuition is that ⊥ can only follow from something wrong. Of course, this subset is redundant. In classical logic, we can define everything by means of ∀, → and ⊥ (read ↔ᶜ as "is classically equivalent") :

∃xA ↔ᶜ ¬∀x¬A
A∧B ↔ᶜ ¬(A→B→⊥)
A∨B ↔ᶜ ¬(¬A∧¬B)

We still lack a possibility of actually expressing something meaningful in any common mathematical theory. Let us, for the moment, stick with a simple but useful theory, the theory of Real Closed Fields¹ – it is an axiomatization of real numbers. Quantifiers are over ℝ. We need to add a few additional symbols:

  • The infix symbol =, which denotes equality of numbers.
  • The infix symbol +, which denotes addition.
  • The infix symbol ·, which denotes multiplication.
  • The symbols 0 and 1, denoting zero and one.
  • The symbol P: P(x) shall mean "x is positive".

0 and 1 are constant symbols (or sometimes "nullary function symbols"). + and · are binary function symbols. P and = are so-called relation symbols: Their "return value" is not a number, but a proposition. A proposition of the form P(x) or of the form x=y is called an atom (or prime formula, or atomic formula): It cannot logically be "split". We leave out the nasty details of operator precedence and syntactical correctness, as well as the defined semantics.

With these symbols, we can express many propositions. For example, we can express the axioms of a commutative ring (which all hold in ℝ):

  • ∀a∀b (a+b=b+a ∧ a·b=b·a)
  • ∀a (a+0=a ∧ a·1=a)
  • ∀a∃b a+b=0
  • ∀a∀b∀c (a+(b+c)=(a+b)+c ∧ a·(b·c)=(a·b)·c)
  • ∀a∀b∀c a·(b+c)=a·b+a·c

We regard these propositions as axioms. Notice that none of them contains implications (this will be relevant later). Furthermore, we have the following axioms:

  • 1 is positive: P(1)
  • If P(-x) or P(x), then x is invertible: ∀x∀y. (x+y=0 ∧ (P(y)∨P(x)))→∃z.(x·z=1∨y·z=1)
  • 0 is not positive: P(0)→⊥
  • If x and y are positive, then so are their sum and product: ∀x∀y. (P(x)∧P(y))→P(x+y)∧P(x·y).
  • Trichotomy: Either P(-x) or x=0 or P(x): ∀x∀y. x+y=0 → (P(y)∨x=0∨P(x)).
  • If the sum of two numbers is positive, then one of these numbers must be positive: ∀x∀y. P(x+y)→(P(x)∨P(y))

For the further axioms, we introduce a notation: We define

x<y :⇔ P(y-x) (which is short for ∃z.z+x=y∧P(x))

We add infinitely many axioms, using a so-called axiom scheme: It is easy to enumerate all axioms of this axiom scheme, and it is easy to decide whether some proposition is in this axiom scheme.

A polynomial is what we intuitively would regard as one: a₀+a₁x+a₂x²+…+aₙxⁿ. Now, we add one axiom for every possible polynomial:

  • Intermediate value theorem for polynomials: ∀a₀∀a₁∀a₂…∀aₙ∀u∀v. a₀+a₁u+a₂u²+…+aₙuⁿ < 0 < a₀+a₁v+a₂v²+…+aₙvⁿ → ∃x. u

This axiom system is called GRCF. Now that we have axioms, we want to derive theorems from them. The question that bothered the early logicians was whether a computer can automatically decide whether a given proposition, encoded as a string, is true or not. This is not possible in general, though there are special cases where it is.

The next question is whether we can at least make it possible for a computer to check proofs. To be more clear, the question is whether there is a finite set of rules, which allows us to form exactly the true propositions. And this is possible.

Sets of rules are called calculi, and a calculus with this property is called sound and complete, where sound means it allows to derive only correct propositions, and complete means that it actually allows to derive all correct ones. A very simple such calculus is the calculus of natural deduction. It is a very simple set of rules, that is easy to learn and work with. The notation is of the form

input1 input2 …
───────────────
    output

The first rule is essentially modus ponens:

A   A→B
───────
   B

Introduction of "and" and "or" and "exists" is also easy:

 A    B   A B  A[x:=t]
───  ───  ───  ───────
A∨B  A∨B  A∧B    ∃xA

where A[x:=t] means A with t inserted for x in A.

Elimination of "and" and "forall" is also easy:

A∧B  A∧B    ∀xA
───  ───  ───────
 A    B   A[x:=t]

The other rules are slightly more complicated, because they require the concept of a free assumption: An assumption is something we just propose, without proving it – it is a proposition that is on the top of a proof tree. New assumptions are free, but can be bound by other rules. We will denote bound assumptions in brackets. A proof is correct only when all free assumptions are axioms.

The simplest rule that binds a free assumption is the introduction rule for implications:

[A]
 |
 B
───
A→B

This rule means: if we have a proof with the free assumption A and we can derive B, then we are allowed to bind this assumption and derive A→B. This rule essentially says: If B follows from A, then A→B. Notice that one can also bind assumptions that do not occur at all:

 B
─── (bind A)
A→B

For disjunction elimination we have

A∨B A→C B→C
───────────
     C

The remaining rules have variable conditions. A variable in some proposition is called free, if it is not in the scope of ∃ or ∀.

∃xA  ∀x(A→C)
────────────
     C

if x does not occur freely in C.

 A
───
∀xA

if x does not occur freely in any free assumption above A except for A itself. These rules are the rules of minimal logic: Everything derivable by these rules is said to hold minimally. Notice that this system does not have a concept of negation. However, we can derive some simple theorems from it.

[A] A→B
───────
   B    B→C
   ────────
      C
     ─── (bind A)
     A→C

This proves that A→B and B→C implies A→C. We can even prove some basic facts about negation – for minimal logic, ⊥ has no special meaning:

A ¬A
────
 ⊥

However, there are things we can not prove. For example, ((A∨B)∧¬A)→B holds intuitively, but it is not derivable by the calculus so far. Instead of adding new rules, we add axioms: For every proposition A, we add the axiom "ex-falso-quodlibet": ∀x∀y… ⊥→A. We can then derive

               [(A∨B)∧¬A]
               ──────────
           [A]    ¬A
           ──────────
               ⊥        ⊥→B
           ────────────────
[(A∨B)∧¬A]      B            [B]
──────────     ─── (bind A)  ─── (bind B)
  A∨B          A→B           B→B
────────────────────────────────
               B
          ──────────── (bind (A∨B)∧¬A)
          ((A∨B)∧¬A)→B

This system is called intuitionistic logic, propositions derivable with it are said to hold intuitionistically. Usually, when people talk about constructive mathematics, they mean mathematics formalized with intuitionistic logic. Intuitionistic logic has the property that every object that can be shown to exist can be calculated.

One thing that can not be derived is the Drinker's Problem: In every pub, there is a person, such that if this person is drunk, all other persons are drunk:

∃d. D(d)→∀q.D(q)

This is not derivable in intuitionistic logic. The usual proof is:

Assume all people are drunk. Then we are done. Assume at least one person is not drunk. Then this person satisfies the condition: (S)he is not drunk, and not all people are drunk.

This proof distinguishes between ∀q.D(q) and ¬∀q.D(q). And this is the thing that cannot be done in intuitionistic logic. This has a good reason: Consider a pub with infinitely many guests (of course, this breaks the example, but logically, we can express this). Let us say we enumerate the persons by natural numbers. Then the state of a pub can be expressed by a function d : ℕ→{0,1}, where 0 means "not drunk", and 1 means "drunk". For such a function, it is not generally possible to decide whether it will always be 1. The above proof has no computational content.

If we really want the full logical strength of ordinary mathematics, you need classical logic: You need to add stability (or duplex negatio affirmat)

  • For every proposition A with free variables x, y, z, …, we add ∀x∀y∀z….¬¬A→A.

Lemma: Tertium non datur holds classically, that is, for every proposition A with free variables x, y, z, ∀x∀y∀z….A∨¬A.

                        [A]
                       ────
            [¬(A∨¬A)]  A∨¬A
            ───────────────
                    ⊥
                   ──── (bind A)
                    ¬A
                   ────
[¬(A∨¬A)]          A∨¬A
───────────────────────
             ⊥
         ──────── (bind ¬(A∨¬A))
         ¬¬(A∨¬A)                 ¬¬(A∨¬A)→(A∨¬A)
         ────────────────────────────────────────
                         A∨¬A

Lemma: ¬∀q.D(q)→∃q.¬D(q)

           [D(q)]
           ───────
¬∀q.D(q)   ∀q.D(q)
──────────────────
         ⊥
       ───── (bind D(q))
       ¬D(q)
     ────────
     ∃q.¬D(q)

We can formalize the former proof:

                                                 [¬D(d)]  [D(d)]
                                                 ───────────────
                                      ⊥→∀q.D(q)         ⊥         
                                      ──────────────────────────
                                                ∀q.D(q)
                                               ──────────── (bind D(d))
                                               D(d)→∀q.D(q)    
                                               ───────────────
                       ∀q.D(q)        ¬∀q.D(q) ∃d.D(d)→∀q.D(q)
                      ────────────    ──────── ────────────────────────
                      D(d)→∀q.D(q)    ∃q.¬D(q) ∀d.¬D(d)→∃d.D(d)→∀q.D(q)
                      ─────────────── ─────────────────────────────────
 (∀q.D(q))∨(¬∀q.D(q)) ∃d.D(d)→∀q.D(q)       ∃d.D(d)→∀q.D(q)
 ──────────────────────────────────────────────────────────
                        ∃d.D(d)→∀q.D(q)

Another proof that shows the problem with non-constructive proofs:

Lemma: There exist two numbers a,b∈ℝ-ℚ such that aᵇ∈ℚ

Proof: If (√2)^(√2)∈ℚ, we are done. If not, then ((√2)^(√2))^(√2)=2∈ℚ.

Constructive mathematics have nice properties, but classical mathematics is easier. There are theories for which classical derivability implies intuitionistic derivability. We will look at one special kind of theory which satisfies this property, the so-called geometric theories.

The continuation translation² P/A of a formula P for a formula A is recursively defined as:

  • ⊥/A = A
  • P/A = (P→A)→A for P atomic, P \neq ⊥
  • (B∨C)/A = ((B/A→A)∧(C/A→A))→A
  • (∃xB)/A = (∀x.B/A→A)→A
  • (B→C)/A = B/A→C/A
  • (B∧C)/A = B/A∧C/A
  • (∀xB)/A = ∀x(B/A)

P/⊥ is called the Gödel-Gentzen negative translation of P.

Let Γ be a set of axioms. Then Γ/A:={P/A|P∈Γ}.

Lemma: Let A not contain free variables that occur bound in C, and C not contain implications or universial quantifiers. Then C/A↔((C→A)→A) holds intuitionistically.

The proof goes by structural induction on C. It is long and technical. We only sketch it³. For atomic C, this follows by definition. For C=∃xB, we may assume – by induction – that B/A↔((B→A)→A). We have to prove that ((∀x.B/A→A)→A)↔((∃xB→A)→A).

The → direction:

           [C]
           ───
  [∃x.C→A] ∃xC
  ────────────
    A            C/A→(C→A)→A [C/A]
   ─── (bind C)  ─────────────────
   C→A                (C→A)→A
   ──────────────────────────
                 A
               ───── (bind C/A)
               C/A→A
              ────────
(∀x.C/A→A)→A  ∀x.C/A→A
──────────────────────
            A
        ────────── (bind ∃x.C→A)
        (∃x.C→A)→A


                                               [C→A]  [C]
                                               ──────────
                                                   A
                                               ───────── (bind (C→A)
                    [∀x.C/A→A]   ((C→A)→A)→C/A   (C→A)→A
                    ──────────   ───────────────────────
                       C/A→A              C/A
                       ──────────────────────
                                 A
                                ─── (bind C)
                  ∃xC           C→A
                  ─────────────────
                         A
                      ────── (bind ∃xC)
 (∃x.C→A)→A           ∃xC→A
 ──────────────────────────
              A
        ──────────── (bind ∀x.C/A→A)
        (∀x.C/A→A)→A

Lemma: (¬¬B→B)/A is derivable in intuitionistic logic.

The proof is also technical, so we omit it. We can now, however, show that we can convert a classical proof of B from a theory Γ into an intuitionistic proof of B/A from Γ/A:

Definition A geometric proposition is a proposition that does not contain → and ∀. A geometric implication is a formula of the form ∀x…(A→B), where A and B are geometric propositions. geometric theory is a theory which only contains geometric implications.

Theorem: From Γ follows Γ/A intuitionistically, for geometric theories.

Theorem (Barr): Let B be a geometric implication, and Γ be a geometric theory, such that B is classically derivable from Γ. Then B is intuitionistically derivable from Γ.

Notice that GRCF only contains geometric implications or geometric propositions. A geometric proposition B can be replaced by an equivalent geometric implication by ⊤→A. So GRCF is a geometric theory.

───────────
Footnotes:
¹We use the axiomatization from E. Palmgren
²See my diploma thesis
³For the long version, see my diploma thesis

External Content not shown. Further Information.

Popular Culture:

Nerd Culture:

Science, Software, Hardware:

Zeroth World:

Comics/Images/Audios/Videos/Games:

Quotes: