Programmierer können jede Programmiersprache benutzen,
solange sie imperativ ist. (Chris Okasaki)

This is the first of four parts of my posts about the irrefutability of AC in ZF.

It will be about the basic logical background.

Canonically, an alphabet is a set of symbols. It is not necessarily finite. A string is a finite sequence of symbols. A language is a set of strings.

Let us fix the alphabet of logical symbols {\mathcal S}:=\{\forall,\rightarrow,\wedge,\bot\}. Furthermore, we define the sets REL_n:=\{R_i^n|i\in\mathbb{N}_0\} of n-ary relation symbols for n\in\mathbb{N}_0, and the sets FUN_n:=\{f_i^n|i\in\mathbb{N}_0\} of n-ary function symbols, and the set VAR:=\{x_i|i\in\mathbb{N}_0\} of variable symbols. Of course, they will stand for relations, functions and variables, but so far, they are only symbols with no meaning.

We define the set TER, the language of terms, as the smallest superset of VAR satisfying the closure condition t_1,\ldots,t_n\in VAR\Rightarrow f_i^n t_1\ldots t_n\in VAR for all i,n. This resembles terms in polish notation, but still, so far TER is just a set of strings with no specified meaning.

We define the set {\mathcal L}, the language of formulae, as the smallest set containing all strings of the form R_i^n t_1\ldots t_n for t_1,\ldots,t_n\in TER for all i,n, and satisfying the closure conditions
  • \bot\in{\mathcal L}
  • A\in{\mathcal L}, x_i\in VAR \Rightarrow \forall_x A\in{\mathcal L}
  • A\in{\mathcal L}, B\in{\mathcal L} \Rightarrow A\rightarrow B\in{\mathcal L}
  • A\in{\mathcal L}, B\in{\mathcal L} \Rightarrow A\wedge B\in{\mathcal L}
\bot stands for the falsum, the proposition which is always false.

This intuitively resembles logical formulae built up from relations between terms and their conjunctions, implications and quantifications. In most theories, we have term symbols different from f_i^n, and we may safely use these instead of f_i^n, same for relation symbols, same for variable symbols. Some binary function symbols and relation symbols use infix notation, and we may as well allow these and brackets to change the precedence as usual. So for example, instead of \forall_{x_0}\forall_{x_1}\forall_{x_2} R_0^2 f_0^2 f_0^2 x_0 x_1 x_2 f_0^2 x_0 f_0^2 x_1 x_2 we may write \forall_x\forall_y\forall_z (x+y)+z = x+(y+z), as it is clear how to convert this formula into our defined notation.

Furthermore, we define \lnot A, "not A", to be A\rightarrow\bot, and A\vee B, "A or B", to be \lnot (\lnot A\wedge \lnot B), and \exists_x A, "there exists an x satisfying A", to be \lnot\forall_x\lnot A.

A set of such formulae is called (first-order) theory. Intuitively, a formula A follows from a theory T, if all structures satisfying the formulae in T also satisfy A. Make sure that you understood this intuition, it is crucial for the further definitions. It motivates to define what exactly a structure is, and what it means for a structure to satisfy a formula.

We define the concept of an interpretation \mathcal M, which maps function symbols and relation symbols to functions and relations, and bounds quantifiers. Mathematical structures usually have relations and functions on a domain, for example for arithmetic this is the set of natural numbers. So an interpretation has a domain \mathcal D. Having fixed this domain, we need a rather unintuitive object, a function \eta:VAR\rightarrow{\mathcal D}, the valuation. This function is only a "helper" for some edge cases in our setting. And then finally, we need two functions {\mathcal I}_R and {\mathcal I}_f, mapping REL_n and FUN_n to n-ary functions and relations on \mathcal D. We unite these functions into a single function \mathcal I, as it is clear which one of them we mean by the argument we give to it. Now, an interpretation {\mathcal M}=({\mathcal D},{\mathcal I},\eta) is a triple of these functions and a domain.

An example would be \mathcal D=\mathbb{N}, \eta:i\mapsto x_i, {\mathcal I}_R(R_0^2)=\{(x,x)|x\in\mathbb{N}\}, {\mathcal I}_f(f_0^2))= (x,y)\mapsto x+y, which would give a standard interpretation of the above formula.

We can extend {\mathcal I} to general terms by {\mathcal I}(x_i):=\eta(x_i) for variables x_i, and {\mathcal I}(f_i^n t_1\ldots t_n):= ({\mathcal I}(f_i^n))({\mathcal I}(t_1),\ldots,{\mathcal I}(t_n)). (As you can see, the valuation's purpose is mainly to give a value to free variables.)

Now we can define what it means for an interpretation {\mathcal M}=({\mathcal D},{\mathcal I},\eta) to satisfy a formula A. For this, we write {\mathcal M}\models A, saying \mathcal M models A, or that \mathcal M is a model of A:
  • {\mathcal M}\not\models\bot
  • {\mathcal M}\models R_i^n t_1\ldots t_n iff ({\mathcal I}(t_1),\ldots,{\mathcal I}(t_n))\in{\mathcal I}(R_i^n).
  • {\mathcal M}\models A\wedge B iff {\mathcal M}\models A and {\mathcal M}\models B
  • {\mathcal M}\models A\rightarrow B iff {\mathcal M}\models A implies {\mathcal M}\models B
  • {\mathcal M}\models \forall_{x_i} A iff for all x\in{\mathcal D} we have ({\mathcal D},{\mathcal I},\theta)\models A, where \theta(y)=\left\{\begin{array}{rl} x & \mbox{for } y=x_i \\ \eta(y) & \mbox{otherwise}\end{array}\right.
For a theory T we define {\mathcal M}\models T by {\mathcal M}\models A for all A\in T.

To add an additional level of confusion, we define a second meaning for \models, namely a relation between theories. Let T,U be theories. Then T\models U iff for all models {\mathcal M}, {\mathcal M}\models T implies {\mathcal M}\models U. Furthermore, we definde T\models A for a formula A to be T\models\{A\}. T\models A now finally means that A follows from T, which is what we wanted to achieve.

There are some well-known theories like ZFC, NGB and PA. But there are also simpler theories like the theory of groups, {\mathcal G}=\{\forall_x\forall_y\forall_z (x+y)+z=x+(y+z), \forall_x\exists_y x+y=0, \forall_x x+0=x\}. Every group is a model of {\mathcal G}.

Theories which have at least one model are called satisfiable, it is provable that this implies that no contradiction can be derived from them, that is, they are also consistent. There are some theories which do not have a model, for example, trivially \{\bot\} does not have one. Less trivial, we can define the theory {\mathcal F} of fields similar to the theory of groups. Then {\mathcal F}\cup\{\exists_{a_1}\ldots\exists_{a_6}\forall_x((\bigvee\limits_{i=1}^6 x=a_i)\wedge(\bigwedge\limits_{1\le i < j\le 6} a_i\neq a_j))\} is not satisfiable, as it can only be satisfied by a field with exactly six elements, which clearly does not exist.

The other question, which will be relevant in the next parts, is the question whether formulae A are decided by a theory T anyway, that is, whether T\vdash A or T\vdash\lnot A. Gödel proved that this is not the case when the theory is of sufficient complexity. Furthermore, there is a proof that this does not hold for the theory of ZF and the axiom of choice. Such propositions are called independent.

But there are much simpler examples. Consider \mathcal G again, and the formula A=\forall_x\forall_y x+y=y+x. A is independent from \mathcal G: There is a model of \mathcal G which satisfies A, and a model of \mathcal G which does not satisfy A.

For example, (\mathbb{Z},\begin{array}{c} + \rightarrow (x,y)\mapsto x+y \\ 0 \rightarrow 0 \\ = \rightarrow \{(x,x)|x\in\mathbb{Z}\}\end{array}, i\mapsto x_i\} \models A, since the group of integers is abelian. But for example, the multiplicative group of invertible matrices is not commutative.

In the end, the independence proof of AC from ZFC is not different from this: A model of set theory is given which satisfies AC, and another model is given which satisfies the negation of AC.