Nie dürft ihr so tief sinken, von dem Kakao,
durch den man euch zieht, auch noch zu trinken.
(Erich Kästner)

And again, a talk for the best mathematics club of the multiverse. This time, I thought I'd have a slightly more abstract topic, namely Kripke models. As always, this is not a scientific talk, and it focuses on breadth rather than depth, and neglects formality in favour of comprehensibility. Its audience will mainly consist of young students who will probably not have experience in mathematical logic for the largest part.

Anyway, feel free to drop a comment if you notice any mistakes.

A simply-typed lambda calculus

The lambda calculus is the prototype of all functional programming languages. We will give a short informal introduction to it.

A lambda term is inductively defined as

  • There are variables, as the ones known from programming languages. Every variable is a lambda-term. Thus, x is a lambda-term.
  • If a,b are lambda-terms, then the application ab is a lambda-term. It regards a as a function, and b as its first argument. A more usual notation would be a(b), but there are reasons why ab is preferred.
  • If a is a lambda-term and x is a variable, then the abstraction \lambda_xa is a lambda-term. The variable x will usually occur in a. The intuition is that \lambda_xa is the function that maps to some value b the term a[x:=b], where every occurence of x in a is replaced by b. For example, if a=cxd, then (\lambda_x (cxd))b = cbd. Another notation that is more common in some other parts of mathematics would be x\mapsto a.

The idea of the untyped lambda calculus is that everything is a function. Functions map functions to functions. Lambda terms are the only objects one talks about.

A problem that arises from the above definition is that you might have a lambda term \lambda_x (a\lambda_x (bx)). If we just replaced x, we would have (\lambda_x (a\lambda_x (bx)))c
\stackrel{?}{=} a\lambda_x bc, which is obviously not what we want. Actually, the inner x is bound by the inner \lambda, and will not be replaced. It is possible to define this formally, but for this talk we just forbid that this occurs.

Another problem arises when we substitute with a term that itself contains a variable. Say, we have ((\lambda_x\lambda_y (xay))(by))c
\stackrel{?}{=} ((bc)ac). This is obviously not intended, and we forbid this to happen, instead of handling it formally.

A variable is bound in a term, when it is captured by some lambda. It is free otherwise. A term is closed when it does not contain free variables. For example, \lambda_x y contains y freely, while \lambda_x(xx) is a closed term.

As for most functional programming languages, we can assign types to terms. In the simply-typed lambda calculus, there are atomic types A, B, C and arrow types, which are of the form \tau\rightarrow\rho, where \tau,\rho are types. Variables will be assigned a type. We write t^{A\rightarrow
B} to denote that the term t has the type A\rightarrow
B. The intuition is that this term is a function from objects of type A to objects of type B. We therefore define the typing rule (\lambda_{x^\tau}
t^{\rho})^{\tau\rightarrow\rho}. Furthermore, we only allow applications when the types fit, that is, (a^{\tau\rightarrow\rho}b^\tau)^\rho. Of course, we can leave out type annotations when they are clear or not important. This is a very simple type system, but most other type systems are based on this.

We now ask ourselves the question whether there is a closed term of type ((P\rightarrow Q)\rightarrow P)\rightarrow P. Well, of course there is none, otherwise we wouldn't ask. Kripke models are a way of answering this kind of questions.

Kripke Models for the simply-typed lambda calculus

We are given a rooted tree (T,<), where T is the set of nodes, and <\subseteq T\times T is the "is parent" relation for this tree. The tree need not be finite.

Now, for every atomic type A, let a monotone function b_A:T\rightarrow\{0,1\} be given, that is, x<y\rightarrow
b_A(x)\le b_A(y).

A triple (T, <, b) is called a Kripke model (well, at least a very simple form of a Kripke model). We now define a so-called forcing relation \Vdash\subseteq T\times\Theta (where \Theta is the set of simple types).

We define

  • u\Vdash A if and only if b_A(u)=1.
  • u\Vdash\rho\rightarrow\tau if and only if for all w\ge
  u, w\Vdash\rho implies w\Vdash\tau.

We say that (T, <, b)\models\tau ("(T, <, b) models \tau") if and only if \operatorname{root}(T)\Vdash\tau.

Lemma (Monotonicity): Let (T, <, b) be a Kripke model, and q,r\in T with q < r, and \rho be some type. If q\Vdash \rho, then r\Vdash \rho.

Proof: We prove this by structural induction on \rho. If \rho=A for some atomic type A, this holds by the monotonicity of b_A.

Now Let \rho = \tau \rightarrow \gamma, and assume q\Vdash
\tau \rightarrow \gamma, that is, for all t\ge q, t\Vdash \tau implies t\Vdash \gamma. But then the same holds especially forall t\ge r, hence r\Vdash \tau
\rightarrow \gamma.

Q.E.D.

Lemma (Correctness): Let a closed simply-typed lambda term t^\rho exist. Then every model \frak M satisfies {\frak M}\models\rho.

Proof: We slightly generalize the Lemma. We actually prove: If there exists a term t^\tau with free variables u_1^{\rho_1},
\ldots, u_n^{\rho_n}, then for every {\frak M}=(T, <, b), if we have {\frak M}\models\rho_k for all k<n, we also have {\frak M} \models \tau. We prove this by structural induction.

Our induction hypothesis is that the current entailment holds for all Kripke models. Notice that if p\in T, then (\{q\ge
p\mid q\in T\}, <, p) is (trivially) also a Kripke model.

Obviously, it holds for variables t=u_1^{\rho_1}.

Assume t=q^{\vartheta \rightarrow \tau}r^\vartheta, where q has the free variables u_1^{\rho_1},\ldots,u_m^{\rho_m} and r has the free variables u_{m+1}^{\rho_{m+1}},\ldots,u_n^{\rho_n} (notice that both sets of free variables need not be disjoint). Assume that for all k\le n we have {\frak M} \models \rho_k. Then by induction, we have {\frak M} \models \vartheta \rightarrow \tau and {\frak M} \models \vartheta. But {\frak M} \models \vartheta
\rightarrow \tau means that {\frak M} \models \vartheta implies {\frak M} \models \tau. Hence, the claim follows.

Assume \tau = \gamma \rightarrow \vartheta, and t=\lambda_{x^\gamma}r^\vartheta. Besides x, let r have the free variables u_1^{\rho_1},\ldots,u_n^{\rho_n}, and let again {\frak M} \models \rho_k for all k. We now have to show that q\Vdash \gamma,\rho_1,\ldots,\rho_k implies q\Vdash \vartheta for all q\in T. By our monotonicity lemma, {\frak M}\models \rho_k implies q\Vdash
\rho_k. By our induction hypothesis for r is that this holds for all Kripke models, especially for (\{t\ge q\mid t\in
T\}, <, q). Hence, for all q\ge b, q\Vdash \gamma implies q\Vdash \vartheta. Therefore, b\Vdash
\gamma\rightarrow\vartheta, and therefore, {\frak M}\models
\gamma\rightarrow\vartheta.

Q.E.D.

Now that we have shown correctness, we can answer our above question.

Lemma: There is no closed simply-typed lambda term of type ((P\rightarrow Q)\rightarrow P)\rightarrow P.

Proof: Consider the tree of words matching the regular expression 0*1?. Define b_P(0\ldots 0) = b_Q(0\ldots 0) = 0, b_Q(0\ldots 01) = 0 and b_P(0\ldots01)=1. That is:

                   ⋮
     P→1 Q→0 001   000 P→0 Q→0
                \ /
    P→1 Q→0 01   00 P→0 Q→0
              \ /
               0 P→0 Q→0

Let for some node u\Vdash (P\rightarrow Q)\rightarrow P. That means that u\Vdash P\rightarrow Q implies u\Vdash P. For every u=0\ldots 0, there is some u'=0\ldots 01>0\ldots 0 such that u'\Vdash P but u'\nVdash Q, and therefore u'\nVdash P\rightarrow Q, and therefore, u\nVdash
P\rightarrow Q. So for all u, u\nVdash P\rightarrow
Q, and therefore, for all u, u\Vdash (P\rightarrow
Q)\rightarrow P regardles of whether or not u\Vdash P. But then again, if u=0\ldots 0, we have u\nVdash P, so u\Vdash (P\rightarrow Q)\rightarrow P does not imply u\Vdash
P. Therefore, our Kripke model does not force ((P\rightarrow
Q)\rightarrow P)\rightarrow P. According to our correctness lemma, there cannot be a closed lambda term with this type.

Q.E.D.

This example is called the Peirce formula. You might wonder why this is interesting. In fact, it is just an introductory example. However, if you know functional programming languages, especially Scheme, you might be familiar with the concept of continuations. The type of call/cc is the Peirce formula. In Haskell's standard library, there is also a callCC routine with a similar type. However, this routine has a slightly different type: It uses the continuation translation of P (see the type constructor cont). The reason is that Haskell is pure, and in a pure setting, it is not possible to derive the Peirce formula, which can be shown in a similar way as we just did.

Intuitionistic Propositional Logic

So far, our lambda terms only have atomic types and arrow types. We add a few new type constructors:

  • If \tau and \rho are types, then so is \tau\wedge\rho, the product type (or pair type).
  • If \tau and \rho are types, then so is \tau\vee\rho, the sum type (or union type – it corresponds to unions in programming languages like C++).
  • There is a distinguished type \bot, the empty type or falsum. It must be impossible to create a closed term with this type.

There are additional axiomatic lambda terms for all types \rho,\tau,\vartheta:

  • Product type introduction \wedge_+^{\rho \rightarrow \tau
  \rightarrow \rho\wedge\tau}: This term gets two arguments and returns their pair.
  • Product type destruction \wedge_-^{\rho \wedge \tau \rightarrow
  (\rho \rightarrow \tau \rightarrow \vartheta) \rightarrow
  \vartheta}: This term gets a term of the product type, and a term that maps the two elements of this pair to some other type, and returns this type. For example, the projection to the second coordinate of a pair is \pi_2=(\lambda_{x^{\rho \wedge
  \tau}}(\wedge_-(\lambda_{a^\rho}\lambda_{b^\tau}b)))^{\rho \wedge
  \tau \rightarrow\tau}.
  • Sum type introduction \vee_{+,l}^{\rho\rightarrow\tau\vee\rho} and \vee_{+,r}^{\tau\rightarrow\tau\vee\rho}.
  • Sum type destruction \vee_-^{\rho\vee\tau \rightarrow
  (\rho\rightarrow\vartheta) \rightarrow (\tau\rightarrow\vartheta)
  \rightarrow \vartheta}. This corresponds to if … then … else ….
  • Ex falso quodlibet: \bot_-^{\bot\rightarrow\rho}.

Except for the empty type, all of these are concepts that occur similarily in most modern programming languages, maybe written in a somewhat less intuitive way. We can give terms that do simple stuff like

(\lambda_{x^{(\rho\wedge\tau)\vee\vartheta}}(\vee_-x(\lambda_{t^{\rho\wedge\tau}}
\wedge_- t (\lambda_{y^\rho}\lambda_{z^\tau}
(\wedge_+(\vee_{+,r}y)(\vee_{+,r}z)))) (\lambda_{t^\vartheta}(\wedge_+
(\vee_{+,l}t) (\vee_{+,l}t)))))^{(\rho\wedge\tau)\vee\vartheta
\rightarrow (\rho\vee\vartheta) \wedge (\tau\vee\vartheta)}

From that perspective, empty types make no sense: They have no computational content. However, we can as well interpret these types as predicates: The atomic types become atomic predicates, \rho\vee\tau will mean \rho "or" \tau, \rho\rightarrow\tau will mean \rho "implies" \tau, and \rho\wedge\tau will mean \rho "and" \tau. In this interpretation, \bot is the predicate that is always wrong, and \rho\rightarrow\bot means "not \rho". We therefore define the abbrevitation \lnot\rho :=
\rho\rightarrow\bot. In this interpretation, closed lambda terms become formal proofs. For example, we can prove the introduction of contraposition:

(\lambda_{x^\rho\rightarrow\tau} \lambda_{y^{\lnot\tau}}
\lambda_{z^\rho} (y(xz))^\bot)^{(\rho\rightarrow\tau) \rightarrow
\lnot\tau \rightarrow \lnot\rho}

However, the other direction cannot be derived. We can show this with a slightly extended definition of Kripke Models: We have to add interpretations for \bot, \wedge and \vee:

  • not u\Vdash\bot for all u\in T.
  • u\Vdash \rho\wedge\tau if and only if u\Vdash\rho and u\Vdash\tau.
  • u\Vdash \rho\vee\tau if and only if u\Vdash\rho or u\Vdash\tau.

The above proof of correctness can be extended, but it is more technical. We will therefore not do this here. In this system, the Peirce formula is not derivable as well, as can be shown by the same Kripke model as above. Also, in the same Kripke model as above, contraposition and the law of excluded middle do not hold (exercise).

First-order Predicate Logic

We now extend our type system with objects and relations. We define a special type \omega of objects. A relation is a "type with object arguments". For example, the set theoretic \in-relation is a relation with two object arguments. Remember, we are in metatheory, the object type is not specified and must be defined by axioms.

To get the full mathematical strength, we need to introduce quantors:

  • \forall_x \tau means "for all objects x^\omega, \tau holds".
  • \exists_x \tau means "there exists an object x^\omega, such that \tau holds".

The type \forall_x \tau is similar to \omega\rightarrow\tau, and \exists_x \tau is similar to \omega\wedge \tau, except that \tau may depend on x.

Introducing a universal quantifier is therefore done by abstraction (\lambda_{x^\omega}\tau)^{\forall_x \tau}, and it is eliminated by application (t^{\forall_x \tau}u^\omega)^{\tau[x:=u]}.

The introduction of an existential quantifier reflects sum introduction: \exists_+^{\forall_x(\tau \rightarrow
\exists_x\tau)}. Its elimination is \exists_-^{\exists_x\tau
\rightarrow \forall_x(\tau \rightarrow \rho) \rightarrow \rho} where x must not occur freely in \rho.

We now have the full expressivity of first-order predicate logic. For example, we can prove \exists_x\tau \rightarrow \lnot \forall_x
\lnot \tau:

\lambda_{y^{\exists_x\tau}}(\exists_-y
(\lambda_{u^\omega}\lambda_{w^\tau}
(t^{\forall_x\lnot\tau}uw)^\bot))

However, as before the type system we just defined only allows for intuitionistic logic. The advantage of intuitionistic logic is that if you prove the existence of something, you get an algorithm that calculates it. The disadvantage is that things that cannot be calculated cannot be shown to exist. The following is the so-called Drinker formula (or Drinker problem, or sometimes Drinker paradox):

\exists x(Dx \rightarrow \forall_y Dy)

It is not derivable in intuitionistic logic. We can show this using a Kripke model, but we first have to extend our semantic:

Firstly, every node u\in T gets a domain D_u of objects, such that t\le u\rightarrow D_t\subseteq D_u. The \Vdash relation now depends on an assignment \eta, which is a function from object variables x^\omega to elements of the domain of the corresponding node. For every relation R with n arguments, now, b_R(u) becomes a function b_R(u) : D_u^n \rightarrow \{0,1\}. This corresponds to our former function in the sense that predicates become relations with zero arguments. We then say u\Vdash
  R\,x_1\,\ldots\,x_n[\eta] if and only if b_R(u)(\eta(x_1),\ldots,\eta(x_n)) = 1. For \circ\in\{\wedge, \vee, \rightarrow\}, the definition is similar to our former definition, where u\Vdash \rho\circ\tau
  [\eta] decomposes to u\Vdash \rho[\eta] and u\Vdash
  \tau [\eta].

For the interpretation of the quantors, we must define \eta[x:=o], which is \eta, except for the variable x, where it is o:

\eta[x:=o] (y) = \left\{
\begin{array}{cl}
o & \mbox{ for } y=x \\
\eta(y) & \mbox{ otherwise}
\end{array}\right.

  • u\Vdash (\exists_x A)[\eta] if and only if there exists an a\in D_u such that u\Vdash A[\eta[x:=a]].
  • u\Vdash (\forall_x A)[\eta] if for all w\ge u, for all a\in D_w, w\Vdash A[\eta[x:=a]].

We again omit the correctness proof. But we show that the Drinker formula cannot be derivable. We set T=\mathbb{N} with the canonical ordering. We set D_u = \{ k \mid k \le u \} b_D(u)(u) = 0 and b_D(u)(k) = 1 for k < u. (This model is taken from here.)

⋮
3 D={0,1,2,3} b={0→1,1→1,2→1,3→0}
|
2 D={0,1,2} b={0→1,1→1,2→0}
|
1 D={0,1} b={0→1,1→0}
|
0 D={0} b={0→0}

We set \eta(x)=x. Obviously, if 0\Vdash \exists_x (Dx
\rightarrow \forall_y Dy)[\eta], then 0\Vdash D0\rightarrow
\forall_y Dy[\eta]. However, 1\Vdash D0[\eta], but 1\nVdash \forall_yDy[\eta]. Therefore, 0\nVdash\exists_x (Dx
\rightarrow \forall_y Dy)[\eta].

Similar Concepts

Beth Models

A similar concept for first-order logic you might see is the concept of Beth models. Their definition is slightly different but similar, and their properties are similar. See here for example.

Presheaf Models

There is a generalization of Kripke models, the so-called presheaf models. Here, the domains are given by a presheaf D:C^{\operatorname{op}} \rightarrow \operatorname{Set} on a category C.

For every relation R with n arguments and every object o from C we have an interpretation b_R(o)\subseteq
D(o)^n such that if there is a morphism f:o\rightarrow u in C, we have b_R(u)\subseteq b_R(o). An assignment is, as before, a map from variable symbols into objects. The definition is now similar to the one for Kripke models: For relations, we define o\Vdash Rx_1\ldots x_n[\eta] :\Leftrightarrow
(\eta(x_1),\ldots,\eta(x_n)) \in b_R(o). For implications we define o\Vdash (\tau\rightarrow\rho) [\eta] if and only if for every morphism f:o\rightarrow u, u\Vdash\tau[\eta] implies u\Vdash\rho[\eta].

It is also possible to define this (even more general) in terms of morphisms between classical models instead of explicitly giving interpretations for all relations and domains. You might also find a more intricate version of implication for arrow types, a version that includes computational content, as an interpretation of type theory.

A Kripke model is a special case where C is a poset.

Semantic Forcing in Set Theory

The following part is kept very short and only for older audiences. Paul Joseph Cohen proved that the continuum hypothesis is independent of Zermelo-Fraenckel Set Theory (ZFC). He uses a concept inspired by Kripke models. We will only look at the parts related to Kripke models here. For an explanation of the proof see here, here and here.

This is a relative consistency result, so we can assume that there is a classical transitive countable model M of ZFC. (It can be proved that the existence of such a set is unprovable, by Gödel's second incompleteness theorem.)

That is a set M for which the ZFC axioms hold when relativizing all quantors to it (that is, replacing \forall_x with \forall_{x\in M}, etc). Transitivity means that x\in
M\rightarrow y\in x\rightarrow y\in M. This can be proved by the Löwenheim-Skolem theorem and the Mostowsky collapse lemma.

We write M\models\varphi if the formula \varphi holds in M when relativizing all quantors to it. Notice that we already used the character \models for Kripke models. It is generally used for semantic entailment. M is a so-called classical model, while Kripke models are intuitionistic models.

A forcing poset is a triple (\mathbb{P},\le,1) completely defined in M where \le is a preorder with the largest element 1 and the splitting condition that for all p\in
P there exist q,r\in P such that there is no s\le
q,r, that is, they form no diamond, that is, the ordering is similar to a tree.

A \mathbb{P}-name is, roughly speaking, a set with an element of \mathbb{P} attached, in which every element itself is a \mathbb{P}-name. That is, for example, (\{(\emptyset,p)\},q) for p,q\in \mathbb{P}.

We define \operatorname{val}(u,G) = \{\operatorname{val}(v, G)\mid
\exists_{p\in G}(v,p)\in u\}, the valuation map.

We now define

M[G] = \{\operatorname{val}(u,G)\mid u\in M^{(\mathbb{P})}\}

where M^{(\mathbb{P})} is the set of \mathbb{P}-names in M. We now define a forcing relation on elements of \mathbb{P}.

  • p\Vdash a\in b :\Leftrightarrow \forall_{q\le p\in \mathbb{P}} \exists_{r\le q\in\mathbb{P}, c\in M^{(\mathbb{P})}, s\ge r\in\mathbb{P}}((c,s)\in b\wedge r\Vdash a=c).
  • p\Vdash a=b :\Leftrightarrow \forall_{q\le p\in \mathbb{P},c\in M^{(\mathbb{P})}}((q\Vdash c\in a)\Leftrightarrow (q\Vdash c\in b)).
  • p\Vdash \lnot\varphi :\Leftrightarrow \lnot\exists_{q\le p\in\mathbb{P}} q\Vdash\varphi
  • p\Vdash \varphi\wedge\psi :\Leftrightarrow (p\Vdash\varphi) \wedge (p\Vdash\psi).
  • p\Vdash \forall_x\varphi(x) :\Leftrightarrow \forall_{x \in M^{(\mathbb{P})}} p\Vdash\varphi(x).

One can show that if G satisfies some conditions, M[G]\models\varphi holds there is a p\in G such that p\Vdash\varphi.