Trying to suggest to Lispers a way to omit parentheses is
like trying to suggest to Haskellers a way to omit monads.

Kripke Models
Fri, 23 Sep 2016 18:35:42 GMT

And again, a talk for the best mathematics club of the multiverse. This time, I thought I'd have a slightly more abstract topic, namely Kripke models. As always, this is not a scientific talk, and it focuses on breadth rather than depth, and neglects formality in favour of comprehensibility. Its audience will mainly consist of young students who will probably not have experience in mathematical logic for the largest part.

Anyway, feel free to drop a comment if you notice any mistakes.

A simply-typed lambda calculus

The lambda calculus is the prototype of all functional programming languages. We will give a short informal introduction to it.

A lambda term is inductively defined as

There are variables, as the ones known from programming languages. Every variable is a lambda-term. Thus, $x$ is a lambda-term.
If $a,b$ are lambda-terms, then the application $ab$ is a lambda-term. It regards $a$ as a function, and $b$ as its first argument. A more usual notation would be $a(b)$ , but there are reasons why $ab$ is preferred.
If $a$ is a lambda-term and $x$ is a variable, then the abstraction $\lambda_xa$ is a lambda-term. The variable $x$ will usually occur in $a$ . The intuition is that $\lambda_xa$ is the function that maps to some value $b$ the term $a[x:=b]$ , where every occurence of $x$ in $a$ is replaced by $b$ . For example, if $a=cxd$ , then $(\lambda_x (cxd))b = cbd$ . Another notation that is more common in some other parts of mathematics would be $x\mapsto a$ .

The idea of the untyped lambda calculus is that everything is a function. Functions map functions to functions. Lambda terms are the only objects one talks about.

A problem that arises from the above definition is that you might have a lambda term $\lambda_x (a\lambda_x (bx))$ . If we just replaced $x$ , we would have $(\lambda_x (a\lambda_x (bx)))c \stackrel{?}{=} a\lambda_x bc$ , which is obviously not what we want. Actually, the inner $x$ is bound by the inner $\lambda$ , and will not be replaced. It is possible to define this formally, but for this talk we just forbid that this occurs.

Another problem arises when we substitute with a term that itself contains a variable. Say, we have $((\lambda_x\lambda_y (xay))(by))c \stackrel{?}{=} ((bc)ac)$ . This is obviously not intended, and we forbid this to happen, instead of handling it formally.

A variable is bound in a term, when it is captured by some lambda. It is free otherwise. A term is closed when it does not contain free variables. For example, $\lambda_x y$ contains $y$ freely, while $\lambda_x(xx)$ is a closed term.

As for most functional programming languages, we can assign types to terms. In the simply-typed lambda calculus, there are atomic types $A, B, C$ and arrow types, which are of the form $\tau\rightarrow\rho$ , where $\tau,\rho$ are types. Variables will be assigned a type. We write $t^{A\rightarrow B}$ to denote that the term $t$ has the type $A\rightarrow B$ . The intuition is that this term is a function from objects of type $A$ to objects of type $B$ . We therefore define the typing rule $(\lambda_{x^\tau} t^{\rho})^{\tau\rightarrow\rho}$ . Furthermore, we only allow applications when the types fit, that is, $(a^{\tau\rightarrow\rho}b^\tau)^\rho$ . Of course, we can leave out type annotations when they are clear or not important. This is a very simple type system, but most other type systems are based on this.

We now ask ourselves the question whether there is a closed term of type $((P\rightarrow Q)\rightarrow P)\rightarrow P$ . Well, of course there is none, otherwise we wouldn't ask. Kripke models are a way of answering this kind of questions.

Kripke Models for the simply-typed lambda calculus

We are given a rooted tree $(T,<)$ , where $T$ is the set of nodes, and $<\subseteq T\times T$ is the "is parent" relation for this tree. The tree need not be finite.

Now, for every atomic type $A$ , let a monotone function $b_A:T\rightarrow\{0,1\}$ be given, that is, $x<y\rightarrow b_A(x)\le b_A(y)$ .

A triple $(T, <, b)$ is called a Kripke model (well, at least a very simple form of a Kripke model). We now define a so-called forcing relation $\Vdash\subseteq T\times\Theta$ (where $\Theta$ is the set of simple types).

We define

$u\Vdash A$ if and only if $b_A(u)=1$ .
$u\Vdash\rho\rightarrow\tau$ if and only if for all $w\ge u$ , $w\Vdash\rho$ implies $w\Vdash\tau$ .

We say that $(T, <, b)\models\tau$ (" $(T, <, b)$ models $\tau$ ") if and only if $\operatorname{root}(T)\Vdash\tau$ .

Lemma (Monotonicity): Let $(T, <, b)$ be a Kripke model, and $q,r\in T$ with $q < r$ , and $\rho$ be some type. If $q\Vdash \rho$ , then $r\Vdash \rho$ .

Proof: We prove this by structural induction on $\rho$ . If $\rho=A$ for some atomic type $A$ , this holds by the monotonicity of $b_A$ .

Now Let $\rho = \tau \rightarrow \gamma$ , and assume $q\Vdash \tau \rightarrow \gamma$ , that is, for all $t\ge q$ , $t\Vdash \tau$ implies $t\Vdash \gamma$ . But then the same holds especially forall $t\ge r$ , hence $r\Vdash \tau \rightarrow \gamma$ .

Q.E.D.

Lemma (Correctness): Let a closed simply-typed lambda term $t^\rho$ exist. Then every model $\frak M$ satisfies ${\frak M}\models\rho$ .

Proof: We slightly generalize the Lemma. We actually prove: If there exists a term $t^\tau$ with free variables $u_1^{\rho_1}, \ldots, u_n^{\rho_n}$ , then for every ${\frak M}=(T, <, b)$ , if we have ${\frak M}\models\rho_k$ for all $k<n$ , we also have ${\frak M} \models \tau$ . We prove this by structural induction.

Our induction hypothesis is that the current entailment holds for all Kripke models. Notice that if $p\in T$ , then $(\{q\ge p\mid q\in T\}, <, p)$ is (trivially) also a Kripke model.

Obviously, it holds for variables $t=u_1^{\rho_1}$ .

Assume $t=q^{\vartheta \rightarrow \tau}r^\vartheta$ , where $q$ has the free variables $u_1^{\rho_1},\ldots,u_m^{\rho_m}$ and $r$ has the free variables $u_{m+1}^{\rho_{m+1}},\ldots,u_n^{\rho_n}$ (notice that both sets of free variables need not be disjoint). Assume that for all $k\le n$ we have ${\frak M} \models \rho_k$ . Then by induction, we have ${\frak M} \models \vartheta \rightarrow \tau$ and ${\frak M} \models \vartheta$ . But ${\frak M} \models \vartheta \rightarrow \tau$ means that ${\frak M} \models \vartheta$ implies ${\frak M} \models \tau$ . Hence, the claim follows.

Assume $\tau = \gamma \rightarrow \vartheta$ , and $t=\lambda_{x^\gamma}r^\vartheta$ . Besides $x$ , let $r$ have the free variables $u_1^{\rho_1},\ldots,u_n^{\rho_n}$ , and let again ${\frak M} \models \rho_k$ for all $k$ . We now have to show that $q\Vdash \gamma,\rho_1,\ldots,\rho_k$ implies $q\Vdash \vartheta$ for all $q\in T$ . By our monotonicity lemma, ${\frak M}\models \rho_k$ implies $q\Vdash \rho_k$ . By our induction hypothesis for $r$ is that this holds for all Kripke models, especially for $(\{t\ge q\mid t\in T\}, <, q)$ . Hence, for all $q\ge b$ , $q\Vdash \gamma$ implies $q\Vdash \vartheta$ . Therefore, $b\Vdash \gamma\rightarrow\vartheta$ , and therefore, ${\frak M}\models \gamma\rightarrow\vartheta$ .

Q.E.D.

Now that we have shown correctness, we can answer our above question.

Lemma: There is no closed simply-typed lambda term of type $((P\rightarrow Q)\rightarrow P)\rightarrow P$ .

Proof: Consider the tree of words matching the regular expression 0*1?. Define $b_P(0\ldots 0) = b_Q(0\ldots 0) = 0$ , $b_Q(0\ldots 01) = 0$ and $b_P(0\ldots01)=1$ . That is:

                   ⋮
     P→1 Q→0 001   000 P→0 Q→0
                \ /
    P→1 Q→0 01   00 P→0 Q→0
              \ /
               0 P→0 Q→0

Let for some node $u\Vdash (P\rightarrow Q)\rightarrow P$ . That means that $u\Vdash P\rightarrow Q$ implies $u\Vdash P$ . For every $u=0\ldots 0$ , there is some $u'=0\ldots 01>0\ldots 0$ such that $u'\Vdash P$ but $u'\nVdash Q$ , and therefore $u'\nVdash P\rightarrow Q$ , and therefore, $u\nVdash P\rightarrow Q$ . So for all $u$ , $u\nVdash P\rightarrow Q$ , and therefore, for all $u$ , $u\Vdash (P\rightarrow Q)\rightarrow P$ regardles of whether or not $u\Vdash P$ . But then again, if $u=0\ldots 0$ , we have $u\nVdash P$ , so $u\Vdash (P\rightarrow Q)\rightarrow P$ does not imply $u\Vdash P$ . Therefore, our Kripke model does not force $((P\rightarrow Q)\rightarrow P)\rightarrow P$ . According to our correctness lemma, there cannot be a closed lambda term with this type.

Q.E.D.

This example is called the Peirce formula. You might wonder why this is interesting. In fact, it is just an introductory example. However, if you know functional programming languages, especially Scheme, you might be familiar with the concept of continuations. The type of call/cc is the Peirce formula. In Haskell's standard library, there is also a callCC routine with a similar type. However, this routine has a slightly different type: It uses the continuation translation of $P$ (see the type constructor cont). The reason is that Haskell is pure, and in a pure setting, it is not possible to derive the Peirce formula, which can be shown in a similar way as we just did.

Intuitionistic Propositional Logic

So far, our lambda terms only have atomic types and arrow types. We add a few new type constructors:

If $\tau$ and $\rho$ are types, then so is $\tau\wedge\rho$ , the product type (or pair type).
If $\tau$ and $\rho$ are types, then so is $\tau\vee\rho$ , the sum type (or union type – it corresponds to unions in programming languages like C++).
There is a distinguished type $\bot$ , the empty type or falsum. It must be impossible to create a closed term with this type.

There are additional axiomatic lambda terms for all types $\rho,\tau,\vartheta$ :

Product type introduction $\wedge_+^{\rho \rightarrow \tau \rightarrow \rho\wedge\tau}$ : This term gets two arguments and returns their pair.
Product type destruction $\wedge_-^{\rho \wedge \tau \rightarrow (\rho \rightarrow \tau \rightarrow \vartheta) \rightarrow \vartheta}$ : This term gets a term of the product type, and a term that maps the two elements of this pair to some other type, and returns this type. For example, the projection to the second coordinate of a pair is $\pi_2=(\lambda_{x^{\rho \wedge \tau}}(\wedge_-(\lambda_{a^\rho}\lambda_{b^\tau}b)))^{\rho \wedge \tau \rightarrow\tau}$ .
Sum type introduction $\vee_{+,l}^{\rho\rightarrow\tau\vee\rho}$ and $\vee_{+,r}^{\tau\rightarrow\tau\vee\rho}$ .
Sum type destruction $\vee_-^{\rho\vee\tau \rightarrow (\rho\rightarrow\vartheta) \rightarrow (\tau\rightarrow\vartheta) \rightarrow \vartheta}$ . This corresponds to if … then … else ….
Ex falso quodlibet: $\bot_-^{\bot\rightarrow\rho}$ .

Except for the empty type, all of these are concepts that occur similarily in most modern programming languages, maybe written in a somewhat less intuitive way. We can give terms that do simple stuff like

$(\lambda_{x^{(\rho\wedge\tau)\vee\vartheta}}(\vee_-x(\lambda_{t^{\rho\wedge\tau}} \wedge_- t (\lambda_{y^\rho}\lambda_{z^\tau} (\wedge_+(\vee_{+,r}y)(\vee_{+,r}z)))) (\lambda_{t^\vartheta}(\wedge_+ (\vee_{+,l}t) (\vee_{+,l}t)))))^{(\rho\wedge\tau)\vee\vartheta \rightarrow (\rho\vee\vartheta) \wedge (\tau\vee\vartheta)}$

From that perspective, empty types make no sense: They have no computational content. However, we can as well interpret these types as predicates: The atomic types become atomic predicates, $\rho\vee\tau$ will mean $\rho$ "or" $\tau$ , $\rho\rightarrow\tau$ will mean $\rho$ "implies" $\tau$ , and $\rho\wedge\tau$ will mean $\rho$ "and" $\tau$ . In this interpretation, $\bot$ is the predicate that is always wrong, and $\rho\rightarrow\bot$ means "not $\rho$ ". We therefore define the abbrevitation $\lnot\rho := \rho\rightarrow\bot$ . In this interpretation, closed lambda terms become formal proofs. For example, we can prove the introduction of contraposition:

$(\lambda_{x^\rho\rightarrow\tau} \lambda_{y^{\lnot\tau}} \lambda_{z^\rho} (y(xz))^\bot)^{(\rho\rightarrow\tau) \rightarrow \lnot\tau \rightarrow \lnot\rho}$

However, the other direction cannot be derived. We can show this with a slightly extended definition of Kripke Models: We have to add interpretations for $\bot$ , $\wedge$ and $\vee$ :

not $u\Vdash\bot$ for all $u\in T$ .
$u\Vdash \rho\wedge\tau$ if and only if $u\Vdash\rho$ and $u\Vdash\tau$ .
$u\Vdash \rho\vee\tau$ if and only if $u\Vdash\rho$ or $u\Vdash\tau$ .

The above proof of correctness can be extended, but it is more technical. We will therefore not do this here. In this system, the Peirce formula is not derivable as well, as can be shown by the same Kripke model as above. Also, in the same Kripke model as above, contraposition and the law of excluded middle do not hold (exercise).

First-order Predicate Logic

We now extend our type system with objects and relations. We define a special type $\omega$ of objects. A relation is a "type with object arguments". For example, the set theoretic $\in$ -relation is a relation with two object arguments. Remember, we are in metatheory, the object type is not specified and must be defined by axioms.

To get the full mathematical strength, we need to introduce quantors:

$\forall_x \tau$ means "for all objects $x^\omega$ , $\tau$ holds".
$\exists_x \tau$ means "there exists an object $x^\omega$ , such that $\tau$ holds".

The type $\forall_x \tau$ is similar to $\omega\rightarrow\tau$ , and $\exists_x \tau$ is similar to $\omega\wedge \tau$ , except that $\tau$ may depend on $x$ .

Introducing a universal quantifier is therefore done by abstraction $(\lambda_{x^\omega}\tau)^{\forall_x \tau}$ , and it is eliminated by application $(t^{\forall_x \tau}u^\omega)^{\tau[x:=u]}$ .

The introduction of an existential quantifier reflects sum introduction: $\exists_+^{\forall_x(\tau \rightarrow \exists_x\tau)}$ . Its elimination is $\exists_-^{\exists_x\tau \rightarrow \forall_x(\tau \rightarrow \rho) \rightarrow \rho}$ where $x$ must not occur freely in $\rho$ .

We now have the full expressivity of first-order predicate logic. For example, we can prove $\exists_x\tau \rightarrow \lnot \forall_x \lnot \tau$ :

$\lambda_{y^{\exists_x\tau}}(\exists_-y (\lambda_{u^\omega}\lambda_{w^\tau} (t^{\forall_x\lnot\tau}uw)^\bot))$

However, as before the type system we just defined only allows for intuitionistic logic. The advantage of intuitionistic logic is that if you prove the existence of something, you get an algorithm that calculates it. The disadvantage is that things that cannot be calculated cannot be shown to exist. The following is the so-called Drinker formula (or Drinker problem, or sometimes Drinker paradox):

$\exists x(Dx \rightarrow \forall_y Dy)$

It is not derivable in intuitionistic logic. We can show this using a Kripke model, but we first have to extend our semantic:

Firstly, every node $u\in T$ gets a domain $D_u$ of objects, such that $t\le u\rightarrow D_t\subseteq D_u$ . The $\Vdash$ relation now depends on an assignment $\eta$ , which is a function from object variables $x^\omega$ to elements of the domain of the corresponding node. For every relation $R$ with $n$ arguments, now, $b_R(u)$ becomes a function $b_R(u) : D_u^n \rightarrow \{0,1\}$ . This corresponds to our former function in the sense that predicates become relations with zero arguments. We then say $u\Vdash R\,x_1\,\ldots\,x_n[\eta]$ if and only if $b_R(u)(\eta(x_1),\ldots,\eta(x_n)) = 1$ . For $\circ\in\{\wedge, \vee, \rightarrow\}$ , the definition is similar to our former definition, where $u\Vdash \rho\circ\tau [\eta]$ decomposes to $u\Vdash \rho[\eta]$ and $u\Vdash \tau [\eta]$ .

For the interpretation of the quantors, we must define $\eta[x:=o]$ , which is $\eta$ , except for the variable $x$ , where it is $o$ :

$\eta[x:=o] (y) = \left\{ \begin{array}{cl} o & \mbox{ for } y=x \\ \eta(y) & \mbox{ otherwise} \end{array}\right.$

$u\Vdash (\exists_x A)[\eta]$ if and only if there exists an $a\in D_u$ such that $u\Vdash A[\eta[x:=a]]$ .
$u\Vdash (\forall_x A)[\eta]$ if for all $w\ge u$ , for all $a\in D_w$ , $w\Vdash A[\eta[x:=a]]$ .

We again omit the correctness proof. But we show that the Drinker formula cannot be derivable. We set $T=\mathbb{N}$ with the canonical ordering. We set $D_u = \{ k \mid k \le u \}$ $b_D(u)(u) = 0$ and $b_D(u)(k) = 1$ for $k < u$ . (This model is taken from here.)

⋮
3 D={0,1,2,3} b={0→1,1→1,2→1,3→0}
|
2 D={0,1,2} b={0→1,1→1,2→0}
|
1 D={0,1} b={0→1,1→0}
|
0 D={0} b={0→0}

We set $\eta(x)=x$ . Obviously, if $0\Vdash \exists_x (Dx \rightarrow \forall_y Dy)[\eta]$ , then $0\Vdash D0\rightarrow \forall_y Dy[\eta]$ . However, $1\Vdash D0[\eta]$ , but $1\nVdash \forall_yDy[\eta]$ . Therefore, $0\nVdash\exists_x (Dx \rightarrow \forall_y Dy)[\eta]$ .

Similar Concepts

Beth Models

A similar concept for first-order logic you might see is the concept of Beth models. Their definition is slightly different but similar, and their properties are similar. See here for example.

Presheaf Models

There is a generalization of Kripke models, the so-called presheaf models. Here, the domains are given by a presheaf $D:C^{\operatorname{op}} \rightarrow \operatorname{Set}$ on a category $C$ .

For every relation $R$ with $n$ arguments and every object $o$ from $C$ we have an interpretation $b_R(o)\subseteq D(o)^n$ such that if there is a morphism $f:o\rightarrow u$ in $C$ , we have $b_R(u)\subseteq b_R(o)$ . An assignment is, as before, a map from variable symbols into objects. The definition is now similar to the one for Kripke models: For relations, we define $o\Vdash Rx_1\ldots x_n[\eta] :\Leftrightarrow (\eta(x_1),\ldots,\eta(x_n)) \in b_R(o)$ . For implications we define $o\Vdash (\tau\rightarrow\rho) [\eta]$ if and only if for every morphism $f:o\rightarrow u$ , $u\Vdash\tau[\eta]$ implies $u\Vdash\rho[\eta]$ .

It is also possible to define this (even more general) in terms of morphisms between classical models instead of explicitly giving interpretations for all relations and domains. You might also find a more intricate version of implication for arrow types, a version that includes computational content, as an interpretation of type theory.

A Kripke model is a special case where $C$ is a poset.

Semantic Forcing in Set Theory

The following part is kept very short and only for older audiences. Paul Joseph Cohen proved that the continuum hypothesis is independent of Zermelo-Fraenckel Set Theory (ZFC). He uses a concept inspired by Kripke models. We will only look at the parts related to Kripke models here. For an explanation of the proof see here, here and here.

This is a relative consistency result, so we can assume that there is a classical transitive countable model $M$ of ZFC. (It can be proved that the existence of such a set is unprovable, by Gödel's second incompleteness theorem.)

That is a set $M$ for which the ZFC axioms hold when relativizing all quantors to it (that is, replacing $\forall_x$ with $\forall_{x\in M}$ , etc). Transitivity means that $x\in M\rightarrow y\in x\rightarrow y\in M$ . This can be proved by the Löwenheim-Skolem theorem and the Mostowsky collapse lemma.

We write $M\models\varphi$ if the formula $\varphi$ holds in $M$ when relativizing all quantors to it. Notice that we already used the character $\models$ for Kripke models. It is generally used for semantic entailment. $M$ is a so-called classical model, while Kripke models are intuitionistic models.

A forcing poset is a triple $(\mathbb{P},\le,1)$ completely defined in $M$ where $\le$ is a preorder with the largest element $1$ and the splitting condition that for all $p\in P$ there exist $q,r\in P$ such that there is no $s\le q,r$ , that is, they form no diamond, that is, the ordering is similar to a tree.

A $\mathbb{P}$ -name is, roughly speaking, a set with an element of $\mathbb{P}$ attached, in which every element itself is a $\mathbb{P}$ -name. That is, for example, $(\{(\emptyset,p)\},q)$ for $p,q\in \mathbb{P}$ .

We define $\operatorname{val}(u,G) = \{\operatorname{val}(v, G)\mid \exists_{p\in G}(v,p)\in u\}$ , the valuation map.

We now define

$M[G] = \{\operatorname{val}(u,G)\mid u\in M^{(\mathbb{P})}\}$

where $M^{(\mathbb{P})}$ is the set of $\mathbb{P}$ -names in $M$ . We now define a forcing relation on elements of $\mathbb{P}$ .

$p\Vdash a\in b$ $:\Leftrightarrow$ $\forall_{q\le p\in \mathbb{P}} \exists_{r\le q\in\mathbb{P}, c\in M^{(\mathbb{P})}, s\ge r\in\mathbb{P}}((c,s)\in b\wedge r\Vdash a=c)$ .
$p\Vdash a=b$ $:\Leftrightarrow$ $\forall_{q\le p\in \mathbb{P},c\in M^{(\mathbb{P})}}((q\Vdash c\in a)\Leftrightarrow (q\Vdash c\in b))$ .
$p\Vdash \lnot\varphi$ $:\Leftrightarrow$ $\lnot\exists_{q\le p\in\mathbb{P}} q\Vdash\varphi$
$p\Vdash \varphi\wedge\psi$ $:\Leftrightarrow$ $(p\Vdash\varphi) \wedge (p\Vdash\psi)$ .
$p\Vdash \forall_x\varphi(x)$ $:\Leftrightarrow$ $\forall_{x \in M^{(\mathbb{P})}} p\Vdash\varphi(x)$ .