Dude, close the window. The wifi is getting out.

Again, I gave a talk for the best mathematics club of the multiverse, this time for the pupils who do not yet study mathematics. I thought it would be nice to have something that is seldom taught, a proper introduction to \mathbb{R} "from scratch". This is what I tried to do. Keep in mind that this has very low requirements and is not thought for mathematicians, but for people who may become mathematicians. Since I still do not have anything better than MathJax, JavaScript is needed to render the formulae, sorry about that. Here is a script:

Inroduction

\mathbb{R} is probably the most known object in mathematics. However, only few non-mathematicians know how it is actually defined. Usually, they are seen as some "obvious", "natural" object. However, there are several fallacies in the step from \mathbb{Q} to \mathbb{R}, like the fact that it is not algebraically closed (\sqrt{-1}\not\in\mathbb{R}), and that 0,\overline{9}=1. To avoid such fallacies, we should first give some axioms that we want the reals to satisfy:
  • \mathbb{R} should be a field:
    • a+b=b+a, a\cdot b=b\cdot a, (a+b)+c=a+(b+c), (a\cdot b)\cdot c = a\cdot (b\cdot c), a\cdot(b+c)=a\cdot b + a\cdot c
    • There exist additive and multiplicative neutral elements 0 and 1, such that a+0=a\cdot 1=a
    • There exists an additive inverse -a for every a, such that a+(-a)=0. There exists a multiplicative inverse a^{-1} for every a\neq 0 such that a\cdot a^{-1}=1
  • \le on \mathbb{R} should be a linear partial ordering relation:
    • a\le a
    • a\le b and b\le a implies a=b
    • a\le b and b\le c implies a\le c
  • \mathbb{R} should be an ordered field regarding \le:
    • a\le b implies a+c\le b+c
    • 0\le a and 0\le b implies 0\le a\cdot b
  • \le is complete over \mathbb{R}:
    • If a \le x for all x\in I\subseteq\mathbb{R}, then there exists a maximal a'\ge a with a'\le x for all x\in I.
    • If a \ge x for all x\in 
I\subseteq\mathbb{R}, then there exists a minimal a'\le a with a'\ge x for all x\in I.

It can be shown that these axioms are sufficient to classify a unique structure "up to isomorphy", that is, two such structures are equivalent up to "renaming" of the elements. However, this just says that if such a structure exists, it is unique. It does not show that it actually exists, which is what we will approach.


Equivalence Relations

Equivalence relations are an extremely important concept in mathematics. They show up in almost every mathematical topic, like logic, algebra, topology. It is important to understand this concept to define \mathbb{R}, but it is also worthwile for its own sake.
An eqivalence relation is a binary relation that usually somehow resembles similarities between objects. Formally, a relation \equiv on a set S is a relation which obeys the following three axioms:

  • Reflexivity: For all a\in S we have a\equiv a
  • Symmetry: For all a,b\in S we have a\equiv b \Leftrightarrow b\equiv a
  • Transitivity: For all a,b,c\in S if we have a\equiv b and b\equiv c then we also have a\equiv c.

For every a\in S we can define its equivalence class [a]_\equiv=\{x\in S|x\equiv a\}.

Lemma: Two equivalence classes [a]_\equiv and [b]_\equiv are either equal or disjoint.

Proof: Every equivalence class contains at least one element, so they cannot be both disjoint and equal. Assuming they are not disjoint, they must have a common element [a]_\equiv \ni c \in [b]_\equiv, but then a\equiv c and c\equiv b, so by transitivity, a\equiv b, hence [a]_\equiv = [b]_\equiv.

Hence, the set of equivalence classes over S regarding \equiv is a partition of S. We call this set S/\equiv, say "S modulo \equiv".

Trivial examples are standard equality =, in which case S/= can be identified with S in a canonical way, and the "squash"-relation that makes every two elements equal, in which case there is only one equivalence class.

Furthermore, important equivalence classes are the residue-equalities =_n on \mathbb{Z}, where a=_n b means that a-b is divisible by n. Their equivalence classes are the residue classes, and they can be added and multiplied elementwise, forming the residue class rings.


Natural Numbers

Our bootstrapping of \mathbb{R} begins with a slightly simpler object, \mathbb{N}. First, we give axioms for something we shall call a Peano Structure:

A triple (S,z,\sigma), where S is a set, z\in S and \sigma:S\rightarrow S is a function, is called a Peano Structure, if

  • There is no t\in S with \sigma(t)=z
  • For every t\in S, \sigma(t)\in S
  • For every s,t\in S, if \sigma(t)=\sigma(s), then t=s
  • If U\subseteq S, z\in U and if t\in U\Rightarrow\sigma(t)\in U, then U=S

That z is called zero, and \sigma is called successor function of that peano structure.

Mathematicians call a function between two structures that preserves relevant parts of that structure an isomorphism. In case of Peano Structures, a function f:S\rightarrow T between two Peano Structures (S,z,\sigma),(T,t,\rho) is an isomorphism if 

  • f is bijective
  • f(z)=t
  • For all x\in S we have f(\sigma(x)) = \rho(f(x)), and for all y\in T we have f^{-1}(\rho(y)) = \sigma(f^{-1}(y)).

If such an isomorphism exists between two Peano Structures, they are called isomorphic. They are then essentially equal structures, in the sense that there is no mathematically meaningful difference between them, and the one structure is a result of just renaming the elements of the other.

Lemma: Every two peano structures are isomorphic (in one set universe)

Proof: Let two peano structures (S,z,\sigma),(T,t,\rho) be given. We define a function f:S\rightarrow T recursively by f(z):=t, f(\sigma(x)):=\rho(f(x)). Therefore, the second and a part of the third axiom are already satisfied by definition.

We show that f is surjective: We know that the image of f is a subset of T, and we know that t=f(z) is in this image. If y is in the image of f, say, y=f(x) for some x, then we know that \rho(y)=f(\sigma(x)) by definition, hence, \rho(y) is also in the image of f. By the axioms of peano structures, this means that the image of f is T.

Now consider the function g:T\rightarrow S with g(t)=z, g(\rho(x))=\sigma(g(x)). By the same argument as above, g is surjective. Furthermore, since we have f(g(t))=f(z)=t and from f(g(x))=x follows f(g(\rho(x))) = f(\sigma(g(x))) = \rho(f(g(x))) we have f(g(x))=x for all x\in T, and similarily g(f(x))=x for all x\in S, hence, g=f^{-1}. Therefore, f is also injective, and the third axiom of isomorphism is proved.

What we now know is that if there is a peano structure, then there is - up to isomorphism - only one. There are several ways to show that (in a set universe) there actually is one. The common way is to classify a special set of finite sets, the finite cardinal numbers. In ZFC, this can be done using the well-ordering theorem, the axiom of infinity and the axiom of comprehension. But doing this is outside our scope, we just believe the following theorem:

Theorem: A peano structure exists (in every set universe).

And now we can take an arbitrary peano structure as our set of natural numbers, \mathbb{N}, by setting 0:=z and (n+1):=\sigma(n) [note that we have yet to define addition, this is just for intuition]. The first three axioms of peano structures become obvious arithmetical laws, the fourth axiom becomes the induction axiom. We can define addition recursively by

0+x:=x, \sigma(y)+x:=\sigma(y+x).

Exercise: Prove x+y=y+x

We can define multiplication recursively by

0\cdot x=0, \sigma(y)\cdot x = x+y\cdot x

The arithmetical laws follow from these definitions.


Integers and Rational Numbers

Now that we have defined \mathbb{N}, we want to reach out for \mathbb{Z}, and it turns out that this is much easier than getting \mathbb{N} in the first place. However, for a non-mathematician, these definitions sound rather overcomplicated, but it turns out that they are much easier to handle than "less complicated" ones.

The need for negative numbers comes from the fact that subtraction is not total on \mathbb{N}, that is, for example, 2-3\not\in\mathbb{N}. Subtraction itself is just the operation of solving an additive equation, 3+x=2 in our example. To be consistent with most of our arithmetical laws so far, it is desirable that the solution of 3+x=2 is equal to the solution of 4+x=3 and generally 3+n+x=2+n, due to cancellation.

We observe that the equations a+x=b and c+x=d must have the same solutions in x if a+d=b+c. Now, let us have a look at the set \mathbb{N}^2 of pairs of natural numbers. If we define (a,b)\approx(c,d):\Leftrightarrow a+d=b+c, then \approx is an equivalence relation on \mathbb{N}^2: Reflexivity and symmetry are obvious. For transitivity, consider (a,b)\approx(c,d) and (c,d)\approx(e,f). That is, a+d=b+c and c+f=e+d. Since we have equality of sums here, we may subtract without getting negative results, therefore c=e+d-f, so a+d=b+e+d-f and hence a+f = e+b, which means (a,b)\approx(e,f). Now we define \mathbb{Z}:=\mathbb{N}^2/\approx.

We furthermore define [(a,b)]_\approx + [(c,d)]_\approx = [(a+c,b+d)]_\approx. We have to show that this is well-defined, which is left as an exercise.

Now, at schools you are taught that \mathbb{N}\subseteq\mathbb{Z}, and even in higher mathematics this is often used. With our definition, this is of course not the case, but we can identify every natural number n with the equivalence class [(n,0)]_\approx. As isomorphic structures do not need to be distinguished, we can safely redefine \mathbb{N} to be the set of these equivalence classes. The changes we would have to accomplish are below the usual abstraction level.

The structure \mathbb{Z} with addition and multiplication is called a Ring.

Rational numbers are similar, this time we add solutions to a\cdot x=b, but we have to avoid the pathological case of a=0. The reason is that we want the rationals to be a Field. Similar to how we did before, we define an equivalence relation \simeq on \mathbb{Z}\times(\mathbb{Z}\backslash\{0\}), namely (a,b)\simeq(c,d):\Leftrightarrow a\cdot d = b\cdot c. You are probably familiar with this representation of rational numbers, you just write \frac{a}{b} instead of (a,b) - we just chose an equivalence that tells when two fractions are equal, and we have \frac{a}{b}=\frac{c}{d}\Leftrightarrow a\cdot d=b\cdot c.


Cauchy Sequences

The step from rational numbers to real numbers is a bit more complicated. It is not sufficient to add solutions of special equations to our system. However, of course, in many cases, the real numbers are just solutions of equations. If we add zeroes of all polynomials, for example, we would get (more than) the algebraic real numbers over \mathbb{Q}. But there are also transcendental numbers like \pi.

Let us first have a look at one of the first irrational numbers ever discovered: \sqrt{2}.

Theorem: There is no rational solution to the equation x^2-2=0.

Proof: By monotony of x\mapsto x^2 we know that a positive solution must lie between 1 and 2, and can especially not be integral. Now assume there was some solution x=\frac{p}{q} with \frac{p^2}{q^2}=2, where w.l.o.g. gcd(p,q)=1. Then also gcd(p^2,q^2)=1. But p^2=2q^2, and as 2 cannot be factorized nontrivially, p must be divisible by 2, therefore, p=2n for some n. But then 4n^2 = 2q^2 yields 2n^2=q^2 and by the same argument as before, we get that q is divisible by 2, which contradicts our assumption gcd(p,q)=1.

So we know that \sqrt{2}\not\in\mathbb{Q}, but still, we can define the set \{x\in \mathbb{Q}|x^2\le 2\}, and this set will be an open interval, with rational numbers approaching \sqrt{2} up to an arbitrary precision. The set \{x\in \mathbb{Q}|x^2\le 2\} is bounded, but has no rational bound. The set \mathbb{Q} is therefore not complete, we say.

However, for every given 0<\varepsilon\in\mathbb{Q} we can give a rational number r\in\mathbb{Q} such that |r-\sqrt{2}|<\varepsilon. We now touch the topic of infinitesimal analysis, however, do not fear, we do not have to dive in it too deeply. The crucial concept we have to define now is the Cauchy sequence:

Definition: A sequence (a_i)_{i\in\mathbb{N}} of rational numbers is called a Cauchy sequence, if for every 0<\varepsilon\in\mathbb{Q} we can find an N\in\mathbb{N} such that for all j,k\ge N we have |a_j - a_k|\le\varepsilon.

That is, if we have such an \varepsilon, we know an N such that all elements of that sequence above a_N do not differ about more than \varepsilon, and since we may choose this \varepsilon arbitrarily small, the sequence "approximates" something.


(Source)

For some of these sequences, we can give a rational number, which they "obviously" approximate. For example, somehow, the sequence (\frac{1}{i})_{i\in\mathbb{N}} "approximates" 0: It gets arbitrarily close to 0. On the other hand, if we set a_i = \max \{n\in\mathbb{N}|(\frac{n}{2^i})^2<2\}, this is a cauchy sequence, but does not approximate any rational number. However, we want this to be a number - in general, we want things to which we have an intuitive arbitrarily precise "approximation" to be numbers - we want the numbers to be complete.

The easiest way to make sure this is the case is to just postulate that every Cauchy sequence has such an approximation (a "limit point"), but then the problem remains that we do not really know how the objects we add should look like. So we simply do the first thing that comes to your mind when you are a mathematician: Just use the Cauchy sequences themselves, and define a proper equivalence relation on them.

To do this, we first have to make a formal definition of what we mean by an "approximation", but we can give it for rational approximated numbers only, so far:

Definition: A sequence (a_i)_{i\in\mathbb{N}} converges towards a\in\mathbb{Q}, if for an arbitrarily small 0<\varepsilon\in\mathbb{Q} we can find an N\in\mathbb{N} such that for all j\ge N we have |a_j - a|\le\varepsilon.

As an exercise, check whether our above statements on "approximations" hold for this definition. Now we define an equivalence relation on cauchy sequences:

Definition: Two cauchy sequences (a_i)_{i\in\mathbb{N}} and (b_i)_{i\in\mathbb{N}} converge towards the same point, denoted by (a_i)_{i\in\mathbb{N}}\asymp(b_i)_{i\in\mathbb{N}}, if the sequence (|a_i - b_i|)_{i\in\mathbb{N}} converges towards 0.

Theorem: \asymp is an equivalence relation.

Proof: Reflexivity holds trivially, since (|a_i-a_i|)_{i\in\mathbb{N}}=(0)_{i\in\mathbb{N}}. Symmetry is trivial since |a_i-b_i|=|b_i-a_i|. For transitivity, assume (a_i)_{i\in\mathbb{N}}\asymp(b_i)_{i\in\mathbb{N}} and (b_i)_{i\in\mathbb{N}}\asymp(c_i)_{i\in\mathbb{N}}. Now let an \varepsilon be given. By definition, we find an N such that for all i\ge N we have |a_i - b_i|\le\frac{\varepsilon}{2}, and similarily, we find an M such that for all i\ge M we have |b_i - c_i|\le\frac{\varepsilon}{2}. But then we have \varepsilon\ge |a_i-b_i| + |b_i-c_i| \ge |a_i-c_i| for all i\ge\max\{M,N\}. Therefore, (a_i)_{i\in\mathbb{N}}\asymp(c_i)_{i\in\mathbb{N}}.

Let \mathbb{Q}^{\rightarrow} be the set of Cauchy sequences, then we define \mathbb{R}:=\mathbb{Q}^{\rightarrow}/\asymp. We can embed the rational numbers into the real numbers by x\mapsto [(x)_{i\in\mathbb{N}}]_\asymp. Furthermore, we define addition componentwise by

(a_i)_{i\in\mathbb{N}}+(b_i)_{i\in\mathbb{N}} = (a_i+b_i)_{i\in\mathbb{N}}

same for multiplication. The simple arithmetic laws then also follow componentwise. For \le and \ge, we say that they hold when they hold for infinitely many corresponding elements of the compared sequences.

Of course, we still have to show a lot of things, like showing that this is really a field with the desired axioms, and showing that addition and multiplication defined in this way are compatible with the equivalence relation. However, this is just technical, and we omit it, since we think that for people not familiar with this topic, it is already sufficiently complicated.

Prospects

For everyone who does not yet feel challenged enough, I have a few prospects. The first one is from set theory over the real line:

Hypothesis (Suslin): Let S be a set and \le satisfy the following conditions:

  • \le is a linear ordering: a\le a, a\le b\ \&\ b\le a \Rightarrow a=b, a\le b\ \&\ b\le c\Rightarrow a\le c.
  • \le is dense: If a\le b and a\neq b, then there is a c such that a\le c\le b.
  • \le is complete: If for some X\in S there exists an a\in S such that all x\in X are \le a ("X is bounded"), then there exists a smallest b\in S with this property ("b is the supremum of X in S).
  • Define intervals in the canonical way by [a;b]:=\{x\in S|a\le x\le b\}. Then for every set I of pairwise disjoint intervals, there exists a surjective function \mathbb{N}\rightarrow I ("every set of pairwise disjoint intervals is countable").

Then (S,\le) is isomorphic to (\mathbb{R},\le).

Theorem: Suslin's Hypothesis is undecidable over standard set theory (ZFC).

Of course, we will not prove this, as it involves a lot of model theory. Let us move on with our second prospect, the p-adic numbers. If we look at the proof that \asymp is an equivalence relation, we notice that what we really needed to prove this were a few inequalities on the absolute value, namely:

  • |a-b|=|b-a|
  • |a-a|=0
  • |a - b|+|b-c| \ge |a-c|

The third one is called the triangle inequality. It is crucial for the transitivity proof. On the other hand, we might think of some other function which satisfies the same axioms. And in fact, there is a mathematical concept which generalizes this:

Definition: A function \delta : X^2\rightarrow\mathbb{Q}^{\ge 0} is called a metric (or distance function) on X if it satisfies

  • \delta(x,x)=0
  • \delta(x,y)=\delta(y,x)
  • \delta(x,y)+\delta(y,z)\ge\delta(x,z)

Note that this definition is usually made with real numbers instead of rational numbers.

Now, as above, we can define \asymp_\delta for a given metric \delta, and it will also be an equivalence relation.

Now, let p be some prime number. For every rational number r\neq 0, we can find an unique integer n, such that r=p^n\cdot\frac{a}{b}, where gcd(a,b)=1, and neither a nor b are divisible by p. Define |r|_p:=p^{-n} for every such r,p,n, and |0|_p:=0. Define \delta_p(x,y):=|x-y|_p.

Lemma: |a|_p + |b|_p\ge |a+b|_p

Proof: Let a=p^n\frac{a'}{a''} and b=p^m\frac{b'}{b''} with fully cancelled fractions not containing p as a factor, and let w.l.o.g. m\le n. Then a+b=p^n\frac{a'}{a''}+p^m\frac{b'}{b''} = p^m (\frac{a'\cdot p^{n-m}\cdot b'' + b'\cdot a''}{a''\cdot b''}). Now, a''b'' is not divisible by p, so the fraction can only contain positive powers of p. Therefore, |a+b|_p = p^{-m-m'} for some m'\ge 0. We now want p^{-m}+p^{-n} \ge p^{-m-m'}, that is, p^{m'}+p^{m-n+m'}\ge 1, which is trivial, since m'\ge0.

Theorem: \delta_p is a metric.

Proof: Trivially, \delta_p(a,a)=0 and \delta_p(a,b)=\delta_p(b,a). It remains to show the triangle inequality, that is, \delta_p(a,b)+\delta_p(b,c)\ge\delta_p(a,c). By the above lemma, we have \delta_p(a,c)=|a-c|_p=|(a-b)+(b-c)|_p\le |a-b|_p+|b-c|_p = \delta_p(a,b)+\delta_p(b,c).

The set we get when identifying sequences according to this metric is called \mathbb{Q}_p, the p-adic numbers. It has many similarities with \mathbb{R}, and you can even do analysis on it.