Wednesday 30 January 2013

Independence of the continuum hypothesis

Following up from previous posts, let us continue on the theme of independence proofs, i.e. showing that a certain statement is not provable in a given formal system. The most celebrated of these results is of course Cohen's proof that the continuum hypothesis (CH) is not provable in set theory. For that Cohen developed the new method of "forcing". A few years after Cohen's result, Dana Scott come up with a brilliant proof of the independence of the CH that avoids forcing, but uses what are now called boolean valued-models. I want to give a brief sketch of Scott's proof, but one should definitely read the extremely well-writen paper [1] for all the details.

The first nice thing about Scott's paper is that it starts by giving a completely natural formulation of CH. CH is often stated using cardinals as the equation $$2^{\aleph_0} = \aleph_1$$ so it's easy to dismiss it as abstract non-sense. But it can also be rephrase in concrete terms as: Given any set of reals, either the set of integers can be mapped onto the set, or the set can be mapped onto the whole set of reals. So, the question is: Is there a set of reals whose cardinality lies strictly between the cardinalities of natural numbers and the whole set of reals?

Formally this can be written as $$ \forall X ( \exists f^{\mathbb{N} \rightarrow X} (\mbox{Img}(f) = X) \vee \exists g^{X \to \mathbb{R}} (\mbox{Img}(g) = \mathbb{R}) ).  $$ I'm writing \( x^X \) as an abbreviation for \( x \in X \).

Scott then builds a model of set theory (a version slightly different than the usual presentation of the Zermelo-Frankel set theory) in which CH is false.

The first novelty is that the construction of the model is parametrised by a probability space \( (\Omega, \cal{A}, P) \). The real numbers are interpreted as random variables, i.e. in the constructed model the set of real number \( \cal{R} \) consists of all \( \eta \colon \Omega \to \mathbb{R} \) such that $$ \{\omega \in \Omega \; \colon \; \eta(\omega) \leq r \} \in \cal{A} $$ where \( \mathbb{R} \) is the "real" set of real numbers. The whole point is to make the set of reals in the model \( \cal{R} \) much much bigger than the actual set of real numbers \( \mathbb{R} \).

The second novelty is that formulas are no longer either true or false, but they have a real (in the sense of \( \mathbb{R} \)) value between zero and one (saying how true it is, with one meaning "very" true). So, equality between two "real numbers" \( \eta \colon \Omega \to \mathbb{R} \) and \( \zeta \colon \Omega \to \mathbb{R} \) is given as $$ val(\eta = \zeta) =  \{\omega \in \Omega \; \colon \; \eta(\omega) = \zeta(\omega) \} / [P = 0] $$ where one has to quotient out the set of events which have \(P\)-measure zero. This valuation function can be extended to all formulas, and the main result is that if a formula \( A \) is provable in set theory then \( val(A) = 1 \), independent of the probability space we started with.

Hence, in order to show that CH is not provable, it's enough to find some probability space \( (\Omega, \cal{A}, P) \) such that in that space \(val(\)CH\() < 1\). Scott fixes \( I \) an arbitrary set of cardinality greater than \( 2^{\aleph_0} \), and then takes \( \Omega = [0,1]^I \). Because \( I \) is huge, the set of sample points \( \Omega \) will also be huge, which means the set of reals \( \cal{R} \) in the model has an immense cardinality.

So, now we need to build a set \( X \subseteq \cal{R}\) such that neigher \( \exists f^{\mathbb{N} \rightarrow X} (\mbox{Img}(f) = X) \) nor \( \exists g^{X \to \cal{R}} (\mbox{Img}(g) = {\cal R}) \). One can check that the projection \( \xi_i \colon \Omega \to \mathbb{R} \) are indeed random variables, and hence are real numbers in the model (i.e. belong to \(\cal{R}\)). Pick a subset \( J \subset I \) which is uncountable but still of cardinality strictly smaller than \( I \). Our set of reals \( X \subset \cal{R}\) will be (informally) $$ X = \{ \xi_j \; \colon \; j \in J \} $$ which one can show that is not countable but has (inside the model) cardinality smaller than the reals \( \cal{R} \). Neat!

Anyway, I hope Scott will forgive me for such a sketchy account of his paper, but I hope this will serve as an advertisement for the curious ones to go and have a closer look at all the details in [1].

[1] Dana Scott, A proof of the independence of the continuum hypothesis, Mathematical systems theory, 1(2):89-111, 1967.

Wednesday 16 January 2013

Disjunction property implies existence property

A theory \( T \) is said to have the disjunction property if whenever a sentence \( A \vee B \) is provable then either \( A \) is provable or \( B \) is provable. It is important that \( A \vee B \) is a sentence, i.e. no free variables, otherwise a simple counter-example would be \( A(x) = {\sf Even}(x) \) and \( B(x) = {\sf Odd}(x) \).

A theory \( T \) is said to have the numerical existence property if whenever a sentence \( \exists n A(n) \) is provable then \( A(n) \) is provable for some numeral \( n \).

Clearly if a theory has the numerical existence property it must also have the disjunction property, as, with a bit of arithmetic, one can write a disjunction \( A \vee B \) using an existential quantifier as \( \exists n ( ( n = 0 \rightarrow A ) \wedge ( n \neq 0 \rightarrow B ) ) \).

Harvey Friedman proved in [1] the amazing result that these two properties are in fact equivalent.

Theorem. Any recursively enumerable extension of intuitionistic arithmetic which has the disjunction property must also have the numerical existence property.

Let us write \( \vdash_y x \) for the binary predicate "\( y \) is the code of a proof of the formula coded by \( x \)". If \( k \) is the code of the formula \( A(x) \) then we write \( Sub(k) \) for the code of the formula \( A(k) \). The code of a formula \( A \) is sometimes written as \( \# A \). Finally, we write \( \neg x \) for \( \# \neg A \) if \( x = \# A \).

Let \( A(x) \) be the formula $$ \exists y ( ( (\vdash_y \neg x) \vee P(y) ) \wedge \forall z < y \not\vdash_z x) $$ which roughly says that "there exists a number \( y \) such that either \( y \) is a proof of \( \neg x \) or \( P(y) \) holds, and moreover, no proof of \( x \) with code shorter than \( y \) exists.

Let \( k \) be a numeral such that \( \# A(Sub(k)) = Sub(k) \), e.g. \( k = \# A(Sub(x)) \). Note that \( Sub(k) \) behaves like a fixed point of \( A(x) \) essentially for the same reasons as in my first post here on Kleene's recursion theorem. We are just replacing the self application of a function \( x (x) \) by the "self-substitution" \( Sub(x) \).

The main trick in Friedman's proof is the following:

Lemma 1. If \( T \vdash A(Sub(k)) \) then \( T \vdash P(n) \), for some \( n \).

Proof. Let us see how such witness \( n \) comes naturally from a given proof of \( A(Sub(k)) \). Let \( n \) be the code of a proof of \( A(Sub(k)) \). Then \( \vdash_n Sub(k) \). By the definition of \( A \) we also have that \( T \) proves $$ \exists y ( ( (\vdash_y \neg Sub(k)) \vee P(y) ) \wedge \forall z < y \not\vdash_z Sub(k)). $$ Hence, taking \( z = n \) it follows that \( T \) proves $$ \exists y \leq n ((\vdash_y \neg Sub(k)) \vee P(y) ). $$ Obviously we can't have \( \exists y \leq n (\vdash_y \neg Sub(k)) \), or \( T \) would be inconsistent. So, we must have that \( T \) proves \( \exists y \leq n P(y) \). Because \( n \) is a numeral, this can be written as a big disjunction, and by the disjunction property \( P(m) \) holds for some \( m \leq n \). QED

So, if we are able to prove \( A(Sub(k)) \) then essentially the "size" of such proof would be a bound on some \( n \) which witnesses \( P \). It remains to show that if \( \exists y P(y) \) is provable then so is \( A(Sub(k)) \). One must first show that \( \neg A(Sub(k)) \) is not provable.

Lemma 2. If \( T \) proves \( \neg A(Sub(k)) \) then it also proves \( A(Sub(k)) \). 

Proof. Assume \( T \vdash \neg A(Sub(k)) \). Then we have that some \( n \) codes a proof of \( \neg A(Sub(k)) \), i.e. \( \vdash_n \neg A(Sub(k)) \). We consider two cases: Either \( \forall z < n \not\vdash_z Sub(k) \), in which case \( A(Sub(k)) \) holds (a contradiction to the consistence of \( T \) ), or \( \exists z < n \vdash_z Sub(k) \), which also implies that \( A(Sub(k)) \) is provable. QED

Note that we might as well assume that \( T \) is consistent, just because it is more sensible to work with consistent theories. Of course if \( T \) is inconsistent then it proves everything and in particular will have both the disjunction and the existence property. So, assuming \( T \) is consistent Lemma 2 says that \( \neg A(Sub(k)) \) is not provable.

The missing piece of the puzzle is the following:

Lemma 3. \( T \) proves \( \exists y P(y) \rightarrow A(Sub(k)) \vee \exists z (\vdash_z Sub(k)) \).

Proof. Let \( y \) be such that \( P(y) \). Consider two cases: Either \( \forall z < y (\not\vdash_z Sub(k)) \), in which case we have \( A(Sub(k)) \); or \( \exists z < y (\vdash_z Sub(k)) \). QED

Note that if \( P(y) \) is primitive recursive we can in fact conclude that \( T \) proves \( \exists y P(y) \rightarrow A(Sub(k)) \), as follows. Assume that \( y \) is the least such that \( (\vdash_y \neg Sub(k)) \vee P(y) \) holds. That means $$ ((\vdash_y \neg Sub(k)) \vee P(y)) \wedge \forall z < y ((\not\vdash_z \neg Sub(k)) \wedge \neg P(z)) $$ which implies \( A(Sub(k)) \).

This already gives us the result for primitive recursive \( P \)'s:

Lemma 4. If \( P \) is primitive recursive and \( T \vdash \exists y P(y) \) then \( T \vdash P(n) \) for some \( n \).

Proof. By the preceding remark we have \( T \vdash A(Sub(k)) \), which by Lemma 1 gives the desired witness \( n \). QED

Finally all can be put together:

Proof of Theorem. Assume \( T \) proves \( \exists y A(y) \). By Lemma 3, \( T \) proves \( A(Sub(k)) \vee \exists z (\vdash_z Sub(k)) \). By the disjunction property we have to cases: (1) \( T \) proves \( A(Sub(k)) \), which by Lemma 1 implies \( T \vdash P(n) \) for some \( n \); or (2) \( T \vdash \exists z (\vdash_z Sub(k)) \). By Lemma 4 we have a witness \( n \) such that \( \vdash_n Sub(k) \), which is in fact a code for a proof of \( A(Sub(k)) \), and the same argument as in case (1) can be followed.

A short reflection on the proof: The proof essentially shows that given a concrete proof of \( \exists y P(y) \) one can "calculate" a bound \( n \) on \( y \) simply using the algorithm for "calculating" the disjunction property. More precisely, let \( f_0, f_1 \) be functions such that $$ \vdash_\pi A \vee B \implies \vdash_{f_0 \pi} A \mbox{ or } \vdash_{f_1 \pi} B. $$ Note that Lemma 1 says $$ \vdash_n A(Sub(k)) \implies \vdash \exists y \leq n \, P(y). $$ Hence, in order to establish the relation between the size of the given proof \( \pi \) of \( \exists y A(y) \) and the final bound \( n \) we must simply track the impact of \( \pi \) on the size of the proof of \( A(Sub(k)) \). So from the proof we can read off a primitive recursive functional \( \phi \) such that if \( \pi \) is the code of a proof of \( \exists n P(n) \) then \( \exists n \leq \phi(f_0, f_1, \pi) P(n) \) is also provable.

[1] Harvey Friedman, "The disjunction property implies the numerical existence property", Proc. Nat. Acad. Sci. USA, 72(8):2877-2878, 1975.