Monday 4 July 2016

Replacing ordinal recursion with primitive recursion

Ordinal numbers are used to order things, no surprise there. What may be surprising to some is that logicians consider ordinals that go beyond the everyday 1st, 2nd,... The ordinal \( \omega \) denotes the position right after all "normal" positions 1st, 2nd, 3rd,... And then we can continue with \( \omega + 1, \omega + 2, \ldots \) For instance, suppose we order the natural numbers in such a way that we first enumerate all odd numbers 1, 3, 5,... and only after all of these the even numbers 2, 4, 6,... Then the number 2 is in position \( \omega \), the number 4 is in position \( \omega + 1 \), and so on.

Now, what is called \( \alpha \)-recursion, for an ordinal number \( \alpha \), is the ability to define a function by course-of-values recursion along the ordinal \( \alpha \). So \( \omega \)-recursion is the good old primitive recursion. Formally, given a primitive recursive ordering \( \prec \) of type \( \alpha \) -- meaning that you need the ordinal \( \alpha \) in order to count 1st, 2nd, 3rd,... through it -- an \( \alpha \)-recursive function \( f \colon \mathbb{N} \rightarrow \mathbb{N} \) is one defined as $$ f(n) = F(n)(\lambda i \prec n . f (i)) $$ So, in order to define \( f(n) \) we are allowed to use any "previously calculated" value of \( f \), where "previous" is formalised by the ordering \( \prec \).
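To make the scheme concrete, here is a minimal Haskell sketch of \( \alpha \)-recursion (my illustration, not something from the papers discussed below); the step functional is handed the restriction of \( f \) to \( \prec \)-smaller arguments, and termination relies on \( \prec \) being well-founded:

    -- A sketch of the alpha-recursion scheme f(n) = F(n)(\i -> f i), assuming
    -- a (hypothetical) primitive recursive well-founded ordering `prec`.
    type Rel = Integer -> Integer -> Bool

    alphaRec :: Rel                                           -- the ordering
             -> (Integer -> (Integer -> Integer) -> Integer)  -- the functional F
             -> Integer -> Integer
    alphaRec prec bigF = f
      where
        -- f may only consult itself at prec-smaller arguments; we return a
        -- dummy 0 for out-of-range queries, mirroring "lambda i prec n . f(i)".
        f n = bigF n (\i -> if i `prec` n then f i else 0)

    -- With prec = (<) this is ordinary course-of-values recursion, e.g.:
    fib :: Integer -> Integer
    fib = alphaRec (<) (\n f -> if n < 2 then n else f (n - 1) + f (n - 2))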

Now, there is an interesting relation between recursion over an ordering of type \( \alpha < \varepsilon_0 \) and the primitive recursive functionals of finite type. Here, \( \varepsilon_0 \) is the first \( \varepsilon \) ordinal, and the primitive recursive functionals of finite type are Gödel's extension of Kleene's primitive recursion, where the input and return types of a function are allowed to be function types themselves (quite common nowadays in functional programming languages such as Haskell, Python and Scala). It is well known that a function is \(\alpha\)-recursive, for some \( \alpha < \varepsilon_0 \), if and only if it is definable as a primitive recursive functional.
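As a small illustration of higher-type primitive recursion (my example, not part of the equivalence proof), here is Gödel's recursor written in Haskell; the point is that the result type may itself be a function type:

    -- Gödel's primitive recursor R: ordinary primitive recursion, except that
    -- the result type `a` may be any finite type, including function types.
    rec :: a -> (Integer -> a -> a) -> Integer -> a
    rec base _    0 = base
    rec base step n = step (n - 1) (rec base step (n - 1))

    -- A genuinely higher-type use: a recursion whose result is itself a
    -- function, here n-fold iteration of g.
    iterN :: (Integer -> Integer) -> Integer -> (Integer -> Integer)
    iterN g = rec id (\_ h -> g . h)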

In this post we will look at part of one direction: how to replace \(\alpha\)-recursion by higher-type primitive recursion. In particular, we will look at the construction given in Tait's 1968 paper "Constructive Reasoning", where he shows how to replace a \(2^\alpha\)-recursion by an \( \alpha \)-recursion together with one application of higher-type primitive recursion. We will follow his proof quite closely.

So let \( \prec \) be a primitive recursive ordering of type \( \alpha \). Assume also that we have a standard primitive recursive one-to-one correspondence between natural numbers and finite sequences of natural numbers $$ a \quad \leftrightarrow \langle a_0, \ldots, a_{|a| - 1} \rangle $$ such that \( a_0 > a_1 > \ldots > a_{|a| - 1} \).

Define \( a \subset b \) if and only if either
  • \( |a| < |b| \) and \( a_i = b_i \) for all \( i < |a| \), or
  • there is a \( k < \min\{|a|, |b|\} \) with \( a_i = b_i \) for all \( i < k \) and \( a_k \prec b_k \).
I.e. \( a \subset b \) is nothing more than the lexicographic ordering on words induced by the pointwise ordering \( \prec \) on letters.
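In the list representation this ordering might look as follows (a sketch of mine, with the base ordering passed in as a parameter):

    -- a `below` b in the ordering "subset": a is a proper prefix of b, or at
    -- the first position where they differ a's letter is prec-smaller than
    -- b's. `prec` is a placeholder for the given primitive recursive ordering.
    subLex :: (Integer -> Integer -> Bool) -> [Integer] -> [Integer] -> Bool
    subLex prec = go
      where
        go []     (_:_) = True          -- proper prefix
        go (x:xs) (y:ys)
          | x == y      = go xs ys      -- agree so far, keep comparing
          | otherwise   = x `prec` y    -- first difference decides
        go _      _     = False         -- equal, or b exhausted first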

Then, note that if each \( a_i \) is in position \( \alpha_i \) in the ordering \( \prec \), then \( a \) is in position \( 2^{\alpha_0} + 2^{\alpha_1} + \ldots + 2^{\alpha_{|a| - 1}} \) in the lexicographic ordering \( \subset \). Hence the ordering \( \subset \) has type \( 2^\alpha \).

Assume now \( \phi \colon \mathbb{N} \rightarrow (\mathbb{N} \rightarrow \mathbb{N}) \) is defined by \( 2^\alpha \)-recursion $$ \phi(n) = \lambda i \subset n . F(i, \phi i) $$ We will give a definition of \( \phi \) using only \( \alpha \)-recursion plus some higher-type primitive recursion.

Given \( a = \langle a_0, \ldots, a_{|a| - 1} \rangle \), let us define \( a^k = \langle a_0, \ldots, a_{k - 1} \rangle \), i.e. the sequence \( a \) truncated to its first \( k \) elements.

The idea of the construction is quite simple. We just need to notice that, for \( k < |n| \), we have \( i \subset n^{k+1} \) if and only if either
  • \( i \subseteq n^k \), or
  • \( |i| > k \wedge i^k \equiv n^k \wedge i_k \prec n_k \).
In other words, the sequence \( i \) comes strictly before the first \( k + 1 \) entries of \( n \) if and only if either \( i \) is already smaller than or equal to the first \( k \) entries of \( n \), or \( i \) agrees with \( n \) up to position \( k - 1 \) but is strictly smaller at position \( k \).

So we are essentially done. In the first case we compute by primitive recursion, whereas in the second case we compute by \( \alpha \)-recursion!

Formally, define a more general functional \( \Psi(k)(v)(n)(i) \) by primitive recursion on \( k \) (at a higher type) $$ \Psi(k) = \begin{cases} \lambda v \lambda n \lambda i . 0 & k = 0 \\ \chi(k, \Psi(k-1)) & k > 0 \end{cases} $$ where \( \chi(k, f)(v) \) is defined by \( \alpha \)-recursion on \( v \) as $$ \chi(k, f)(v) = \lambda n \lambda i . \begin{cases} f(v)(n)(i) & i \subseteq n^k \\ F(i, \chi(k, f)(i_k)(i)) & i^k \equiv n^k \wedge i_k \prec v \end{cases} $$ with \( \chi(k, f)(v) = \lambda n \lambda i . 0 \) if \( i \not\subset n^{k+1} \) or \( k \geq |n| \). By induction on \( k \) and \( \alpha \)-induction on \( v \) we can see that if \( k \leq |n| \) and \( k > 0 \rightarrow v = n_{k-1} \) then $$ \Psi(k)(v)(n) = \lambda i \subset n^k . F(i, \phi i). $$ Hence, we can define $$ \phi(n) = \begin{cases} \Psi(|n|)(n_{|n| - 1})(n) & n > 0 \\ \lambda i . 0 & n = 0. \end{cases} $$
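For the functionally inclined, here is my own (unofficial) Haskell transcription of the construction, working directly on the list representation of sequences; prec and bigF are placeholder assumptions, and termination is justified by the well-foundedness arguments above, not by anything Haskell itself checks:

    type Seq = [Integer]  -- codes of strictly decreasing finite sequences

    -- Placeholders (my assumptions, not fixed by the post): the base ordering
    -- prec of type alpha, and the step functional F of the 2^alpha-recursion.
    prec :: Integer -> Integer -> Bool
    prec = (<)

    bigF :: Seq -> (Seq -> Integer) -> Integer
    bigF _ _ = 0

    -- i `subLeq` n: i equals n, is a proper prefix of n, or is
    -- lexicographically below n ("i subset-or-equal n" in the notation above).
    subLeq :: Seq -> Seq -> Bool
    subLeq i n = i == n || go i n
      where
        go [] (_:_)      = True
        go (x:xs) (y:ys) = if x == y then go xs ys else x `prec` y
        go _ _           = False

    -- chi k f v n i, mirroring the case distinction in the definition of chi:
    -- the first case defers to f (which will be Psi(k-1)), the second is the
    -- alpha-recursive call, descending along prec in the argument v.
    chi :: Int -> (Integer -> Seq -> Seq -> Integer)
        -> Integer -> Seq -> Seq -> Integer
    chi k f v n i
      | i `subLeq` take k n                  = f v n i
      | length i > k && take k i == take k n
          && (i !! k) `prec` v               = bigF i (chi k f (i !! k) i)
      | otherwise                            = 0

    -- Psi, by primitive recursion on k (at higher type: values are functionals).
    psi :: Int -> Integer -> Seq -> Seq -> Integer
    psi 0 _ _ _ = 0
    psi k v n i = chi k (psi (k - 1)) v n i

    -- phi(n) = Psi(|n|)(n_{|n|-1})(n), and the constant 0 function at n = <>.
    phi :: Seq -> Seq -> Integer
    phi [] _ = 0
    phi n  i = psi (length n) (last n) n i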

Friday 16 August 2013

Kripke models and the dialectica interpretation


Few people know that Girard's PhD dissertation contained some results about Gödel's dialectica interpretation. As a fan of both Girard and the dialectica interpretation, I feel I have to explain this little-known bit of Girard's work here.

The result I mention is in part IV of his dissertation, and was published in [1]. It is an interesting connection between validity in a certain Kripke structure and validity with respect to the dialectica interpretation. Not only is the result interesting, the proof is also quite clever, involving an unexpected use of the fan theorem in the form "any open covering of a compact set has a finite sub-covering".

More precisely, call a functional system any quantifier-free extension \( F \) of system T which is consistent and complete (i.e. for any closed term \( t \) of ground type, \( F \) proves exactly one of \( t = 0 \) and \( t \neq 0 \)). Consider the Kripke structure whose worlds are the functional systems, with the pre-order on worlds being the relation "subsystem of". Girard's main result is that a formula \( A \) is valid at a node \( F \) if and only if there exists a closed term \( t \in F \) such that \( F \vdash A_D(t, y) \), where \( \exists x \forall y A_D(x, y) \) is the dialectica interpretation of \( A \). The most difficult part of the proof is to show that if for every extension \( F' \) of \( F \) and every closed term \( a \in F' \) there exists a closed term \( b \in F' \) such that \( F' \vdash A(a, b) \), then already in \( F \) there must be a function \( c \) such that \( F \vdash A(x, c x) \). We sketch the proof of this here.
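As a reminder of how the dialectica interpretation works, here is my standard toy example (not one from Girard's paper): the interpretation of the sentence \( \forall n \exists m \, (m > n) \) is $$ \exists f \forall n \underbrace{(f(n) > n)}_{A_D(f,\, n)} $$ and a witnessing closed term of system T is simply \( t = \lambda n . n + 1 \).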

Theorem. Suppose that for all functional systems \( F' \supset F \) and all closed terms \( a \in F' \) of type \( \rho \) there exists a closed term \( b \in F' \) of type \( \tau \) such that \( F' \vdash A(a, b) \). Then, there exists a closed term \( c \in F \) of type \( \rho \to \tau \) such that \( F \vdash A(x, c x) \).

Note that without the assumption of completeness of functional systems the result would be easy. Just take \( F' \) to be \( F \) with a single new constant \( a \). Then, by the assumption there would be a closed term \( b \) in the language of \( F \) plus the new constant. So, it would have to be \( t[a] \) for some \( t[x] \in F \). We could then take \( c = \lambda x . t[x] \). The problem is that we can only consider extensions \( F' \) of \( F \) which are functional systems, i.e. consistent and complete.

The proof goes as follows:

I. Consider an enumeration \( u_n[x^\rho] \) of all the formulas of \( F \) with a single free-variable of type \( \rho \). Let \( a \) be a new constant of type \( \rho \).

II. For each partial function \( p \colon \mathbb{N} \to 2 \) define \( F_p \) as \( F \) plus the new constant \( a \) and axioms $$ u_n[a] = \top $$ when \( p(n) \) is defined and \( p(n) = \top \) and $$ u_n[a] = \bot $$ when \( p(n) \) is defined and \( p(n) = \bot \). Note that \( F_p \) might not be a functional system.

III. Let us argue that for all total \( f \) there exists a closed term \( c_f \in F \) such that \( F_f \vdash A(a, c_f a) \). Because \( f \) is total, \( F_f \) is complete. But it is either consistent or inconsistent. If it is inconsistent then we can take \( c_f \) to be anything we wish. On the other hand, if \( F_f \) is consistent then it is a functional system which extends \( F \). By the assumption of the theorem we have a closed term \( b \in F_f \) such that \( F_f \vdash A(a, b) \). But \( b \) must be of the form \( t[a] \) for some \( t[x] \in F \). Hence, take \( c_f = \lambda x . t[x] \).

IV. Let \( U_A = \{ p \colon [n] \to 2 \; | \; \exists c_p \in F (F_p \vdash A(a, c_p a)) \} \) and \( O_p = \{ f \; | \; f \colon \mathbb{N} \to 2 \wedge p \prec f \} \). That is, \( U_A \) contains all the finite partial functions \( p \) for which III above already holds, and \( O_p \) is the basic open set of Cantor space \( 2^{\mathbb{N}} \) determined by the partial function \( p \) (here \( p \prec f \) means that the total function \( f \) extends \( p \)).

V. We have that \( 2^\mathbb{N} = \bigcup_{p \in U_A} O_p \). This follows because for any total \( f \in 2^\mathbb{N}\) the argument in III only uses finitely many axioms of \( F_f \) to prove \( A(a, c_f a) \). So, for this finite approximation \( p \) of \( f \) the result already holds, hence \( p \in U_A \) and \( f \in O_p \).

VI. Since \( 2^\mathbb{N} \) is compact, there is a finite subcover, i.e. \( 2^\mathbb{N} = O_{p_0} \cup \ldots \cup O_{p_n} \).

VII. It is easy to see that the extra axioms of \( F_{p_i} \) can be equivalently written as a single formula \( P_i[a] \) such that \( F_{p_i} = F_{\emptyset} + P_i[a] \), where \( \emptyset \) denotes the everywhere undefined function.

VIII. Because we have a covering of \( 2^\mathbb{N} \), we obtain $$ F_{\emptyset} \vdash P_0[a] \vee \ldots \vee P_n[a]. $$ (Intuitively, any completion of \( F_{\emptyset} \) decides each \( u_n[a] \) and hence determines a total \( f \in 2^\mathbb{N} \), which lies in some \( O_{p_i} \); so \( P_i[a] \) holds there.)

IX. Moreover, $$ F_{\emptyset} \vdash P_i[a] \to A(a, c_{p_i}(a)). $$

X. Finally, this means we have \( F \vdash P_0[x] \vee \ldots \vee P_n[x] \) and \( F \vdash P_i[x] \to A(x, c_{p_i}(x)) \). The desired function \( c \) can be defined by cases as \( c x = c_{p_i} x \) for the least \( i \leq n \) such that \( P_i[x] \) holds.

[1] Jean-Yves Girard, Functional interpretations and Kripke models. Logic and Foundations of Mathematics and Computability Theory. The University of Western Ontario Series in Philosophy of Science, volume 9, 33-57, 1977.


Monday 4 March 2013

Goodman's theorem

Let HA denote Heyting (intuitionistic) arithmetic and HA\(^\omega\) Heyting arithmetic extended to the language of all finite types, i.e. with quantifiers for each finite type. AC stands for the axiom of choice. Finally, recall that a formula is arithmetical if it only contains quantifications over numbers.
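For definiteness, the schema I have in mind is the axiom of choice for all finite types \( \rho, \tau \): $$ \mathrm{AC} \; \colon \; \forall x^\rho \exists y^\tau A(x, y) \rightarrow \exists f^{\rho \to \tau} \forall x^\rho A(x, f x). $$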

In [1,2] Goodman proved the following amazing result:

Theorem. If HA\(^\omega\) + AC proves an arithmetical formula \( A \) then HA already proves \( A \).

In proof theory one would simply say that HA\(^\omega\) + AC is conservative over HA. The fact that HA\(^\omega\) is conservative over HA was already known and is easy to show. But the conservation of HA\(^\omega\) + AC over HA is quite a surprising result.

Recall that adding the axiom of choice to classical mathematics leads to all sorts of strange things (e.g. the Banach-Tarski paradox). Goodman's theorem essentially says that this is not the fault of the axiom of choice, but rather a fault of the combination of the axiom of choice with classical logic. Classical logic on its own makes perfect sense. The axiom of choice in an intuitionistic setting is harmless. But when these two are put together all hell breaks loose.

Proof of Goodman's theorem. Goodman's proof combines two major proof-theoretic techniques: forcing and realizability. Realizability is used to eliminate the axiom of choice, since the axiom of choice has a trivial realizer. But then one ends up with a proof of "\(t\) realizes \(A\)" rather than a proof of \( A \) itself. Forcing is used to recover the truth of \( A \) given that \( A \) has a realizer. That is done by choosing the forcing conditions to be approximations of the Skolem functions of the sub-formulas of \( A \). So, the axiom of choice is replaced by finite approximations of Skolem functions! Let's see a few of the details:
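Why does AC have a trivial realizer? Roughly (my sketch, phrased in modified-realizability style, whereas Goodman works with Kleene-style realizability): a realizer of the premise \( \forall x \exists y A(x,y) \) is a function \( x \mapsto \langle f x, r x \rangle \) producing for each \( x \) a witness \( f x \) together with a realizer \( r x \) of \( A(x, f x) \); the pair \( \langle f, r \rangle \) is then exactly a realizer of the conclusion \( \exists g \forall x A(x, g x) \). So the realizer of AC merely reshuffles its input: $$ \mathrm{ac} = \lambda w . \langle \lambda x . (w x)_0, \; \lambda x . (w x)_1 \rangle. $$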

We present here a sketch of Beeson's proof [3] of Goodman's theorem. Beeson's proof has the advantage (over Goodman's proof) that the two techniques of forcing and realizability are clearly separated. The proof consists of the following steps:

(1) Let \( A \) be an arithmetical formula such that HA\(^\omega\) + AC \( \vdash A \).

(2) Let HA\(^\omega_a \) denote the extension of HA\(^\omega\) with a new function symbol \( a \). By the soundness of Kleene realizability relative to \( a \) we have HA\(^\omega_a \vdash t \) realizes \( A \), where \( t \) is a term of HA\(^\omega_a \).

(3) By the soundness of the forcing interpretation we have HA\(^\omega \vdash \exists p (p \Vdash t \) realizes \( A) \), where the forcing conditions are chosen as in the main lemma (below). Forcing is done as usual, with the crucial difference that in the forcing of the atomic formulas the forcing condition \( p \) replaces the "generic" function \( a \), i.e. \( p \Vdash a(n) = m \) is defined as \( p(n) = m \), where I'm actually using the equality symbol "=" for partial equality.

(4) By the main lemma HA\(^\omega \vdash \forall p \exists q \leq p (q \Vdash (t \) realizes \( A) \Rightarrow A) \).

(5) By (3) and (4) we have HA\(^\omega \vdash \exists q (q \Vdash A) \).

(6) Since for arithmetical formulas we have HA\(^\omega \vdash (q \Vdash A) \Leftrightarrow A \), we conclude HA\(^\omega \vdash A \).

Steps (2) and (3) are pretty standard realizability and forcing interpretations. And only the soundness theorems of these are used. Step (6) is also easy to check by a simple induction on \( A \). So, the crucial bit of the proof is the choice of the forcing conditions and the main lemma which we discuss next.

Main lemma. Fix an arithmetical sentence \( A \). Then there is a set \( C \) of forcing conditions such that $$ \mbox{HA}^\omega \vdash \forall p \exists q \leq p (q \Vdash (t  \mbox{ realizes } A) \Rightarrow A). $$
Proof of main lemma. The set of forcing conditions \( C \) consists of finite functions \( p \) which are approximations to the Skolem functions of all sub-formulas of \( A \). More precisely, these are partial functions \( p \) with finite domain such that for each sub-formula \( B(x, y) \) of \( A \) we have $$ \exists x B(x, y) \wedge (p_{\exists x B(x, y)}(y) \mbox{ defined }) \Rightarrow B(p_{\exists x B(x, y)}(y), y).$$ Hence, whenever the approximation \( p_{\exists x B(x, y)} \) to the Skolem function is defined, it produces a correct witness. Relative to these (approximations of) Skolem functions, it is easy to show the following:

(i) HA\(^\omega \vdash \forall p \exists q \leq p (q \Vdash (B(y) \Rightarrow \{j_B\}(y) \mbox{ realizes } B )) \), for some index \( j_B \) which we can construct (using the Skolem functions).

(ii) HA\(^\omega \vdash \forall p \exists q \leq p (q \Vdash (t \mbox{ realizes } B ) \Rightarrow B) \).

Remark. Ulrich Kohlenbach [4] has shown the interesting fact that Goodman's theorem does not hold for fragments of HA\(^\omega\). This means that in order to eliminate AC from the proof of an arithmetical formula \( A \) we might have to use more complex induction than is used in the proof with AC. Thierry Coquand [5] has just published another proof of Goodman's theorem.

[1] Goodman, N., The theory of the Gödel functionals, Journal of Symbolic Logic 41, 574-583 (1976)

[2] Goodman, N., Relativized realizability in intuitionistic arithmetic of all finite types. Journal of Symbolic Logic 43, 23-44 (1978)

[3] Beeson, M., Goodman's theorem and beyond. Pacific J. Math. 84, 1-16 (1979)

[4] Kohlenbach, U., A note on Goodman's theorem. Studia Logica 63, 1-5 (1999)

[5] Coquand, T., About Goodman's theorem. Annals of Pure and Applied Logic 164(4), 437-442 (2013)

Wednesday 30 January 2013

Independence of the continuum hypothesis

Following up from previous posts, let us continue on the theme of independence proofs, i.e. showing that a certain statement is not provable in a given formal system. The most celebrated of these results is of course Cohen's proof that the continuum hypothesis (CH) is not provable in set theory. For that Cohen developed the new method of "forcing". A few years after Cohen's result, Dana Scott came up with a brilliant proof of the independence of CH that avoids forcing, but uses what are now called Boolean-valued models. I want to give a brief sketch of Scott's proof, but one should definitely read the extremely well-written paper [1] for all the details.

The first nice thing about Scott's paper is that it starts by giving a completely natural formulation of CH. CH is often stated using cardinals as the equation $$2^{\aleph_0} = \aleph_1$$ so it's easy to dismiss it as abstract nonsense. But it can also be rephrased in concrete terms as: given any set of reals, either the set of integers can be mapped onto the set, or the set can be mapped onto the whole set of reals. So, the question is: is there a set of reals whose cardinality lies strictly between the cardinality of the natural numbers and that of the whole set of reals?

Formally this can be written as $$ \forall X ( \exists f^{\mathbb{N} \rightarrow X} (\mbox{Img}(f) = X) \vee \exists g^{X \to \mathbb{R}} (\mbox{Img}(g) = \mathbb{R}) ).  $$ I'm writing \( x^X \) as an abbreviation for \( x \in X \).

Scott then builds a model of set theory (in a presentation slightly different from the usual Zermelo-Fraenkel set theory) in which CH is false.

The first novelty is that the construction of the model is parametrised by a probability space \( (\Omega, \cal{A}, P) \). The real numbers are interpreted as random variables: in the constructed model the set of real numbers \( \cal{R} \) consists of all \( \eta \colon \Omega \to \mathbb{R} \) such that $$ \{\omega \in \Omega \; \colon \; \eta(\omega) \leq r \} \in \cal{A} $$ for every rational \( r \), where \( \mathbb{R} \) is the "real" set of real numbers. The whole point is to make the set of reals \( \cal{R} \) in the model much, much bigger than the actual set of real numbers \( \mathbb{R} \).

The second novelty is that formulas are no longer simply true or false: each formula gets a value in the Boolean algebra of events modulo sets of \( P \)-measure zero, saying how true it is, with the top element meaning "fully" true. For instance, equality between two "real numbers" \( \eta \colon \Omega \to \mathbb{R} \) and \( \zeta \colon \Omega \to \mathbb{R} \) is given as $$ val(\eta = \zeta) =  \{\omega \in \Omega \; \colon \; \eta(\omega) = \zeta(\omega) \} / [P = 0] $$ where one quotients out the events of \( P \)-measure zero. This valuation function can be extended to all formulas, and the main result is that if a formula \( A \) is provable in set theory then \( val(A) = 1 \) (the top element), independently of the probability space we started with.
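Roughly speaking (my gloss, following the usual Boolean-valued semantics), the valuation commutes with the connectives via the operations of this Boolean algebra: $$ val(A \wedge B) = val(A) \cap val(B), \qquad val(\neg A) = \Omega \setminus val(A), \qquad val(\exists x \, A(x)) = \bigvee_{\eta} val(A(\eta)), $$ everything computed modulo null sets and with \( \eta \) ranging over the model; the supremum exists because the measure algebra is complete.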

Hence, in order to show that CH is not provable, it's enough to find some probability space \( (\Omega, \cal{A}, P) \) such that in that space \(val(\)CH\() < 1\). Scott fixes \( I \) an arbitrary set of cardinality greater than \( 2^{\aleph_0} \), and then takes \( \Omega = [0,1]^I \). Because \( I \) is huge, the set of sample points \( \Omega \) will also be huge, which means the set of reals \( \cal{R} \) in the model has an immense cardinality.

So, now we need to build a set \( X \subseteq \cal{R}\) such that neither \( \exists f^{\mathbb{N} \rightarrow X} (\mbox{Img}(f) = X) \) nor \( \exists g^{X \to \cal{R}} (\mbox{Img}(g) = {\cal R}) \) holds in the model. One can check that the projections \( \xi_i \colon \Omega \to \mathbb{R} \) are indeed random variables, and hence are real numbers in the model (i.e. belong to \(\cal{R}\)). Pick a subset \( J \subset I \) which is uncountable but still of cardinality strictly smaller than that of \( I \). Our set of reals \( X \subset \cal{R}\) will be (informally) $$ X = \{ \xi_j \; \colon \; j \in J \} $$ which one can show is uncountable but has (inside the model) cardinality smaller than that of the reals \( \cal{R} \). Neat!

Anyway, I hope Scott will forgive me for such a sketchy account of his paper, but I hope this will serve as an advertisement for the curious ones to go and have a closer look at all the details in [1].

[1] Dana Scott, A proof of the independence of the continuum hypothesis, Mathematical systems theory, 1(2):89-111, 1967.

Wednesday 16 January 2013

Disjunction property implies existence property

A theory \( T \) is said to have the disjunction property if whenever a sentence \( A \vee B \) is provable then either \( A \) is provable or \( B \) is provable. It is important that \( A \vee B \) is a sentence, i.e. has no free variables; otherwise a simple counter-example would be \( A(x) = {\sf Even}(x) \) and \( B(x) = {\sf Odd}(x) \).

A theory \( T \) is said to have the numerical existence property if whenever a sentence \( \exists n A(n) \) is provable then \( A(n) \) is provable for some numeral \( n \).

Clearly if a theory has the numerical existence property it must also have the disjunction property, as, with a bit of arithmetic, one can write a disjunction \( A \vee B \) using an existential quantifier as \( \exists n ( ( n = 0 \rightarrow A ) \wedge ( n \neq 0 \rightarrow B ) ) \).

Harvey Friedman proved in [1] the amazing result that these two properties are in fact equivalent.

Theorem. Any recursively enumerable extension of intuitionistic arithmetic which has the disjunction property must also have the numerical existence property.

Let us write \( \vdash_y x \) for the binary predicate "\( y \) is the code of a proof of the formula coded by \( x \)". If \( k \) is the code of the formula \( A(x) \) then we write \( Sub(k) \) for the code of the formula \( A(k) \). The code of a formula \( A \) is sometimes written as \( \# A \). Finally, we write \( \neg x \) for \( \# \neg A \) if \( x = \# A \).

Let \( A(x) \) be the formula $$ \exists y ( ( (\vdash_y \neg x) \vee P(y) ) \wedge \forall z < y \not\vdash_z x) $$ which roughly says "there exists a number \( y \) such that either \( y \) is a proof of \( \neg x \) or \( P(y) \) holds, and moreover, no proof of \( x \) with code smaller than \( y \) exists".

Let \( k \) be a numeral such that \( \# A(Sub(k)) = Sub(k) \), e.g. \( k = \# A(Sub(x)) \). Note that \( Sub(k) \) behaves like a fixed point of \( A(x) \), essentially for the same reasons as in my first post here on Kleene's recursion theorem: we are just replacing the self-application of a function, \( x (x) \), by the "self-substitution" \( Sub(x) \).
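Spelling this out: \( k \) codes the formula \( A(Sub(x)) \), so substituting the numeral \( k \) for \( x \) gives $$ Sub(k) = \# \big( A(Sub(x))[x := k] \big) = \# A(Sub(k)), $$ i.e. \( Sub(k) \) is the code of a formula that asserts something about its own code.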

The main trick in Friedman's proof is the following:

Lemma 1. If \( T \vdash A(Sub(k)) \) then \( T \vdash P(n) \), for some \( n \).

Proof. Let us see how such a witness \( n \) arises naturally from a given proof of \( A(Sub(k)) \). Let \( n \) be the code of a proof of \( A(Sub(k)) \); since \( \# A(Sub(k)) = Sub(k) \), this means \( \vdash_n Sub(k) \). By the definition of \( A \) we also have that \( T \) proves $$ \exists y ( ( (\vdash_y \neg Sub(k)) \vee P(y) ) \wedge \forall z < y \not\vdash_z Sub(k)). $$ Hence, taking \( z = n \), it follows that \( T \) proves $$ \exists y \leq n ((\vdash_y \neg Sub(k)) \vee P(y) ). $$ Obviously we cannot have \( \exists y \leq n (\vdash_y \neg Sub(k)) \), or \( T \) would be inconsistent. So, \( T \) must prove \( \exists y \leq n P(y) \). Because \( n \) is a numeral, this can be written as a big disjunction, and by the disjunction property \( P(m) \) is provable for some \( m \leq n \). QED

So, if we are able to prove \( A(Sub(k)) \) then essentially the "size" of such proof would be a bound on some \( n \) which witnesses \( P \). It remains to show that if \( \exists y P(y) \) is provable then so is \( A(Sub(k)) \). One must first show that \( \neg A(Sub(k)) \) is not provable.

Lemma 2. If \( T \) proves \( \neg A(Sub(k)) \) then it also proves \( A(Sub(k)) \). 

Proof. Assume \( T \vdash \neg A(Sub(k)) \). Then some \( n \) codes a proof of \( \neg A(Sub(k)) \), i.e. \( \vdash_n \neg Sub(k) \). We consider two cases: either \( \forall z < n \not\vdash_z Sub(k) \), in which case \( A(Sub(k)) \) holds with witness \( y = n \) (contradicting the consistency of \( T \)), or \( \exists z < n \vdash_z Sub(k) \), which directly says that \( A(Sub(k)) \) is provable. QED

Note that we might as well assume that \( T \) is consistent, just because it is more sensible to work with consistent theories. Of course if \( T \) is inconsistent then it proves everything and in particular will have both the disjunction and the existence property. So, assuming \( T \) is consistent Lemma 2 says that \( \neg A(Sub(k)) \) is not provable.

The missing piece of the puzzle is the following:

Lemma 3. \( T \) proves \( \exists y P(y) \rightarrow A(Sub(k)) \vee \exists z (\vdash_z Sub(k)) \).

Proof. Let \( y \) be such that \( P(y) \). Consider two cases: Either \( \forall z < y (\not\vdash_z Sub(k)) \), in which case we have \( A(Sub(k)) \); or \( \exists z < y (\vdash_z Sub(k)) \). QED

Note that if \( P(y) \) is primitive recursive we can in fact conclude that \( T \) proves \( \exists y P(y) \rightarrow A(Sub(k)) \), as follows. Assume that \( y \) is the least such that \( (\vdash_y \neg Sub(k)) \vee P(y) \) holds. That means $$ ((\vdash_y \neg Sub(k)) \vee P(y)) \wedge \forall z < y ((\not\vdash_z \neg Sub(k)) \wedge \neg P(z)) $$ which implies \( A(Sub(k)) \).

This already gives us the result for primitive recursive \( P \)'s:

Lemma 4. If \( P \) is primitive recursive and \( T \vdash \exists y P(y) \) then \( T \vdash P(n) \) for some \( n \).

Proof. By the preceding remark we have \( T \vdash A(Sub(k)) \), which by Lemma 1 gives the desired witness \( n \). QED

Finally all can be put together:

Proof of Theorem. Assume \( T \) proves \( \exists y P(y) \). By Lemma 3, \( T \) proves \( A(Sub(k)) \vee \exists z (\vdash_z Sub(k)) \). By the disjunction property we have two cases: (1) \( T \) proves \( A(Sub(k)) \), which by Lemma 1 implies \( T \vdash P(n) \) for some \( n \); or (2) \( T \vdash \exists z (\vdash_z Sub(k)) \). Since \( \vdash_z Sub(k) \) is primitive recursive, by Lemma 4 we have a witness \( n \) such that \( \vdash_n Sub(k) \), which is in fact a code for a proof of \( A(Sub(k)) \), and the same argument as in case (1) can be followed. QED

A short reflection on the proof: The proof essentially shows that given a concrete proof of \( \exists y P(y) \) one can "calculate" a bound \( n \) on \( y \) simply using the algorithm for "calculating" the disjunction property. More precisely, let \( f_0, f_1 \) be functions such that $$ \vdash_\pi A \vee B \implies \vdash_{f_0 \pi} A \mbox{ or } \vdash_{f_1 \pi} B. $$ Note that Lemma 1 says $$ \vdash_n A(Sub(k)) \implies \vdash \exists y \leq n \, P(y). $$ Hence, in order to establish the relation between the size of the given proof \( \pi \) of \( \exists y P(y) \) and the final bound \( n \) we must simply track the impact of \( \pi \) on the size of the proof of \( A(Sub(k)) \). So from the proof we can read off a primitive recursive functional \( \phi \) such that if \( \pi \) is the code of a proof of \( \exists n P(n) \) then \( \exists n \leq \phi(f_0, f_1, \pi) P(n) \) is also provable.

[1] Harvey Friedman, "The disjunction property implies the numerical existence property", Proc. Nat. Acad. Sci. USA, 72(8):2877-2878, 1975.

Thursday 20 December 2012

Independence of B\(\Sigma_2\) from I\(\Sigma_1\)


It is one thing to prove a certain statement \( A \); it is a completely different thing to prove that \( A \) is not provable! How should one go about proving something like that? By the soundness theorem (if \( A \) is provable in a theory T then \( A \) is valid in all models of T), it suffices to produce a model of T in which \( A \) is not valid. And that is what people will normally do.

But can one answer such questions using proof-theoretic methods? In principle one can. Assume a proof of \( A \) exists. Apply a proof translation or proof interpretation to this imaginary proof. The soundness of the translation or interpretation guarantees the existence of some information or result which, if one can show it does not in fact exist, contradicts the assumption that a proof of \( A \) existed in the first place.

And that is precisely what Charles Parsons did in [1], in order to show that \( \Sigma_2 \)-Bounded Collection is not provable from \( \Sigma_1 \)-Induction. Parsons' proof uses Herbrand's theorem and the recursion theorem from the previous post. Unfortunately, the proof is not only quite complicated, but I think it contains a bug (don't quote me on that though).

As a fan of proof theory I would love to see a nice and simple proof-theoretic argument proving this interesting result, some form of realizability or dialectica, but for now I will present a model theoretic proof given by Paris and Kirby in [2].

For the record, \( \Sigma_2 \)-Bounded Collection is the principle $$ \mbox{B\( \Sigma_2 \)} \quad \colon \quad \forall x < y \exists z A(x, z) \rightarrow \exists t \forall x < y \exists z < t A(x, z) $$ where \( A(x,z) \) is a \( \Sigma_2 \) formula. Because the implication from right to left is trivial, one can view the principle as stating the equivalence $$ \forall x < y \exists z A(x, z) \leftrightarrow \exists t \forall x < y \exists z < t A(x, z), $$ i.e. unbounded existential quantifiers can in some sense be "prenexed" over bounded universal quantifiers. Moreover, since we are in a classical setting, the same can be said about unbounded universal quantifiers and bounded existential quantifiers: $$ \exists x < y \forall z A(x, z) \leftrightarrow \forall t \exists x < y \forall z < t A(x, z). $$

Theorem. There is a model of I\( \Sigma_1 \) which is not a model of B\( \Sigma_2 \).

The proof starts with any model \( M \) of Peano arithmetic containing non-standard \( \Sigma_2 \)-definable elements. Paris and Kirby then proceed as follows:

(1) Let \( K = \{ x \in M \; | \; \mbox{ \(x\) is \(\Sigma_2\)-definable in } M \} \)

(2) Notice that if \( A \) is \( \Sigma_2 \) then \( M \models A \) implies \( K \models A \).

(3) Clearly \( K \models \mbox{I}\Sigma_1 \), as \( K \) is a model of the least number principle for \( \Pi_1 \) formulas. (Side remark: Show that \( K \not\models \mbox{I}\Sigma_2 \) )

Let us show that \( K \not\models \mbox{B}\Sigma_2 \) by producing a concrete instance of \( {B}\Sigma_2 \) which does not hold in \( K \).

(4) Let \( \exists w A(e, w, x) \) be a universal \(\Sigma_2\)-formula, with \( A \in \Pi_1 \). Moreover, let \( t \in K \) be non-standard.

(5) For any \( a \in K \) we clearly have $$ M \models \exists e < t \exists u ( \underbrace{A(e, u_0, u_1) \wedge u_1 = a \wedge \forall z < u \neg A(e, z_0, z_1)}_{(*)}). $$ That is to say, in the model \( M \) we can find standard codes \( e \) (in particular \( e < t \)) representing \( \Sigma_2\)-formulas that uniquely define each \( a \in K \).

(6) Let \( B(e, u, a) = \exists x \forall y B_0(e, u, a, x, y) \) be a \( \Sigma_2 \) formula provably equivalent in Peano arithmetic to \( (*) \). Hence $$ M \models \exists e < t \exists u B(e, u, a).$$

(7) Note that \( \exists u B(e, u, a_0) \wedge \exists u B(e, u, a_1) \rightarrow a_0 = a_1 \).

(8) Hence, by (2), we also have $$ K \models \exists e < t \exists u B(e, u, a) $$ and in particular $$ K \models \forall a < t+1 \exists e < t \exists u \exists x \forall y B_0(e, u, a, x, y). $$ We show now that $$ K \not\models \exists r \exists s \forall a < t + 1 \exists e < t \exists u < r \exists x < s \forall y B_0(e, u, a, x, y), $$ thereby obtaining an instance of \( B\Sigma_2 \) which does not hold in \( K \).

(9) Assume, towards a contradiction, that $$ K \models \exists r \exists s \forall a < t + 1 \exists e < t \exists u < r \exists x < s \forall y B_0(e, u, a, x, y) $$ and consequently $$ K \models \exists r \forall p \exists s \forall a < t + 1 \exists e < t \exists u < r \exists x < s \forall y < p B_0(e, u, a, x, y). $$

(10) Hence, by (2) again, $$ M \models \forall a < t+1 \exists r \forall p \exists s \exists e < t \exists u < r \exists x < s \forall y < p B_0(e, u, a, x, y), $$ which is equivalent to $$ M \models \forall a < t+1 \exists e < t \exists u B(e, u, a), $$ which is a contradiction by (7) and the pigeon-hole principle in \( M \): there are \( t + 1 \) values of \( a \) but only \( t \) possible values of \( e \).

So that concludes the proof. I should add that the same argument works to show the independence of B\(\Sigma_{n+1}\) from I\(\Sigma_n\); one must simply adjust all the indices above accordingly (and that is what Paris and Kirby do in [2]).

Anyway, as I said, I would love to see a nice and short proof like the one above using proof-theoretic methods instead.

[1] Charles Parsons, On a number theoretic choice schema and its relation to induction, In: Intuitionism and Proof Theory (Proc. Conf., Buffalo, N.Y.,1968) (1970), pp. 459-473.

[2] J. B. Paris and L.A.S. Kirby, \( \Sigma_n \) collection schemas in arithmetic. Logic Colloquium, pp. 199-209, 1977.

Tuesday 18 December 2012

Kleene's second recursion theorem

You will need to know the basics of partial recursive functions for this one, but believe me, it's worth the effort. If \( e \) is a number, we use the standard notation \( \{e\} \) for the partial function whose code is \( e \). The theorem says the following:


Theorem A. For any total computable function \( F \) there exists an \( e \) such that \( \{ e \} \simeq \{ F(e) \} \).

Just think about it. \( F \) can be any total computable function! Take \( F(x) = x+1 \), for instance. That's certainly computable. The theorem then says that there exist two successive natural numbers \( e \) and \( e + 1 \) that code the same function, i.e. \( \{e\} \simeq \{ e+1 \} \). Isn't that just amazing?

The crucial idea of the proof is self-application, i.e. the fact that we can apply a function to its own code, as in \( \{e\}(e) \). Moreover, this self-application is itself computable: by the s-m-n theorem there is a total computable function \( h \) such that \( \{ h(e) \} \simeq \{ \{e\}(e) \} \).
The proof of the theorem goes as follows:

(1) Assume a total computable function \( F \) is given.
(2) For any \( x \) let \( h (x) \) be a code such that \( \{ h(x) \} \simeq \{ \{x\}(x) \} \). Such a total computable function \( h \) exists.
(3) Let \( k \) be such that \( \{k\}(x) \simeq F(h(x)) \). Such a \( k \) exists because both \( F \) and \( h \) are computable.
(4) We claim that \( e = h(k) \) does the job!
(5) By (2) and (4) we have \( \{e\}(x) \simeq \{h(k)\}(x) \simeq \{\{k\}(k)\}(x) \).
(6) But by (3) we also have \( \{\{k\}(k)\}(x) \simeq \{F(h(k))\}(x) \simeq \{F(e)\}(x) \), so we are done.

An absolute gem, don't you think!?

If you also know about the lambda calculus, you can have the following variant of the theorem:

Theorem B. For any lambda term \( t \) there exists a lambda term \( s \) such that \( t s = s \).

So, this essentially says that any lambda term has a fixed point!

The proof of this variant follows the structure of the original proof:

(1) Let a lambda term \( t \) be given.
(2) Let \( h = \lambda x . x x \).
(3) Let \( k = \lambda x . t(h(x)) \). 
(4) We claim that \( s = h(k) \) does the job.
(5) By (2) and (4) we have \( s = h(k) = k k \).
(6) But by (3) we also have \( k k = t(h(k)) = t s \), so we are done.

Note that \( k = \lambda x . t ( x x ) \). Hence \( s = k k = ( \lambda x . t ( x x ) ) ( \lambda x . t ( x x ) )    \) 

Therefore, the combination of \( h \) and \( k \), namely
\[ \phi(f) = ( \lambda x . f ( x x ) ) ( \lambda x . f ( x x ) ) \]
in fact gives a fixed point operator: a functional \( \phi \) that calculates a fixed point of any given function \( f \). This is precisely Curry's famous \( Y \) combinator.
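Haskell's type system rejects the self-application \( x x \), but the same fixed-point operator can be defined directly by recursion. Here is a small sketch of mine (this mirrors fix from the standard library module Data.Function):

    -- A fixed-point operator: fix f is a solution of s = f s.
    fix :: (a -> a) -> a
    fix f = f (fix f)

    -- Example: factorial obtained as the fixed point of a non-recursive
    -- functional, just as the theorem promises a fixed point for any term.
    factorial :: Integer -> Integer
    factorial = fix (\rec n -> if n == 0 then 1 else n * rec (n - 1))

    main :: IO ()
    main = print (factorial 5)  -- prints 120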