Proof Theory Abridged: 2013

Friday, 16 August 2013

Kripke models and the dialectica interpretation

Few people know that Girard's PhD dissertation contained some results about Gödel's dialectica interpretation. As a fan of both Girard and the dialectica interpretation I feel I have to explain this rather unknown bit of Girard's work here.

The result I mention is in part IV of his dissertation, and was published in [1]. It is an interesting connection between validity in a certain Kripke structure and validity with respect to the dialectica interpretation. Not only is the result interesting, but the proof as well is quite clever, involving an unexpected use of the fan theorem in the form of "any open convering of a compact set has a finite sub-covering".

More precisely, let any quantifier-free extension $ F $ of system T which is consistent and complete (i.e. for any term $ t $ of ground type $ F $ proves exactly one of either $ t = 0 $ or $ t \neq 0 $) be called a functional system. Consider the Kripke structure where worlds are functional systems, with the pre-order on the worlds being the relation "subsystem of". Girard's main result is that a formula $ A $ is valid at a node $ F $ if and only if there exists a closed term $ t \in F $ such that $ F \vdash A_D(t, y) $, where $ \exists x \forall y A_D(x, y) $ is the dialectica interpretation of $ A $. The most difficult part of the proof is to show that if in any extension $ F' $ of $ F $ and any closed term $ a \in F' $ there exists a closed term $ b \in F' $ such that $ F' \vdash A(a, b) $, then it must be the case already that in $ F $ we have a function $ c $ such that $ F \vdash A(x, c x) $. We sketch the proof of this here.

Theorem. Suppose that for all functional systems $ F' \supset F $ and closed term $ a \in F' $ of type $ \rho $ there exists a closed term $ b \in F' $ of type $ \tau $ such that $ F' \vdash A(a, b) $. Then, there exists a closed term $ c \in F $ of type $ \rho \to \tau $ such that $ F \vdash A(x, c x) $.

Note that without the assumption of completeness of functional systems the result would be easy. Just take $ F' $ to be $ F $ with a single new constant $ a $. Then, by the assumption there would be a closed term $ b $ in the language of $ F $ plus the new constant. So, it would have to be $ t[a] $ for some $ t[x] \in F $. We could then take $ c = \lambda x . t[x] $. The problem is that we can only consider extensions $ F' $ of $ F $ which are functional systems, i.e. consistent and complete.

The proof goes as follows:

I. Consider an enumeration $ u_n[x^\rho] $ of all the formulas of $ F $ with a single free-variable of type $ \rho $. Let $ a $ be a new constant of type $ \rho $.

II. For each partial function $ p \colon \mathbb{N} \to 2 $ define $ F_p $ as $ F $ plus the new constant $ a $ and axioms $$ u_n[a] = \top $$ when $ p(n) $ is defined and $ p(n) = \top $ and $$ u_n[a] = \bot $$ when $ p(n) $ is defined and $ p(n) = \bot $. Note that $ F_p $ might not be a functional system.

III. Let us argue that for all total $ f $ there exists a closed term $ c_f \in F $ such that $ F_f \vdash A(a, c_f a) $. Because $ f $ is total, $ F_f $ is complete. But it is either consistent or inconsistent. If it is inconsistent then we can take $ c_f $ to be anything we wish. On the other hand, if $ F_f $ is consistent then it is a functional system which extends $ F $. By the assumption of the theorem we have a closed term $ b \in F_f $ such that $ F_f \vdash A(a, b) $. But $ b $ must be of the form $ t[a] $ for some $ t[x] \in F $. Hence, take $ c_f = \lambda x . t[x] $.

IV. Let $ U_A = \{ p \colon [n] \to 2 \; | \; \exists c_p \in F (F_p \vdash A(a, c_p a)) \} $ and $ O_p = \{ f \; | \; f \colon \mathbb{N} \to 2 \wedge p \prec f \} $. That is, $ U_A $ contains all the partial functions $ p $ for which III above already holds. And $ O_p $ is the open set of the Baire space defined by the partial function $ p $.

V. We have that $ 2^\mathbb{N} = \bigcup_{p \in U_A} O_p $. This follows because for any total $ f \in 2^\mathbb{N}$ we can see that in III we only use finitely many axioms of $ F_f $ to prove $ A(a, c_f a) $. So, for such finite approximation $ p $ the result also holds and $ f \in O_p $.

VI. Since $ 2^\mathbb{N} $ is compact, a finite cover exists, i.e. $ 2^\mathbb{N} = O_{p_0} \cup \ldots \cup O_{p_n} $.

VII. It is easy to see that the extra axioms of $ F_{p_i} $ can be equivalently written as a single formula $ P_i[a] $ such that $ F_{p_i} = F_{\emptyset} + P_i[a] $, where $ \emptyset $ denotes the everywhere undefined function.

VIII. Because we had a covering of $ 2^\mathbb{N} $ we have $$ F_{\emptyset} \vdash P_0[a] \vee \ldots \vee P_n[a]. $$

IX. Moreover, $$ F_{\emptyset} \vdash P_i[a] \to A(a, c_{p_i}(a)). $$

X. Finally, this means we have $ F \vdash P_0[x] \vee \ldots \vee P_n[x] $ and $ F \vdash P_i[x] \to A(x, c_{p_i}(x)) $. The desired function $ c $ can be defined by cases as $ c x = c_{p_i} x $ for the least $ i \leq n $ such that $ P_i[x] $ holds.

[1] Jean-Yves Girard, Functional interpretations and Kripke models. Logic and Foundations of Mathematics and Computability Theory. The University of Western Ontario Series in Philosophy of Science, volume 9, 33-57, 1977.

Monday, 4 March 2013

Goodman's theorem

Let HA denote Heyting (intuitionistic) arithmetic and HA$^\omega$ be Heyting arithmetic extended to the language of finite types, i.e. with quantifiers for each finite type. AC stands for the axiom of choice. Finally, recall that a formula is arithmetical if it only contains quantifications over numbers.

In [1,2] Goodman proved the following amazing result:

Theorem. If HA$^\omega$ + AC proves an arithmetical formula $ A $ then HA already proves $ A $.

In proof theory one would simply say that HA$^\omega$ + AC is conservative over HA. The fact that HA$^\omega$ is conservative over HA was already known and is easy to show. But the conservation of HA$^\omega$ + AC over HA is quite a surprising result.

Recall that adding the axiom of choice to classical mathematics leads to all sorts of strange things (e.g. the Banach-Tarski paradox). Goodman's theorem essentially says that this is not the fault of the axiom of choice, but rather a fault of the combination of the axiom of choice with classical logic. Classical logic on its own makes perfect sense. The axiom of choice in an intuitionistic setting is harmless. But when these two are put together all hell breaks loose.

Proof of Goodman's theorem. Goodman's proof involves two major proof-theoretic techniques: forcing and realizability. Realizability is used to eliminate the axiom of choice, since the axiom of choice has a trivial realizer. But then one ends up with a proof of "$t$ realizes $A$". Forcing is used to recover the truth of $ A $ given that $ A $ has a realizer. That is done by choosing the forcing conditions to be approximations of the Skolem functions of the sub-formulas of $ A $. So, the axiom of choice is replaced by finite approximations of Skolem functions! Let's see a few of the details:

We present here a sketch of Beeson's proof [3] of Goodman's theorem. Beeson's proof has the advantage (over Goodman's proof) that the two techniques of forcing and realizability are clearly separated. The proof consists of the following steps:

(1) Let $ A $ be an arithmetical formula such that HA$^\omega$ + AC $ \vdash A $.

(2) Let HA$^\omega_a $ denote the extension of HA$^\omega$ with a new function symbol $ a $. By the soundness of Kleene realizability relative to $ a $ we have HA$^\omega_a \vdash t $ realizes $ A $, where $ t $ is a term of HA$^\omega_a $.

(3) By the soundness of the forcing interpretation we have HA$^\omega \vdash \exists p (p \Vdash t $ realizes $ A) $, where the forcing conditions are chosen as in the main lemma (below). Forcing is done as usual, with the crucial difference that in the forcing of the atomic formulas the forcing condition $ p $ replaces the "generic" function $ a $, i.e. $ p \Vdash a(n) = m $ is defined as $ p(n) = m $, where I'm actually using the equality symbol "=" for partial equality.

(4) By the main lemma HA$^\omega \vdash \forall p \exists q \leq p (q \Vdash (t $ realizes $ A) \Rightarrow A) $.

(5) By (3) and (4) we have HA$^\omega \vdash \exists q (q \Vdash A) $.

(6) As for arithmetical formula HA$^\omega \vdash (q \Vdash A) \Leftrightarrow A $ we have, HA$^\omega \vdash A $.

Steps (2) and (3) are pretty standard realizability and forcing interpretations. And only the soundness theorems of these are used. Step (6) is also easy to check by a simple induction on $ A $. So, the crucial bit of the proof is the choice of the forcing conditions and the main lemma which we discuss next.

Main lemma. Fix an arithmetical sentence $ A $. Then there is a set $ C $ of forcing conditions such that $$ \mbox{HA}^\omega \vdash \forall p \exists q \leq p (q \Vdash (t \mbox{ realizes } A) \Rightarrow A). $$
Proof of main lemma. The set of forcing conditions $ C $ consists of finite functions $ p $ which are approximations to the Skolem functions of all sub-formulas of $ A $. More precisely, these are partial functions $ p $ with finite domain such that for each sub-formula $ B(x, y) $ of $ A $ we have $$ \exists x B(x, y) \wedge (p_{\exists x B(x, y)}(y) \mbox{ defined }) \Rightarrow B(p_{\exists x B(x, y)}(y), y).$$ Hence, whenever the approximation to the Skolem function $ p_{\exists x B(x, y)} $ is defined then it produces the right witness. Relative to these (approximation of) Skolem functions, it's easy to show the following:

(i) HA$^\omega \vdash \forall p \exists q \leq p (q \Vdash (B(y) \Rightarrow \{j_B\}(y) \mbox{ realizes } B )) $, for some index $ j_B $ which we can construct (using the Skolem functions).

(ii) HA$^\omega \vdash \forall p \exists q \leq p (q \Vdash (t \mbox{ realizes } B ) \Rightarrow B) $.

Remark. Ulrich Kohlenbach [4] has shown the interesting fact that Goodman's theorem does not hold for fragments of HA$^\omega$. This means that in order to eliminate AC from the proof of an arithmetical formula $ A $ we might have to use a more complex induction than in the proof which is allowed to use AC. Thierry Coquand [5] has just published another proof of Goodman's theorem.

[1] Goodman, N., The theory of the Gödel functionals, Journal of Symbolic Logic 41, 574-583 (1976)

[2] Goodman, N. Relativized realizability in intuitionistic arithmetic of all finite types. Journal of Symbolic Logic 43, pp. 23-44 (1978)

[3] Beeson, M., Goodman's theorem and beyond. Pacific J. Math. 84, 1-16 (1979)

[4] Kohlenbach, U., A note on Goodman's theorem. Studia Logica 63, 1-5 (1999)

[5] Coquand, T., About Goodman's theorem. Annals of Pure and Applied Logic 164(4), 437-442 (2013)

Wednesday, 30 January 2013

Independence of the continuum hypothesis

Following up from previous posts, let us continue on the theme of independence proofs, i.e. showing that a certain statement is not provable in a given formal system. The most celebrated of these results is of course Cohen's proof that the continuum hypothesis (CH) is not provable in set theory. For that Cohen developed the new method of "forcing". A few years after Cohen's result, Dana Scott come up with a brilliant proof of the independence of the CH that avoids forcing, but uses what are now called boolean valued-models. I want to give a brief sketch of Scott's proof, but one should definitely read the extremely well-writen paper [1] for all the details.

The first nice thing about Scott's paper is that it starts by giving a completely natural formulation of CH. CH is often stated using cardinals as the equation $$2^{\aleph_0} = \aleph_1$$ so it's easy to dismiss it as abstract non-sense. But it can also be rephrase in concrete terms as: Given any set of reals, either the set of integers can be mapped onto the set, or the set can be mapped onto the whole set of reals. So, the question is: Is there a set of reals whose cardinality lies strictly between the cardinalities of natural numbers and the whole set of reals?

Formally this can be written as $$ \forall X ( \exists f^{\mathbb{N} \rightarrow X} (\mbox{Img}(f) = X) \vee \exists g^{X \to \mathbb{R}} (\mbox{Img}(g) = \mathbb{R}) ). $$ I'm writing $ x^X $ as an abbreviation for $ x \in X $.

Scott then builds a model of set theory (a version slightly different than the usual presentation of the Zermelo-Frankel set theory) in which CH is false.

The first novelty is that the construction of the model is parametrised by a probability space $ (\Omega, \cal{A}, P) $. The real numbers are interpreted as random variables, i.e. in the constructed model the set of real number $ \cal{R} $ consists of all $ \eta \colon \Omega \to \mathbb{R} $ such that $$ \{\omega \in \Omega \; \colon \; \eta(\omega) \leq r \} \in \cal{A} $$ where $ \mathbb{R} $ is the "real" set of real numbers. The whole point is to make the set of reals in the model $ \cal{R} $ much much bigger than the actual set of real numbers $ \mathbb{R} $.

The second novelty is that formulas are no longer either true or false, but they have a real (in the sense of $ \mathbb{R} $) value between zero and one (saying how true it is, with one meaning "very" true). So, equality between two "real numbers" $ \eta \colon \Omega \to \mathbb{R} $ and $ \zeta \colon \Omega \to \mathbb{R} $ is given as $$ val(\eta = \zeta) = \{\omega \in \Omega \; \colon \; \eta(\omega) = \zeta(\omega) \} / [P = 0] $$ where one has to quotient out the set of events which have $P$-measure zero. This valuation function can be extended to all formulas, and the main result is that if a formula $ A $ is provable in set theory then $ val(A) = 1 $, independent of the probability space we started with.

Hence, in order to show that CH is not provable, it's enough to find some probability space $ (\Omega, \cal{A}, P) $ such that in that space $val($CH$) < 1$. Scott fixes $ I $ an arbitrary set of cardinality greater than $ 2^{\aleph_0} $, and then takes $ \Omega = [0,1]^I $. Because $ I $ is huge, the set of sample points $ \Omega $ will also be huge, which means the set of reals $ \cal{R} $ in the model has an immense cardinality.

So, now we need to build a set $ X \subseteq \cal{R}$ such that neigher $ \exists f^{\mathbb{N} \rightarrow X} (\mbox{Img}(f) = X) $ nor $ \exists g^{X \to \cal{R}} (\mbox{Img}(g) = {\cal R}) $. One can check that the projection $ \xi_i \colon \Omega \to \mathbb{R} $ are indeed random variables, and hence are real numbers in the model (i.e. belong to $\cal{R}$). Pick a subset $ J \subset I $ which is uncountable but still of cardinality strictly smaller than $ I $. Our set of reals $ X \subset \cal{R}$ will be (informally) $$ X = \{ \xi_j \; \colon \; j \in J \} $$ which one can show that is not countable but has (inside the model) cardinality smaller than the reals $ \cal{R} $. Neat!

Anyway, I hope Scott will forgive me for such a sketchy account of his paper, but I hope this will serve as an advertisement for the curious ones to go and have a closer look at all the details in [1].

[1] Dana Scott, A proof of the independence of the continuum hypothesis, Mathematical systems theory, 1(2):89-111, 1967.

Wednesday, 16 January 2013

Disjunction property implies existence property

A theory $ T $ is said to have the disjunction property if whenever a sentence $ A \vee B $ is provable then either $ A $ is provable or $ B $ is provable. It is important that $ A \vee B $ is a sentence, i.e. no free variables, otherwise a simple counter-example would be $ A(x) = {\sf Even}(x) $ and $ B(x) = {\sf Odd}(x) $.

A theory $ T $ is said to have the numerical existence property if whenever a sentence $ \exists n A(n) $ is provable then $ A(n) $ is provable for some numeral $ n $.

Clearly if a theory has the numerical existence property it must also have the disjunction property, as, with a bit of arithmetic, one can write a disjunction $ A \vee B $ using an existential quantifier as $ \exists n ( ( n = 0 \rightarrow A ) \wedge ( n \neq 0 \rightarrow B ) ) $.

Harvey Friedman proved in [1] the amazing result that these two properties are in fact equivalent.

Theorem. Any recursively enumerable extension of intuitionistic arithmetic which has the disjunction property must also have the numerical existence property.

Let us write $ \vdash_y x $ for the binary predicate "$ y $ is the code of a proof of the formula coded by $ x $". If $ k $ is the code of the formula $ A(x) $ then we write $ Sub(k) $ for the code of the formula $ A(k) $. The code of a formula $ A $ is sometimes written as $ \# A $. Finally, we write $ \neg x $ for $ \# \neg A $ if $ x = \# A $.

Let $ A(x) $ be the formula $$ \exists y ( ( (\vdash_y \neg x) \vee P(y) ) \wedge \forall z < y \not\vdash_z x) $$ which roughly says that "there exists a number $ y $ such that either $ y $ is a proof of $ \neg x $ or $ P(y) $ holds, and moreover, no proof of $ x $ with code shorter than $ y $ exists.

Let $ k $ be a numeral such that $ \# A(Sub(k)) = Sub(k) $, e.g. $ k = \# A(Sub(x)) $. Note that $ Sub(k) $ behaves like a fixed point of $ A(x) $ essentially for the same reasons as in my first post here on Kleene's recursion theorem. We are just replacing the self application of a function $ x (x) $ by the "self-substitution" $ Sub(x) $.

The main trick in Friedman's proof is the following:

Lemma 1. If $ T \vdash A(Sub(k)) $ then $ T \vdash P(n) $, for some $ n $.

Proof. Let us see how such witness $ n $ comes naturally from a given proof of $ A(Sub(k)) $. Let $ n $ be the code of a proof of $ A(Sub(k)) $. Then $ \vdash_n Sub(k) $. By the definition of $ A $ we also have that $ T $ proves $$ \exists y ( ( (\vdash_y \neg Sub(k)) \vee P(y) ) \wedge \forall z < y \not\vdash_z Sub(k)). $$ Hence, taking $ z = n $ it follows that $ T $ proves $$ \exists y \leq n ((\vdash_y \neg Sub(k)) \vee P(y) ). $$ Obviously we can't have $ \exists y \leq n (\vdash_y \neg Sub(k)) $, or $ T $ would be inconsistent. So, we must have that $ T $ proves $ \exists y \leq n P(y) $. Because $ n $ is a numeral, this can be written as a big disjunction, and by the disjunction property $ P(m) $ holds for some $ m \leq n $. QED

So, if we are able to prove $ A(Sub(k)) $ then essentially the "size" of such proof would be a bound on some $ n $ which witnesses $ P $. It remains to show that if $ \exists y P(y) $ is provable then so is $ A(Sub(k)) $. One must first show that $ \neg A(Sub(k)) $ is not provable.

Lemma 2. If $ T $ proves $ \neg A(Sub(k)) $ then it also proves $ A(Sub(k)) $.

Proof. Assume $ T \vdash \neg A(Sub(k)) $. Then we have that some $ n $ codes a proof of $ \neg A(Sub(k)) $, i.e. $ \vdash_n \neg A(Sub(k)) $. We consider two cases: Either $ \forall z < n \not\vdash_z Sub(k) $, in which case $ A(Sub(k)) $ holds (a contradiction to the consistence of $ T $ ), or $ \exists z < n \vdash_z Sub(k) $, which also implies that $ A(Sub(k)) $ is provable. QED

Note that we might as well assume that $ T $ is consistent, just because it is more sensible to work with consistent theories. Of course if $ T $ is inconsistent then it proves everything and in particular will have both the disjunction and the existence property. So, assuming $ T $ is consistent Lemma 2 says that $ \neg A(Sub(k)) $ is not provable.

The missing piece of the puzzle is the following:

Lemma 3. $ T $ proves $ \exists y P(y) \rightarrow A(Sub(k)) \vee \exists z (\vdash_z Sub(k)) $.

Proof. Let $ y $ be such that $ P(y) $. Consider two cases: Either $ \forall z < y (\not\vdash_z Sub(k)) $, in which case we have $ A(Sub(k)) $; or $ \exists z < y (\vdash_z Sub(k)) $. QED

Note that if $ P(y) $ is primitive recursive we can in fact conclude that $ T $ proves $ \exists y P(y) \rightarrow A(Sub(k)) $, as follows. Assume that $ y $ is the least such that $ (\vdash_y \neg Sub(k)) \vee P(y) $ holds. That means $$ ((\vdash_y \neg Sub(k)) \vee P(y)) \wedge \forall z < y ((\not\vdash_z \neg Sub(k)) \wedge \neg P(z)) $$ which implies $ A(Sub(k)) $.

This already gives us the result for primitive recursive $ P $'s:

Lemma 4. If $ P $ is primitive recursive and $ T \vdash \exists y P(y) $ then $ T \vdash P(n) $ for some $ n $.

Proof. By the preceding remark we have $ T \vdash A(Sub(k)) $, which by Lemma 1 gives the desired witness $ n $. QED

Finally all can be put together:

Proof of Theorem. Assume $ T $ proves $ \exists y A(y) $. By Lemma 3, $ T $ proves $ A(Sub(k)) \vee \exists z (\vdash_z Sub(k)) $. By the disjunction property we have to cases: (1) $ T $ proves $ A(Sub(k)) $, which by Lemma 1 implies $ T \vdash P(n) $ for some $ n $; or (2) $ T \vdash \exists z (\vdash_z Sub(k)) $. By Lemma 4 we have a witness $ n $ such that $ \vdash_n Sub(k) $, which is in fact a code for a proof of $ A(Sub(k)) $, and the same argument as in case (1) can be followed.

A short reflection on the proof: The proof essentially shows that given a concrete proof of $ \exists y P(y) $ one can "calculate" a bound $ n $ on $ y $ simply using the algorithm for "calculating" the disjunction property. More precisely, let $ f_0, f_1 $ be functions such that $$ \vdash_\pi A \vee B \implies \vdash_{f_0 \pi} A \mbox{ or } \vdash_{f_1 \pi} B. $$ Note that Lemma 1 says $$ \vdash_n A(Sub(k)) \implies \vdash \exists y \leq n \, P(y). $$ Hence, in order to establish the relation between the size of the given proof $ \pi $ of $ \exists y A(y) $ and the final bound $ n $ we must simply track the impact of $ \pi $ on the size of the proof of $ A(Sub(k)) $. So from the proof we can read off a primitive recursive functional $ \phi $ such that if $ \pi $ is the code of a proof of $ \exists n P(n) $ then $ \exists n \leq \phi(f_0, f_1, \pi) P(n) $ is also provable.

[1] Harvey Friedman, "The disjunction property implies the numerical existence property", Proc. Nat. Acad. Sci. USA, 72(8):2877-2878, 1975.