August 07, 2021

Bayesianism in Math: No Dice

Attention conservation notice: Sniping at someone else's constructive attempt to get the philosophy of mathematics to pay more attention to how mathematicians actually discover stuff, because it uses an idea that pushes my buttons. Assumes you know measure-theoretic probability without trying to explain it. Written by someone with absolutely no qualifications in philosophy, and precious few in mathematics for that matter. Largely drafted back in 2013, then laid aside. Posted now in lieu of new content.

Wolfgang points to an interesting post [archived] at "A Mind for Madness" on using Bayesianism in the philosophy of mathematics, specifically to give a posterior probability for conjectures (e.g., the Riemann conjecture) given the "evidence" of known results. Wolfgang uses this as a jumping-off point for looking at whether a Bayesian might slide around the halting problem and Gödel's theorem, or more exactly whether a Bayesian with \( N \) internal states can usefully calculate any posterior probabilities of halting for another Turing machine with \( n < N \) states. (I suspect that would fail for the same reasons my idea of using learning theory to do so fails; it's also related to work by Aryeh "Absolutely Regular" Kontorovich on finite-state estimation, and even older ideas by the late great Thomas Cover and Martin Hellman.)

My own take is different. Knowing how I feel about the idea of using Bayesianism to give probabilities to theories about the world, you can imagine that I look on the idea of giving probabilities to theorems with complete disfavor. And indeed I think it would run into insuperable trouble for purely internal, mathematical reasons.

Start with what mathematical probability is. The basics of a probability space are a carrier space \( \Omega \), a \( \sigma \)-field \( \mathcal{F} \) on \( \Omega \), and a probability measure \( P \) on \( \mathcal{F} \). The mythology is that God, or Nature, picks a point \( \omega \in \Omega \), and then what we can resolve or perceive about it is whether \( \omega \in F \), for each set \( F \in \mathcal{F} \). The probability measure \( P \) tells us, for each observable event \( F \), what fraction of draws of \( \omega \) are in \( F \). Let me emphasize that there is nothing about the Bayes/frequentist dispute involved here; this is just the structure of measure-theoretic probability, as agreed to by (almost) all parties ever since Kolmogorov laid it down in 1933 ("Andrei Nikolaevitch said it, I believe it, and that's that").
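To make the mythology concrete, here is a toy finite version of the Kolmogorov setup; the three-point carrier space and the weights are entirely made up for illustration, and on a finite space I just take the full power set as the \( \sigma \)-field:

```python
from itertools import combinations

# Toy finite probability space: Omega is the carrier space, F (here the
# full power set) is the sigma-field of observable events, and P assigns
# each event its probability.  All values are hypothetical.
Omega = {"w1", "w2", "w3"}
weights = {"w1": 0.5, "w2": 0.3, "w3": 0.2}

def powerset(s):
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1)
            for c in combinations(s, r)]

F = powerset(Omega)  # on a finite Omega, every subset can be an event
P = {event: sum(weights[w] for w in event) for event in F}

# "God picks omega"; all we resolve is whether omega landed in F_i,
# and P tells us what fraction of draws do so.
event = frozenset({"w1", "w3"})
assert abs(P[event] - 0.7) < 1e-9
```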

To assign probabilities to propositions like the Riemann conjecture, the points \( \omega \) in the base space \( \Omega \) would seem to have to be something like "mathematical worlds", say mathematical models of some axiomatic theory. That is, selecting an \( \omega \in \Omega \) should determine the truth or falsity of any given proposition like the fundamental theorem of algebra, the Riemann conjecture, Fermat's last theorem, etc. There would then seem to be three cases:

  0. The worlds in \( \Omega \) conform to different axioms, and so the global truth or falsity of a proposition like the Riemann conjecture is ambiguous and undetermined.
  1. All the worlds in \( \Omega \) conform to the same axioms, and the conjecture, or its negation, is a theorem of those axioms. That is, it is true (or false) in all models, no matter how the axioms are interpreted, and hence it has an unambiguous truth value.
  2. The worlds all conform to the same axioms, but the proposition of interest is true in some interpretations of the axioms and false in others. Hence the conjecture has no unambiguous truth value.

Case 0 is boring: we know that different axioms will lead to different results. Let's concentrate on cases 1 and 2. What do they say about the probability of a set like \( R = \left\{\omega: \text{the Riemann conjecture is true in}\ \omega \right\} \)?

Case 1: The Conjecture Is a Theorem
Case 1 is that the conjecture (or its negation) is a theorem of the axioms. Then the conjecture must be true (or false) in every \( \omega \), so \( P(R) = 0 \) or \( P(R) = 1 \). Either way, there is nothing for a Bayesian to learn.

The only escape I can see from this has to do with the \( \sigma \)-field \( \mathcal{F} \). Presumably, in mathematics, this would be something like "everything easily deducible from the axioms and known propositions", where we would need to make "easy deduction" precise, perhaps in terms of the length of proofs. It then could happen that \( R \not\in \mathcal{F} \), i.e., the set is not a measurable event. In fact, we can deduce from Gödel that many such sets are not measurable if we take \( \mathcal{F} \) to be "is provable from the axioms", so even more must be non-measurable if we restrict ourselves to not seeing very far beyond the axioms. We could then bracket the probability of the Riemann conjecture from below, by the probability of any measurable sub-set (sub-conjecture?), and from above, by the probability of any measurable super-set. (The "inner" and "outer" measures of a set come, roughly speaking, from making those bounds as tight as possible. When they match, the set is measurable.) But even then, every measurable set has either probability 0 or probability 1, so this doesn't seem very useful.
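To illustrate the bracketing, here is a toy finite example; the carrier space, the uniform weights, and the coarse \( \sigma \)-field are all invented for illustration. The point is that a non-measurable set can get inner measure 0 and outer measure 1, i.e., bounds as uninformative as possible:

```python
# Toy inner/outer measure.  The measurable events are only those in F,
# a sigma-field much coarser than the power set; R lies outside F.
# All specifics (Omega, weights, F) are hypothetical choices.
Omega = frozenset({1, 2, 3, 4})
weights = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}

def P(A):
    return sum(weights[w] for w in A)

# F generated by the two-block partition {{1,2},{3,4}}.
F = [frozenset(), frozenset({1, 2}), frozenset({3, 4}), Omega]

R = frozenset({2, 3})  # straddles both blocks: not measurable

inner = max(P(A) for A in F if A <= R)  # best measurable subset
outer = min(P(A) for A in F if A >= R)  # best measurable superset
assert (inner, outer) == (0.0, 1.0)     # a bracket, but a useless one
```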

(The poster, hilbertthm90, suggests bracketing the probability of the conjecture by getting "the most optimistic person about a conjecture to overestimate the probability and the most skeptical person to underestimate the probability", but this assumes that we can have a probability, rather than just inner and outer measures. This is also a separate question from the need to make up a number for the probability of known results if the conjecture is false. This is the problem of the catch-all or unconceived-alternative term, and it's crippling.)

Another way to get to the same place is to look carefully at what's meant by a \( \sigma \)-field. It is a collection of subsets of \( \Omega \) which is closed under repeating the Boolean operations of set theory, namely intersection, union and complementation, a countable infinity of times. Anything which can be deduced from the axioms in a countable number of steps is included. This is a core part of the structure of probability theory; if you want to get rid of it, you are not talking about what we've understood by "probability" for a century, but about something else. It is true that some people would weaken this requirement from a \( \sigma \)-field to just a field, which is closed under a finite number of Boolean operations, but that would still permit arbitrarily long chains of deduction from axioms. (One then goes from "countably-additive probability" to "finitely-additive probability".) That doesn't change the fact that anything which is deducible from the axioms in a finite number of steps (i.e., has a finite proof) would have measure 1.
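On a finite \( \Omega \) this closure can be computed by brute force; the sketch below (with hypothetical generating sets) just keeps applying complementation and union until nothing new appears, which on a finite space yields the generated \( \sigma \)-field:

```python
# Close a family of subsets of a finite Omega under complementation and
# union.  On a finite space this gives the generated sigma-field; the
# generating sets {1} and {2} are arbitrary, hypothetical choices.
Omega = frozenset({1, 2, 3, 4})

def generated_field(generators):
    F = {frozenset(), Omega} | {frozenset(s) for s in generators}
    changed = True
    while changed:
        changed = False
        for A in list(F):
            if Omega - A not in F:          # close under complement
                F.add(Omega - A)
                changed = True
            for B in list(F):
                if A | B not in F:          # close under union
                    F.add(A | B)
                    changed = True
    return F                                # intersections come free,
                                            # by De Morgan

F = generated_field([{1}, {2}])
# Atoms are {1}, {2}, {3,4}, so |F| = 2**3 = 8.
assert len(F) == 8
```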

Said yet a third way, a Bayesian agent immediately has access to all logical consequences of its observations and its prior, including in its prior any axioms it might hold. Hence to the extent that mathematics is about finding proofs, the Bayesian agent has no need to do math; it just knows mathematical truths. The Bayesian agent is thus a very, very bad formalization of a human mathematician indeed.

Case 2: The Conjecture Is Not a Theorem
In this case, the conjecture is true under some models of the axioms but false in others. We thus can get intermediate probabilities for the conjecture, \( 0 < P(R) < 1 \). Unfortunately, learning new theorems cannot change the probability that we assign to the conjecture. This is because theorems, as seen above, have probability 1, and conditioning on an event of probability 1 is the same as not conditioning at all.
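Spelled out, this is immediate from the definition of conditioning: if \( T \) is the set of worlds where some theorem holds, so that \( P(T) = 1 \), then \( P(R \cap T^{c}) \le P(T^{c}) = 0 \), and hence

\[ P(R \mid T) = \frac{P(R \cap T)}{P(T)} = P(R \cap T) = P(R) - P(R \cap T^{c}) = P(R) . \]

The posterior equals the prior, no matter which theorems are learned.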

There are a lot of interesting thoughts in the post about how mathematicians think, especially how they use analogies to get a sense of which conjectures are worth exploring, or feel like they are near to provable theorems. (There is also no mention of Polya: but sic transit gloria mundi.) It would be very nice to have some formalization of this, especially if the formalism was both tractable and could improve practice. But I completely fail to see how Bayesianism could do the job.

That post is based on Corfield's Towards a Philosophy of Real Mathematics, which I have not laid hands on, but which seems, judging from this review, to show more awareness of the difficulties than the post does.

Addendum, August 2021: I have since tracked down an electronic copy of Corfield's book. While he has sensible things to say about the role of conjecture, analogy and "feel" in mathematical discovery, drawing on Polya, he also straightforwardly disclaims the "logical omniscience" of the standard Bayesian agent. But he does not explain what formalism he thinks we should use to replace standard probability theory. (The terms "countably additive" and "finitely additive" do not appear in the text of the book, and I'm pretty sure "\( \sigma \)-field" doesn't either, though that's harder to search for. I might add that Corfield also does nothing to explicate the carrier space \( \Omega \).) I don't think this is because Corfield isn't sure about what the right formalism would be; I think he just doesn't appreciate how much of the usual Bayesian machinery he's proposing to discard.

Mathematics; Philosophy; Bayes, anti-Bayes

Posted at August 07, 2021 19:00

Three-Toed Sloth