My dime-store understanding of measure theory and its history

I’m really enjoying Thomas Hawkins’s Lebesgue’s Theory of Integration: Its Origins and Development. It’s a historical treatment of where measure theory and the modern theory of integration (in the calculus sense) came from. I’m coming at this without knowing much of the mathematics, apart from a general outline. That makes some of the reading unclear, but I’m getting it.

The basic thrust seems to start with Fourier, or maybe there is a parallel track starting with Cauchy and Riemann. Fourier comes up with the idea of representing a function as an infinite sum of sines and cosines, which immediately raises a bunch of mathematical puzzles. In particular, when are you allowed to integrate a Fourier series term by term? That is, when is the integral of the sum equal to the sum of the integrals? While this may not seem like a practical question, it very much is. I can testify to this in my limited capacity as an amateur mathematician: you want to be able to perform operations on symbols without thinking terribly hard about it. It would be nice if you could just say “the integral of the sum is the sum of the integrals” without thinking. And, long story short, it turns out that you can say that under much milder conditions (or so I gather) if you’re talking about an integral in the sense of Lebesgue rather than an integral in the sense of Riemann.
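In symbols, the term-by-term question looks like the following. This is only a sketch in my own notation: the functions f_n, the interval [a, b], and the particular sufficient condition shown are assumptions I picked for illustration, not something quoted from Hawkins.

```latex
% Term-by-term integration: when may the integral and the infinite sum be swapped?
\[
  \int_a^b \Bigl( \sum_{n=1}^{\infty} f_n(x) \Bigr)\, dx
  \;=\;
  \sum_{n=1}^{\infty} \int_a^b f_n(x)\, dx
\]
% One standard sufficient condition when the integrals are Lebesgue integrals:
\[
  \sum_{n=1}^{\infty} \int_a^b \lvert f_n(x) \rvert \, dx \;<\; \infty
\]
```

For Riemann integrals, the usual textbook sufficient condition is uniform convergence of the series, which is a much stronger demand.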

It takes a while to get there, though. When Riemann introduced his definition of the integral, which is applicable to a wide swath of functions, many (all?) mathematicians believed that the integral concept had reached its “outermost limits” (to quote Paul du Bois-Reymond). It took half a century and more of mathematicians studying the structure of the real numbers, teasing out the fine distinctions between subtle classes of sets of real numbers, before we arrived at a theory of integration that handled all of these cases correctly. Now we can talk coherently about the integral of a function which takes value 1 for every rational number and takes value 0 for every irrational number.
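To make that last example concrete, here is a sketch of the function (usually called the Dirichlet function) restricted to [0, 1]. The notation is my own assumption: λ stands for Lebesgue measure, and q_1, q_2, ... is any listing of the rationals in [0, 1]. The block assumes the amsmath and amssymb packages.

```latex
% Indicator function of the rationals (the Dirichlet function)
\[
  \mathbf{1}_{\mathbb{Q}}(x) =
  \begin{cases}
    1, & x \in \mathbb{Q}, \\
    0, & x \notin \mathbb{Q}.
  \end{cases}
\]
% Riemann: on [0,1] every upper sum is 1 and every lower sum is 0,
% so the Riemann integral does not exist.
% Lebesgue: the rationals in [0,1] are a countable union of single points,
% each of measure zero, so
\[
  \int_{[0,1]} \mathbf{1}_{\mathbb{Q}} \, d\lambda
  = \lambda\bigl( \mathbb{Q} \cap [0,1] \bigr)
  = \sum_{k=1}^{\infty} \lambda\bigl( \{ q_k \} \bigr)
  = 0 .
\]
```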

Tracing the path from Riemann to Lebesgue is fascinating, for at least a couple reasons. First, I think it conflicts with an idealized picture of mathematicians carefully progressing from one obviously true statement to another via the ineluctable laws of logic. As Hawkins writes of Hermann Hankel’s purported proof that a nowhere-dense set can be covered by sets of zero content, “Here Hankel’s actual understanding — as opposed to his formal definition — of a ‘scattered’ set becomes more evident.” For decades, mathematicians didn’t have a stock of counterexamples ready to hand. A modern book like Counterexamples In Analysis makes these available: functions that are continuous everywhere but differentiable nowhere, a nowhere-dense set with positive measure, etc. The theorems come from somewhere, and it seems like they come from mathematicians’ intuition for the objects they’re dealing with. If the only examples that you’ve dealt with share a certain look and feel, perhaps it’s unavoidable that your mental picture will differ from what logic alone would tell you.

Second, Hawkins’s book puts Georg Cantor’s work in greater perspective, at least for me. This business about finding the conditions under which Fourier series can be integrated term-by-term is a fundamentally useful pursuit, and Cantor’s work involved constructing interesting counterexamples of bizarre sets with weird properties. Cantor’s work is often presented as fundamentally metaphysical in nature; his diagonalization argument is used to prove, e.g., Gödel’s incompleteness theorem. It’s rarely presented as part of a program to make mathematicians’ lives easier.

Perhaps Hawkins gets here (I’m only a fraction of the way into his fascinating book), but I wonder what the experience of developing these counterexamples did to later mathematical practice. Did it make future mathematicians in some sense hew more closely to the words in their definitions, under the theory that words are a surer guide to the truth than intuition? Or is that not how it works? If the definitions don’t match your intuition, perhaps you need to pick different definitions. After all, the definitions are tools for human use; you’re not plugging your Platonic bandsaw into a Platonic power outlet to help you construct a Platonic chest of drawers. If the tool doesn’t fit in the hand that’s using it, it’s not much of a tool.

I hope that’s how Lebesgue integrals end up working, as the story unfolds: the definitions function as you’d expect them to, so you can use them freely without having to preface every assertion with a pile of assumptions.

What I don’t know (what my dilettante’s understanding of integration thus far hasn’t totally answered) is whether Lebesgue integrals are really, truly, the “outermost limits” of the integral concept. I understand that the following is how modern measure theory works. We start with some set; let’s say the set of all infinite sequences of coin tosses, where a coin toss can, by definition, only result in heads or tails. Then we choose some collection of subsets of that set to which we’re allowed to attach meaningful ‘measure’ (think ‘weight’ or ‘length’ or ‘volume’ or ‘probability’). Maybe we allow ourselves to consider only events that depend on finitely many coin tosses, for instance. Talking about the probability of an event that depends on the whole infinite sequence of coin tosses would be, under this thought experiment, literally impossible: the system would assign the words no meaning. Finally, we attach a rule for the assignment of probabilities; maybe we say that any sequence of n coin tosses has the same probability as any other sequence of n coin tosses; this “equiprobability” assumption is how we typically model fair coin tosses.

These together — the set, the collection of admissible subsets, and the measure attached to each admissible subset — constitute a measure space, or, in a particular context, a “probability triple”. (When we’re talking about probabilities rather than more general measures, the probability of the set — the probability that something happens — must equal 1.)
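Here is a minimal sketch of such a triple for the easy, finite case of exactly n coin tosses. Everything in it is my own illustration: the names sample_space and probability are hypothetical, and I am assuming that every subset of the (finite) sample space counts as admissible, which is only safe because the set is finite.

```python
from fractions import Fraction
from itertools import product


def sample_space(n):
    """The set: all sequences of n tosses, each toss 'H' or 'T'."""
    return frozenset(product("HT", repeat=n))


def probability(event, omega):
    """The measure: the equiprobable rule, each outcome weighs 1/|omega|.

    Assumption: every subset of omega is an admissible event, which is
    fine for a finite sample space.
    """
    if not event <= omega:
        raise ValueError("event must be a subset of the sample space")
    return Fraction(len(event), len(omega))


omega = sample_space(3)
first_is_heads = frozenset(w for w in omega if w[0] == "H")
all_tails = frozenset({("T", "T", "T")})

print(probability(omega, omega))           # 1: something always happens
print(probability(frozenset(), omega))     # 0: the empty event
print(probability(first_is_heads, omega))  # 1/2

# The two events are disjoint, so their probabilities add.
combined = probability(first_is_heads | all_tails, omega)
print(combined == probability(first_is_heads, omega) + probability(all_tails, omega))  # True
```

Using exact fractions rather than floats keeps the additivity check honest. The infinite-sequence case is where the choice of admissible subsets starts to matter, and this sketch does not attempt it.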

Now, why would we pick a collection of subsets? Why not just stipulate that we can meaningfully attach a measure to every subset of the set? It turns out that this is in general impossible if we also want the measure to behave like length, which I find fascinating; see the Vitali set for an example. I don’t know at the moment whether non-measurable subsets can arise within countable sets, or whether they can only arise within uncountable ones. (Our set of infinite sequences of coin tosses, above, is uncountable, by Cantor’s diagonal argument.) In any case, the upshot is that you always have to specify a set, a collection of admissible subsets, and a measure that you’ll attach to each subset.
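For the curious, the Vitali argument can be sketched in a few lines. The notation is my own assumption: V is a Vitali set inside [0, 1], λ is Lebesgue measure, and V + q means V shifted by q. The block assumes the amssymb package.

```latex
% Sketch of why a Vitali set V cannot be given a Lebesgue measure.
% V is a subset of [0,1] containing exactly one point from each class of
% real numbers that differ by a rational (picking the points uses the
% Axiom of Choice).
% The translates V + q, for rational q in [-1,1], are pairwise disjoint, and
\[
  [0,1] \;\subseteq\; \bigcup_{q \in \mathbb{Q} \cap [-1,1]} (V + q) \;\subseteq\; [-1,2].
\]
% If V had a measure, translation invariance and countable additivity would give
\[
  1 \;\le\; \sum_{q \in \mathbb{Q} \cap [-1,1]} \lambda(V) \;\le\; 3,
\]
% which fails if \lambda(V) = 0 (the sum is 0)
% and also if \lambda(V) > 0 (the sum is infinite).
```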

There are several directions that you can go from here. One is to restrict your collection of subsets such that all of them are measurable; this is how you end up with Borel sets, or more generally how you end up with σ-algebras. And that’s where I’m curious: can we show that there is no more useful way to define an integral than to define a σ-algebra of subsets on the set we care about, then define the Lebesgue measure on that σ-algebra? Do σ-algebras leave out any subsets that are obviously interesting? Is there some measure more general than the Lebesgue measure, which will fit more naturally into the mathematician’s hand? Or can we prove that the Lebesgue measure is where we can stop?

In order to make statements about integrals of all kinds, we’d need to define what an integral in general is, such that the Riemann integral and the Lebesgue integral are special cases of this general notion. I gather that the very definition of “measure” is that general notion of integral. A measure is a function that takes a subset of our parent set and attaches some weight to it, such that certain intuitive ideas apply to it: a measure is non-negative (i.e., the weight of an object, by definition, cannot be less than zero); the measure of the empty set must be zero (the weight of nothing is zero); and the measure of disjoint (non-overlapping) objects, taken together, must be the sum of the measures of the objects, measured separately. We call this last axiom the “additivity axiom.” You can add other axioms that a measure should intuitively satisfy, such as translation-invariance: taking an object and moving it shouldn’t change its measure.
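Written out, those intuitive requirements look like this. The symbols are my own assumption for the sketch: μ is the measure, A and B are admissible subsets, and A + t is A shifted by t; I am reading “taken together” as the union of non-overlapping sets. The block assumes the amsmath package.

```latex
\begin{align*}
  \mu(A) &\ge 0
    && \text{(non-negativity)} \\
  \mu(\emptyset) &= 0
    && \text{(the weight of nothing is zero)} \\
  \mu(A \cup B) &= \mu(A) + \mu(B) \quad \text{whenever } A \cap B = \emptyset
    && \text{(additivity)} \\
  \mu(A + t) &= \mu(A)
    && \text{(translation invariance, an optional extra)}
\end{align*}
```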

The additivity axiom introduces some problems, because infinity is weird. Do we use the weaker axiom that the measure of the union of two disjoint objects must be the sum of the measures of the two? Or do we use the stronger one that the measure of a countable infinity of pairwise disjoint objects, taken together, must equal the countable sum of the measures of each object? These alternatives are described, respectively, as “finite additivity” and “countable additivity”. One reason to pick finite additivity is that finiteness is, in general, easier to reason about, and has fewer bizarre gotchas. But finite additivity is also not as far-reaching as what we need. You can’t reach infinity by a progression of finite steps, so finite additivity doesn’t allow you to talk about, say, the probability that a limit of some infinite sequence is thus-and-such; without that ability, you can’t prove theorems like the strong law of large numbers. (I’m pretty sure you can prove the weak law using only finite additivity.)
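Side by side, the two axioms and the two laws look like this. Again the notation is my own assumption for the sketch: the A_k are pairwise disjoint admissible sets, and S_n is the number of heads in the first n tosses of a fair coin.

```latex
% Finite additivity: for pairwise disjoint A_1, ..., A_n
\[
  \mu\Bigl( \bigcup_{k=1}^{n} A_k \Bigr) = \sum_{k=1}^{n} \mu(A_k)
\]
% Countable additivity: for pairwise disjoint A_1, A_2, ...
\[
  \mu\Bigl( \bigcup_{k=1}^{\infty} A_k \Bigr) = \sum_{k=1}^{\infty} \mu(A_k)
\]
% Weak law: a limit of probabilities, each about finitely many tosses
\[
  \lim_{n \to \infty}
  P\Bigl( \Bigl\lvert \tfrac{S_n}{n} - \tfrac{1}{2} \Bigr\rvert > \varepsilon \Bigr) = 0
  \quad \text{for every } \varepsilon > 0
\]
% Strong law: the probability of one event about the whole infinite sequence
\[
  P\Bigl( \lim_{n \to \infty} \tfrac{S_n}{n} = \tfrac{1}{2} \Bigr) = 1
\]
```

The difference in where the limit sits is the whole point: in the weak law it is outside the probability, in the strong law it is inside.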

So that would seem to be one answer to the question of whether Lebesgue integrals are the be-all and end-all of the idea of an integral: it depends upon how sure you want to be in your axioms. If you’re willing to introduce all the weirdness of infinity, then go ahead and use countable additivity. And it’s probably the case that there are intuitively true statements to which most everyone would agree, which can only be proved if you admit countable additivity.

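The idea of a non-measurable set also rests on the Axiom of Choice. (I can’t prove it, but I had imagined that, like so many things, the existence of a non-measurable set is equivalent to the Axiom of Choice; from what I can tell it is actually a weaker statement, since weaker choice principles already produce non-measurable sets.) So if you reject the Axiom of Choice, which Cohen and Gödel’s proofs allow you to do, free of charge, you could consistently take all your sets of real numbers to be measurable (that is Solovay’s theorem, which, as I understand it, needs an extra large-cardinal assumption). But presumably there are good, useful reasons to keep the Axiom of Choice.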

So maybe (and I don’t know this, but it sounds right, and maybe Hawkins will eventually get there) we arrive at the final fork in the road, from which there are a few equally good paths to follow through measure theory. We can toss out the Axiom of Choice and thereby allow ourselves to measure all sets; we could replace countable additivity with finite additivity and accept a weaker, but perhaps more intuitive, measure theory that doesn’t use the Axiom of Choice at all; or we could go with what we’ve got. In any case, the search for the One Final Notion Of Integration would probably be the same: keep looking for counterexamples that prove that our axioms need reworking. That will probably always mean looking for obviously true statements that any sound measure theory ought to be able to prove true, and obviously false statements that any sound measure theory ought to be able to prove false. The ultimate judge of what’s “obviously true” and “obviously false” is the mathematician. A similar approach would be to come up with a system of axioms from which all the statements that we accept as true today can still be derived, but from which, in addition, we can derive other, interesting theorems. Again, the definition of ‘interesting’ will rest with the mathematician; some interesting results will just be logical curiosities, whereas others will prove immediately useful in physics, probability, etc.

Phew. This has been my brain-dump about what I know of measure theory, while I work through a fascinating history of the subject. Thank you for listening.

Steve Reads