Measure Theory

Motivations

Consider the function $f:[0,1]\to \mathbb{R}$ that takes every irrational number to one and every rational number to zero, i.e.,

\[f(x) = \begin{cases}0 & x\in \mathbb{Q} \\ 1 & \text{otherwise}\end{cases}\]

If you try to graph this function in the plane, it looks like an equals sign. There’s a solid line of unit length going from (0,0) to (1,0) and another going from (0,1) to (1,1). Nevertheless, we know that these two lines aren’t really equally “solid” or equally “dense.” There’s a break in the upper line at every rational $x$ and a break in the lower line at every irrational $x,$ which means that the lower line is broken almost everywhere in [0,1]. It contains only countably many points whereas the upper line contains uncountably many. So intuitively, it’s clear that the area under the graph of $f$ should be one.

Since we like to think of the integral of a function as the area under its graph, we might therefore hope that \[\int_0^1 f(x)\, dx \stackrel{?}{=} 1,\] but it turns out that the Riemann integral doesn’t vindicate our hope. Why not? Because the Riemann integral of $f$ doesn’t even exist. (I’m going to use the notation from chapter 13 of Michael Spivak’s Calculus, which was my first introduction to integration.) Given any partition $P$ of $[0,1]$, density of both the rationals and the irrationals guarantees that $f$ takes on the values 0 and 1 in every one of the subintervals defined by $P.$ Hence \[L(f,P) = 0\; \text{and}\; U(f,P) = 1,\] and there’s no way for the infimum of the upper sums to equal the supremum of the lower sums. As far as the Riemann theory is concerned, $f$ simply isn’t integrable.
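
To spell the computation out, here is a sketch in roughly Spivak’s notation. The symbols $t_i$, $m_i$ and $M_i$ are my labels for the partition points and for the infimum and supremum of $f$ on $[t_{i-1},t_i]$. For a partition $P=\{t_0,\dots,t_n\}$ with $0=t_0<t_1<\dots<t_n=1$, every subinterval has $m_i=0$ and $M_i=1$, so

\[L(f,P)=\sum_{i=1}^n m_i\,(t_i-t_{i-1})=0 \quad\text{and}\quad U(f,P)=\sum_{i=1}^n M_i\,(t_i-t_{i-1})=\sum_{i=1}^n (t_i-t_{i-1})=t_n-t_0=1.\]

Since this holds for every partition, $\sup_P L(f,P)=0<1=\inf_P U(f,P)$.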

This is an obvious glitch in the theory of integration, and one of the main goals of measure theory is to repair it. Lebesgue integration’s first big insight is that slicing the equals sign into vertical cross sections is the wrong way to go about calculating its area. The vertical cross sections are just squashed versions of the full graph, so if we don’t know how to assign an area to $f$ over the whole interval, we won’t know how to assign an area to its restrictions to subintervals either. Instead, we need to take horizontal cross sections. We take each value in the range of $f,$ multiply it by the “length” of its preimage, and add the products up to get the Lebesgue integral of $f,$ denoted (in Schilling’s notation) \[\int_{[0,1]}f\,d\lambda = 1\cdot\lambda\big([0,1]\setminus \mathbb{Q}\big) + 0\cdot \lambda \big([0,1]\cap \mathbb{Q}\big).\] All that’s left now is to define a function $\lambda$, called a measure, that takes a subset of $\mathbb{R}$ and tells us what its length should be.
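
The horizontal-slicing recipe can be sketched in a few lines of Python for functions that take only finitely many values. This is an illustration under stated assumptions, not a real implementation of Lebesgue integration: the function name `integrate_by_level_sets` is mine, the caller has to supply the length of each preimage by hand, and the values $\lambda([0,1]\cap\mathbb{Q})=0$ and $\lambda([0,1]\setminus\mathbb{Q})=1$ are assumed here rather than derived (they are the standard values for Lebesgue measure).

```python
from fractions import Fraction


def integrate_by_level_sets(level_sets):
    """Sum value * length(preimage) over the finitely many values of a function.

    `level_sets` maps each value the function takes to the length of the
    set of points where it takes that value. The lengths are inputs here;
    computing them is exactly the job of the measure.
    """
    total = Fraction(0)
    for value, preimage_length in level_sets.items():
        total += Fraction(value) * Fraction(preimage_length)
    return total


# The function f from the text. ASSUMPTION: the rationals in [0,1] have
# length 0 and the irrationals have length 1.
dirichlet = {1: Fraction(1), 0: Fraction(0)}
print(integrate_by_level_sets(dirichlet))  # 1

# Sanity check on a step function where vertical slicing also works:
# g = 2 on [0, 1/4), 5 on [1/4, 1/2), 2 on [1/2, 1].
vertical = 2 * Fraction(1, 4) + 5 * Fraction(1, 4) + 2 * Fraction(1, 2)
horizontal = integrate_by_level_sets({2: Fraction(3, 4), 5: Fraction(1, 4)})
print(vertical, horizontal)  # 11/4 11/4
```

For the step function $g$ the two ways of slicing agree, which is what one would hope for; the difference only shows up for functions like $f$, where the vertical slices carry no usable information.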

Another motivation for measure theory comes out of probability. Suppose that you want to assign probabilities to events in infinite sample spaces. For example, you might want to find the probability that a dart thrown at random at the interval $[0,1]$ will land in $[1/3,2/3]$. Obviously the probability should be $1/3$, but the principle of indifference that we would rely on to deliver this result in a finite context breaks here. (I mean Laplace’s finite principle of indifference, not the slightly more sophisticated continuous version that still runs into paradox.) The cardinality of the unit interval is $\mathfrak C$, and all points in the interval are equally likely to get hit by the dart, so each one should be assigned probability $1/\mathfrak C\,$? And since $[1/3,2/3]$ also contains a continuum of points, the probability of hitting the subinterval is… $\mathfrak C/\mathfrak C\,$? Clearly something has gone wrong here. The solution is to give up on associating the probability of a set with the number of points it contains and instead associate the probability with the size of the set. We want to say that for some measure $\lambda$, \[P(\text{dart in the interval})=\frac{\lambda \left([1/3,2/3]\right)}{\lambda \left([0,1]\right)} = 1/3.\]
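
As a rough check on that ratio, here is a small Python sketch. It assumes that the length of an interval $[a,b]$ is $b-a$, and it models the “dart thrown at random” with Python’s pseudorandom `random.Random.random()`, which is my stand-in for a uniform draw and not something the argument depends on. The helper names `length` and `hit_frequency` are made up for the illustration.

```python
import random
from fractions import Fraction


def length(a, b):
    """Length of the interval [a, b]; ASSUMPTION: it is simply b - a."""
    return b - a


def hit_frequency(a, b, throws, seed=0):
    """Fraction of pseudorandom darts in [0, 1) that land in [a, b]."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(throws) if a <= rng.random() <= b)
    return hits / throws


# Exact ratio of lengths.
exact = length(Fraction(1, 3), Fraction(2, 3)) / length(Fraction(0), Fraction(1))
print(exact)  # 1/3

# The simulated frequency should land close to the ratio of lengths.
print(hit_frequency(1 / 3, 2 / 3, throws=200_000))
```

Nothing here counts points: both the exact ratio and the simulation depend only on how long the target interval is.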

Why can’t we measure every set?

Let’s stick for a moment with the task of defining a measure $\lambda$ on $\mathbb{R}$. Our measure is going to take a subset of $\mathbb{R}$ and map it to a nonnegative number that represents, roughly, the subset’s length. Question: can the domain of $\lambda$ be all of $\mathcal{P}(\mathbb{R})$? That is, can we consistently assign a length to every subset of the real line? The answer turns out to be no, at least not if $\lambda$ is to respect some basic intuitions about length.

We should be able to agree on the following. (1) Any reasonable measure has to be translation invariant. If I take $X\subset\mathbb{R}$ in the domain of $\lambda$ and shift it by a constant, that had better not change its measure. After all, an object (usually) doesn’t get longer or shorter when I translate it in space. (2) The measure of a countable union of pairwise disjoint measurable sets should be the sum of the sets’ measures. This requirement is highly plausible at least in the finite case. (3) The measure of a closed interval $[a,b]$ should be $b-a$. Clearly.
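
In symbols, and with the caveat that the labels are mine ($x+X$ for the translate $\{x+y : y\in X\}$, and $X_1,X_2,\dots$ for a sequence of pairwise disjoint sets in the domain of $\lambda$), the three requirements read roughly as follows:

\[
\begin{aligned}
&(1)\quad \lambda(x+X)=\lambda(X) \quad\text{for all } x\in\mathbb{R},\\
&(2)\quad \lambda\Big(\bigsqcup_{n=1}^\infty X_n\Big)=\sum_{n=1}^\infty \lambda(X_n),\\
&(3)\quad \lambda([a,b])=b-a \quad\text{for all } a\le b.
\end{aligned}
\]

One consequence worth keeping in mind, assuming lengths are never negative and that $B\setminus A$ is also in the domain (automatic if the domain is all of $\mathcal{P}(\mathbb{R})$): if $A\subseteq B$, then $B=A\sqcup(B\setminus A)$, and the finite case of (2) gives $\lambda(A)\le\lambda(B)$.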

But now consider a set $V\subseteq [0,1]$ that contains exactly one point from each coset of $\mathbb{Q}$ in $\mathbb{R}$. (Such a set is called a Vitali set. The rest of this section is a summary of a proof by James Belk.) If you translate the set $V$ by a rational number $q_1$ to get $q_1 + V$, it can never intersect $q_2 + V$ for any distinct rational $q_2$, essentially because the rationals are closed under addition. Further, every element of $[0,1]$ lands in some set of the form $q + V$ for $q\in [-1,1]\cap \mathbb{Q}$. But then we must have

\[[0,1]\subseteq \bigsqcup_{q\in [-1,1]\cap \mathbb{Q}} (q+V)\subseteq [-1,2],\] where the right containment holds because you can’t add a number in $[0,1]$ to a number in $[-1,1]$ and get more than two or less than minus one. An immediate consequence of these containment relations and assumption (3) is that the thing in the middle, the union of all the $q+V$, needs to have length between one and three. But notice that the sets we’re unioning together are all translations of $V,$ so by assumptions (1) and (2), you must get something between one and three when you add $\lambda(V)$ to itself countably infinitely many times. No real number has this property: the sum is zero if $\lambda(V)=0$ and infinite if $\lambda(V)>0$. Conclusion: the set $V$ cannot be measurable.
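
For readers who want the two claims about the translates spelled out, here is a sketch of the standard argument. The names $v$, $v_1$, $v_2$ and $x$ are mine, and the final chain of inequalities assumes that a set never has smaller measure than one of its subsets.

Disjointness: if $q_1+v_1=q_2+v_2$ with $v_1,v_2\in V$, then $v_1-v_2=q_2-q_1\in\mathbb{Q}$, so $v_1$ and $v_2$ lie in the same coset of $\mathbb{Q}$. Since $V$ contains only one point per coset, $v_1=v_2$ and hence $q_1=q_2$.

Covering: given $x\in[0,1]$, let $v$ be the point of $V$ in the coset $x+\mathbb{Q}$. Then $q=x-v$ is rational, and $|q|\le 1$ because $x$ and $v$ both lie in $[0,1]$, so $x\in q+V$ with $q\in[-1,1]\cap\mathbb{Q}$.

Putting the containments together with assumptions (1) through (3):

\[1=\lambda([0,1])\;\le\;\sum_{q\in[-1,1]\cap\mathbb{Q}}\lambda(q+V)\;=\;\sum_{q\in[-1,1]\cap\mathbb{Q}}\lambda(V)\;\le\;\lambda([-1,2])=3.\]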

Questions

What exactly is measure theory good for? I recently asked one of my physics professors whether I should learn measure theory, and his view was that it’s nice to have but seldom useful. Most integrals physicists encounter in practice are either classically integrable or else completely intractable, so Lebesgue integration doesn’t unlock much.

Reading List

Schilling, Measures, Integrals, and Martingales is excellent so far, but light on motivation.
Pollard, A User’s Guide to Measure Theoretic Probability. Quirky notation. Magnificently clear.
Hanson, “Any Function I Can Actually Write Down is Measurable, Right?” Bizarrely, wrong.
Capiński and Kopp, Measure, Integral and Probability. Not recommended.
James Belk’s measure theory notes

Last updated 6 February 2025