Attention conservation notice: A long, meandering, and inconclusive attempt to explain why you perhaps shouldn’t worry about a technical issue you almost certainly weren’t worrying about already.
Mathematical proofs in statistics are, in some formal sense, useless. That is, they formally have conditions such as finite moments, boundedness, differentiability or stochastic equicontinuity that either apply to all things in the real world or to none. The proofs are also often formally about infinite sequences; these don’t crop up all that often in data analysis.
Proofs are useful to the extent that the conditions say what you can ignore. That is, showing a remainder term \(r_n(x)\) is \(o_p(1)\) formally means that for every positive \(\epsilon\)
\[
\Pr\left(|r_n(x)|>\epsilon\right)\to 0\text{ as }n\to\infty,
\]
but practically means that in data sets of reasonable size there’s a reasonable hope that ignoring \(r_n(x)\) won’t matter. Continuity assumptions mean that a result is likely to work for functions that don’t jump around too much. Moment bounds mean a variable shouldn’t have too many outliers, and boundedness assumptions mean a transformation shouldn’t create too many outliers.
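What "ignorable at reasonable sample sizes" looks like is easy to see by simulation. In the delta-method expansion \(\sqrt{n}(\bar X_n^2 - \mu^2) = 2\mu\sqrt{n}(\bar X_n - \mu) + \sqrt{n}(\bar X_n - \mu)^2\), the last term is an \(o_p(1)\) remainder. A sketch (the Exponential distribution and the tolerance \(\epsilon=0.1\) are illustrative choices, not from the post) estimating \(\Pr(|r_n| > \epsilon)\) as \(n\) grows:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, eps, reps = 1.0, 0.1, 2000   # Exponential(1) mean, tolerance, Monte Carlo size

probs = []
for n in [10, 100, 1000, 10000]:
    xbar = rng.exponential(mu, size=(reps, n)).mean(axis=1)
    r_n = np.sqrt(n) * (xbar - mu) ** 2   # the o_p(1) remainder term
    probs.append(np.mean(np.abs(r_n) > eps))
    print(n, probs[-1])
```

The estimated probabilities drop steadily with \(n\): the remainder is not literally zero at any finite sample size, but ignoring it becomes a progressively safer bet.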
Asymptotics works surprisingly well in statistics. The Central Limit Theorem applies to infinite sequences, but means are usually pretty well Normal well before \(n=100\). In principle, sharp bounds for the accuracy of the Normal approximation (such as those given by the Berry–Esseen Theorem) would be better. In practice, though, the Normal approximation to a mean works a whole lot better than the Berry–Esseen bound would suggest.
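This is easy to check numerically. The sum of \(n\) Exponential(1) variables is exactly Gamma(\(n\), 1), so the true error of the Normal approximation can be computed and set against the Berry–Esseen bound. A sketch using scipy; the constant 0.4748 is one published value (Shevtsova's) for iid sums, and \(\rho = E|X-\mu|^3 = 12/e - 2\) for the Exponential(1):

```python
import numpy as np
from scipy import stats

n = 100
x = np.linspace(-5, 5, 20001)   # grid of standardized values

# Standardized sum: (S_n - n) / sqrt(n), where S_n ~ Gamma(n, 1) exactly.
exact = stats.gamma.cdf(n + np.sqrt(n) * x, a=n)
approx = stats.norm.cdf(x)
actual_err = np.max(np.abs(exact - approx))

# Berry–Esseen: sup error <= C * rho / (sigma^3 * sqrt(n)),
# with rho = E|X - mu|^3 = 12/e - 2 and sigma = 1 for Exponential(1).
rho = 12 / np.e - 2
bound = 0.4748 * rho / np.sqrt(n)

print(f"actual sup error:   {actual_err:.4f}")
print(f"Berry–Esseen bound: {bound:.4f}")
```

Even for a distribution as skewed as the Exponential, the actual error at \(n=100\) comes out roughly an order of magnitude below the bound.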
There’s one mathematical condition that doesn’t at first glance seem to have any approximate meaning: measurability. In most of mathematical statistics any set or function anyone could reasonably be interested in is measurable. In fact, there is a model of set theory (Solovay’s model) in which all sets of real numbers are measurable. While it doesn’t have the Axiom of Choice, it has a weaker version that’s enough for many theoretical purposes. If you care about measurability, you have to care about the difference between the Axiom of Choice and the Axiom of Dependent Choice. If you’re not a set theorist, this is hard. It can feel as if you’re repeating the equivalent of “Four legs good; two legs bad.”
In a sense, measurability problems are Pythagoras’s fault. He (or his followers) found that there wasn’t a number whose square was 2. Later researchers found other gaps between numbers. Even after you decide that all the powers of rational numbers and all the solutions of polynomial equations count as numbers, you still have more gaps than numbers. We’ve decided to just decree that the gaps are numbers too. ‘Constructions’ (and I use that word in the weakest possible sense) of non-measurable sets rely on the way these gaps now constitute nearly all numbers. A non-measurable set or non-measurable function is one that treats gap numbers differently from numbers that have some definite reason for existence. It can’t be quite as simple as that, because the Axiom of Constructibility implies non-measurable sets, but it seems like something in that direction. Because we have too many numbers, we have too many sets of numbers, and they can’t all be measurable. Solovay’s model solves this problem by saying that a lot of the sets (including all the non-measurable ones) aren’t really sets, which emphasizes how difficult it’s going to be to translate this into something helpful in statistics.
More evidence that measurability may not mean anything in most contexts comes from Littlewood’s three principles for heuristics in real analysis. The first principle is that a measurable set is ‘almost’ a finite union of intervals and the second principle is that an integrable measurable function is ‘almost’ continuous: that is, the practical approximations to measurability are basically the same as the practical approximations to continuity. Measurability is heuristically just a slightly weaker version of continuity. That’s more or less what Robins and Ritov argue in their 1997 paper “Towards a curse-of-dimensionality-appropriate asymptotics for semiparametric models.” Their point is that data in high-dimensional spaces is always going to be sparse, so that continuity doesn’t buy you anything at reasonable sample sizes, and measurability is the right model for what you can really assume. Even so, they are using ‘measurable’ to mean ‘basically anything’.
The only examples I know of where measurability fails and you care are in really, really big spaces. Suppose you have a random number uniformly distributed on \([0,1]\) and consider the random empirical distribution function that results. There are good reasons why you might want to use the uniform metric on distribution functions – taking advantage of the Glivenko–Cantelli theorem – but if you do, the map from random numbers to random distribution functions is not measurable. Because the uniform metric on distribution functions doesn’t care which \(x\)s are close to each other, you can find an open set of distribution functions with jumps at any set of values. If the map were measurable, this would imply all subsets of \([0,1]\) were measurable, and that won’t fly.
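The way the uniform metric ignores closeness of the underlying points is easy to see with a single observation: the empirical distribution function of one point \(u\) jumps from 0 to 1 at \(u\), and for any two distinct points the sup distance between the two step functions is exactly 1, no matter how close the points are. A small sketch (the function name is mine, purely illustrative):

```python
import numpy as np

def ecdf_one_obs(u, grid):
    """Empirical CDF of the single observation u, evaluated on grid."""
    return (grid >= u).astype(float)

grid = np.linspace(0, 1, 100001)
u, v = 0.5, 0.5 + 1e-4                 # two very close sample points
d = np.max(np.abs(ecdf_one_obs(u, grid) - ecdf_one_obs(v, grid)))
print(d)   # sup distance is 1 even though |u - v| = 0.0001
```

In the uniform metric the map \(u \mapsto F_u\) is effectively discrete, so its preimages can pick out arbitrary subsets of \([0,1]\) – which is where the measurability failure comes from.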
In this example measurability is a property that random distribution functions have under the Skorohod topology but not under the uniform topology. Potentially that gives us a lever to find a way to say something about what sort of property it is and why we should care. But, after all this, I’m still not sure whether measurability is sometimes a useful approximate property or whether it’s a purely formal property. Math is hard.