Rasmus Bååth wrote a post using Approximate Bayesian Computation to estimate a posterior distribution for Karl’s socks. What he didn’t consider was the impact of publication bias. In order for us to see the tweet, it was not only necessary that Karl’s first 11 socks were distinct, it was also necessary that he found this remarkable, and, probably, that no-one he follows on Twitter had made a similar laundry-related observation at any recent time. Now we’ve seen his socks, other laundry data will face a higher barrier to publication.
I wrote about related issues in March, but the basic point is that the likelihood principle doesn’t save you, as a reader, from having to model the publication process. Your data are other people’s reports, so your likelihood involves their reporting behaviour.
In this particular example, publication bias appears not to be very strong. Based on the actual number of socks involved (21 pairs, 3 singletons), the probability of the first 11 socks being distinct is about 0.23. The relative sparsity of published reports of the sock-uniqueness phenomenon imply that either not many people wash their socks, or that the decision to tweet about laundry is determined by factors other than the level of surprise in the data.
(the reference in the title is to a paper by John Bell explaining Bell’s Inequality and the incompatibility of local hidden variable theories with quantum mechanics. Since the paper was for a philosophy conference, it’s fairly accessible)