Space is big. Really big. You just won’t believe how vastly, hugely, mind-bogglingly big it is. I mean, you may think it’s a long way down the road to the chemist, but that’s just peanuts to space. (The Hitchhiker’s Guide to the Galaxy)
There’s a simple simulation that I used in stat computing class last year and for Rob Hyndman’s working group last week. Simulate data uniformly on a \(p\)-dimensional hypercube \([0,\,1]^p\) and compute nearest-neighbour distances.
Obviously these distances increase with \(p\), since the squared Euclidean distance is a sum of non-negative terms. For two independent uniform points, each coordinate contributes \(E(X_j-Y_j)^2=1/6\), so the mean squared distance is \(p/6\) and the typical interpoint distance grows like \(\sqrt{p}\). I’ll use \(\sqrt{p}/3\), the exact mean distance when \(p=1\), as a benchmark. How does the nearest-neighbour distance compare?
> library(FNN) # knn.dist() here is assumed to come from the FNN package
> x1<-runif(5e4)
> x2<-matrix(runif(5e4*2),ncol=2)
> x10<-matrix(runif(5e4*10),ncol=10)
> x100<-matrix(runif(5e4*100),ncol=100)
> system.time(d1<-knn.dist(x1,k=1))
user system elapsed
0.044 0.002 0.045
> system.time(d2<-knn.dist(x2,k=1))
user system elapsed
0.066 0.002 0.088
> system.time(d10<-knn.dist(x10,k=1))
user system elapsed
4.005 0.010 3.993
> system.time(d100<-knn.dist(x100,k=1))
user system elapsed
705.281 0.996 702.481
> mean(d1/sqrt(1)/(1/3))
[1] 2.994883e-05
> mean(d2/sqrt(2)/(1/3))
[1] 0.004764815
> mean(d10/sqrt(10)/(1/3))
[1] 0.306989
> mean(d100/sqrt(100)/(1/3))
[1] 0.9269603
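Before reading too much into those means, it’s worth checking that they aren’t hiding a lot of spread. A minimal sketch, reusing the d1, d2, d10 and d100 objects above (the 1%/50%/99% quantiles are an arbitrary choice):

# quantiles of the nearest-neighbour distance, scaled by the sqrt(p)/3 benchmark
qs <- c(0.01, 0.5, 0.99)
rbind(p1   = quantile(d1   / (sqrt(1)   / 3), qs),
      p2   = quantile(d2   / (sqrt(2)   / 3), qs),
      p10  = quantile(d10  / (sqrt(10)  / 3), qs),
      p100 = quantile(d100 / (sqrt(100) / 3), qs))

In 100 dimensions even the lower quantile should be a sizeable fraction of the benchmark: the distribution is tight, not just centred high.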
The typical nearest-neighbour distance for 50,000 uniform points in \([0,\,1]\) is 0.00003 times the mean interpoint distance, which is sort of what you’d expect. In \([0,\,1]^2\) it’s 0.005 times the mean distance, which is not too surprising.
In \([0,\,1]^{100}\), though, the nearest-neighbour distance is over 90% of the \(\sqrt{p}/3\) benchmark, and about three-quarters of the mean interpoint distance. If you haven’t seen this sort of computation before, that seems outrageous. Roughly speaking, in 100-dimensional space, if you have 50,000 points in a unit cube they are all about equidistant from each other!
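You can check the ‘all about equidistant’ claim directly by looking at all the pairwise distances in a subsample. A sketch (the subsample size of 2,000 is arbitrary, just to keep the distance matrix manageable):

# all ~2 million pairwise distances among 2,000 of the 100-dimensional points
dd <- as.vector(dist(x100[1:2000, ]))
summary(dd)       # min, median and max should all sit near sqrt(100/6), about 4.1
sd(dd) / mean(dd) # relative spread: should be only a few percent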
If you think of the squared interpoint distance as a sum, this makes sense in terms of the Law of Large Numbers and the Central Limit Theorem. The squared Euclidean distance between two random points is a sum of \(p\) independent terms, so it should be close to its expected value, differing from it by a relative amount of order \(O_p(1/\sqrt{p})\). If we have \(n\) (roughly independent) random variables like that, and they don’t have heavier tails than a Gaussian (which they don’t), the worst departure from the expected value should scale like \[O_p\left(\sqrt{\frac{\log n}{p}}\right).\] In other words, nearest neighbours can’t be very much closer than the average interpoint distance.
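A minimal check of that scaling (my own sketch; rel_spread is just a helper name here, and the dimensions and the 10,000 random pairs are arbitrary choices):

# relative spread of the distance between independent random pairs of uniform points
rel_spread <- function(p, npairs = 1e4) {
  x <- matrix(runif(npairs * p), ncol = p)
  y <- matrix(runif(npairs * p), ncol = p)
  d <- sqrt(rowSums((x - y)^2))
  sd(d) / mean(d)
}
sapply(c(1, 10, 100, 1000), rel_spread) # should shrink roughly like 1/sqrt(p)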
Many ‘black-box’ prediction methods are based on using the outcome values at nearby points to predict the outcome at a new point. This can’t work in high-dimensional space; there are no nearby points.
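As a toy illustration of why that matters for prediction, here’s a sketch (my own example, not from the analysis above): the outcome depends only on the first coordinate, the other \(p-1\) coordinates are pure noise, and knn.reg() from the FNN package does the k-nearest-neighbour regression.

# test-set error of k-NN regression as useless dimensions are added
knn_mse <- function(p, n = 2000, k = 5) {
  xtr <- matrix(runif(n * p), ncol = p)
  xte <- matrix(runif(n * p), ncol = p)
  f <- function(x) sin(2 * pi * x[, 1]) # the signal lives entirely in coordinate 1
  ytr <- f(xtr) + rnorm(n, sd = 0.1)
  pred <- knn.reg(xtr, xte, ytr, k = k)$pred
  mean((pred - f(xte))^2)
}
sapply(c(1, 2, 10, 100), knn_mse)
# the error should climb towards the variance of the signal (about 0.5)
# as the nearest neighbours stop being meaningfully near in the first coordinate

The point isn’t the particular numbers; it’s that the neighbours such a method averages over stop being informative once the space is big enough.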
In a sense, everyone knows this. Statisticians call it the ‘curse of dimensionality’, and modern sparse regression methods are aimed at bypassing it by not working in high-dimensional spaces. It’s still more dramatic than most people realise, and it’s hard for our intuition to cope: to think of a high-dimensional unit cube or sphere and ‘believe how vastly, hugely, mind-bogglingly big it is’.