Probabilities not bounded away from zero

We have a population or cohort of $N$ people divided into $H$ sampling strata, with a sample of size $n_h$ taken from the population of size $N_h$ in stratum $h$. Let $\pi_{ih}$ be the sampling probability for person $i$ in stratum $h$. When we do asymptotics we usually assume the $\pi_{ih}$ are bounded away from zero. That’s not ideal for, say, case-control studies of rare diseases, where we might want asymptotic approximations based on the case incidence being small (i.e., converging to zero).

In the situations where I’m interested in $\pi_{ih}$ being small, it’s usually small for a whole stratum. Since sampling is independent between strata, there should be a central limit theorem separately for each stratum, and we should be able to add up the limiting Normal approximations for the stratum totals to get a Normal limit for the population total estimate and the population mean estimate.

To formalise this, suppose $n_h\to\infty$ for every stratum (so that asymptotics makes sense), and that $\pi_{ih}N_h/n_h$ is bounded above and below (away from zero), so that within each stratum the sampling probability has a finite (relative) range. As a simple example, we might have a case stratum with $\pi_{i1}$ close to 1 and a control stratum with very small $\pi_{ih}$.

[Update: As Stas Kolenikov points out, I’m assuming the same strata are small or large along the infinite sequence, so I need something like $n_{h_1}/(n_{h_1}+n_{h_2})\to c_{h_1,h_2}\in[0,1]$ for each pair of strata. This isn’t a meaningful loss of generality since (a) the infinite sequence is an analytic fiction and we might as well set it up for our maximum convenience; and (b) even without assuming anything, every subsequence will have a subsubsequence along which the condition holds.]

By standard results, $n_h^{1/2}(\bar X_{\cdot h}-\mu_h)\stackrel{d}{\to}N(0,\sigma_h^2)$ for each stratum $h$, and by the Skorohod representation theorem we can find an $H$-variate Normal vector $\{Z_h\}_{h=1}^H$ with
$$n_h^{1/2}(\bar X_{\cdot h}-\mu_h)\stackrel{p}{\to}Z_h$$
(possibly on a different probability space), to get
$$\bar X_{\cdot h}=\mu_h+n_h^{-1/2}Z_h+o_p(n_h^{-1/2})$$
The $Z_h$ will be independent, with mean zero; write $\sigma_h^2$ for the variances.

[Update: Note that $\sigma_h^2$ is just $\mathrm{var}[Z_h]$, nothing more fundamental. Under stratified random sampling, $\sigma_h^2$ will be $\mathrm{var}[X]$ in stratum $h$ multiplied by the ‘finite population correction’ $(N_h-n_h)/N_h$, but under other sampling schemes it will be something else.]
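As a small check of the finite population correction under simple random sampling without replacement, the sketch below enumerates every possible sample from a tiny made-up stratum. It compares the exact variance of the sample mean with $S^2(N_h-n_h)/(N_h n_h)$, where $S^2$ is the stratum variance with divisor $N_h-1$. The stratum values and the function names are illustrative assumptions, not part of the argument above.

```python
from itertools import combinations
from statistics import mean, pvariance, variance


def exact_var_of_sample_mean(population, n):
    """Variance of the sample mean over all equally likely samples of size n
    drawn without replacement (simple random sampling)."""
    sample_means = [mean(s) for s in combinations(population, n)]
    return pvariance(sample_means)


def fpc_var_of_sample_mean(population, n):
    """Textbook formula: S^2 / n * (N - n) / N, with S^2 using divisor N - 1."""
    N = len(population)
    return variance(population) / n * (N - n) / N


stratum = [1.0, 2.0, 4.0, 7.0, 8.0, 11.0]  # hypothetical stratum values
exact = exact_var_of_sample_mean(stratum, 3)
formula = fpc_var_of_sample_mean(stratum, 3)
```

The two numbers agree up to floating-point error. Multiplying either by $n_h$ gives the variance of $n_h^{1/2}(\bar X_{\cdot h}-\mu_h)$, which is the quantity playing the role of $\sigma_h^2$ in this special case.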

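Now,
$$\bar X_{\cdot\cdot}=\frac{1}{N}\sum_{h=1}^H N_h\bar X_{\cdot h}$$
giving
$$\bar X_{\cdot\cdot}=\sum_{h=1}^H\left[\frac{N_h}{N}\mu_h+\frac{N_h n_h^{-1/2}}{N}Z_h+o_p\left(\frac{N_h n_h^{-1/2}}{N}\right)\right]=\mu+\left(\sum_{h=1}^H\frac{N_h n_h^{-1/2}}{N}Z_h\right)+o_p\left(\sum_{h=1}^H\frac{N_h}{N\sqrt{n_h}}\right)$$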
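The decomposition above suggests the usual plug-in calculation: weight each stratum mean by $N_h/N$, and, because the $Z_h$ are independent, approximate the variance of the combined estimate by $\sum_h (N_h/N)^2\sigma_h^2/n_h$. Here is a minimal sketch, assuming simple random sampling without replacement within each stratum so that $\sigma_h^2$ can be estimated by the sample variance times the finite population correction. The function name and the toy data are hypothetical.

```python
from statistics import mean, variance


def stratified_mean(samples, stratum_sizes):
    """Return (estimate, approximate variance) for the population mean.

    samples: list of lists, the sampled values in each stratum
    stratum_sizes: list of population sizes N_h, in the same order
    Assumes simple random sampling without replacement within each stratum.
    """
    N = sum(stratum_sizes)
    estimate = 0.0
    approx_var = 0.0
    for x, N_h in zip(samples, stratum_sizes):
        n_h = len(x)
        w = N_h / N                                  # stratum weight N_h / N
        sigma2_h = variance(x) * (N_h - n_h) / N_h   # estimate of var[Z_h] under SRS
        estimate += w * mean(x)
        approx_var += w ** 2 * sigma2_h / n_h        # independent strata add up
    return estimate, approx_var


# Toy data: a tiny stratum sampled completely and a large, thinly sampled one.
est, est_var = stratified_mean([[1.0, 3.0], [0.0, 2.0, 4.0]], [2, 1000])
```

In this toy example the fully sampled stratum contributes nothing to the variance, so the approximation is driven entirely by the large stratum.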

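First, suppose $N_h/N$ converges to a non-zero constant for each $h$. Let $n=\min_h n_h$ and define $\mathcal{H}=\{h:\lim n/n_h>0\}$. Writing $n_h^{-1/2}=n^{-1/2}(n/n_h)^{1/2}$,
$$\bar X_{\cdot\cdot}=\mu+\left(\sum_{h=1}^H\frac{N_h n_h^{-1/2}}{N}Z_h\right)+o_p\left(\frac{\max_h N_h}{N\sqrt{\min_h n_h}}\right)=\mu+\left(\sum_{h\in\mathcal{H}}\frac{N_h n^{-1/2}}{N}\left(\frac{n}{n_h}\right)^{1/2}Z_h\right)+\sum_{h\notin\mathcal{H}}o_p(n^{-1/2})+o_p\left(\frac{\max_h N_h}{N\sqrt{n}}\right)=\mu+n^{-1/2}Z+o_p(n^{-1/2})$$

where $Z\sim N(0,\sigma^2)$ with
$$\sigma^2=\lim_{n\to\infty}\sum_{h\in\mathcal{H}}\frac{N_h^2\,n\,\sigma_h^2}{N^2\,n_h}$$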

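Alternatively, for case-control sampling we may have $N_h/N\to 0$ in the case stratum, but we would have the $n_h$ all of the same order, and so of the same order as their total, $n$. The limiting distribution is dominated by the largest strata: define $\mathcal{H}=\{h:\lim N_h/N>0\}$ (which is non-empty as $H$ is finite). Then

$$\bar X_{\cdot\cdot}=\mu+\left(\sum_{h=1}^H\frac{N_h n_h^{-1/2}}{N}Z_h\right)+o_p\left(\sum_{h=1}^H\frac{N_h}{N\sqrt{n_h}}\right)=\mu+\left(\sum_{h\in\mathcal{H}}\frac{N_h n^{-1/2}}{N}\left(\frac{n}{n_h}\right)^{1/2}Z_h\right)+\sum_{h\notin\mathcal{H}}o_p(n^{-1/2})+o_p(n^{-1/2})=\mu+n^{-1/2}Z+o_p(n^{-1/2})$$
where $Z\sim N(0,\sigma^2)$ with
$$\sigma^2=\lim_{n\to\infty}\sum_{h\in\mathcal{H}}\frac{N_h^2\,n\,\sigma_h^2}{N^2\,n_h}$$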
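To see which strata dominate in a case-control setting, the sketch below evaluates the per-stratum terms $N_h^2 n\sigma_h^2/(N^2 n_h)$ that make up $\sigma^2$, with $n$ the total sample size. The cohort sizes, sample sizes, and unit variances are made-up numbers chosen only to mimic a rare disease, and the function name is hypothetical.

```python
def variance_contributions(stratum_sizes, sample_sizes, sigma2):
    """Per-stratum terms N_h^2 * n * sigma_h^2 / (N^2 * n_h), with n = sum of n_h."""
    N = sum(stratum_sizes)
    n = sum(sample_sizes)
    return [
        (N_h ** 2) * n * s2 / (N ** 2 * n_h)
        for N_h, n_h, s2 in zip(stratum_sizes, sample_sizes, sigma2)
    ]


# Hypothetical cohort: 1,000 cases and 999,000 controls, 500 sampled from each,
# and var[Z_h] = 1 in both strata for simplicity.
terms = variance_contributions([1_000, 999_000], [500, 500], [1.0, 1.0])
share_from_cases = terms[0] / sum(terms)
```

With these numbers the case stratum accounts for roughly one millionth of the total, which is the sense in which only the strata with non-vanishing $N_h/N$ matter for the limit.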

Weaker conditions on $N_h$ and $n_h$ are clearly possible: it is only necessary to identify which terms dominate the limiting distribution of $\bar X_{\cdot\cdot}$, since the limiting distribution of estimated stratum totals is always independent $H$-variate Normal under appropriate scaling.