2 min read

Statistical software matters

This is a picture of all the genetic associations found in genome-wide association studies, sorted by chromosome. You can find more detail at the NHGRI GWAS catalog

There are two chromosomes with many fewer associations. One is the Y chromosome. There isn’t much there because there isn’t much there. The other is the X chromosome. There isn’t much there because GWAS took a lot longer to get started for the X chromosome, and that’s partly for software reasons.

One of the important steps in GWAS consortia was imputation. Different research groups used different SNP chips, but they all had pretty good coverage of the genome, and all the sets of results could be imputed for a common set of SNPs – typically 2.5 million from the HapMap project.

The first step in imputation is to estimate haplotypes: you’ve got two copies of each chromosome, and the SNPchips don’t know which allele came from which copy. The software ran into trouble for the X chromosome, because while women have two copies of the X chromosome, men only have one. There were work-arounds, but for a while most people ignored the X chromosome because they (we) were in a hurry.

Another fairly simple software-related obstacle was the handling of X-chromosome inactivation. If you have two X chromosomes, one of them will mostly be switched off, with the choice being effectively random. The patches of inactivation are small enough that most tissues have close to 50:50 representation of each of the chromosomes. For the additive genetic models we mostly used, you can sensibly code the number of copies of an allele on the X chromosome as 0, 1/2, 1 for women and 0,1 for men. But the default in PLINK was the less-desirable 0,1,2 for women and 0,1 for men, leading people to want sex-specific analyses with more work and lower power.

Software isn’t the only issue: there was a lot of reluctance in some groups to pool X-chromosome data for males and females. I think this is misguided, both because of X-chromosome inactivation and because even if alleles on the X-chromosome are more likely to have sex-specific effects, there’s still a lot of ordinary genes there.

Anyway, even now, more than ten years after the Wellcome Trust study there’s still less known from GWAS of the X chromosome than the other 22 major chromosomes. And a definite part of that is availability of software.