Quality Control

Heritability analyses tends to be very sensitive to genotyping errors. Estimates of heritability represent the sum of the contributions of many predictors, so even if the error at individual predictors is small, this can accumulate across predictors to produce a large overall error. Further, estimates of heritability are sensitive to inflation due to population structure and familial relatedness; the former can create artificial correlations between predictors and the phenotype, while the latter causes long-range linkage disequilibrium (meaning that we can no longer be confident that the heritability estimates reflect only the heritabilty contributed by the predictors in the dataset).

Therefore, prior to performing a heritability analysis, it is necessary to perform very careful quality control of samples and predictors. Here we provide some guidelines, followed by ways to test whether your quality control was sufficient.

Please note that the thresholds below are generally suggestions; ideally you should check they are sensible, and change if necessary. For example, to decide an information score threshold, you can first generate a histogram of all scores, and look for a "cliff-edge". We have suggested using 0.95, but if almost all (very few) of the predictors exceed this threshold, then you should consider increasing (reducing) it. For some good advice, see Mike Weale's Chapter on Quality Control.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Filtering of samples:

Our aim is to obtain a set of ancestrally-homogeneous, unrelated samples. Here we explain how we did this when analysing UK Biobank data for our paper Evaluating and improving heritabilty models using summary statistics (Nature Genetics, 2020); for more details, see Supplementary Note 5.

We first identified a set of common (MAF>0.01), high-quality (information score >0.95 and missingness <0.01), autosomal SNPs, that are also present in the 1000 Genome Project, and are in approximate linkage disequilibrium (we suggest Thinning Predictors with options --window-cm 1 and --window-prune 0.05).

Using these SNPs, we projected UK Biobank samples onto the two principal component axes of the 1000 Genome Project data (see Principal Component Analysis for example scripts). This enabled us to identify samples whose genotypes were consistent with those recorded as being "White British". Using these samples (and the same SNPs as above), we created a unweighted kinship matrix (we Calculated Kinships adding the options --ignore-weights YES and --power -1), then removed individuals until no pair remained with kinship greater than the absolute value of the smallest observed (this is the default approach when Filtering Relatedness).
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Filtering of predictors:

Most heritability analyses are designed for common predictors (typically MAF>0.01 or MAF>0.005), as it is not currently clear how best to include rare variants. Similarly, most analyses focus on autosomal predictors, as it is not obvious how best to include sex chromosomes.

Our preferred measure of SNP quality is the information score from imputation, which reflects how closely a SNP's genotypes match those expected given the neighbouring SNPs. We typically exclude SNPs with information score below 0.95 or 0.99. While it is reasonable to also filter SNPs based on other metrics, such as missingness or Hardy-Weinberg Equilibrium, in general we find this is not necessary (at least for common SNPs). Note that some imputation software, such as IMPUTE2, compute multiple information scores, in which case we would filter based on all scores. Alternatively, given genotype probabilities, LDAK is able to compute its own information score, that represents the expected correlation between the recorded genotypes and the true genotypes (see Making Data).
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Testing for inflation due to population structure and familial relatedness:

For each phenotype, we advise computing T1=(h2A+h2B+h2C+h2D-h2SNP)/3, where h2SNP is an estimate of SNP heritability from the whole genome, while h2A, h2B, h2C and h2D are estimates from quarters of the genome. For example, if analysing human data, h2A, h2B, h2C and h2D could be estimates of SNP heritability from Chromosomes 1-3, 4-7, 8-12 and 13-22, respectively.

If the individuals are unrelated, and there is no population structure, then the sum of estimates from each quarter should equal the estimate from the whole genome, and as a result, T1 will be close to zero. By contrast, if there is (substantial) relatedness or structure, then SNPs will be correlated across chromosomes, and each of h2A, h2B, h2C and h2D will capture not just the heritability contributed by SNPs in the corresponding quarter, but also a fraction of heritability contributed by SNPs in other quarters, and thus their sum will be greater than the heritability contributed by all SNPs.

Note that to test whether T1 is significantly greater than zero, it is necessary to estimate its distribution. We did this empirically by repeatedly sampling h2SNP, h2A, h2B, h2C and h2D from their estimated distributions. Here is some sample R code :

#eSNP, eA, eB, eC, eD are the five estimates of SNP heritability
#sSNP, sA, sB, sC, sD are the five corresponding standard deviations

N=10000
rSNP=rnorm(N,eSNP,sSNP);
rA=rnorm(N,eA,sA);
rB=rnorm(N,eB,sB);
rC=rnorm(N,eC,sC);
rD=rnorm(N,eD,sD);

T1samp=(rA+rB+rC+rD-rSNP)/3;
pvalue=mean(T1samp>0);
pvalue;
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Testing for inflation due to genotyping errors:

For each phenotype, we advise computing T2=h2Same-h2Diff, where h2Sameand h2Diff are estimates of SNP heritability (from all SNPs) calculated, respectively, across pairs of individuals in the same genotyping batch, and across pairs of individuals in different genotyping batches. Note that this is only possible when using HE Regression or PCGC Regression and individuals have been genotyped in distinct cohorts. Further, if the phenotype is binary, then each cohort must contain both cases and controls.

Genotyping errors will cause pairs of individuals in the same batch to appear more genetically similar than they actually are, and pairs of individuals in different batches to appear less genetically similar. This is problematic when pairs individuals in the same batch are more phenotypically similar than pairs of individuals in different batches. In this case, h2Same will tend to be higher than h2Diff, and therefore T2 will be positive.

To compute T2, use --subset-number and --subset-prefix when performing HE or PCGC Regression (see Subset Options for more details). As well as calculating T2, LDAK will also report whether it is significantly different to zero.