SNP Heritability

The SNP heritability of a trait is the proportion of phenotypic variation explained by all (common) SNPs. To estimate SNP heritability, the first step is to obtain a tagging file. Either you can use Pre-computed Taggings or Calculate Taggings yourself. This step requires you to choose a Heritability Model. For human traits, we recommend using the BLD-LDAK Model, while for non-human traits, we recommend using the LDAK-Thin Model. The second step is to regress (correctly-formatted) Summary Statistics onto the tagging file, described below.

Always read the screen output, which suggests arguments and estimates memory usage.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

The main argument is --sum-hers <outfile>

This requires the options

--tagfile <taggingfile> - to specify the tagging file

--summary <sumsfile> - to specify the file containing the summary statistics.

Analyses can be sensitive to large-effect loci (particularly, if these are located in regions of extreme linkage disequilibrium, such as the MHC). Previously, we recommended first using --remove-tags <outfile> to identify predictors tagging loci that explain more than 1% of phenotypic variance, then excluding these using --extract <extractfile> and/or --exclude <excludefile>. However, this can now be done more easily using the option --cutoff <float>; for example, to remove predictors that explain more than 1% of phenotypic variance, add --cutoff 0.01 (note that this will not also remove predictors tagging the large-effect loci, but in practice, we find this makes little difference).

LDAK will report an error if the summary statistics file does not provide summary statistics for all predictors in the tagging file. If a relatively small proportion of predictors are affected (say less than 20%), it should be OK to override this error by adding --check-sums NO.

If the summary statistics come from analysing a binary phenotype, then you can use --prevalence <float> and --ascertainment <float> to specify the proportion of cases in the population and in the GWAS; LDAK will then also report estimates of variance explained on the liability scale.

If you require estimates of per-predictor heritabilities (necessary if performing Prediction), add --matrix <matrixfile>, where <matrixfile> is the heritability matrix created by adding --save-matrix YES when Calculating Taggings.

As explained in our paper Evaluating and improving heritability models using summary statistics (Nature Genetics, 2020), LDAK now estimates parameters via maximum likelihood. To instead revert to weighted least-squares regression, use --chisq-solver NO. When estimating parameters, predictors are weighted by 1/uj, where uj is the sum of r2jl across neighbouring predictors (a measure of how well the predictor is tagged by its neighbours). By default, the uj are read from the tagging file, however, alternative values can be provided using the option --alternative-tags <alttagsfile>.

To allow for confounding bias, use either --genomic-control YES (to allow for multiplicative inflation of test statistics) or --intercept YES (to allow for additive inflation of test statistics). However, note that we DO NOT recommend using either (see Publications for our reasoning). Instead, we strongly recommend that you only analyse summary statistics when confident they come from a GWAS that performed careful quality control.

The main output files are <outfile>.hers and <outfile>.extra. In particular, the final row of <outfile>.hers provides the estimate of SNP heritability, while <outfile>.extra contains loglSS, the approximate log likelihood, for the null and alternative model. Most of the remaining files provide different ways of measuring the relative contribution of different categories (partitions or annotations) and are explained in Heritability Enrichments.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Example:

Here we use the binary PLINK files human.bed, human.bim and human.fam, and the phenotype quant.pheno from the Test Datasets, as well as the tagging files LDAK-Thin.tagging and BLD-LDAK.tagging created in the example for Calculate Taggings.

In order to estimate SNP heritability, we require summary statistics (i.e., the results from regressing the phenotype on each SNP individually). We can obtain these by running

./ldak.out --linear quant --bfile human --pheno quant.pheno

The summary statistics are saved in quant.summaries (already in the format required by SumHer). For more details on this command, see Single-Predictor Analysis.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

First we estimate SNP heritability assuming the LDAK-Thin Model (as per our Recommendations, we assume no confounding bias).

./ldak.out --sum-hers snpher --summary quant.summaries --tagfile LDAK-Thin.tagging

The estimated SNP heritability is 0.75 (SD 0.13), saved in snpher.hers. Note that LDAK warns us that there are large-effect loci. Usually, we should follow its advice and add --cutoff 0.01, in order to exclude SNPs that individually explain at least 1% of phenotypic variance (we have not done so here, but only because this is a toy example).

Next we estimate SNP heritability assuming the BLD-LDAK Model.

./ldak.out --sum-hers snpher2 --summary quant.summaries --tagfile BLD-LDAK.tagging

The estimated SNP heritability is 0.75 (SD 0.24), saved in snpher2.hers (again, we ignore the warning to exclude large-effect loci, but only because this is a toy example).

To compare the LDAK-Thin and BLD-LDAK Models, we can examine the model fits (measured using our approximate log likelihood loglSS) in the files snpher.extra and snpher2.extra. First it is important to confirm the log likelihoods are the same (in this case, both -854), as this indicates that the two model fits were computed using the same predictors and predictor weightings (and thus ensures a fair comparison). Next, we see that the alternative likelihood for the one-parameter LDAK-Thin Model is -826 (i.e., 28 above the null), while the alternative likelihood for the 66-parameter BLD-LDAK Model is -854 (41 above the null).

While this indicates the BLD-LDAK Model fits better, the improvement over the LDAK-Thin Model (13) is small relative to its extra complexity (it uses 65 more parameters), suggesting the improvement is largely due to chance. More formally, we recommend you prefer the results from the heritability model with lowest Akaike Information Criterion (AIC), equal to 2K-2logl, where K is the number of parameters and logl is the (approximate) log likelihood. Here, the LDAK-Thin and BLD-LDAK Models have AIC -1495 and -1650, respectively, indicating that the LDAK-Thin Model is to be preferred. Note that this conclusion reflects that this is a toy example; for real datasets, we generally find the BLD-LDAK Model results in substantially lower AIC than the LDAK-Thin Model.