Per-Predictor Heritabilities

Please note that we no longer recommend using per-predictor heritabilities when constructing prediction models. Although their use can produce more accurate models, the improvement is generally slight, and it is simpler to skip this step.

This page explains how to estimate per-predictor heritabilities, which used to be the first step when constructing a prediction model using Ridge-Predict, Bolt-Predict, BayesR-Predict or MegaPRS. These instructions require you to specify a Heritability Model; if analysing human SNP data, we recommend using the BLD-LDAK Model, while if analysing non-human or non-SNP data, we recommend using the LDAK-Thin Model (these models are formally defined in Technical Details).

Always read the screen output, which suggests arguments and estimates memory usage.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Preparation:

You require genetic data (to calculate the tagging file and heritability matrix) and summary statistics (to estimate the heritability contributed by each predictor).

If you are analysing individual-level data, you will already have genetic data. You can obtain summary statistics using the command --linear <outfile>. For details on this command, see the example below or Single-Predictor Analysis.

If you are analysing summary statistics, you should make sure that these are in the format required by LDAK (see Summary Statistics for details), and that you have a (well-matched) Reference Panel.

If you are analysing human SNP data, we recommend that your predictor names are in the form Chr:BP using genomic positions from the GRCh37/hg19 assembly (this is the form used by the BLD-LDAK Annotations).
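If your predictor names are not already in this form, one option is to rewrite the second column of the .bim file. The following is a minimal sketch, assuming the standard six-column PLINK .bim layout (chromosome, name, genetic distance, base-pair position, A1, A2) and that the positions are already on GRCh37/hg19. The function name rename_to_chr_bp and the output file name are hypothetical.

```shell
# Hypothetical helper: rewrite predictor names in a PLINK .bim file as Chr:BP.
# Assumes column 1 is the chromosome and column 4 the base-pair position
# (GRCh37/hg19); no conversion between assemblies is performed.
rename_to_chr_bp() {
    awk '{ $2 = $1 ":" $4; print }' "$1" > "$2"
}

# Example usage (writes a new file rather than overwriting human.bim):
# rename_to_chr_bp human.bim human.chrbp.bim
```

The predictor names in the summary statistics would need to be changed in the same way, so that the two sets of names continue to match.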
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Calculate a tagging file and heritability matrix:

Having decided which heritability model to use, you should calculate the corresponding tagging file and heritability matrix using the command --calc-tagging <outfile>. For details on this command, see the example below or Calculate Taggings. Note that you should use --extract <extractfile> to reduce to predictors for which you have summary statistics, and you should add --save-matrix YES so that LDAK saves the heritability matrix.
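One way to build the <extractfile> is to list the predictor names found in the summary statistics. The following is a minimal sketch, assuming the summary statistics have a header row that includes a column named Predictor, and that the extractfile is a plain list with one predictor name per line. The function name make_extractfile and the file name quant.predictors are hypothetical.

```shell
# Hypothetical helper: list the predictors present in a summary statistics file.
# Assumes a header row containing a column named "Predictor".
make_extractfile() {
    awk 'NR == 1 {
             for (i = 1; i <= NF; i++) if ($i == "Predictor") col = i
             if (!col) { print "no Predictor column found" > "/dev/stderr"; exit 1 }
             next
         }
         { print $col }' "$1" > "$2"
}

# Example usage:
# make_extractfile quant.summaries quant.predictors
# ./ldak.out --calc-tagging gcta --bfile human --power -1 --save-matrix YES --extract quant.predictors
```

Looking up the column by name, rather than assuming it is the first column, keeps the sketch working if the columns appear in a different order.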

Note that if you are analysing individual-level data for tens of thousands of samples, it is not necessary to use all of these when calculating the tagging file and heritability matrix. Instead we suggest using --keep <keepfile> to specify 5000 randomly-selected samples, which will substantially reduce computational demands.
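The <keepfile> can be created by sampling lines from the .fam file. The following is a minimal sketch, assuming that the first two columns of the .fam file are the sample IDs, that the keepfile lists these two IDs per line, and that GNU shuf is available. The function name make_keepfile and the file name keep.5000 are hypothetical.

```shell
# Hypothetical helper: write a keepfile of n randomly chosen samples.
# Arguments: <famfile> <n> <keepfile>
# Assumes the first two columns of the .fam file identify each sample.
# shuf samples without replacement; if the file has fewer than n lines,
# all samples are kept.
make_keepfile() {
    shuf -n "$2" "$1" | awk '{ print $1, $2 }' > "$3"
}

# Example usage:
# make_keepfile human.fam 5000 keep.5000
# ./ldak.out --calc-tagging ldak.thin --bfile human --weights weights.thin --power -.25 --save-matrix YES --keep keep.5000
```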
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Estimate heritabilities:

Having calculated the tagging file and corresponding heritability matrix, you should regress the summary statistics onto these using the command --sum-hers <outfile>. For details on this command, see the example below or SNP Heritability. Note that you must add --matrix <matrixfile>, where <matrixfile> is the heritability matrix, so that LDAK calculates the per-predictor heritabilities.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Example:

Here we use the binary PLINK files human.bed, human.bim and human.fam, and the phenotype quant.pheno from the Test Datasets. We show how to estimate per-predictor heritabilities assuming the GCTA, LDAK-Thin and BLD-LDAK Models (as a reminder, we recommend using the BLD-LDAK Model if analysing human SNP data, else using the LDAK-Thin Model).

As we are analysing individual-level data, it is necessary to first obtain summary statistics. For this we can use the command

./ldak.out --linear quant --bfile human --pheno quant.pheno

The summary statistics are saved in quant.summaries (already in the format required by LDAK). For more details on this command, see Single-Predictor Analysis.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

1 - Estimate per-predictor heritabilities assuming the GCTA Model.

Run the commands

./ldak.out --calc-tagging gcta --bfile human --power -1 --save-matrix YES
./ldak.out --sum-hers gcta --tagfile gcta.tagging --summary quant.summaries --matrix gcta.matrix

The per-predictor heritabilities are saved in gcta.ind.hers.
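Because the second command reads both gcta.tagging and gcta.matrix, it can be useful to confirm that the first command produced both files before continuing. The following is a minimal sketch; the function name check_tagging_outputs is hypothetical, and the file names simply follow the <outfile>.tagging and <outfile>.matrix pattern used in the commands above.

```shell
# Hypothetical helper: confirm that <stem>.tagging and <stem>.matrix
# both exist and are non-empty. Returns non-zero if either is missing.
check_tagging_outputs() {
    missing=0
    for f in "$1.tagging" "$1.matrix"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            missing=1
        fi
    done
    [ "$missing" -eq 0 ]
}

# Example usage (only run --sum-hers if both files are present):
# check_tagging_outputs gcta && ./ldak.out --sum-hers gcta --tagfile gcta.tagging --summary quant.summaries --matrix gcta.matrix
```

A missing .matrix file would most likely mean that --save-matrix YES was omitted from the --calc-tagging command.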
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

2 - Estimate per-predictor heritabilities assuming the LDAK-Thin Model.

To assume the LDAK-Thin Model, we must first create a weightsfile that gives weighting one to the predictors that remain after thinning for duplicates, and weighting zero to those removed. This can be achieved using the commands

./ldak.out --thin thin --bfile human --window-prune .98 --window-kb 100
awk < thin.in '{print $1, 1}' > weights.thin

Now we can run the commands

./ldak.out --calc-tagging ldak.thin --bfile human --weights weights.thin --power -.25 --save-matrix YES
./ldak.out --sum-hers ldak.thin --tagfile ldak.thin.tagging --summary quant.summaries --matrix ldak.thin.matrix

The per-predictor heritabilities are saved in ldak.thin.ind.hers.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

3 - Estimate per-predictor heritabilities assuming the BLD-LDAK Model.

To assume the BLD-LDAK Model, we first download the files bld1, bld2, ..., bld64 from the BLD-LDAK Annotations. Then we calculate the LDAK Weightings and rename them bld65, using the commands

./ldak.out --cut-weights sections --bfile human
./ldak.out --calc-weights-all sections --bfile human
mv sections/weights.short bld65
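
Before calculating the tagging file, it may be worth confirming that all 65 annotation files are present in the working directory. The following is a minimal sketch; the function name check_annotations is hypothetical, and it assumes the files are named by appending the numbers 1 to 65 to the prefix bld, as above.

```shell
# Hypothetical helper: check that <prefix>1 ... <prefix>N all exist and are non-empty.
# Arguments: <prefix> <N>. Returns non-zero if any file is missing.
check_annotations() {
    missing=0
    i=1
    while [ "$i" -le "$2" ]; do
        if [ ! -s "$1$i" ]; then
            echo "missing or empty: $1$i" >&2
            missing=$((missing + 1))
        fi
        i=$((i + 1))
    done
    [ "$missing" -eq 0 ]
}

# Example usage:
# check_annotations bld 65
```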

Finally, we can run the commands

./ldak.out --calc-tagging bld.ldak --bfile human --power -.25 --annotation-number 65 --annotation-prefix bld --save-matrix YES
./ldak.out --sum-hers bld.ldak --tagfile bld.ldak.tagging --summary quant.summaries --matrix bld.ldak.matrix

The per-predictor heritabilities are saved in bld.ldak.ind.hers.
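
As a quick sanity check, the per-predictor heritabilities can be counted and summed. The following is a minimal sketch, under the assumption (not confirmed on this page) that each row of the .ind.hers file gives the predictor name in the first column and its heritability in the second; rows whose second column is not numeric, such as a header, are skipped. The function name summarise_ind_hers is hypothetical.

```shell
# Hypothetical helper: summarise a per-predictor heritability file.
# ASSUMPTION: column 1 is the predictor name and column 2 its heritability.
# Rows with a non-numeric second column (e.g. a header) are ignored.
summarise_ind_hers() {
    awk '$2 ~ /^[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$/ {
             n++; total += $2; if ($2 < 0) neg++
         }
         END { printf "predictors=%d negative=%d total=%.6f\n", n, neg, total }' "$1"
}

# Example usage:
# summarise_ind_hers bld.ldak.ind.hers
```

If the file layout differs from the assumption above, the column index in the awk script would need to be adjusted.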