Per-Predictor Heritabilities

The first step in creating a prediction model is to estimate the heritability contributed by each predictor under the chosen Heritability Model. These estimates are then used to construct the prior distributions when estimating effect sizes using Ridge Regression, Bolt-Predict, BayesR or MegaPRS.

Below we explain how to estimate per-predictor heritabilities assuming the GCTA, LDAK-Thin and BLD-LDAK Models. If analysing human SNP data, we recommend using the BLD-LDAK Model, while if analysing non-human or non-SNP data, we recommend using the LDAK-Thin Model. These models are formally defined in Technical Details.
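
As a reminder of what the options used in the example control, the GCTA and LDAK-Thin Models can be summarised, as we understand them, by the expected heritability of predictor j. The symbols here are introduced for illustration only; Technical Details remains the authoritative definition.

```latex
\mathbb{E}\left[h_j^2\right] \;\propto\; w_j \,\bigl[f_j(1-f_j)\bigr]^{1+\alpha}
```

Here f_j is the allele frequency of predictor j, w_j is its weighting, and \alpha is the value supplied to --power. Under this reading, the GCTA Model takes w_j = 1 (--ignore-weights YES) and \alpha = -1, while the LDAK-Thin Model takes w_j equal to one or zero according to thinning and \alpha = -0.25. The BLD-LDAK Model additionally depends on the annotations.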

Always read the screen output, which suggests arguments and estimates memory usage.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Preparation:

Estimating per-predictor heritabilities requires genetic data (to calculate the tagging file) and summary statistics (to estimate the heritability contributed by each predictor).

If you are analysing individual-level data, you will already have genetic data. You can obtain summary statistics using the command --linear <outfile>. For details on this command, see the example below or Single-Predictor Analysis.

If you are analysing summary statistics, you should make sure that these are in the format required by LDAK (see Summary Statistics for details), and that you have a (well-matched) Reference Panel.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Calculate taggings:

Having decided which heritability model to use, you should calculate the corresponding tagging file using the command --calc-tagging <outfile>. For details on this command, see the example below or Calculate Taggings. Note that you must add --save-matrix YES so that LDAK saves the heritability matrix.

Note that when analysing individual-level data, you will likely have tens of thousands of samples. It is not necessary to use all of these when calculating taggings. Instead we suggest using --keep <keepfile> to provide a file containing, say, the names of 5000 randomly-selected samples, which will substantially reduce computational demands.
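
One way to build such a keepfile is sketched below. It assumes a Unix shell with GNU shuf available, and that samples are identified by the first two columns of the .fam file. The toy .fam file and the names toy.fam and toy.keep are placeholders for illustration; with real data you would sample from your own .fam file and request 5000 rows.

```shell
# Toy .fam file standing in for real data (20 samples, six columns per row)
workdir=$(mktemp -d)
for i in $(seq 1 20); do
  echo "FAM$i IND$i 0 0 1 -9"
done > "$workdir/toy.fam"

# Pick n rows at random without replacement and keep the two ID columns
n=5   # would be 5000 for a real dataset
shuf -n "$n" "$workdir/toy.fam" | awk '{print $1, $2}' > "$workdir/toy.keep"

# Report how many samples were selected
wc -l < "$workdir/toy.keep"
```

The resulting file can then be supplied through --keep <keepfile> when calculating taggings.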

When analysing summary statistics, you can use all samples in the Reference Panel (when subsequently using MegaPRS to estimate effect sizes, it might be necessary to use only a subset of samples, but that is not an issue here).
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Estimate heritabilities:

Having calculated the tagging file, you should regress the summary statistics onto this using the command --sum-hers <outfile>. For details on this command, see the example below or SNP Heritability. Note that you must add --matrix <matrixfile>, where <matrixfile> is the heritability matrix, so that LDAK calculates the per-predictor heritabilities.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Example:

Here we use the binary PLINK files human.bed, human.bim and human.fam, and the phenotype quant.pheno from the Test Datasets. As we are analysing individual-level data, it is necessary to first obtain summary statistics. For this we can use the command

./ldak.out --linear quant --bfile human --pheno quant.pheno

The summary statistics are saved in quant.summaries (already in the format required by LDAK). For more details on this command, see Single-Predictor Analysis.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

1 - Estimate per-predictor heritabilities assuming the GCTA Model.

To calculate a tagging file under the GCTA Model, we use the options --ignore-weights YES and --power -1. We then regress the summary statistics onto the tagging file using --sum-hers.

./ldak.out --calc-tagging gcta --bfile human --ignore-weights YES --power -1 --window-cm 1 --save-matrix YES
./ldak.out --sum-hers gcta --tagfile gcta.tagging --summary quant.summaries --matrix gcta.matrix

The per-predictor heritabilities are saved in gcta.ind.hers.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

2 - Estimate per-predictor heritabilities assuming the LDAK-Thin Model.

To assume the LDAK-Thin Model, we must first create a weightsfile that gives weighting one to the predictors that remain after thinning for duplicates, and weighting zero to those removed. This can be achieved using the commands

./ldak.out --thin thin --bfile human --window-prune .98 --window-kb 100
awk < thin.in '{print $1, 1}' > weights.thin
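
To make the awk step concrete, the sketch below applies it to a made-up list of retained predictors. The predictor names and the temporary directory are invented for illustration; in the example above, thin.in is written by --thin.

```shell
# Made-up list of predictors that survived thinning (one name per line)
demo=$(mktemp -d)
printf 'rs1001\nrs1002\nrs1003\n' > "$demo/thin.in"

# Write each retained predictor followed by a weighting of 1
awk '{print $1, 1}' "$demo/thin.in" > "$demo/weights.thin"

# Show the resulting two-column weights file
cat "$demo/weights.thin"
```

Each retained predictor is listed with weighting one; predictors removed by thinning do not appear in the file.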

Now when calculating the tagging file, we use the options --weights <weightsfile> and --power -.25. We then regress the summary statistics onto the tagging file using --sum-hers.

./ldak.out --calc-tagging ldak.thin --bfile human --weights weights.thin --power -.25 --window-cm 1 --save-matrix YES
./ldak.out --sum-hers ldak.thin --tagfile ldak.thin.tagging --summary quant.summaries --matrix ldak.thin.matrix

The per-predictor heritabilities are saved in ldak.thin.ind.hers.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

3 - Estimate per-predictor heritabilities assuming the BLD-LDAK Model.

To assume the BLD-LDAK Model, we first download the files bld1, bld2, ..., bld64 from the BLD-LDAK Annotations. Then we calculate the LDAK Weightings and rename them bld65, using the commands

./ldak.out --cut-weights sections --bfile human
./ldak.out --calc-weights-all sections --bfile human
mv sections/weights.short bld65
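
Before calculating taggings, it can be worth confirming that all 65 annotation files are in place. The sketch below is one way to do this; check_annotations is a hypothetical helper, not part of LDAK, and the placeholder files exist only to demonstrate it.

```shell
# Hypothetical helper (not part of LDAK): list which of <prefix>1 ... <prefix>N
# are missing or empty, then print a one-line summary
check_annotations() {
  prefix=$1
  number=$2
  missing=0
  for i in $(seq 1 "$number"); do
    if [ ! -s "${prefix}${i}" ]; then
      echo "missing or empty: ${prefix}${i}"
      missing=$((missing + 1))
    fi
  done
  echo "$missing of $number annotation files missing or empty"
}

# Demonstration with placeholder files: bld1 ... bld64 exist, bld65 does not
annot_dir=$(mktemp -d)
for i in $(seq 1 64); do
  echo "placeholder" > "$annot_dir/bld$i"
done
report=$(check_annotations "$annot_dir/bld" 65)
echo "$report"
```

With real data, the helper would be called as check_annotations bld 65 in the directory holding the downloaded annotations and the renamed weightings.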

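Finally, when calculating the tagging file, we use the options --ignore-weights YES and --power -.25, and provide the 65 annotations using --annotation-number 65 and --annotation-prefix bld. We then regress the summary statistics onto the tagging file using --sum-hers.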

./ldak.out --calc-tagging bld.ldak --bfile human --ignore-weights YES --power -.25 --annotation-number 65 --annotation-prefix bld --window-cm 1 --save-matrix YES
./ldak.out --sum-hers bld.ldak --tagfile bld.ldak.tagging --summary quant.summaries --matrix bld.ldak.matrix

The per-predictor heritabilities are saved in bld.ldak.ind.hers.