June 2023: Please note that it is only necessary to compute per-predictor heritabilities when assuming a multi-parameter heritability model (e.g., the BLD-LDAK Model). In general, we recommend using the Human Default heritability model.
This page explains how to estimate per-predictor heritabilities, which is the first step when constructing a prediction model using Ridge-Predict, Bolt-Predict, BayesR-Predict or MegaPRS. These instructions require you to specify a Heritability Model; if analysing human SNP data, we recommend using the BLD-LDAK Model, while if analysing non-human or non-SNP data, we recommend using the LDAK-Thin Model (these models are formally defined in Technical Details).
Always read the screen output, which suggests arguments and estimates memory usage.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Preparation:
You require genetic data (to calculate the tagging file and heritability matrix) and summary statistics (to estimate the heritability contributed by each predictor).
If you are analysing individual-level data, you will already have genetic data. You can obtain summary statistics using the command --linear <outfile>. For details on this command, see the example below or Single-Predictor Analysis.
If you are analysing summary statistics, you should make sure that these are in the format required by LDAK (see Summary Statistics for details), and that you have a (well-matched) Reference Panel.
If you are analysing human SNP data, we recommend that your predictor names are in the form Chr:BP using genomic positions from the GRCh37/hg19 assembly (this is the form used by the BLD-LDAK Annotations).
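If your predictors are currently named by rs ID, one way to switch to the Chr:BP form is to rewrite the second column of the .bim file. The following is a minimal sketch, not part of LDAK. It assumes a standard six-column PLINK .bim file (chromosome, name, genetic distance, base-pair position, two alleles) whose positions are already on GRCh37/hg19. The function name rename_chr_bp and the output file name are hypothetical.

```shell
# Hypothetical helper: rewrite predictor names in a PLINK .bim file as Chr:BP.
# Assumes columns are: chromosome, name, cM, base-pair position, A1, A2.
rename_chr_bp() {
  # $1 = input .bim file, $2 = output .bim file
  awk '{ $2 = $1 ":" $4; print }' "$1" > "$2"
}

# Example usage (file names are illustrative):
# rename_chr_bp human.bim human.chrbp.bim
```

The summary statistics must use the same predictor names as the genetic data, so any renaming should be applied consistently to both.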
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Calculate a tagging file and heritability matrix:
Having decided which heritability model to use, you should calculate the corresponding tagging file and heritability matrix using the command --calc-tagging <outfile>. For details on this command, see the example below or Calculate Taggings. Note that you should use --extract <extractfile> to reduce to predictors for which you have summary statistics, and you should add --save-matrix YES so that LDAK saves the heritability matrix.
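One way to build the <extractfile> is to take the predictor names straight from the summary statistics. The following is a minimal sketch, assuming the summary statistics file has a single header row and the predictor names in its first column (check this against your own file). The function name make_extract_list and the file name extract.preds are hypothetical.

```shell
# Hypothetical helper: list the predictors that have summary statistics.
# Assumes one header row, with predictor names in column 1.
make_extract_list() {
  # $1 = summary statistics file, $2 = output extractfile (one name per line)
  awk 'NR > 1 { print $1 }' "$1" > "$2"
}

# Example usage (file names are illustrative):
# make_extract_list quant.summaries extract.preds
# then add --extract extract.preds to the --calc-tagging command
```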
Note that if you are analysing individual-level data for tens of thousands of samples, it is not necessary to use all of these when calculating the tagging file and heritability matrix. Instead, we suggest using --keep <keepfile> to specify 5000 randomly selected samples, which will substantially reduce computational demands.
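A random subset of samples can be drawn from the .fam file. The following is a minimal sketch, assuming GNU shuf is available and that the <keepfile> should contain the first two columns of the .fam file (the two sample identifiers), one sample per row. The function name pick_samples and the file name keep.5000 are hypothetical.

```shell
# Hypothetical helper: write a keepfile containing N randomly selected samples.
# Assumes the first two columns of the .fam file identify each sample.
pick_samples() {
  # $1 = .fam file, $2 = number of samples to keep, $3 = output keepfile
  shuf -n "$2" "$1" | awk '{ print $1, $2 }' > "$3"
}

# Example usage (file names are illustrative):
# pick_samples human.fam 5000 keep.5000
# then add --keep keep.5000 to the --calc-tagging command
```

If the .fam file contains fewer samples than requested, shuf simply returns all of them.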
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Estimate heritabilities:
Having calculated the tagging file and corresponding heritability matrix, you should regress the summary statistics onto these using the command --sum-hers <outfile>. For details on this command, see the example below or SNP Heritability. Note that you must add --matrix <matrixfile>, where <matrixfile> is the heritability matrix, so that LDAK calculates the per-predictor heritabilities.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Example:
Here we use the binary PLINK files human.bed, human.bim and human.fam, and the phenotype quant.pheno from the Test Datasets. We show how to estimate per-predictor heritabilities assuming the GCTA, LDAK-Thin and BLD-LDAK Models (as a reminder, we recommend using the BLD-LDAK Model if analysing human SNP data, else using the LDAK-Thin Model).
As we are analysing individual-level data, it is necessary to first obtain summary statistics. For this we can use the command
./ldak.out --linear quant --bfile human --pheno quant.pheno
The summary statistics are saved in quant.summaries (already in the format required by LDAK). For more details on this command, see Single-Predictor Analysis.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
1 - Estimate per-predictor heritabilities assuming the GCTA Model.
To calculate a tagging file under the GCTA Model, we use the options --ignore-weights YES and --power -1
./ldak.out --calc-tagging gcta --bfile human --ignore-weights YES --power -1 --window-cm 1 --save-matrix YES
./ldak.out --sum-hers gcta --tagfile gcta.tagging --summary quant.summaries --matrix gcta.matrix
The per-predictor heritabilities are saved in gcta.ind.hers.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
2 - Estimate per-predictor heritabilities assuming the LDAK-Thin Model.
To assume the LDAK-Thin Model, we must first create a weightsfile that gives weighting one to the predictors that remain after thinning for duplicates, and weighting zero to those removed. This can be achieved using the commands
./ldak.out --thin thin --bfile human --window-prune .98 --window-kb 100
awk < thin.in '{print $1, 1}' > weights.thin
Now when calculating the tagging file, we use the options --weights <weightsfile> and --power -.25
./ldak.out --calc-tagging ldak.thin --bfile human --weights weights.thin --power -.25 --window-cm 1 --save-matrix YES
./ldak.out --sum-hers ldak.thin --tagfile ldak.thin.tagging --summary quant.summaries --matrix ldak.thin.matrix
The per-predictor heritabilities are saved in ldak.thin.ind.hers.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
3 - Estimate per-predictor heritabilities assuming the BLD-LDAK Model.
To assume the BLD-LDAK Model, we first download the files bld1, bld2, ..., bld64 from the BLD-LDAK Annotations. Then we calculate the LDAK Weightings and rename them bld65, using the commands
./ldak.out --cut-weights sections --bfile human
./ldak.out --calc-weights-all sections --bfile human
mv sections/weights.short bld65
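Before calculating the tagging file, it can help to confirm that all 65 annotation files (bld1, ..., bld64 plus the newly created bld65) are present in the working directory. The following is a minimal sketch; the function name check_annotations is hypothetical and the check only tests that each file exists and is non-empty, not that its contents are valid.

```shell
# Hypothetical helper: report any missing or empty annotation files.
check_annotations() {
  # $1 = annotation prefix (e.g. bld), $2 = number of annotations (e.g. 65)
  missing=0
  j=1
  while [ "$j" -le "$2" ]; do
    if [ ! -s "$1$j" ]; then
      echo "missing or empty: $1$j" >&2
      missing=$((missing + 1))
    fi
    j=$((j + 1))
  done
  # Succeed only if every file was found
  [ "$missing" -eq 0 ]
}

# Example usage:
# check_annotations bld 65 && echo "all annotation files found"
```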
Finally, when calculating the tagging file, we use the options --ignore-weights YES and --power -.25, as well as providing the 65 annotations
./ldak.out --calc-tagging bld.ldak --bfile human --ignore-weights YES --power -.25 --annotation-number 65 --annotation-prefix bld --window-cm 1 --save-matrix YES
./ldak.out --sum-hers bld.ldak --tagfile bld.ldak.tagging --summary quant.summaries --matrix bld.ldak.matrix
The per-predictor heritabilities are saved in bld.ldak.ind.hers.
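Whichever model you assume, a quick sanity check is to count the predictors and total their heritabilities. The following is a minimal sketch, assuming the .ind.hers file has one header row, with predictor names in column 1 and per-predictor heritabilities in column 2 (please confirm this layout against your own output). The function name sum_ind_hers is hypothetical.

```shell
# Hypothetical helper: count predictors and sum their estimated heritabilities.
# Assumes one header row, names in column 1 and heritabilities in column 2.
sum_ind_hers() {
  # $1 = per-predictor heritabilities file
  awk 'NR > 1 { total += $2; n++ }
       END { printf "%d predictors, total heritability %.6f\n", n, total }' "$1"
}

# Example usage:
# sum_ind_hers bld.ldak.ind.hers
```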