Pre-computed Taggings

On this page, you can download tagging files corresponding to the LDAK-Thin, BLD-LDAK and BLD-LDAK-Lite+Alpha Models, computed using data from the UK Biobank. These are designed for use with LDAK 5.1 or above (obtain the latest version from Downloads). We recommend using the BLD-LDAK Model when estimating SNP heritability or Heritability Enrichments, the LDAK-Thin Model when estimating Genetic Correlations, and the BLD-LDAK-Lite+Alpha Model when estimating the selection-related parameter alpha.

For each heritability model, there are four versions: GBR (computed using 2000 white British individuals), SAS (4214 Indian and Pakistani individuals), EAS (1279 Chinese individuals) and AFR (2577 African individuals). Click here to see a principal component plot illustrating the four different populations.

Note that you can only use the pre-computed tagging files when analysing summary statistics for human traits (for non-human traits, you must Calculate Taggings yourself, for which we recommend using the LDAK-Thin Model).
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

The predictor names in the pre-computed tagging files are in the form Chr:BP, using positions from the GRCh37/hg19 assembly. If the names in your summary statistics file are also in the form Chr:BP, but use positions from a different assembly, you can update them using the LiftOver Tool. If instead the names are rs ids, we explain how you can convert them in Summary Statistics.
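
As a quick check before using a tagging file, you can count how many names in your summary statistics file are already in the form Chr:BP. The following is only a sketch: it assumes a whitespace-delimited file with one header row and the predictor names in the first column, and that chromosomes are coded as numbers. The function name count_chr_bp is purely illustrative and is not part of LDAK.

```shell
# Count names of the form Chr:BP (e.g. 1:12345) in one column of a file.
# Assumptions: whitespace-delimited file, one header row, names in column 1
# unless a different column number is given as the second argument.
count_chr_bp() {
    awk -v col="${2:-1}" '
        NR > 1 {
            total++
            if ($col ~ /^[0-9]+:[0-9]+$/) n_match++
        }
        END { printf "%d of %d names are in the form Chr:BP\n", n_match, total }
    ' "$1"
}
```

For example, count_chr_bp height.txt would report the count for the first column of height.txt. This checks only the format of the names, not which assembly the positions come from.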

To generate these tagging files, it was necessary to specify the Reference SNPs, Regression SNPs and Heritability SNPs (see SNP Subsets for details of these terms). The Reference SNPs were the 10M SNPs that were both present in the 1000 Genomes Project and had MAF>0.005. The Regression SNPs were the Reference SNPs that were also in HapMap3 (1.0-1.2M SNPs, depending on population). The Heritability SNPs were all Reference SNPs.

Your summary statistics file should provide summary statistics for the majority of predictors in the tagging file. If you are missing statistics for more than 20% of the predictors, you should Calculate Taggings yourself.
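
One rough way to check this threshold before running LDAK is to compare the predictor names in the two files. The sketch below assumes that both files are whitespace-delimited, have one header row, and store predictor names in the first column; these are assumptions about the file layout, so adjust the script if your files differ. The function name tagging_coverage is illustrative only.

```shell
# Report what percentage of predictors in a tagging file also appear in a
# summary statistics file.
# Usage: tagging_coverage <tagging file> <summary statistics file>
# Assumptions: both files are whitespace-delimited, have one header row, and
# store predictor names in column 1. If the tagging file contains rows that
# are not predictors, they are counted too, so treat the result as approximate.
tagging_coverage() {
    awk '
        FNR == 1 { next }                 # skip the header row of each file
        NR == FNR { have[$1] = 1; next }  # first file read: summary statistics
        { total++; if ($1 in have) found++ }
        END {
            if (total == 0) { print "no predictors read"; exit 1 }
            printf "%d of %d predictors (%.1f%%) have summary statistics\n",
                   found, total, 100 * found / total
        }
    ' "$2" "$1"
}
```

For example, tagging_coverage bld.ldak.hapmap.gbr.tagging height.txt. A result below 80% would suggest calculating taggings yourself. LDAK reports the exact number of predictors with valid summary statistics when it runs, and that figure takes precedence over this estimate.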
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

LDAK-Thin Model:

LDAK-Thin Tagging File (GBR population)
LDAK-Thin Tagging File (SAS population)
LDAK-Thin Tagging File (EAS population)
LDAK-Thin Tagging File (AFR population)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

BLD-LDAK Model:

BLD-LDAK Tagging File (GBR population)
BLD-LDAK Tagging File (SAS population)
BLD-LDAK Tagging File (EAS population)
BLD-LDAK Tagging File (AFR population)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

BLD-LDAK-Lite+Alpha Model:

BLD-LDAK-Lite+Alpha Tagging File (GBR population)
BLD-LDAK-Lite+Alpha Tagging File (SAS population)
BLD-LDAK-Lite+Alpha Tagging File (EAS population)
BLD-LDAK-Lite+Alpha Tagging File (AFR population)

Note that these are in fact multi-tagging files, each containing 31 versions of the BLD-LDAK-Lite+Alpha Model (Technical Details explains why this is the case). Click here to download the file pow.txt, which specifies the 31 values of alpha used to construct each tagging file. There are two ways to use these tagging files. The first way is to use the command --reduce-tagging <outfile> to extract the 31 individual tagging files (first extract columns 1-7, then 8-14, then 15-21, etc). However, the easier alternative is to tell LDAK to expect a multi-tagging file using the option --divisions <integer> when estimating SNP heritability (see below for an example).
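
If you take the first approach, the column ranges follow a simple pattern. The sketch below only prints the range for each version. It assumes every version occupies the same number of consecutive columns (7, as in the pattern 1-7, 8-14, 15-21 given above), and it does not run LDAK itself. The function name division_columns is illustrative.

```shell
# Print the column range for each version in a multi-tagging file.
# Usage: division_columns <number of versions> <columns per version>
# Assumption: every version occupies the same number of consecutive columns.
division_columns() {
    versions=$1
    width=$2
    k=1
    while [ "$k" -le "$versions" ]; do
        start=$(( (k - 1) * width + 1 ))
        end=$(( k * width ))
        echo "version $k: columns $start-$end"
        k=$(( k + 1 ))
    done
}
```

Calling division_columns 31 7 lists the 31 ranges, starting with columns 1-7 and ending with columns 211-217.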
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Example:

Here we use the summary statistics files height.txt and neur.txt, created in the example for Summary Statistics, which contain results from the association studies of human height by the GIANT Consortium, and of neuroticism by Nagel et al. We use pre-computed tagging files for the LDAK-Thin, BLD-LDAK and BLD-LDAK-Lite+Alpha Models (both association studies used European samples, so we download the GBR versions of the tagging files), as well as the file pow.txt.

For more details on the scripts below, see the pages for estimating SNP Heritability, Heritability Enrichments, Genetic Correlations or the selection-related parameter alpha.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Obtain and extract the GBR tagging files:

wget https://genetics.ghpc.au.dk/doug/ldak.thin.hapmap.gbr.tagging.gz
wget https://genetics.ghpc.au.dk/doug/bld.ldak.hapmap.gbr.tagging.gz
wget https://genetics.ghpc.au.dk/doug/bld.ldak.lite.alpha.hapmap.gbr.tagging.gz

gunzip ldak.thin.hapmap.gbr.tagging.gz
gunzip bld.ldak.hapmap.gbr.tagging.gz
gunzip bld.ldak.lite.alpha.hapmap.gbr.tagging.gz

To use the BLD-LDAK-Lite+Alpha tagging file, we must also download the list of alpha values:

wget https://www.dropbox.com/s/o7xphugm4mln9xa/pow.txt
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

First we estimate SNP heritability and heritability enrichments using the BLD-LDAK Model tagging file (as per our Recommendations, we assume no confounding bias). It is necessary to add --check-sums NO because the height summary statistics do not contain all the SNPs in the tagging file.

./ldak.out --sum-hers height --summary height.txt --tagfile bld.ldak.hapmap.gbr.tagging --check-sums NO

The output tells us that we have valid summary statistics for 1,063,918 of the 1,166,147 SNPs (i.e., about 91%, so more than enough). The estimates of SNP heritability are saved in height.hers, while the estimates of enrichment are in height.enrich. These tell us, for example, that the estimated (total) SNP heritability is 0.56 (SD 0.01), and that the first category (which BLD-LDAK Annotations tells us corresponds to coding SNPs) is 5.8X enriched (SD 0.6).
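
The percentage quoted above can be reproduced from the two counts. This is a minimal arithmetic sketch; the function name percent_valid is illustrative and not part of LDAK.

```shell
# Percentage of tagging-file SNPs with valid summary statistics,
# computed from the two counts reported in the LDAK output.
percent_valid() {
    awk -v valid="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * valid / total }'
}

percent_valid 1063918 1166147   # prints 91.2
```

So about 8.8% of SNPs are missing, well below the 20% at which we recommend calculating taggings yourself.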
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Now we estimate the genetic correlation between height and neuroticism using the LDAK-Thin Model (this is the only scenario in which we recommend allowing for confounding bias, which LDAK does by default):

./ldak.out --sum-cors height.neur --summary height.txt --summary2 neur.txt --tagfile ldak.thin.hapmap.gbr.tagging --check-sums NO

The estimated genetic correlation is -0.06 (SD 0.02), saved in height.neur.cors.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Finally, we estimate alpha (as when estimating SNP heritability or heritability enrichments, we recommend assuming no confounding bias):

./ldak.out --sum-hers height2 --summary height.txt --tagfile bld.ldak.lite.alpha.hapmap.gbr.tagging --divisions 7 --powerfile pow.txt --check-sums NO

The estimate of alpha is -0.37 (SD 0.01), saved in height2.power.