Pre-computed Taggings

Please note that the files on this page should only be used for estimating SNP heritability, heritability enrichments, genetic correlations or the selection-related parameter alpha. They cannot be used for estimating per-predictor heritabilities (the first step when performing Prediction).

On this page, you can download tagging files corresponding to the LDAK-Thin, BLD-LDAK and BLD-LDAK-Lite+Alpha Models, computed using data from the UK Biobank. These are designed for use with LDAK v5.2 (obtain the latest version from Downloads). We recommend using the BLD-LDAK Model when estimating SNP heritability or Heritability Enrichments, the LDAK-Thin Model when estimating Genetic Correlations, and the BLD-LDAK-Lite+Alpha Model when estimating the selection-related parameter alpha.

For each heritability model, there are four versions: GBR (computed using 2000 white British individuals), SAS (4214 Indian and Pakistani individuals), EAS (1279 Chinese individuals) and AFR (2577 African individuals). Click here to see a principal component plot illustrating the four different populations.

For each population, we provide sets of pre-computed files corresponding to two SNP subsets: 1.0-1.2M non-ambiguous HapMap3 SNPs and 320-580k non-ambiguous directly genotyped SNPs. You should use whichever version best matches your summary statistics (in general, you should use the HapMap3 version if you have summary statistics from a GWAS that used imputation, otherwise use the directly genotyped version).

Note that you can only use the pre-computed tagging files when analysing summary statistics for human SNP data (for non-human or non-SNP data, you must Calculate Taggings yourself, for which we recommend using the LDAK-Thin Model).
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

The SNP names in the pre-computed tagging files are in the form Chr:BP, using positions from the GRCh37/hg19 assembly. If the names in your summary statistics file are also in the form Chr:BP, but use positions from a different assembly, you can update them using the LiftOver Tool. If instead the names are rs ids, we explain how you can convert them in Summary Statistics.
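As a minimal sketch of what Chr:BP names look like, the command below builds them from separate chromosome and basepair columns. The file toy.sumstats, its values and its column layout (chromosome, basepair, A1, A2) are made up for illustration and are not an LDAK format. The sketch also assumes the positions are already on hg19.

```shell
# Toy input (hypothetical layout): chromosome, basepair, A1, A2
printf '1 752566 A G\n2 10133 T C\n' > toy.sumstats

# Prepend a Chr:BP name to each row, keeping the two allele columns
awk '{ print $1 ":" $2, $3, $4 }' toy.sumstats > toy.named

cat toy.named
```

The same awk pattern applies to a real file once you know which columns hold the chromosome and the basepair.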

To generate these tagging files, it was necessary to specify the Reference SNPs, Regression SNPs and Heritability SNPs (see SNP Subsets for details of these terms). The Reference SNPs were the 10M SNPs that were both present in the 1000 Genomes Project and had MAF>0.005. The Heritability SNPs were all Reference SNPs. The Regression SNPs were either the non-ambiguous Reference SNPs that were both present in HapMap3 and had MAF>0.01, or the non-ambiguous Reference SNPs that were both present on the UK Biobank Axiom Array and had MAF>0.01.
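To illustrate the two filters that define the Regression SNPs, here is a small sketch that keeps SNPs that are non-ambiguous and have MAF>0.01. It is not the procedure used to build the files on this page. The file toy.snps and its layout (name, A1, A2, MAF) are hypothetical, and we assume that "ambiguous" means an allele pair of A/T or C/G.

```shell
# Toy input (hypothetical layout): name, A1, A2, MAF
printf '1:1000 A G 0.20\n1:2000 A T 0.30\n1:3000 C G 0.15\n1:4000 T C 0.005\n' > toy.snps

# Keep SNPs whose allele pair is not A/T or C/G and whose MAF exceeds 0.01
awk '{
  pair = $2 $3
  ambiguous = (pair == "AT" || pair == "TA" || pair == "CG" || pair == "GC")
  if (!ambiguous && $4 > 0.01) print $1
}' toy.snps > toy.regression

cat toy.regression
```

In this toy example only 1:1000 survives: 1:2000 and 1:3000 are ambiguous, and 1:4000 has MAF below the threshold.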

Your summary statistics file should provide summary statistics for the majority of SNPs in the tagging file. If you are missing statistics for more than 20% of SNPs, you should Calculate Taggings yourself.
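One way to check this before running LDAK is to count how many SNP names from the tagging file also appear in your summary statistics. The sketch below assumes you have two plain-text files whose first column is the SNP name. The files toy.tagging.snps and toy.summaries and their contents are made up, so you would need to adapt the column numbers (and skip any header lines) for real files.

```shell
# Toy inputs (hypothetical): the first column of each file is the SNP name
printf '1:1000\n1:2000\n1:3000\n1:4000\n1:5000\n' > toy.tagging.snps
printf '1:1000 0.5\n1:2000 1.2\n1:3000 -0.3\n1:9000 0.8\n' > toy.summaries

# Read the summary statistics first, then count tagging-file SNPs that match
coverage=$(awk 'NR == FNR { have[$1] = 1; next }
     { total++; if ($1 in have) found++ }
     END { printf "%d of %d SNPs present (%.1f%% missing)", found, total, 100 * (total - found) / total }' \
    toy.summaries toy.tagging.snps)

echo "$coverage"
```

Here 40% of the tagging-file SNPs are missing, which is above the 20% threshold, so with data like this you would Calculate Taggings yourself.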
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

LDAK-Thin Model:

LDAK-Thin Tagging File (GBR population, HapMap3 SNPs)
LDAK-Thin Tagging File (SAS population, HapMap3 SNPs)
LDAK-Thin Tagging File (EAS population, HapMap3 SNPs)
LDAK-Thin Tagging File (AFR population, HapMap3 SNPs)

LDAK-Thin Tagging File (GBR population, directly genotyped SNPs)
LDAK-Thin Tagging File (SAS population, directly genotyped SNPs)
LDAK-Thin Tagging File (EAS population, directly genotyped SNPs)
LDAK-Thin Tagging File (AFR population, directly genotyped SNPs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

BLD-LDAK Model:

BLD-LDAK Tagging File (GBR population, HapMap3 SNPs)
BLD-LDAK Tagging File (SAS population, HapMap3 SNPs)
BLD-LDAK Tagging File (EAS population, HapMap3 SNPs)
BLD-LDAK Tagging File (AFR population, HapMap3 SNPs)

BLD-LDAK Tagging File (GBR population, directly genotyped SNPs)
BLD-LDAK Tagging File (SAS population, directly genotyped SNPs)
BLD-LDAK Tagging File (EAS population, directly genotyped SNPs)
BLD-LDAK Tagging File (AFR population, directly genotyped SNPs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

BLD-LDAK-Lite+Alpha Model:

BLD-LDAK-Lite+Alpha Tagging File (GBR population, HapMap3 SNPs)
BLD-LDAK-Lite+Alpha Tagging File (SAS population, HapMap3 SNPs)
BLD-LDAK-Lite+Alpha Tagging File (EAS population, HapMap3 SNPs)
BLD-LDAK-Lite+Alpha Tagging File (AFR population, HapMap3 SNPs)

BLD-LDAK-Lite+Alpha Tagging File (GBR population, directly genotyped SNPs)
BLD-LDAK-Lite+Alpha Tagging File (SAS population, directly genotyped SNPs)
BLD-LDAK-Lite+Alpha Tagging File (EAS population, directly genotyped SNPs)
BLD-LDAK-Lite+Alpha Tagging File (AFR population, directly genotyped SNPs)

Note that these are in fact multi-tagging files, each containing 31 versions of the BLD-LDAK-Lite+Alpha Model (Technical Details explains why this is the case). Click here to download the file pow.txt, which specifies the 31 values of alpha used to construct each tagging file. There are two ways to use these tagging files. The first is to use the command --reduce-tagging <outfile> to extract the 31 individual tagging files (first extract columns 1-7, then 8-14, then 15-21, etc.). The easier alternative is to tell LDAK to expect a multi-tagging file by adding the option --divisions <integer> when estimating SNP heritability (see below for an example).
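If you take the first route, it may help to list the column range for each of the 31 versions. The sketch below only does the arithmetic. It assumes every version occupies 7 consecutive columns, which is inferred from the ranges quoted above (1-7, 8-14, 15-21), and it does not call LDAK.

```shell
# 31 versions of 7 columns each (width inferred from the ranges quoted in the text)
versions=31
width=7

# Write one line per version giving its first and last column
for k in $(seq 1 "$versions"); do
  start=$(( (k - 1) * width + 1 ))
  end=$(( k * width ))
  echo "version $k: columns $start-$end"
done > column.ranges

head -n 3 column.ranges
tail -n 1 column.ranges
```

Under this assumption the last version occupies columns 211-217.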
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Regression SNP details:

HapMap3 SNP details (GBR population)
HapMap3 SNP details (SAS population)
HapMap3 SNP details (EAS population)
HapMap3 SNP details (AFR population)

Directly genotyped SNP details (GBR population)
Directly genotyped SNP details (SAS population)
Directly genotyped SNP details (EAS population)
Directly genotyped SNP details (AFR population)

These files provide details (name, chromosome, basepair, alleles and rs id) for the Regression SNPs (either HapMap3 SNPs or directly genotyped SNPs that are non-ambiguous and have MAF>0.01 in the corresponding population).
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Example:

Here we use the summary statistics files height.txt and neur.txt, created in the example for Summary Statistics, which contain results from the association studies of human height by the GIANT Consortium, and of neuroticism by Nagel et al. We use pre-computed tagging files for the LDAK-Thin, BLD-LDAK and BLD-LDAK-Lite+Alpha Models (note that both association studies used European samples and imputation, so we will download the GBR, HapMap3 versions of the tagging files), as well as the file pow.txt.

For more details on the commands used below, see the pages for estimating SNP Heritability, Heritability Enrichments, Genetic Correlations or the selection-related parameter alpha.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Obtain and extract the GBR, HapMap3 versions of the tagging files:

wget https://genetics.ghpc.au.dk/doug/ldak.thin.hapmap.gbr.tagging.gz
wget https://genetics.ghpc.au.dk/doug/bld.ldak.hapmap.gbr.tagging.gz
wget https://genetics.ghpc.au.dk/doug/bld.ldak.lite.alpha.hapmap.gbr.tagging.gz

gunzip ldak.thin.hapmap.gbr.tagging.gz
gunzip bld.ldak.hapmap.gbr.tagging.gz
gunzip bld.ldak.lite.alpha.hapmap.gbr.tagging.gz

To use the BLD-LDAK-Lite+Alpha tagging file, we must also download the list of alpha values:

wget https://www.dropbox.com/s/o7xphugm4mln9xa/pow.txt
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

First we estimate SNP heritability and heritability enrichments using the BLD-LDAK Model tagging file (as per our Recommendations, we assume no confounding bias). It is necessary to add --check-sums NO because the height summary statistics do not contain all the SNPs in the tagging file.

./ldak.out --sum-hers height --summary height.txt --tagfile bld.ldak.hapmap.gbr.tagging --check-sums NO

The screen output tells us that we have summary statistics for 1,066,353 of the 1,168,975 SNPs (i.e., about 91%, so more than enough). The estimates of SNP heritability are saved in height.hers, while the estimates of enrichment are in height.enrich. These tell us, for example, that the estimated (total) SNP heritability is 0.56 (SD 0.01), and that the first category (which BLD-LDAK Annotations tells us corresponds to coding SNPs) is 5.9X enriched (SD 0.6).
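The coverage figure quoted above can be reproduced with a one-line calculation. This is only the arithmetic on the two counts reported in the screen output, not an LDAK command.

```shell
# SNP counts taken from the screen output quoted in the text
summary=$(awk 'BEGIN { have = 1066353; total = 1168975
  printf "%.1f%% present, %.1f%% missing", 100 * have / total, 100 * (total - have) / total }')

echo "$summary"
```

The missing fraction is well under the 20% limit mentioned earlier on this page.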
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Now we estimate the genetic correlation between height and neuroticism using the LDAK-Thin Model (this is the only scenario in which we recommend allowing for confounding bias, which LDAK does by default):

./ldak.out --sum-cors height.neur --summary height.txt --summary2 neur.txt --tagfile ldak.thin.hapmap.gbr.tagging --check-sums NO

The estimated genetic correlation is -0.06 (SD 0.02), saved in height.neur.cors.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Finally, we estimate alpha (as when estimating SNP heritability or heritability enrichments, we recommend assuming no confounding bias):

./ldak.out --sum-hers height2 --summary height.txt --tagfile bld.ldak.lite.alpha.hapmap.gbr.tagging --divisions 7 --powerfile pow.txt --check-sums NO

The estimate of alpha is -0.33 (SD 0.03), saved in height2.power.