Download Tagging Files

In our latest publication (currently available on bioRxiv), we develop a number of improved heritability models. We also show that using an extensive reference panel gives a better model fit than using only high-quality SNPs for which summary statistics are available (our previous recommendation). This finding means that we can now provide pre-computed tagging files.

This page provides tagging files corresponding to a variety of heritability models, computed using data from the UK Biobank (10M SNPs with MAF>0.005). For each heritability model, there are four versions: GBR (computed using 2000 white British individuals), SAS (3244 Indian and Pakistani individuals), EAS (1282 Chinese individuals) and AFR (2512 African individuals). Click here to see a principal component plot illustrating the four different populations. Note that the predictor names take the form chr:bp (e.g., 12:345678), using positions from the GRCh37/hg19 assembly.
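Because the tagging files use chr:bp names, summary statistics that still carry rs names will not match any predictors. The following is a minimal sketch of a check for this. The function name count_bad_names is hypothetical, and the sketch assumes a summary statistics file with one header line, the predictor name in the first column, and numeric (autosomal) chromosome codes.

```shell
# count_bad_names FILE   (hypothetical helper, not part of LDAK)
# Prints the number of data rows (the header line is skipped) whose first
# column is NOT of the form chr:bp with a numeric chromosome, e.g. 12:345678.
count_bad_names() {
  awk 'NR > 1 && $1 !~ /^[0-9]+:[0-9]+$/ { bad++ } END { print bad + 0 }' "$1"
}
```

A result of 0 suggests all names already follow the chr:bp convention; a large count usually means the file still uses rs names and needs converting first.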

When performing an analysis using SumHer, it is necessary to specify the regression SNPs (the SNPs used when regressing the observed test statistics on their expected values). These tagging files were computed assuming the regression SNPs are those with MAF>0.01 present in HapMap3 (1.0-1.2M SNPs, depending on population). When using these tagging files, you should ideally have (valid) summary statistics for all the HapMap3 SNPs. If you are missing statistics for more than 20% of these SNPs (about 200,000), you should instead consider computing the tagging file manually.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Estimating SNP heritability. The tagging files below were computed assuming the 66-parameter BLD-LDAK Model, our preferred model for estimating SNP heritability.

BLD-LDAK Tagging File (GBR population)
BLD-LDAK Tagging File (SAS population)
BLD-LDAK Tagging File (EAS population)
BLD-LDAK Tagging File (AFR population)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Estimating the selection-related parameter alpha. Our preferred model for estimating alpha is the 67-parameter BLD-LDAK+Alpha Model, but this is too large to provide here. Therefore, we instead provide tagging files computed assuming the 8-parameter BLD-LDAK-Lite+Alpha Model. Note that these are in fact multi-tagging files, each containing 31 versions of the BLD-LDAK-Lite+Alpha Model. There are two ways to use them. You can use the command --reduce-tagging to extract the 31 individual tagging files (first extract columns 1-7, then 8-14, then 15-21, etc). Alternatively, and more easily, you can tell LDAK to expect a multi-tagging file using the option --divisions (see below for an example).
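To make the column layout concrete, the sketch below prints the column range belonging to each of the 31 versions, assuming 7 columns per version (as implied by the ranges 1-7, 8-14, 15-21 quoted above). The function name column_ranges is hypothetical, and the sketch only does the arithmetic; it does not call LDAK.

```shell
# column_ranges DIVISIONS WIDTH   (hypothetical helper, not part of LDAK)
# Prints one line per version: the version index, then its first-last columns.
column_ranges() {
  divisions=$1
  width=$2
  i=1
  while [ "$i" -le "$divisions" ]; do
    start=$(( (i - 1) * width + 1 ))
    end=$(( i * width ))
    echo "$i $start-$end"
    i=$(( i + 1 ))
  done
}

# 31 versions of 7 columns each: the first line is "1 1-7", the last "31 211-217"
column_ranges 31 7
```

The k-th line of the output corresponds to the k-th value of alpha listed in pow.txt.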

BLD-LDAK-Lite+Alpha Tagging File (GBR population)
BLD-LDAK-Lite+Alpha Tagging File (SAS population)
BLD-LDAK-Lite+Alpha Tagging File (EAS population)
BLD-LDAK-Lite+Alpha Tagging File (AFR population)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

The following two files are used in the examples below:

pow.txt - this lists the 31 values of alpha used when making the above BLD-LDAK-Lite+Alpha tagging files

hapmap3.snps - this provides details of the 1.2M HapMap3 SNPs
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Below are example bash scripts. For these we use summary statistics for height from the GIANT Consortium. They performed a GWAS of European individuals, so we will use the GBR tagging files.

#Obtain and extract the GBR tagging files (right click on above links to get file locations)
wget http://144.6.225.166/downloads/bld.ldak.lite.alpha.gbr.hapmap.tagging.gz
wget https://www.dropbox.com/s/076sagva2x5hhs5/bld.ldak.gbr.hapmap.tagging.gz
gunzip bld.ldak.lite.alpha.gbr.hapmap.tagging.gz
gunzip bld.ldak.gbr.hapmap.tagging.gz

#To use the BLD-LDAK-Lite+Alpha tagging file, we must also download the list of alpha values
wget https://www.dropbox.com/s/o7xphugm4mln9xa/pow.txt

#Download summary statistics for height (note that these use rs SNP names)
wget https://portals.broadinstitute.org/collaboration/giant/images/0/01/GIANT_HEIGHT_Wood_et_al_2014_publicrelease_HapMapCeuFreq.txt.gz

#Put these into LDAK format (columns Predictor, A1, A2, Direction, Stat and n)
gunzip -c GIANT_HEIGHT_Wood_et_al_2014_publicrelease_HapMapCeuFreq.txt.gz | awk '(NR>1){snp=$1;a1=$2;a2=$3;dir=$5;stat=($5/$6)^2;n=$8}(NR==1){print "Predictor A1 A2 Direction Stat n"}(NR>1 && (a1=="A"||a1=="C"||a1=="G"||a1=="T") && (a2=="A"||a2=="C"||a2=="G"||a2=="T")){print snp, a1, a2, dir, stat, n}' - > height.txt

#Download list of HapMap3 SNPs; reduce summary statistics to these, switching from rs to generic SNP names
wget https://www.dropbox.com/s/xabjdu6squ6u56r/hapmap3.snps
awk '(NR==FNR){arr[$1]=$2;ars[$1]=$3$4;next}(FNR==1){print $0}($1 in arr && ($2$3==ars[$1]||$3$2==ars[$1])){$1=arr[$1];print $0}' hapmap3.snps height.txt > height.hapmap
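
The 20% guideline given above can be checked at this point, before running SumHer. The following is a minimal sketch, not an LDAK feature: the function name missing_fraction is hypothetical, and it assumes the reference list has no header line and holds the generic (chr:bp) name in column 2, as the awk command above implies for hapmap3.snps. Because hapmap3.snps lists all 1.2M HapMap3 SNPs, while the regression SNPs for a given population are the subset with MAF>0.01, the result is only a rough guide.

```shell
# missing_fraction REF SUMSTATS   (hypothetical helper, not part of LDAK)
# REF:      SNP list with the generic chr:bp name in column 2, no header (assumption)
# SUMSTATS: LDAK-format summary statistics, header on line 1, predictor in column 1
# Prints how many SNPs in REF have no row in SUMSTATS.
missing_fraction() {
  awk 'NR == FNR { if (FNR > 1) have[$1] = 1; next }
       { total++; if (!($2 in have)) missing++ }
       END {
         if (total == 0) { print "no reference SNPs read"; exit 1 }
         printf "%d of %d missing (%.1f%%)\n", missing, total, 100 * missing / total
       }' "$2" "$1"
}
```

For example, running missing_fraction hapmap3.snps height.hapmap reports the share of HapMap3 SNPs without summary statistics; a value well above 20% suggests computing the tagging file manually.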
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

#First we estimate SNP heritability (here we assume no confounding bias)
./ldak5.linux --sum-hers height --summary height.hapmap --tagfile bld.ldak.gbr.hapmap.tagging --check-sums NO
#The estimate of SNP heritability is in height.hers, with estimates of enrichment in height.enrich

#Now we estimate alpha (again we assume no confounding bias)
./ldak5.linux --sum-hers height2 --summary height.hapmap --tagfile bld.ldak.lite.alpha.gbr.hapmap.tagging --divisions 7 --powerfile pow.txt --check-sums NO
#The estimate of alpha is in height2.power