Calculate Taggings

All applications of SumHer require a tagging file, which records the (relative) expected heritability tagged by each predictor. To create a tagging file, you must specify a Heritability Model. In our paper Evaluating and improving heritability models using summary statistics (Nature Genetics, 2020), we compared 12 different heritability models based on how well they fit data from 31 complex human traits. We found that the 66-parameter BLD-LDAK Model performed best overall, while the one-parameter LDAK-Thin Model was the best of the simple heritability models.

We currently recommend using the 66-parameter BLD-LDAK Model. For most applications, a 1-part tagging file will suffice; the exception is when estimating Heritability Enrichments, for which a multi-part tagging file is required.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

The command for calculating a tagging file is

--calc-tagging <output>

which requires the options

--bfile/--gen/--sp/--speed <prefix> - specifies the data files (see File Formats).

--weights <weightsfile> and --power <float> - specify the Heritability Model.

--window-cm <float> - specifies the window size (how far to search for tagging predictors). It generally suffices to use 1cM. If the bim file does not contain genetic distances, an approximate solution is to instead use --window-kb 1000.
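As a rough sketch of the choice between --window-cm and --window-kb, the helper below inspects a bim file and prints the matching option. The function name choose_window is hypothetical, and the check assumes that a bim file whose third column (genetic distance) is zero on every line has no genetic distances.

```shell
# Hypothetical helper: print the window option suited to a PLINK bim file.
# Column 3 of a bim file holds the genetic distance; if every value is zero,
# we assume genetic distances are missing and fall back to --window-kb 1000.
choose_window() {
  if awk '$3 != 0 { found = 1; exit } END { exit !found }' "$1"; then
    echo "--window-cm 1"
  else
    echo "--window-kb 1000"
  fi
}

# Example use (not run here):
# ../ldak.out --calc-tagging sumldak --bfile ref --weights sumsect/weights.short \
#   --power -0.25 $(choose_window ref.bim)
```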

In most cases, you should use --extract to restrict to predictors for which you have summary statistics (when estimating Genetic Correlations, you might also want to restrict to non-ambiguous predictors). Additionally, you should use --exclude to ignore predictors within the major histocompatibility complex and those in LD with large-effect loci (see Summary Statistics for more details).

The tagging file is saved in <output>.tagging. The final column of this file contains the relative expected heritabilities tagged by each predictor (if making a multi-part tagging file, there will be one column for each category); these values are referred to as ujk in the SumHer paper. The first nine columns contain auxiliary details for each predictor, including the number of neighbours, the sum of correlation squared between the predictor and each of its neighbours, its weighting and MAF, how many categories it belongs to, and the relative heritability it is expected to contribute (referred to as qj in the SumHer paper).
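One quick way to sanity-check a tagging file is to count its category columns. The sketch below relies on the layout described above (nine auxiliary columns followed by one column per category) and assumes that the first line of the file has one whitespace-separated field per column; the function name count_tagging_parts is hypothetical.

```shell
# Hypothetical helper: report how many category columns a tagging file has.
# Assumes the first line holds one field per column, and that the first
# nine columns are the auxiliary details.
count_tagging_parts() {
  awk 'NR == 1 { print NF - 9; exit }' "$1"
}

# Example use (not run here):
# count_tagging_parts sumldak.tagging
```

Under these assumptions, a 1-part tagging file should report 1, and a multi-part tagging file should report its number of categories.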
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

To calculate a multi-part tagging file, you should use either --partition-number and --partition-prefix, or --annotation-number and --annotation-prefix. See Heritability Enrichments for details.

By default, LDAK will exclude SNPs with weighting zero; to prevent this, add --reduce NO (this is often necessary when comparing the fit of different heritability models).

To parallelize this process, you can add --chr to calculate taggings for each chromosome separately, then combine these using the argument --join-tagging <output> with the option --taglist <list_of_tagfiles> (see the example below).
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Example: for this we assume the Reference Panel is stored in binary PLINK format in the files ref.bed, ref.bim and ref.fam. We use the files height.txt, height.predictors, height.nonamb and height.exclude constructed in the example for Summary Statistics, as well as the weightings file sumsect/weights.short computed in the example for Heritability Model.

To calculate a tagging file under the LDAK Model, use

../ldak.out --calc-tagging sumldak --bfile ref --weights sumsect/weights.short --power -0.25 --window-cm 1 --extract height.predictors --exclude height.exclude

The taggings are saved in sumldak.tagging. If we wanted to exclude ambiguous predictors, we would use --extract height.nonamb instead of --extract height.predictors. To parallelize this process, we can calculate taggings separately for each chromosome, then merge them

for j in {1..22}; do
../ldak.out --calc-tagging sumldak$j --bfile ref --weights sumsect/weights.short --power -0.25 --extract height.predictors --exclude height.exclude --window-cm 1 --chr $j
done

rm -f list.txt; for j in {1..22}; do echo "sumldak$j.tagging" >> list.txt; done
../ldak.out --join-tagging sumldak --taglist list.txt
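Before joining, it can help to confirm that every per-chromosome run produced a tagging file. The following is a minimal sketch of such a check; the function name make_taglist is hypothetical, and it assumes chromosomes 1 to 22 with files named <prefix><chromosome>.tagging, as in the loop above.

```shell
# Hypothetical helper: write the per-chromosome tagging file names to a list,
# stopping with an error if any expected file is missing or empty.
make_taglist() {
  prefix=$1
  list=$2
  : > "$list"                      # start from an empty list
  for j in $(seq 1 22); do
    f="${prefix}${j}.tagging"
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      return 1
    fi
    echo "$f" >> "$list"
  done
}

# Example use (not run here):
# make_taglist sumldak list.txt && ../ldak.out --join-tagging sumldak --taglist list.txt
```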

To instead create a tagging file assuming the LDSC Model, replace --weights sumsect/weights.short --power -0.25 with --ignore-weights YES --power -1.
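To make the swap between the two models explicit, the sketch below collects the model-specific options quoted in this example into one place. The function name model_options and the output name sumldsc are hypothetical; the options themselves are those given above.

```shell
# Hypothetical helper: print the heritability-model options used in this example.
model_options() {
  case "$1" in
    ldak) echo "--weights sumsect/weights.short --power -0.25" ;;
    ldsc) echo "--ignore-weights YES --power -1" ;;
    *)    echo "unknown model: $1" >&2; return 1 ;;
  esac
}

# Example use (not run here); the output name sumldsc is an assumption:
# ../ldak.out --calc-tagging sumldsc --bfile ref $(model_options ldsc) \
#   --window-cm 1 --extract height.predictors --exclude height.exclude
```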