Tagging File

All applications of SumHer require a tagging file, which records the (relative) expected heritability tagged by each predictor. While SumHer is able to calculate these expectations under any Heritability Model, we recommend using the LDAK Model. For most applications, a 1-part tagging file will suffice; the exception is when estimating Heritability Enrichments, for which a multi-part tagging file is required.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

The command for calculating a tagging file is

--calc-tagging <output>

which requires the options

--bfile/--gen/--sp/--speed <prefix> - specifies the data files (see File Formats).

--weights <weightsfile> and --power <float> - specify the Heritability Model.

--window-cm <float> - specifies the window size (how far to search for tagging predictors). It generally suffices to use 1cM. If the bim file does not contain genetic distances, an approximate solution is to instead use --window-kb 1000.
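One way to choose between the two window options is to check whether the genetic-distance column of the bim file is populated (in binary PLINK format this is the third column). The following is a minimal sketch, not part of LDAK: the function name choose_window is hypothetical, and it assumes that a third column equal to zero for every predictor means the genetic distances are missing.

```shell
# Print the window option to pass to --calc-tagging for a given bim file.
# Assumption: if column 3 is zero for every predictor, genetic distances
# are missing, so fall back to the approximate physical window.
choose_window() {
    if awk '$3 != 0 { found = 1; exit } END { exit !found }' "$1"; then
        echo "--window-cm 1"
    else
        echo "--window-kb 1000"
    fi
}

# Possible usage (not run here):
#   ../ldak.out --calc-tagging sumldak --bfile ref ... $(choose_window ref.bim)
```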

In most cases, you should use --extract to restrict to predictors for which you have summary statistics (when estimating Genetic Correlations, you might also want to restrict to non-ambiguous predictors). Additionally, you should use --exclude to ignore predictors within the major histocompatibility complex and those in LD with large-effect loci (see Summary Statistics for more details).

The tagging file is saved in <output>.tagging.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

To calculate a multi-part tagging file, you should use either --partition-number and --partition-prefix, or --annotation-number and --annotation-prefix. See Heritability Enrichments for details.

By default, LDAK excludes predictors with zero weighting; to prevent this, add --reduce NO (this is often necessary when comparing the fit of different heritability models).

To parallelize this process, you can add --chr to calculate taggings for each chromosome separately, then combine these using the argument --join-tagging <output> with the option --taglist <list_of_tagfiles> (see the example below).
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Example: for this we assume the Reference Panel is stored in binary PLINK format in the files ref.bed, ref.bim and ref.fam. We use the files height.txt, height.predictors, height.nonamb and height.exclude constructed in the example for Summary Statistics, as well as the weightings file sumsect/weights.short computed in the example for Heritability Model.

To calculate a tagging file under the LDAK Model, use

../ldak.out --calc-tagging sumldak --bfile ref --weights sumsect/weights.short --power -0.25 --window-cm 1 --extract height.predictors --exclude height.exclude

The taggings are saved in sumldak.tagging. If we wanted to exclude ambiguous predictors we would use --extract height.nonamb instead of --extract height.predictors. To parallelize this process, we can calculate taggings separately for each chromosome, then join them:

for j in {1..22}; do
../ldak.out --calc-tagging sumldak$j --bfile ref --weights sumsect/weights.short --power -0.25 --extract height.predictors --exclude height.exclude --window-cm 1 --chr $j
done

rm -f list.txt; for j in {1..22}; do echo "sumldak$j.tagging" >> list.txt; done
../ldak.out --join-tagging sumldak --taglist list.txt
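Before joining, it can help to confirm that every per-chromosome run actually produced a tagging file. The following is a minimal sketch under stated assumptions: the function name make_taglist is hypothetical, and it assumes the 22 autosomes and the per-chromosome naming used in this example. It writes the list only if every expected file exists and is non-empty.

```shell
# Write <prefix>1.tagging ... <prefix>22.tagging to a list file,
# stopping with an error if any of them is missing or empty.
make_taglist() {
    prefix=$1
    out=$2
    : > "$out"    # start from an empty list
    for j in $(seq 1 22); do
        f="$prefix$j.tagging"
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            return 1
        fi
        echo "$f" >> "$out"
    done
}

# Possible usage (not run here):
#   make_taglist sumldak list.txt && ../ldak.out --join-tagging sumldak --taglist list.txt
```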

To instead create a tagging file assuming the LDSC Model, replace --weights sumsect/weights.short --power -0.25 with --ignore-weights YES --power -1.
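To make the substitution concrete, the sketch below prints (without running) the full command for each model. The helper name model_options and the output names sumldak and sumldsc are chosen for illustration only; the options themselves are those given in this example.

```shell
# Print the Heritability Model options for the named model.
model_options() {
    case "$1" in
        ldak) echo "--weights sumsect/weights.short --power -0.25" ;;
        ldsc) echo "--ignore-weights YES --power -1" ;;
        *) echo "unknown model: $1" >&2; return 1 ;;
    esac
}

# Dry run: print the --calc-tagging command for each model.
for model in ldak ldsc; do
    echo "../ldak.out --calc-tagging sum$model --bfile ref $(model_options "$model") --window-cm 1 --extract height.predictors --exclude height.exclude"
done
```

Only the model options differ between the two printed commands; the data, window and predictor filters are unchanged.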