Heritability Enrichments

We now consider the case where we have categories of predictors. Our aim is to determine whether each category contributes more or less than expected under a model of no enrichment. When using --calc-tagging to compute the Tagging File, we must now specify the predictor categories; we do this by adding either the options --partition-number and --partition-prefix, or the options --annotation-number and --annotation-prefix.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Scenario 1: the categories partition the predictors. When calculating taggings, add the options --partition-number and --partition-prefix. Under a model of no enrichment, each partition is expected to contribute according to its size, so our aim is to identify partitions which explain more or less than expected.

Scenario 2: the categories correspond to annotations. When calculating taggings, add the options --annotation-number and --annotation-prefix. Under a model of no enrichment, the annotations a predictor belongs to have no impact on its contribution, so our aim is to identify annotations which have a positive or negative impact.

Having calculated the tagging file, we regress the summary statistics onto this using --sum-hers (this command is explained in SNP Heritability). When the tagging file was constructed using partitions, the key output files are those with suffixes .hers and .share; when using annotations, you might also be interested in those with suffixes .cats and .enrich. A description of the output files is provided in the example below.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Example: for this we assume the Reference Panel is stored in binary PLINK format in the files ref.bed, ref.bim and ref.fam. We use the files height.txt, height.predictors and height.exclude constructed in the example for Summary Statistics, as well as the weightings file sumsect/weights.short computed in the example for Heritability Model.

We also use the file genes/genes.predictors.used, which provides a list of genic predictors, as well as the files ann_snps.1, ann_snps.2, ..., ann_snps.24, which indicate which predictors are in each of the 24 functional categories considered by Finucane et al. (see Functional Annotations for more details).
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

First we will estimate the heritability enrichment of genic SNPs. For this, we partition the genome into genic and inter-genic SNPs, so we use the options --partition-number and --partition-prefix. We begin by saving the genic predictors in genes.1 and the inter-genic predictors in genes.2.

cp genes/genes.predictors.used genes.1
awk '(NR==FNR){arr[$1];next}!($2 in arr){print $2}' genes.1 ref.bim > genes.2
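To see what the awk command above does, the following sketch runs it on a made-up five-SNP .bim file in a temporary directory. The SNP names, the choice of genic SNPs and the temporary directory are all invented for illustration and are not part of the example data. The sketch then checks that the two lists are disjoint and together cover every predictor, which is what a partition requires.

```shell
# Toy illustration only: the file contents below are invented for this sketch.
tmp=$(mktemp -d)

# A minimal .bim file: chromosome, predictor name, cM, bp, A1, A2.
printf '1 rs1 0 100 A G\n1 rs2 0 200 A G\n1 rs3 0 300 C T\n1 rs4 0 400 C T\n1 rs5 0 500 A C\n' > "$tmp/ref.bim"

# Pretend rs2 and rs4 are the genic predictors.
printf 'rs2\nrs4\n' > "$tmp/genes.1"

# Same command as above: keep the predictors in ref.bim that are not in genes.1.
awk '(NR==FNR){arr[$1];next}!($2 in arr){print $2}' "$tmp/genes.1" "$tmp/ref.bim" > "$tmp/genes.2"

# Check 1: count predictors that appear in both lists (should be 0).
overlap=$(( $(sort "$tmp/genes.1" "$tmp/genes.2" | uniq -d | wc -l) ))

# Check 2: the two lists together should contain as many predictors as ref.bim.
total=$(( $(cat "$tmp/genes.1" "$tmp/genes.2" | sort -u | wc -l) ))
nbim=$(( $(wc -l < "$tmp/ref.bim") ))

echo "overlap: $overlap, listed: $total, in bim: $nbim"
```

The same two checks can be run on the real genes.1, genes.2 and ref.bim by replacing the temporary paths.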

To create the tagging file use

../ldak.out --calc-tagging sumgenes --bfile ref --weights sumsect/weights.short --power -0.25 --extract height.predictors --exclude height.exclude --window-cm 1 --partition-number 2 --partition-prefix genes.

A 2-part tagging file will be saved in sumgenes.tagging. Note that we could achieve the same using

../ldak.out --calc-tagging sumgenes --bfile ref --weights sumsect/weights.short --power -0.25 --extract height.predictors --exclude height.exclude --window-cm 1 --partition-number 1 --partition-prefix genes. --background YES

Now we regress the test statistics onto the tagging file (read the screen output to see whether it is necessary to add --check-sums NO)

../ldak.out --sum-hers sumgenes --tagfile sumgenes.tagging --summary height.txt

The main output files are sumgenes.hers and sumgenes.share, which provide estimates of the SNP heritability contributed by each partition, these values reported as fractions of total SNP heritability, and the influence of each partition. The latter indicates how the per-SNP heritability of a partition compares to the average per-SNP heritability. For example, if a partition has influence 0.5 (-0.5), this means that its SNPs on average contribute 50% more (less) than the overall average.

Additional results are provided in sumgenes.cats and sumgenes.enrich, but because we are considering partitions, these (essentially) provide the same information as sumgenes.hers and sumgenes.share.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Now we estimate enrichments for the 24 functional categories. These represent (overlapping) annotations rather than partitions, so when calculating the tagging file we add --annotation-number and --annotation-prefix.

../ldak.out --calc-tagging sumanns --bfile ref --weights sumsect/weights.short --power -0.25 --extract height.predictors --exclude height.exclude --window-cm 1 --annotation-number 24 --annotation-prefix ann_snps.
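The command above expects one file per annotation, named by appending 1, 2, ..., 24 to the prefix. If a file is missing, a check along the following lines can show which one. This is only a sketch: check_prefix_files is a hypothetical helper written for this page, not an LDAK command, and it assumes that an empty file counts as a problem.

```shell
# Hypothetical helper (not part of LDAK): report which of prefix1..prefixN
# are missing or empty. Returns non-zero if any file fails the check.
check_prefix_files() {
    prefix=$1
    number=$2
    missing=0
    i=1
    while [ "$i" -le "$number" ]; do
        # -s is true only if the file exists and is not empty
        if [ ! -s "${prefix}${i}" ]; then
            echo "missing or empty: ${prefix}${i}"
            missing=$((missing + 1))
        fi
        i=$((i + 1))
    done
    echo "checked $number files with prefix ${prefix}: $missing missing or empty"
    [ "$missing" -eq 0 ]
}

# For the annotation example on this page:
check_prefix_files ann_snps. 24 || echo "fix the files listed above before running --calc-tagging"
```

The same helper works for partitions, for example check_prefix_files genes. 2.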

The file sumanns.tagging is a 25-part tagging file (one part for each category, plus one corresponding to the base category, which contains all SNPs). Next we regress the test statistics onto the tagging file (again you will likely have to add --check-sums NO)

../ldak.out --sum-hers sumanns --tagfile sumanns.tagging --summary height.txt

In our view, the most important output files are sumanns.hers and sumanns.share, which provide estimates of the (direct) SNP heritability of each annotation, these values reported as fractions of total SNP heritability, and the influence of each annotation. The latter values indicate whether each annotation increases or decreases per-SNP heritability. For example, if an annotation has influence 0.5 (-0.5), then on average it raises (lowers) per-SNP heritability by 50%.

The files sumanns.cats and sumanns.enrich provide estimates of the cumulative SNP heritability of each annotation, these values expressed as fractions of the total SNP heritability, and the enrichment of each annotation. These are the values reported by Finucane et al., so for formal definitions please see their paper. However, to get a rough idea, suppose we have 1,000 SNPs of which SNPs 1-200 belong to Annotation 1 and SNPs 151-350 belong to Annotation 2. The cumulative SNP heritability of Annotation 1 will equal the direct SNP heritability of Annotation 1, plus a quarter of the direct SNP heritability of Annotation 2 (because 25% of the SNPs in Annotation 2 are also in Annotation 1), plus a fifth of the direct SNP heritability of the base category (because 20% of the SNPs in the base category are also in Annotation 1). The enrichment of Annotation 1 is obtained by dividing its cumulative SNP heritability by the total SNP heritability, then dividing the result by its expected value under a model of no enrichment (here 0.2, because Annotation 1 contains 20% of SNPs).
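The rough calculation above can be worked through numerically. In the sketch below, Annotation 1 covers SNPs 1-200 and Annotation 2 covers SNPs 151-350, as in the text. The three direct SNP heritabilities are invented purely for illustration, and the sketch assumes that the total SNP heritability is the sum of the direct contributions of the base category and the two annotations. It is not how LDAK computes these values internally; for formal definitions see Finucane et al.

```shell
# All heritability values below are invented; only the SNP ranges come from the text.
result=$(awk 'BEGIN {
    n = 1000                      # total SNPs (the base category contains all of them)
    a1_lo = 1;   a1_hi = 200      # Annotation 1
    a2_lo = 151; a2_hi = 350      # Annotation 2

    # Hypothetical direct SNP heritabilities.
    h_base = 0.35; h_a1 = 0.10; h_a2 = 0.05
    h_total = h_base + h_a1 + h_a2   # assumption: direct contributions sum to the total

    # Count the SNPs in each annotation and in their overlap.
    for (j = 1; j <= n; j++) {
        in1 = (j >= a1_lo && j <= a1_hi)
        in2 = (j >= a2_lo && j <= a2_hi)
        n1 += in1; n2 += in2; n12 += (in1 && in2)
    }

    # Cumulative heritability of Annotation 1: its own direct heritability, plus
    # the overlapping fractions of Annotation 2 and of the base category.
    cumulative = h_a1 + (n12 / n2) * h_a2 + (n1 / n) * h_base

    # Enrichment: share of total heritability divided by share of SNPs.
    share = cumulative / h_total
    enrichment = share / (n1 / n)

    printf "cumulative=%.4f share=%.3f enrichment=%.3f", cumulative, share, enrichment
}')
echo "$result"
```

With these made-up inputs, Annotation 1 has cumulative SNP heritability 0.1825, which is 36.5% of the total, so its enrichment is about 1.8 even though its direct SNP heritability is only 20% of the total.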

If the aim is to identify the most important subsets of predictors, then the annotation enrichments are most relevant. However, we feel that in terms of understanding the biology of a trait, the influence of an annotation is more informative than its enrichment. For example, Finucane et al. found that both coding SNPs and conserved regions were enriched. However, these annotations overlap substantially (for example, approximately a third of coding SNPs are in conserved regions). Therefore, it is possible that only conserved SNPs are important, and that the observed enrichment of coding regions is predominantly due to their overlap with conserved regions (or vice versa). By considering the influence of annotations, rather than their enrichments, we should get a better idea of which annotations are driving enrichments.