Technical Details

A general form for the heritability model is

$$E[h^2_j] = \tau_1 a_{1j} + \tau_2 a_{2j} + \cdots + \tau_K a_{Kj}$$

where $E[h^2_j]$ is the expected heritability contributed (uniquely) by SNP j, $a_1, a_2, \ldots, a_K$ are SNP annotations (with $a_{kj}$ the value of Annotation k for SNP j), and $\tau_1, \tau_2, \ldots, \tau_K$ are the corresponding coefficients. The annotations are specified in advance, while the taus are estimated from the data.
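To make the general model concrete, here is a minimal Python sketch that evaluates $E[h^2_j]$ for every SNP from a list of coefficients and a list of annotation vectors. The function name and the annotation values and taus in the usage lines are invented for illustration; in a real analysis the annotations come from your own files and the taus are estimated from the data, not chosen by hand.

```python
def expected_h2(taus, annotations):
    """Return E[h2_j] = sum_k tau_k * a_kj for every SNP j.

    taus: list of K coefficients.
    annotations: list of K lists; annotations[k][j] is a_kj.
    """
    if len(taus) != len(annotations):
        raise ValueError("need exactly one tau per annotation")
    n_snps = len(annotations[0])
    if any(len(ann) != n_snps for ann in annotations):
        raise ValueError("every annotation needs one value per SNP")
    return [
        sum(tau * ann[j] for tau, ann in zip(taus, annotations))
        for j in range(n_snps)
    ]


# Toy example with K = 2 annotations over 3 SNPs (hypothetical values):
# a1 is constant, a2 is a binary indicator (e.g., "SNP is in a coding region").
a1 = [1.0, 1.0, 1.0]
a2 = [0.0, 1.0, 0.0]
print(expected_h2([0.01, 0.02], [a1, a2]))
```

Only SNPs with $a_{2j} = 1$ pick up the extra contribution from $\tau_2$, which is exactly how a binary annotation enriches (or depletes) heritability in a category.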
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Heritability models are used throughout statistical genetics. However, most analyses neither state which heritability model they use nor justify why that model is appropriate. In particular, whenever a method first standardizes SNPs and then assigns each the same prior distribution or penalty function, it implicitly assumes that $E[h^2_j]$ is constant across SNPs.

Below we explain how to specify the heritability model in LDAK. When analysing individual-level data, the choice of heritability model determines the number of kinship matrices and how they are calculated. When analysing summary statistics, the choice of heritability model determines how to calculate the tagging file.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Simple heritability models:

We first consider one-parameter heritability models of the form

$$E[h^2_j] = \tau_1 w_j [f_j(1-f_j)]^{1+\alpha}$$

where $w_j$ is the weighting for SNP j and $f_j$ is its minor allele frequency (MAF). To provide SNP weightings, use the option --weights (or use --ignore-weights YES to set $w_j = 1$); to specify $\alpha$, use --power.
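The one-parameter model can be evaluated directly. The Python sketch below (the function name is hypothetical, not part of LDAK) computes $E[h^2_j]$ for a few SNPs so you can see how the value passed to --power shapes the MAF dependence. With all weightings set to 1 (mimicking --ignore-weights YES), $\alpha = -1$ gives a constant, while $\alpha = -0.25$ makes expected heritability rise with MAF.

```python
def simple_model_h2(tau1, w, f, alpha):
    """Return E[h2_j] = tau1 * w_j * [f_j(1-f_j)]^(1+alpha) for each SNP j.

    w: per-SNP weightings (use 1.0 everywhere to mimic --ignore-weights YES)
    f: per-SNP minor allele frequencies, each in (0, 0.5]
    alpha: the value you would pass to --power
    """
    if len(w) != len(f):
        raise ValueError("w and f must have one entry per SNP")
    out = []
    for wj, fj in zip(w, f):
        if not 0 < fj <= 0.5:
            raise ValueError(f"MAF out of range: {fj}")
        out.append(tau1 * wj * (fj * (1 - fj)) ** (1 + alpha))
    return out


mafs = [0.01, 0.1, 0.3, 0.5]
ones = [1.0] * len(mafs)
print(simple_model_h2(1e-6, ones, mafs, alpha=-1))     # same value for every SNP
print(simple_model_h2(1e-6, ones, mafs, alpha=-0.25))  # increases with MAF
```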
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

The GCTA Model assumes $E[h^2_j] = \tau_1$ (i.e., that expected heritability is constant across SNPs); this is the one-parameter model with $w_j = 1$ and $\alpha = -1$, since $[f_j(1-f_j)]^0 = 1$. In LDAK, this model is achieved by adding the options --ignore-weights YES and --power -1 when calculating the kinship matrix or tagging file.

The LDAK Model assumes $E[h^2_j] = \tau_1 w_j [f_j(1-f_j)]^{0.75}$ (i.e., $\alpha = -0.25$), where $w_j$ are the LDAK weightings; these tend to be lower for SNPs in regions of high linkage disequilibrium (LD) and higher for SNPs in regions of low LD. Therefore, the LDAK Model assumes that $E[h^2_j]$ is higher for SNPs in regions of lower LD and for SNPs with higher MAF. In LDAK, this model is achieved by adding the options --weights <weightsfile> and --power -0.25 when calculating the kinship matrix or tagging file, where <weightsfile> contains the LDAK weightings computed using Calculate Weightings.

The LDAK-Thin Model assumes $E[h^2_j] = \tau_1 I_j [f_j(1-f_j)]^{0.75}$, where $I_j$ indicates whether SNP j remains after thinning for duplicate SNPs. Like the LDAK Model, the LDAK-Thin Model assumes that $E[h^2_j]$ is higher for SNPs in regions of lower LD and for those with higher MAF. However, $I_j$ upweights lower-LD regions less strongly than the LDAK weightings do, so the LDAK-Thin Model can be viewed as intermediate between the GCTA and LDAK Models. In LDAK, the LDAK-Thin Model is achieved by adding the options --ignore-weights YES, --power -0.25 and --extract <extractlist> when calculating the kinship matrix or tagging file, where <extractlist> contains the SNPs that remain after Thinning (with options --window-kb 100 and --window-prune 0.98).
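The three simple models differ only in the SNP weighting and in $\alpha$. The Python sketch below records the model-specific options listed above and evaluates $E[h^2_j]$ for one SNP under each model. The dictionary and function names are hypothetical; <weightsfile> and <extractlist> remain placeholders you substitute yourself, and the default LDAK weighting of 1.0 is an arbitrary stand-in for a value that would come from Calculate Weightings.

```python
# Model-specific options, copied from the descriptions above.
# Other options needed to run LDAK (input data, output names, etc.) are omitted.
MODEL_OPTIONS = {
    "GCTA": ["--ignore-weights", "YES", "--power", "-1"],
    "LDAK": ["--weights", "<weightsfile>", "--power", "-0.25"],
    "LDAK-Thin": ["--ignore-weights", "YES", "--power", "-0.25",
                  "--extract", "<extractlist>"],
}


def model_h2(model, tau1, f, ldak_weight=1.0, kept_after_thinning=True):
    """Expected heritability of one SNP with MAF f under the named model."""
    if model == "GCTA":
        return tau1  # w_j = 1 and alpha = -1, so the MAF term is 1
    maf_term = (f * (1 - f)) ** 0.75  # alpha = -0.25, exponent 1 + alpha
    if model == "LDAK":
        return tau1 * ldak_weight * maf_term
    if model == "LDAK-Thin":
        indicator = 1.0 if kept_after_thinning else 0.0  # I_j
        return tau1 * indicator * maf_term
    raise ValueError(f"unknown model: {model}")


for name, opts in MODEL_OPTIONS.items():
    print(name, " ".join(opts), model_h2(name, 1e-6, 0.05))
```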
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Complex heritability models when analysing individual-level data

When analysing individual-level data, you must calculate one kinship matrix for each annotation (i.e., K kinship matrices in total). For Kinship Matrix k, use Calculate Kinships with the options --weights <weightsfile> and --power -1, where <weightsfile> contains $a_{kj}$ for each SNP. Note that if the annotations have the form $a_{kj} = P_{kj} [f_j(1-f_j)]^{1+\alpha}$, then it is equivalent to use Calculate Kinships with the options --weights <weightsfile> and --power alpha, where <weightsfile> now contains $P_{kj}$ for each SNP.
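The equivalence above can be checked numerically at the level of the per-SNP factor $w_j [f_j(1-f_j)]^{1+\text{power}}$ implied by --weights and --power. The Python sketch below does not compute a kinship matrix; it only compares the two ways of supplying the same annotation. The function name and the values of $P_{kj}$ and $f_j$ are hypothetical.

```python
def snp_multiplier(weight, f, power):
    """Per-SNP factor weight * [f(1-f)]^(1+power) implied by supplying
    `weight` via --weights and `power` via --power."""
    return weight * (f * (1 - f)) ** (1 + power)


f = [0.01, 0.1, 0.3, 0.5]   # hypothetical MAFs
P = [1.0, 0.0, 2.0, 0.5]    # hypothetical annotation values P_kj
alpha = -0.25

# Route 1: precompute a_kj = P_kj [f_j(1-f_j)]^(1+alpha), then use --power -1
a = [p * (fj * (1 - fj)) ** (1 + alpha) for p, fj in zip(P, f)]
route1 = [snp_multiplier(aj, fj, -1) for aj, fj in zip(a, f)]

# Route 2: supply P_kj directly and let --power alpha apply the MAF scaling
route2 = [snp_multiplier(p, fj, alpha) for p, fj in zip(P, f)]

assert all(abs(x - y) < 1e-12 for x, y in zip(route1, route2))
print(route1)
```

With --power -1 the MAF factor is raised to the power 0, so the weights file must already contain the full $a_{kj}$; with --power alpha, LDAK applies the MAF factor for you.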

For computational reasons, there is a limit to how many kinship matrices you can analyse at once, which in turn limits how complex the heritability model can be when analysing individual-level data. For example, while it may be feasible (with a large sample size and sufficient computational resources) to analyse 22 kinship matrices, each corresponding to a different chromosome of the human genome (useful if wishing to perform Genomic Partitioning), it is almost certainly not feasible to analyse 66 or 75 kinship matrices, which would be required to implement the BLD-LDAK or Baseline LD Models (see below).
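A back-of-the-envelope calculation shows why the number of kinship matrices matters. Assuming each matrix is stored densely as an n-by-n array of 8-byte doubles (an assumption for illustration; it says nothing about how LDAK actually allocates memory), n = 20,000 samples gives 3.2 GB per matrix, so 22 matrices need about 70 GB and 75 matrices about 240 GB, before any working memory for the analysis itself.

```python
def kinship_memory_gb(n_samples, n_matrices, bytes_per_entry=8):
    """Memory (GB) to hold n_matrices dense n-by-n kinship matrices,
    assuming double precision and no savings from symmetry."""
    return n_matrices * n_samples ** 2 * bytes_per_entry / 1e9


for k in (1, 22, 66, 75):
    print(k, "matrices:", round(kinship_memory_gb(20000, k), 1), "GB")
```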
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Complex heritability models when analysing summary statistics

When analysing summary statistics, you must first make files named <prefix>1, <prefix>2, ..., <prefix>K (replace <prefix> with a word of your choice); the file <prefix>k should have two columns, which provide the SNP names and the corresponding values of Annotation k. Then use Calculate Taggings with the options --partition-number K, --partition-prefix <prefix> and --power -1. Note that SNPs not present in <prefix>k get $a_{kj} = 0$, while if <prefix>k has only one column, all SNPs within the file get $a_{kj} = 1$ (this means that for a binary annotation, the file need only contain the names of SNPs within the category).
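The Python sketch below writes annotation files in the layout just described and reads one back using the stated rules (two columns give name and value, a single column means value 1, missing SNPs get 0). The function names, the whitespace-separated layout and the SNP names are assumptions for illustration; this mimics the documented behaviour and is not LDAK's own parser.

```python
import tempfile
from pathlib import Path


def write_annotation_files(prefix, annotations, binary=()):
    """Write <prefix>1 ... <prefix>K. annotations[k-1] maps SNP name -> a_kj.
    Annotations whose 1-based index is in `binary` are written as a single
    column listing only the SNPs with value 1; the rest get two columns."""
    for k, ann in enumerate(annotations, start=1):
        with open(f"{prefix}{k}", "w") as out:
            for snp, value in ann.items():
                if k in binary:
                    if value == 1:
                        out.write(f"{snp}\n")
                else:
                    out.write(f"{snp} {value}\n")


def read_annotation(lines, all_snps):
    """Return {snp: a_kj} for every SNP in all_snps, following the rules
    above. SNPs listed in the file but absent from all_snps are ignored."""
    values = {snp: 0.0 for snp in all_snps}  # absent SNPs get 0
    for line in lines:
        fields = line.split()
        if not fields or fields[0] not in values:
            continue
        values[fields[0]] = 1.0 if len(fields) == 1 else float(fields[1])
    return values


# Usage with made-up SNPs: one continuous and one binary annotation
snps = ["rs1", "rs2", "rs3"]
ann1 = {"rs1": 0.4, "rs2": 1.3, "rs3": 0.0}
ann2 = {"rs1": 1, "rs2": 0, "rs3": 1}
prefix = str(Path(tempfile.mkdtemp()) / "myann")
write_annotation_files(prefix, [ann1, ann2], binary={2})
print(read_annotation(Path(prefix + "2").read_text().splitlines(), snps))
```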

If the annotations have the form $a_{kj} = P_{kj} [f_j(1-f_j)]^{1+\alpha}$, then it is equivalent to use Calculate Taggings with the options --partition-number K, --partition-prefix <prefix> and --power alpha, where the file <prefix>k provides the values of $P_{kj}$ for each SNP.

Note that using the options --annotation-number K and --annotation-prefix <prefix> is equivalent to using --partition-number K and --partition-prefix <prefix>, except that the former creates K+1 annotations, where the final annotation is a base category with values $a_{(K+1)j} = [f_j(1-f_j)]^{1+\alpha}$.
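The difference between the partition and annotation options can be expressed as a small transformation of the per-SNP values. The hypothetical Python function below builds $a_{kj} = P_{kj} [f_j(1-f_j)]^{1+\alpha}$ and, when asked, appends the base category. It also confirms that adding an all-ones partition file reproduces the --annotation-number behaviour; this is a sketch of the formulas above, not of LDAK's internals.

```python
def build_annotations(P, f, alpha, add_base):
    """P: list of K lists with P_kj for each SNP; f: list of MAFs.
    Returns a_kj = P_kj [f_j(1-f_j)]^(1+alpha) for k = 1..K and, if
    add_base is True (mimicking --annotation-number), a base category
    a_(K+1)j = [f_j(1-f_j)]^(1+alpha)."""
    maf_term = [(fj * (1 - fj)) ** (1 + alpha) for fj in f]
    out = [[p * m for p, m in zip(Pk, maf_term)] for Pk in P]
    if add_base:
        out.append(list(maf_term))
    return out


f = [0.1, 0.4]
P = [[1, 0]]  # one hypothetical binary annotation over two SNPs

# --annotation-number 1: K + 1 = 2 annotations, the last being the base
with_base = build_annotations(P, f, -0.25, add_base=True)

# --partition-number 2 with an extra all-ones file gives the same values
with_ones = build_annotations(P + [[1, 1]], f, -0.25, add_base=False)
print(with_base == with_ones)
```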
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

The authors of LD Score Regression proposed the 75-parameter Baseline LD Model, which takes the form

$$E[h^2_j] = \tau_1 a_{1j} + \tau_2 a_{2j} + \cdots + \tau_{74} a_{74j} + \tau_{75}$$

where $a_1, a_2, \ldots, a_{74}$ comprise 67 binary annotations (e.g., indicating which SNPs are in coding regions) and 7 continuous annotations (e.g., estimated allele age). To implement this model in LDAK, you could first download the Baseline LD annotation files from the LDSC website, then use these to make files called baselineLD1, baselineLD2, ..., baselineLD74 (where baselineLDk has two columns, providing first the SNP names, then the values of Annotation k). You would then use Calculate Taggings adding --annotation-number 74, --annotation-prefix baselineLD and --power -1 (with --power -1, the base category added by --annotation-number has value 1 for every SNP, giving the constant term $\tau_{75}$).

Equivalently, you could make an extra file called baselineLD75 that contains only the names of all SNPs (so every SNP gets value 1), then replace --annotation-number 74 and --annotation-prefix baselineLD with --partition-number 75 and --partition-prefix baselineLD.
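Splitting a combined annotation table into one file per annotation is mechanical. The Python sketch below assumes a whitespace-delimited table with a header row, a column named SNP, and one column per annotation; the column names in the usage lines ("coding", "allele_age") are hypothetical, so check the header of the files you actually download. If the downloaded files are gzipped, `gzip.open(path, "rt")` yields lines you can pass in; if they are split across several files (e.g., by chromosome), concatenate their rows first, keeping a single header. The optional all-SNPs file corresponds to the baselineLD75 file described above.

```python
import tempfile
from pathlib import Path


def split_annotation_table(table_lines, annot_cols, prefix, outdir,
                           add_all_snps_file=False):
    """Write <prefix>1..<prefix>K (two columns: SNP name, value), one file
    per column in annot_cols, in that order. If add_all_snps_file is True,
    also write <prefix>(K+1) listing every SNP in a single column."""
    header = table_lines[0].split()
    snp_idx = header.index("SNP")  # assumed name of the SNP column
    col_idx = [header.index(c) for c in annot_cols]
    rows = [line.split() for line in table_lines[1:] if line.strip()]
    outdir = Path(outdir)
    for k, ci in enumerate(col_idx, start=1):
        with open(outdir / f"{prefix}{k}", "w") as out:
            for row in rows:
                out.write(f"{row[snp_idx]} {row[ci]}\n")
    if add_all_snps_file:
        # One-column file: every listed SNP gets value 1 (constant category)
        with open(outdir / f"{prefix}{len(col_idx) + 1}", "w") as out:
            for row in rows:
                out.write(f"{row[snp_idx]}\n")
    return len(col_idx)


# Toy table with hypothetical columns; a real run would list all 74 columns
table = """CHR BP SNP CM coding allele_age
1 100 rs1 0 1 0.3
1 200 rs2 0 0 1.2""".splitlines()
outdir = tempfile.mkdtemp()
n = split_annotation_table(table, ["coding", "allele_age"], "baselineLD",
                           outdir, add_all_snps_file=True)
print(n, sorted(p.name for p in Path(outdir).iterdir()))
```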
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

We created the 66-parameter BLD-LDAK Model by removing 10 binary annotations from the Baseline LD Model, then incorporating the LDAK weightings and scaling annotations based on MAF. It takes the form

$$E[h^2_j] = \tau_1 b_{1j} v_j^{0.75} + \tau_2 b_{2j} v_j^{0.75} + \cdots + \tau_{64} b_{64j} v_j^{0.75} + \tau_{65} w_j v_j^{0.75} + \tau_{66} v_j^{0.75}$$

where $b_1, b_2, \ldots, b_{64}$ are the non-MAF annotations from the Baseline LD Model, $w_j$ is the LDAK weighting (computed using only high-quality SNPs) and $v_j = f_j(1-f_j)$. To implement this model in LDAK, first download the files bldldak1, bldldak2, ..., bldldak65 from Annotations, then use Calculate Taggings adding --annotation-number 65, --annotation-prefix bldldak and --power -0.25 (the base category created by --annotation-number supplies the final term $\tau_{66} v_j^{0.75}$).
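Because every term of the BLD-LDAK Model shares the MAF factor $[f_j(1-f_j)]^{0.75}$ (the factor applied by --power -0.25), the model can be written as that factor times $\tau_1 b_{1j} + \cdots + \tau_{64} b_{64j} + \tau_{65} w_j + \tau_{66}$. The hypothetical Python function below evaluates $E[h^2_j]$ for one SNP in this factorized form; the taus, annotation values, weighting and MAF in the usage lines are made up.

```python
def bld_ldak_h2(taus, b, w, f):
    """E[h2_j] for one SNP under the BLD-LDAK form.

    taus: 66 coefficients (tau_1..tau_64 for the annotations,
          tau_65 for the LDAK weighting, tau_66 for the base category)
    b:    the 64 annotation values b_1j..b_64j for this SNP
    w:    the LDAK weighting w_j
    f:    the MAF f_j
    """
    if len(taus) != 66 or len(b) != 64:
        raise ValueError("expected 66 taus and 64 annotation values")
    maf_scale = (f * (1 - f)) ** 0.75  # common factor shared by every term
    annotation_part = sum(t * bk for t, bk in zip(taus[:64], b))
    return maf_scale * (annotation_part + taus[64] * w + taus[65])


# Made-up inputs: the SNP lies only in the first (binary) annotation
taus = [1e-7] * 66
b = [0.0] * 64
b[0] = 1.0
print(bld_ldak_h2(taus, b, w=0.8, f=0.2))
```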
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

The 67-parameter BLD-LDAK+Alpha Model generalizes the BLD-LDAK Model by allowing $\alpha$ to vary (instead of fixing it at -0.25). SumHer cannot estimate $\alpha$ directly, so we instead create 31 instances of the model, corresponding to $\alpha = -1, -0.95, \ldots, 0.45, 0.5$, and see which fits the data best.

To implement this model in LDAK, first download the files bldldak1, bldldak2, ..., bldldak65 from Annotations, then run Calculate Taggings 31 times, each time adding --annotation-number 65 and --annotation-prefix bldldak, together with --power -1 for the first run, --power -0.95 for the second, and so on up to --power 0.5.
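Generating the 31 option sets by hand is error-prone, so here is a small Python sketch that builds the alpha grid from integers (avoiding floating-point drift from repeatedly adding 0.05) and prints the model-specific options for each run. Only the options named above are produced; everything else a run needs is omitted. The `best_alpha` helper is hypothetical: it assumes you have collected, for each alpha, some fit score where larger means better, and it simply returns the alpha with the highest score.

```python
# Alpha values -1, -0.95, ..., 0.45, 0.5 (31 values), built from integers
alphas = [(-100 + 5 * i) / 100 for i in range(31)]

for alpha in alphas:
    options = ["--annotation-number", "65",
               "--annotation-prefix", "bldldak",
               "--power", f"{alpha:g}"]
    print(" ".join(options))


def best_alpha(fit_by_alpha):
    """Return the alpha with the largest fit score. fit_by_alpha is a
    hypothetical dict {alpha: score} filled in from your own runs,
    where a larger score means a better fit."""
    return max(fit_by_alpha, key=fit_by_alpha.get)
```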