A general form for the heritability model is

E[h^{2}_{j}] = tau_{1} a_{1j} + tau_{2} a_{2j} + ... + tau_{K} a_{Kj}

where E[h^{2}_{j}] is the expected heritability (uniquely) contributed by SNP j, a_{1}, a_{2}, ..., a_{K} are SNP annotations, while tau_{1}, tau_{2}, ..., tau_{K} are the corresponding coefficients. The annotations are specified in advance, while the taus are estimated from the data.
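As a minimal illustration of the general model, the expected heritability of each SNP is a weighted sum of its annotation values. This is a plain-Python sketch, not part of LDAK. The function name expected_h2 and the example values are hypothetical and in arbitrary units.

```python
def expected_h2(taus, annotations):
    """Return E[h2_j] = tau_1 a_1j + ... + tau_K a_Kj for every SNP j.

    taus: list of K coefficients.
    annotations: list of K lists; annotations[k][j] is the value of annotation k for SNP j.
    """
    if len(taus) != len(annotations):
        raise ValueError("need exactly one tau per annotation")
    n_snps = len(annotations[0])
    return [
        sum(tau * annot[j] for tau, annot in zip(taus, annotations))
        for j in range(n_snps)
    ]


# Hypothetical example: two annotations over three SNPs.
taus = [0.5, 2.0]
annotations = [
    [1.0, 1.0, 1.0],  # a_1: a base category (every SNP has value 1)
    [0.0, 1.0, 0.0],  # a_2: a binary annotation (only the second SNP is in it)
]
print(expected_h2(taus, annotations))  # [0.5, 2.5, 0.5]
```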

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Note that heritability models are used throughout statistical genetics. However, most analyses do not explicitly explain which heritability model they use, nor justify why the model they use is appropriate. In particular, whenever a method first standardizes SNPs, then assigns to each the same prior distribution or penalty function, this corresponds to the assumption that E[h^{2}_{j}] is constant.

Below we explain how to specify the heritability model in LDAK. When analysing individual-level data, the choice of heritability model determines the number of kinship matrices and how they are calculated. When analysing summary statistics, the choice of heritability model determines how to calculate the tagging file.

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

**Simple heritability models:**

We first consider one-parameter heritability models of the form

E[h^{2}_{j}] = tau_{1} w_{j} [f_{j}(1-f_{j})]^{(1+alpha)}

where w_{j} is the weighting for SNP j and f_{j} is its minor allele frequency (MAF). To provide SNP weightings, use the option --weights (or --ignore-weights YES to set w_{j}=1); to specify alpha, use --power.

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

The **GCTA Model** assumes E[h^{2}_{j}] = tau_{1} (i.e., that expected heritability is constant across SNPs). In LDAK, this model is achieved by adding the options --ignore-weights YES and --power -1 when calculating the kinship matrix or tagging file (setting w_{j}=1 and alpha=-1 makes w_{j} [f_{j}(1-f_{j})]^{(1+alpha)} equal to 1 for every SNP).

The **LDAK Model** assumes E[h^{2}_{j}] = tau_{1} w_{j} [f_{j}(1-f_{j})]^{0.75}, where w_{j} are the LDAK weightings; these tend to be lower for SNPs in regions of high linkage disequilibrium (LD), and vice versa. Therefore, the LDAK Model assumes that E[h^{2}_{j}] is higher for SNPs in regions of lower LD and for those with higher MAF. In LDAK, this model is achieved by adding the options --weights <weightsfile> and --power -0.25 when calculating the kinship matrix or tagging file, where <weightsfile> contains the LDAK weightings computed using Calculate Weightings.

The **LDAK-Thin Model** assumes E[h^{2}_{j}] = tau_{1} I_{j} [f_{j}(1-f_{j})]^{0.75}, where I_{j} indicates whether SNP j remains after thinning for duplicate SNPs. Like the LDAK Model, the LDAK-Thin Model assumes that E[h^{2}_{j}] is higher for SNPs in regions of lower LD and for those with higher MAF. However, using I_{j} upweights lower-LD regions less strongly than the LDAK weightings do, so the LDAK-Thin Model can be viewed as intermediate between the GCTA and LDAK Models. In LDAK, the LDAK-Thin Model is achieved by adding the options --ignore-weights YES, --power -0.25 and --extract <extractlist> when calculating the kinship matrix or tagging file, where <extractlist> contains the SNPs that remain after Thinning (with options --window-kb 100 and --window-prune 0.98).
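To make the three simple models concrete, the sketch below evaluates E[h^{2}_{j}]/tau_{1} = w_{j} [f_{j}(1-f_{j})]^{(1+alpha)} on three made-up SNPs. It is plain Python, not LDAK code. The MAFs, LDAK weightings and thinning indicators are hypothetical; in practice they would come from your data, Calculate Weightings and Thinning.

```python
def relative_h2(weights, mafs, alpha):
    """E[h2_j] / tau_1 = w_j * [f_j (1 - f_j)]^(1 + alpha) for each SNP j."""
    return [w * (f * (1.0 - f)) ** (1.0 + alpha) for w, f in zip(weights, mafs)]


mafs = [0.5, 0.05, 0.3]         # hypothetical minor allele frequencies
ldak_weights = [0.2, 1.0, 0.0]  # hypothetical LDAK weightings (lower in high-LD regions)
kept = [1.0, 1.0, 0.0]          # hypothetical I_j: 1 if the SNP survives thinning

models = {
    # --ignore-weights YES --power -1  (w_j = 1, alpha = -1)
    "GCTA": relative_h2([1.0] * len(mafs), mafs, alpha=-1.0),
    # --weights <weightsfile> --power -0.25
    "LDAK": relative_h2(ldak_weights, mafs, alpha=-0.25),
    # --ignore-weights YES --power -0.25 --extract <extractlist>
    "LDAK-Thin": relative_h2(kept, mafs, alpha=-0.25),
}
for name, values in models.items():
    print(name, [round(v, 3) for v in values])
```

Under the GCTA Model every SNP gets the same value, while under the other two models a SNP with weight (or I_{j}) zero contributes nothing and, for equal weights, the SNP with higher MAF gets the larger expectation.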

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

**Complex heritability models when analysing individual-level data**

When analysing individual-level data, you must calculate one kinship matrix for each annotation (i.e., K kinship matrices in total). For Kinship Matrix k, use Calculate Kinships with the options --weights <weightsfile> and --power -1, where <weightsfile> contains a_{kj} for each SNP. Note that if the annotations are of the form a_{kj} = P_{kj} [f_{j}(1-f_{j})]^{(1+alpha)}, then it is equivalent to instead use Calculate Kinships with the options --weights <weightsfile> and --power alpha, where <weightsfile> now contains P_{kj} for each SNP.
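To see why the two routes are equivalent, the following sketch models --power as multiplying each supplied value by [f_{j}(1-f_{j})]^{(1+alpha)}, which is what the stated equivalence implies. The function and variable names are hypothetical, and this is not how LDAK itself is implemented. The same reasoning applies to the --power option of Calculate Taggings.

```python
def scaled_annotation(values, mafs, power):
    """Value entering the model for each SNP: value_j * [f_j (1 - f_j)]^(1 + power)."""
    return [v * (f * (1.0 - f)) ** (1.0 + power) for v, f in zip(values, mafs)]


alpha = -0.25
mafs = [0.5, 0.2, 0.01]  # hypothetical MAFs
P = [1.0, 0.0, 3.0]      # hypothetical raw annotation values P_kj

# Route 1: pre-multiply by the MAF term yourself, then use --power -1 (factor of 1).
a = [p * (f * (1.0 - f)) ** (1.0 + alpha) for p, f in zip(P, mafs)]
route1 = scaled_annotation(a, mafs, power=-1.0)

# Route 2: supply P_kj directly and let --power alpha apply the MAF term.
route2 = scaled_annotation(P, mafs, power=alpha)

assert all(abs(x - y) < 1e-12 for x, y in zip(route1, route2))
print(route1 == route2)
```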

For computational reasons, there is a limit to how many kinship matrices you can analyse at once, which in turn limits how complex the heritability model can be when analysing individual-level data. For example, while it may be feasible (with a large sample size and sufficient computational resources) to analyse 22 kinship matrices, one for each chromosome of the human genome (useful when performing Genomic Partitioning), it is almost certainly not feasible to analyse the 66 or 75 kinship matrices that would be required to implement the BLD-LDAK or Baseline LD Models (see below).
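For a rough sense of scale, the sketch below estimates the memory needed just to store K kinship matrices. It assumes each matrix is held as a dense n x n array of 8-byte floating-point numbers; this is an assumption, actual LDAK requirements and run times may differ, and fitting the model needs additional working memory. The sample size of 20,000 is illustrative.

```python
def kinship_memory_gb(n_samples, n_kinships, bytes_per_entry=8):
    """Memory (in GB, 1e9 bytes) to hold n_kinships dense n x n matrices."""
    return n_kinships * n_samples * n_samples * bytes_per_entry / 1e9


# Illustrative sample size; 22, 66 and 75 match the model sizes mentioned above.
for k in (1, 22, 66, 75):
    print(k, "kinship matrices:", round(kinship_memory_gb(20_000, k), 1), "GB")
```

Because storage grows with n^2 for each matrix, multiplying the number of kinship matrices by three (from 22 to 66) triples an already large requirement.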

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

**Complex heritability models when analysing summary statistics**

When analysing summary statistics, you must first make files named <prefix>1, <prefix>2, ..., <prefix>K (replace <prefix> with a word of your choice); the file <prefix>k should have two columns, which provide the SNP names and the corresponding values of Annotation k. Then use Calculate Taggings with the options --partition-number K, --partition-prefix <prefix> and --power -1. Note that SNPs not present in <prefix>k get a_{kj}=0, while if <prefix>k has only one column, then all SNPs within the file get a_{kj}=1 (so for a binary annotation, the file need only contain the names of SNPs within the category).
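The following sketch writes two annotation files for a hypothetical prefix myanno and reads them back according to the rules above (SNPs missing from a file get 0; a file containing only SNP names gives its SNPs the value 1). The helper names and SNP names are made up, and the reader only imitates the documented behaviour rather than reproducing LDAK's own parser.

```python
import os
import tempfile


def write_annotation_file(path, values):
    """Write a two-column <prefix>k file: SNP name, then annotation value."""
    with open(path, "w") as handle:
        for snp, value in values.items():
            handle.write(f"{snp} {value}\n")


def annotation_values(path, all_snps):
    """Imitate the documented rules: SNPs absent from the file get 0,
    and a line holding only a SNP name gives that SNP the value 1."""
    values = {snp: 0.0 for snp in all_snps}
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if fields and fields[0] in values:
                values[fields[0]] = float(fields[1]) if len(fields) > 1 else 1.0
    return values


snps = ["rs1", "rs2", "rs3"]  # hypothetical SNP names
with tempfile.TemporaryDirectory() as folder:
    prefix = os.path.join(folder, "myanno")  # hypothetical <prefix>
    write_annotation_file(prefix + "1", {"rs1": 0.7, "rs3": 2.5})  # continuous
    with open(prefix + "2", "w") as handle:  # binary: SNP names only
        handle.write("rs2\n")
    a1 = annotation_values(prefix + "1", snps)
    a2 = annotation_values(prefix + "2", snps)

print(a1)  # rs2 is absent from myanno1, so it gets 0.0
print(a2)  # only rs2 is listed in myanno2, so it gets 1.0
```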

If the annotations are of the form a_{kj}= P_{kj} [f_{j}(1-f_{j})]^{(1+alpha)}, then it is equivalent to instead use Calculate Taggings with the options --partition-number K, --partition-prefix <prefix> and --power alpha, where the file <prefix>k provides the values of P_{kj} for each SNP.

Note that using the options --annotation-number K and --annotation-prefix <prefix> is equivalent to using --partition-number K and --partition-prefix <prefix>, except that the former will create K+1 annotations, where the final annotation is a base category with values a_{(K+1)j}= [f_{j}(1-f_{j})]^{(1+alpha)}.
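The difference between the two pairs of options can be sketched as follows, again modelling --power as multiplication by [f_{j}(1-f_{j})]^{(1+alpha)}. Under that assumption, the final check shows that adding a partition file in which every SNP has value 1 reproduces the base category. The function model_annotations and the example values are hypothetical.

```python
def model_annotations(partitions, mafs, power, add_base):
    """Per-SNP annotation vectors entering the model.

    partitions: K lists of raw values P_kj, as read from <prefix>1..K.
    add_base=True imitates --annotation-number/--annotation-prefix (adds a base
    category); add_base=False imitates --partition-number/--partition-prefix.
    """
    maf_term = [(f * (1.0 - f)) ** (1.0 + power) for f in mafs]
    columns = [[p * m for p, m in zip(values, maf_term)] for values in partitions]
    if add_base:
        columns.append(list(maf_term))  # a_(K+1)j = [f_j (1 - f_j)]^(1 + alpha)
    return columns


mafs = [0.5, 0.1]                      # hypothetical MAFs
partitions = [[1.0, 0.0], [0.3, 0.8]]  # hypothetical P_1j and P_2j

with_base = model_annotations(partitions, mafs, power=-1.0, add_base=True)
print(len(with_base))  # K + 1 = 3 annotations
print(with_base[-1])   # base category is all 1.0 when the power is -1

# An extra all-ones partition file plays the role of the base category.
all_ones = [[1.0] * len(mafs)]
assert model_annotations(partitions + all_ones, mafs, -0.25, False) == \
    model_annotations(partitions, mafs, -0.25, True)
```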

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

The authors of LD Score Regression proposed the 75-parameter **Baseline LD Model**, which takes the form

E[h^{2}_{j}] = tau_{1} a_{1j} + tau_{2} a_{2j} + ... + tau_{74} a_{74j} + tau_{75}

where a_{1}, a_{2}, ..., a_{74} comprise 67 binary annotations (e.g., indicating which SNPs are in coding regions) and 7 continuous annotations (e.g., estimated allele age). To implement this model in LDAK, you could first download the Baseline LD annotation files from the LDSC website, then use these to make files called baselineLD1, baselineLD2, ..., baselineLD74 (where baselineLDk has two columns, providing first the SNP names, then the values of Annotation k). You would then use Calculate Taggings adding --annotation-number 74, --annotation-prefix baselineLD and --power -1 (with --power -1, the base category added by --annotation-number has value 1 for every SNP, which provides the tau_{75} term).

Equivalently, you could make an extra file called baselineLD75 that contains the names of all SNPs, then replace --annotation-number 74 and --annotation-prefix baselineLD with --partition-number 75 and --partition-prefix baselineLD.
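The conversion step can be sketched in Python as below. It assumes (check this against your download) that each LDSC annotation file is a whitespace-delimited table, optionally gzipped, whose header has columns CHR, BP, SNP and CM followed by the annotations, the first of which is named base; the base column is skipped because --annotation-number 74 adds its own base category. Pass the files for all chromosomes in one call so that each baselineLDk accumulates every SNP, and check that 74 annotation names are returned. The toy table and its column names AnnotA and AnnotB are made up.

```python
import gzip
import os
import tempfile

# Assumed non-annotation columns; "base" is skipped because
# --annotation-number adds its own base category.
SKIP_COLUMNS = {"CHR", "BP", "SNP", "CM", "base"}


def split_annot_files(annot_paths, out_prefix):
    """Write each annotation column to <out_prefix>1, <out_prefix>2, ... as 'SNP value' lines."""
    handles, names = [], None
    try:
        for path in annot_paths:
            opener = gzip.open if path.endswith(".gz") else open
            with opener(path, "rt") as table:
                header = table.readline().split()
                snp_col = header.index("SNP")
                keep = [i for i, name in enumerate(header) if name not in SKIP_COLUMNS]
                if names is None:
                    names = [header[i] for i in keep]
                    handles = [open(f"{out_prefix}{k}", "w") for k in range(1, len(keep) + 1)]
                elif [header[i] for i in keep] != names:
                    raise ValueError(f"{path}: annotation columns differ from the first file")
                for line in table:
                    fields = line.split()
                    if not fields:
                        continue
                    for handle, i in zip(handles, keep):
                        handle.write(f"{fields[snp_col]} {fields[i]}\n")
    finally:
        for handle in handles:
            handle.close()
    return names


# Tiny made-up table standing in for one downloaded chromosome file.
with tempfile.TemporaryDirectory() as folder:
    toy = os.path.join(folder, "toy.annot")
    with open(toy, "w") as handle:
        handle.write("CHR BP SNP CM base AnnotA AnnotB\n")
        handle.write("1 100 rs1 0 1 0 1\n")
        handle.write("1 200 rs2 0 1 1 0\n")
    names = split_annot_files([toy], os.path.join(folder, "baselineLD"))
    with open(os.path.join(folder, "baselineLD1")) as handle:
        first_file = handle.read()

print(names)  # ['AnnotA', 'AnnotB']
print(first_file)
```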

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

We created the 66-parameter **BLD-LDAK Model** by removing the 10 binary MAF annotations from the Baseline LD Model, then incorporating the LDAK weightings and scaling annotations based on MAF. It takes the form

E[h^{2}_{j}] = tau_{1} b_{1j} v_{j} + tau_{2} b_{2j} v_{j} + ... + tau_{64} b_{64j} v_{j} + tau_{65} w_{j} v_{j} + tau_{66} v_{j}

where b_{1}, b_{2}, ..., b_{64} are the non-MAF annotations from the Baseline LD Model, w_{j} is the LDAK weighting (computed using only high-quality SNPs) and v_{j}=[f_{j}(1-f_{j})]^{0.75}. To implement this model in LDAK, you should first download the files bldldak1, bldldak2, ..., bldldak65 from Annotations, then use Calculate Taggings adding --annotation-number 65, --annotation-prefix bldldak and --power -0.25.
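A plain-Python sketch of the BLD-LDAK formula follows. It is written for any number of b annotations so that it can be checked on small inputs (with 64 b annotations it expects 66 taus). It is only meant to make the formula concrete; the function name and inputs are hypothetical, and in practice the taus are estimated from the data.

```python
def bld_ldak_h2(taus, b, ldak_weights, mafs):
    """E[h2_j] = (sum_k tau_k b_kj + tau_(K+1) w_j + tau_(K+2)) * v_j,
    with v_j = [f_j (1 - f_j)]^0.75.

    taus: K + 2 coefficients (K for the b annotations, one for w_j, one for the base).
    b: K lists; b[k][j] is the value of annotation k+1 for SNP j.
    """
    if len(taus) != len(b) + 2:
        raise ValueError("expected one tau per b annotation, plus two")
    result = []
    for j, (w, f) in enumerate(zip(ldak_weights, mafs)):
        v = (f * (1.0 - f)) ** 0.75
        total = sum(tau * annot[j] for tau, annot in zip(taus, b))
        total += taus[-2] * w + taus[-1]  # LDAK-weighting term and base term
        result.append(total * v)
    return result


# Hypothetical toy input: one b annotation, two SNPs, both with MAF 0.5.
print(bld_ldak_h2([1.0, 2.0, 3.0], [[1.0, 0.0]], [0.0, 1.0], [0.5, 0.5]))
```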

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

The 67-parameter **BLD-LDAK+Alpha Model** generalizes the BLD-LDAK Model by allowing alpha to vary (instead of fixing it to -0.25). SumHer is unable to estimate alpha directly, so instead we create 31 instances of the model, corresponding to alpha = -1, -0.95, ..., 0.45, 0.5, and see which fits the data best.

To implement this model in LDAK, you should first download the files bldldak1, bldldak2, ..., bldldak65 from Annotations, then run Calculate Taggings 31 times, each time adding --annotation-number 65 and --annotation-prefix bldldak, together with --power set to one of the 31 values of alpha (i.e., --power -1, then --power -0.95, and so on up to --power 0.5).
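The 31 values of alpha and the matching options can be generated as below. Only the options listed in the text are produced; the rest of each Calculate Taggings command (the command itself, plus data and output arguments) is left to you, and the helper names are hypothetical. Computing the grid from integers avoids floating-point drift, such as a value that prints as -0 instead of 0.

```python
def alpha_grid():
    """The 31 values -1, -0.95, ..., 0.45, 0.5, computed from integers to avoid drift."""
    return [(i - 20) / 20 for i in range(31)]


def tagging_options(alpha):
    """Options (as given in the text) for one instance of the BLD-LDAK+Alpha Model."""
    return f"--annotation-number 65 --annotation-prefix bldldak --power {alpha:g}"


for alpha in alpha_grid():
    print(tagging_options(alpha))
```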