Technical Details

A general form for the heritability model is

E[h2j] = tau1 a1j + tau2 a2j + ... + tauK aKj

where E[h2j] is the expected heritability (uniquely) contributed by SNP j, a1, a2, ..., aK are vectors of SNP annotations, while tau1, tau2, ..., tauK are the corresponding coefficients. The annotations are specified in advance, while the taus are estimated from the data.
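As a rough illustration of the general form, the sketch below evaluates E[h2j] for a handful of SNPs. The function name, the toy annotation values and the coefficients are all made up for the example; in practice the taus are estimated from the data rather than chosen by hand.

```python
def expected_h2(taus, annotations):
    """Return E[h2j] for every SNP j under the general model.

    taus        -- list of K coefficients (tau1 .. tauK)
    annotations -- list of K lists; annotations[k][j] holds a_kj
    """
    if len(taus) != len(annotations):
        raise ValueError("need exactly one coefficient per annotation")
    n_snps = len(annotations[0])
    if any(len(a) != n_snps for a in annotations):
        raise ValueError("every annotation must cover the same SNPs")
    # For each SNP, sum tau_k * a_kj over the K annotations.
    return [
        sum(tau * a[j] for tau, a in zip(taus, annotations))
        for j in range(n_snps)
    ]


# Toy example (hypothetical values): one binary annotation plus a base
# annotation that equals 1 for every SNP.
coding = [1, 0, 0, 1]
base = [1, 1, 1, 1]
per_snp = expected_h2([0.002, 0.001], [coding, base])
```

Here SNPs inside the binary annotation receive tau1 + tau2, while the remaining SNPs receive only tau2.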

Below we explain how to implement different heritability models in LDAK. When analysing individual-level data, the choice of heritability model determines the number of kinship matrices and how they are calculated. When analysing summary statistics, the choice of heritability model determines how to calculate the tagging file.

As a reminder, we generally recommend using the LDAK-Thin Model when analysing individual-level data (or non-human data), and the BLD-LDAK Model when analysing summary statistics.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

One-parameter heritability models:

We first consider heritability models of the form

E[h2j] = tau1 wj [fj(1-fj)]^(1+alpha)

where wj is the weighting for SNP j and fj is its minor allele frequency (MAF). To provide SNP weightings use the option --weights <weightsfile> (or --ignore-weights YES to set wj=1), while to specify alpha use --power <float>.
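To make the roles of the weighting and of alpha concrete, here is a minimal sketch of the one-parameter form. The function name and the numerical values are illustrative assumptions only; LDAK performs this calculation internally when given --weights (or --ignore-weights YES) and --power.

```python
def one_parameter_h2(tau1, weight, maf, alpha):
    """Expected heritability of one SNP: tau1 * w * [f(1-f)]^(1+alpha)."""
    if not 0.0 < maf <= 0.5:
        raise ValueError("minor allele frequency must lie in (0, 0.5]")
    return tau1 * weight * (maf * (1.0 - maf)) ** (1.0 + alpha)


mafs = (0.05, 0.2, 0.5)

# With alpha = -1 and unit weights the exponent is zero, so every SNP
# is expected to contribute the same heritability regardless of MAF.
constant = [one_parameter_h2(0.01, 1.0, f, -1.0) for f in mafs]

# With alpha = -0.25 the exponent is 0.75, so expected heritability
# increases with MAF.
maf_dependent = [one_parameter_h2(0.01, 1.0, f, -0.25) for f in mafs]
```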
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

The GCTA Model assumes E[h2j] = tau1 (i.e., that expected heritability is constant across SNPs). Note that this is the model assumed by any method that first standardizes SNPs, then assigns to each the same prior distribution or penalty function. In LDAK, this model is achieved by adding the options --ignore-weights YES and --power -1 when Calculating Kinships or Calculating Taggings.

The LDAK Model assumes E[h2j] = tau1 wj [fj(1-fj)]^0.75, where wj are the LDAK weightings; these will tend to be lower for SNPs in regions of high linkage disequilibrium (LD), and vice versa. Therefore, the LDAK Model assumes that E[h2j] is higher for SNPs in regions of lower LD and for those with higher MAF. In LDAK, this model is achieved by adding the options --weights <weightsfile> and --power -0.25 when Calculating Kinships or Calculating Taggings, where <weightsfile> provides the LDAK Weightings.

The LDAK-Thin Model assumes E[h2j] = tau1 Ij [fj(1-fj)]^0.75, where Ij indicates whether SNP j remains after thinning for duplicate SNPs. Like the LDAK Model, the LDAK-Thin Model assumes that E[h2j] is higher for SNPs in regions of lower LD and for those with higher MAF. However, using Ij gives less weight to lower-LD regions than using the LDAK weightings, and thus the LDAK-Thin Model can be viewed as intermediate between the GCTA and LDAK Models. In LDAK, the LDAK-Thin Model is achieved by adding the options --weights <weightsfile> and --power -0.25 when Calculating Kinships or Calculating Taggings, where <weightsfile> gives weight one to the SNPs that remain after Thinning Predictors with options --window-kb 100 and --window-prune 0.98.
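For the LDAK-Thin Model, the weights file simply gives weight one to each SNP that survives thinning. The sketch below writes such a file from a list of retained SNP names. It assumes the weights file has two space-separated columns (SNP name, then weight); the helper name, the SNP names and the output file name are hypothetical.

```python
import tempfile
from pathlib import Path


def write_thin_weights(retained_snps, out_path):
    """Write one line per retained SNP in the form '<name> 1'.

    Assumes a two-column weights file (name, weight); SNPs that were
    removed by thinning are simply left out of the file.
    """
    lines = [f"{name} 1" for name in retained_snps]
    Path(out_path).write_text("\n".join(lines) + "\n")
    return len(lines)


# Demonstration in a temporary directory with made-up SNP names.
with tempfile.TemporaryDirectory() as tmp:
    weights_path = Path(tmp) / "weights.thin"
    n_written = write_thin_weights(["rs1", "rs7", "rs9"], weights_path)
    contents = weights_path.read_text()
```

The resulting file would then be supplied through --weights <weightsfile>, together with --power -0.25.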
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Multi-parameter heritability models when analysing individual-level data

When analysing individual-level data, you must calculate one kinship matrix for each annotation (i.e., in total K kinship matrices). For Kinship Matrix k, you would use Calculate Kinships with the options --weights <weightsfile> and --power -1, where <weightsfile> provides akj for each SNP. Note that if the annotations are of the form akj = bkj [fj(1-fj)]^(1+alpha), where fj is the MAF of SNP j, then it is equivalent to instead use the options --weights <weightsfile> and --power alpha, where <weightsfile> now contains bkj for each SNP.
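The equivalence between the two ways of supplying annotations follows from the exponent 1+alpha: with --power -1 the MAF term is raised to the power zero and drops out. A small numerical check, using a hypothetical helper and made-up values:

```python
def effective_annotation(file_value, maf, power):
    """Annotation value implied by a weights-file entry under --power <power>.

    The file value is multiplied by [f(1-f)]^(1+power).
    """
    return file_value * (maf * (1.0 - maf)) ** (1.0 + power)


alpha = -0.25
b_kj, maf = 2.0, 0.2

# Route 1: pre-multiply by the MAF term, store a_kj in the file, use --power -1.
a_kj = b_kj * (maf * (1.0 - maf)) ** (1.0 + alpha)
route1 = effective_annotation(a_kj, maf, -1.0)

# Route 2: store b_kj in the file and let --power alpha apply the MAF term.
route2 = effective_annotation(b_kj, maf, alpha)
```

Both routes give the same value, so the second is usually more convenient because the weights file does not depend on allele frequencies.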

Multiple kinship matrices are commonly used for Genomic Partitioning, for example, to estimate how much heritability is contributed by each chromosome. In this case, each kinship matrix corresponds to a subset of SNPs. To create each kinship matrix, you should use --extract <extractfile> to specify the corresponding SNP subset (this is equivalent to modifying <weightsfile> so that SNPs outside the subset get weighting zero).
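As an illustration of partitioning by chromosome, the sketch below groups SNP names by chromosome and writes one extract file per chromosome. It assumes the SNP details come from a PLINK-style .bim file (chromosome in column 1, SNP name in column 2) and that an extract file is a plain list of SNP names, one per line; the function name and file names are hypothetical.

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def write_chromosome_extract_files(bim_path, out_prefix):
    """Write <out_prefix><chr> for each chromosome, listing its SNP names."""
    by_chrom = defaultdict(list)
    with open(bim_path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            by_chrom[fields[0]].append(fields[1])
    written = {}
    for chrom, snps in by_chrom.items():
        path = Path(f"{out_prefix}{chrom}")
        path.write_text("\n".join(snps) + "\n")
        written[chrom] = path
    return written


# Demonstration with a three-SNP toy .bim file.
with tempfile.TemporaryDirectory() as tmp:
    bim = Path(tmp) / "toy.bim"
    bim.write_text("1 rs1 0 1000 A G\n1 rs2 0 2000 C T\n2 rs3 0 1500 A C\n")
    files = write_chromosome_extract_files(bim, Path(tmp) / "chr")
    extract_lists = {c: p.read_text().split() for c, p in files.items()}
```

Each file would then be passed to --extract <extractfile> when calculating the kinship matrix for that chromosome.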

For computational reasons, there is a limit to how many kinship matrices you can analyse at once, which therefore limits how complex the heritability model can be when analysing individual-level data. For example, while it is often feasible (with a large sample size and sufficient computational resources) to analyse 22 kinship matrices, each corresponding to a different chromosome of the human genome, it is almost certainly not feasible to analyse 75 or 66 kinship matrices, which would be required to implement the Baseline LD or BLD-LDAK Models, respectively (see below).
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Multi-parameter heritability models when analysing summary statistics

When analysing summary statistics, you must first make files named <prefix>1, <prefix>2, ..., <prefix>K (replace <prefix> with a word of your choice). The file <prefix>k is used to provide ak=(ak1, ak2, ak3, ...), the values for Annotation k. It will usually have two columns, which provide the SNP names and then the annotation values. Next, use Calculate Taggings with the options --partition-number K, --partition-prefix <prefix>, --ignore-weights YES and --power -1. Note that SNPs not present in <prefix>k will get akj=0, while if <prefix>k has only one column, then all SNPs within the file get akj=1 (this means that if you have a binary annotation, the file need only contain the names of the SNPs within the category).
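The file layout described above can be sketched as follows. The helper takes one dictionary per annotation (SNP name to value), omits SNPs with value zero (since absent SNPs get akj=0), and writes a single column when every remaining value is one. The function name, prefix and values are illustrative assumptions; the returned options are the ones named in the text.

```python
import tempfile
from pathlib import Path


def write_partition_files(annotations, prefix):
    """Write <prefix>1 .. <prefix>K and return the matching tagging options."""
    for k, values in enumerate(annotations, start=1):
        nonzero = {snp: v for snp, v in values.items() if v != 0}
        # A file listing only names means a_kj = 1 for each listed SNP.
        binary = all(v == 1 for v in nonzero.values())
        lines = [snp if binary else f"{snp} {v}" for snp, v in nonzero.items()]
        Path(f"{prefix}{k}").write_text("\n".join(lines) + "\n")
    return [
        "--partition-number", str(len(annotations)),
        "--partition-prefix", str(prefix),
        "--ignore-weights", "YES",
        "--power", "-1",
    ]


# Demonstration: one binary and one continuous annotation (toy values).
with tempfile.TemporaryDirectory() as tmp:
    prefix = Path(tmp) / "annot"
    options = write_partition_files(
        [{"rs1": 1, "rs2": 0, "rs3": 1}, {"rs1": 0.5, "rs2": 1.5, "rs3": 2.0}],
        prefix,
    )
    file1 = Path(f"{prefix}1").read_text()
    file2 = Path(f"{prefix}2").read_text()
```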

We often recommend using annotations of the form akj = bkj [fj(1-fj)]^(1+alpha), where fj is the MAF of SNP j. In this case, it is equivalent (and easier) to instead calculate taggings with the options --partition-number K, --partition-prefix <prefix>, --ignore-weights YES and --power alpha, where the file <prefix>k now provides the values of bkj for each SNP. Finally, should you wish to use annotations of the form akj = ckj wj [fj(1-fj)]^(1+alpha), where wj is the weighting of SNP j, you should use the options --partition-number K, --partition-prefix <prefix>, --weights <weightsfile> and --power alpha, where <weightsfile> contains the weightings, and the file <prefix>k now provides ckj for each SNP.

Note that using the options --annotation-number K-1 and --annotation-prefix <prefix> is the same as using --partition-number K and --partition-prefix <prefix>, except that in the former, the final annotation is the base category with values aKj = wj [fj(1-fj)]^(1+alpha) (so if using --ignore-weights YES and --power -1, then aKj=1).
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

The authors of LD Score Regression (LDSC) proposed the 75-parameter Baseline LD Model, which takes the form

E[h2j] = tau1 a1j + tau2 a2j + ... + tau74 a74j + tau75

where a1, a2, ..., a74 comprise 67 binary annotations (e.g., indicating which SNPs are in coding regions) and 7 continuous annotations (e.g., estimated allele age). To implement this model in LDAK, you should first download the Baseline LD annot.gz files from the LDSC website, then use these to make files called baselineLD1, baselineLD2, ..., baselineLD74 (where baselineLDk has two columns, providing the SNP names then values of Annotation k). You would then use Calculate Taggings adding --annotation-number 74, --annotation-prefix baselineLD, --ignore-weights YES and --power -1.

Equivalently, you could make an extra file called baselineLD75 that contains the names of all SNPs, then replace --annotation-number 74 and --annotation-prefix baselineLD with --partition-number 75 and --partition-prefix baselineLD.
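One possible way to turn an annot.gz file into the per-annotation files is sketched below. It assumes the file is tab-delimited with a header row, that the columns CHR, BP, SNP and CM hold SNP details, and that a column named base marks the base category (which is not written, because --annotation-number adds the base category itself). These layout details, the function name and the toy data are assumptions, so check them against the files you download.

```python
import csv
import gzip
import tempfile
from pathlib import Path

DETAIL_COLUMNS = ("CHR", "BP", "SNP", "CM", "base")  # assumed non-annotation columns


def split_annot_file(annot_gz_path, out_prefix):
    """Append 'SNP value' lines to <out_prefix>1, <out_prefix>2, ...

    Files are opened in append mode so the function can be called once
    per chromosome file; delete old outputs before re-running.
    """
    with gzip.open(annot_gz_path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        snp_col = header.index("SNP")
        ann_cols = [i for i, name in enumerate(header) if name not in DETAIL_COLUMNS]
        outputs = [open(f"{out_prefix}{k}", "a") for k in range(1, len(ann_cols) + 1)]
        try:
            for row in reader:
                for out, col in zip(outputs, ann_cols):
                    out.write(f"{row[snp_col]} {row[col]}\n")
        finally:
            for out in outputs:
                out.close()
    return [header[i] for i in ann_cols]


# Demonstration with a tiny made-up annotation file.
with tempfile.TemporaryDirectory() as tmp:
    toy = Path(tmp) / "toy.annot.gz"
    with gzip.open(toy, "wt") as handle:
        handle.write("CHR\tBP\tSNP\tCM\tbase\tCoding\tAlleleAge\n")
        handle.write("1\t1000\trs1\t0.0\t1\t1\t0.3\n")
        handle.write("1\t2000\trs2\t0.0\t1\t0\t0.7\n")
    names = split_annot_file(toy, Path(tmp) / "baselineLD")
    first = (Path(tmp) / "baselineLD1").read_text()
    second = (Path(tmp) / "baselineLD2").read_text()
```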
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

We created the 66-parameter BLD-LDAK Model by removing 10 binary annotations from the Baseline LD Model, then incorporating the LDAK weightings and scaling annotations based on MAF. It takes the form

E[h2j] = [fj(1-fj)]^0.75 x (tau1 b1j + tau2 b2j + ... + tau64 b64j + tau65 wj + tau66)

where b1, b2, ..., b64 are the non-MAF annotations from the Baseline LD Model and wj is the LDAK weighting (computed using only high-quality SNPs).
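The structure of this formula (a MAF scaling applied to a linear combination of annotations, the LDAK weighting and a base term) can be sketched as follows. The function name and the toy inputs are assumptions for illustration; the example uses two annotations rather than 64 to keep it short.

```python
def bld_ldak_h2(maf, b_values, ldak_weight, taus):
    """E[h2j] = [f(1-f)]^0.75 * (sum_k tau_k b_kj + tau_w * w_j + tau_base).

    taus must hold one coefficient per entry of b_values, followed by the
    coefficient for the LDAK weighting and then the base coefficient.
    """
    if len(taus) != len(b_values) + 2:
        raise ValueError("expected len(b_values) + 2 coefficients")
    # zip stops after the annotation coefficients; the last two are added below.
    linear = sum(t * b for t, b in zip(taus, b_values))
    linear += taus[-2] * ldak_weight + taus[-1]
    return (maf * (1.0 - maf)) ** 0.75 * linear


# Toy SNP with MAF 0.5, two annotation values and LDAK weighting 0.5.
toy_h2 = bld_ldak_h2(0.5, [1.0, 0.0], 0.5, [0.1, 0.2, 0.4, 1.0])
```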

The easiest way to implement the BLD-LDAK Model is to use the Pre-computed Taggings. However, if you wish to construct the tagging files yourself, you should first download the files bld1, bld2, ..., bld64 from the BLD-LDAK Annotations. Next calculate the LDAK Weightings and rename them bld65. Finally, use Calculate Taggings adding --annotation-number 65, --annotation-prefix bld, --ignore-weights YES and --power -0.25.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

The 67-parameter BLD-LDAK+Alpha Model generalizes the BLD-LDAK Model by allowing alpha to vary (instead of fixing it to -0.25). SumHer is unable to estimate alpha directly, so we instead create 31 instances of the model, corresponding to alpha equals -1, -0.95, ..., 0.45, 0.5, and see which fits the data best.

To implement this model in LDAK, you should begin by downloading the files bld1, bld2, ..., bld64 from the BLD-LDAK Annotations. Next calculate the LDAK Weightings and rename them bld65. Finally use Calculate Taggings 31 times, first adding --annotation-number 65, --annotation-prefix bld, --ignore-weights YES and --power -1, then adding --annotation-number 65, --annotation-prefix bld, --ignore-weights YES and --power -0.95, and so on.
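The 31 runs differ only in the value passed to --power, so the option sets can be generated rather than typed by hand. The sketch below builds the grid of alpha values and the options named in the text for each run, assuming the annotation files are called bld1, ..., bld65; the helper names are hypothetical, and the remaining arguments to Calculate Taggings (such as input and output files) are deliberately left out.

```python
def alpha_grid(start=-1.0, stop=0.5, step=0.05):
    """Return alpha values from start to stop inclusive, in increments of step."""
    n = round((stop - start) / step) + 1
    # Rounding removes floating-point drift; adding 0.0 turns -0.0 into 0.0.
    return [round(start + i * step, 2) + 0.0 for i in range(n)]


def tagging_options(alpha, n_annotations=65, prefix="bld"):
    """Options for one instance of the model, as listed in the text."""
    return [
        "--annotation-number", str(n_annotations),
        "--annotation-prefix", prefix,
        "--ignore-weights", "YES",
        "--power", f"{alpha:g}",
    ]


alphas = alpha_grid()
all_options = [tagging_options(alpha) for alpha in alphas]
```

Each entry of all_options corresponds to one instance of the model; the best-fitting instance is then chosen after analysing the summary statistics with each tagging file.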

Note that we are unable to provide tagging files corresponding to the BLD-LDAK+Alpha Model (due to their size). However, you can instead download Pre-Computed Taggings for the 8-parameter BLD-LDAK-Lite+Alpha Model, which is a simplified version of the BLD-LDAK+Alpha Model (specifically, it uses only Numbers 57, 61, 62, 63, 64 & 65 of the BLD-LDAK Annotations).