Below are summaries of our major publications (starting with the most recent). If you use LDAK in an analysis, please cite whichever you consider most relevant.

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

**Improved genetic prediction of complex traits from individual-level data or summary statistics**, link to appear here. Most prediction tools assume the GCTA Model, whereby each SNP is expected to contribute equally to the phenotype. We consider four of the most widely used tools: lasso, ridge regression, Bolt-LMM and BayesR. We show that when we replace the GCTA Model with a more realistic heritability model (in particular, the LDAK-Thin or BLD-LDAK Model), prediction accuracy always improves.

When constructing prediction models from individual-level data, we recommend using Bolt-Predict, which improves the original Bolt-LMM software by allowing the user to specify the heritability model; when constructing prediction models from summary statistics, we recommend using MegaPRS, our new tool that constructs a variety of lasso, ridge regression, Bolt-LMM and BayesR models, then selects the best one.

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

**Evaluating and improving heritability models using summary statistics**, Nature Genetics, 2020. We previously compared heritability models based on the model fit from REML. However, this approach requires individual-level data and is only feasible for simple heritability models. In this paper, we proposed loglSS, a log likelihood that can be computed from summary statistics and for complex heritability models. Using loglSS, we showed that when analysing human complex traits, the 75-parameter Baseline LD Model is the most realistic of the existing heritability models, but that it can be significantly improved by incorporating features from the LDAK Model. This resulted in the 66-parameter BLD-LDAK and 67-parameter BLD-LDAK+Alpha Models, which we first used to estimate SNP heritability and heritability enrichments for 31 complex traits, then to measure the impact of selection (specifically, we estimated the power parameter alpha, which determines the relationship between effect sizes and allele frequency).

Based on our results, we now recommend the BLD-LDAK Model. However, some analyses (particularly those using individual-level data) cannot accommodate multi-parameter heritability models. When this is the case, we recommend the LDAK-Thin Model, which was the best-performing of the one-parameter models.

| Heritability Model | K | loglSS | AIC |
| --- | --- | --- | --- |
| GCTA Model | 1 | 0 | 0 |
| LDAK Model | 1 | 56 | -111 |
| LDAK-Thin Model | 1 | 174 | -349 |
| GCTA+1Fun Model | 2 | 257 | -513 |
| LDAK+1Fun Model | 2 | 145 | -288 |
| GCTA-LDMS-R Model | 20 | 173 | -309 |
| GCTA-LDMS-I Model | 20 | 179 | -321 |
| LDAK+24Fun Model | 25 | 220 | -391 |
| Baseline Model | 53 | 430 | -756 |
| BLD-LDAK Model | 66 | 611 | -1092 |
| BLD-LDAK+Alpha Model | 67 | 612 | -1092 |
| Baseline LD | 75 | 561 | -974 |

The table above reports, for 12 heritability models, the number of parameters (K) and average model fit (loglSS) across 14 traits from the UK Biobank. We advise ranking models based on the Akaike Information Criterion (AIC), according to which the BLD-LDAK and BLD-LDAK+Alpha Models perform best, while the LDAK-Thin Model is the best of the one-parameter heritability models.
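
As a rough check on how the ranking works, the sketch below recomputes AIC differences from K and loglSS for four of the multi-parameter models. It assumes the standard definition AIC = 2K - 2 x log likelihood (lower is better) and that the tabulated AIC values share a common reference; the function names are illustrative and are not part of LDAK.

```python
# Rows copied from the table above: (model, K, loglSS, reported AIC).
ROWS = [
    ("Baseline Model", 53, 430, -756),
    ("BLD-LDAK Model", 66, 611, -1092),
    ("BLD-LDAK+Alpha Model", 67, 612, -1092),
    ("Baseline LD", 75, 561, -974),
]


def aic(k, logl):
    """Standard AIC, up to an additive constant: 2K - 2*logl."""
    return 2 * k - 2 * logl


def aic_difference(row_a, row_b):
    """AIC of model a minus AIC of model b, using only K and loglSS."""
    _, k_a, logl_a, _ = row_a
    _, k_b, logl_b, _ = row_b
    return aic(k_a, logl_a) - aic(k_b, logl_b)


reference = ROWS[-1]  # compare every model with Baseline LD
for row in ROWS:
    computed = aic_difference(row, reference)
    reported = row[3] - reference[3]
    print(f"{row[0]:<22} computed {computed:>5}  reported {reported:>5}")
```

For these rows the computed and reported differences agree. The extra parameter in the BLD-LDAK+Alpha Model raises loglSS by one unit, which exactly offsets the AIC penalty, so the two BLD-LDAK variants tie.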

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

**Summary statistic analyses can mistake confounding bias for heritability**, Genetic Epidemiology, 2019. Estimates of confounding bias (both those from LD Score Regression and those from SumHer) assume that the inflation of GWAS test statistics caused by relatedness or population structure is constant across SNPs. However, this paper found examples where the inflation was SNP-specific, violating the assumption and resulting in misleading estimates of confounding bias. For this reason, we now recommend not allowing for confounding bias when using SumHer, and therefore only analysing summary statistics when confident they come from a GWAS that performed careful quality control.

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

**SumHer better estimates the SNP heritability of complex traits from summary statistics**, Nature Genetics, 2019. We proposed SumHer, our software for performing heritability analyses using summary statistics. In essence, SumHer is a version of LD Score Regression that allows the user to specify the heritability model. Its four main aims are estimating SNP heritability, confounding bias, heritability enrichments and genetic correlations.
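
To give a feel for the kind of regression involved, here is a minimal sketch of the simplest LD Score Regression setting. It assumes the GCTA Model and no confounding term, so that the expected test statistic of SNP j is 1 + n x h2 x l_j / M, where l_j is its LD score, n the sample size and M the number of SNPs. The function name and the toy numbers are illustrative assumptions; real tools additionally use regression weights and block-jackknife standard errors, and SumHer replaces the LD scores with quantities that depend on the chosen heritability model.

```python
def estimate_h2(chisq, ld_scores, n_samples):
    """Least-squares slope of (chisq - 1) on the LD scores, with the
    intercept fixed at 1 (no confounding term), rescaled to give an
    estimate of SNP heritability."""
    m = len(chisq)
    num = sum(l * (s - 1.0) for l, s in zip(ld_scores, chisq))
    den = sum(l * l for l in ld_scores)
    slope = num / den  # estimates n_samples * h2 / m
    return slope * m / n_samples


# Toy, noise-free data generated from the assumed model.
ld_scores = [5.0, 20.0, 50.0, 120.0]
n_samples, true_h2 = 10000, 0.4
chisq = [1.0 + n_samples * true_h2 * l / len(ld_scores) for l in ld_scores]

h2_hat = estimate_h2(chisq, ld_scores, n_samples)
print(f"estimated SNP heritability: {h2_hat:.3f}")
```

Because the toy data contain no noise, the estimate recovers the value used to simulate them; with real summary statistics the estimate would carry sampling error.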

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

**Reevaluation of SNP heritability in complex human traits**, Nature Genetics, 2017. We compared the GCTA and LDAK heritability models based on the model fit from REML, finding that the LDAK Model performed best for 36 out of 42 complex traits. We showed that the original LDAK Model could be improved by weighting SNPs based on minor allele frequency (as well as based on linkage disequilibrium). We also demonstrated that previous estimates of the heritability contributed by DNase Hypersensitivity Sites, obtained assuming the GCTA Model, were likely to be exaggerated.
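
The difference between the two models can be illustrated with a small sketch. It assumes both can be written in the form E[h2_j] proportional to w_j x [f_j(1 - f_j)]^(1 + alpha), where f_j is the minor allele frequency of SNP j and w_j an LD-based weighting, with w_j = 1 and alpha = -1 for the GCTA Model and alpha = -0.25 for the revised LDAK Model. These values are not stated in the summary above and should be read as assumptions, and the weightings below are made-up numbers rather than output from LDAK.

```python
def expected_shares(mafs, weights, alpha):
    """Share of SNP heritability expected from each SNP, assuming
    E[h2_j] is proportional to w_j * (f_j * (1 - f_j)) ** (1 + alpha)."""
    raw = [w * (f * (1 - f)) ** (1 + alpha) for f, w in zip(mafs, weights)]
    total = sum(raw)
    return [r / total for r in raw]


mafs = [0.05, 0.20, 0.50]     # hypothetical minor allele frequencies
ld_weights = [1.0, 0.5, 0.1]  # hypothetical weightings; lower means higher LD

# GCTA-style model: equal weightings, alpha = -1, so every SNP gets the same share.
gcta = expected_shares(mafs, [1.0] * len(mafs), alpha=-1.0)

# LDAK-style model: SNPs in high-LD regions are down-weighted, and the
# expected contribution also depends on allele frequency.
ldak = expected_shares(mafs, ld_weights, alpha=-0.25)

for f, g, l in zip(mafs, gcta, ldak):
    print(f"MAF {f:.2f}: GCTA share {g:.3f}, LDAK share {l:.3f}")
```

In this toy example the common SNP with the lowest weighting receives the smallest expected share under the LDAK-style model, whereas the GCTA-style model treats all three SNPs identically.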

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

**MultiBLUP: improved SNP-based prediction for complex traits**, Genome Research, 2014. A common method for constructing prediction models is Best Linear Unbiased Prediction (BLUP). BLUP assumes that every SNP makes a very small contribution to the phenotype. MultiBLUP generalizes the BLUP model by allowing some regions of the genome to have a large impact. For a few years, MultiBLUP was the best prediction method (however, we now recommend Bolt-Predict if using individual-level data, or MegaPRS if using summary statistics).

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

**Describing the genetic architecture of epilepsy through heritability analysis**, Brain, 2014. We used LDAK to estimate the SNP heritability of epilepsy, to estimate the number of causal variants and to demonstrate that partial and generalized epilepsy are genetically distinct subtypes.

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

**Improved heritability estimation from genome-wide SNPs**, AJHG, 2012. The first estimates of SNP heritability were obtained assuming the GCTA Model, in which all SNPs are expected to contribute equally to heritability. However, we found that using the GCTA Model will result in biased estimates if causal variants are predominantly in regions of high or low linkage disequilibrium (LD). To guard against these biases, we proposed the first LDAK Model, which introduces weightings that reduce the expected contribution of SNPs in high-LD regions. We analysed seven diseases from the WTCCC, finding that estimates of SNP heritability using the LDAK Model were higher than those using the GCTA Model.