Publications

Below are summaries of our major publications (starting with the most recent). If you use LDAK in an analysis, please cite whichever you consider most relevant.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Improved genetic prediction of complex traits from individual-level data or summary statistics, currently on bioRxiv. Most prediction tools assume the GCTA Model, under which each SNP is expected to contribute equally to the phenotype. We create new versions of eight widely used tools (lasso, BLUP, Bolt-LMM, BayesR, lassosum, sBLUP, LDPred and SBayesR) that allow the user to specify the heritability model. We show that, for each tool, replacing the GCTA Model with a more realistic model (in particular, the LDAK-Thin or BLD-LDAK Model) always improves prediction accuracy, on average by about 10%.

When constructing prediction models from individual-level data, we recommend using Bolt-Predict, which is an improved version of the original Bolt-LMM software; when constructing prediction models from summary statistics, we recommend using MegaPRS to create a BayesR model.
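To make "specify the heritability model" concrete, here is a minimal sketch assuming the parameterisation in which SNP j's expected heritability is proportional to w_j [f_j(1 - f_j)]^(1 + alpha), where f_j is the minor allele frequency and w_j a SNP weight. The GCTA Model corresponds to w_j = 1 and alpha = -1. We assume the LDAK-Thin Model corresponds to alpha = -0.25, with w_j equal to 1 or 0 depending on whether SNP j survives LD thinning. The helper `expected_heritability` and all input values are hypothetical.

```python
import numpy as np

def expected_heritability(maf, h2, alpha, weights=None):
    """Hypothetical helper: per-SNP expected heritability q_j, proportional to
    w_j * [f_j * (1 - f_j)]^(1 + alpha) and scaled so that sum(q) == h2."""
    maf = np.asarray(maf, dtype=float)
    w = np.ones_like(maf) if weights is None else np.asarray(weights, dtype=float)
    raw = w * (maf * (1.0 - maf)) ** (1.0 + alpha)
    return h2 * raw / raw.sum()

maf = np.array([0.01, 0.05, 0.20, 0.50])

# GCTA Model: alpha = -1 and no weights, so every SNP gets the same share
gcta = expected_heritability(maf, h2=0.5, alpha=-1.0)

# LDAK-Thin-style model: alpha = -0.25 and 0/1 weights from LD thinning
# (hypothetical: SNP 3 is dropped as a near-duplicate of another SNP)
thin = np.array([1, 1, 0, 1])
ldak_thin = expected_heritability(maf, h2=0.5, alpha=-0.25, weights=thin)

print(gcta)       # [0.125 0.125 0.125 0.125]
print(ldak_thin)  # zero for SNP 3; larger shares for SNPs with higher MAF
```

A prediction tool could then use q_j, up to scaling, as SNP j's prior effect-size variance instead of giving every SNP the same prior. This illustrates the idea and does not describe how any particular tool is implemented.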
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Evaluating and improving heritability models using summary statistics, Nature Genetics, 2020. We previously compared heritability models based on their model fit from REML. However, this approach requires individual-level data and is only feasible for simple heritability models. In this paper, we proposed loglSS, a log likelihood that can be computed from summary statistics, including for complex heritability models. Using loglSS, we showed that when analysing human complex traits, the 75-parameter Baseline LD Model is the most realistic of the existing heritability models, but that it can be significantly improved by incorporating features from the LDAK Model. This resulted in the 66-parameter BLD-LDAK Model and the 67-parameter BLD-LDAK+Alpha Model, which we used first to estimate SNP heritability and heritability enrichments for 31 complex traits, and then to measure the impact of selection (specifically, we estimated the power parameter alpha, which determines the relationship between effect sizes and allele frequency).

Based on our results, we now recommend the BLD-LDAK Model. However, some analyses (particularly those using individual-level data) cannot accommodate multi-parameter heritability models. When this is the case, we recommend the LDAK-Thin Model, which was the best-performing of the one-parameter models.
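Heritability enrichments compare how much heritability a category of SNPs contributes with how much it would contribute if heritability were spread evenly across SNPs. A minimal sketch, assuming the usual definition (share of heritability divided by share of SNPs) and hypothetical per-SNP heritabilities; `enrichment` is an illustrative helper, not an LDAK function.

```python
import numpy as np

def enrichment(q, in_category):
    """Hypothetical helper: share of heritability in a category divided by the
    category's share of SNPs (1 means no enrichment)."""
    q = np.asarray(q, dtype=float)
    mask = np.asarray(in_category, dtype=bool)
    share_h2 = q[mask].sum() / q.sum()
    share_snps = mask.sum() / mask.size
    return share_h2 / share_snps

# Hypothetical per-SNP heritabilities for 10 SNPs; the first 2 lie in the annotation
q = np.array([0.04, 0.04] + [0.01] * 8)
annotation = [True, True] + [False] * 8
print(enrichment(q, annotation))  # approximately 2.5: 50% of heritability from 20% of SNPs
```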

Heritability Model          K   loglSS     AIC
GCTA Model                  1        0       0
LDAK Model                  1       56    -111
LDAK-Thin Model             1      174    -349
GCTA+1Fun Model             2      257    -513
LDAK+1Fun Model             2      145    -288
GCTA-LDMS-R Model          20      173    -309
GCTA-LDMS-I Model          20      179    -321
LDAK+24Fun Model           25      220    -391
Baseline Model             53      430    -756
BLD-LDAK Model             66      611   -1092
BLD-LDAK+Alpha Model       67      612   -1092
Baseline LD Model          75      561    -974

The table above reports, for 12 heritability models, the number of parameters (K) and the average model fit (loglSS) across 14 traits from the UK Biobank. We advise ranking models by the Akaike Information Criterion, AIC = 2K - 2 loglSS (shown relative to the GCTA Model; lower is better). By this measure, the BLD-LDAK and BLD-LDAK+Alpha Models perform best, while the LDAK-Thin Model is the best of the one-parameter heritability models.
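Ranking by AIC only needs K and loglSS for each model. The sketch below recomputes AIC for five rows of the table, assuming (as the AIC column implies) that loglSS is reported relative to the GCTA Model, which has K = 1, so that AIC relative to GCTA is 2(K - 1) - 2 loglSS. `relative_aic` is a hypothetical helper.

```python
def relative_aic(k, logl_ss, k_ref=1):
    """AIC difference from a reference model: 2*(K - K_ref) - 2*loglSS,
    where loglSS is already expressed relative to the reference (GCTA) model."""
    return 2 * (k - k_ref) - 2 * logl_ss

# (K, loglSS) for a few rows of the table
models = {
    "GCTA+1Fun": (2, 257),
    "Baseline": (53, 430),
    "Baseline LD": (75, 561),
    "BLD-LDAK": (66, 611),
    "BLD-LDAK+Alpha": (67, 612),
}

aic = {name: relative_aic(k, ll) for name, (k, ll) in models.items()}
for name in sorted(aic, key=aic.get):  # lower AIC is better
    print(f"{name:16s} {aic[name]}")
```

The recomputed values match the AIC column to within rounding of loglSS. BLD-LDAK and BLD-LDAK+Alpha tie at -1092 because the one extra parameter (+2 to AIC) is offset by one extra unit of loglSS (-2 to AIC).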
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Summary statistic analyses can mistake confounding bias for heritability, Genetic Epidemiology, 2019. Estimates of confounding bias (both those from LD Score Regression and those from SumHer) rely on the assumption that the inflation of association test statistics caused by relatedness or population structure is constant across SNPs. However, this paper found examples where the inflation was SNP-specific, violating this assumption and producing misleading estimates of confounding bias. For this reason, we now recommend not allowing for confounding bias when using SumHer, and therefore only analysing summary statistics when confident that they come from an association study that performed careful quality control.
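A toy illustration of the problem, using the LD Score Regression form E[chi2_j] = 1 + N a + N h2 l_j / M, where l_j is SNP j's LD score, N the sample size, M the number of SNPs and N a the confounding term. The intercept can only absorb inflation that is the same for every SNP. The data are noise-free and entirely hypothetical (this is not an analysis from the paper), and `fit_ldsc` is an illustrative helper, not the LD Score Regression software.

```python
import numpy as np

N, M, h2 = 100_000, 1_000_000, 0.3  # hypothetical sample size, SNP count, heritability
ld = np.linspace(10, 100, 46)        # hypothetical LD scores

def fit_ldsc(chi2, ld):
    """Regress chi-squared statistics on LD scores; return (intercept, h2 estimate)."""
    X = np.column_stack([np.ones_like(ld), ld])
    (intercept, slope), *_ = np.linalg.lstsq(X, chi2, rcond=None)
    return intercept, slope * M / N

signal = N * h2 * ld / M

# Case 1: confounding adds the same inflation (0.1) to every SNP -> absorbed by the intercept
print(fit_ldsc(1 + signal + 0.1, ld))         # intercept ~1.1, h2 ~0.30

# Case 2: inflation varies across SNPs and rises with LD score -> absorbed by the slope
print(fit_ldsc(1 + signal + 0.002 * ld, ld))  # intercept ~1.0, h2 ~0.32
```

In the second case the regression reports no confounding (an intercept of 1) and instead inflates the heritability estimate, because SNP-specific inflation that tracks LD score cannot be distinguished from genuine signal.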
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

SumHer better estimates the SNP heritability of complex traits from summary statistics, Nature Genetics, 2019. We proposed SumHer, our software for performing heritability analyses using summary statistics. In essence, SumHer is a version of LD Score Regression that allows the user to specify the heritability model. Its four main uses are estimating SNP heritability, confounding bias, heritability enrichments and genetic correlations.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Reevaluation of SNP heritability in complex human traits, Nature Genetics, 2017. We compared the GCTA and LDAK heritability models based on the model fit from REML, finding that the LDAK Model performed best for 36 out of 42 complex traits. We showed that the original LDAK Model could be improved by weighting SNPs based on minor allele frequency (as well as on linkage disequilibrium). We also demonstrated that previous estimates of the heritability contributed by DNase I Hypersensitivity Sites, obtained assuming the GCTA Model, were likely exaggerated.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

MultiBLUP: improved SNP-based prediction for complex traits, Genome Research, 2014. A common method for constructing prediction models is Best Linear Unbiased Prediction (BLUP). BLUP assumes that every SNP makes a very small contribution to the phenotype. MultiBLUP generalizes the BLUP model by allowing some regions of the genome to have a large impact. For a few years, MultiBLUP was the best prediction method (however, we now recommend Bolt-Predict if using individual-level data and MegaPRS if using summary statistics).
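The difference between BLUP and MultiBLUP can be sketched as ridge-type estimation in which each SNP's prior variance is either shared by all SNPs (BLUP) or set per region (MultiBLUP-style). A minimal sketch, assuming standardised genotypes, variances fixed by hand (MultiBLUP itself would estimate them from the data) and simulated, hypothetical data; `blup_effects` is an illustrative helper.

```python
import numpy as np

def blup_effects(X, y, snp_var, resid_var):
    """Posterior-mean SNP effects for y = X b + e, with b_j ~ N(0, snp_var[j])
    and e ~ N(0, resid_var): solve (X'X + resid_var * diag(1/snp_var)) b = X'y."""
    penalty = np.diag(resid_var / np.asarray(snp_var, dtype=float))
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

rng = np.random.default_rng(1)
n, m = 500, 200
X = rng.standard_normal((n, m))  # stand-in for standardised genotypes

# Hypothetical truth: region 1 (SNPs 0-19) holds five large effects; the rest are tiny
beta = np.zeros(m)
beta[:5] = 0.5
beta[20:] = rng.normal(0.0, 0.02, m - 20)
y = X @ beta + rng.standard_normal(n)

# BLUP: one prior variance shared by every SNP
b_blup = blup_effects(X, y, snp_var=np.full(m, 0.01), resid_var=1.0)

# MultiBLUP-style: region 1 gets a much larger prior variance than the background
region_var = np.where(np.arange(m) < 20, 0.25, 0.0004)
b_multi = blup_effects(X, y, snp_var=region_var, resid_var=1.0)

def sq_error(b):
    """Squared error of estimated effects against the simulated truth."""
    return float(np.sum((b - beta) ** 2))

print(sq_error(b_blup), sq_error(b_multi))  # the region-aware fit has the smaller error here
```

Allowing region 1 a larger variance lets its large effects escape heavy shrinkage, while the background SNPs are shrunk more strongly than a single shared variance would allow.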
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Describing the genetic architecture of epilepsy through heritability analysis, Brain, 2014. We used LDAK to estimate the SNP heritability of epilepsy, to estimate the number of causal variants and to demonstrate that partial and generalized epilepsy are genetically distinct subtypes.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Improved heritability estimation from genome-wide SNPs, AJHG, 2012. The first estimates of SNP heritability were obtained assuming the GCTA Model, in which all SNPs are expected to contribute equally to heritability. However, we found that using the GCTA Model results in biased estimates if causal variants are predominantly in regions of high or low linkage disequilibrium (LD). To guard against these biases, we proposed the first LDAK Model, which introduces weightings that reduce the expected contribution of SNPs in high-LD regions. We analysed seven diseases from the WTCCC, finding that estimates of SNP heritability using the LDAK Model tended to be higher than those using the GCTA Model.
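The intuition behind the LDAK weightings can be sketched as follows: choose non-negative weights w so that, for every SNP j, the weighted sum over SNPs k of w_k r2_jk is close to 1, so that a block of SNPs in high LD does not tag more heritability than an isolated SNP. This is a crude simplification (least squares followed by clipping negative weights to zero), not the algorithm LDAK uses; `ld_weights` and the LD matrix are hypothetical.

```python
import numpy as np

def ld_weights(r2):
    """Crude sketch: find weights w with sum_k w_k * r2[j, k] ~= 1 for every SNP j,
    then clip negatives to zero. Not LDAK's actual algorithm."""
    ones = np.ones(r2.shape[0])
    w, *_ = np.linalg.lstsq(r2, ones, rcond=None)  # minimum-norm least-squares solution
    return np.clip(w, 0.0, None)

# Hypothetical LD: SNPs 0-2 are perfect proxies for one another; SNP 3 is independent
r2 = np.array([
    [1.0, 1.0, 1.0, 0.0],
    [1.0, 1.0, 1.0, 0.0],
    [1.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
print(ld_weights(r2))  # roughly [0.333, 0.333, 0.333, 1.0]
```

Under the GCTA Model, the three-SNP block would be expected to contribute three times as much heritability as the isolated SNP; with these weights, the block and the isolated SNP are expected to contribute equally.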