Comparing Models

For many heritability analyses, changing the Heritability Model can lead to very different results. This is particularly true when estimating SNP heritability or heritability enrichments. Therefore, when performing heritability analysis, care should be taken to use an appropriate heritability model.

When analysing human data, we recommend using the LDAK-Thin Model (if analysing individual-level data) or the BLD-LDAK Model (if analysing summary statistics); we describe how to implement these two models in Technical Details. Our recommendation is based on the analysis in our paper Evaluating and improving heritability models using summary statistics (Nature Genetics, 2020), where we compared 12 heritability models using data from 31 complex human traits.

If you would like to compare heritability models yourself, there are two ways: the first is to measure how well each model fits real data; the second is to apply hybrid heritability models to real data. Note that some groups have compared models using simulated data. We are against this approach, as the results of a simulation study will be sensitive to the assumptions used when simulating the data.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Measure model fit:

Models can be compared via the log likelihood, computed either for a single phenotype, or averaged across many. Note that if the heritability models are of different complexity, this should be taken into account when comparing likelihoods. For this reason, we prefer ranking models based on the Akaike Information Criterion (AIC), equal to 2K-2logl, where K is the number of parameters in the model and logl is the log likelihood (or average log likelihood, if comparing over multiple phenotypes). Note that lower AIC is better.
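As a minimal sketch of the ranking described above, the Python snippet below computes AIC = 2K - 2logl for several models and sorts them from lowest (best) to highest. The function names, the model labels and all of the numbers are hypothetical, chosen purely for illustration; they are not part of LDAK and are not results from any analysis.

```python
def aic(num_params, log_likelihood):
    """Return the Akaike Information Criterion, 2K - 2logl."""
    return 2 * num_params - 2 * log_likelihood


def rank_models(models):
    """Sort models from best (lowest AIC) to worst.

    `models` maps a model name to a (K, logl) pair, where logl may be
    the log likelihood for one phenotype or the average across several.
    """
    scores = {name: aic(k, logl) for name, (k, logl) in models.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Made-up values for illustration only (not results from any analysis).
example = {
    "model_A": (1, -1000.0),
    "model_B": (1, -990.0),
    "model_C": (3, -989.5),
}

for name, score in rank_models(example):
    print(f"{name}\t{score:.1f}")
```

In this made-up example, model_C has a slightly higher log likelihood than model_B, but its two extra parameters add 4 to the AIC while the improved fit removes only 1, so model_B is ranked first.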

If you are analysing individual-level data, you should use the log likelihood from REML. Specifically, you should use the alternative log likelihood reported in the output file with suffix .reml. For a valid comparison, you must compute all kinship matrices using the same set of samples. If this is the case, then the null log likelihood (also reported in the .reml file) will be the same for all heritability models.

If you are analysing summary statistics, you should use the approximate log likelihood loglSS from estimating SNP Heritability. Specifically, you should use the alternative log likelihood reported in the output file with suffix .extra. For a valid comparison, you must compute all tagging files using the same set of predictors. If this is the case, then the null log likelihood (also reported in the .extra file) will be the same for all heritability models.
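Because the null log likelihood should be identical across models when the comparison is valid, it can serve as a quick sanity check before comparing alternative log likelihoods. The Python sketch below assumes you have already copied the null log likelihood for each model out of its .reml or .extra file; the function name, the tolerance and the example values are all hypothetical, and the snippet does not attempt to parse LDAK output.

```python
import math


def nulls_agree(null_logls, abs_tol=1e-6):
    """Return True if all null log likelihoods match to within abs_tol.

    `null_logls` maps a model name to the null log likelihood taken
    from that model's output file. The tolerance is an assumption;
    loosen it if the values are reported to fewer decimal places.
    """
    values = list(null_logls.values())
    return all(
        math.isclose(v, values[0], rel_tol=0.0, abs_tol=abs_tol)
        for v in values
    )


# Made-up values for illustration only.
consistent = {"model_A": -1500.25, "model_B": -1500.25}
inconsistent = {"model_A": -1500.25, "model_B": -1498.10}

print(nulls_agree(consistent))
print(nulls_agree(inconsistent))
```

If the check fails, the models were probably fitted using different samples (individual-level data) or different predictors (summary statistics), and their likelihoods should not be compared until this is resolved.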
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Create hybrid models:

Hybrid models can be used to test whether it is beneficial to combine heritability models. For example, we have found that the LDAK Model tends to fit real data better than the GCTA Model, indicating that, if required to choose, results from the LDAK Model should be preferred. However, an alternative would be to use the GCTA-LDAK Model, obtained by combining the GCTA and LDAK Models. If the AIC from this hybrid model is lower than that from the LDAK Model, this indicates that it is better to use the hybrid model. Similarly, if estimates from the hybrid model are substantially different to those from the LDAK Model, this also indicates that it would be beneficial to use the hybrid model.

If you are analysing individual-level data, then to create the hybrid model you simply include both sets of kinship matrices when performing REML, Haseman-Elston or PCGC Regression. For the example above, instead of regressing the phenotype on just the GCTA kinship matrix, or on just the LDAK kinship matrix, you would regress it on both (you can provide multiple kinship matrices using the option --mgrm <kinstems>).

If you are analysing summary statistics, you should specify a hybrid heritability model when Calculating Taggings (e.g., include a partition corresponding to the GCTA Model and one corresponding to the LDAK Model), then use the resulting tagging file when estimating SNP Heritability. Note that if you already have tagging files for the two models, and they were calculated using the same Reference and Regression SNPs (see SNP Subsets), you can merge them using the command --merge-tagging <outfile> (use the option --taglist <tagstems> to provide the names of the two tagging files).