REML / HE / PCGC

Here we focus on heritability analyses designed for use with individual-level data from unrelated samples. If you have GWAS summary statistics, you should instead use SumHer, while if you have related samples, you should instead consider using TetraHer or QuantHer.

When performing heritability analysis, there are three important points to note:

1 - Most analyses require careful quality control. For example, estimates of SNP heritability can be very sensitive to population structure, family relatedness and genotyping errors. Therefore, you should begin by reading Quality Control, which provides guidelines for cleaning your dataset, as well as ways to test for inflation due to genotyping errors or cryptic relatedness.

2 - Most analyses require a large number of unrelated samples. For example, to reliably estimate SNP heritability (standard deviation less than 5%), you generally need at least 5,000 unrelated samples. If, after performing quality control, your dataset contains fewer than 5,000 unrelated samples, then you will not be able to perform heritability analysis, sorry. However, you may instead be able to perform single-SNP or gene/chunk-based association analysis. See Small Datasets for more details.

3 - Most analyses can be performed with non-SNP data. Heritability analyses were originally designed for use with SNP data, in which case they produce estimates of SNP heritability. However, they can equally be used with other types of genetic variants. For example, they can be used with methylation data in order to estimate methylation heritability. Be aware that when using non-SNP data, Points 1 & 2 still apply (i.e., you should first perform careful quality control, and you will require large numbers of unrelated samples).
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

When performing heritability analyses, the first step is to Calculate Kinships. The number of kinship matrices, and how each is calculated, is determined by the choice of Heritability Model.

The second step is to estimate the heritability contributed by each kinship matrix. For this, you can use restricted maximum likelihood (REML), Haseman-Elston (HE) regression, or phenotype-correlation genotype-correlation (PCGC) regression. REML and HE can be used for both quantitative and binary phenotypes, whereas PCGC can only be used for binary phenotypes.
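To give a feel for what the regression-based estimators do, here is a minimal Python sketch of HE regression with a single kinship matrix. It is not LDAK's implementation: the function names he_regression and simulate_toy_data are hypothetical, the regression is forced through the origin for simplicity, and the toy data assume independent SNPs with Gaussian effect sizes. The idea is that, for each pair of distinct samples, the product of their standardised phenotypes is regressed on their kinship, and the slope is taken as the heritability estimate.

```python
import numpy as np


def he_regression(y, K):
    """Simplified single-kinship Haseman-Elston sketch (not LDAK's code).

    For each pair i < j, regress y_i * y_j on K_ij through the origin;
    the slope is the heritability estimate.
    """
    y = np.asarray(y, dtype=float)
    K = np.asarray(K, dtype=float)
    y = (y - y.mean()) / y.std()               # standardise the phenotype
    rows, cols = np.triu_indices(len(y), k=1)  # all pairs with i < j
    products = y[rows] * y[cols]
    kinships = K[rows, cols]
    return float(kinships @ products / (kinships @ kinships))


def simulate_toy_data(n=1000, m=200, h2=0.5, seed=0):
    """Hypothetical toy data: n samples, m independent SNPs, Gaussian effects."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.1, 0.9, size=m)      # allele frequencies
    X = rng.binomial(2, freqs, size=(n, m)).astype(float)
    X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardise each SNP
    K = X @ X.T / m                            # simple kinship matrix
    effects = rng.normal(0.0, np.sqrt(h2 / m), size=m)
    noise = rng.normal(0.0, np.sqrt(1.0 - h2), size=n)
    return X @ effects + noise, K


y, K = simulate_toy_data()
print(f"HE estimate of heritability: {he_regression(y, K):.3f}")
```

Because the sketch only needs the off-diagonal kinships and phenotype products, its cost is a single pass over sample pairs, which illustrates why regression-based methods scale more easily than REML.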

REML is generally preferred because it produces the most precise estimates of heritability (those with the smallest standard deviation). Furthermore, you can subsequently obtain estimates of SNP effect sizes via BLUP (Best Linear Unbiased Prediction).

However, if you have many samples (say, over 30,000), you should consider instead using HE regression (if the phenotype is quantitative) or PCGC regression (if the phenotype is binary). These two methods are less computationally intensive than REML (especially when there are multiple kinship matrices). Moreover, because they make fewer assumptions than REML (in particular, they do not assume effect sizes are Gaussian), they will tend to produce less-biased estimates. Furthermore, it has been shown that for binary traits, PCGC should be preferred to REML when covariates explain a substantial proportion of phenotype variation (e.g., if analysing a disease for which sex is an important risk factor).
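The guidance above can be summarised as a small decision helper. This is only a sketch of the rules of thumb on this page: the function name suggest_method and its parameters are hypothetical, and the thresholds (5,000 and 30,000 samples) are the approximate figures quoted in the text, not hard limits.

```python
def suggest_method(n_unrelated, binary, covariates_substantial=False,
                   min_samples=5000, large_threshold=30000):
    """Return "REML", "HE" or "PCGC", or None if the dataset is too small.

    Hypothetical helper encoding the rules of thumb from the text;
    the default thresholds are approximate guidelines only.
    """
    if n_unrelated < min_samples:
        return None                    # too few unrelated samples
    if binary and covariates_substantial:
        return "PCGC"                  # preferred to REML in this case
    if n_unrelated > large_threshold:
        return "PCGC" if binary else "HE"
    return "REML"                      # most precise for moderate sizes


print(suggest_method(12000, binary=False))
print(suggest_method(50000, binary=True))
```

In practice the choice also depends on how many kinship matrices the Heritability Model requires, since REML becomes more demanding as that number grows.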