HE Regression

HE (Haseman-Elston) Regression is a method for estimating heritability. For large datasets (>10,000 samples) it is substantially faster than REML, with only a modest loss of precision. It can also estimate heritability using only pairs of individuals in different cohorts, which should protect against inflation due to genotyping errors (e.g., when including poorly-genotyped SNPs). Note that when including covariates, you should first regress the kinship matrices on the covariates (see Adjust Kinships).
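As a rough illustration of the idea (not LDAK's implementation, which this page does not describe), a basic form of Haseman-Elston regression standardizes the phenotypes and regresses the product y_i*y_j for each pair of distinct individuals on their kinship K_ij; the slope is the heritability estimate. The sketch below computes that slope, fitted through the origin, for a single kinship matrix using awk. The function name he_toy and the input layout are hypothetical, chosen only for this illustration.

```shell
# Toy Haseman-Elston estimate for one kinship matrix (illustration only).
# Hypothetical input layout: line 1 holds the n phenotypes,
# lines 2..n+1 hold the rows of the n x n kinship matrix.
he_toy() {
  awk '
    NR == 1 { n = NF; for (i = 1; i <= n; i++) y[i] = $i; next }
    { for (j = 1; j <= n; j++) K[NR - 1, j] = $j }
    END {
      # standardize the phenotypes to mean 0 and variance 1
      for (i = 1; i <= n; i++) { s += y[i]; ss += y[i] * y[i] }
      mean = s / n
      sd = sqrt(ss / n - mean * mean)
      for (i = 1; i <= n; i++) y[i] = (y[i] - mean) / sd
      # least-squares slope through the origin of y_i*y_j on K_ij, pairs i<j
      for (i = 1; i < n; i++)
        for (j = i + 1; j <= n; j++) {
          num += K[i, j] * y[i] * y[j]
          den += K[i, j] * K[i, j]
        }
      printf "%.4f\n", num / den
    }' "$1"
}
```

Calling he_toy on such a file prints the slope to four decimal places. Only off-diagonal kinship values are used, so the diagonal of the matrix does not affect the result.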

The argument for HE Regression is

--he <output>

Only one option is required; the rest are optional.

--pheno <phenofile> - specifies the response (in PLINK format). Individuals without a phenotype will be excluded. If <phenofile> contains more than one phenotype, specify which should be used with --mpheno.

--grm <grmstem> or --mgrm <grmlist> - provide one or more kinship matrices. Note that if using covariates, these should be adjusted kinship matrices.

--region-number and --region-prefix - provide one or more regions. In this case you must also specify the datafiles with --bfile/--gen/--sp/--speed <prefix>, use --weights to specify the predictor weightings (or --ignore-weights YES to set them all to 1), and use --power to indicate how to scale predictors (we advise using -0.25). By default, LDAK will remove a regional predictor if it is (effectively) identical to one which remains (squared correlation > 0.98); to change this threshold, use --region-prune.

--covar <covarfile> - provide covariates (in PLINK format) as fixed effects in the regression; when calculating heritabilities, the variance explained by these will be discounted.

--top-preds <list_of_predictors> - provide a list of predictors to include as fixed effects; when calculating heritabilities, the variance explained by these will be added to that explained by the kinship matrices and regions. Usually, these represent a pruned subset of highly-associated predictors with large effects (see the section "Accommodating loci with very large effects" in our recent paper).

--prevalence <float> - if the phenotype is binary, then specify the population prevalence to obtain estimates of variance explained on the liability scale (note that for binary traits, it is preferable to use PCGC Regression).

--memory-save <YES/NO> (default NO) - by default, LDAK will read into memory all kinship matrices at the start. If there are many kinship matrices, this can require large amounts of memory; therefore, consider adding --memory-save YES, and LDAK will instead read kinship matrices on-the-fly each time they are required.

--permute YES - shuffle the phenotypic values. This is useful for permutation analysis, i.e., to see the distribution of estimates of variance explained obtained when there is no true signal.
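A permutation analysis with --permute YES could be scripted along the following lines. This is a minimal sketch: the function name run_permutations and the output stems perm1, perm2, ... are hypothetical, and the phenotype and kinship file names are borrowed from the example further down this page. This page does not state how LDAK seeds the shuffle, so check that repeated runs do give different permutations.

```shell
# Path to the LDAK executable (can be overridden from the environment).
LDAK=${LDAK:-../ldak.out}

# Run HE Regression on n shuffled copies of the phenotype.
run_permutations() {
  n=$1
  i=1
  while [ "$i" -le "$n" ]; do
    # each run gets its own (hypothetical) output stem: perm1, perm2, ...
    "$LDAK" --he "perm$i" --pheno phen.pheno --grm partitions/kinships.all --permute YES
    i=$((i + 1))
  done
}
```

For example, run_permutations 100 would perform 100 permuted analyses, whose estimates can then be compared with those from the unpermuted phenotype.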

The main output files are <output>.he, which contains estimates of the proportion of variance explained by each kinship matrix, each region and the top predictors, and <output>.share, which provides estimates of the fractions of total variance explained.

If your samples come from multiple cohorts, you can use --subset-number and --subset-prefix to specify which samples are in each cohort (see Subset Options). Then LDAK will also produce the files <output>.he.within, which contains estimates of heritability based only on pairs of samples in the same cohort, and <output>.he.across, which contains estimates of heritability based only on pairs of samples in different cohorts.
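A cohort-based analysis might be prepared as sketched below. Several details are assumptions rather than facts from this page: that the subset files are named <prefix>1, <prefix>2, ... (by analogy with the region files in the example below), that each lists samples as two columns (family ID and individual ID), and that a file with columns family ID, individual ID and cohort index is available. The names make_subsets, cohorts.txt, cohort and res9 are hypothetical; see Subset Options for the actual file format.

```shell
# Write one sample list per cohort: cohort1, cohort2, ...
# Input (hypothetical): family ID, individual ID, cohort index (1, 2, ...).
make_subsets() {
  awk '{ print $1, $2 > ("cohort" $3) }' "$1"
}

# Path to the LDAK executable (can be overridden from the environment).
LDAK=${LDAK:-../ldak.out}

# HE Regression with two cohorts, using the files written by make_subsets.
run_he_cohorts() {
  "$LDAK" --he res9 --pheno phen.pheno --grm partitions/kinships.all --subset-number 2 --subset-prefix cohort
}
```

Running make_subsets cohorts.txt followed by run_he_cohorts should then yield res9.he.within and res9.he.across alongside the main output files.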
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Example: for this we use the binary PLINK files test.bed, test.bim and test.fam available in the Test Datasets and the phenotype phen.pheno. We also use the kinship matrices in the folder partitions calculated in Get Kinships, and the kinship matrices adjusted for the covariate file sex.covar calculated in Adjust Kinships.

First we regress the phenotype phen.pheno on the kinship matrix with stem partitions/kinships.all

../ldak.out --he res6 --pheno phen.pheno --grm partitions/kinships.all

To include the covariate file sex.covar, we would add --covar and instead use the adjusted kinship matrix

../ldak.out --he res6b --pheno phen.pheno --grm partitions/kinships.all.sex --covar sex.covar

Next we regress the phenotype on the three kinship matrices listed in partitions/partition.list

../ldak.out --he res7 --pheno phen.pheno --mgrm partitions/partition.list

To include the covariate file sex.covar, instead use

echo "partitions/kinships.1.sex
partitions/kinships.2.sex
partitions/kinships.3.sex" > mlist.txt
../ldak.out --he res7b --pheno phen.pheno --mgrm mlist.txt --covar sex.covar

Finally, here is an example which uses regions

# list the first column of each details file, skipping the header line
awk '(NR>1){print $1}' partitions/kinships.1.grm.details > set1
awk '(NR>1){print $1}' partitions/kinships.2.grm.details > set2
../ldak.out --he res8 --pheno phen.pheno --grm partitions/kinships.3 --region-number 2 --region-prefix set --bfile test --weights sections/weights.short --power -0.25