H-E Regression

Haseman-Elston (H-E) regression is an alternative to REML for estimating the variance explained by kinship matrices and regions, and its usage is very similar. The main argument is

--he <output>
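
To make the idea concrete, here is a minimal sketch of the classic cross-product form of H-E regression, written in Python with NumPy. After adjusting for fixed effects and standardising the phenotype, the products z_i*z_j over all pairs i < j are regressed on the matching off-diagonal kinship entries, and each slope estimates the proportion of phenotypic variance explained by that kinship matrix. This is a textbook illustration, not LDAK's implementation; the function name he_regression and the choice to project out covariates before standardising are our own assumptions.

```python
import numpy as np

def he_regression(y, kinships, covariates=None):
    """Cross-product Haseman-Elston regression (illustrative sketch, not LDAK's code).

    y          : phenotype vector of length n
    kinships   : list of n x n kinship matrices
    covariates : optional n x c matrix of fixed effects (assumption: projected out first)
    Returns one estimate of variance explained per kinship matrix.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Always include an intercept; append user covariates if supplied.
    C = np.ones((n, 1))
    if covariates is not None:
        C = np.column_stack([C, np.asarray(covariates, dtype=float)])
    # Residualise the phenotype on the fixed effects, then standardise it.
    beta, *_ = np.linalg.lstsq(C, y, rcond=None)
    r = y - C @ beta
    z = r / r.std()
    # Use each distinct pair (i < j) once; the diagonal is excluded.
    iu = np.triu_indices(n, k=1)
    response = z[iu[0]] * z[iu[1]]
    design = np.column_stack(
        [np.ones(len(response))] + [np.asarray(K, dtype=float)[iu] for K in kinships]
    )
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef[1:]  # slopes, one per kinship matrix (intercept dropped)
```

With several kinship matrices (as supplied via --mgrm), pass them all in the list; the slopes are then estimated jointly, one per matrix.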

There is one required option (--pheno), and many optional ones.

--pheno <phenofile> - specifies the response (in PLINK format). Individuals without a phenotype will be excluded. If <phenofile> contains more than one phenotype, specify which should be used with --mpheno.

--grm <grmstem> or --mgrm <grmlist> - provide one or more kinship matrices.

--region-number and --region-prefix - provide one or more regions (LDAK reads them from the files <prefix>1, <prefix>2, ..., each listing predictors). In this case, you must also specify the data files with --bfile/--gen/--sp/--speed <prefix>, use --weights to specify the predictor weightings (or --ignore-weights YES to set them all to 1), and use --power to indicate how predictors are scaled (we advise using -0.25). By default, LDAK removes a regional predictor if it is (effectively) identical to one that remains (correlation squared > 0.98); to change this threshold, use --region-prune.
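
The pruning rule for regional predictors can be sketched as a greedy pass: a predictor is kept only if its squared correlation with every predictor already kept is at most the threshold (0.98 by default, as stated above). The left-to-right processing order, the function name prune_region and the assumption that no predictor is constant are ours; LDAK may order or implement the pass differently.

```python
import numpy as np

def prune_region(X, threshold=0.98):
    """Greedy removal of near-duplicate predictors (illustrative sketch).

    X : n x p matrix, one column per regional predictor (assumed non-constant).
    Returns the indices of the columns that are kept.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Standardise columns so a dot product divided by n is a Pearson correlation.
    Z = (X - X.mean(0)) / X.std(0)
    kept = []
    for j in range(X.shape[1]):
        # Keep predictor j only if it is not (effectively) identical to any kept one.
        if all((Z[:, j] @ Z[:, k] / n) ** 2 <= threshold for k in kept):
            kept.append(j)
    return kept
```

Note that a sign flip or linear rescaling of a predictor gives a correlation squared of 1, so such copies are removed too.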

--covar <covarfile> - provide covariates (in PLINK format) to include as fixed effects in the regression; when calculating heritabilities, the variance explained by these will be discounted.

--top-preds <list_of_predictors> - provide a list of predictors to include as fixed effects; when calculating heritabilities, the variance explained by these will be added to that explained by the kinship matrices and regions. Usually, these represent a pruned subset of highly associated predictors with large effects (see the section "Accommodating loci with very large effects" in our recent paper).

--prevalence <float> - if the phenotype is binary, then specify the population prevalence to obtain estimates of variance explained on the liability scale.
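
For a binary phenotype, a widely used way to move an estimate from the observed (0/1) scale to the liability scale is the transformation of Lee et al. (2011): h2_liab = h2_obs * K^2 * (1-K)^2 / (P * (1-P) * z^2), where K is the population prevalence, P is the proportion of cases in the sample and z is the standard normal density at the liability threshold. The sketch below assumes this formula; we have not confirmed that --prevalence applies exactly this conversion, and the function name is hypothetical.

```python
from statistics import NormalDist

def observed_to_liability(h2_obs, prevalence, case_fraction):
    """Convert observed-scale variance explained to the liability scale (sketch).

    Assumes the Lee et al. (2011) transformation.
    prevalence    : K, the population prevalence of the trait
    case_fraction : P, the proportion of cases in the analysed sample
    """
    K, P = prevalence, case_fraction
    t = NormalDist().inv_cdf(1 - K)  # liability threshold above which individuals are cases
    z = NormalDist().pdf(t)          # standard normal density at that threshold
    return h2_obs * K**2 * (1 - K)**2 / (P * (1 - P) * z**2)
```

For example, when K = P = 0.5 the scaling factor is pi/2, so an observed-scale estimate of 0.2 becomes about 0.314.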

--memory-save <YES/NO> (default NO) - by default, LDAK will read into memory all kinship matrices at the start. If there are many kinship matrices, this can require large amounts of memory; therefore, consider adding --memory-save YES, and LDAK will instead read kinship matrices on-the-fly each time they are required.

--permute YES - the phenotypic values will be shuffled. This is useful when performing a permutation analysis, to see the distribution of estimates of variance explained obtained when there is no true signal.
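
A permutation analysis repeats the estimation many times on shuffled phenotypes and compares the real estimate with the resulting null distribution, which is what repeated runs with --permute YES provide. The sketch below shows only the bookkeeping; estimate stands for any hypothetical function that returns a variance-explained estimate for a phenotype vector, and the add-one empirical p-value is a common convention, not something LDAK is documented to report.

```python
import random

def permutation_null(y, estimate, n_perm=100, seed=0):
    """Collect estimates obtained from shuffled copies of the phenotype (sketch)."""
    rng = random.Random(seed)
    null = []
    for _ in range(n_perm):
        shuffled = list(y)      # copy so the caller's phenotype is untouched
        rng.shuffle(shuffled)   # breaks any link between phenotype and genotype
        null.append(estimate(shuffled))
    return null

def empirical_p(observed, null):
    """Add-one empirical p-value, so the result is never exactly zero."""
    return (1 + sum(v >= observed for v in null)) / (1 + len(null))
```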

The main output files are <output>.he, which contains estimates of the proportion of variance explained by each kinship matrix, each region and the top predictors, and <output>.share, which provides estimates of the fractions of total variance explained.
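
One natural reading of the .share values, which we state as an assumption, is each component's fraction of the total: h2_k divided by the sum of the h2 estimates in .he. The sketch below computes that ratio from a dictionary of estimates (the function name and the dictionary layout are hypothetical) and ignores any standard errors.

```python
def variance_shares(estimates):
    """Fraction of the total variance explained attributed to each component (sketch).

    estimates : dict mapping component name -> estimated variance explained
    """
    total = sum(estimates.values())
    if total == 0:
        raise ValueError("total variance explained is zero; shares are undefined")
    return {name: h2 / total for name, h2 in estimates.items()}
```

For instance, estimates of 0.1 and 0.3 for two kinship matrices give shares of about 0.25 and 0.75.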
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Example: for this we use the binary PLINK files test.bed, test.bim and test.fam available in the Test Datasets, the phenotype phen.pheno, and the kinship matrices in the folder partitions, calculated in Get Kinships.

First, we regress the phenotype phen.pheno on the kinship matrix with stem partitions/kinships.all:

../ldak.out --he res6 --pheno phen.pheno --grm partitions/kinships.all

Next, we regress the phenotype on the three kinship matrices listed in partitions/partition.list:

../ldak.out --he res7 --pheno phen.pheno --mgrm partitions/partition.list

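Finally, here is an example that uses regions. The two awk commands skip the header row of partitions/kinships.1.grm.details and partitions/kinships.2.grm.details and save the first column (the predictor names) as the region files set1 and set2; H-E regression is then run on the kinship matrix partitions/kinships.3 together with these two regions: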

awk < partitions/kinships.1.grm.details '(NR>1){print $1}' > set1
awk < partitions/kinships.2.grm.details '(NR>1){print $1}' > set2
../ldak.out --he res8 --pheno phen.pheno --grm partitions/kinships.3 --region-number 2 --region-prefix set --bfile test --weights sections/weights.short --power -0.25