REML Analysis

LDAK includes a generalized REML solver for estimating the proportions of variance explained by kinship matrices and/or regions. A region is a (usually small) subset of predictors; providing a region to REML is equivalent to first calculating a kinship matrix across the predictors in this region, then providing this matrix to REML. When the phenotype is binary, the solver can transform estimates of variance explained to the liability scale. The argument for REML is

--reml <output>

to which the following options can be added (to restrict to a subset of the data see Data Filtering):

--pheno <phenofile> – required to provide the responses (in PLINK format). Individuals without a phenotype will be excluded. If <phenofile> contains more than one phenotype, specify which should be used with --mpheno.

--grm <grmstem> or --mgrm <grmlist> – to provide one or more kinship matrices.

--region-number and --region-prefix – to provide one or more regions, in which case you must also specify the datafiles with --bfile/--chiamo/--sp/--speed <prefix> and use --weights or --ignore-weights YES.
By default, LDAK will remove a predictor if it is (effectively) identical to one that remains (squared correlation > 0.995). To change the threshold use --prune <float>; for example, a lower threshold will produce models with fewer predictors (at the possible expense of accuracy), while a threshold above one will result in no pruning.

--covar <covarfile> – to provide covariates (in PLINK format) as fixed effects in the regression.

--permute YES – to shuffle the phenotypic values. This is useful for permutation analysis, which shows the distribution of estimates of variance explained obtained when there is no true signal.

--prevalence <float> – if the phenotype is binary, specify the population prevalence to obtain estimates of variance explained on the liability scale.
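
To make the liability-scale conversion concrete, the sketch below applies a commonly used transformation for binary traits, which rescales the observed-scale estimate using the population prevalence and the proportion of cases in the sample. It is an illustration only: whether LDAK uses exactly this formula is an assumption, and the function name `liability_scale` and its argument names are hypothetical, not part of LDAK.

```python
from statistics import NormalDist


def liability_scale(h2_obs, prevalence, case_fraction):
    """Rescale an observed-scale estimate of variance explained to the liability scale.

    h2_obs        -- estimate on the observed (0/1) scale
    prevalence    -- population prevalence of the trait (between 0 and 1)
    case_fraction -- proportion of cases in the analysed sample (between 0 and 1)

    Assumption: this is the standard ascertainment-corrected transformation,
    not necessarily the exact computation performed by LDAK.
    """
    if not (0.0 < prevalence < 1.0 and 0.0 < case_fraction < 1.0):
        raise ValueError("prevalence and case_fraction must lie strictly between 0 and 1")

    std_normal = NormalDist()
    # Liability threshold above which an individual is a case.
    threshold = std_normal.inv_cdf(1.0 - prevalence)
    # Height of the standard normal density at that threshold.
    z = std_normal.pdf(threshold)

    factor = (prevalence * (1.0 - prevalence)) ** 2 / (
        z ** 2 * case_fraction * (1.0 - case_fraction)
    )
    return h2_obs * factor


# Example: observed-scale estimate 0.2, prevalence 1%, balanced case-control sample.
print(round(liability_scale(0.2, prevalence=0.01, case_fraction=0.5), 3))
```

When the sample is not ascertained (the case fraction equals the prevalence), the factor reduces to prevalence × (1 − prevalence) divided by the squared density at the threshold.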
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

The following files are produced:

<output>.reml – contains the estimates of variance explained (with SDs) for each kinship matrix and region, as well as estimates of the fixed effects (if no covariates are provided, the only fixed effect will be the intercept).

<output>.liab – contains the estimates of variance explained on the liability scale (if the phenotype is binary and --prevalence is specified).

<output>.indi.blp – contains the estimates of random effects (breeding values). There is a pair of columns for each kinship matrix or region: the second of each pair provides g, the estimated breeding values; the first provides K⁻¹g, the estimated breeding values pre-multiplied by the inverse of the kinship matrix.

<output>.indi.res – provides for each individual its phenotypic value, the total genetic effect (the sum of the breeding values in <output>.indi.blp) and the phenotype with the breeding values discounted.

<output>.reg.blp – contains the estimates of predictor effect sizes for the regions. The first four columns provide the predictor name, Allele 1 (test allele), Allele 2 (reference allele) and the predictor centre (the mean of its allele count with respect to Allele 1). The remaining columns provide each predictor's effect size for each region; however, as predictors typically feature in only one region, the effect sizes for most predictors will be zero for all but one region.
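
The relationship between the breeding values in <output>.indi.blp and the quantities in <output>.indi.res can be sketched with a few lines of arithmetic. The example assumes that "discounted" means the total genetic effect is subtracted from the phenotype; the function name, individual IDs and values are hypothetical, and the actual column layout of the files is not reproduced here.

```python
def summarise_individual(phenotype, breeding_values):
    """Return (total genetic effect, phenotype with breeding values discounted).

    breeding_values holds one estimated breeding value per kinship matrix
    or region. Assumption: "discounted" is taken to mean subtraction.
    """
    total_genetic = sum(breeding_values)
    adjusted = phenotype - total_genetic
    return total_genetic, adjusted


# Hypothetical values for two individuals and two variance components.
phenotypes = {"ind1": 1.50, "ind2": -0.25}
breeding = {"ind1": [0.40, 0.10], "ind2": [-0.20, 0.05]}

for name, value in phenotypes.items():
    total, adjusted = summarise_individual(value, breeding[name])
    print(name, value, round(total, 4), round(adjusted, 4))
```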