REML

LDAK includes a generalized REML solver for estimating the proportions of variance explained (heritabilities) of kinship matrices and/or regions. A region is a (usually small) subset of predictors; providing a region to REML is equivalent to first calculating a kinship matrix across the predictors in this region, then providing this matrix to REML. When the phenotype is binary, the solver can transform estimates of variance explained to the liability scale. The argument for REML is

--reml <output>

There is only one required option, but many optional ones.

--pheno <phenofile> - specifies the response (in PLINK format). Individuals without a phenotype will be excluded. If <phenofile> contains more than one phenotype, specify which should be used with --mpheno.

--grm <grmstem> or --mgrm <grmlist> - provide one or more kinship matrices.

--region-number and --region-prefix - provide one or more regions, in which case you must also specify the datafiles with --bfile/--gen/--sp/--speed <prefix>, use --weights to specify the predictor weightings (or --ignore-weights YES to set them to 1)  and --power to indicate how to scale predictors (we advise using -0.25). By default, LDAK will remove a regional predictor if (effectively) identical to one which remains (correlation squared > 0.98); to change this threshold use --region-prune.

--covar <covarfile> - provide covariates (in PLINK format) as fixed effects in the regression; when calculating heritabilties, the variance explained by these will be discounted.

--top-preds <list_of_predictors> - provide a list of predictors  to include as fixed effects; when calculating heritabilities, the variance explained by these will be added to that explained by the kinship matrices and regions. Usually, these represent a pruned subset of highly-associated predictors with large effects (see the section "Accommodating loci with very large effects" in our recent paper).

--prevalence <float> - if the phenotype is binary, then specify the population prevalence to obtain estimates of variance explained on the liability scale.

--constrain <YES/NO/REG> (default REG) - by default, the estimates of variance explained of regions will be constrained to be non-negative, but the estimates for kinship matrices will not. Using --constrain YES will ensure all estimates are constrained, while using --constrain NO will ensure none are constrained.

--memory-save <YES/NO> (default NO) - by default, LDAK will read into memory all kinship matrices at the start. If there are many kinship matrices, this can require large amounts of memory; therefore, consider adding --memory-save YES, and LDAK will instead read kinship matrices on-the-fly each time they are required.

--permute YES - the phenotypic values will be shuffled. This is useful if wishing to perform permutation analysis to see the distribution of estimates of variance explained obtained when there is no true signal.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

The output files are

<output>.reml - contains estimates of the proportion of variance explained by each kinship matrix, region and the top predictors. It also reports for each the (mega)  intensity, which equals its variance explained proportion divided by size (x 1,000,000).

<output>.share - contains estimates of the fraction of total variance explained by each kinship matrix and region. It also reports for each the enrichment, which equals its estimated fraction divided by its expected fraction.

<output>.coeff - contains estimates of the fixed effects (if no covariates are provided, the only fixed effect will be the intercept).

<output>.indi.blp - contains the estimates of random effects (breeding values). There is a pair of columns for each kinship matrix or region: the second of each pair provides g, the estimated breeding values; the first provides K-1g, the estimated breeding values pre-multiplied by the inverse of the kinship matrix (for regions, the first column in each pair is set to 0).

<output>.reg.blup and <output>.reg.score - contains estimates of effect sizes for regional and top predictors (to get the effect size estimates for the kinship matrix predictors you must use BLUP).

<output>.indi.res - contains residuals after subtracting from the phenotypic values the estimated effects of the kinship matrices, regions and covariates (including top predictors).

If the phenotype is binary and --prevalence specified, there will be additional output files with the suffix .liab, which provide estimates converted from the observed to the liability scale.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Example: for this we use the binary PLINK files test.bed, test.bim and test.fam available in the Test Datasets, the phenotypes phen.pheno and phen_binary.pheno, and the kinship matrices in the folder partitions calculated in Get Kinships.

First we regress the phenotype phen.pheno on the kinship matrix with stem partitions/kinships.all

../ldak.out --reml res1 --pheno phen.pheno --grm partitions/kinships.all

By viewing res1.reml, we see that the estimate of variance explained is 0.56 (SD 0.10). Recall that kinship matrix with stem partitions/kinships.all was made by merging the three kinship matrices with stems partitions/kinships.1, partitions/kinships.2 and partitions/kinships.3. A list of these names are saved in partitions/partition.list, so we can regress the phenotype on all three kinship matrices simultaneously using

../ldak.out --reml res2 --pheno phen.pheno --mgrm partitions/partition.list

res2.reml now contains both an estimate of the total variance explained (0.58, SD 0.09), and how this is divided across the three kinship matrices. We could do the same by first making lists of which predictors contribute to each kinship matrix

awk < partitions/kinships.1.grm.details '(NR>1){print $1}' > set1
awk < partitions/kinships.2.grm.details '(NR>1){print $1}' > set2
awk < partitions/kinships.3.grm.details '(NR>1){print $1}' > set3

Then running

../ldak.out --reml res3 --pheno phen.pheno --region-number 3 --region-prefix set --bfile test --weights sections/weights.short --power -0.25

Note that these two analyses are not exactly the same, because by default LDAK constrains regions but not kinship matrices; to make the two analyses identical add --constrain NO (there will remain slight differences in the results due to algorithmic differences). Finally, we could do the same using a combination of kinship matrices and regions

../ldak.out --reml res4 --pheno phen.pheno --grm partitions/kinships.3 --region-number 2 --region-prefix set --bfile test --weights sections/weights.short --power -0.25

phen_binary.pheno contains a binary phenotype, where all individuals are either 1 (controls) or 2 (cases). Suppose the prevalence of this phenotype is 0.01 (i.e., were we to pick individuals at random from the population, on average 1 in 100 would be cases). Then when analyzing this phenotype, we could add --prevalence 0.01 to convert estimates to the liability scale. For example:

../ldak.out --reml res5 --pheno phen_binary.pheno --grm partitions/kinships.all --prevalence 0.01