When a kinship matrix corresponds to a small number of predictors, it is more efficient to supply this to LDAK as a “region”. This will save space by avoiding the need to explicitly compute the kinship matrix, as LDAK will do this on-the-fly, and will also provide other computational speed-ups. Regional kinship matrices are key to performing Adaptive MultiBLUP.
To include regional kinship matrices when estimating variance components, use
--reml <output> --pheno <phenofile> --region-number X --region-prefix <input>
where <phenofile> contains the phenotypic values (in PLINK format), and there are X regions, corresponding to the subsets of predictors listed in the files <input>1, …, <input>X. This step requires one of --bfile/--chiamo/--sp/--speed <prefix> to provide the data files to which the regions correspond, and either --weights or --ignore-weights YES (for prediction, we recommend not using weightings). Note that the parameters in Data Filtering do not apply to regional predictors, so ensure the specified predictors are of sufficient quality.
By default, LDAK will remove a predictor if it is (effectively) identical to one that remains (squared correlation > 0.995). To change the threshold, use --prune <float>; for example, a lower threshold will produce models with fewer predictors (at the possible expense of accuracy), while a threshold above one will result in no pruning.
One or more (non-regional) kinship matrices can still be included using --grm or --mgrm. Use --covar to include covariates and --keep to specify a subset of samples.
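Because the region files must be numbered consecutively from 1 to X, it can help to count them before launching the analysis. The following is a minimal sketch: the helper name count_regions is hypothetical (it is not part of LDAK), and it assumes only the naming convention described above.

```shell
# Hypothetical helper (not part of LDAK): count the consecutive region files
# <prefix>1, <prefix>2, ... and print how many were found. Counting stops at
# the first missing number, so a gap in the numbering shows up as a low count.
count_regions() {
    prefix=$1
    n=0
    while [ -f "${prefix}$((n + 1))" ]; do
        n=$((n + 1))
    done
    echo "$n"
}

# Possible usage (file names taken from the Test Datasets example):
#   X=$(count_regions list)
#   ../ldak.out --reml out --pheno phen.pheno --region-number "$X" \
#       --region-prefix list --bfile test --ignore-weights YES
```

If the printed count is lower than expected, one of the numbered files is probably missing or misnamed.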
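To make the pruning criterion concrete, the sketch below computes the squared Pearson correlation between two predictors stored as a plain two-column text file of allele counts (one sample per row). The function name r2 and the input layout are assumptions made for illustration; this is not LDAK's implementation, which works directly on the genotype data files.

```shell
# Hypothetical helper: squared Pearson correlation between columns 1 and 2 of a
# whitespace-separated file (one sample per row). Illustration only.
r2() {
    awk '
        { n++; sx += $1; sy += $2; sxx += $1 * $1; syy += $2 * $2; sxy += $1 * $2 }
        END {
            if (n == 0) { print "NA"; exit }
            vx = sxx - sx * sx / n      # n times the variance of column 1
            vy = syy - sy * sy / n      # n times the variance of column 2
            cxy = sxy - sx * sy / n     # n times the covariance
            # A constant column has no defined correlation.
            if (vx <= 0 || vy <= 0) { print "NA"; exit }
            printf "%.4f\n", cxy * cxy / (vx * vy)
        }
    ' "$1"
}
```

A pair of predictors scoring above 0.995 would count as effectively identical under the default threshold.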
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
To estimate predictor effect sizes, use
--calc-blups <output2> --remlfile <output>.reml
This step requires one of --bfile/--chiamo/--sp/--speed <prefix> to provide the data files to which the regions correspond (see File Formats).
If one of --grm or --mgrm was used when estimating variance components, it should also be used here. The SNP effect sizes will be saved in <output2>.blup, with the random effects in <output2>.pred.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
The following example uses the files provided in the Test Datasets. The kinship matrices kins/kinshipA and kins/kinshipB were created from the binary PLINK files test.bed, test.bim and test.fam, while list1 and list2 are two subsets of predictors from the same data files. Phenotypes are stored in phen.pheno, and the file mlist.txt (passed to --mgrm) lists the kinship matrices to include. The commands
../ldak.out --reml out --mgrm mlist.txt --pheno phen.pheno --region-number 2 --region-prefix list --bfile test --ignore-weights YES
../ldak.out --calc-blups out --mgrm mlist.txt --remlfile out.reml --bfile test
will perform 4-way MultiBLUP using two standard kinship matrices and two regional kinship matrices, saving the effect size estimates in out.blup and the predictions in out.pred. Column 4 of out.blup will report cumulative effect sizes, which are the sum of the effect sizes corresponding to the four kinship matrices (Columns 6, 8, 10 and 12). Similarly, Column 3 of out.pred will report cumulative predictions, which are the sum of predictions corresponding to each kinship matrix (Columns 4, 5, 6 and 7).
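The column relationships described above can be checked with a short awk wrapper. This is a sketch under stated assumptions: the helper name check_sum is hypothetical, the files are assumed to be whitespace-separated with a single header line (pass 0 as the fourth argument if there is none), and the tolerance of 1e-4 is an arbitrary allowance for rounding in the printed values.

```shell
# Hypothetical helper: report rows where column `total` differs from the sum of
# the comma-separated `parts` columns. Returns non-zero if any row disagrees.
# Arguments: file, total column, parts columns, header lines to skip (default 1).
check_sum() {
    file=$1; total=$2; parts=$3; skip=${4:-1}
    awk -v total="$total" -v parts="$parts" -v skip="$skip" '
        NR <= skip { next }                    # assumed header line(s)
        {
            n = split(parts, idx, ",")
            s = 0
            for (i = 1; i <= n; i++) s += $(idx[i])
            d = $total - s
            if (d < 0) d = -d
            if (d > 1e-4) {                    # assumed rounding tolerance
                bad++
                print "line " NR ": " $total " vs " s
            }
        }
        END { exit (bad > 0) }
    ' "$file"
}

# Possible usage with the example output files:
#   check_sum out.blup 4 6,8,10,12
#   check_sum out.pred 3 4,5,6,7
```

The function prints nothing and returns zero when every row is consistent.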