For constructing linear prediction models, we recommend MultiBLUP, a generalisation of BLUP (Best Linear Unbiased Prediction). Full details of the method are provided in our Genome Research paper. In brief, MultiBLUP extends the BLUP model to allow for k+r genetic random effect terms:

Y = a + g_{1} + g_{2} + … + g_{k+r} + e where g_{j}~N(0,K_{j}v_{j}^{2}) and e~N(0,Iv_{e}^{2})

where the first k random effects correspond to standard kinship matrices (typically full-rank), while the last r correspond to regional kinship matrices, each constructed from a subset of predictors (typically low-rank).

As with BLUP, constructing a MultiBLUP prediction model takes two steps: first, perform REML to estimate the variance terms v_{1}^{2}, …, v_{k+r}^{2}, v_{e}^{2}; then, given these estimates, calculate the BLUP estimates of SNP effect sizes.
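To make the second step concrete, here is a minimal NumPy sketch (not the LDAK implementation) of the BLUP calculation once the variance terms are known. It rests on several assumptions that are not stated above: a is a single intercept, each kinship matrix is K_{j} = X_{j}X_{j}^{T}/m_{j} for an already-standardised predictor matrix X_{j} with m_{j} columns, and the REML estimates are simply passed in. The function name `blup_snp_effects` and its arguments are hypothetical.

```python
import numpy as np

def blup_snp_effects(y, predictors, var_genetic, var_error):
    """Sketch of BLUP for SNP effects, given variance components.

    y           : phenotypes, length n
    predictors  : list of standardised predictor matrices X_j (n x m_j)
    var_genetic : list of genetic variances v_j^2 (assumed known, e.g. from REML)
    var_error   : residual variance v_e^2
    """
    y = np.asarray(y, dtype=float)
    n = y.shape[0]

    # Phenotypic covariance V = sum_j K_j v_j^2 + I v_e^2,
    # assuming K_j = X_j X_j^T / m_j
    V = var_error * np.eye(n)
    for X, v in zip(predictors, var_genetic):
        V += v * (X @ X.T) / X.shape[1]

    # Generalised least squares estimate of the intercept a
    ones = np.ones(n)
    a_hat = (ones @ np.linalg.solve(V, y)) / (ones @ np.linalg.solve(V, ones))

    # V^{-1} (y - a), shared by every random effect
    weighted_resid = np.linalg.solve(V, y - a_hat)

    # BLUP of SNP effects in subset j: (v_j^2 / m_j) X_j^T V^{-1} (y - a)
    betas = [(v / X.shape[1]) * (X.T @ weighted_resid)
             for X, v in zip(predictors, var_genetic)]
    return a_hat, betas
```

Under these assumptions, the prediction for a new individual is the intercept plus the sum over subsets of that individual's standardised predictors multiplied by the corresponding effect sizes.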

MultiBLUP improves on BLUP when the kinship matrices correspond to subsets of predictors with distinct effect size variances. You can either provide your own predictor subsets based on prior knowledge, or use Adaptive MultiBLUP, which provides an efficient way to first determine suitable subsets.

Note that when the focus is on prediction, rather than on estimating variance explained, we do not generally advise using SNP weightings (see here for our reasoning). Therefore, kinship matrices should be computed with the option --ignore-weights YES.

Additionally, there may be advantages to varying the power parameter, which determines how predictors are standardised. While we found that the default (--power -1) works best for estimating variance explained, higher values (e.g., --power 0) might be preferable for prediction.
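The effect of the power parameter can be illustrated with a short NumPy sketch. It assumes, as a simplification that may differ from LDAK's exact computation, that each predictor is centred at twice its allele frequency p and then multiplied by [2p(1-p)]^{power/2}, with no SNP weightings and every predictor polymorphic. The function names `standardise` and `kinship` are hypothetical.

```python
import numpy as np

def standardise(genotypes, power=-1.0):
    """Centre allele counts (0/1/2) and scale by [2p(1-p)]^(power/2).

    Assumes every column is polymorphic, so 2p(1-p) > 0.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    freqs = genotypes.mean(axis=0) / 2.0       # allele frequency p per SNP
    expected_var = 2.0 * freqs * (1.0 - freqs)  # 2p(1-p)
    centred = genotypes - 2.0 * freqs
    return centred * expected_var ** (power / 2.0)

def kinship(genotypes, power=-1.0):
    """Unweighted kinship matrix X X^T / m from the standardised predictors."""
    X = standardise(genotypes, power)
    return X @ X.T / X.shape[1]
```

In this sketch, a power of -1 gives every predictor the same expected variance, so rarer variants are scaled up relative to common ones, whereas a power of 0 only centres the allele counts and leaves common variants with larger variance.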

If you have any problems, get in touch: doug<dot>speed<at>ucl<dot>ac<dot>uk.