Calculate Kinships

There are two ways to compute a kinship matrix. The direct method has just one step and is best in most circumstances. However, if your dataset is very large (for example, it contains imputed genotypes for more than 30,000 samples), consider using the indirect method, which has three steps. The indirect method can also be useful when performing Genomic Partitioning.

It is important to understand that the options you use when calculating kinships (in particular, the power parameter, predictor weightings, and subset of predictors) determine the assumed Heritability Model. In general, we recommend the Human Default Model (we provide scripts for implementing this model and other models in the example at the bottom).

Always read the screen output, which suggests arguments and estimates memory usage.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Direct method:

This uses the main argument --calc-kins-direct <outfile>.

This requires the options

--bfile/--gen/--sp/--speed <datastem> - to specify the genetic data files (see File Formats).

--power <float> - to specify how predictors are scaled.

You can use --keep <keepfile> and/or --remove <removefile> to restrict to a subset of samples, and --extract <extractfile> and/or --exclude <excludefile> to restrict to a subset of predictors (for more details, see Data Filtering).

By default, LDAK assigns all predictors weighting one (equivalent to using --ignore-weights YES); however, you can provide your own weightings using --weights <weightsfile>.

The kinship matrix will be saved with stem <outfile> (see Kinship Formats for details of how kinship matrices are stored).
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Indirect method:

First, cut the predictors into partitions using --cut-kins <folder>.

With this, you must provide

--bfile/--gen/--sp/--speed <datastem> - to specify the genetic data files (see File Formats).

--partition-length <integer>, --by-chr YES, or --partition-number <integer> and --partition-prefix <prefix> - to specify how to divide predictors into partitions.

You might also wish to use --extract <extractfile> and/or --exclude <excludefile> to specify a subset of predictors.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Next, calculate a kinship matrix for each partition using --calc-kins <folder>.

With this, you must provide

--bfile/--gen/--sp/--speed <datastem> - to specify the genetic data files (see File Formats).

--partition <number> - to specify the partition for which to calculate kinships.

--power <float> - to specify how predictors are scaled.

By default, LDAK assigns all predictors weighting one (equivalent to using --ignore-weights YES); however, you can provide your own weightings using --weights <weightsfile>.

If you used --extract <extractfile> and/or --exclude <excludefile> with --cut-kins, you should also use them here.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Finally, join the kinship matrices using --join-kins <folder>.

This requires no options.

The final kinship matrix will be saved with stem <folder>/kinships.all (see Kinship Formats for details of how kinship matrices are stored).
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

By default, LDAK centres and scales predictors based on their observed means. However, when creating a kinship matrix for PCGC regression, the authors of PCGC suggest that predictors be centred and scaled using external estimates of their means (as the observed means can be biased due to ascertainment). In this case, add --predictor-means <meansfile> to provide a list of average allele counts. <meansfile> should contain four columns: predictor name, Allele 1, Allele 2 and the average count of Allele 1 (for SNP data, a value between 0 and 2).
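
For diploid SNP data, the average count of Allele 1 is twice its frequency, so a means file can be derived from external allele frequencies. The following is a minimal sketch, assuming a hypothetical whitespace-delimited file ext_freqs.txt with columns predictor name, Allele 1, Allele 2 and Allele 1 frequency. The file names, predictor names and frequencies are invented for illustration, and writing the output without a header row is an assumption.

```shell
# Toy external frequency table (invented values, for illustration only).
cat > ext_freqs.txt <<'EOF'
rs0001 A G 0.10
rs0002 C T 0.45
rs0003 G A 0.80
EOF

# Mean count of Allele 1 = 2 * frequency for diploid genotypes,
# giving the four columns described above.
awk '{ printf "%s %s %s %.4f\n", $1, $2, $3, 2 * $4 }' ext_freqs.txt > means.txt

cat means.txt
```

The resulting file would then be passed as --predictor-means means.txt.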
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Example:

Here we use the binary PLINK files human.bed, human.bim and human.fam from the Test Datasets, and assume the Human Default Model (see Technical Details for more information).
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

To compute the kinship matrix using the direct method, run

./ldak.out --calc-kins-direct HumDef --bfile human --power -.25

The kinship matrix is saved with stem HumDef (i.e., in the files HumDef.grm.bin, HumDef.grm.id, HumDef.grm.details and HumDef.grm.adjust).
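
Before using the kinship matrix in later analyses, it can be worth confirming that all four files were written. A minimal sketch follows; check_kinship_stem is an illustrative helper, not part of LDAK, and it only tests that the files exist.

```shell
# check_kinship_stem is an illustrative helper, not part of LDAK.
# $1 = kinship stem; returns non-zero if any of the four files is missing.
check_kinship_stem() {
  stem=$1
  missing=0
  for ext in grm.bin grm.id grm.details grm.adjust; do
    if [ ! -f "$stem.$ext" ]; then
      echo "missing: $stem.$ext" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: check the files written by the direct-method command above.
check_kinship_stem HumDef || echo "kinship matrix HumDef is incomplete"
```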
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

To do the same using the indirect method, use the commands

./ldak.out --cut-kins kins --bfile human --partition-length 500000
./ldak.out --calc-kins kins --bfile human --partition 1 --power -.25
./ldak.out --join-kins kins

Now the kinship matrix is saved with stem kins/kinships.all.
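
The --calc-kins command above covers partition 1 only. If --cut-kins creates more than one partition, --calc-kins needs to be run once per partition before joining. A minimal sketch of generating those commands follows, assuming the number of partitions is read from the --cut-kins screen output and supplied by hand. The helper print_partition_commands and the LDAK variable are illustrative names, not part of LDAK.

```shell
# Sketch only: print_partition_commands is an illustrative helper, not part of LDAK.
LDAK="${LDAK:-./ldak.out}"   # path to the LDAK binary (assumed location)

# Print one --calc-kins command per partition.
# $1 = folder given to --cut-kins, $2 = data stem,
# $3 = number of partitions, $4 = power
print_partition_commands() {
  j=1
  while [ "$j" -le "$3" ]; do
    printf '%s --calc-kins %s --bfile %s --partition %s --power %s\n' \
      "$LDAK" "$1" "$2" "$j" "$4"
    j=$((j + 1))
  done
}

# Example with 3 partitions; the count is an assumption, so take the real
# number from the --cut-kins screen output.
print_partition_commands kins human 3 -.25
```

Once the printed commands look right, they can be executed by piping them to sh (print_partition_commands kins human 3 -.25 | sh), after which --join-kins is run once.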
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

To instead calculate a kinship matrix assuming the GCTA Model, run

./ldak.out --calc-kins-direct GCTA --bfile human --power -1

To calculate a kinship matrix assuming the LDAK Model, run

./ldak.out --calc-kins-direct LDAK --bfile human --weights <weightsfile> --power -.25

where <weightsfile> contains the LDAK weightings (obtained using Calculate Weightings).

To calculate a kinship matrix assuming the LDAK-Thin Model, run

./ldak.out --thin thin --bfile human --window-prune .98 --window-kb 100
awk < thin.in '{print $1, 1}' > weights.thin
./ldak.out --calc-kins-direct LDAK-Thin --bfile human --weights weights.thin --power -.25
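
The awk step above turns the list of predictors retained by thinning into a weights file that gives each of them weighting one. A toy illustration follows, assuming thin.in holds one predictor name per line. The predictor names and the toy_ file names are invented so that a real thin.in is not overwritten.

```shell
# Toy stand-in for thin.in (invented predictor names, one per line).
printf '%s\n' rs0001 rs0005 rs0009 > toy_thin.in

# Same transformation as above: keep the name, append a weighting of 1.
awk '{ print $1, 1 }' toy_thin.in > toy_weights.thin

cat toy_weights.thin
```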