Calculate Kinships

There are two ways to compute a kinship matrix. The direct method has just one step and is best in most circumstances. However, if your dataset is very large (for example, it contains imputed genotypes for more than 30,000 samples), consider using the indirect method, which has three steps. The indirect method can also be useful when performing Genomic Partitioning.

It is important to understand that the arguments you use when calculating kinships (in particular, the choice of predictor weightings, power parameter, and subset of predictors) determine the assumed Heritability Model. In general, we recommend the LDAK-Thin Model (we provide scripts for implementing this model in the example at the bottom).

Always read the screen output, which suggests arguments and estimates memory usage.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Direct method:

This uses the main argument --calc-kins-direct <outfile>.

This requires the options

--bfile/--gen/--sp/--speed <datastem> - to specify the genetic data files (see File Formats).

--weights <weightsfile> or --ignore-weights YES - to specify the predictor weightings (if using a weightsfile, this should have two columns that provide predictor names then weightings).

--power <float> - to specify how predictors are scaled.

You can use --keep <keepfile> and/or --remove <removefile> to restrict to a subset of samples, and --extract <extractfile> and/or --exclude <excludefile> to restrict to a subset of predictors (for more details, see Data Filtering).

The kinship matrix will be saved with stem <outfile> (see Kinship Formats for details of how kinship matrices are stored).
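
Because --weights expects a file with two columns (predictor names then weightings), it can be worth checking a weightsfile before starting a long run. The following is a minimal sketch and not part of LDAK: the function name check_weights and the demo file are hypothetical, and it assumes the file has no header line.

```shell
# Hypothetical helper (not part of LDAK): report any line of a weightsfile
# that does not have exactly two columns with a non-negative number second.
# Assumes the file has no header line.
check_weights() {
    awk 'NF != 2 || $2 !~ /^[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$/ {
             printf "line %d is malformed: %s\n", NR, $0
             bad = 1
         }
         END { exit bad }' "$1"
}

# Demo on a small made-up file; replace "$demo" with your own weightsfile.
demo=$(mktemp)
printf 'rs1 1\nrs2 0.5\nrs3 0\n' > "$demo"
if check_weights "$demo"; then
    echo "weightsfile looks OK"
fi
rm -f "$demo"
```

The function returns a non-zero status if any line fails the check, so it can be placed in front of a --calc-kins-direct command with &&.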
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Indirect method:

First cut the predictors into partitions using --cut-kins <folder>.

With this, you must provide

--bfile/--gen/--sp/--speed <datastem> - to specify the genetic data files (see File Formats).

--partition-length <integer>, --by-chr YES, or both --partition-number <integer> and --partition-prefix <prefix> - to specify how to divide predictors into partitions.

You might also wish to use --extract <extractfile> and/or --exclude <excludefile> to specify a subset of predictors.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Next, calculate a kinship matrix for each partition using --calc-kins <folder>.

With this, you must provide

--bfile/--gen/--sp/--speed <datastem> - to specify the genetic data files (see File Formats).

--partition <number> - specifies the partition for which to calculate kinships.

--weights <weightsfile> or --ignore-weights YES - to specify the predictor weightings (if using a weightsfile, this should have two columns that provide predictor names then weightings).

--power <float> - to specify how predictors are scaled.

If you used --extract <extractfile> and/or --exclude <excludefile> with --cut-kins, you should also use them here.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Finally, join the kinship matrices using --join-kins <folder>.

This requires no options.

The final kinship matrix will be saved with stem <folder>/kinships.all (see Kinship Formats for details of how kinship matrices are stored).
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

By default, LDAK centres and scales predictors based on their observed means. However, if creating a kinship matrix to use for PCGC regression, the authors of PCGC suggest that predictors are centred and scaled using external estimates of their means (as the observed means can be biased due to ascertainment). Therefore, add --predictor-means <meansfile> to provide a list of average allele counts. <meansfile> should contain four columns: predictor name, Allele 1, Allele 2 and the average count of Allele 1 (for SNP data, a value between 0 and 2).
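
As a quick sanity check on the four-column layout described above, the sketch below flags lines of a meansfile that do not have four columns or whose fourth column is not a number between 0 and 2 (the range expected for SNP data). It is an illustration only: the function name check_means and the demo values are made up, and the check assumes no header line.

```shell
# Hypothetical helper (not part of LDAK): flag lines of a meansfile that do
# not have four columns, or whose fourth column is not a number in [0, 2].
# Assumes SNP data and no header line.
check_means() {
    awk 'NF != 4 || $4 !~ /^[0-9]*\.?[0-9]+$/ || $4 + 0 > 2 {
             printf "line %d is suspicious: %s\n", NR, $0
             bad = 1
         }
         END { exit bad }' "$1"
}

# Demo with made-up values: name, Allele 1, Allele 2, mean count of Allele 1.
demo=$(mktemp)
printf 'rs1 A G 0.42\nrs2 C T 1.87\n' > "$demo"
if check_means "$demo"; then
    echo "meansfile looks OK"
fi
rm -f "$demo"
```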
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Example:

Here we use the binary PLINK files human.bed, human.bim and human.fam from the Test Datasets, and assume the LDAK-Thin Model (see Technical Details for more information). Note that the LDAK-Thin Model requires a weightsfile that gives weighting one to the predictors that remain after thinning for duplicates, and weighting zero to those removed. This can be achieved using the commands

./ldak.out --thin thin --bfile human --window-prune .98 --window-kb 100
awk < thin.in '{print $1, 1}' > weights.thin

The first command produces the file thin.in, which contains a list of the predictors that remain after thinning. The second command gives these predictors weighting one in the file weights.thin (note that predictors not in weights.thin are automatically given weighting zero).
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

To compute the kinship matrix using the direct method, run

./ldak.out --calc-kins-direct LDAK-Thin --bfile human --weights weights.thin --power -.25

The kinship matrix is saved with stem LDAK-Thin (i.e., in the files LDAK-Thin.grm.bin, LDAK-Thin.grm.id, LDAK-Thin.grm.details and LDAK-Thin.grm.adjust).
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

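To do the same using the indirect method, use the commands below, repeating the --calc-kins command for each partition created by --cut-kins (only partition 1 is shown)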

./ldak.out --cut-kins kins --bfile human --partition-length 500000
./ldak.out --calc-kins kins --bfile human --partition 1 --weights weights.thin --power -.25
./ldak.out --join-kins kins

Now the kinship matrix is saved with stem kins/kinships.all.
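
The --calc-kins step has to be run once for every partition, whereas the commands above show only partition 1. One way to cover all partitions is a shell loop, sketched below. NUM_PARTITIONS, DRY_RUN and the helper functions are assumptions introduced for this illustration, not LDAK options: set NUM_PARTITIONS to the number of partitions reported in the screen output of --cut-kins. By default the sketch only prints the commands it would run.

```shell
# Sketch: run --calc-kins once per partition, then join the results.
# NUM_PARTITIONS is a placeholder; set it to the number of partitions
# reported by --cut-kins (3 is an arbitrary default for the demo).
NUM_PARTITIONS=${NUM_PARTITIONS:-3}
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

calc_all_partitions() {
    j=1
    while [ "$j" -le "$NUM_PARTITIONS" ]; do
        run ./ldak.out --calc-kins kins --bfile human --partition "$j" \
            --weights weights.thin --power -.25 || return 1
        j=$((j + 1))
    done
    # Join only after every partition has been processed.
    run ./ldak.out --join-kins kins
}

calc_all_partitions
```

Setting DRY_RUN=0 executes the commands, stopping at the first partition that fails so that --join-kins is not attempted on an incomplete set.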
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

To instead calculate a kinship matrix assuming the GCTA Model, run

./ldak.out --calc-kins-direct GCTA --bfile human --ignore-weights YES --power -1

While to calculate a kinship matrix assuming the LDAK Model, run

./ldak.out --calc-kins-direct LDAK --bfile human --weights <weightsfile> --power -.25

where <weightsfile> contains the LDAK weightings (obtained using Calculate Weightings).