Calculate Kinships

There are two ways to compute a kinship matrix. The direct method has just one step and is best in most circumstances. However, if your dataset is very large (for example, it contains imputed genotypes for more than 30,000 individuals), consider using the indirect method, which has three steps. The indirect method can also be useful when performing Genomic Partitioning.

It is important to understand that the arguments you use when calculating kinships (in particular, the choice of predictor weightings, power parameter and subset of predictors) determine the assumed Heritability Model. In general, we recommend the LDAK-Thin Model (we provide scripts for implementing this model in the example at the bottom).

Always read the screen output, which suggests arguments and estimates memory usage.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Direct method:

This uses the main argument --calc-kins-direct <output>.

With this, you must provide

--bfile/--gen/--sp/--speed <datastem> - to specify the data files (see File Formats).

--weights <weightsfile> or --ignore-weights YES - if using a weightsfile, this should have two columns that provide predictor names then weightings.

--power <float> - to specify how predictors are scaled; we generally recommend -0.25.

You might also wish to use --extract <extractfile> and/or --exclude <excludefile> to specify a subset of predictors.

The kinship matrix will be saved with stem <output> (see Kinship Formats for details of how kinship matrices are stored).
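
The two-column layout expected by --weights can be illustrated with a small sketch. The predictor names, weightings and the file name example.weights are made up for illustration, and whitespace-separated columns are an assumption; check the result against your own weightsfile rather than treating this as the definitive format.

```shell
# Hypothetical predictor names and weightings, for illustration only.
weightsfile=example.weights
printf '%s %s\n' \
  rs0001 1.0 \
  rs0002 0.25 \
  rs0003 0.5 > "$weightsfile"

# Count rows that do NOT have exactly two columns (name, numeric weighting).
bad_rows=$(awk 'NF != 2 || $2 !~ /^-?[0-9]*\.?[0-9]+$/ { n++ } END { print n + 0 }' "$weightsfile")
echo "rows that do not match the two-column layout: $bad_rows"
```

A file that passes this check could then be supplied with --weights example.weights in place of --ignore-weights YES.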
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Indirect method:

First cut the predictors into partitions using --cut-kins <output>.

With this, you must provide

--bfile/--gen/--sp/--speed <datastem> - to specify the data files (see File Formats).

--partition-length <integer>, --by-chr YES, or --partition-number <integer> and --partition-prefix <partitionstem> - to specify how to divide predictors into partitions.

You might also wish to use --extract <extractfile> and/or --exclude <excludefile> to specify a subset of predictors.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Next, calculate a kinship matrix for each partition using --calc-kins <output>.

With this, you must provide

--bfile/--gen/--sp/--speed <datastem> - to specify the data files (see File Formats).

--partition <number> - specifies the partition for which to calculate kinships.

--weights <weightsfile> or --ignore-weights YES - if using a weightsfile, this should have two columns that provide predictor names then weightings.

--power <float> - to specify how predictors are scaled.

If you used --extract <extractfile> and/or --exclude <excludefile> with --cut-kins, you should also use them here.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Finally, join the kinship matrices using --join-kins <output>.

This requires no options.

The final kinship matrix will be saved with stem <output>/kinships.all (see Kinship Formats for details of how kinship matrices are stored).
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

By default, LDAK centres and scales predictors based on their observed means. However, if you are creating a kinship matrix to use for PCGC regression, the authors of PCGC suggest centring and scaling predictors using external estimates of their means (because the observed means can be biased by ascertainment). To do this, add --predictor-means <meansfile> to provide a list of average allele counts. <meansfile> should contain four columns: predictor name, Allele 1, Allele 2, and the average count of Allele 1 (for SNP data, a value between 0 and 2).
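
As a sketch of the four-column <meansfile> described above, the following writes a tiny file and checks its layout. The predictor names, alleles, mean counts and the file name example.means are hypothetical, and whitespace-separated columns without a header row are an assumption.

```shell
# Hypothetical rows: predictor name, Allele 1, Allele 2, mean count of Allele 1.
meansfile=example.means
printf '%s %s %s %s\n' \
  rs0001 A G 0.42 \
  rs0002 C T 1.10 \
  rs0003 G A 1.97 > "$meansfile"

# Count rows that lack four columns or whose mean count falls outside 0 to 2
# (the range stated above for SNP data).
bad_rows=$(awk 'NF != 4 || $4 < 0 || $4 > 2 { n++ } END { print n + 0 }' "$meansfile")
echo "rows failing the four-column / 0-to-2 check: $bad_rows"
```

The file would then be passed by adding --predictor-means example.means to the kinship command.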
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Example:

Here we use the binary PLINK files human.bed, human.bim and human.fam from the Test Datasets, and assume the LDAK-Thin Model (see Technical Details for more information). Note that to use the LDAK-Thin Model, you must first thin the predictors to remove duplicates. This can be achieved using the command

./ldak.out --thin thin --bfile human --window-prune .98 --window-kb 100

This produces the file thin.in, which contains a list of predictors that remain after thinning.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

To compute the kinship matrix using the direct method, run

./ldak.out --calc-kins-direct LDAK-Thin --bfile human --ignore-weights YES --power -.25 --extract thin.in

The kinship matrix is saved with stem LDAK-Thin (i.e., in the files LDAK-Thin.grm.bin, LDAK-Thin.grm.id, LDAK-Thin.grm.details and LDAK-Thin.grm.adjust).
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

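To do the same using the indirect method, use the commands below. If --cut-kins creates more than one partition, repeat the --calc-kins command for each one, changing the value of --partition (only partition 1 is shown).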

./ldak.out --cut-kins kins --bfile human --partition-length 500000 --extract thin.in
./ldak.out --calc-kins kins --bfile human --partition 1 --ignore-weights YES --power -.25 --extract thin.in
./ldak.out --join-kins kins

Now the kinship matrix is saved with stem kins/kinships.all.
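
Because --calc-kins has to be run once per partition, the second and third steps can be wrapped in a loop. The sketch below assumes --cut-kins has already been run with output folder kins. The helper names run and calc_and_join and the variable NUM_PARTITIONS are hypothetical; set NUM_PARTITIONS to the number of partitions that --cut-kins actually created. By default the script only prints the commands (a dry run); setting RUN=1 executes them.

```shell
# Dry-run wrapper: prints each command unless RUN=1, in which case it executes it.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Calculate kinships for partitions 1..N, then join them.
# usage: calc_and_join <number_of_partitions>
calc_and_join() {
  j=1
  while [ "$j" -le "$1" ]; do
    run ./ldak.out --calc-kins kins --bfile human --partition "$j" \
      --ignore-weights YES --power -.25 --extract thin.in
    j=$((j + 1))
  done
  run ./ldak.out --join-kins kins
}

# ASSUMPTION: replace the default of 1 with the number of partitions
# that --cut-kins created for your data.
calc_and_join "${NUM_PARTITIONS:-1}"
```

For example, NUM_PARTITIONS=4 RUN=1 sh script.sh would run --calc-kins for partitions 1 to 4 and then --join-kins (script.sh being whatever name the sketch is saved under).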
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

To instead calculate a kinship matrix assuming the GCTA Model, run

./ldak.out --calc-kins-direct GCTA --bfile human --ignore-weights YES --power -1

To calculate a kinship matrix assuming the LDAK Model, run

./ldak.out --calc-kins-direct LDAK --bfile human --weights <weightsfile> --power -.25

where <weightsfile> contains the LDAK weightings (obtained using Calculate Weightings).
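
To get a feel for how the --power values in these commands differ, the sketch below evaluates a relative expected contribution per predictor. The form used, weighting x [f(1-f)]^(1+power) for a predictor with allele frequency f, is an assumption based on how the LDAK heritability models are usually described; it is not stated on this page, and the function name rel_h2 is hypothetical.

```shell
# ASSUMED form (not from this page): relative expected heritability of a
# predictor is weight * (f * (1 - f)) ^ (1 + power), with f its allele frequency.
# usage: rel_h2 <allele_frequency> <power> [weight]
rel_h2() {
  awk -v f="$1" -v a="$2" -v w="${3:-1}" \
    'BEGIN { printf "%.4f\n", w * (f * (1 - f)) ^ (1 + a) }'
}

# Compare a rare and a common predictor under the two power values used above.
for power in -1 -.25; do
  for maf in 0.05 0.5; do
    echo "power=$power maf=$maf relative contribution=$(rel_h2 "$maf" "$power")"
  done
done
```

Under this assumed form, --power -1 gives every predictor the same expected contribution regardless of allele frequency, whereas --power -.25 gives more to common predictors than to rare ones.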