Get Kinships

In LDAK4, there are two ways to compute kinships. The direct method (--calc-kins-direct) requires just one step and is best in most circumstances. However, if your dataset is very large, or you wish to perform genomic partitioning, consider the indirect method, which involves three steps: first, the predictors are cut into PARTITIONS; then kinships are calculated for each PARTITION; finally, the kinships are joined across PARTITIONS. (Note that in LDAK3, the genome was broken into regions, not partitions.)

Options in red are REQUIRED; options in purple are OPTIONAL. If you wish to only analyse a subset of the data, see Data Filtering. In all cases, <folder> is the directory in which output will be written.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
DIRECT METHOD – for most datasets, calculating kinships should take only minutes. However, if your data are very large (say, imputed genotypes for more than 10,000 individuals) or you wish to perform genomic partitioning, consider using the INDIRECT METHOD below.

--calc-kins-direct <output>

--bfile/--chiamo/--sp/--speed <prefix> – specifies the data files (see File Formats).

--weights <weightsfile> – specifies the location of the predictor weightings. Usually, this will be the file produced by the steps in Get Weightings, else you can provide your own weights file, which should have two columns providing predictor names then weightings. Alternatively use --ignore-weights YES for uniform weightings, which we recommend when the aim is prediction.
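If you prefer to supply your own weights file, the two-column layout described above can be produced from a PLINK .bim file. The sketch below is only an illustration: the file names demo.bim and my.weights are hypothetical, the tiny .bim is made up so that the snippet runs on its own, and the weighting of 1 for every predictor is a placeholder for whatever values you actually want to use.

```shell
#!/bin/bash
# Illustrative sketch: build a two-column weights file (predictor name, weighting).
# All file names here are hypothetical.
workdir=$(mktemp -d)

# A made-up three-SNP .bim file (columns: chr, predictor name, cM, bp, A1, A2).
printf '1 rs1 0 1000 A G\n1 rs2 0 2000 C T\n2 rs3 0 1500 G A\n' > "$workdir/demo.bim"

# Column 2 of the .bim holds the predictor name; write a placeholder weighting of 1.
awk '{ print $2, 1 }' "$workdir/demo.bim" > "$workdir/my.weights"

cat "$workdir/my.weights"
```

In practice you would replace the placeholder 1 with your own weightings, then pass the resulting file to --weights.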

--power <float> (default -1) – predictor values are multiplied by Var(pred)^(<float>/2). Negative values suppose rarer predictors have larger effect sizes, positive values that more common predictors have larger effects. Power corresponds to the parameter alpha in our paper.
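To get a feel for what --power does, the factor Var(pred)^(power/2) can be evaluated for a few allele frequencies. The sketch below is not part of LDAK: the function name scale_factor is hypothetical, and it assumes a SNP coded 0/1/2 in Hardy-Weinberg equilibrium, so that Var(pred) = 2p(1-p) for allele frequency p.

```shell
#!/bin/bash
# Illustrative sketch: the factor Var(pred)^(power/2) for a given allele frequency.
# Assumption (not from the LDAK documentation): Var(pred) = 2p(1-p).
scale_factor() {
    # $1 = allele frequency p (strictly between 0 and 1), $2 = power
    LC_ALL=C awk -v p="$1" -v a="$2" \
        'BEGIN { v = 2 * p * (1 - p); printf "%.4f\n", exp((a / 2) * log(v)) }'
}

# Compare a rare, a less common and a common predictor at the default power.
for p in 0.01 0.1 0.5; do
    echo "p=$p power=-1 factor=$(scale_factor "$p" -1)"
done
```

With power -1 the factor grows as the predictor becomes rarer, whereas with power 0 it equals 1 for every predictor.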

The kinships will be saved in <output>.grm.id, <output>.grm.bin and <output>.grm.N.bin. Add --kinship-gz YES or --kinship-raw YES to save kinships in additional formats.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
INDIRECT METHOD

--cut-kins <folder>

--bfile/--chiamo/--sp/--speed <prefix> – specifies the data files (see File Formats).

--partition-length <num_of_predictors> (default 500,000) – to change the PARTITION length. To divide partitions according to chromosome, add --by-chr YES. More complicated partition divides can be specified using --partition-number and --partition-prefix (see Genomic Partitioning).
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

--calc-kins <folder>

--bfile/--chiamo/--sp/--speed <prefix> – specifies the data files (see File Formats).

--partition <number> – specifies which PARTITION to consider.

--weights <weightsfile> – specifies the location of the predictor weightings. Usually, this will be the file produced by the steps in Get Weightings, else you can provide your own weights file, which should have two columns providing predictor names then weightings. Alternatively use --ignore-weights YES for uniform weightings, which we recommend when the aim is prediction.

--power <float> (default -1) – predictor values are multiplied by Var(pred)^(<float>/2). Negative values suppose rarer predictors have larger effect sizes, positive values that more common predictors have larger effects. Power corresponds to the parameter alpha in our paper.

The kinships for PARTITION X will be stored in three files with prefix <folder>/kinshipX. Add --kinship-gz YES or --kinship-raw YES to save kinships in additional formats.
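Before joining, it can be worth confirming that every partition actually produced its output, particularly when the partitions were run as separate cluster jobs. The sketch below is one possible check rather than an LDAK feature: the function name missing_partitions is hypothetical, and it assumes that the three files for partition X carry the same suffixes as the direct method (.grm.id, .grm.bin and .grm.N.bin). The demonstration uses placeholder files in a temporary folder.

```shell
#!/bin/bash
# Illustrative sketch: report partitions whose kinship files are missing or empty.
# Assumption: partition X writes <folder>/kinshipX.grm.id, .grm.bin and .grm.N.bin.
missing_partitions() {
    # $1 = folder, $2 = number of partitions
    mp_x=1
    while [ "$mp_x" -le "$2" ]; do
        for mp_suffix in grm.id grm.bin grm.N.bin; do
            if [ ! -s "$1/kinship$mp_x.$mp_suffix" ]; then
                echo "$mp_x"   # print the partition number once, then move on
                break
            fi
        done
        mp_x=$((mp_x + 1))
    done
}

# Self-contained demonstration: placeholder files for partitions 1 and 3 only.
demo=$(mktemp -d)
for x in 1 3; do
    for suffix in grm.id grm.bin grm.N.bin; do
        echo placeholder > "$demo/kinship$x.$suffix"
    done
done
missing_partitions "$demo" 3
```

Any partition numbers printed would need to be rerun with --calc-kins before the join step.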
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

--join-kins <folder>

No other arguments used.

The combined kinship matrix will be stored in three files with prefix <folder>/kinshipALL. Add --kinship-gz YES or --kinship-raw YES to save kinships in additional formats.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

The first step is very fast, so can be performed interactively on the master processor. The third step usually takes less than 5 minutes, unless there are very many samples and/or partitions. If the data are very large, the second step can easily be parallelised (see the example below).

For the Mac version, the option --workdir becomes mandatory. See Advanced Options.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Example, using the binary PLINK files test.bed, test.bim and test.fam available in the Test Datasets and the weightings calculated in Get Weightings.

../ldak.out --cut-kins partitions --bfile test --partition-length 2000

This command cuts the predictors into partitions. The default length is 500,000 predictors (no buffers in this step); however, for this example it has been set to 2000. Therefore, the predictors have been split into three partitions each containing 2500 SNPs.

../ldak.out --calc-kins partitions --bfile test --partition 1 --weights sections/re-weightsALL
../ldak.out --calc-kins partitions --bfile test --partition 2 --weights sections/re-weightsALL
../ldak.out --calc-kins partitions --bfile test --partition 3 --weights sections/re-weightsALL

These commands calculate the kinships across predictors in Partition 1, then 2, then 3. For very large data, these steps might take hours, so a script suitable for clusters would look like:

#!/bin/bash
# Array job with one task per partition (tasks 1 to 3)
#$ -t 1-3
number=$SGE_TASK_ID
../ldak.out --calc-kins partitions --bfile test --partition "$number" --weights sections/re-weightsALL
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

../ldak.out --join-kins partitions

This final step merges the kinships across partitions to create partitions/kinshipALL.grm.id, partitions/kinshipALL.grm.bin and partitions/kinshipALL.grm.N.bin. There will also be details files partitions/kinshipALL.grm.details and partitions/kinshipALL.grm.adjust.
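When the number of partitions is larger than three, typing each command by hand becomes tedious. The sketch below simply prints the commands of the indirect method for a chosen number of partitions so they can be reviewed first. The paths and options are those of the example above; the function name indirect_commands and the print-then-review approach are hypothetical and not part of LDAK.

```shell
#!/bin/bash
# Illustrative sketch: list the commands of the indirect method for N partitions.
# Nothing is executed here; the commands are only printed for review.
indirect_commands() {
    # $1 = number of partitions created by --cut-kins
    ic_ldak=../ldak.out
    echo "$ic_ldak --cut-kins partitions --bfile test --partition-length 2000"
    ic_j=1
    while [ "$ic_j" -le "$1" ]; do
        echo "$ic_ldak --calc-kins partitions --bfile test --partition $ic_j --weights sections/re-weightsALL"
        ic_j=$((ic_j + 1))
    done
    echo "$ic_ldak --join-kins partitions"
}

indirect_commands 3
```

Note that the number of partitions is only known once --cut-kins has been run, so the value passed to the function should be taken from that step.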
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Alternatively, the above steps could be replaced by

../ldak.out --calc-kins-direct partitions/direct --bfile test --weights sections/re-weightsALL

The kinships will be saved with prefix partitions/direct, which will match those with prefix partitions/kinshipALL created by the indirect method.
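A quick sanity check when comparing the two methods is to confirm that both kinship matrices list the same samples in the same order. The sketch below compares only the .grm.id files, not the kinship values themselves. The function name same_samples is hypothetical, and the demonstration assumes the .grm.id file is plain text with one sample per line; the IDs shown are made up.

```shell
#!/bin/bash
# Illustrative sketch: check that two kinship prefixes share an identical sample list.
same_samples() {
    # $1, $2 = kinship prefixes, each with a .grm.id file
    cmp -s "$1.grm.id" "$2.grm.id"
}

# Self-contained demonstration with made-up sample IDs.
demo=$(mktemp -d)
printf 'fam1 ind1\nfam2 ind2\n' > "$demo/direct.grm.id"
printf 'fam1 ind1\nfam2 ind2\n' > "$demo/kinshipALL.grm.id"

if same_samples "$demo/direct" "$demo/kinshipALL"; then
    echo "sample lists match"
else
    echo "sample lists differ"
fi
```

For the example above, the two prefixes to compare would be partitions/direct and partitions/kinshipALL.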