Get Kinships

There are two ways to compute kinships. The direct method (--calc-kins-direct) requires just one step and is best in most circumstances. However, if your dataset is very large (for example, contains imputed genotypes for more than 10,000 individuals), or you wish to perform genomic partitioning, consider the indirect method, which has three steps: first, the predictors are cut into PARTITIONS; then, kinships are calculated for each PARTITION; finally, the kinships are merged across PARTITIONS. See Kinship Formats for details of how kinship matrices are stored.

Options in red are REQUIRED; options in purple are OPTIONAL. If you wish to only analyse a subset of the data, see Data Filtering. When using the direct method, <output> is the stem of the output files; when using the indirect method, <folder> is the directory in which the output files will be written.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _


--calc-kins-direct <output>

--bfile/--gen/--sp/--speed <prefix> - specifies the data files (see File Formats).

--weights <weightsfile> - specifies the file containing the predictor weightings; you can use either the condensed or detailed version of the merged weightings computed in Get Weightings, or you can provide your own file, which should have two columns: predictor name then weighting. Alternatively, use --ignore-weights YES for uniform weightings.
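To make the two-column format concrete, here is a minimal Python sketch of reading a user-supplied weightings file. The function name read_weights is hypothetical, and it assumes whitespace-separated columns with no header line; ignore_weights=True mimics the effect of --ignore-weights YES by giving every predictor weighting 1.

```python
import io


def read_weights(handle, ignore_weights=False):
    """Return {predictor_name: weighting} from a two-column file (name, weighting).

    Hypothetical helper: assumes whitespace-separated columns and no header.
    With ignore_weights=True, every predictor gets weighting 1.0 (uniform).
    """
    weights = {}
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        name = fields[0]
        weights[name] = 1.0 if ignore_weights else float(fields[1])
    return weights


example = io.StringIO("rs1 0.5\nrs2 1.2\nrs3 0.0\n")
print(read_weights(example))  # {'rs1': 0.5, 'rs2': 1.2, 'rs3': 0.0}
```

A predictor with weighting 0 contributes nothing to the kinship matrix.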

--power <float> - predictor values are scaled by [2fj(1-fj)]^(power/2), where fj is the MAF of predictor j. A power below -1 assumes that rarer predictors are expected to contribute more heritability than common predictors, while a power above -1 assumes the reverse (this power corresponds to the parameter alpha in the AJHG paper). Based on our recent work, we advise using -0.25.
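To see why -1 is the dividing value, note that a predictor with variance 2f(1-f) scaled by [2f(1-f)]^(power/2) ends up with variance [2f(1-f)]^(1+power), which is proportional to its expected heritability contribution (ignoring predictor weightings). The sketch below (hypothetical function names) evaluates this: the exponent is 0 when power = -1, negative below -1 (favouring rare predictors) and positive above -1 (favouring common predictors).

```python
def scale_factor(maf, power):
    """Multiplier [2f(1-f)]^(power/2) applied to the centred values of a predictor."""
    return (2 * maf * (1 - maf)) ** (power / 2)


def relative_contribution(maf, power):
    """Expected heritability contribution up to a constant: 2f(1-f) * scale^2 = [2f(1-f)]^(1+power)."""
    return 2 * maf * (1 - maf) * scale_factor(maf, power) ** 2


for power in (-2, -1, -0.25):
    rare = relative_contribution(0.01, power)
    common = relative_contribution(0.5, power)
    print(f"power={power}: MAF 0.01 -> {rare:.3f}, MAF 0.5 -> {common:.3f}")
```

With the recommended power of -0.25, a predictor with MAF 0.5 is expected to contribute more heritability than one with MAF 0.01.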

In general, we recommend using only high-quality predictors. However, if you do include lower-quality predictors, you can allow for this uncertainty by using --infos to specify a file containing the info score of each predictor (this file should have two columns: predictor name then info score). Note that if you already accounted for uncertainty when computing weightings, you SHOULD NOT do so again here.

By default, LDAK scales predictors based on their expected variance assuming Hardy-Weinberg Equilibrium; to instead scale based on their observed variance, add --hwe-stand NO (this is generally required when using non-SNP data).
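The difference between the two standardisations can be sketched for a single predictor as follows. This is an illustration, not LDAK's code: the function name standardize is hypothetical, the observed variance is assumed to use divisor n, and monomorphic predictors are simply returned as zeros.

```python
def standardize(genotypes, power, hwe_stand=True):
    """Centre one predictor's values and scale them by variance^(power/2).

    genotypes: allele counts (0, 1 or 2; dosages also allowed), one per individual.
    hwe_stand=True uses the Hardy-Weinberg variance 2f(1-f) with f = mean/2;
    hwe_stand=False uses the observed variance (divisor n, an assumption).
    """
    n = len(genotypes)
    mean = sum(genotypes) / n
    if hwe_stand:
        f = mean / 2
        variance = 2 * f * (1 - f)
    else:
        variance = sum((g - mean) ** 2 for g in genotypes) / n
    if variance == 0:
        return [0.0] * n  # monomorphic predictor: nothing to scale
    scale = variance ** (power / 2)
    return [(g - mean) * scale for g in genotypes]


# Same genotypes, different variance estimates: HWE gives 0.5, observed gives 1.0.
print(standardize([0, 0, 2, 2], power=-1))
print(standardize([0, 0, 2, 2], power=-1, hwe_stand=False))
```

Here the sample has no heterozygotes, so it departs from Hardy-Weinberg proportions and the two variance estimates disagree; for non-SNP data, 2f(1-f) has no meaning at all, which is why the observed variance is used there.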

By default, LDAK centres and scales predictors based on their observed means. However, if creating a kinship matrix to use for PCGC regression, the authors of PCGC suggest that predictors are centred and scaled using external estimates of their means (as the observed means can be biased due to ascertainment). To do this, add --predictor-means <meansfile> to provide a list of average allele counts. <meansfile> should contain four columns: predictor name, Allele 1, Allele 2 and the average count of Allele 1 (for SNP data, a value between 0 and 2).
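A minimal sketch of how external means replace the observed mean is shown below. Both function names are hypothetical, and the sketch assumes the genotypes already count Allele 1 as listed in <meansfile> (no allele flipping is attempted). The external mean count both centres the predictor and fixes f = mean/2 for the scaling.

```python
import io


def read_predictor_means(handle):
    """Parse a four-column file: name, Allele 1, Allele 2, average count of Allele 1."""
    means = {}
    for line in handle:
        fields = line.split()
        if fields:
            means[fields[0]] = (fields[1], fields[2], float(fields[3]))
    return means


def centre_with_external_mean(genotypes, mean_count, power):
    """Centre and scale counts of Allele 1 using an externally estimated mean count."""
    f = mean_count / 2  # external frequency of Allele 1
    if not 0 < f < 1:
        raise ValueError("external mean must lie strictly between 0 and 2")
    scale = (2 * f * (1 - f)) ** (power / 2)
    return [(g - mean_count) * scale for g in genotypes]


means = read_predictor_means(io.StringIO("rs1 A G 0.4\n"))
allele1, allele2, mean_count = means["rs1"]
# In an ascertained sample the observed mean may differ from 0.4; the external value is used.
print(centre_with_external_mean([0, 1, 2, 0], mean_count, power=-0.25))
```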

The kinship matrix will be saved with the stem <output>. Add --kinship-gz YES or --kinship-raw YES to save in alternative formats.
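Putting the options together, a weighted kinship matrix can be sketched as K = sum_j w_j x_j x_j^T / sum_j w_j, where x_j is predictor j after centring and scaling with the chosen power. This is a small pure-Python illustration under that assumed definition, not LDAK's implementation; the function name kinship is hypothetical, and it also returns the summed weighting (the denominator), which is what allows partition-level matrices to be merged later.

```python
def kinship(genotype_matrix, weights, power, hwe_stand=True):
    """Return (K, total_weight) with K = sum_j w_j x_j x_j^T / sum_j w_j.

    genotype_matrix: list of predictors, each a list of allele counts (one per individual).
    weights: one weighting per predictor; power: the --power value.
    """
    n = len(genotype_matrix[0])
    K = [[0.0] * n for _ in range(n)]
    total_weight = 0.0
    for column, w in zip(genotype_matrix, weights):
        mean = sum(column) / n
        if hwe_stand:
            f = mean / 2
            var = 2 * f * (1 - f)
        else:
            var = sum((g - mean) ** 2 for g in column) / n
        if var == 0 or w == 0:
            continue  # skip monomorphic or zero-weight predictors
        scale = var ** (power / 2)
        x = [(g - mean) * scale for g in column]
        for i in range(n):
            for k in range(n):
                K[i][k] += w * x[i] * x[k]
        total_weight += w
    if total_weight == 0:
        raise ValueError("no predictors with non-zero weighting and variance")
    return [[v / total_weight for v in row] for row in K], total_weight


# Three individuals, two predictors, uniform weightings.
K, denominator = kinship([[0, 1, 2], [2, 1, 0]], weights=[1, 1], power=-0.25)
```

With power -1, uniform weightings and HWE scaling, this reduces to the familiar genotype-based relationship matrix with each predictor standardised to unit variance.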
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _


--cut-kins <folder>

--bfile/--gen/--sp/--speed <prefix> - specifies the data files (see File Formats).

LDAK will divide the predictors into partitions. For this, you must use either --partition-length to specify the number of predictors in each partition, or --by-chr YES to divide by chromosome. Alternatively, more complicated partitions can be specified using --partition-number and --partition-prefix (see Genomic Partitioning).
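The two simple partitioning schemes can be sketched as follows. The function names are hypothetical, and the sketch assumes --partition-length cuts consecutive blocks in file order; with 5000 predictors and length 2000 it yields three partitions, matching the example later on this page.

```python
import math


def partition_by_length(n_predictors, partition_length):
    """Split predictor indices 0..n-1 into consecutive blocks of at most partition_length."""
    n_parts = math.ceil(n_predictors / partition_length)
    return [range(p * partition_length, min((p + 1) * partition_length, n_predictors))
            for p in range(n_parts)]


def partition_by_chromosome(chromosomes):
    """Group predictor indices by chromosome label, in order of first appearance."""
    groups = {}
    for index, chrom in enumerate(chromosomes):
        groups.setdefault(chrom, []).append(index)
    return list(groups.values())


parts = partition_by_length(5000, 2000)
print([len(p) for p in parts])  # [2000, 2000, 1000]
```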

--calc-kins <folder>

--partition <number> - specifies for which PARTITION to calculate kinships.

The other options are the same as for --calc-kins-direct (i.e., you must use --bfile/--gen/--sp/--speed, --weights and --power, and can also add --infos, --hwe-stand and --predictor-means).

The kinship matrix for Partition # will be saved with the stem <folder>/kinship.#. Add --kinship-gz YES or --kinship-raw YES to save in alternative formats.

--join-kins <folder>

No options are required.

The merged kinship matrix will be saved with the stem <folder>/kinship.all. Add --kinship-gz YES or --kinship-raw YES to save in alternative formats.
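If each partition's kinship is a weighted average over its own predictors, merging amounts to re-weighting each partition by its summed weighting W_p: K_all = sum_p W_p K_p / sum_p W_p. Under this assumption, the merged matrix equals what the direct method would produce from the same predictors. The sketch below (hypothetical function name) assumes each partition's summed weighting is available alongside its matrix; see Kinship Formats for what LDAK actually stores.

```python
def join_kinships(parts):
    """Merge per-partition kinships: sum_p W_p K_p / sum_p W_p.

    parts: list of (K_p, W_p) pairs, where K_p is an n x n matrix (list of lists)
    and W_p is the total predictor weighting used as the denominator of K_p.
    """
    total = sum(w for _, w in parts)
    if total == 0:
        raise ValueError("partitions have zero total weighting")
    n = len(parts[0][0])
    merged = [[0.0] * n for _ in range(n)]
    for K, w in parts:
        for i in range(n):
            for k in range(n):
                merged[i][k] += w * K[i][k]
    return [[v / total for v in row] for row in merged]


merged = join_kinships([([[1.0, 0.0], [0.0, 1.0]], 3.0),
                        ([[3.0, 1.0], [1.0, 3.0]], 1.0)])
print(merged)  # [[1.5, 0.25], [0.25, 1.5]]
```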
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Example: for this we use the binary PLINK files test.bed, test.bim and test.fam available in the Test Datasets, and the weightings calculated in Get Weightings.

For the DIRECT METHOD, simply run

../ldak.out --calc-kins-direct kinships --bfile test --weights sections/weights.short --power -0.25

The resulting kinship matrix will be saved with the stem kinships.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

For the INDIRECT METHOD, run (for example)

../ldak.out --cut-kins partitions --bfile test --partition-length 2000

This command divides the 5000 predictors into 3 partitions. To compute kinship matrices for each, use

../ldak.out --calc-kins partitions --bfile test --partition 1 --weights sections/weights.short --power -0.25
../ldak.out --calc-kins partitions --bfile test --partition 2 --weights sections/weights.short --power -0.25
../ldak.out --calc-kins partitions --bfile test --partition 3 --weights sections/weights.short --power -0.25

The resulting kinship matrices will be saved with stems partitions/kinship.1, partitions/kinship.2 and partitions/kinship.3. To run the above commands on a cluster as a Sun Grid Engine array job (one task per partition), a possible script is

#$ -S /bin/bash
#$ -cwd
#$ -t 1-3
number=$SGE_TASK_ID
../ldak.out --calc-kins partitions --bfile test --partition $number --weights sections/weights.short --power -0.25

Once all three jobs have finished, run

../ldak.out --join-kins partitions

The merged kinship matrix will be saved with the stem partitions/kinship.all.
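Without a cluster, the three steps of the indirect method can also be run in sequence from a small driver script. The Python sketch below uses only the options shown in the example above; the function names are hypothetical, n_partitions must match the number of partitions created by --cut-kins (3 here), and by default it only prints the commands (set dry_run=False to execute them with subprocess).

```python
import shlex
import subprocess

LDAK = "../ldak.out"  # path used in the examples on this page


def indirect_commands(folder, bfile, weights, power, partition_length, n_partitions):
    """Return the cut, calc and join command lines for the indirect method."""
    common = ["--bfile", bfile]
    cmds = [[LDAK, "--cut-kins", folder, *common, "--partition-length", str(partition_length)]]
    for p in range(1, n_partitions + 1):
        cmds.append([LDAK, "--calc-kins", folder, *common, "--partition", str(p),
                     "--weights", weights, "--power", str(power)])
    cmds.append([LDAK, "--join-kins", folder])
    return cmds


def run(cmds, dry_run=True):
    """Print each command; execute them in order only when dry_run is False."""
    for cmd in cmds:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


run(indirect_commands("partitions", "test", "sections/weights.short", -0.25, 2000, 3))
```

Running the partitions one after another is slower than a cluster array job, but check=True ensures --join-kins is never reached if an earlier step fails.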