Filter Relatedness

Heritability analysis (e.g., estimating SNP heritability or partitioning heritability) requires that the samples are "unrelated" (in practice, this means at most distantly related, with no pair closer than, say, second cousins). This ensures that the heritability estimates reflect only the heritability contributed by predictors in the dataset (or predictors in local linkage disequilibrium with these). By contrast, if your dataset includes substantial relatedness, there will likely be long-range linkage disequilibrium (e.g., between predictors on different chromosomes), and you will end up with inflated estimates. For more details, see Quality Control.

Here we explain how to filter samples based on relatedness. This is typically used to obtain a subset of unrelated samples, but can also be used to obtain a subset of related samples.

Always read the screen output, which suggests arguments and estimates memory usage.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

The main argument is --filter <outfile>.

The only required option is

--grm <kinfile> - to provide a kinship matrix.

By default, LDAK will remove samples until no pair remains with kinship greater than the absolute value of the smallest observed kinship (i.e., the result will be a list of unrelated samples). If you would prefer to specify a relatedness threshold, you can use --max-rel <float> (e.g., it is common to see --max-rel 0.05). Alternatively, if you use --min-rel <float>, LDAK will only retain samples with relatedness above <float> with at least one other sample (i.e., the result will be a list of related samples).
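To make the default threshold concrete, here is a minimal Python sketch of relatedness filtering on a small kinship matrix. It is not LDAK's implementation: the function name filter_unrelated, the toy matrix, and the greedy rule (repeatedly drop the sample involved in the most over-threshold pairs) are all assumptions made for illustration. Only the default threshold, the absolute value of the smallest observed kinship, is taken from the description above.

```python
import numpy as np

def filter_unrelated(kin, max_rel=None):
    """Greedily drop samples until no remaining pair has kinship > max_rel.

    Illustrative only: the greedy rule below is an assumption, not
    necessarily the procedure LDAK uses internally.
    """
    kin = np.asarray(kin, dtype=float)
    n = kin.shape[0]
    off = kin.copy()
    np.fill_diagonal(off, -np.inf)  # ignore each sample's kinship with itself

    if max_rel is None:
        # Default threshold: absolute value of the smallest off-diagonal kinship
        max_rel = abs(off[np.triu_indices(n, k=1)].min())

    keep = np.ones(n, dtype=bool)
    while True:
        # Pairs above the threshold among the samples still retained
        related = (off > max_rel) & keep[:, None] & keep[None, :]
        counts = related.sum(axis=1)
        if counts.max() == 0:
            break
        # Assumed rule: drop the sample involved in the most related pairs
        keep[counts.argmax()] = False

    return np.flatnonzero(keep), np.flatnonzero(~keep), max_rel

# Hypothetical 4-sample kinship matrix (values invented for illustration)
kin = [
    [1.00, 0.30, -0.02, 0.01],
    [0.30, 1.00, 0.26, -0.04],
    [-0.02, 0.26, 1.00, 0.00],
    [0.01, -0.04, 0.00, 1.00],
]

kept, lost, threshold = filter_unrelated(kin)
print("threshold:", threshold)
print("kept:", kept.tolist(), "lost:", lost.tolist())
```

In this toy matrix the smallest kinship is -0.04, so the default threshold is 0.04; sample 1 is related to both sample 0 and sample 2, so removing it alone leaves no related pair. Passing max_rel explicitly plays the role of --max-rel.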

You can use --keep <keepfile> and/or --remove <removefile> to restrict to a subset of samples (e.g., to exclude ancestral outliers). If you use --pheno <phenofile>, then when deciding which of a pair of related samples to keep, LDAK will give priority to those with non-missing phenotypes.
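The phenotype priority can be pictured with a short Python sketch. The helper choose_to_drop and its tie-breaking rule are hypothetical; the source only states that samples with non-missing phenotypes are given priority, so how LDAK resolves ties is not specified here.

```python
def choose_to_drop(i, j, has_pheno):
    """Return which member of the related pair (i, j) to drop.

    has_pheno maps a sample index to True when its phenotype is non-missing.
    The sample with a missing phenotype is dropped first; when both or
    neither are phenotyped, the second sample is dropped (an arbitrary
    tie-break assumed for this sketch).
    """
    if has_pheno[i] and not has_pheno[j]:
        return j
    if has_pheno[j] and not has_pheno[i]:
        return i
    return j

# Hypothetical phenotype status for three samples
has_pheno = {0: False, 1: True, 2: True}
print(choose_to_drop(0, 1, has_pheno))  # sample 0 lacks a phenotype
print(choose_to_drop(1, 2, has_pheno))  # tie, resolved arbitrarily
```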

If filtering out relatedness, the samples kept and lost will be saved in the files <outfile>.keep and <outfile>.lose. If instead searching for a subset of related samples, these will be saved in <outfile>.related.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Example:

Here we use the kinship matrix with stem LDAK-Thin created in the example for Calculate Kinships.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

To identify a subset of unrelated samples, we run

./ldak.out --filter LDAK-Thin --grm LDAK-Thin

Here, LDAK filters so that no pair of samples remains with estimated kinship greater than 0.17 (because the smallest observed kinship was -0.17). Note that for datasets of realistic size, this value will be much closer to zero. The files LDAK-Thin.keep and LDAK-Thin.lose list, respectively, the 398 samples that were kept and the 26 samples that were lost.

If we instead wished to identify related samples, we could run

./ldak.out --filter LDAK-Thin --grm LDAK-Thin --min-rel .2

LDAK searches for pairs of samples with relatedness at least 0.2. In total, 25 samples are retained, saved in LDAK-Thin.related.
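The selection of related samples can be sketched in Python as follows. This is an illustration rather than LDAK's code: the function select_related and the toy matrix are assumptions, and the comparison uses "at least" the threshold to match the wording of this example (how LDAK treats a value exactly equal to the threshold is not stated).

```python
import numpy as np

def select_related(kin, min_rel):
    """Return indices of samples with kinship >= min_rel to at least one other sample."""
    off = np.asarray(kin, dtype=float).copy()
    np.fill_diagonal(off, -np.inf)  # a sample does not count as its own relative
    return np.flatnonzero((off >= min_rel).any(axis=1))

# Hypothetical 4-sample kinship matrix (values invented for illustration)
kin = [
    [1.00, 0.30, -0.02, 0.01],
    [0.30, 1.00, 0.26, -0.04],
    [-0.02, 0.26, 1.00, 0.00],
    [0.01, -0.04, 0.00, 1.00],
]

related = select_related(kin, min_rel=0.2)
print("related samples:", related.tolist())
```

Here samples 0, 1 and 2 are each related to at least one other sample at the 0.2 level, while sample 3 is not and would be excluded.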