Thin Predictors

There are two main reasons to thin predictors. The first is to construct an unweighted kinship matrix, in order to assess population structure or relatedness (see Quality Control for more details). Here, we perform a strong thinning, in order to obtain a subset of predictors in approximate linkage equilibrium (e.g., so that no pair of SNPs within 1cM has correlation squared above 0.05). This is because we are interested in genome-wide correlations between SNPs (caused by population structure and relatedness), and not local correlations (caused by linkage disequilibrium).

The second reason is to implement the LDAK-Thin Model, our recommended Heritability Model when analysing individual-level data. Here we perform a light thinning, with the aim only of ensuring that no duplicate predictors remain (e.g., so that no pair of SNPs within 100kb has correlation squared above 0.98).

Always read the screen output, which suggests arguments and estimates memory usage.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

The main argument is --thin <output>.

With this, you must provide

--bfile/--gen/--sp/--speed <datastem> - to specify the datafiles (see File Formats).

--window-prune <float> - to specify the correlation squared threshold.

--window-kb/--window-length/--window-cm <float> - to specify the window size.

When thinning in order to construct an unweighted kinship matrix, we suggest using --window-prune 0.05 and --window-cm 1. However, you should change these values if too few or too many predictors remain (when analysing human data, we can reliably estimate unweighted kinships using between 50,000 and 150,000 SNPs). When thinning in order to implement the LDAK-Thin Model, we recommend using --window-prune 0.98 and --window-kb 100. While this will likely fail to capture a few duplicate SNPs (those more than 100kb apart), the impact of these on subsequent analyses will be negligible.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Example:

Here we use the binary PLINK files human.bed, human.bim and human.fam from the Test Datasets.

To thin to obtain a subset of predictors in approximate linkage equilibrium, use

../ldak.out --thin le --bfile human --window-prune 0.05 --window-cm 1

To thin duplicate predictors in preparation for using the LDAK-Thin Model, use

../ldak.out --thin thin --bfile human --window-prune 0.98 --window-kb 100

The lists of predictors that remain after thinning will be saved in le.in and thin.in.
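
After thinning for an unweighted kinship matrix, it can help to check how many predictors remain against the 50,000 to 150,000 range suggested above for human data. The following is a minimal sketch, not part of LDAK: the function names count_predictors and check_kinship_range are hypothetical, and it assumes that le.in lists one predictor per line.

```shell
#!/bin/sh
# Hypothetical helpers (not part of LDAK).

# Count the predictors listed in a file.
# Assumption: the file holds one predictor name per line.
count_predictors() {
    # Count non-empty lines only, so blank lines do not inflate the total.
    awk 'NF { n++ } END { print n + 0 }' "$1"
}

# Compare a count with the range suggested for human data
# when estimating unweighted kinships (50,000 to 150,000 SNPs).
check_kinship_range() {
    n=$1
    if [ "$n" -lt 50000 ]; then
        echo "too few"
    elif [ "$n" -gt 150000 ]; then
        echo "too many"
    else
        echo "ok"
    fi
}

# Report on le.in if the thinning command above has been run here.
if [ -f le.in ]; then
    n=$(count_predictors le.in)
    echo "le.in lists $n predictors: $(check_kinship_range "$n")"
fi
```

If the result is "too few", a higher --window-prune threshold or a smaller window should in general retain more predictors; if it is "too many", the opposite applies. The range is guidance for real human data, so a small test dataset need not fall inside it.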