Data Filtering

These options filter samples or predictors and provide basic quality control (QC).
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

--extract <list_of_predictors> – tells LDAK to use only the predictors specified in <list_of_predictors>. (Note that in LDAK3 the option --extract-index was used instead.) This option is equivalent to first extracting these predictors, then running LDAK on the reduced dataset. It can be used for genomic partitioning, by creating a predictor list for each subset of predictors and running LDAK once per subset. However, this is more easily accomplished using the options --partition-number and --partition-prefix (see Genomic Partitioning).
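As a rough illustration of the manual route, the sketch below groups predictors into one list per subset, ready to be passed to separate LDAK runs via --extract. It assumes a hypothetical input that maps each predictor name to a subset label, and assumes a list file holds one predictor name per line; the helper names and the file naming scheme are illustrative only and are not part of LDAK.

```python
from collections import defaultdict


def group_predictors(annotations):
    """Group predictor names by subset label.

    annotations: dict mapping predictor name -> subset label (hypothetical input).
    Returns a dict mapping each label to its predictors, in input order.
    """
    groups = defaultdict(list)
    for predictor, label in annotations.items():
        groups[label].append(predictor)
    return dict(groups)


def write_lists(groups, prefix):
    """Write one list file per subset (assumed format: one name per line)."""
    paths = []
    for label in sorted(groups):
        path = f"{prefix}{label}.txt"  # hypothetical naming scheme
        with open(path, "w") as handle:
            handle.write("\n".join(groups[label]) + "\n")
        paths.append(path)
    return paths
```

Each resulting file would then be supplied to its own LDAK run with --extract; the built-in partitioning options avoid this bookkeeping.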

--keep <list_of_samples> – tells LDAK to use only the samples specified in <list_of_samples>. (Note that in LDAK3 the option --keep-index was used instead.) This option is equivalent to first retaining only these samples, then running LDAK on the reduced dataset.
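The equivalence described for --extract and --keep amounts to selecting columns (predictors) and rows (samples) before any analysis takes place. The sketch below shows that subsetting on a small in-memory matrix. It is a conceptual illustration only, not how LDAK reads data: sample IDs are simplified to single strings (real sample lists may use more than one ID column), and the function name is hypothetical.

```python
def subset(genotypes, sample_ids, predictor_ids, keep=None, extract=None):
    """Return the reduced dataset implied by --keep / --extract style lists.

    genotypes: list of rows, one per sample, each holding one value per predictor.
    keep, extract: collections of sample / predictor IDs to retain; None keeps all.
    This sketch preserves the original row and column order, whatever the
    order of the lists.
    """
    keep_set = set(sample_ids if keep is None else keep)
    extract_set = set(predictor_ids if extract is None else extract)
    rows = [i for i, s in enumerate(sample_ids) if s in keep_set]
    cols = [j for j, p in enumerate(predictor_ids) if p in extract_set]
    reduced = [[genotypes[i][j] for j in cols] for i in rows]
    return reduced, [sample_ids[i] for i in rows], [predictor_ids[j] for j in cols]
```

Any later analysis then sees only the reduced matrix, which is the sense in which the two options are equivalent to subsetting the data first.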
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

--minmaf <float> – predictors with observed mean less than twice <float> are ignored. It is unusual to consider SNPs with MAF below 0.01, as genotyping accuracy deteriorates rapidly below this threshold; and even if such SNPs are typed correctly, the precision of h2 estimates would decrease. Therefore, the default cut-off of 0.01 seems a sensible choice. Using --minmaf 0 means that no filtering by MAF will be performed. This is suitable when analysing data that allow negative values (e.g., zero-centred genotypes or non-genetic data).

--maxmaf <float> – predictors with observed mean greater than or equal to twice <float> are ignored. The default is 0.51, which ensures that SNPs with MAF 0.5 are retained.
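The two thresholds can be read as a simple keep/drop rule on each predictor's observed mean. The sketch below applies the rule exactly as worded above, assuming genotypes are coded as minor-allele counts (0/1/2) so that half the mean is the MAF. Treating --minmaf 0 as switching off both checks is our reading of "no filtering by MAF", not confirmed LDAK behaviour, and the function name is hypothetical.

```python
def passes_maf_filter(values, minmaf=0.01, maxmaf=0.51):
    """Keep/drop rule for --minmaf / --maxmaf as described in the text.

    values: observed (non-missing) values for one predictor.
    Assumes minor-allele counts (0/1/2), so mean / 2 is the MAF.
    """
    if minmaf == 0:
        # Assumption: --minmaf 0 disables MAF filtering entirely.
        return True
    mean = sum(values) / len(values)
    if mean < 2 * minmaf:
        return False  # rarer than the --minmaf cut-off
    if mean >= 2 * maxmaf:
        return False  # at or above the --maxmaf cut-off
    return True
```

For example, a SNP with MAF 0.5 has mean 1.0. With the default --maxmaf 0.51 the cut-off is 1.02 and the SNP is kept, whereas --maxmaf 0.5 would put the cut-off at exactly 1.0 and drop it; this is why the default sits slightly above 0.5.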

The options --minmaf and --maxmaf can be used to divide predictors into MAF tranches. However, note that MAF filtering takes place only after the data have been read, so you can save memory by calculating MAFs separately (e.g., using PLINK), then specifying each tranche using --extract.
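A sketch of the memory-saving route: read MAFs computed beforehand and split the SNP names into tranches, each of which could then be written to a file and passed to a separate LDAK run with --extract. It assumes the MAFs come from a PLINK 1.9 --freq report (a .frq file with columns CHR, SNP, A1, A2, MAF, NCHROBS); the tranche boundaries and the function name are hypothetical choices for illustration.

```python
def maf_tranches(frq_lines, edges):
    """Split SNPs into MAF tranches [edges[k], edges[k+1]).

    frq_lines: lines of a whitespace-delimited frequency report whose header
    includes SNP and MAF columns (e.g. PLINK 1.9 .frq output).
    edges: increasing MAF boundaries, e.g. [0.01, 0.05, 0.51].
    Returns one list of SNP names per tranche; SNPs outside all tranches are dropped.
    """
    rows = [line.split() for line in frq_lines if line.strip()]
    header, body = rows[0], rows[1:]
    snp_col, maf_col = header.index("SNP"), header.index("MAF")
    tranches = [[] for _ in range(len(edges) - 1)]
    for fields in body:
        try:
            maf = float(fields[maf_col])
        except ValueError:
            continue  # skip entries whose MAF is not a number
        for k in range(len(edges) - 1):
            if edges[k] <= maf < edges[k + 1]:
                tranches[k].append(fields[snp_col])
                break
    return tranches
```

With a real report, frq_lines can simply be an open file object, since the function only iterates over its lines.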
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

--minvar <float> – predictors with observed variance less than <float> are ignored. When analysing genotypic data (0/1/2), the MAF filter will generally catch such predictors, but a variance cut-off can prove useful if the data contain arbitrary values and MAF filtering has been turned off.
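A minimal sketch of a variance cut-off applied across predictors. It uses the population variance from Python's statistics module; whether LDAK divides by n or n-1 is not stated here, so treat that choice as an assumption. The helper name is hypothetical.

```python
from statistics import pvariance


def predictors_passing_minvar(columns, minvar):
    """Return indices of predictors whose observed variance is at least minvar.

    columns: list of predictors, each a list of observed values (any real numbers).
    Predictors with variance below minvar are dropped, mirroring --minvar.
    Population variance (divide by n) is an assumption of this sketch.
    """
    return [j for j, values in enumerate(columns) if pvariance(values) >= minvar]
```

Because the rule looks only at spread, it works equally for negative or non-genetic values, where a MAF-based rule has no natural meaning.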

--minobs <float> – predictors whose values are recorded for less than a proportion <float> of individuals are ignored. However, careful QC is vital when performing heritability analysis, so it is best not to rely on LDAK's (crude) filtering of predictors.
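The --minobs rule reduces to comparing each predictor's observed fraction with the threshold. This sketch assumes missing values are stored as NaN, a convention chosen here for illustration rather than LDAK's input format, and the function name is hypothetical. It is no substitute for the careful QC recommended above.

```python
import math


def passes_minobs(values, minobs):
    """Keep a predictor unless it is observed for less than a proportion minobs of individuals.

    values: one entry per individual; float('nan') marks a missing value (assumption).
    """
    observed = sum(1 for v in values if not math.isnan(v))
    return observed / len(values) >= minobs
```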