Data Filtering

These six options can be added to most commands:

--extract <list_of_predictors> - tells LDAK to use only the predictors specified in <list_of_predictors>. It can be used for genomic partitioning, by creating lists of predictors and running LDAK once for each list; however, this is more easily accomplished using the options --partition-number and --partition-prefix (see Genomic Partitioning).

--exclude <list_of_predictors> - tells LDAK to not use the predictors specified in <list_of_predictors>. Note that --exclude takes priority over --extract.

--chr <number> - tells LDAK to only use predictors on a particular chromosome.

--snp <predictor_name> - tells LDAK to use only the named predictor.

--keep <list_of_samples> - tells LDAK to only use the samples specified in <list_of_samples>. (Note that in LDAK3 the option --keep-index was used instead.) This option is equivalent to first retaining only these samples then running LDAK on the reduced dataset.

--remove <list_of_samples> - tells LDAK to not use the samples specified in <list_of_samples>. Note that --remove takes priority over --keep.
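
The priority rules above (--exclude over --extract, --remove over --keep) can be illustrated with a minimal Python sketch. This is not LDAK code; the function name select_items and its arguments are hypothetical, and the sketch only assumes that each list is a collection of predictor or sample names.

```python
def select_items(all_items, include=None, drop=None):
    """Return the items that survive filtering, in their original order.

    include plays the role of --extract / --keep (None means "no restriction");
    drop plays the role of --exclude / --remove and always wins.
    """
    include_set = set(include) if include is not None else None
    drop_set = set(drop) if drop is not None else set()

    kept = []
    for name in all_items:
        if include_set is not None and name not in include_set:
            continue  # not on the inclusion list
        if name in drop_set:
            continue  # on the exclusion list, which takes priority
        kept.append(name)
    return kept


predictors = ["rs1", "rs2", "rs3", "rs4"]
# rs2 appears in both lists, so the exclusion list decides and it is dropped.
print(select_items(predictors, include=["rs1", "rs2"], drop=["rs2", "rs4"]))
```

Under this reading, an item named in both lists is never used.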
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

The following options filter predictors by minor allele frequency (MAF):

--minmaf <float> - predictors with observed mean less than 2x <float> are ignored (for genotypes coded 0/1/2, the mean is twice the allele frequency). It is unusual to consider SNPs with MAF below 0.01, as genotyping accuracy deteriorates rapidly below this point, and even if the SNPs are typed correctly, the precision of h2 estimates also decreases. Therefore, the default cut-off of 0.01 seems a sensible choice. Using --minmaf 0 means that no filtering by MAF will be performed. This is suitable when analysing data that allow negative values (e.g., zero-centred genotypes or non-genetic data).

--maxmaf <float> - predictors with observed mean greater than or equal to 2x <float> are ignored (the default is 0.51 to ensure SNPs with MAF 0.5 are retained by default).

These two options can be used for dividing predictors into MAF tranches. However, note that MAF filtering takes place only after the data have been read, so you can save memory by calculating MAF separately (e.g. using PLINK) then specifying each tranche using --extract-index.
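
As a rough illustration of the MAF rule described above, the following Python sketch keeps a predictor when 2 x minmaf <= mean < 2 x maxmaf and groups predictors into tranches with the same comparison. It is not LDAK code: the function names are hypothetical, None is assumed to mark a missing value, and the tranche boundaries in the example are arbitrary.

```python
def observed_mean(values):
    """Mean of the non-missing values; None marks a missing value (an assumption)."""
    observed = [v for v in values if v is not None]
    if not observed:
        return None
    return sum(observed) / len(observed)


def passes_maf_filter(values, minmaf=0.01, maxmaf=0.51):
    """Keep a predictor when 2*minmaf <= mean < 2*maxmaf, as the text describes."""
    mean = observed_mean(values)
    if mean is None:
        return False  # nothing observed, so nothing to keep
    return 2 * minmaf <= mean < 2 * maxmaf


def split_into_tranches(means, boundaries):
    """Group predictor names into tranches [boundaries[i], boundaries[i+1])."""
    tranches = [[] for _ in range(len(boundaries) - 1)]
    for name, mean in means.items():
        for i in range(len(boundaries) - 1):
            if 2 * boundaries[i] <= mean < 2 * boundaries[i + 1]:
                tranches[i].append(name)
                break
    return tranches


# A SNP with MAF 0.5 has mean 1.0, which is below 2 * 0.51 and so is retained.
print(passes_maf_filter([1, 1, 1, 1]))

# Hypothetical per-predictor means, split at MAF 0.01, 0.1 and 0.51.
means = {"rs1": 0.04, "rs2": 0.3, "rs3": 0.9}
print(split_into_tranches(means, [0.01, 0.1, 0.51]))
```

Each resulting list of names could then be saved and supplied as a list of predictors, so that only one tranche is read at a time.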
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

--minvar <float> - predictors with observed variance less than <float> are ignored. When analysing genotypic data (0/1/2), the MAF filter will generally catch such predictors, but a variance cut-off can prove useful if the data contain arbitrary values and filtering based on MAF has been turned off.
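
To see why a variance cut-off helps when MAF filtering is switched off, consider this small Python sketch. It is illustrative only: the function name passes_minvar is hypothetical, the threshold in the example is arbitrary, and the use of the population variance (rather than the sample variance) is an assumption, since the text does not say which LDAK computes.

```python
import statistics


def passes_minvar(values, minvar):
    """Keep a predictor whose variance is at least minvar.

    Uses the population variance of the non-missing values (an assumption);
    None marks a missing value.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        return False
    return statistics.pvariance(observed) >= minvar


# Zero-centred values: the mean is 0, so a mean-based MAF rule says nothing
# useful, but the variance still separates informative from constant predictors.
informative = [-1, 0, 1, 0]
constant = [0.3, 0.3, 0.3, 0.3]
print(passes_minvar(informative, 0.01))
print(passes_minvar(constant, 0.01))
```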

--minobs <float> - predictors with values recorded for less than a proportion <float> of the individuals are ignored. However, careful QC is vital when performing heritability analysis, so it's best not to rely on LDAK's (crude) filtering of predictors.
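
The --minobs rule amounts to a call-rate check per predictor. A minimal Python sketch, assuming missing values are represented by None and using a hypothetical function name, might look like this:

```python
def passes_minobs(values, minobs):
    """Keep a predictor recorded for at least a proportion minobs of individuals.

    None marks a missing value (an assumption about the data representation).
    """
    if not values:
        return False
    recorded = sum(1 for v in values if v is not None)
    return recorded / len(values) >= minobs


# Recorded for 3 of 4 individuals, i.e. a proportion of 0.75.
genotypes = [0, 1, None, 2]
print(passes_minobs(genotypes, 0.75))
print(passes_minobs(genotypes, 0.8))
```

A check like this is only a coarse safeguard and does not replace proper quality control of the data beforehand.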