Other Arguments

LDAK is command-line software. Each LDAK command starts with the name of the executable file, followed by arguments. The command must include one main argument, which tells LDAK which feature to use and sets the prefix of the output files/folder. It will usually also include one or more other arguments. For example, if you are using the Linux version of LDAK, you might use the command:

./ldak5.1.linux --calc-stats results --bfile data

The main argument is --calc-stats results, which tells LDAK to calculate predictor statistics and save the output files with prefix results. Meanwhile, the other argument --bfile data tells LDAK that the data are stored in Binary PLINK format in the files data.bed, data.bim and data.fam.
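The command structure described above (executable, then one main argument with its output prefix, then other arguments) can be mirrored when scripting LDAK runs from Python. The sketch below uses a hypothetical helper that is not part of LDAK: it only assembles the argument list from the example and lists the three Binary PLINK files that --bfile data refers to. Actually running the command would require the executable and the data files to be present.

```python
def build_ldak_command(executable, main_arg, prefix, other_args):
    """Assemble an LDAK command as a list of arguments (hypothetical helper).

    The main argument (e.g. "--calc-stats") is followed by its output prefix;
    other arguments are supplied as (option, value) pairs.
    """
    cmd = [executable, main_arg, prefix]
    for option, value in other_args:
        cmd.extend([option, value])
    return cmd


def bfile_paths(stem):
    """Return the three Binary PLINK files that --bfile <stem> refers to."""
    return [f"{stem}{ext}" for ext in (".bed", ".bim", ".fam")]


cmd = build_ldak_command("./ldak5.1.linux", "--calc-stats", "results",
                         [("--bfile", "data")])
print(" ".join(cmd))        # ./ldak5.1.linux --calc-stats results --bfile data
print(bfile_paths("data"))  # ['data.bed', 'data.bim', 'data.fam']
# To run it (needs the executable and data files):
#   import subprocess; subprocess.run(cmd, check=True)
```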

Below are some of the most common other arguments.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ __ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

To be completed (sorry). However, please note that whenever you run LDAK, the screen output will suggest arguments relevant to the function you are using.



By default, LDAK scales predictors based on their expected variance assuming Hardy-Weinberg Equilibrium; to instead scale based on their observed variance, add --hwe-stand NO (this is generally required when using non-SNP data).
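To illustrate the difference between the two scalings, the sketch below standardizes one SNP coded 0/1/2 either by its expected variance under Hardy-Weinberg Equilibrium, 2p(1-p) with p taken as half the observed mean, or by its observed variance. It is a minimal Python sketch under stated assumptions, not LDAK's code: the function name is hypothetical, the observed variance is taken as the population variance (dividing by n), there are no missing values, and any further weighting or exponent LDAK may apply during scaling is ignored.

```python
import math
import statistics


def standardize(values, hwe_stand=True):
    """Centre a predictor and divide by its standard deviation (sketch).

    hwe_stand=True  -> HWE-expected variance 2p(1-p), with p = mean / 2
    hwe_stand=False -> observed (population) variance, analogous in spirit
                       to adding --hwe-stand NO
    """
    mean = statistics.fmean(values)
    if hwe_stand:
        p = mean / 2
        variance = 2 * p * (1 - p)
    else:
        variance = statistics.pvariance(values, mu=mean)
    if variance <= 0:
        raise ValueError("variance is zero or negative; cannot standardize")
    sd = math.sqrt(variance)
    return [(v - mean) / sd for v in values]


snp = [0, 0, 1, 2]
hwe = standardize(snp)                   # divides by sqrt(0.46875)
obs = standardize(snp, hwe_stand=False)  # divides by sqrt(0.6875)
print(statistics.pvariance(obs))         # close to 1.0
```

For non-SNP data the HWE formula can give a meaningless variance; with a negative mean, p is negative and 2p(1-p) is negative, which is one way to see why observed-variance scaling is generally needed there.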


Data Filtering

These six options can be added to most commands:

--extract <list_of_predictors> - tells LDAK to use only the predictors specified in <list_of_predictors>. It can be used for genomic partitioning, by creating lists of predictors and running LDAK once for each list; however, this is more easily accomplished using the options --partition-number and --partition-prefix (see Genomic Partitioning).

--exclude <list_of_predictors> - tells LDAK not to use the predictors specified in <list_of_predictors>. Note that --exclude takes priority over --extract.

--chr <number> - tells LDAK to only use predictors on a particular chromosome.

--snp <predictor_name> - tells LDAK to use only the named predictor.

--keep <list_of_samples> - tells LDAK to only use the samples specified in <list_of_samples>. (Note that in LDAK3 the option --keep-index was used instead.) This option is equivalent to first retaining only these samples then running LDAK on the reduced dataset.

--remove <list_of_samples> - tells LDAK not to use the samples specified in <list_of_samples>. Note that --remove takes priority over --keep.

In addition, for many commands, you can add --pheno and LDAK will only consider samples for which phenotypes are available.
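The way these options combine can be summarised as set operations. The Python sketch below assumes only the rules stated above (--exclude overrides --extract, --remove overrides --keep, --chr and --snp restrict predictors further, and --pheno drops samples without a phenotype). The function names and in-memory inputs are hypothetical; LDAK itself reads these lists from files.

```python
def select_predictors(predictors, extract=None, exclude=None, chrom=None, snp=None):
    """Apply --extract / --exclude / --chr / --snp style filters (sketch).

    predictors: dict mapping predictor name -> chromosome (input order kept).
    """
    chosen = set(predictors)
    if extract is not None:
        chosen &= set(extract)
    if exclude is not None:
        chosen -= set(exclude)      # --exclude takes priority over --extract
    if chrom is not None:
        chosen = {p for p in chosen if predictors[p] == chrom}
    if snp is not None:
        chosen &= {snp}
    return [p for p in predictors if p in chosen]


def select_samples(samples, keep=None, remove=None, phenotypes=None):
    """Apply --keep / --remove and, optionally, a --pheno style filter.

    phenotypes: dict mapping sample ID -> value, with None meaning missing.
    """
    chosen = set(samples)
    if keep is not None:
        chosen &= set(keep)
    if remove is not None:
        chosen -= set(remove)       # --remove takes priority over --keep
    if phenotypes is not None:
        chosen = {s for s in chosen if phenotypes.get(s) is not None}
    return [s for s in samples if s in chosen]


predictors = {"rs1": 1, "rs2": 1, "rs3": 2, "rs4": 2}
print(select_predictors(predictors, extract=["rs1", "rs2", "rs3"], exclude=["rs2"]))
# ['rs1', 'rs3']
print(select_predictors(predictors, extract=["rs1", "rs2", "rs3"], chrom=2))
# ['rs3']
samples = ["s1", "s2", "s3", "s4"]
print(select_samples(samples, keep=["s1", "s2", "s3"], remove=["s1"],
                     phenotypes={"s2": 1.2, "s3": None}))
# ['s2']
```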
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

The following options can be used when making data:

--minmaf <float> - predictors with observed mean less than twice <float> are ignored. It is unusual to consider SNPs with MAF below 0.01, as genotyping accuracy deteriorates rapidly below this point; and even if such SNPs are typed correctly, the precision of h2 estimates would also decrease. Therefore, the default cut-off of 0.01 seems a sensible choice. Using --minmaf 0 means that no filtering by MAF will be performed, which is suitable when analysing data that can take negative values (e.g., zero-centred genotypes or non-genetic data).

--maxmaf <float> - predictors with observed mean greater than or equal to twice <float> are ignored (the default is 0.51, which ensures that SNPs with MAF 0.5 are retained).

These two options can be used to divide predictors into MAF tranches. However, MAF filtering takes place only after the data have been read, so you can save memory by calculating MAFs separately (e.g., using PLINK) and then specifying each tranche using --extract.
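Following the memory-saving suggestion above, MAFs can be computed beforehand and each tranche written to its own predictor list. The Python sketch below assumes you already have a mapping from predictor name to MAF (for example parsed from PLINK output, whose format is not covered here). The tranche boundaries, function names and output file names are hypothetical, and the list format written (one predictor name per line) is an assumption about what the predictor-list option expects.

```python
from pathlib import Path


def split_into_tranches(mafs, bounds):
    """Group predictors into half-open MAF tranches [lo, hi).

    mafs:   dict mapping predictor name -> MAF
    bounds: increasing boundaries, e.g. [0.01, 0.05, 0.1, 0.51]
    Predictors outside all tranches are dropped.
    """
    tranches = {(lo, hi): [] for lo, hi in zip(bounds, bounds[1:])}
    for name, maf in mafs.items():
        for lo, hi in tranches:
            if lo <= maf < hi:
                tranches[(lo, hi)].append(name)
                break
    return tranches


def write_lists(tranches, prefix="tranche"):
    """Write one predictor list per tranche (hypothetical file names)."""
    paths = []
    for i, names in enumerate(tranches.values(), start=1):
        path = Path(f"{prefix}{i}.txt")
        path.write_text("".join(f"{n}\n" for n in names))
        paths.append(path)
    return paths


mafs = {"rs1": 0.02, "rs2": 0.30, "rs3": 0.07, "rs4": 0.005}
tranches = split_into_tranches(mafs, [0.01, 0.05, 0.1, 0.51])
print(tranches)  # rs4 (MAF 0.005) falls below the lowest bound and is dropped
```

Each file returned by write_lists could then be supplied to a separate LDAK run, one run per tranche, so that only that tranche is read into memory.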
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

--minvar <float> - predictors with observed variance less than <float> are ignored. When analysing genotypic data (0/1/2), the MAF filter will generally catch such predictors, but a variance cut-off can prove useful if the data contain arbitrary values and MAF filtering has been turned off.
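For data with arbitrary values, the variance rule can be sketched in Python as below: a predictor is kept unless its observed variance is below the cut-off. This is an illustration under assumptions, not LDAK's code: the function name is hypothetical, missing values are represented as None and skipped, and the population variance (dividing by n) is used; no default cut-off is implied.

```python
import statistics


def passes_minvar(values, minvar):
    """Keep a predictor unless its observed variance is below minvar (sketch).

    Missing values (None) are skipped; the population variance is used.
    """
    observed = [v for v in values if v is not None]
    return statistics.pvariance(observed) >= minvar


nearly_constant = [0.1, 0.1, 0.1, 0.12]   # variance 7.5e-5
varied = [-1.3, 0.4, 2.2, -0.8, None]
print(passes_minvar(nearly_constant, 1e-4))  # False
print(passes_minvar(varied, 1e-4))           # True
```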

--minobs <float> - predictors with values recorded for less than a proportion <float> of the individuals are ignored. However, careful QC is vital when performing heritability analysis, so it's best not to rely on LDAK's (crude) filtering of predictors.
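The --minobs rule amounts to comparing each predictor's call rate with the given proportion. A minimal Python sketch, assuming missing values are represented as None and using a hypothetical function name (LDAK's own handling of missing genotypes is not shown here):

```python
def passes_minobs(values, minobs):
    """Keep a predictor recorded for at least a proportion minobs of samples.

    Missing values are represented as None (an assumption of this sketch).
    """
    if not values:
        return False
    observed = sum(v is not None for v in values)
    return observed / len(values) >= minobs


snp = [0, 1, None, 2, None]      # recorded for 3 of 5 samples (0.6)
print(passes_minobs(snp, 0.5))   # True
print(passes_minobs(snp, 0.9))   # False
```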