Simulate Data

LDAK can generate phenotypic and genotypic data. Simulated phenotypes can be used to investigate how the outcome of an analysis depends on the assumed Heritability Model.

Always read the screen output, which suggests arguments and estimates memory usage.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Simulating phenotypes:

The main argument is --make-phenos <outfile>.

This requires the following options:

--bfile/--gen/--sp/--speed <datastem> - to specify the genetic data files (see File Formats).

--weights <weightsfile> or --ignore-weights YES - to specify the predictor weightings (a weightsfile should have two columns, providing predictor names then weightings).

--power <float> - to specify how predictors are scaled.

--her <float> - to specify the heritability for the simulated phenotypes (i.e., the proportion of total phenotypic variation explained by the genetic contribution).

--num-phenos <integer> - to specify the number of phenotypes to generate.

--num-causals <integer> - to specify the number of predictors contributing to each phenotype (to specify that all predictors are causal, use --num-causals -1).
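To see what --power and the weightings do, it helps to compute how much heritability a single causal predictor is expected to contribute. The sketch below assumes the usual LDAK scaling: a predictor with allele frequency f is centred, multiplied by [2f(1-f)]^(power/2) and by the square root of its weighting, and its effect size is drawn from a standard normal distribution. Its expected contribution is then proportional to weighting * [2f(1-f)]^(1 + power). The function name expected_contribution is hypothetical and this is an illustration of the model, not LDAK's code.

```python
def expected_contribution(maf: float, weight: float, power: float) -> float:
    """Relative expected heritability contributed by one causal predictor.

    Assumes the predictor (0/1/2 copies of an allele with frequency `maf`)
    is centred, multiplied by [2f(1-f)]^(power/2) and by sqrt(weight), and
    given an effect size drawn from N(0, 1). Its variance is then
    weight * [2f(1-f)]^(1 + power).
    """
    var_genotype = 2 * maf * (1 - maf)  # genotype variance under Hardy-Weinberg equilibrium
    return weight * var_genotype ** (1 + power)


# Compare a common SNP (MAF 0.5) with a rare SNP (MAF 0.01) using
# --power -1 (GCTA Model) and --power -0.25 (LDAK-Thin Model).
for power in (-1.0, -0.25):
    ratio = expected_contribution(0.5, 1.0, power) / expected_contribution(0.01, 1.0, power)
    print(f"power={power}: common/rare ratio = {ratio:.2f}")
```

With --power -1 every causal predictor has the same expected contribution regardless of MAF; with --power -0.25 common predictors are expected to contribute more than rare ones, and a weighting of zero removes a predictor's contribution entirely.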

By default, LDAK will pick causal predictors at random; if you would prefer to specify which predictors are causal for each phenotype, use --causals <causalsfile>. Similarly, LDAK will by default sample effect sizes from a standard normal distribution; to instead specify the effect sizes use --effects <effectsfile>. Both <causalsfile> and <effectsfile> should be text files with one row per phenotype and one column per causal predictor.
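The layout of <causalsfile> and <effectsfile> can be illustrated with a short script that picks causal predictors from a PLINK .bim file and writes matching effect sizes. This is a sketch under assumptions: the helper names and the file names my.causals and my.effects are hypothetical, and we assume column j of the effectsfile gives the effect of the predictor in column j of the causalsfile.

```python
import random


def read_bim_names(bim_path):
    """Predictor names are the second column of a PLINK .bim file."""
    with open(bim_path) as handle:
        return [line.split()[1] for line in handle if line.strip()]


def write_causals_and_effects(snp_names, num_phenos, num_causals,
                              causals_path, effects_path, seed=1):
    """Write a causalsfile and effectsfile with one row per phenotype.

    Assumption: column j of the effectsfile holds the effect size of the
    predictor in column j of the causalsfile.
    """
    rng = random.Random(seed)
    with open(causals_path, "w") as cf, open(effects_path, "w") as ef:
        for _ in range(num_phenos):
            chosen = rng.sample(snp_names, num_causals)      # distinct causal predictors
            effects = [rng.gauss(0.0, 1.0) for _ in chosen]   # standard normal effect sizes
            cf.write(" ".join(chosen) + "\n")
            ef.write(" ".join(f"{b:.6f}" for b in effects) + "\n")
```

For example, write_causals_and_effects(read_bim_names("human.bim"), 100, 1000, "my.causals", "my.effects") would produce files that could then be passed using --causals my.causals --effects my.effects.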

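To generate binary phenotypes, add the option --prevalence <float>. LDAK will then treat the just-generated phenotypes as liabilities: samples whose liability lies above the threshold Inverse_CDF(1 - <float>), where Inverse_CDF is the standard normal quantile function, will become cases, while those below this threshold will become controls. In this way, a proportion <float> of samples (those with the highest liabilities) are expected to be cases.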
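The liability threshold can be sketched as follows, assuming the liabilities have mean 0 and variance 1. The function name liabilities_to_binary is hypothetical, and cases and controls are coded here as 1 and 0 purely for illustration (LDAK's own output coding may differ).

```python
import random
from statistics import NormalDist


def liabilities_to_binary(liabilities, prevalence):
    """Turn liabilities into case (1) / control (0) status.

    Assumes liabilities have mean 0 and variance 1, so the threshold t with
    P(Z > t) = prevalence is the (1 - prevalence) quantile of N(0, 1).
    """
    threshold = NormalDist().inv_cdf(1 - prevalence)
    return [1 if value > threshold else 0 for value in liabilities]


rng = random.Random(42)
liabilities = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
status = liabilities_to_binary(liabilities, 0.1)
print(sum(status) / len(status))  # approximately 0.1
```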

To generate correlated phenotypes, add the option --bivar <float>; LDAK will then generate pairs of traits with the specified genetic correlation. For example, if you use --her 0.8, --num-phenos 4 and --bivar 0.5, LDAK will generate four phenotypes, each with heritability 0.8, such that Phenotypes 1 and 2 have genetic correlation 0.5, as do Phenotypes 3 and 4 (all other pairs of phenotypes will be uncorrelated).
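One common way to give a pair of traits a target genetic correlation rho is to let them share causal predictors and draw correlated effect sizes. When predictors are standardized and independent, the genetic correlation is then expected to equal the correlation of the effects. The sketch below shows this construction; it is an assumption for illustration, not a description of how LDAK implements --bivar, and the helper names are hypothetical.

```python
import math
import random


def correlated_effect_pair(num_causals, rho, seed=0):
    """Draw effect sizes for two traits whose effects have correlation rho.

    Both traits are assumed to share the same causal predictors; trait 2's
    effect is rho * (trait 1's effect) plus independent noise.
    """
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(num_causals):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        first.append(z1)
        second.append(rho * z1 + math.sqrt(1 - rho ** 2) * z2)  # still N(0, 1)
    return first, second


def pearson(x, y):
    """Sample Pearson correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


b1, b2 = correlated_effect_pair(50_000, 0.5)
print(round(pearson(b1, b2), 2))  # close to 0.5
```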

LDAK will first compute breeding values (the genetic contributions) and save these to <outfile>.breed; then it will add noise to produce phenotypes with the desired heritability, and save these to <outfile>.pheno (both files will be in PLINK format). The list of causal predictors and effects will be stored in <outfile>.effects. Note that if constructing binary phenotypes, the liabilities will be stored in <outfile>.liab.
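The two steps above (breeding values, then noise) can be sketched as follows. We assume the breeding values are rescaled to have variance equal to --her and that the noise has variance 1 - her, so the phenotypes have total variance close to 1 (convenient when they are later used as liabilities); LDAK may scale differently, but in either case the genetic share of the variance is what --her controls. The function names are hypothetical.

```python
import random
from statistics import mean, pvariance


def breeding_values(genotypes, effects):
    """g_i = sum_j X_ij * beta_j for each sample i (rows of `genotypes`)."""
    return [sum(x * b for x, b in zip(row, effects)) for row in genotypes]


def add_noise(breeding, her, seed=0):
    """Rescale breeding values to variance `her`, then add N(0, 1 - her) noise.

    Requires the breeding values to have non-zero variance. Returns the
    rescaled genetic component and the phenotypes.
    """
    rng = random.Random(seed)
    centre = mean(breeding)
    scale = (her / pvariance(breeding)) ** 0.5
    genetic = [(g - centre) * scale for g in breeding]
    noise_sd = (1 - her) ** 0.5
    phenotypes = [g + rng.gauss(0.0, noise_sd) for g in genetic]
    return genetic, phenotypes
```

For example, with genotypes for 20,000 samples, add_noise(breeding_values(genotypes, effects), 0.5) returns phenotypes whose variance is close to 1, half of it genetic.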
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Simulating genotypes:

The main argument is --make-snps <outfile>.

This requires the following options:

--num-samples <integer> - to specify the number of samples.

--num-snps <integer> - to specify the number of SNPs.

LDAK will generate SNPs in a very simple fashion, assuming Hardy-Weinberg equilibrium and linkage equilibrium. The default MAF range of SNPs is 0 to 0.5, but this can be changed using --maf-low <float> and --maf-high <float>.
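A minimal sketch of this kind of simulation: under Hardy-Weinberg equilibrium a genotype is the sum of two independent allele draws, and under linkage equilibrium each SNP is drawn independently of the others. We assume, for illustration only, that each SNP's MAF is drawn uniformly between the --maf-low and --maf-high values; we have not checked how LDAK chooses MAFs, and the function name simulate_snps is hypothetical.

```python
import random


def simulate_snps(num_samples, num_snps, maf_low=0.0, maf_high=0.5, seed=0):
    """Simulate 0/1/2 genotypes under Hardy-Weinberg and linkage equilibrium.

    Assumes each SNP's MAF is drawn uniformly from [maf_low, maf_high].
    Returns (mafs, genotypes) with genotypes[i][j] for sample i, SNP j.
    """
    rng = random.Random(seed)
    mafs = [rng.uniform(maf_low, maf_high) for _ in range(num_snps)]
    genotypes = [
        # Two independent allele draws per SNP (HWE); SNPs drawn independently (LE).
        [int(rng.random() < f) + int(rng.random() < f) for f in mafs]
        for _ in range(num_samples)
    ]
    return mafs, genotypes


mafs, genos = simulate_snps(100, 1000)
```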

The new data will be saved in binary PLINK format in the files <outfile>.bed, <outfile>.bim and <outfile>.fam.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Example:

Here we use the binary PLINK files human.bed, human.bim and human.fam from the Test Datasets.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

To generate 100 phenotypes assuming the GCTA Model (each with heritability 0.5 and 1000 causal SNPs), we run

./ldak.out --make-phenos GCTA --bfile human --ignore-weights YES --power -1 --her 0.5 --num-phenos 100 --num-causals 1000

The new phenotypes will be saved in GCTA.pheno.

To instead assume the LDAK-Thin Model, we must first create a weightsfile that gives weighting one to the predictors that remain after thinning for duplicates, and weighting zero to those removed. This can be achieved using the commands

./ldak.out --thin thin --bfile human --window-prune .98 --window-kb 100
awk < thin.in '{print $1, 1}' > weights.thin

The first command produces the file thin.in, which lists the predictors that remain after thinning; the second gives each of these predictors weighting one in the file weights.thin (predictors not in weights.thin are automatically given weighting zero). Then we generate the phenotypes by running
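The zero-weight convention for predictors missing from a weightsfile can be made concrete with a small reader. This is a hypothetical helper for illustration (load_weights is our name, not part of LDAK); it assumes the two-column layout described for --weights and that predictor names come from the .bim file.

```python
def load_weights(weights_path, predictor_names):
    """Build a weighting for every predictor from a two-column weightsfile.

    Predictors missing from the file get weighting zero, matching the
    behaviour described for weights.thin.
    """
    given = {}
    with open(weights_path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 2:
                given[fields[0]] = float(fields[1])
    return {name: given.get(name, 0.0) for name in predictor_names}
```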

./ldak.out --make-phenos LDAK-Thin --bfile human --weights weights.thin --power -.25 --her 0.5 --num-phenos 100 --num-causals 1000

The new phenotypes will be saved in LDAK-Thin.pheno.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

To generate basic genotypes for 100 samples and 1000 SNPs, we run

./ldak.out --make-snps snps --num-samples 100 --num-snps 1000

The genotypes are saved in the files snps.bed, snps.bim and snps.fam.