Simulate Data

LDAK can generate phenotypic and genotypic data. The former can be used to investigate how the outcome of an analysis depends on the Heritability Model.

Always read the screen output, which suggests arguments and estimates memory usage.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Simulating phenotypes:

The main argument is --make-phenos <outfile>.

This requires the following options:

--bfile/--gen/--sp/--speed <datastem> - to specify the genetic data files (see File Formats).

--power <float> - to specify how predictors are scaled.

--her <float> - to specify the heritability for the simulated phenotypes (i.e., the proportion of total phenotypic variation explained by the genetic contribution).

--num-phenos <integer> - to specify the number of phenotypes to generate.

--num-causals <integer> - to specify the number of predictors contributing to each phenotype (to specify that all predictors are causal, use --num-causals -1).

By default, LDAK will assign every predictor a weighting of one (equivalent to using --ignore-weights YES); however, you can provide your own weightings using --weights <weightsfile>.

LDAK will generate phenotypes of the form Y = X1b1 + X2b2 + ... + XCbC + e, where Xj is the jth causal variant, bj is the corresponding effect size, and e is Gaussian-distributed noise. LDAK samples effect sizes from a mean-zero Gaussian distribution, whose variance ensures that E[h2j], the expected heritability of the jth causal variant, is proportional to wj [fj(1-fj)]^(1+alpha), where fj is the allele frequency of the jth causal variant, and wj and alpha are specified by --weights (or --ignore-weights) and --power, respectively.
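
The heritability model above can be illustrated with a short Python sketch. It is not LDAK's code: the function name and the example frequencies are hypothetical, and it reads (1+alpha) as an exponent, so that each causal variant's expected share of the heritability is proportional to wj [fj(1-fj)]^(1+alpha).

```python
def expected_heritabilities(weights, freqs, alpha, her):
    """Split the total heritability `her` across causal predictors.

    Each share is proportional to w_j * [f_j * (1 - f_j)] ** (1 + alpha).
    Hypothetical helper for illustration; not part of LDAK.
    """
    raw = [w * (f * (1.0 - f)) ** (1.0 + alpha) for w, f in zip(weights, freqs)]
    total = sum(raw)
    return [her * r / total for r in raw]


# Made-up allele frequencies for four causal predictors, all with weighting one.
freqs = [0.05, 0.1, 0.3, 0.5]
weights = [1.0, 1.0, 1.0, 1.0]

# --power -1 (GCTA Model): the frequency term vanishes, so shares are equal.
gcta_shares = expected_heritabilities(weights, freqs, alpha=-1.0, her=0.5)

# --power -.25: more common variants are expected to contribute more.
ldak_shares = expected_heritabilities(weights, freqs, alpha=-0.25, her=0.5)

print(gcta_shares)
print(ldak_shares)
```

With --power -1 every predictor receives the same expected heritability regardless of frequency, which is why that setting corresponds to the GCTA Model in the example further down.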

By default, LDAK will pick causal predictors at random; if you would prefer to specify which predictors are causal for each phenotype, use --causals <causalsfile>. Similarly, LDAK will by default sample effect sizes from a standard normal distribution; to instead specify the effect sizes use --effects <effectsfile>. Both <causalsfile> and <effectsfile> should be text files with one row per phenotype and one column per causal predictor.

To generate binary phenotypes, add the option --prevalence <float>. LDAK will then treat the just-generated phenotypes as liabilities, so that samples with value above Inverse_CDF(<float>) will become cases, while those below this threshold will become controls.
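
The liability-threshold step can be sketched in Python as follows. This is an illustration under stated assumptions, not LDAK's implementation: the function name is hypothetical, liabilities are standardized before thresholding, the threshold is taken to be the upper-tail quantile of the standard normal distribution (so that a fraction equal to the prevalence lies above it), and cases and controls are coded 1 and 0.

```python
import random
from statistics import NormalDist, mean, pstdev


def liabilities_to_binary(liabilities, prevalence):
    """Turn continuous liabilities into case (1) / control (0) labels.

    Assumption: the threshold is the standard normal quantile with
    `prevalence` of the mass above it. Hypothetical helper, not LDAK code.
    """
    mu = mean(liabilities)
    sd = pstdev(liabilities)
    threshold = NormalDist().inv_cdf(1.0 - prevalence)
    return [1 if (x - mu) / sd > threshold else 0 for x in liabilities]


# Made-up liabilities drawn from a standard normal distribution.
rng = random.Random(1)
liab = [rng.gauss(0.0, 1.0) for _ in range(20000)]

status = liabilities_to_binary(liab, prevalence=0.1)
case_fraction = sum(status) / len(status)
print(case_fraction)  # close to the requested prevalence
```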

To generate correlated phenotypes, add the option --bivar <float>; LDAK will then generate pairs of traits with the specified genetic correlation. For example, if you use --her 0.8, --num-phenos 4 and --bivar 0.5, then LDAK will generate four phenotypes, each with heritability 0.8, such that Phenotypes 1 and 2 will have genetic correlation 0.5, and likewise Phenotypes 3 and 4 (whereas all other phenotype pairs will be genetically uncorrelated).
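
One standard way to obtain a pair of traits with a chosen genetic correlation is to draw correlated effect sizes for a shared set of causal predictors. The Python sketch below shows that construction; whether LDAK uses this exact scheme is an assumption, and the function names are hypothetical.

```python
import math
import random


def correlated_effect_pairs(num_causals, rho, seed=0):
    """Draw (b1, b2) effect-size pairs whose correlation is `rho`.

    Uses b2 = rho * z1 + sqrt(1 - rho^2) * z2 with independent standard
    normals z1 and z2. Hypothetical helper, not LDAK code.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(num_causals):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        pairs.append((z1, rho * z1 + math.sqrt(1.0 - rho * rho) * z2))
    return pairs


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


pairs = correlated_effect_pairs(num_causals=20000, rho=0.5)
b1 = [p[0] for p in pairs]
b2 = [p[1] for p in pairs]
effect_correlation = pearson(b1, b2)
print(effect_correlation)  # close to 0.5
```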

LDAK will first compute breeding values (the genetic contributions) and save these to <outfile>.breed; then it will add noise to produce phenotypes with the desired heritability, and save these to <outfile>.pheno (both files will be in PLINK format). The list of causal predictors and effects will be stored in <outfile>.effects. Note that if constructing binary phenotypes, the liabilities will be stored in <outfile>.liab.
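
The noise step can be sketched in Python. If the breeding values have variance Vg and the target heritability is h2, then independent Gaussian noise with variance Vg(1-h2)/h2 gives Vg/(Vg+Ve) = h2. The function name and inputs below are hypothetical, and this is an illustration rather than LDAK's code.

```python
import random
from statistics import pvariance


def add_noise(breeding_values, her, seed=0):
    """Add Gaussian noise so the expected heritability equals `her`.

    Noise variance is Vg * (1 - her) / her, where Vg is the variance of
    the breeding values. Hypothetical helper, not LDAK code.
    """
    rng = random.Random(seed)
    var_g = pvariance(breeding_values)
    sd_e = (var_g * (1.0 - her) / her) ** 0.5
    return [g + rng.gauss(0.0, sd_e) for g in breeding_values]


# Made-up breeding values standing in for the contents of <outfile>.breed.
rng = random.Random(2)
breed = [rng.gauss(0.0, 3.0) for _ in range(20000)]

pheno = add_noise(breed, her=0.5)
realized_her = pvariance(breed) / pvariance(pheno)
print(realized_her)  # close to 0.5, up to sampling error
```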
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Simulating genotypes:

The main argument is --make-snps <outfile>.

This requires the following options:

--num-samples <integer> - to specify the number of samples.

--num-snps <integer> - to specify the number of SNPs.

LDAK will generate SNPs in a very simple fashion, assuming Hardy-Weinberg equilibrium and linkage equilibrium. The default MAF range of SNPs is 0 to 0.5, but this can be changed using --maf-low <float> and --maf-high <float>.

The new data will be saved in binary PLINK format in the files <outfile>.bed, <outfile>.bim and <outfile>.fam.
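
The sampling scheme described above (Hardy-Weinberg equilibrium and linkage equilibrium) can be sketched in Python. Under these assumptions each SNP is independent of the others and each genotype is the sum of two independent allele draws. Drawing each MAF uniformly within the range is an assumption made here for illustration, as are the function and parameter names; the sketch keeps genotypes in memory and does not write PLINK files.

```python
import random


def simulate_genotypes(num_samples, num_snps, maf_low=0.0, maf_high=0.5, seed=0):
    """Return a list of SNPs, each a list of genotypes coded 0, 1 or 2.

    Hardy-Weinberg equilibrium: each genotype is the sum of two
    independent allele draws with probability `maf`.
    Linkage equilibrium: SNPs are simulated independently.
    Hypothetical helper, not LDAK code.
    """
    rng = random.Random(seed)
    genotypes = []
    for _ in range(num_snps):
        maf = rng.uniform(maf_low, maf_high)  # assumed uniform within the range
        snp = [int(rng.random() < maf) + int(rng.random() < maf)
               for _ in range(num_samples)]
        genotypes.append(snp)
    return genotypes


geno = simulate_genotypes(num_samples=100, num_snps=1000, maf_low=0.05, maf_high=0.5)
print(len(geno), len(geno[0]))
```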
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Example:

Here we use the binary PLINK files human.bed, human.bim and human.fam from the Test Datasets.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

To generate 100 phenotypes assuming the GCTA Model (each with heritability 0.5 and 1000 causal SNPs), we run

./ldak.out --make-phenos GCTA --bfile human --ignore-weights YES --power -1 --her 0.5 --num-phenos 100 --num-causals 1000

The new phenotypes will be saved in GCTA.pheno.

To instead assume the Human Default Model, we run

./ldak.out --make-phenos HumDef --bfile human --power -.25 --her 0.5 --num-phenos 100 --num-causals 1000

The new phenotypes will be saved in HumDef.pheno.
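
To inspect the simulated phenotypes outside LDAK, a small Python reader such as the one below may help. It assumes the usual PLINK phenotype layout (whitespace-separated, two ID columns followed by one column per phenotype, no header line, no missing values); the function name and the sample rows are hypothetical.

```python
def parse_pheno_lines(lines):
    """Parse PLINK-style phenotype rows: FID IID pheno1 pheno2 ...

    Returns (ids, values), where ids is a list of (FID, IID) tuples and
    values is a list of per-sample lists of floats. Assumes no header
    and no missing-value codes. Hypothetical helper, not part of LDAK.
    """
    ids, values = [], []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        ids.append((fields[0], fields[1]))
        values.append([float(x) for x in fields[2:]])
    return ids, values


# Made-up rows standing in for the first lines of a .pheno file.
demo_rows = [
    "FAM1 IND1 0.52 -1.10 0.03",
    "FAM2 IND2 -0.37 0.88 1.94",
]
ids, values = parse_pheno_lines(demo_rows)
print(ids)
print(values)
```

To read a real file, pass an open file handle instead of the list, for example the result of open("HumDef.pheno").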
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

To generate basic genotypes for 100 samples and 1000 SNPs, we run

./ldak.out --make-snps snps --num-samples 100 --num-snps 1000

The genotypes are saved in the files snps.bed, snps.bim and snps.fam.