LDAK can simulate both phenotypic and genotypic data. Simulated phenotypes can be used to test the accuracy of SNP-based heritability analysis under different phenotypic models.
The argument for generating phenotypic values is
which requires the following options (to restrict to a subset of the data, see Data Filtering):
--bfile/--chiamo/--sp/--speed <prefix> - to specify data files (see File Formats)
--her <float> - to specify the heritability for the simulated phenotypes (i.e., the proportion of total phenotypic variation explained by the genetic contribution).
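For reference, this is the usual definition of heritability. Writing each phenotype $y$ as a genetic contribution (breeding value) $g$ plus noise $e$, and assuming $g$ and $e$ are independent, the amount of noise needed to reach a target heritability follows directly:

```latex
y = g + e, \qquad
h^2 = \frac{\operatorname{Var}(g)}{\operatorname{Var}(g) + \operatorname{Var}(e)}
\quad\Longrightarrow\quad
\operatorname{Var}(e) = \operatorname{Var}(g)\,\frac{1 - h^2}{h^2}.
```

For example, with --her 0.25 the noise variance must be three times the variance of the breeding values.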
To specify the numbers of phenotypes and causal predictors, you should use either
--num-phenos <integer> - number of phenotypes to generate
--num-causals <integer> - number of predictors contributing to each phenotype (to specify that all predictors are given an effect, you can use --num-causals -1).
or you can use one or both of
--causals <causalsfile> - to list which predictors are causal (assigned effect)
--effects <effectsfile> - to provide effect sizes
Both <causalsfile> and <effectsfile> should be text files with one column per phenotype and a number of rows equal to the number of causal predictors.
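If you prepare these files from a script, a small writer like the one below lays out one column per phenotype and one row per causal predictor. This is a hypothetical helper, not part of LDAK; the whitespace delimiter, the lack of a header row, and the idea that <causalsfile> holds predictor names are assumptions, so check them against File Formats before relying on it.

```python
from __future__ import annotations

from pathlib import Path
from typing import Sequence


def write_columns(path: str | Path, columns: Sequence[Sequence[object]]) -> None:
    """Write one column per phenotype; row i holds entry i of every column.

    Hypothetical helper: whitespace delimiter and no header row are
    assumptions, not confirmed LDAK requirements.
    """
    if not columns:
        raise ValueError("need at least one phenotype column")
    n_rows = len(columns[0])
    # Every phenotype must list the same number of causal predictors
    if any(len(col) != n_rows for col in columns):
        raise ValueError("all columns must have the same number of rows")
    lines = (" ".join(str(col[i]) for col in columns) for i in range(n_rows))
    Path(path).write_text("\n".join(lines) + "\n")
```

For two phenotypes with three causal predictors each, `write_columns("effects.txt", [[0.5, -0.2, 0.1], [0.3, 0.3, -0.4]])` produces a three-row, two-column file; the same call with predictor names (e.g. the hypothetical `"rs1"`, `"rs2"`, ...) would produce a causals file.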
LDAK will compute breeding values (the genetic contributions) and save these to <output>.breed (in PLINK format). The effect sizes used for this are drawn from a Gaussian distribution; to use a different distribution, provide your own effect sizes using --effects. LDAK then adds noise to produce phenotypes with the desired heritability, and saves these to <output>.pheno (also in PLINK format). The list of causal predictors and their effects is stored in <output>.effects.
Unless --causals is used, LDAK picks causal predictors at random. Note that a phenotype may end up with fewer causal predictors than requested if LDAK picks a predictor that subsequently fails the QC filters.
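The steps above (pick causal predictors, draw Gaussian effects, form breeding values, add noise scaled to the target heritability) can be sketched for a single phenotype as follows. This is an illustration under stated assumptions, not LDAK's implementation: genotypes are standardised before use and effects are drawn from N(0, 1), and LDAK's actual scaling of predictors may differ. The function name `simulate_phenotype` is hypothetical.

```python
from __future__ import annotations

import math
import random
import statistics


def simulate_phenotype(genotypes: list[list[int]], her: float,
                       num_causals: int, seed: int | None = None) -> dict:
    """Simulate one phenotype; genotypes[j][i] is predictor j in sample i.

    Assumptions (not confirmed LDAK behaviour): predictors are standardised
    and effect sizes are drawn from a standard Gaussian.
    """
    if not 0 < her <= 1:
        raise ValueError("this sketch requires 0 < her <= 1")
    rng = random.Random(seed)
    n_preds, n_samples = len(genotypes), len(genotypes[0])
    # Mirrors --num-causals -1: every predictor is given an effect
    k = n_preds if num_causals == -1 else num_causals
    causals = sorted(rng.sample(range(n_preds), k))   # causal predictors picked at random
    effects = [rng.gauss(0.0, 1.0) for _ in causals]  # Gaussian effect sizes

    # Breeding value = sum over causal predictors of effect * standardised genotype
    breed = [0.0] * n_samples
    for j, beta in zip(causals, effects):
        mean = statistics.fmean(genotypes[j])
        sd = statistics.pstdev(genotypes[j])
        if sd == 0:  # in this sketch a monomorphic predictor contributes nothing
            continue
        for i, x in enumerate(genotypes[j]):
            breed[i] += beta * (x - mean) / sd

    # Noise variance chosen so that Var(g) / (Var(g) + Var(e)) equals her
    var_e = statistics.pvariance(breed) * (1 - her) / her
    pheno = [g + rng.gauss(0.0, math.sqrt(var_e)) for g in breed]
    return {"causals": causals, "effects": effects, "breed": breed, "pheno": pheno}
```

Because the noise variance is set from the variance of the simulated breeding values, the realised heritability of each phenotype is close to, but not exactly, the requested value; with `her=1.0` no noise is added and the phenotypes equal the breeding values.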
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
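The argument for generating genotypic data is
--make-snps <output> --num-snps <integer> --num-samples <integer>
which requires the following options:
--num-samples <integer> - specifies the number of samples.
--num-snps <integer> - specifies the number of SNPs.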
LDAK will create a simple dataset, saved in SP format to <output>.sp, <output>.bim and <output>.fam. The generation process assumes Hardy-Weinberg equilibrium and linkage equilibrium (i.e., SNPs are generated independently of one another). By default, SNP minor allele frequencies (MAFs) lie between 0.01 and 0.5; this range can be changed using --minmaf <float> and --maxmaf <float> (see Data Filtering).
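The two assumptions translate directly into code: under Hardy-Weinberg equilibrium each sample's allele count is the sum of two independent allele draws, and under linkage equilibrium each SNP is generated independently. The sketch below assumes each SNP's MAF is drawn uniformly from [minmaf, maxmaf]; the distribution LDAK uses within that range is not stated here, and the function name `make_snps` is hypothetical. It returns genotypes in memory rather than writing SP files.

```python
from __future__ import annotations

import random


def make_snps(num_snps: int, num_samples: int, minmaf: float = 0.01,
              maxmaf: float = 0.5, seed: int | None = None):
    """Simulate minor-allele counts under Hardy-Weinberg and linkage equilibrium.

    Assumption: each SNP's MAF is drawn uniformly from [minmaf, maxmaf].
    Returns (mafs, genotypes) with genotypes[j][i] in {0, 1, 2}.
    """
    rng = random.Random(seed)
    mafs, genotypes = [], []
    for _ in range(num_snps):  # each SNP drawn independently: linkage equilibrium
        p = rng.uniform(minmaf, maxmaf)
        # Two independent allele draws per sample: Hardy-Weinberg equilibrium
        genotypes.append([(rng.random() < p) + (rng.random() < p)
                          for _ in range(num_samples)])
        mafs.append(p)
    return mafs, genotypes
```

Genotypes produced this way have no correlation between SNPs, which is why such data are convenient for simple tests but do not reflect the linkage disequilibrium present in real data.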