LDAK includes a generalized REML (restricted maximum likelihood) solver for estimating the heritabilities contributed by kinship matrices and/or regions. A region is a (usually small) subset of predictors; providing a region to REML is equivalent to first calculating a kinship matrix across the predictors in this region, then providing this matrix to REML. Note that if there are very many samples, you might prefer to use Haseman-Elston Regression, while if the phenotype is binary, you might prefer to use PCGC Regression.
Always read the screen output, which suggests arguments and estimates memory usage.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
The main argument is --reml <outfile>.
The only required option is
--pheno <phenofile> - to specify phenotypes (in PLINK format). Samples without a phenotype will be excluded. If <phenofile> contains more than one phenotype, specify which should be used with --mpheno <integer>.
However, in most cases, you will also use --grm <kinfile> or --mgrm <kinstems> to provide one or more kinship matrices.
To provide regions, use --region-number <integer> and --region-prefix <regprefix>, where the files <regprefix>1, <regprefix>2, ..., list the predictors in each region. You must also specify the genetic data files with --bfile/--gen/--sp/--speed <datastem>, and use --weights <weightsfile> (or --ignore-weights YES) as well as --power <float> to indicate how to scale predictors. By default, LDAK will remove a regional predictor if (effectively) identical to one which remains (correlation squared > 0.98); to change this threshold use --region-prune <float>.
You can use --keep <keepfile> and/or --remove <removefile> to restrict to a subset of samples (e.g., to exclude ancestral outliers or relatedness).
Use --covar <covarfile> to provide covariates (in PLINK format) as fixed effects in the regression; when calculating heritabilties, the phenotypic variance explained by these will be discounted.
To include some predictors as fixed effects, use --top-preds <toppredsfile>, in which case you must also specify the genetic data files with --bfile/--gen/--sp/--speed <datastem>; when calculating heritabilities, the phenotypic variance explained by these predictors will be added to that explained by the kinship matrices and regions. Usually, <toppredsfile> will contain a pruned subset of highly-associated predictors; for more details, see the section "Accommodating loci with very large effects" in our paper Reevaluation of SNP heritability in complex human traits (Nature Genetics, 2017).
If the phenotype is binary, you can use --prevalence <float> to specify the population prevalence. LDAK will then additionally report heritability estimates on the liability scale (note that for binary traits, it is often preferable to use PCGC Regression).
By default, LDAK will not restrict heritability estimates to be within [0,1]. This is generally our preference, as to obtain unbiased estimates of heritabilities near zero, it is necessary to accept the possibility of negative estimates. If you would instead prefer that all heritability estimates are within [0,1], you should add --constrain YES.
By default, LDAK will read into memory all kinship matrices at the start. If there are many kinship matrices, this can require large amounts of memory; therefore, consider adding --memory-save YES, and LDAK will instead read kinship matrices on-the-fly each time they are required.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
The main output files are
<outfile>.reml - contains estimates of the heritability contributed by each kinship matrix, region and the top predictors. It also reports for each kinship matrix and region the (mega) intensity, which equals its heritability divided by its size (x 1,000,000); this can be useful for assessing the relative importance of the kinship matrices and regions.
<outfile>.share - contains estimates of the fraction of heritability explained by each kinship matrix and region. It also reports for each the enrichment, which equals its estimated fraction divided by its expected fraction (assuming no enrichment).
<outfile>.coeff - contains estimates of the fixed effects (if no covariates are provided, the only fixed effect will be the intercept).
<outfile>.indi.blp - contains estimates of the random effects (breeding values). There is a pair of columns for each kinship matrix or region: the second of each pair provides g, the estimated breeding values; the first provides K-1g, the estimated breeding values pre-multiplied by the inverse of the kinship matrix (for regions, the first column in each pair is set to 0).
<outfile>.reg.blup and <outfile>.reg.score - contain estimates of effect sizes for regional and top predictors (to get the effect size estimates for the kinship matrix predictors you must use BLUP).
<outfile>.indi.res - contains the residuals after subtracting from the phenotypic values the estimated contributions of the kinship matrices, regions, covariates and top predictors.
If the phenotype is binary and its prevalence is specified, there will be additional output files with the suffix .liab, which provide estimates converted from the observed to the liability scale.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Example:
Here we use the binary PLINK files human.bed, human.bim and human.fam, the phenotypes quant.pheno and binary.pheno, and the lists of SNPs part1 and part2 from the Test Datasets. We also use the kinship matrix with stem LDAK-Thin created in the example for Calculate Kinships.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
To regress the phenotype on the kinship matrix, run
./ldak.out --reml reml1 --pheno quant.pheno --grm LDAK-Thin
By viewing reml1.reml, we see that the estimate of the heritability contributed by the kinship matrix is 0.62 (SD 0.08).
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
We can include a region using the command
./ldak.out --reml reml2 --pheno quant.pheno --grm LDAK-Thin --region-prefix part --region-number 1 --bfile human --ignore-weights YES --power -.25
The estimated heritabilities contributed by the kinship matrix and region are 0.49 (SD 0.12) and 0.12 (SD 0.08), respectively.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
We can instead include two regions by running
./ldak.out --reml reml3 --pheno quant.pheno --grm LDAK-Thin --region-prefix part --region-number 2 --bfile human --ignore-weights YES --power -.25
The estimated heritabilities contributed by the kinship matrix and two regions are 0.54 (SD 0.18), 0.11 (SD 0.09) and -0.03 (SD 0.08), respectively. It is not a concern that one of the estimates is negative; you should only concerned if an estimate is significantly negative (which likely indicates a poor choice of heritability model). However, if you want to ensure heritabilities are bounded within [0,1], add --constrain YES.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Finally, we analyse the binary phenotype binary.pheno, which we suppose corresponds to a disease with prevalence 1%.
./ldak.out --reml reml4 --pheno binary.pheno --grm LDAK-Thin --prevalence .01
We now have pairs of heritability estimates; those in reml4.reml correspond to the observed scale, while those in reml4.reml.liab are on the liability scale. The latter are typically preferred, as they are not sensitive to the ascertainment (the proportion of cases and controls used in the analysis).