Estimate Alpha

In the heritability model, the power parameter alpha determines how the expected heritability contributed by a SNP depends on its minor allele frequency. Positive (negative) alpha means that more-common SNPs tend to have larger (smaller) effect sizes than less-common SNPs (see Technical Details for the mathematical details). We are not aware of a way to estimate alpha directly. Therefore, we instead try multiple values of alpha, seeing which fits the data best.

The process for estimating alpha is the very similar to that for estimating SNP Heritability, in that you also use --sum-hers <outfile> to regress the summary statistics onto a tagging file. The difference is that it is now necessary to repeat this multiple times (each time, using a tagging file computed assuming a different value for alpha). Either you can download Pre-computed Taggings (calculated assuming the BLD-LDAK-Lite Model), or Calculate Taggings yourself (in which case, we recommend you assume the Human Default Model)

We recommend using 31 values of alpha, ranging from -1 to 0.5 (step size 0.05). Having performed the analyses, you should run the command --find-gaussian <outfile>. LDAK will then report two estimates of alpha: the first is simply the value that resulted in highest loglSS (our approximate log likelihood), while the second is obtained by fitting a normal distribution to the realisations of loglSS.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Example:

Here we use the binary PLINK files human.bed, human.bim and human.fam, and the phenotype quant.pheno from the Test Datasets.

In order to estimate alpha, we require summary statistics (i.e., the results from regressing the phenotype on each SNP individually). We can obtain these by running

./ldak.out --linear quant --bfile human --pheno quant.pheno

The summary statistics are saved in quant.summaries (already in the format required by SumHer). For more details on this command, see Single-Predictor Analysis.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

We create 31 tagging files, corresponding to different values of alpha, by running

for j in {-20..10}; do
alpha=`echo $j | awk '{print -$1/20}'`; echo $alpha
./ldak.out --calc-tagging HumDef.$j --bfile human --power $alpha
done

We regress the summary statistics on each tagging file

for j in {-20..10}; do
./ldak.out --sum-hers alpha.$j --summary quant.summaries --tagfile HumDef.$j.tagging
done

We could manually obtain an estimate of alpha by identifying the analysis that resulted in highest model fit. The following script extracts loglSS (for the alternative model) from each .<extra> file, then prefixes this by the corresponding value of alpha, then sorts.

for j in {-20..10}; do
alpha=`echo $j | awk '{print -$1/20}'`;
grep Alt_logl alpha.$j.extra | awk -v alpha=$alpha '{print alpha, $2}'
done | sort -n -k 2

We see the highest model fit (the bottom row of the screen output) corresponds to alpha=-.50. However, a better approach is to save the output to a file, then provide this to LDAK

for j in {-20..10}; do
alpha=`echo $j | awk '{print -$1/20}'`;
grep Alt_logl alpha.$j.extra | awk -v alpha=$alpha '{print alpha, $2}'
done > alpha.likes
.
/ldak.out --find-gaussian alpha --likelihoods alpha.likes

The results are saved in alpha.power. This contains two estimates. The mode estimate is -0.50, which is the value of alpha resulting in highest loglSS. The mean estimate is -0.69 (SD 0.50), which is obtained by identifying the Gaussian Distribution that best fits the realisations of loglSS. Note that for real data, the mean estimate is almost always adjacent to the mode estimate (the large difference we observe here, reflects that this is a toy example).