Worked Examples

If you have access to individual-level data, we recommend using Bolt-Predict to construct a Bolt prediction model assuming the BLD-LDAK Model. If you only have access to summary statistics, we recommend using MegaPRS to construct a BayesR prediction model assuming the BLD-LDAK Model. Here we provide complete examples for using Bolt-Predict and MegaPRS. The first example should take about 30 minutes, the second should take about three hours to run on a single CPU or less than an hour if you have access to a computer cluster.

To run these examples, you require the LDAK executable (obtain this from Downloads), which we assume you have renamed ldak.out. To avoid duplicate file names, we suggest you run each example in its own folder (which at the start contains only the LDAK executable).
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Example for running Bolt-Predict (use individual-level data):

Here we use the binary PLINK files human.bed, human.bim and human.fam, and the phenotype quant.pheno from the Test Datasets.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

1 - Preparation

Download and extract the test dataset

wget https://www.dropbox.com/s/5vcfcree3xnvhew/data.zip?dl=1 -O data.zip
unzip data.zip
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

2 - Estimate per-predictor heritabilities, assuming the BLD-LDAK Model.

Compute summary statistics (for more details, see Single-Predictor Analysis).

./ldak.out --linear quant --bfile human --pheno quant.pheno

Download and extract the first 64 SNP annotations for the BLD-LDAK Model (see BLD-LDAK Annotations).

wget https://genetics.ghpc.au.dk/doug/bld.zip
unzip bld.zip

Compute LDAK weightings, and rename these bld65 (see LDAK Weightings).

./ldak.out --cut-weights sections --bfile human
./ldak.out --calc-weights-all sections --bfile human
mv sections/weights.short bld65

Calculate tagging file assuming the BLD-LDAK Model (see Calculate Taggings).

./ldak.out --calc-tagging bld.ldak --bfile human --ignore-weights YES --power -.25 --annotation-number 65 --annotation-prefix bld --window-cm 1 --save-matrix YES

Estimate heritability contributed by each SNP (see SNP Heritability).

./ldak.out --sum-hers bld.ldak --tagfile bld.ldak.tagging --summary quant.summaries --matrix bld.ldak.matrix
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

3 - Create the prediction model.

Estimate effect sizes (see Bolt-Predict).

./ldak.out --bolt bolt --pheno quant.pheno --bfile human --ind-hers bld.ldak.ind.hers --cv-proportion .1
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

The final model is saved in bolt.effects.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Example for running MegaPRS (uses summary statistics):

Here we use summary statistics from the 2014 meta-analysis of height from the GIANT Consortium, while the reference panel uses genotype data for non-Finnish europeans from the 1000 Genomes Project.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

1 - Preparation

Download and extract a non-Finnish european reference panel constructed from 1000 Genomes Project data (the scripts we used to make this panel are in the example of Reference Panel). Note that we recommend using a larger panel (at least 2000 samples).

wget https://genetics.ghpc.au.dk/doug/ref.tgz
tar -xzvf ref.tgz

Download and format the summary statistics (for more details, see Summary Statistics). The raw results use rs ids, so we will use the file ref.names (downloaded in the previous step), to convert these to names of the form Chr:BP, and to exclude SNPs incompatible with our reference panel.

#Download results from GIANT website, and extract SNPs with single-basepair alleles
wget https://portals.broadinstitute.org/collaboration/giant/images/0/01/GIANT_HEIGHT_Wood_et_al_2014_publicrelease_HapMapCeuFreq.txt.gz
gunzip -c GIANT_HEIGHT_Wood_et_al_2014_publicrelease_HapMapCeuFreq.txt.gz | awk '(NR>1){snp=$1;a1=$2;a2=$3;dir=$5;stat=($5/$6)^2;n=$8}(NR==1){print "Predictor A1 A2 Direction Stat n"}(NR>1 && (a1=="A"||a1=="C"||a1=="G"||a1=="T") && (a2=="A"||a2=="C"||a2=="G"||a2=="T")){print snp, a1, a2, dir, stat, n}' - > height.raw

#Use ref.names to convert rs ids and exclude SNPs with inconsistent alleles
awk '(NR==FNR){arr[$1]=$2;ars[$1]=$3$4;next}(FNR==1){print $0}($1 in arr && ($2$3==ars[$1]||$3$2==ars[$1])){$1=arr[$1];print $0}' ref.names height.raw > height.txt

For all analyses, we will only use non-ambiguous reference panel SNPs for which we have summary statistics

awk '(NR==FNR && (($2=="A"&&$3=="C") || ($2=="A"&&$3=="G") || ($2=="C"&&$3=="A") || ($2=="C"&&$3=="T") || ($2=="G"&&$3=="A") || ($2=="G"&&$3=="T") || ($2=="T"&&$3=="C") || ($2=="T"&&$3=="G")) ){arr[$1];next}($2 in arr){print $2}' height.txt ref.bim > use.snps

As we will be using pseudo summary statistics, we must divide the reference panel samples into three (see Pseudo Summaries).

rm keepa keepb keepc
awk < ref.fam '(NR%3==1){print $0 > "keepa"}(NR%3==2){print $0 > "keepb"}(NR%3==0){print $0 > "keepc"}'
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

2 - Estimate per-predictor heritabilities, assuming the BLD-LDAK Model.

Download and extract the first 64 SNP annotations for the BLD-LDAK Model (see BLD-LDAK Annotations).

wget https://genetics.ghpc.au.dk/doug/bld.zip
unzip bld.zip

Compute LDAK weightings, and rename these bld65 (see LDAK Weightings).

for j in {1..22}; do
./ldak.out --cut-weights sections$j --bfile ref --extract use.snps --chr $j

./ldak.out --calc-weights-all sections$j --bfile ref --extract use.snps --chr $j
done
cat sections{1..22}/weights.short > bld65

Calculate tagging file assuming the BLD-LDAK Model (see Calculate Taggings).

./ldak.out --calc-tagging bld.ldak --bfile ref --extract use.snps --ignore-weights YES --power -.25 --annotation-number 65 --annotation-prefix bld --window-cm 1 --save-matrix YES

Estimate heritability contributed by each SNP (see SNP Heritability).

./ldak.out --sum-hers bld.ldak --tagfile bld.ldak.tagging --summary height.txt --matrix bld.ldak.matrix
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

3 - Create the prediction model.

Generate pseudo training and test summary statistics

./ldak.out --pseudo-summaries height --bfile ref --extract use.snps --summary height.txt --training-proportion .9 --keep keepa

Calculate predictor-predictor correlations

./ldak.out --calc-cors cors --bfile ref --extract use.snps --window-kb 3 --keep keepb

Estimate effect sizes (see MegaPRS).

./ldak.out --mega-prs bayesR --model bayesR --bfile ref --extract use.snps --cors cors --ind-hers bld.ldak.ind.hers --summary height.txt --summary2 height.train.summaries --window-kb 1

Test prior distributions

./ldak.out --calc-scores bayesR --bfile ref --keep keepc --scorefile bayesR.effects.train --summary height.test.summaries --power 0 --final-effects bayesR.effects.final
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

The final model is saved in bayesR.effects.best.