Genetic Correlations

We now have two sets of summary statistics and our aim is to calculate the genetic correlation between the corresponding pair of traits. Having calculated the Tagging File (see below for tips when doing this), use the command

--sum-cors <output>

which requires the options

--tagfile <tagging_file> - specifies the tagging file

--summary <sum_file1> and --summary2 <sum_file2> - specify the summary statistics

While not required, we strong advise you add --genomic-control YES (we found this guards against misspecification of the Heritability Model at limited cost to precision).

It is possible to use --extract and --exclude here, but ideally all filtering should be performed when constructing the tagging file (and if using the LDAK Model, also when computing predictor weightings).

By default, LDAK will ignore alleles with ambiguous alleles (those with alleles A & T or C & G) to protect against possible strand errors. If you are confident these are correctly aligned, you can force LDAK to include them by adding --allow-ambiguous YES.

LDAK will report an error if it finds predictors for which the alleles in the summary statistics file are not compatible with those in the tagging file. If only a few predictors are affected, it should be OK to override this error by adding --check-sums NO.

The estimate of genetic correlation is saved in <output>.cors.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Example: for this we assume the Reference Panel is stored in binary PLINK format in the files ref.bed, ref.bim and ref.fam. We use the files height.txt, height.nonamb and height.exclude constructed in the example for Summary Statistics, as well as the weightings file sumsect/weights.short computed in the example for Heritability Model.

We require a second set of summary statistics, so we use results from a GWAS for Alzheimers by Lambert et al.; these are saved in the file IGAP_stage_1.txt available at http://web.pasteur-lille.fr/en/recherche/u744/igap/igap download.php.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

First we format the summary statistics for Alzheimers

awk < IGAP_stage_1.txt '(NR>1){snp=$3;a1=$4;a2=$5;dir=$6;stat=($6/$7)^2;n=17008+37154}(NR==1){print "Predictor A1 A2 Direction Stat n"}(NR>1 && (a1=="A"||a1=="C"||a1=="G"||a1=="T") \
&& (a2=="A"||a2=="C"||a2=="G"||a2=="T")){print snp, a1, a2, dir, stat, n}' > alz.txt

head -n 5 alz.txt
Predictor A1 A2 Direction Stat n
rs28544273 A T -0.0146 0.186583 54162
rs143225517 C T -0.0146 0.186583 54162
rs3094315 G A -0.0122 0.172197 54162
rs61770173 C A -0.0126 0.138147 54162

Check for and remove duplicates;

awk < alz.txt '{print $1}' | sort | uniq -d | head
mv alz.txt alz2.txt
awk '!seen[$1]++' alz2.txt > alz.txt

To identify the non-ambiguous predictors , use

awk < alz.txt '( ($2=="A"&&$3=="C") || ($2=="A"&&$3=="G") || ($2=="C"&&$3=="A") || ($2=="C"&&$3=="T") || ($2=="G"&&$3=="A") || ($2=="G"&&$3=="T") || ($2=="T"&&$3=="C") || ($2=="T"&&$3=="G") ){print $1}' > alz.nonamb

To identify predictors inside the MHC, and those which individually explain >1% of phenotypic variance, or are in LD with predictors which do, use

awk < ref.bim '($1==6 && $4>25000000 && $4<34000000){print $2}' > mhc.snps
awk < alz.txt '(NR>1 && $5>$6/99){print $1}' > alz.big
../ldak.out --remove-tags alz --bfile ref --top-preds alz.big --window-cm 1 --min-cor .1
cat mhc.snps alz.out > alz.exclude
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

When calculating weightings and taggings, we wish to extract only predictors in both height.nonamb and alz.nonamb, and exclude predictors in either height.exclude or alz.exclude

awk '(NR==FNR){arr[$1];next}($1 in arr)' height.nonamb alz.nonamb > both.nonamb
cat height.exclude alz.exclude | sort | uniq > both.exclude

Now we calculate weightings (we speed up by dividing by chromosome)

for j in {1..22}; do
../ldak.out --cut-weights corsect$j --bfile ref --extract both.nonamb --exclude both.exclude --chr $j
../ldak.out --calc-weights-all corsect$j --bfile ref --extract both.nonamb --exclude both.exclude --chr $j
done

mkdir corsect
cat corsect{1..22}/weights.short > corsect/weights.short

then use these to calculate the tagging file corldak.tagging

../ldak.out --calc-tagging corldak --bfile ref --weights corsect/weights.short --power -0.25 --window-cm 1 --extract both.nonamb --exclude both.exclude

Finally, we regress the two sets of summary statistics onto the tagging file (there will likely be a few inconsistent alleles, so add --check-sums NO to ignore).

../ldak.out --sum-cors corldak --tagfile corldak.tagging --summary height.txt --summary2 alz.txt --genomic-control YES

The results are saved in corldak.cors.