Reference Panel

To use SumHer (or any summary-statistics method), you require a reference panel, which is used to estimate SNP-SNP correlations. This reference panel should be ancestrally similar to the GWAS from which the summary statistics come. When analysing results from European GWAS, our preferred reference panel consists of imputed, autosomal, common SNPs for 8,850 unrelated Caucasian individuals from the Health and Retirement Study, whose genotype data are available upon application from dbGaP (accession code phs000428.v2.p2).

We prefer the Health and Retirement Study because its large sample size enables more accurate estimation of SNP-SNP correlations. However, an alternative is to use genotype data from the ancestrally matched individuals in the 1000 Genomes Project. Below are scripts for downloading and extracting genotype data for the 404 non-Finnish Europeans. They make use of PLINK 1.9, which is available here.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

1 - Download sample IDs and extract non-Finnish Europeans

awk < integrated_call_samples_v3.20130502.ALL.panel '($3=="EUR" && $2!="FIN"){print $1, $1}' > eur.keep
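To see what the filter keeps, here is a minimal sketch that applies the same awk condition to a toy panel file. The file names toy.panel and toy.keep and the three sample rows are illustrative assumptions; only the four-column layout (sample, population, super-population, gender) is meant to mirror the real panel file.

```shell
# Toy panel: header plus three illustrative rows (not taken from the real file)
printf 'sample\tpop\tsuper_pop\tgender\n' > toy.panel
printf 'HG00096\tGBR\tEUR\tmale\n' >> toy.panel
printf 'HG00171\tFIN\tEUR\tfemale\n' >> toy.panel
printf 'NA18486\tYRI\tAFR\tmale\n' >> toy.panel

# Same condition as above: keep Europeans, drop Finnish, print ID twice (FID and IID)
awk < toy.panel '($3=="EUR" && $2!="FIN"){print $1, $1}' > toy.keep
cat toy.keep
# prints: HG00096 HG00096
```

On the real data, wc -l < eur.keep should report 404, matching the number of non-Finnish Europeans mentioned above.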

2 - Download data for each autosome, and convert using PLINK, extracting European individuals and SNPs with MAF>0.01

for j in {1..22}; do
./plink --vcf ALL.chr$j.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz \
--make-bed --out chr$j --maf 0.01 --keep eur.keep
done
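Before joining the chromosomes, it may be worth confirming that a complete, non-empty fileset exists for every autosome. The following is a minimal sketch; the function name count_missing is hypothetical (it is not part of PLINK or LDAK), and it assumes the chr$j prefix used above.

```shell
# Count how many of the 66 expected files (22 autosomes x bed/bim/fam)
# are missing or empty in the given directory
count_missing() {
  dir=$1
  missing=0
  for j in $(seq 1 22); do
    for ext in bed bim fam; do
      [ -s "$dir/chr$j.$ext" ] || missing=$((missing+1))
    done
  done
  echo "$missing"
}

# Check the current directory; 0 means every fileset is present
count_missing .
```

A non-zero count usually points to a failed download or a PLINK run that stopped early, in which case the corresponding chr$j.log file is the first place to look.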

3 - Join these together, excluding multi-allelic SNPs and those with duplicate positions

rm -f list.txt; for j in {1..22}; do echo chr$j >> list.txt; done
./ldak5.linux --make-bed all --mbfile list.txt --exclude-odd YES --exclude-dups YES

The genotype data will now be stored in binary PLINK format in the files all.bed, all.bim and all.fam.

4 - Some predictors have non-unique names (49 in my data), so identify these and replace them with generic names of the form chr:bp

awk < all.bim '{print $2}' | sort | uniq -d > dup.snps
awk '(NR==FNR){arr[$1];next}($2 in arr){$2=$1":"$4}{print $0}' dup.snps all.bim > clean.bim
cp all.bed clean.bed
cp all.fam clean.fam
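The renaming can be checked on a small example. This sketch runs the same two awk commands on a toy .bim file in which the name rs2 appears twice; the file names toy.bim, toy.dup and toy.clean.bim are illustrative assumptions.

```shell
# Toy .bim file: columns are chr, name, cM, bp, A1, A2; rs2 is duplicated
printf '1 rs1 0 100 A G\n1 rs2 0 200 C T\n1 rs2 0 300 A C\n2 rs3 0 150 G T\n' > toy.bim

# List names that occur more than once
awk '{print $2}' toy.bim | sort | uniq -d > toy.dup

# Replace each duplicated name with chr:bp, leave the others untouched
awk '(NR==FNR){arr[$1];next}($2 in arr){$2=$1":"$4}{print $0}' toy.dup toy.bim > toy.clean.bim
cat toy.clean.bim

# No names should be duplicated after renaming (prints nothing)
awk '{print $2}' toy.clean.bim | sort | uniq -d
```

The two rs2 lines become 1:200 and 1:300, which are unique because Step 3 already removed predictors with duplicate positions. Note that the NR==FNR idiom assumes the list of duplicates is non-empty; if dup.snps is empty, awk treats the .bim file as the first file and writes nothing, so in that case simply copy all.bim to clean.bim.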

5 - Finally, we need to obtain and insert genetic distances


for j in {1..22}; do
./plink --bfile clean --chr $j --cm-map genetic_map_b37/genetic_map_chr@_combined_b37.txt --make-bed --out map$j
done

cat map{1..22}.bim | awk '{print $2, $3}' > map.all
awk '(NR==FNR){arr[$1]=$2;next}{print $1, $2, arr[$2], $4, $5, $6}' map.all clean.bim > ref.bim
cp all.bed ref.bed
cp all.fam ref.fam
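The final awk command looks up each predictor's genetic distance by name, so a predictor absent from map.all would be written with an empty third column. A minimal sketch of the lookup, and of one way to detect such lines, using toy files (toy.map, toy.clean.bim and toy.ref.bim are illustrative names):

```shell
# Toy map (name, cM) and a toy .bim file in which rs9 has no map entry
printf 'rs1 0.5\n1:200 0.7\n' > toy.map
printf '1 rs1 0 100 A G\n1 1:200 0 200 C T\n1 rs9 0 300 A C\n' > toy.clean.bim

# Same lookup as above: column 3 is replaced by the distance stored under the predictor name
awk '(NR==FNR){arr[$1]=$2;next}{print $1, $2, arr[$2], $4, $5, $6}' toy.map toy.clean.bim > toy.ref.bim

# A line with a missing distance has only five whitespace-separated fields
bad=$(awk '(NF!=6){n++}END{print n+0}' toy.ref.bim)
echo "$bad"
# prints: 1
```

Running the same field-count check on ref.bim should report 0; anything else suggests that one of the map$j filesets was not created.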

If these scripts have run successfully, your reference panel is saved in binary PLINK format in the files ref.bed, ref.bim and ref.fam (you can remove the files with prefixes chr, clean, map and all).