Functional Annotations

When estimating the Heritability Enrichments of categories, it is first necessary to construct files indicating which predictors belong to each category. For our recent work, we followed the approach of Finucane et al. and considered 24 categories defined by functional annotations. Here we explain how to construct the predictor lists for these categories. We also explain how to construct lists of genic and exonic SNPs, based on RefSeq annotations.

For these scripts, we assume the Reference Panel is stored in binary PLINK format in the files ref.bed, ref.bim and ref.fam. Note that predictor lists are relative to the reference panel, so if you change the panel, you must work out again which predictors are in each category.
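Once a predictor list has been built, one way to spot a list that is out of date relative to the panel is to look for predictors that no longer appear in the .bim file. The following is only a sketch: the function name stale_predictors is hypothetical, and it assumes predictor names are in column 2 of the .bim file (the standard PLINK layout) and in column 1 of the list file.

```shell
# Hypothetical helper (not part of LDAK): print predictors that are in a list
# file but absent from the reference panel.
# Assumptions: predictor names are in column 2 of the .bim file and in
# column 1 of the list file; the .bim file is not empty.
stale_predictors() {
    panel_bim="$1"
    list_file="$2"
    # First file: remember every panel predictor. Second file: print the
    # names that were not seen in the panel.
    awk 'NR==FNR {in_panel[$2]; next} !($1 in in_panel) {print $1}' \
        "$panel_bim" "$list_file"
}
```

For example, stale_predictors ref.bim ann_snps.1 should print nothing if the list still matches the panel; any names it prints suggest the list should be rebuilt.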
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

24 Functional Annotations

First download and unzip the genefiles for the 24 categories (note that on some computers, it is necessary to add --no-check-certificate to the wget command; if you have any problems, you can instead visit the Dropbox page in your browser).

wget -O annotations.zip "https://www.dropbox.com/s/jmjwyai27g09ybc/annotations.zip?dl=1"
unzip annotations.zip

After unzipping, you will see 24 genefiles providing the genomic positions for each category (the names of categories are provided in annotation.names). To work out which predictors are in each category, use the command

for j in {1..24}; do
  ../ldak.out --cut-genes ann$j --bfile ref --genefile ann$j.genefile --ignore-weights YES
  mv ann$j/genes.predictors.used ann_snps.$j
done

The lists will be saved in ann_snps.1, ann_snps.2, ..., ann_snps.24.
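Before using the lists, it can help to confirm that all of them were created and to see how many predictors each one contains. The sketch below is one way to do this; count_lists is a hypothetical helper name, and it simply counts lines, assuming one predictor per line.

```shell
# Hypothetical helper (not part of LDAK): for lists named prefix.1 ... prefix.n,
# print "index count", or "index MISSING" if the file does not exist.
# Assumption: each list holds one predictor per line.
count_lists() {
    prefix="$1"
    n="$2"
    j=1
    while [ "$j" -le "$n" ]; do
        if [ -f "$prefix.$j" ]; then
            # awk counts a final line even if it lacks a trailing newline
            echo "$j $(awk 'END {print NR}' "$prefix.$j")"
        else
            echo "$j MISSING"
        fi
        j=$((j + 1))
    done
}
```

For the lists above, this would be called as count_lists ann_snps 24. A count of 0 or a MISSING entry suggests that the corresponding --cut-genes run did not finish as expected.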
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Genic and Exonic Predictors

Download and extract the RefSeq genefiles for genes and exons (again, you may have to add --no-check-certificate).

wget -O refseq.zip "https://www.dropbox.com/s/d14dazmfcgc56kj/refseq.zip?dl=1"
unzip refseq.zip

To find out which predictors are within genes and exons, use

../ldak.out --cut-genes genes --bfile ref --genefile refseq_genes.txt --ignore-weights YES
../ldak.out --cut-genes exons --bfile ref --genefile refseq_exons.txt --ignore-weights YES

The lists of predictors will be stored in genes/genes.predictors.used and exons/genes.predictors.used. To also include predictors near genes or exons, you can add --gene-buffer. For example

../ldak.out --cut-genes genes_1000 --bfile ref --genefile refseq_genes.txt --ignore-weights YES --gene-buffer 1000

will identify predictors inside or within 1000 basepairs of a gene. The command

awk '(NR==FNR){arr[$1];next}!($1 in arr){print $1}' genes/genes.predictors.used genes_1000/genes.predictors.used > near.genes

will identify which predictors are in genes_1000/genes.predictors.used but not in genes/genes.predictors.used (i.e., those which are not inside genes, but are within 1000 basepairs of one), and save them in near.genes.
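As a quick consistency check on near.genes, the number of predictors it contains should equal the size of the buffered list minus the size of the unbuffered list. The sketch below assumes that every predictor inside a gene also appears in the buffered list (which we would expect, because the buffer only extends each region) and that no predictor is listed twice; check_near_count is a hypothetical helper name.

```shell
# Hypothetical helper (not part of LDAK): check that the "near" list has
# as many lines as the buffered list minus the unbuffered list.
# Assumptions: the unbuffered list is a subset of the buffered list, and
# each file holds one predictor per line with no duplicates.
check_near_count() {
    inner="$1"   # e.g. genes/genes.predictors.used
    outer="$2"   # e.g. genes_1000/genes.predictors.used
    near="$3"    # e.g. near.genes
    n_inner=$(awk 'END {print NR}' "$inner")
    n_outer=$(awk 'END {print NR}' "$outer")
    n_near=$(awk 'END {print NR}' "$near")
    if [ "$n_near" -eq $((n_outer - n_inner)) ]; then
        echo "OK: $n_near predictors near but not inside"
    else
        echo "MISMATCH: inner=$n_inner outer=$n_outer near=$n_near"
        return 1
    fi
}
```

Here it would be called as check_near_count genes/genes.predictors.used genes_1000/genes.predictors.used near.genes. A mismatch does not necessarily mean the awk command failed; it may simply mean one of the assumptions above does not hold for your files.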