High-LD Regions

Click here to download highld.txt, a file that contains a list of high-LD regions in the human genome (generated by the Abecasis Group). The positions are from the GRCh37/hg19 assembly, so before using this file you should ensure that the genomic positions in your genetic data files are also from GRCh37/hg19. If not, you can update them using the LiftOver Tool.

We recommend excluding SNPs in high-LD regions when calculating the accuracy of training prediction models constructed using Pseudo Summaries. To identify which SNPs are within the high-LD regions, you can use the command --cut-genes (see Gene-based Analysis for more details, or below for an example).
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Example:

Here we use the binary PLINK files human.bed, human.bim and human.fam from the Test Datasets, and the file highld.txt (downloaded from the top of this page).

Note that the test data files include only Chromosomes 21 and 22, neither of which contains a high-LD region. Therefore, for demonstration purposes only, we add a fake region at the start of Chromosome 21, by running

echo "Region25 21  14600000 14700000" | cat highld.txt - > highld.fake
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

To identify which SNPs are within a high-LD region, we run

./ldak.out --cut-genes highld --bfile human --genefile highld.fake

This finds four SNPs (all within the fake high-LD region), which are saved in the file highld/genes.predictors.used.
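
To make the idea of a SNP being "within a region" concrete, here is a minimal sketch that performs a similar lookup with awk on toy data. It is not how LDAK implements --cut-genes. It assumes the region file has four columns (name, chromosome, start, end), as in the fake region above, that the .bim file follows the standard PLINK layout (chromosome, SNP name, genetic distance, base-pair position, two alleles), and that both region boundaries are inclusive. The file names and all values are hypothetical.

```shell
# Toy inputs (hypothetical values, not taken from highld.txt or the test data)
tmp=$(mktemp -d)
printf 'RegionA 21 14600000 14700000\nRegionB 22 100 200\n' > "$tmp/regions.txt"
printf '21 snp1 0 14500000 A G\n21 snp2 0 14650000 A G\n21 snp3 0 14700000 C T\n22 snp4 0 150 C T\n22 snp5 0 250 C T\n' > "$tmp/toy.bim"

# First file: remember each region's chromosome, start and end.
# Second file: print the SNP name (column 2) if its chromosome matches a
# region and its position (column 4) lies between start and end, inclusive.
awk 'NR==FNR { chr[NR]=$2; start[NR]=$3+0; stop[NR]=$4+0; n=NR; next }
     { for (i=1; i<=n; i++)
         if ($1==chr[i] && $4+0>=start[i] && $4+0<=stop[i]) { print $2; break } }' \
    "$tmp/regions.txt" "$tmp/toy.bim" > "$tmp/inregion.txt"

# Prints snp2, snp3 and snp4, one per line
cat "$tmp/inregion.txt"
```

With real data, the output of --cut-genes remains the reference; a sketch like this is only useful as a quick sanity check of which positions fall inside the listed regions.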