Genomic Partitioning

Genomic partitioning is a popular approach for interrogating the genetic architecture of complex traits [1]. It involves estimating the heritability contributed by subsets of predictors, to help narrow down the search for causal variants. These subsets will normally be disjoint (hence the term partitioning), but need not be. For example, to compare the total contribution of genic and inter-genic SNPs, we might divide the genome into those SNPs within, say, 20kbp of a gene and those not. Alternatively, we might wish to perform pathway analysis and consider possibly overlapping subsets of SNPs.

To consider subsets of predictors, use the options --partition-number and --partition-prefix when calculating kinships. For example, create index files list1, list2 and list3, then when cutting the genome (the --cut-kins step) add --partition-number 3 and --partition-prefix list.
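
As an illustration, the index files could be built from a PLINK .bim file with standard shell tools. This is only a sketch: it assumes each index file holds one predictor name per line (taken from column 2 of the .bim file) and that the subsets are defined by row ranges. The helper names bim_range and union_lists are hypothetical, not part of LDAK, and the ranges are those of Example C further down.

```shell
# Hypothetical helper: print the predictor names (column 2 of a PLINK .bim
# file) for rows START..END inclusive.  Usage: bim_range FILE START END
bim_range() {
    awk -v s="$2" -v e="$3" 'NR >= s && NR <= e { print $2 }' "$1"
}

# Hypothetical helper: concatenate list files, keeping only the first
# occurrence of each predictor name (so overlapping subsets are not repeated).
union_lists() {
    cat "$@" | awk '!seen[$0]++'
}

# Build the lists only where test.bim is present in the current directory.
if [ -f test.bim ]; then
    bim_range test.bim 1 350     > list1
    bim_range test.bim 300 500   > list2
    bim_range test.bim 2801 5000 > list3
    union_lists list1 list2 list3 > listALL
fi
```

The common prefix "list" followed by 1, 2, 3 matches what --partition-prefix list and --partition-number 3 expect in the text above.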

When performing genomic partitioning, it is important that weightings are calculated over the union of all subsets being compared. If this union contains almost all predictors, using weightings calculated over all predictors should be only a slight approximation; otherwise, it is necessary to calculate weightings over the union itself. Having calculated the weightings, you then compute kinships over the different regions (see the Demonstration for Example C below).
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Consider the following examples using the binary PLINK files test.bed, test.bim and test.fam in the Test Datasets. In total, there are 5000 predictors.

A – we wish to compare h2 from predictors 1-2500 and from predictors 2501-5000. The union of these two subsets is 1-5000, so we can simply use weightings calculated over all 5000 predictors.

B – we wish to compare h2 from predictors 1-2499 and from predictors 2501-5000. The union of the regions no longer includes predictor 2500, so strictly we should recalculate weightings over predictors {1-2499,2501-5000}. However, accuracy is unlikely to suffer much if we instead use weightings calculated over all predictors.

C – we wish to compare h2 from {1-350}, {300-500} and {2801-5000}. Now the union of the regions differs considerably from the full set of predictors, so we should first recalculate weightings using only the union {1-350,300-500,2801-5000}, i.e. {1-500,2801-5000}.
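
To make the comparison between the three examples concrete, the size of each union can be counted directly. The sketch below is illustrative only: union_size is a hypothetical helper (not an LDAK option) that takes inclusive index ranges written as start-end and counts the distinct predictor indices they cover.

```shell
# Hypothetical helper: count the distinct predictor indices covered by a set
# of inclusive ranges given as "start-end" arguments.
union_size() {
    for r in "$@"; do
        seq "${r%-*}" "${r#*-}"      # expand "a-b" into a, a+1, ..., b
    done | sort -un | wc -l | tr -d ' '
}

union_size 1-2500 2501-5000          # Example A: prints 5000
union_size 1-2499 2501-5000          # Example B: prints 4999
union_size 1-350 300-500 2801-5000   # Example C: prints 2700
```

Out of 5000 predictors, the union covers all of them in Example A, all but one in Example B, and only 2700 (54%) in Example C, which is why only Example C clearly calls for recalculated weightings.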
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Demonstration of genomic partitioning (based on Example C):

The files list1, list2 and list3 contain the lists of predictors for the three subsets, while the file listALL contains their union {1-500,2801-5000}. First calculate weightings.

../ldak.out --cut-weights sectionsGP --bfile test --extract listALL
../ldak.out --calc-weights sectionsGP --bfile test --extract listALL --section 1
../ldak.out --join-weights sectionsGP --extract-index listALL

Weightings will have been saved in sectionsGP/weightsALL. Now calculate kinships.

../ldak.out --cut-kins partitionsGP --bfile test --partition-number 3 --partition-prefix list
../ldak.out --calc-kins partitionsGP --bfile test --weights sectionsGP/weightsALL --partition 1
../ldak.out --calc-kins partitionsGP --bfile test --weights sectionsGP/weightsALL --partition 2
../ldak.out --calc-kins partitionsGP --bfile test --weights sectionsGP/weightsALL --partition 3
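
With more partitions, typing one kinship command per partition becomes tedious. The sketch below generates the per-partition commands in a loop. It is a dry run that only prints the commands, using the same options and file names as the demonstration above; the LDAK variable and the calc_kins_commands function are hypothetical conveniences, not part of LDAK.

```shell
# Path to the LDAK executable; override by setting LDAK in the environment.
LDAK=${LDAK:-../ldak.out}

# Hypothetical helper: print one kinship-calculation command for each of
# partitions 1..N, where N is the first argument.
calc_kins_commands() {
    p=1
    while [ "$p" -le "$1" ]; do
        echo "$LDAK --calc-kins partitionsGP --bfile test --weights sectionsGP/weightsALL --partition $p"
        p=$((p + 1))
    done
}

# Dry run for the three partitions of Example C.
calc_kins_commands 3
```

Once the printed commands look right, the output could be piped to sh to execute them in order.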

The resulting kinship files can then be analysed to assess the relative heritability contributions of the three subsets of SNPs.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

[1] Genome partitioning of genetic variation for complex traits using common SNPs. J. Yang, T. Manolio, P. Visscher et al., Nature Genetics, 2011.