Genomic Partitioning

Genomic partitioning is a tool for investigating the genetic architecture of complex traits (see the first application here). It involves estimating the heritability contributed by subsets of predictors, in order to better understand how causal variants are distributed across the genome. These subsets will normally be disjoint (hence the term partitioning), but need not be. For example, we might compare the relative contributions of genic and inter-genic predictors, in which case we would partition the genome into predictors inside and outside genes; alternatively, we might perform a pathway analysis using overlapping subsets of predictors.

To perform genomic partitioning, you must first create files that list the predictors in each subset. We suggest naming these files sequentially (e.g., <prefix>1, <prefix>2, ..., <prefix>K, where <prefix> is your chosen prefix and K is the total number of subsets). Note that it is not necessary to create these files if you are partitioning the genome by chromosome; in that case you can instead use the options --chr <integer> or --by-chr YES (see the example below).
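As a rough sketch of how subset files might be created, the script below splits the predictors of a PLINK .bim file into three lists and then checks that the lists are disjoint. Everything here is illustrative: the file toy.bim, the prefix toylist and the position-based rule are hypothetical, and we assume a subset file simply lists predictor names, one per line.

```shell
# Toy .bim file (columns: chromosome, predictor name, cM, bp, A1, A2).
# With real data you would read human.bim instead.
cat > toy.bim <<'EOF'
21 rs1 0 1000 A G
21 rs2 0 2500 C T
21 rs3 0 4000 A C
22 rs4 0 1500 G T
22 rs5 0 3000 A G
EOF

# Hypothetical rule: subset 1 = positions below 2000, subset 2 = 2000 to 3499,
# subset 3 = everything else. Each output file lists one predictor name per line.
awk '{
  if ($4 < 2000) k = 1; else if ($4 < 3500) k = 2; else k = 3
  print $2 > ("toylist" k)
}' toy.bim

# Disjointness check: print any predictor name that appears in more than one list.
# No output means the subsets form a true partition.
cat toylist1 toylist2 toylist3 | sort | uniq -d
```

In practice the rule inside awk would be replaced by whatever annotation defines your subsets (for example, genic versus inter-genic). If the final command prints any names, the subsets overlap, which is allowed but worth knowing before interpreting the results.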

Next you should calculate a kinship matrix corresponding to each subset (for this we generally recommend assuming the LDAK-Thin Model). If Calculating Kinships using the direct method, use --extract <extractfile> to specify the subset file (i.e., first run with --extract <prefix>1, then with --extract <prefix>2, and so on); if using the indirect method, use --partition-number K and --partition-prefix <prefix> when cutting the genome.

Finally, estimate the SNP heritability of each subset of predictors using REML, Haseman-Elston or PCGC Regression (when performing this regression, use --mgrm <kinstems> to provide multiple kinship matrices).
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _


Here we use the binary PLINK files human.bed, human.bim and human.fam, the lists of SNPs list1, list2 and list3, and the phenotype quant.pheno from the Test Datasets. We assume the LDAK-Thin Model, which requires us to create a weights file that gives weight one to the predictors that remain after thinning for duplicates (see Technical Details for more information). This can be achieved using the command

./ldak.out --thin thin --bfile human --window-prune .98 --window-kb 100
awk < thin.in '{print $1, 1}' > weights.thin

The weightings are saved in the file weights.thin.
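Before using a weights file, it can help to confirm that it has the expected layout. The sketch below builds a weights file from a toy list of retained predictors and checks that every line has two fields with the second equal to one. The file names toy_thin.in and toy_weights.thin are hypothetical stand-ins, and we assume the file of retained predictors lists one predictor name per line.

```shell
# Toy stand-in for the list of predictors retained after thinning.
printf 'rs1\nrs3\nrs4\n' > toy_thin.in

# Give each retained predictor weight one (predictor name, then the weight).
awk '{print $1, 1}' toy_thin.in > toy_weights.thin

# Sanity check: every line should have exactly two fields, the second being 1.
awk 'NF != 2 || $2 != 1 {bad++}
     END {print (bad ? "problem lines: " bad : "weights file looks OK")}' toy_weights.thin
```

The same check can be pointed at weights.thin once it has been created.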
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

First we wish to create three kinship matrices, corresponding to the files list1, list2 and list3. To do this using the direct method, run

for j in {1..3}; do
./ldak.out --calc-kins-direct list$j --bfile human --weights weights.thin --power -.25 --extract list$j
done

The kinship matrices will be saved with stems list1, list2 and list3.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

To construct the same three kinship matrices using the indirect method, run

./ldak.out --cut-kins gp --bfile human --partition-number 3 --partition-prefix list
for j in {1..3}; do
./ldak.out --calc-kins gp --bfile human --partition $j --weights weights.thin --power -.25
done

Now the kinship matrices are saved with stems gp/kinships.1, gp/kinships.2 and gp/kinships.3.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

To instead partition based on chromosome, we can run either

for j in {21..22}; do
./ldak.out --calc-kins-direct chr$j --bfile human --weights weights.thin --power -.25 --chr $j
done

or

./ldak.out --cut-kins gp2 --bfile human --by-chr YES
for j in {1..2}; do
./ldak.out --calc-kins gp2 --bfile human --partition $j --weights weights.thin --power -.25
done

Note that in the first script, we loop from 21 to 22 because our example dataset contains only these two chromosomes; usually you would loop from 1 to 22. Similarly, in the second script, we have only two partitions, whereas normally there would be 22. The kinship matrices will be saved either with stems chr21 and chr22, or with stems gp2/kinships.1 and gp2/kinships.2.
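Rather than hard-coding the chromosome range, one option is to read the chromosomes present from the .bim file. The sketch below does this for a toy file and, as a dry run, writes the corresponding direct-method commands to a text file instead of executing them. The names toy_chr.bim and toy_commands.txt are hypothetical; with real data you would read human.bim.

```shell
# Toy .bim containing two chromosomes (first column is the chromosome code).
cat > toy_chr.bim <<'EOF'
21 rs1 0 1000 A G
21 rs2 0 2500 C T
22 rs4 0 1500 G T
EOF

# Unique chromosome codes, in numeric order.
chrs=$(awk '{print $1}' toy_chr.bim | sort -un)

# Dry run: record the command for each chromosome rather than running it.
for j in $chrs; do
echo "./ldak.out --calc-kins-direct chr$j --bfile human --weights weights.thin --power -.25 --chr $j"
done > toy_commands.txt
cat toy_commands.txt
```

Once the printed commands look right, the echo can be removed so that each command is executed directly.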
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Having made the kinship matrices, we can now estimate how much phenotypic variance each explains (i.e., the heritability contributed by the subset of predictors used to construct each kinship matrix). For this example, we will estimate the variances using REML, and consider the matrices with stems chr21 and chr22:

echo "chr21
chr22" > mlist.txt
./ldak.out --reml reml5 --pheno quant.pheno --mgrm mlist.txt

First we created the file mlist.txt, which lists the stems of the two kinship matrices, one per line. Then we provided this file to REML using the option --mgrm <kinstems>. By viewing reml5.reml, we see that the estimated heritabilities of Chromosomes 21 and 22 are 0.40 (SD 0.08) and 0.22 (SD 0.07), respectively.
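The same pattern should carry over to the kinship matrices made with the indirect method. As a sketch, the list of stems can be written with a short loop; the file name mlist_gp.txt and the output stem reml_gp are hypothetical choices, and the REML command is left as a comment because it requires the kinship matrices to exist.

```shell
# Write the stems gp/kinships.1, gp/kinships.2 and gp/kinships.3, one per line.
for j in 1 2 3; do
echo "gp/kinships.$j"
done > mlist_gp.txt
cat mlist_gp.txt

# With the kinship matrices in place, the regression would then be run as, e.g.:
# ./ldak.out --reml reml_gp --pheno quant.pheno --mgrm mlist_gp.txt
```

Writing the list with a loop becomes more convenient as the number of subsets grows, for example with 22 chromosome-based partitions.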