Genomic partitioning is a tool for investigating the genetic architecture of complex traits (see the first application here). It involves estimating the heritability contributed by subsets of predictors, in order to better understand how causal variants are distributed across the genome. These subsets will normally be disjoint (hence the term partitioning), but need not be. For example, to compare the relative contributions of genic and inter-genic predictors, we would partition the genome into predictors inside and outside genes; alternatively, we might perform a pathway analysis using overlapping subsets of predictors.
To perform genomic partitioning, you must first create files that list the predictors in each subset. We suggest naming these files sequentially (e.g., <prefix>1, <prefix>2, ..., <prefix>K, where <prefix> is your chosen prefix and K is the total number of subsets). Note that you do not need to create these files if you are partitioning the genome by chromosome (in that case you can instead use the options --chr <integer> or --by-chr YES; see the example below).
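As a rough illustration of this step, the sketch below splits the predictors of a toy .bim file into K contiguous subsets of similar size. It assumes that each subset file lists one predictor name per line and that predictor names are in the second column of the .bim file; the names toy.bim and toypart are hypothetical. How you define the subsets in practice (e.g., from gene annotations) will depend on your analysis.

```shell
# Toy .bim file (columns: chromosome, predictor name, genetic distance,
# position, allele 1, allele 2). With real data, use your own .bim file.
printf '21 rs1 0 100 A C\n21 rs2 0 200 A G\n21 rs3 0 300 C T\n22 rs4 0 100 A C\n22 rs5 0 200 G T\n22 rs6 0 300 A T\n' > toy.bim

K=3             # number of subsets
prefix=toypart  # hypothetical prefix for the subset files

# First pass counts the predictors (n); second pass writes the name of
# predictor i (column 2) to the file <prefix>k, where k runs from 1 to K.
awk -v K="$K" -v prefix="$prefix" '
  NR == FNR { n++; next }
  { k = int((FNR - 1) * K / n) + 1; print $2 > (prefix k) }
' toy.bim toy.bim

# Show how many predictors ended up in each subset file
wc -l "$prefix"1 "$prefix"2 "$prefix"3
```

The resulting files toypart1, toypart2 and toypart3 follow the sequential naming suggested above, so they could be supplied one at a time via --extract or together via --partition-prefix.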
Next you should calculate a kinship matrix corresponding to each subset (for this we generally recommend assuming the LDAK-Thin Model). If you are calculating kinships using the direct method, use --extract <extractfile> to specify the subset file (i.e., first run with --extract <prefix>1, then with --extract <prefix>2, and so on); if using the indirect method, use --partition-number K and --partition-prefix <prefix> when cutting the genome.
Finally, estimate the SNP heritability of each subset of predictors using REML, Haseman-Elston or PCGC Regression (when performing this regression, use --mgrm <kinstems> to provide multiple kinship matrices).
_ _ _ _ _ _ _ _ _ _ _ __ _ _ _ _ _ _ _ _ _ _
Example:
Here we use the binary PLINK files human.bed, human.bim and human.fam, the lists of SNPs part1, part2 and part3, and the phenotype quant.pheno from the Test Datasets. We assume the LDAK-Thin Model, which requires us to create a weights file that gives weight one to the predictors that remain after thinning for duplicates (see Technical Details for more information). This can be achieved using the commands
./ldak.out --thin thin --bfile human --window-prune .98 --window-kb 100
awk < thin.in '{print $1, 1}' > weights.thin
The weightings are saved in the file weights.thin.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
First we wish to create three kinship matrices, corresponding to the files part1, part2 and part3. To do this using the direct method, run
for j in {1..3}; do
./ldak.out --calc-kins-direct part$j --bfile human --weights weights.thin --power -.25 --extract part$j
done
The kinship matrices will be saved with stems part1, part2 and part3.
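Because the subsets in a partition are normally intended to be disjoint, it can be worth checking the subset files for accidental overlap. The following is a minimal sketch of such a check, assuming each file lists one predictor name per line; the files sub1, sub2 and sub3 and the output file overlap.txt are hypothetical stand-ins (with the example data you would check part1, part2 and part3).

```shell
# Toy subset files; here rs4 is deliberately listed in both sub2 and sub3
printf 'rs1\nrs2\n' > sub1
printf 'rs3\nrs4\n' > sub2
printf 'rs4\nrs5\n' > sub3

# Deduplicate within each file, pool the names, then keep any name that
# appears more than once (i.e., in more than one subset file)
for f in sub1 sub2 sub3; do
  sort -u "$f"
done | sort | uniq -d > overlap.txt

if [ -s overlap.txt ]; then
  echo "The following predictors appear in more than one subset:"
  cat overlap.txt
else
  echo "The subsets are disjoint"
fi
```

Overlap is not an error in itself (for example, pathway analyses may use overlapping subsets), so this check only matters when a strict partition is intended.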
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
To construct the same three kinship matrices using the indirect method, run
./ldak.out --cut-kins gp --bfile human --partition-number 3 --partition-prefix part
for j in {1..3}; do
./ldak.out --calc-kins gp --bfile human --partition $j --weights weights.thin --power -.25
done
Now the kinship matrices are saved with stems gp/kinships.1, gp/kinships.2 and gp/kinships.3.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
To instead partition based on chromosome, we can run either
for j in {21..22}; do
./ldak.out --calc-kins-direct chr$j --bfile human --weights weights.thin --power -.25 --chr $j
done
or
./ldak.out --cut-kins gp2 --bfile human --by-chr YES
for j in {1..2}; do
./ldak.out --calc-kins gp2 --bfile human --partition $j --weights weights.thin --power -.25
done
Note that in the first script, we loop from 21 to 22 because our example dataset contains only these two chromosomes; usually you would loop from 1 to 22. Similarly, in the second script, we have only two partitions, whereas normally there would be 22. The kinship matrices will be saved either with stems chr21 and chr22, or with stems gp2/kinships.1 and gp2/kinships.2.
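Rather than hard-coding the chromosome range, the loop bounds can be read from the .bim file. The sketch below is a dry run under stated assumptions: chromosome codes are numeric and stored in the first column of the .bim file, and toy.bim and commands.txt are hypothetical names. It only prints the direct-method commands shown above, one per chromosome present in the data, instead of running them.

```shell
# Toy .bim file standing in for human.bim (chromosome code in column 1)
printf '21 rs1 0 100 A C\n21 rs2 0 200 A G\n22 rs3 0 100 C T\n' > toy.bim

# Distinct chromosome codes, sorted numerically
chrs=$(awk '{print $1}' toy.bim | sort -u | sort -n)

# Dry run: write one command per chromosome to commands.txt.
# To run them for real, remove "echo" and the redirection, and replace
# "toy" with the stem of your own dataset (e.g., human).
for j in $chrs; do
  echo ./ldak.out --calc-kins-direct "chr$j" --bfile toy --weights weights.thin --power -.25 --chr "$j"
done > commands.txt

cat commands.txt
```

Printing the commands first makes it easy to confirm that the loop covers exactly the chromosomes in the dataset before any kinship matrices are computed.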
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Having made the kinship matrices, we can now estimate how much phenotypic variance each explains (i.e., the heritability contributed by the subset of predictors used to construct each kinship matrix). For this example, we will estimate the variances using REML, and consider the matrices with stems chr21 and chr22:
echo "chr21
chr22" > mlist.txt
./ldak.out --reml reml5 --pheno quant.pheno --mgrm mlist.txt
First we created the file mlist.txt, which provides the stems of the two kinship matrices. Then we provided this file to REML using the option --mgrm <kinstems>. By viewing reml5.reml, we see that the estimated heritabilities of Chromosomes 21 and 22 are 0.40 (SD 0.08) and 0.22 (SD 0.07), respectively.
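With more than a few kinship matrices, it is convenient to generate the list of stems with a loop instead of typing it by hand. The sketch below does this for the three matrices with stems part1, part2 and part3 made earlier; the file name partlist.txt and the output stem reml_parts are hypothetical.

```shell
K=3          # number of kinship matrices
prefix=part  # stems are part1, part2, ..., partK

# Write one stem per line
for j in $(seq 1 "$K"); do
  echo "${prefix}${j}"
done > partlist.txt

cat partlist.txt

# The list can then be passed to REML in the same way as mlist.txt, e.g.
# ./ldak.out --reml reml_parts --pheno quant.pheno --mgrm partlist.txt
```

For matrices made using the indirect method, the stems would instead be gp/kinships.1, gp/kinships.2 and gp/kinships.3, so the loop body would become echo "gp/kinships.$j".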