Gene-based Analysis

The proportion of phenotypic variance explained by a kinship matrix corresponds to the proportion of variance explained by the predictors used to construct this matrix. Typically, we compute kinships across large numbers of SNPs (e.g., chromosomes, or genome-wide), but we can just as easily compute over small numbers of SNPs and thereby assess the impact of genes or small regions (“chunks”).

In fact, this proves a very efficient way to test a subset of SNPs for association: firstly, by performing just one test per subset (whether the subset has positive heritability), it avoids problems with combining statistics from multiple single-SNP tests of association. Secondly, although the naive approach would be long-winded (first compute kinships, then perform REML analysis), it is much faster to carry out both steps simultaneously.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

–cut-genes <folder>

–bfile/–chiamo/–sp/–speed <prefix> – specifies the data files (see File Formats).

One of the following options is required
–genefile <genefile> – the four columns of <genefile> should specify the name, chromosome, start and end basepairs of each gene/chunk
–chunks <float> – will divide SNPs into chunks of weighting approximately <float>
–chunks-bp <integer> – will divide SNPs into chunks of about <integer> base pairs.

Additionally, you can use the option –pvalues <pvalues> to provide predictor p-values. The file <pvalues> should have one row per predictor and two columns which provide predictor names then p-values. LDAK will then report the minimum p-value for each gene/chunk, and also Bonferroni-corrected p-values.

As when calculating kinships, the genome will be divided into PARTITIONS. The default length is 1,000,000 SNPs,  but this can be changed using the option –partition-length.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

–calc-genes-kins <folder>

–bfile/–chiamo/–sp/–speed <prefix> – specify the data files (see File Formats).
–partition <number> – specifies which PARTITION to consider.

–weights <weightsfile> – specifies the location of predictor weightings. Usually, this will be the file produced by the steps in Get Weightings, else you can provide your own weights file, which should have two columns providing predictor names then weightings.

This step will calculate kinship matrices for each gene (or chunk) in the PARTITION. The kinships for gene X will be saved with prefixes <folder>/geneshipX. By performing REML analysis on each of these in turn, we would obtain estimates of the variance explained by each gene. However, it is far quicker to use –calc-genes-reml (below) which estimate the variance explained in one step.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

–calc-genes-reml <folder>

–pheno <phenofile> – required to provide the responses (in PLINK format). Individuals without a phenotype will be excluded. If <phenofile> contains more than one phenotype, specify which should be used with –mpheno.

–bfile/–chiamo/–sp/–speed <prefix> – specify the data files (see File Formats).
–partition <number> – specifies which PARTITION to consider.

–weights <weightsfile> – specifies the location of predictor weightings. Usually, this will be the file produced by the steps in Get Weightings, else you can provide your own weights file, which should have two columns providing predictor names then weightings.

The results for Partition X will be saved in <folder>/regressX.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

–join-genes-reml <folder>

Will join the results across PARTITIONS, saving the results in <folder>/regressALL.

For Adaptive MultiBLUP, we want to identify groups of significant chunks, which is achieved by adding –sig1 <pvalue1> and –sig2 <pvalue2> as well –bfile/–chiamo/–sp/–speed <prefix> to provide the data files. This will identify chunks with pvalue (from a likelihood ratio test) less than <pvalue1> and merge with neighbouring chunks with pvalue less than <pvalue2>. To instead merge based on a score test, add –score-test YES.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

I get gene annotations using the Tables feature on Genome Browser. Click here, then select the group “Genes and Gene Predictions” and the track you desire. Make sure “genome” is ticked (next to region), then choose “selected fields from primary and related tables” and enter a filename. Having clicked “get output”, make sure you tick the fields “name”, “chrom”, and either “txStart” and “txEnd” or (my preference) “cdsStart” and “cdsEnd”. Annotations for RefSeq based on Genome Build 37, hg19 (Feb 2009) are stored in the file genefile_hg19 available in  Downloads.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Examples using the test datasets:

../ldak.out –cut-genes genes –bfile test –weights sections/re-weightsALL –genefile genefile_hg19 –pvafile pvalues.txt
../ldak.out –cut-genes genes –bfile test –weights sections/re-weightsALL –chunks 20
../ldak.out –cut-genes genes –bfile test –weights sections/re-weightsALL –chunks-bp 1000000