LDAK Weightings

The LDAK weightings are designed to account for the fact that levels of linkage disequilibrium vary across the genome (for more details, see Equalise Tagging). Note that we originally recommended using the LDAK weightings for many analyses, but this is no longer the case. We now only advise using them when constructing the BLD-LDAK or BLD-LDAK+Alpha Models, our recommended Heritability Models when analysing summary statistics.

Calculating the LDAK weightings requires two steps: Step 1 cuts the predictors into sections, while Step 2 calculates weightings for each section (and joins them up). In most cases, the two steps will complete within a few hours; however, we also provide ways to parallelise the tasks. Note that the LDAK weightings are sensitive to genotyping errors (these result in misleading estimates of linkage disequilibrium). Therefore, we recommend first performing strict QC (e.g., restricting to SNPs with MAF >0.01 and information score >0.95).
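
For instance, assuming PLINK-format data, the QC step might look like the following sketch (the file names raw, highinfo.snps and clean are hypothetical, and the list of SNPs with information score >0.95 would typically come from your imputation software):

# Hypothetical sketch: keep SNPs with MAF >0.01, restricted to a pre-computed list of well-imputed SNPs
plink --bfile raw --maf 0.01 --extract highinfo.snps --make-bed --out clean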

Always read the screen output, which suggests arguments and estimates memory usage.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Step 1:

The main argument is --cut-weights <folder>.

This requires the option

--bfile/--gen/--sp/--speed <datastem> - to specify the genetic data files (see File Formats).

By default, LDAK will first thin predictors, using a correlation squared threshold of 0.98. This threshold should suffice, but can be changed using --window-prune <float>. You can turn off thinning by adding --no-thin YES, or if you have previously thinned, add --no-thin DONE (in which case, LDAK will expect to find a file called <folder>/thin.in).
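
For illustration, here are two alternative invocations using the example dataset described below (the threshold 0.9 is arbitrary):

# Thin using a stricter correlation squared threshold
./ldak.out --cut-weights sections --bfile human --window-prune 0.9
# Skip thinning, re-using the file sections/thin.in from a previous run
./ldak.out --cut-weights sections --bfile human --no-thin DONE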

Having thinned predictors, LDAK will cut into sections. You can change aspects of this cutting by specifying the desired sizes of the sections and/or buffers (using a combination of --section-kb <float>, --section-length <integer>, --section-cm <float>, --buffer-kb <float>, --buffer-length <integer> and/or --buffer-cm <float>), but in most cases, the default choices will suffice.
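
For example, to request sections of (approximately) 100kb with 10kb buffers, you might run something like this (the sizes are purely illustrative):

# Illustrative section and buffer sizes; the defaults usually suffice
./ldak.out --cut-weights sections --bfile human --section-kb 100 --buffer-kb 10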
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Step 2:

The main argument is --calc-weights-all <folder>.

This requires the option

--bfile/--gen/--sp/--speed <datastem> - to specify the genetic data files (see File Formats).

This will calculate weightings for each section in turn, then merge them. If the job fails to complete in the available time, you can use --start-section <integer> to continue from where it stopped.
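
For example, to resume from (a hypothetical) Section 3:

# Continue from Section 3 (Sections 1 and 2 assumed already complete)
./ldak.out --calc-weights-all sections --bfile human --start-section 3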

LDAK originally calculated weightings using a linear solver, but now by default uses a quadratic solver (which we find more efficient); to revert to the former, add --simplex YES. Alternatively, it is possible to calculate weightings using a crude approximation with the option --quick-weights YES. These approximate weightings will be better than using no weightings, but in general we do not recommend their use.
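
For reference, the two alternatives would be invoked as follows:

# Revert to the original linear (simplex) solver
./ldak.out --calc-weights-all sections --bfile human --simplex YES
# Or compute crude approximate weightings (generally not recommended)
./ldak.out --calc-weights-all sections --bfile human --quick-weights YES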

As stated above, we recommend using only high-quality predictors when calculating weightings, and therefore each predictor is assumed to have information score one. However, if this is not the case, you can use --infos <infosfile> to provide a score for each predictor. Further, when the outcome is binary, you can guard against genotyping errors by using --subset-number <integer> and --subset-prefix <subprefix> (see Sample Subsets for more details).
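
For illustration (the file human.infos and the subset settings are hypothetical; see Sample Subsets for the expected subset files):

# Provide an information score for each predictor
./ldak.out --calc-weights-all sections --bfile human --infos human.infos
# For a binary outcome, guard against genotyping errors using two sample subsets
./ldak.out --calc-weights-all sections --bfile human --subset-number 2 --subset-prefix subset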

When we originally described LDAK, we advised modelling LD decay. This is achieved by adding --decay YES when calculating weightings (it is also necessary to use --half-life <float> to specify the assumed rate of LD decay). However, we have found that for most analyses (even those using datasets containing high levels of structure or relatedness), modelling LD decay makes very little difference to the results, so we no longer recommend this feature.
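
Were you to use this feature, the invocation would look something like the following (the half-life value is purely illustrative; check the screen output for guidance):

# Model LD decay, with an assumed (illustrative) half-life
./ldak.out --calc-weights-all sections --bfile human --decay YES --half-life 100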
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

The weightings will be saved in <folder>/weights.short; this contains only predictors with non-zero weightings (as predictors with zero weighting can be ignored). A more detailed version will be saved in <folder>/weights.all; for each predictor, this reports the weighting, the number of neighbours (how many predictors were within the window), the tagging (the sum of squared correlations with predictors within the window), the information score of the predictor, and a "check" (the total tagging of the predictor after weighting; if the weightings are accurate, this value should match the information score).
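
As a sanity check, you could compare the last two of these values; the following sketch assumes (hypothetically) that the columns of <folder>/weights.all appear in the order listed above, preceded by the predictor name and with a header row, and prints predictors whose check deviates from their information score by more than 0.01:

# Assumed layout: predictor, weighting, neighbours, tagging, info, check
awk 'function abs(x){return x<0?-x:x} NR>1 && abs($6-$5)>0.01 {print $1, $5, $6}' sections/weights.all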

An easy way to parallelise Steps 1 & 2 is by using the option --chr <integer> to calculate weightings for each chromosome separately, then merging weights across chromosomes. Alternatively, instead of --calc-weights-all <folder> you can run --calc-weights <folder> adding --section <integer> (you should run this for each section in turn), then --join-weights <folder> (run this once at the end). In both cases, the final weightings will be identical to those constructed using the instructions above. See the example below for further details.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Example:

Here we use the binary PLINK files human.bed, human.bim and human.fam from the Test Datasets.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Step 1:

We cut the genome into sections

./ldak.out --cut-weights sections --bfile human

This produces a number of files, including sections/thin.in, a list of the 3159 predictors which remained after pruning for duplicates, and sections/weights.details, which contains details of the four sections.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Step 2:

We calculate weightings for each section, then join them

./ldak.out --calc-weights-all sections --bfile human

The final weightings are saved in sections/weights.short.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

An easy way to parallelise the above steps is to run

for j in {21..22}; do
./ldak.out --cut-weights sections$j --bfile human --chr $j
./ldak.out --calc-weights-all sections$j --bfile human --chr $j
done
cat sections{21..22}/weights.short > weights.short

Note that here we loop from 21 to 22 because our example dataset contains only these two chromosomes; usually you would loop from 1 to 22. Alternatively, you can run

./ldak.out --cut-weights sections --bfile human
for j in {1..4}; do
./ldak.out --calc-weights sections --bfile human --section $j
done
./ldak.out --join-weights sections --bfile human

Here we loop from 1 to 4 because we have four sections.