Get Weightings

Weightings are calculated in two steps: Step 1 cuts the predictors into SECTIONS; Step 2 calculates weightings for each SECTION, then joins these across SECTIONS (if Step 2 takes too long, it can be parallelized). The arguments for performing these steps are shown below.

Previously (LDAK4), we recommended calculating weightings twice, the second time using only predictors with non-zero weightings from the first run. This was because with dense data, the number of predictors in each section was often more than LDAK could handle. Since LDAK5, this is no longer required, as by default LDAK thins duplicate predictors, which reduces the number of predictors in each section to a manageable level.
Note that this thinning DOES NOT change the LDAK Model; it just speeds up computation.

Options in red are REQUIRED; options in purple are OPTIONAL. If you wish to only analyse a subset of the data, see Data Filtering. In all cases, <folder> is the directory in which output files will be written.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

--cut-weights <folder>

--bfile/--gen/--sp/--speed <prefix> - specifies the datafiles (see File Formats).

By default, LDAK will first thin predictors, using a squared-correlation threshold of 0.98. This threshold should suffice, but can be changed using --window-prune. You can turn off thinning by adding --no-thin YES or, if you have previously thinned, by adding --no-thin DONE (in which case LDAK will expect to find a file called thin.in).

Having thinned predictors, LDAK will cut into sections. You can change aspects of this cutting by specifying the desired sizes of the sections, windows and/or buffers (see Advanced Options), but in most cases, the default choices will suffice (please read the screen output to see whether changes are suggested).
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

SERIAL VERSION - for most datasets, LDAK will calculate weightings within a few hours. However, if your data are very large (for example, they contain rare predictors or more than 10,000 individuals), consider using the PARALLEL VERSION below.

--calc-weights-all <folder>

--bfile/--gen/--sp/--speed <prefix> - specifies the datafile (see File Formats).

This will calculate weightings for each section in turn, then merge them. If it is interrupted, you can use --start-section to continue from where it stopped.

Estimates of SNP heritability are very sensitive to genotyping errors, so in general, we recommend using only high-quality predictors. However, if you do include lower quality predictors, you can incorporate this uncertainty by using --infos to specify a file containing info scores for each (this file should have two columns: predictor name then info score).
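Before passing a file to --infos, it may be worth checking that it has the two-column layout described above. The following is a minimal sketch and not part of LDAK: the function name check_infos is hypothetical, and it assumes the columns are separated by whitespace and that the file has no header line.

```shell
#!/bin/bash
# Hypothetical helper (not part of LDAK): check that an info-score file has
# exactly two whitespace-separated columns per line (predictor name, then
# info score) and that the second column is a plain decimal number.
# ASSUMPTION: the file has no header line.
check_infos() {
    awk '
        NF != 2 {
            printf "line %d: expected 2 columns, found %d\n", NR, NF
            bad = 1
            next
        }
        $2 !~ /^[0-9]*\.?[0-9]+$/ {
            printf "line %d: info score \"%s\" is not numeric\n", NR, $2
            bad = 1
        }
        END { exit bad }    # exit status 0 only if every line passed
    ' "$1"
}

# Usage (my.infos is a placeholder file name):
#   check_infos my.infos && echo "file looks fine"
```

The function prints one message per problem line and returns a non-zero status if any were found, so it can be used as a guard in a larger script.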

By default, LDAK now calculates weightings using a quadratic solver, but it originally used a linear solver; to revert to the latter, add --simplex YES. Alternatively, it is possible to calculate weightings using a crude approximation with the option --quick-weights YES. These approximate weightings will be better than using no weightings, but in general we do not recommend their use.

When the outcome is binary and (subsets of) cases and controls have been genotyped separately, it is advisable to calculate correlations separately over cases and controls by using --subset-number and --subset-prefix. See Subset Options.

By default, observed squared-correlations below 0.01 are considered noise; to change this threshold, use --min-cor. To model the decay of LD with distance, use --decay YES and --half-life. This feature is recommended when analysing highly structured or related datasets (see LD Decay for more details).
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

PARALLEL VERSION

--calc-weights <folder>

--bfile/--gen/--sp/--speed <prefix> - specifies the datafile (see File Formats).
--section <number> - specifies for which SECTION to calculate weightings.

This calculates weightings for the specified section (so it should be run once for each section). As explained above, possible options are --infos, --simplex YES, --quick-weights YES, --subset-number and --subset-prefix, and --decay YES. When all sections have completed, you should then run

--join-weights <folder>

--bfile/--gen/--sp/--speed <prefix> - specifies the datafile (see File Formats).

This will merge weightings across sections. If some sections are missing weightings, a list of these will be saved in <folder>/weights.missings. From my experience of using clusters, if a section fails to complete, this is usually because the job was allocated to a "slow node", so I would suggest trying a second time before resorting to using --quick-weights YES.
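Rerunning the failed sections can be scripted. The sketch below prints one --calc-weights command for each section listed in <folder>/weights.missings, so the commands can be reviewed before they are run. It rests on an assumption that is not confirmed here: that the file holds one section number per line. The function name missing_section_cmds is hypothetical, and --bfile is hard-coded for brevity (swap in --gen, --sp or --speed as appropriate).

```shell
#!/bin/bash
# Sketch: print a --calc-weights command for every section listed in
# <folder>/weights.missings.
# ASSUMPTION: the file contains one section number per line; inspect your
# own file first, as its layout is not described in this page.
# LDAK is the path to the executable (default taken from the examples below).
LDAK=${LDAK:-../ldak.out}

missing_section_cmds() {
    local folder="$1" prefix="$2" section rest
    [ -f "$folder/weights.missings" ] || return 0    # nothing to rerun
    while read -r section rest || [ -n "$section" ]; do
        [ -n "$section" ] || continue                # skip blank lines
        echo "$LDAK --calc-weights $folder --bfile $prefix --section $section"
    done < "$folder/weights.missings"
}

# Usage: review the printed commands, then run them, for example
#   missing_section_cmds sections test | sh
# and afterwards repeat --join-weights.
```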
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

For both the serial and parallel versions, condensed weightings will be saved in <folder>/weights.short; this contains only predictors with non-zero weightings (as predictors with zero weight can be ignored). A more detailed version will be saved in <folder>/weights.all; for each predictor, this reports the weighting, the number of neighbours (how many predictors were within the window), the tagging (the sum of squared-correlations with predictors within the window), the info score of the predictor and a "check" (the total tagging of the predictor after weighting; if the weightings are accurate, this value should match the info score). If the decay of LD was modelled, the numbers of neighbours and taggings will be weighted according to distance.
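The comparison of "check" against the info score can be automated. The sketch below lists predictors for which the two differ by more than a tolerance. It makes several assumptions that should be verified against your own output: that weights.all is whitespace-separated, that it starts with one header line, and that its columns follow the order given above (predictor, weighting, neighbours, tagging, info, check). The function name check_mismatches and the default tolerance of 0.01 are illustrative choices, not LDAK conventions.

```shell
#!/bin/bash
# Sketch: list predictors whose "check" differs from their info score by
# more than a tolerance.
# ASSUMPTIONS: whitespace-separated file with one header line and columns
# in the order predictor, weighting, neighbours, tagging, info, check.
check_mismatches() {
    local file="$1" tol="${2:-0.01}"    # 0.01 is an arbitrary default
    awk -v tol="$tol" '
        NR == 1 { next }                # skip the assumed header line
        {
            diff = $6 - $5              # check minus info
            if (diff < 0) diff = -diff
            if (diff > tol) print $1, $5, $6
        }
    ' "$file"
}

# Usage: print predictor name, info score and check for each mismatch
#   check_mismatches sections/weights.all
#   check_mismatches sections/weights.all 0.05
```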
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Example: for this we use the binary PLINK files test.bed, test.bim and test.fam, available in the Test Datasets.

../ldak.out --cut-weights sections --bfile test

LDAK first thins the 5000 predictors, after which 3207 remain. The default section length is 1000 predictors, so these are divided into 4 sections. Note that because there are so few individuals in this example dataset (only 268), LDAK recommends reducing the average section length.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

For the SERIAL VERSION, run

../ldak.out --calc-weights-all sections --bfile test

The resulting weightings will be saved in sections/weights.short and sections/weights.all.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

For the PARALLEL VERSION, run

../ldak.out --calc-weights sections --bfile test --section 1
../ldak.out --calc-weights sections --bfile test --section 2
../ldak.out --calc-weights sections --bfile test --section 3
../ldak.out --calc-weights sections --bfile test --section 4
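
On a single multi-core machine, the four commands above can be written as a loop that starts each section in the background and then waits for all of them. This is only a sketch: the function name run_sections and its argument order are hypothetical, and starting every section at once is sensible only if the machine has enough cores and memory for all of them.

```shell
#!/bin/bash
# Sketch: run --calc-weights for sections 1..N, one background job per
# section, then wait for all of them to finish.
# LDAK is the path to the executable (default taken from the commands above).
LDAK=${LDAK:-../ldak.out}

run_sections() {
    local folder="$1" prefix="$2" nsections="$3" j=1
    while [ "$j" -le "$nsections" ]; do
        "$LDAK" --calc-weights "$folder" --bfile "$prefix" --section "$j" &
        j=$((j + 1))
    done
    wait    # block until every section has finished
}

# Usage for this example (4 sections):
#   run_sections sections test 4
```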

To run these on a cluster, a possible script might be

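#!/bin/bash
#$ -S /bin/bash
# Run from the submission directory, so the relative paths below resolve
#$ -cwd
# Array job: one task per section (here, sections 1 to 4)
#$ -t 1-4
number=$SGE_TASK_ID
../ldak.out --calc-weights sections --bfile test --section "$number"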

When each has finished, you should run

../ldak.out --join-weights sections --bfile test

Again, the merged weightings will be stored in sections/weights.all and sections/weights.short.