Tips

If you have any questions, send them to doug <dot> speed <at> ucl <dot> ac <dot> uk.
Below are the responses to some of the popular ones we’ve received:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

The combined weights file produced by --calc-weights contains five columns. The first contains the weightings, typically values between 0 and 1, although some might be greater than 1; a weighting of 0 indicates that the corresponding predictor will be ignored when calculating kinships. The second column states, for each predictor, how many neighbouring predictors were considered (for this, a predictor counts as a neighbour of itself). The third column reports the total LD-decay weighting of these predictors (when LD decay is turned off, column three matches column two). The fourth column states the sum of these LD-decay weightings multiplied by correlation squared, counting only neighbours whose correlation squared exceeds mincor (default 0.01). These values represent the row sums of the matrix C in the paper, and reflect the tagging of predictors, that is, the extent to which each predictor’s signal is replicated by its neighbours. The final column provides the predictor name.
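As a rough illustration of how columns two to four relate to each other, here is a minimal Python sketch for a single predictor. It is not LDAK's code: the function name tagging_columns and its inputs (a list of squared correlations with each neighbour, and an optional list of LD-decay weightings) are hypothetical, and the numbers are made up.

```python
def tagging_columns(r2, decay=None, mincor=0.01):
    """Return (neighbours, total decay weighting, tagging) for one predictor.

    r2    : squared correlations with each neighbour considered, including
            the predictor itself (whose squared correlation is 1).
    decay : LD-decay weighting of each neighbour; None means decay is off,
            in which case every neighbour gets weighting 1.
    """
    if decay is None:
        decay = [1.0] * len(r2)
    neighbours = len(r2)                      # column two
    total_decay = sum(decay)                  # column three
    # Column four: only neighbours whose squared correlation exceeds mincor count.
    tagging = sum(d * c for d, c in zip(decay, r2) if c > mincor)
    return neighbours, total_decay, tagging


# Made-up example: the predictor itself plus three neighbours, decay off.
cols = tagging_columns([1.0, 0.5, 0.25, 0.005])
print(cols)
```

With decay off, the second and third values agree, as described above, and the last neighbour is excluded from the tagging because its squared correlation falls below mincor.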

The per-section weights files also have five columns. The first four match those in the combined weights file, while the last provides the total tagging of each predictor after weighting. Ideally these values should be (very close to) 1.00, because the aim of the weightings is to equalise the tagging of each predictor; however, when using --quick-weights YES there may be considerable deviation from 1.00.
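For a quick check of a combined weights file, something like the following Python sketch may help. It assumes the file is whitespace-delimited with exactly the five columns described above and no header line, which you should confirm against your own output; the names WeightRow, parse_weights and summarise are hypothetical, and the three example rows are invented purely for illustration.

```python
from typing import Iterable, List, NamedTuple


class WeightRow(NamedTuple):
    weight: float       # column one: weighting
    neighbours: int     # column two: neighbours considered
    decay_total: float  # column three: total LD-decay weighting
    tagging: float      # column four: tagging before weighting
    name: str           # column five: predictor name


def parse_weights(lines: Iterable[str]) -> List[WeightRow]:
    """Parse lines of a combined weights file (assumed layout, see above)."""
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or unexpected lines
        w, n, d, t, name = fields
        rows.append(WeightRow(float(w), int(float(n)), float(d), float(t), name))
    return rows


def summarise(rows: List[WeightRow]) -> dict:
    """Count predictors and list those with zero or above-one weightings."""
    return {
        "total": len(rows),
        "ignored": [r.name for r in rows if r.weight == 0],
        "above_one": [r.name for r in rows if r.weight > 1],
    }


# Invented rows, for illustration only.
example = """\
0.75 3 3 1.8 rs1
0 3 3 2.4 rs2
1.2 2 2 1.1 rs3
""".splitlines()

summary = summarise(parse_weights(example))
print(summary)
```

To run it on a real file, pass an open file handle to parse_weights in place of the example lines.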
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

The genomic location in the mapfile is typically an integer (the base pair position of the SNP). If you turn on LD decay when calculating weightings, locations are interpreted in the same units as halflife and maxlife. If you have genetic distances (e.g. centiMorgans), you can accommodate those by providing appropriate values for halflife (the distance at which correlation is likely to be due to chance rather than LD) and maxlife (the maximum range of LD).
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

When calculating weightings, ideally each section would span a chromosome. However, this would be computationally impractical, so by default LDAK considers sections of about 4000 predictors (with a 500-predictor buffer at each end). For sparse data (e.g., 500k SNPs), 500 predictors corresponds to about 3 Mb, typically longer than the range of LD, so the default settings should suffice. However, when analysing dense data, we now advise calculating weightings twice, the second time adding the option --weights <weightsfile>. For the second run, LDAK will consider only SNPs with non-zero weights from the first run. The effect is that if only about 10% of SNPs get non-zero weights, then running a second time will increase the effective section length and buffer size about ten-fold, and therefore allow for the increased SNP density. With this option, it should no longer be necessary to use the options --section-length or --buffer.
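The arithmetic behind these figures can be sketched in a few lines of Python. This is only a back-of-envelope illustration: the function names are hypothetical, the genome length of roughly 3000 Mb is an assumption (an approximate human genome size, with SNPs taken to be evenly spaced), and the 10% figure is the example proportion quoted above.

```python
def buffer_span_mb(buffer=500, n_snps=500_000, genome_mb=3000.0):
    """Approximate physical length (Mb) covered by a buffer of predictors,
    assuming SNPs are evenly spaced along a genome of genome_mb megabases."""
    return buffer * genome_mb / n_snps


def effective_sizes(section_length=4000, buffer=500, nonzero_fraction=0.10):
    """Section length and buffer of a second run, expressed in original SNPs,
    when only nonzero_fraction of SNPs kept non-zero weights in the first run."""
    return (round(section_length / nonzero_fraction),
            round(buffer / nonzero_fraction))


sparse_span = buffer_span_mb()          # 500 predictors among 500k SNPs
second_run = effective_sizes()          # about 10% of SNPs retained

print(f"Default buffer spans about {sparse_span:.1f} Mb for 500k SNPs")
print(f"Second run covers about {second_run[0]} SNPs per section, "
      f"{second_run[1]} per buffer")
```

Changing n_snps to the size of a denser data set shows how quickly the default buffer shrinks in physical terms, which is the motivation for the second run.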