The jackknife function measures the agreement between paired vectors of predicted and observed values. Specifically, it computes the correlation, squared correlation, mean squared error and mean absolute error, along with a jackknife estimate of the standard deviation of each. If the observed values are binary, the function can also compute the area under the curve. The jackknife function was designed for measuring the accuracy of polygenic risk scores, such as those created by the Prediction tools.
Always read the screen output, which suggests arguments and estimates memory usage.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
The main argument is --jackknife <outfile>.
This requires the following two options:
--data-pairs <datapairs> or --profile <profile> - to provide pairs of predicted and observed values. If using --data-pairs, then <datapairs> should have either two or three columns (with no headers), containing predicted values, observed values, and (if provided) regression weights. If using --profile, then <profile> should be the profile file produced when calculating scores with --calc-scores.
--num-blocks <integer> - to specify the number of jackknife blocks (usually 200 is a good choice).
If the observed values are binary, you can add --AUC YES, and LDAK will additionally compute the area under the curve.
The accuracy estimates will be saved in <outfile>.jack.
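As a rough illustration of the --data-pairs route, the sketch below builds a toy two-column file and passes it to the jackknife function. The file name pairs.txt, the output stem pairs and the simulated values are assumptions made for this illustration only; the options are the ones described above. The LDAK call is skipped if ./ldak.out is not in the current directory.

```shell
# Build a toy file with 400 rows and two columns (predicted, observed), no header.
# The values are simulated purely for illustration.
awk 'BEGIN { srand(1); for (i = 1; i <= 400; i++) { p = rand(); o = p + rand(); print p, o } }' > pairs.txt

# Run the jackknife only if the LDAK executable is present
if [ -x ./ldak.out ]; then
  ./ldak.out --jackknife pairs --data-pairs pairs.txt --num-blocks 200
fi
```

If the command runs, the estimates should be saved in pairs.jack, following the naming rule above.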
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Example:
Here we use the binary PLINK files human.bed, human.bim and human.fam, and the phenotypes binary.pheno from the Test Datasets. We will construct a toy score file using the following command:
echo "Predictor A1 A2 Centre Effect1 Effect2
21:14642464 A G 0.88 0.3 -0.1
21:14649798 C A 0.97 -0.2 0.4" > scores.txt
We first calculate scores by running
./ldak.out --calc-scores scores --scorefile scores.txt --bfile human --power 0 --pheno binary.pheno
Because we included the option --pheno, the resulting profile contains both scores and the phenotypes. Therefore, we can measure how well the scores predict the phenotypes by running
./ldak.out --jackknife jack --profile scores.profile --num-blocks 200
The file jack.jack reports the correlation, correlation squared, mean squared error and mean absolute error, with standard deviations derived from jackknifing.
If we instead run
./ldak.out --jackknife jack --profile scores.profile --num-blocks 200 --AUC YES
the output is the same as before, except the file jack.jack also reports the area under the curve (this is possible because the phenotypes are binary).
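To give a feel for how a standard deviation can be derived from jackknifing, here is a minimal sketch of a generic delete-one-block jackknife for the correlation, written in awk. It is not LDAK's implementation: the round-robin assignment of rows to blocks, the file names toy_pairs.txt and toy_jack.txt, and the toy values are all assumptions made for illustration.

```shell
# Toy predicted/observed pairs (made-up numbers, no header)
printf '%s\n' "0.1 0.3" "0.4 0.2" "0.5 0.9" "0.7 0.6" \
              "0.9 1.1" "1.2 1.0" "1.4 1.9" "1.6 1.5" > toy_pairs.txt

# Delete-one-block jackknife of the correlation, using B blocks
awk -v B=4 '
# Pearson correlation from a count and running sums
function corr(cn, cx, cy, cxx, cyy, cxy) {
  return (cn * cxy - cx * cy) / sqrt((cn * cxx - cx * cx) * (cn * cyy - cy * cy))
}
{
  b = (NR - 1) % B                 # assign rows to blocks in round-robin order
  n[b]++; sx[b] += $1; sy[b] += $2
  sxx[b] += $1 * $1; syy[b] += $2 * $2; sxy[b] += $1 * $2
  N++; SX += $1; SY += $2; SXX += $1 * $1; SYY += $2 * $2; SXY += $1 * $2
}
END {
  full = corr(N, SX, SY, SXX, SYY, SXY)   # estimate from all rows
  for (b = 0; b < B; b++) {               # re-estimate with block b left out
    r[b] = corr(N - n[b], SX - sx[b], SY - sy[b], SXX - sxx[b], SYY - syy[b], SXY - sxy[b])
    mean += r[b] / B
  }
  for (b = 0; b < B; b++) ss += (r[b] - mean) ^ 2
  # jackknife standard deviation: sqrt((B-1)/B * sum of squared deviations)
  printf "correlation %.4f jackknife_sd %.4f\n", full, sqrt((B - 1) / B * ss)
}' toy_pairs.txt > toy_jack.txt
cat toy_jack.txt
```

The same leave-one-block-out idea carries over to the other measures (squared correlation, mean squared error, mean absolute error) by swapping the statistic computed in each pass.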