Relatives Files

The relatives file is required when using TetraHer and QuantHer. It should have either five or six columns. Columns 1 & 2 should provide the two IDs for the first individual in each pair, while Columns 3 & 4 should provide the two IDs for the second individual. Column 5 should specify the relatedness of the pair, while Column 6 (if provided) should specify their environmental similarity. Note that the relatedness value should be the Coefficient of Relatedness of the pair, and therefore lie between 0 and 1 (e.g., 1 for identical twins, 0.5 for full-siblings and parent-child pairs, 0.25 for half-siblings, etc). In general, it suffices to include only close relatives (e.g., those with relatedness above 0.1), because more distantly related pairs contribute little when estimating heritability.
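
To catch formatting mistakes before running TetraHer or QuantHer, the rules above can be checked with a few lines of awk. This is a minimal sketch, not an LDAK command: the function name check_relatives is our own, and we assume that columns are whitespace-separated and that every row should have the same number of columns.

```shell
# Sanity-check a relatives file (hypothetical helper, not part of LDAK).
# Reports rows that do not have 5 or 6 columns, rows whose column count
# differs from the first row, and relatedness values outside [0, 1].
check_relatives() {
  awk '
    NR == 1 { ncol = NF }
    NF != 5 && NF != 6 {
      printf "line %d: expected 5 or 6 columns, found %d\n", NR, NF
      bad = 1
      next
    }
    NF != ncol {
      printf "line %d: %d columns, but line 1 has %d\n", NR, NF, ncol
      bad = 1
    }
    $5 < 0 || $5 > 1 {
      printf "line %d: relatedness %s is outside [0, 1]\n", NR, $5
      bad = 1
    }
    END {
      if (!bad) printf "OK: %d pairs\n", NR
      exit bad   # non-zero exit status if any problem was found
    }
  ' "$1"
}
```

For example, check_relatives disease.relatives prints a line starting with "OK" if no problems are found, and otherwise lists each offending line and exits with a non-zero status.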

Two example relatives files, disease.relatives and disease.enviro, are provided in the Test Datasets. The first two lines of these files are as follows:

head -n 2 disease.relatives
27809 27809 29595 29595 0.301
49531 49531 22574 22574 0.479

head -n 2 disease.enviro
27809 27809 29595 29595 0.301 0
49531 49531 22574 22574 0.479 1

The first file, disease.relatives, tells us that the pair of individuals with IDs "27809 27809" and "29595 29595" have estimated relatedness 0.30 (i.e., they are likely half-siblings), while the pair with IDs "49531 49531" and "22574 22574" have estimated relatedness 0.48 (i.e., they are likely full-siblings). The second file, disease.enviro, is the same as disease.relatives, except that it also provides estimates of the environmental similarity of each pair (the first pair are assumed to share no common environment, while the second pair are assumed to share the same common environment).
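
Interpretations such as these come from comparing each estimate with the values expected for each degree of relationship. A rough sketch in awk, assuming the kinship cutoffs used by KING (2^-3/2, 2^-5/2 and 2^-7/2, approximately 0.354, 0.177 and 0.088), doubled to convert them to relatedness. label_degree is a hypothetical helper name, and its labels are only approximate; for example, a first-degree pair could be either full-siblings or parent-child.

```shell
# Append an approximate degree of relationship to each pair of a relatives file.
# Assumption: cutoffs are KING's kinship cutoffs multiplied by two.
label_degree() {
  awk '{
    r = $5 + 0                      # force a numeric comparison
    if (r > 0.707)      d = "MZ/duplicate"
    else if (r > 0.354) d = "1st-degree"
    else if (r > 0.177) d = "2nd-degree"
    else if (r > 0.088) d = "3rd-degree"
    else                d = "more-distant"
    print $1, $2, $3, $4, $5, d
  }' "$1"
}
```

Applied to the two lines of disease.relatives shown above, it labels the first pair (0.301) as 2nd-degree and the second pair (0.479) as 1st-degree.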
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Constructing the first five columns of the relatives file

First you need to identify pairs of related individuals. This can be done using either pedigree information or SNP data.

If using pedigree information, you probably already have a list of related pairs, and therefore you only need to put this in the format required by LDAK (i.e., so that each row contains the IDs of the first individual in the pair, the IDs of the second individual, and their relatedness). If you plan to also model common environment, see below for advice on how to add the sixth column.
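
If your pedigree list records the type of each relationship rather than a number, a lookup table is enough to produce the five columns. A minimal sketch, assuming a hypothetical input file with columns FID1 IID1 FID2 IID2 TYPE, where TYPE uses made-up codes (MZ identical twins, FS full-siblings, PO parent-child, HS half-siblings, GP grandparent-grandchild, AV aunt/uncle-niece/nephew, FC first cousins). The values are the standard coefficients of relatedness for these relationships; pedigree_to_relatives is a hypothetical helper name.

```shell
# Convert a hypothetical pedigree pair list (FID1 IID1 FID2 IID2 TYPE)
# into the first five columns of a relatives file.
pedigree_to_relatives() {
  awk '
    BEGIN {
      # Coefficient of relatedness for each (hypothetical) relationship code
      rel["MZ"] = 1
      rel["FS"] = 0.5;  rel["PO"] = 0.5
      rel["HS"] = 0.25; rel["GP"] = 0.25; rel["AV"] = 0.25
      rel["FC"] = 0.125
    }
    ($5 in rel) { print $1, $2, $3, $4, rel[$5]; next }
    # Unknown codes are reported on stderr and left out of the output
    { printf "skipping line %d: unknown relationship %s\n", NR, $5 > "/dev/stderr" }
  ' "$1"
}
```

For example, pedigree_to_relatives pedigree.pairs > pedigree.relatives, where both file names are placeholders.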

If using SNP data, there are two main ways to identify related pairs: based on identity by descent, or based on identity by state. To identify pairs based on identity by descent, we recommend using the software KING, while to identify pairs based on identity by state, we recommend using LDAK (specifically, by first Calculating Kinships, then Filtering Relatedness). The example below demonstrates these two approaches.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Adding the sixth column of the relatives file

The sixth column is optional, but if included, it should contain estimates of the environmental similarity between relatives. There is no consensus on how to measure environmental similarity, and therefore we generally consider two approaches. The first assigns value one to full-siblings and identical twins, and zero to all other pairs (on the basis that full-siblings and identical twins likely grew up in the same household). The second assigns value one to all pairs. However, we recognise that both approaches are far from perfect, reflecting the difficulty of accurately estimating environmental effects.
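
Both approaches are straightforward to apply to an existing relatives file. In this sketch, add_enviro_all implements the second approach by appending 1 to every row. add_enviro_siblings implements the first; because relatedness alone cannot distinguish full-siblings from parent-child pairs (both 0.5), it assumes a hypothetical sixth column holding the relationship type (FS for full-siblings, MZ for identical twins, anything else otherwise), which it replaces with 1 or 0. Both function names are our own.

```shell
# Second approach: environmental similarity 1 for every pair.
add_enviro_all() {
  awk '{ print $1, $2, $3, $4, $5, 1 }' "$1"
}

# First approach: 1 for full-siblings (FS) and identical twins (MZ), 0 otherwise.
# Assumes a hypothetical input layout FID1 IID1 FID2 IID2 REL TYPE.
add_enviro_siblings() {
  awk '{
    e = ($6 == "FS" || $6 == "MZ") ? 1 : 0
    print $1, $2, $3, $4, $5, e
  }' "$1"
}
```

For example, add_enviro_all disease.relatives > my.enviro (my.enviro is a placeholder name) produces a six-column file in which every pair is assumed to share a common environment.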
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Example:

Here we use the binary PLINK files human.bed, human.bim and human.fam from the Test Datasets, and the software KING (version 2.3.0).
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

1 - Identifying related pairs based on identity by descent.

First we use KING to identify related pairs.

./king -b human.bed --related --degree 2

The main output file is king.kin0, which contains pairs of individuals inferred to have a coefficient of relatedness of at least 0.17. Usually this file has one extra column, labelled "InfType", which contains the inferred relationship (e.g., "Dup/MZ", "PO", "FS", "2nd", etc); however, that column is absent here, reflecting that we are using a toy dataset with very few SNPs.

We can now convert this file into the format required by LDAK. Note that KING provides the estimated coefficient of kinship for each pair, and so we must multiply this by two in order to get the corresponding coefficient of relatedness.

awk < king.kin0 '(NR>1){print $1, $2, $3, $4, $10*2}' > hapmap.pairs.king

The file hapmap.pairs.king can now be used as a relatives file when running TetraHer and QuantHer.
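
The awk command above assumes that the kinship estimate is in column 10 of king.kin0. If you are unsure of the column layout (for example, with a different KING version), a slightly more defensive sketch looks up columns by their header names, which we assume are "Kinship" and "InfType". When InfType is present, it also adds a sixth column following the first approach to environmental similarity (1 for "FS" and "Dup/MZ", 0 otherwise). king_to_relatives is a hypothetical helper, not part of KING or LDAK.

```shell
# Convert king.kin0 to a relatives file, locating columns by header name.
# Assumes the first four columns hold FID1 ID1 FID2 ID2 (as used above).
king_to_relatives() {
  awk '
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        if ($i == "Kinship") k = i
        if ($i == "InfType") t = i
      }
      if (!k) { print "no Kinship column in header" > "/dev/stderr"; exit 1 }
      next
    }
    {
      # Relatedness is twice the kinship coefficient
      line = $1 " " $2 " " $3 " " $4 " " (2 * $k)
      # Optional sixth column: 1 for full-siblings and duplicates/MZ twins
      if (t) line = line " " (($t == "FS" || $t == "Dup/MZ") ? 1 : 0)
      print line
    }
  ' "$1"
}
```

For this toy dataset (no InfType column), king_to_relatives king.kin0 should give the same five columns as the awk command above.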
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

2 - Identifying related pairs based on identity by state.

First we calculate kinships assuming the LDAK-Thin Heritability Model, using the commands

./ldak.out --thin thin --bfile human --window-prune .98 --window-kb 100
awk < thin.in '{print $1, 1}' > weights.thin
./ldak.out --calc-kins-direct LDAK-Thin --bfile human --weights weights.thin --power -.25

The kinship matrix is saved with stem LDAK-Thin. Next we Filter Relatedness, using the command

./ldak.out --filter LDAK-Thin --grm LDAK-Thin --min-rel .1

The list of related pairs is saved in LDAK-Thin.pairs, which is ready for use as a relatives file when running TetraHer and QuantHer.