awk is a programming language, installed by default on most UNIX operating systems. I find awk very useful when I want to process very large files (in particular, those too large to easily manipulate in R).
Below are a few examples; the dollar sign ($) represents the command prompt in a terminal, and should not be typed. Bear in mind, these are just a few examples of what awk can do; for a more complete tutorial, see the links at the bottom.
If you want to work through these examples, first download these two files:
file1.txt file2.txt
First we will print out Column 1 of file1.txt:
$ awk < file1.txt '{print $1}'
and now Column 2:
$ awk < file1.txt '{print $2}'
Now print both columns:
$ awk < file1.txt '{print $1, $2}'
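This could also be achieved using $0, which holds the entire row as it was read (so the output is identical whenever the file has just these two columns separated by single spaces):
$ awk < file1.txt '{print $0}'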
Now print out the rows whose second column is less than 0.7:
$ awk < file1.txt '{if($2<0.7){print $0}}'
This can also be achieved by writing the test as a pattern in front of the action:
$ awk < file1.txt '($2<0.7){print $0}'
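If you want to check that the two forms behave the same without downloading anything, here is a minimal sketch. The file demo1.txt and its three rows are made up for illustration; they are not the contents of file1.txt.

```shell
# Hypothetical sample data (NOT the tutorial's file1.txt): SNP name, then P-value
printf 'rs1 0.91\nrs2 0.35\nrs3 0.02\n' > demo1.txt

# Test written as an explicit if-statement inside the action
with_if=$(awk '{if($2<0.7){print $0}}' demo1.txt)

# Same test written as a pattern in front of the action
with_pattern=$(awk '($2<0.7){print $0}' demo1.txt)

# Both forms keep only the rows for rs2 and rs3
echo "$with_if"
echo "$with_pattern"
```

Here the file name is passed as an argument rather than with <; awk accepts either.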
Now print only the top 2 rows (NR is a built-in variable holding the current row number):
$ awk < file1.txt '(NR<=2){print $0}'
(there are many other built-in variables, for example NF, the number of columns in the current row)
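As a rough illustration of a few of those other built-in variables, the sketch below uses NF (number of columns), FS (input separator, set here with -F) and OFS (output separator). The file demo.csv and its contents are hypothetical, chosen only to show a comma-separated input.

```shell
# Hypothetical comma-separated sample data
printf 'rs1,0.91\nrs2,0.35\n' > demo.csv

# -F',' sets the input field separator (FS) to a comma;
# OFS is the separator print puts between its arguments (a tab here);
# NR is the row number and NF the number of columns on that row
summary=$(awk -F',' 'BEGIN{OFS="\t"}{print NR, NF, $1}' demo.csv)
echo "$summary"
```

Each output row holds the row number, the column count (2) and the SNP name, separated by tabs.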
and print only the results for the SNPs rs3 and rs8:
$ awk < file1.txt '($1=="rs3"||$1=="rs8"){print $0}'
(|| means "OR"; by contrast, && means "AND")
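To see the difference between the two operators side by side, here is a small sketch on made-up data (demo1.txt and its values are hypothetical, not the tutorial's file1.txt).

```shell
# Hypothetical sample data: SNP name, then P-value
printf 'rs1 0.91\nrs2 0.35\nrs3 0.02\n' > demo1.txt

# || keeps a row if EITHER test is true: the SNP is rs3, or the P-value exceeds 0.9
either=$(awk '($1=="rs3"||$2>0.9){print $1}' demo1.txt)

# && keeps a row only if BOTH tests are true: P-value between 0.1 and 0.7
both=$(awk '($2>0.1&&$2<0.7){print $1}' demo1.txt)

echo "OR matches:"
echo "$either"
echo "AND matches:"
echo "$both"
```

With this data the OR filter keeps rs1 and rs3, while the AND filter keeps only rs2.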
Now we can add some annotations to the output:
$ awk < file1.txt '($1=="rs3"||$1=="rs8"){print "SNP:",$1,"P-Value:",$2}'
I also use awk to find the intersect (overlap) of two files. For example, to find the SNPs present in both file1.txt and file2.txt, you can first store Column 1 of the first file as the keys of an array (here called arr), then test whether each element in Column 1 of the second file is a key in arr:
$ awk '{if(NR==FNR){arr[$1];next}}($1 in arr){print $1}' file1.txt file2.txt
Here, NR counts rows across all the files read so far, whereas FNR is the row count within the file currently being read, so testing whether NR equals FNR is asking whether awk is still reading the first file.
Again, this can be abbreviated:
$ awk '(NR==FNR){arr[$1];next}($1 in arr){print $1}' file1.txt file2.txt
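The behaviour of NR and FNR is easier to see on a tiny example. The sketch below uses two made-up files (demo1.txt and demo2.txt are hypothetical, not the downloadable files) and first prints both counters, then runs the intersect.

```shell
# Hypothetical sample data for two files sharing the SNPs rs2 and rs3
printf 'rs1 0.91\nrs2 0.35\nrs3 0.02\n' > demo1.txt
printf 'rs2 0.40\nrs3 0.75\nrs9 0.10\n' > demo2.txt

# FILENAME is the file being read; NR keeps counting across files,
# FNR restarts at 1 when the second file begins
counters=$(awk '{print FILENAME, NR, FNR}' demo1.txt demo2.txt)
echo "$counters"

# Intersect: store Column 1 of the first file as keys of arr,
# then print Column 1 of the second file whenever it is a key of arr
shared=$(awk '(NR==FNR){arr[$1];next}($1 in arr){print $1}' demo1.txt demo2.txt)
echo "$shared"
```

The counters agree for the three rows of demo1.txt and diverge from the first row of demo2.txt onwards (NR is 4 while FNR is 1). One caveat: if the first file is empty, NR equals FNR throughout the second file, so the test no longer identifies the first file.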
We can also carry over details from the first file, by storing Column 2 as the value held under each key of arr:
$ awk '(NR==FNR){arr[$1]=$2;next}($1 in arr){print "SNP:",$1, "P-Value1:",arr[$1], "P-Value2:", $2}' file1.txt file2.txt
and we can save the output to a file (rather than printing it on the screen):
$ awk '(NR==FNR){arr[$1]=$2;next}($1 in arr){print "SNP:",$1, "P-Value1:",arr[$1], "P-Value2:", $2}' file1.txt file2.txt > file3.txt
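Here is a short sketch of the carry-over and the redirect together, again on made-up data (demo1.txt, demo2.txt and demo3.txt are hypothetical names, and the labels are dropped to keep the output compact).

```shell
# Hypothetical sample data for two files sharing the SNPs rs2 and rs3
printf 'rs1 0.91\nrs2 0.35\nrs3 0.02\n' > demo1.txt
printf 'rs2 0.40\nrs3 0.75\nrs9 0.10\n' > demo2.txt

# arr maps each SNP in the first file to its P-value;
# for shared SNPs, print the name, the first P-value and the second P-value;
# > sends the output to demo3.txt instead of the screen
awk '(NR==FNR){arr[$1]=$2;next}($1 in arr){print $1, arr[$1], $2}' demo1.txt demo2.txt > demo3.txt

# Read the saved file back to check its contents
merged=$(cat demo3.txt)
echo "$merged"
```

Note that > replaces any existing file of that name; the shell's >> appends instead.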
https://en.wikipedia.org/wiki/AWK
http://www.tutorialspoint.com/awk/awk_overview.htm