AWK

= resources =

 * Awk - A Tutorial and Introduction by Bruce Barnett
 * Peteris Krumins' blog - AWK programming

= oneliner =

Number lines in each file separately
awk '{ print FNR "\t" $0 }'
# Prepends FNR (the predefined per-file line number, which restarts at 1 for each file) and a tab to every line

Number lines for all files together
awk '{ print NR "\t" $0 }'
# Works like the previous one-liner, except that it uses NR, the line number variable that is not reset from file to file

Print the sum of fields in every line
awk '{ s = 0; for (i = 1; i <= NF; i++) s = s+$i; print s }'

Print the sum of fields in all lines
awk '{ for (i = 1; i <= NF; i++) s = s+$i }; END { print s+0 }'

Replace every field by its absolute value
awk '{ for (i = 1; i <= NF; i++) if ($i < 0) $i = -$i; print }'

Find the line containing the largest (numeric) first field
awk 'NR == 1 { max = $1; maxline = $0; next; } $1 > max { max=$1; maxline=$0 }; END { print max, maxline }'

Count columns: print the number of fields in each line, followed by the line
awk '{ print NF ":" $0 }'

Print the last field of each line
awk '{ print $NF }'

Print the last field of the last line
awk 'END { print $NF }'

Print every line with more than 4 fields
awk 'NF > 4'

Print every line where the value of the last field is greater than 4
awk '$NF > 4'

= Text manipulation =

Convert Windows/DOS newlines (CRLF) to Unix newlines (LF)
awk '{ sub(/\r$/,""); print }'

Convert Unix newlines (LF) to Windows/DOS newlines (CRLF)
awk '{ sub(/$/,"\r"); print }'

Delete leading whitespace (spaces and tabs) from the beginning of each line (ltrim)
awk '{ sub(/^[ \t]+/, ""); print }'

Delete trailing whitespace (spaces and tabs) from the end of each line (rtrim)
awk '{ sub(/[ \t]+$/, ""); print }'

Delete both leading and trailing whitespaces from each line (trim)
awk '{ gsub(/^[ \t]+|[ \t]+$/, ""); print }'

Insert 3 blank spaces at beginning of each line
awk '{ sub(/^/, "   "); print }'

Substitute (find and replace) "foo" with "bar" on each line
awk '{ sub(/foo/,"bar"); print }' # first match on each line only
awk '{ gsub(/foo/,"bar"); print }' # all matches
gawk '{ $0 = gensub(/foo/,"bar",4); print }' # 4th match only (gensub is gawk-specific)

Replace commas with tabs, then strip single quotes, in a gzipped dump
zcat Human_CHRY.sqlite.dump.tab.gz \
| awk '{ gsub(/,/,"\t"); print }' \
| awk '{ gsub(/\047/,""); print }'
# \047 is the octal escape for a single quote

Substitute "foo" with "bar" only on lines that contain "baz"
awk '/baz/ { gsub(/foo/, "bar") }; { print }'

Substitute "foo" with "bar" only on lines that do not contain "baz"
awk '!/baz/ { gsub(/foo/, "bar") }; { print }'

Change "scarlet" or "ruby" or "puce" to "red"
awk '{ gsub(/scarlet|ruby|puce/, "red"); print}'

Join a line ending with a backslash with the next line
awk '/\\$/ { sub(/\\$/,""); getline t; print $0 t; next }; 1'
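The one-liner above joins each continued line with exactly one following line. If a continuation can span several lines, a loop is needed. The following is a minimal sketch of that variant, not part of the original collection; the function name join_continuations and the variable name rest are made up for illustration.

```shell
# Join every run of backslash-continued lines into a single line.
join_continuations() {
    awk '{
        while (/\\$/) {                      # current record still ends with a backslash
            sub(/\\$/, "")                   # drop the backslash
            if ((getline rest) <= 0) break   # no more input: stop joining
            $0 = $0 rest                     # append the following line
        }
        print
    }' "$@"
}

printf 'one \\\ntwo \\\nthree\nfour\n' | join_continuations
```

With this input, the first three lines come out as the single line "one two three", followed by "four".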

Print and sort the login names of all users
awk -F ":" '{ print $1 | "sort" }' /etc/passwd
# The -F option sets the field separator (here a colon)

Swap first field with second on every line
awk '{ temp = $1; $1 = $2; $2 = temp; print }'

Delete the second field on each line
awk '{ $2 = ""; print }'

Print the fields in reverse order on every line
awk '{ for (i=NF; i>0; i--) printf("%s ", $i); printf ("\n") }'

Remove duplicate, nonconsecutive lines
awk '!a[$0]++'
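The one-liner above is terse, so here is a spelled-out sketch of the same idea. The array is indexed by the whole line, and the post-increment returns the count from before the increment, so the pattern is true only the first time a given line is seen. The function name dedupe and the variable names are illustrative only.

```shell
# Print each distinct line once, keeping first-seen order.
dedupe() {
    awk '{
        seen_before = (count[$0] > 0)   # has this exact line appeared already?
        count[$0]++                     # remember it for later lines
        if (!seen_before) print
    }' "$@"
}

printf 'a\nb\na\nc\nb\n' | dedupe
```

For this input the output is the three lines a, b and c.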

Concatenate every 5 lines of input with a comma
awk 'ORS=NR%5?",":"\n"'
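The one-liner above uses an assignment as the pattern: ORS is set to a comma or a newline depending on NR, and because the assigned string is non-empty the pattern is true, so every line is printed with that output record separator. A longhand sketch of the same logic follows; the function name join_by_five is hypothetical.

```shell
# Join every 5 input lines with commas.
join_by_five() {
    awk '{
        if (NR % 5) ORS = ","    # lines 1-4 of each group are followed by a comma
        else        ORS = "\n"   # every 5th line ends the group
        print
    }' "$@"
}

printf '%s\n' 1 2 3 4 5 6 7 8 9 10 | join_by_five
```

This prints "1,2,3,4,5" and "6,7,8,9,10" on two lines. If the number of input lines is not a multiple of 5, the last output line ends with a comma instead of a newline.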

Split file on patterns
awk -v n=1 '/^FOO[0-9]*/{close("out"n);n++;next} {print > "out"n}' file

= Printing Lines =

Print if column 8 equals or contains H2O
awk '$8 ~ "^H2O$"{ print ; }' # column 8 equals H2O
awk '$8 ~ "H2O"{ print ; }' # column 8 contains H2O

Print the line immediately before a line that matches "/regex/"
awk '/regex/ { print x }; { x=$0 }'

Print the line immediately after a line that matches "/regex/"
awk '/regex/ { getline; print }'

Print lines that match any of "AAA" or "BBB", or "CCC"
awk '/AAA|BBB|CCC/'

Print lines that contain "AAA" and "BBB", and "CCC" in this order
awk '/AAA.*BBB.*CCC/'

Print only the lines that are 10 characters in length or longer
awk 'length >= 10'

Print lines 8 to 12 (inclusive)
awk 'NR==8,NR==12'

Print line number 52
awk 'NR==52 { print; exit }'

Print section of a file between two regular expressions (inclusive)
awk '/Iowa/,/Montana/'

Delete all blank lines from a file
awk NF # also deletes lines that contain only whitespace
awk '/./' # deletes only completely empty lines

Print all lines where 5th field is equal to "abc123"
awk '$5 == "abc123"'
awk '{ if ($5 == "abc123") { print $0 } }'

Print any line where field #5 is not equal to "abc123"
awk '$5 != "abc123"'
awk '{ if ($5 != "abc123") { print $0 } }'
awk '!($5 == "abc123")'

Print all lines whose 7th field matches a regular expression
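awk '$7 ~ /^[a-f]/'
awk '$7 !~ /^[a-f]/' # negated: field 7 does not start with a-f (also true when field 7 is empty)
awk '$7 ~ /^[^a-f]/' # field 7 starts with a character other than a-f (false when field 7 is empty)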

Print all the lines in a file that match some pattern
awk '{if ($0 ~ /pattern/) print $0}'
awk '$0 ~ /pattern/ {print $0}'
awk '/pattern/ {print $0}'
awk '/pattern/ {print}'
awk '/pattern/'

Prints odd lines that match /pattern/, or even lines that match /anotherpattern/
awk '(NR%2 && /pattern/) || (!(NR%2) && /anotherpattern/)'

awk 'NR % 6'           # prints all lines except those whose line number is divisible by 6
awk 'NR > 5'           # prints from line 6 onwards (like tail -n +6, or sed '1,5d')
awk '$2 == "foo"'      # prints lines where the second field is "foo"
awk 'NF >= 6'          # prints lines with 6 or more fields
awk '/foo/ && /bar/'   # prints lines that match /foo/ and /bar/, in any order
awk '/foo/ && !/bar/'  # prints lines that match /foo/ but not /bar/
awk '/foo/ || /bar/'   # prints lines that match /foo/ or /bar/ (like grep -e 'foo' -e 'bar')
awk '/foo/,/bar/'      # prints from line matching /foo/ to line matching /bar/, inclusive
awk 'NF'               # prints only nonempty lines (or: removes empty lines, where NF==0)
awk 'NF--'             # removes last field and prints the line
awk '$0 = NR" "$0'     # prepends line numbers (assignments are valid in conditions)

Prints lines from /beginpat/ to /endpat/, inclusive
awk '/beginpat/,/endpat/'

Prints lines from /beginpat/ to /endpat/, not inclusive
awk '/beginpat/,/endpat/{if (!/beginpat/&&!/endpat/)print}'

Prints lines from /beginpat/ to /endpat/, not including /beginpat/
awk '/beginpat/,/endpat/{if (!/beginpat/)print}'

Prints lines from /beginpat/ to /endpat/, not inclusive
awk '/endpat/{p=0};p;/beginpat/{p=1}'

Prints lines from /beginpat/ to /endpat/, excluding /endpat/
awk '/endpat/{p=0} /beginpat/{p=1} p'

Prints lines from /beginpat/ to /endpat/, excluding /beginpat/
awk 'p; /endpat/{p=0} /beginpat/{p=1}'

Prints lines from /beginpat/ to /endpat/, inclusive
awk '/beginpat/{p=1};p;/endpat/{p=0}'
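The four flag-based variants above differ only in where the bare pattern p (print if the flag is set) sits relative to the rules that set and clear the flag. A small sketch on made-up input makes the difference visible; the markers start and stop and the helper sample are assumptions for illustration.

```shell
# Six sample lines with one start/stop section in the middle.
sample() { printf 'a\nstart\nx\ny\nstop\nb\n'; }

sample | awk '/start/{p=1}; p; /stop/{p=0}'     # inclusive: start, x, y, stop
sample | awk '/stop/{p=0}; p; /start/{p=1}'     # exclusive: x, y
sample | awk '/stop/{p=0} /start/{p=1} p'       # excludes stop: start, x, y
sample | awk 'p; /stop/{p=0} /start/{p=1}'      # excludes start: x, y, stop
```

Rules run in order for every line, so placing p before the rule that sets the flag skips the start line, and placing it after the rule that clears the flag skips the stop line.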

Set the field separator to a tab and print selected fields
awk 'BEGIN {FS="\t"} ; {print $1"_"$2"_"$4$5"\t"$6"\t"$7"\t"$8"\t"$9}'

= Calculate =

Average the first column of all files in this folder
# If a file has empty lines or missing values, count them and divide by (NR - numemptylines) instead of NR.
for filename in *; do
    [ "$filename" = count.all.average.txt ] && continue   # skip the output file itself
    awk '{ sum += $1 } END { if (NR) print "Average for " FILENAME " = ", sum/NR }' "$filename"
done > count.all.average.txt

Append an incrementing counter (starting at 1) to each line
awk 'BEGIN{t=1} {print $0"\t"t; t+=1}'

= Processing two files =

awk 'NR==FNR { # some actions; next} # other condition {# other actions}' file1 file2

When processing more than one file, awk reads the files sequentially, in the order they are given on the command line. The special variable NR stores the total number of input records read so far, regardless of how many files have been read. NR starts at 1 and keeps increasing until the program terminates. Another variable, FNR, stores the number of records read from the file currently being processed. FNR starts at 1, increases until the end of the current file, and restarts at 1 as soon as the first line of the next file is read.

So the condition "NR==FNR" is true only while awk is reading the first file (provided the first file is not empty). In the program above, the actions marked "# some actions" are therefore executed while awk is reading the first file, and the actions marked "# other actions" are executed while awk is reading the second file, if the condition in "# other condition" is met. The "next" at the end of the first action block prevents the condition in "# other condition" from being evaluated, and the actions in "# other actions" from being executed, while awk is reading the first file.
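To see how NR and FNR behave across files, the following sketch prints both counters for every line of two small scratch files. The function name show_counters, the file names and the file contents are made up for illustration.

```shell
# Print NR, FNR and whether NR==FNR holds for every input line.
show_counters() {
    awk '{ print "NR=" NR, "FNR=" FNR, (NR == FNR ? "first file" : "later file") }' "$@"
}

tmpdir=$(mktemp -d)                      # scratch directory for the demo
printf 'a\nb\n'    > "$tmpdir/file1"     # two lines
printf 'c\nd\ne\n' > "$tmpdir/file2"     # three lines
show_counters "$tmpdir/file1" "$tmpdir/file2"
rm -r "$tmpdir"
```

The first two output lines have NR equal to FNR (1 and 2). For the three lines of the second file, NR continues with 3, 4, 5 while FNR restarts at 1, 2, 3, so NR==FNR is false there.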

Prints lines that are both in file1 and file2 (intersection)
awk 'NR==FNR{a[$0];next} $0 in a' file1 file2

Data file:
20081010 1123 xxx
20081011 1234 def
20081012 0933 xyz
20081013 0512 abc
20081013 0717 def
...thousands of lines...

Map file:
abc withdrawal
def payment
xyz deposit
xxx balance
...other codes...

Use information from a map file to modify a data file
awk 'NR==FNR{a[$1]=$2;next} {$3=a[$3]}1' mapfile datafile
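As a worked example, the sketch below runs the command on the sample data file and map file shown above. The wrapper name translate_codes and the scratch directory are assumptions for illustration.

```shell
# Replace the code in field 3 of the data file with its name from the map file.
translate_codes() {
    awk 'NR==FNR{a[$1]=$2;next} {$3=a[$3]}1' "$1" "$2"
}

workdir=$(mktemp -d)   # scratch directory for the two sample files
printf 'abc withdrawal\ndef payment\nxyz deposit\nxxx balance\n' > "$workdir/mapfile"
printf '20081010 1123 xxx\n20081011 1234 def\n20081012 0933 xyz\n20081013 0512 abc\n20081013 0717 def\n' > "$workdir/datafile"
translate_codes "$workdir/mapfile" "$workdir/datafile"
rm -r "$workdir"
```

The first output line is "20081010 1123 balance" and the last is "20081013 0717 payment". A code that is missing from the map file is replaced by an empty string, because a[$3] is then unset.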

Replace each number with its difference from the maximum
awk 'NR==FNR{if($0>max) max=$0;next} {$0=max-$0}1' file file

= Pass Shell Variables To awk =

The -v option can be used to pass shell variables to an awk command.
#!/bin/bash
# Usage : Search word using awk for given file.
# Syntax: ./script "word-to-search" fileToSearch
s=$1
i=$2
awk -v search="$s" '$0 ~ search' "$i"
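The sketch below wraps the same awk call in a function and runs it on a scratch file. Because the program uses "$0 ~ search", the value passed with -v is treated as a regular expression, not as a literal string. The function name search_file and the sample contents are made up for illustration.

```shell
# Print the lines of file $2 that match the regular expression in $1.
search_file() {
    awk -v search="$1" '$0 ~ search' "$2"
}

tmpfile=$(mktemp)
printf 'abc\na.c\nxyz\n' > "$tmpfile"
search_file 'a.c' "$tmpfile"    # prints abc and a.c: the dot matches any character
search_file 'xy' "$tmpfile"     # prints xyz
rm -f "$tmpfile"
```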

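= Collapse based on similar value in a column =

Collapse lines that share a value in the 1st column: join their 2nd-column values with commas and count them (asort requires gawk)
gawk '{
    if ($1 in ps) { ps[$1] = ps[$1] "," $2 }   # key seen before: append the value
    else { i[cnt++] = $1; ps[$1] = $2 }        # new key: remember it and start its list
    mult[$1]++                                 # number of lines seen for this key
}
END {
    n = asort(i)                               # sort the keys
    for (j = 1; j <= n; j++) print i[j] "\t" ps[i[j]] "\t" mult[i[j]]
}' tmp1 > tmp1.collapse.txt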
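Where gawk is not available, the same grouping can be sketched with plain awk by piping the result through sort in place of asort. This is an alternative sketch, not the original command; the function name collapse, the array names and the sample input are hypothetical.

```shell
# Group by column 1: print key, comma-joined column 2 values, and the line count.
collapse() {
    awk '{
        if ($1 in vals) vals[$1] = vals[$1] "," $2   # key seen before: append
        else            vals[$1] = $2                # first value for this key
        n[$1]++                                      # lines seen for this key
    }
    END { for (k in vals) print k "\t" vals[k] "\t" n[k] }' "$@" | sort
}

printf 'k2 b\nk1 a\nk1 c\n' | collapse
```

For this input the output is two tab-separated lines: k1, "a,c", 2 and k2, "b", 1.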