Awk Tips and Cheat Sheet
1 awk overview
awk, like sed, is a stream editor, allowing you to selectively alter characters in a stream. stdin and stdout are the default streams, but you can give awk a filename as an argument in place of stdin, and you can redirect the output of awk to a new file. See Editing in place with awk for tips on doing that with multiple files.
Of course you must also be aware of the GNU Awk User's Guide.
2 awk construct
Awk programs are one or more sets of condition-action pairs, where the conditions are outside of the curly braces and the actions are inside the curly braces.
awk [options] 'condition {action;} condition {action1; action2;}' [file]
awk -f awkscript.awk datatoparsefile  # override stdin with a filename as input
df -h | awk -f awkscript.awk          # using stdin
awk [-Fc] 'condition {action}'
When you run awk directly from the command line, the awk "script" is wrapped in single quotation marks, i.e. 'tic'. Otherwise the awk commands are found in a file using the -f <filename> option. When awk commands are in a separate awk file, the single quotation marks are NOT needed. The script is just a series of awk pattern and action lines.
An example awk script in a file, showing the series of awk pattern and action lines:
df -h | awk -f awkscript.awk
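As a runnable sketch of the condition-action shape (the data here is made up): the first pair prints the name whenever the 2nd field exceeds 80, and the second, condition-less pair counts every line.

```shell
printf 'alice 78\nnancy 88\nnick 100\n' |
    awk '$2 > 80 {print $1}  {n++}  END {print n " lines"}'
```

The first two output lines come from the first rule; the "3 lines" summary comes from the condition-less counter plus END.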
3 awk script construct
As mentioned above, you can store awk commands in a file, and call that file using awk -f awkscript file-to-parse. With this approach, awkscript does NOT have to be executable. An alternative approach is to make awkscript.awk executable (chmod 755 awkscript.awk), but then the first line in awkscript.awk must be a shebang line that passes the -f flag, e.g. #!/usr/bin/awk -f (a plain #!/usr/bin/env awk shebang does not work, because awk would then treat the script's filename as the program text).
The script construct assumes the script will be fed one or many files, and for each line of input from all files fed into it (or from stdin) awk will evaluate each condition and take all actions for that condition until the end, or until awk encounters the next keyword, which restarts the script from the top with the next line of input.
#!/usr/bin/awk -f
BEGIN {        # condition line has an opening brace
    action; action; action; ...
}
condition { action }
condition {    # each condition line has to have the opening brace.
    action;    # a condition can have multiple actions
    action2;
    action3;
}
...
END {          # condition line has an opening brace
    action; action; action; ...
}
4 awk works line by line
awk takes input (stdin or a file), operates on it one line at a time, and produces output (stdout or a file).
4.1 awk syntax
The most common condition is a search pattern, /pattern/, that has been matched. So most often you will see awk examples that assume the awk syntax is always awk '/pattern/{print}', but it could easily be awk 'NF>3 {print}', or even simpler awk 'NF>3', which would only print lines that have more than 3 fields, as {print} is the default action, and $0, the whole line, is the default argument to the print statement.
So in these syntax examples, remember that the /pattern/ is just a common condition statement, and that really any condition is allowed.
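For instance, a minimal demonstration of the NF>3 condition with the default action (input is made up):

```shell
# only lines with more than 3 fields are printed;
# the action is omitted, so the default {print} applies
printf 'one two three\nfour five six seven\n' | awk 'NF > 3'
```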
condition { action }  # if condition is met, perform the corresponding action

# do action on all lines matching pattern
awk [-Fc] '/pattern/ {action}' var=value file(s)

# do action on all lines matching pattern, AND do action2 for ALL lines
awk [-Fc] '/pattern/ {action} {action2}' var=value file(s)

# do action for every line of input, and if pattern is found also do action1,
# action2, and action3. So for lines matching the pattern, all four actions
# would be performed.
awk [-Fc] '{action} /pattern/ {action1; action2; action3;}' var=value file(s)

awk [-Fc] '$n == somevalue && $m ~ /pattern/ {action; action; action;} {action2;}' var=value file(s)

awk '$2=="Ontario" {print $1, $2}'  # prints first two fields if the
                                    # 2nd field is exactly "Ontario"
awk '$2 ~ "Ont" {print $1, $2}'     # prints first two fields if the
                                    # 2nd field contains "Ont" anywhere

awk [-Fc] -f scriptfile var=value file(s)

awk [-Fc] '/pattern1/ {action1} /pattern2/ {action2}' var=value file(s)
awk '/search string/ {print}' file(s)  # this and the previous are identical
awk '/search string/ {print $1, "this is awk output", $5}' file(s)

# awk understands parentheses around conditions too, but they are optional.
awk '(/pattern1/){action1} (/pattern2/){action2}...'
awk '(NR==1){print "Line 1 " $0} (NR==2){print "Line 2 " $0}...'
awk 'NR==1{print "Line 1 " $0} NR==2{print "Line 2 " $0}...'
# both do exactly the same thing.

# now if we don't care about labelling with Line 1 and Line 2, the next
# 3 lines all do the same thing
awk 'NR==1{print $0};NR==2{print $0}'
awk 'NR==1{print};NR==2{print}'
awk 'NR==1;NR==2'
# the last form must keep the ; between the two bare conditions, but notice
# that the ; between rules is understood when you have actions in braces,
# as in the first two forms.

awk 'NR==1{print $0} NR==2{print $NF}'  # print line 1, and print the last
                                        # field of line 2. Nothing more.

awk '$n == "string" {action}'           # nth field must exactly match the string
awk 'tolower($n) == "string" {action}'  # compare case-insensitive field n

NR == 5 {action}
NR > 5 && $1 ~ /someregex/ {action}  # after line 5, if the 1st field contains
                                     # some regular expression, perform action
$n ~ "string" {action}  # if "string" is found in the nth field, do action

awk '/string/ {printf "%d\t%d\t%s\n", $1, $2, $3}' file(s)
awk -F, '{printf "%d\t%d\t%s\n", $1, $2, $3 $4}' file(s)  # comma delimited fields
awk -F, '{print $1; print $2; print $3; print $4}' file(s)  # prints fields
                                                            # on separate lines
awk -F, '{print $4}' file     # prints everything between 4th and 5th comma
awk -F"\t" '{print $4}' file  # prints everything between 4th and 5th tab
awk '$5 !~ /string/ {print}' file  # prints all lines where the 5th field
                                   # does NOT contain "string" in it
awk '{if (NR > 2) print}' file     # prints all records except the first two
awk 'NR%2{printf "%s ",$0;next;}1' yourFile  # joins every pair of lines
4.2 awk options
Options entered on the awk command line are:
- -F to specify a field separator/delimiter
- -f awk-script-file, a file containing all the awk commands
- -v to assign a value to a variable
Awk works very well when you have columnar data, or field-separated data. Not surprisingly, the most common option is the -F flag, which specifies a field delimiter. The default field delimiter is space (actually any whitespace), but it can be overridden with the -F flag.
For example awk -F, treats the input stream as CSV (Comma Separated Values) data. See also awk input field separator FS.
Other common delimiters are -F"," and -F":" and -F"\t" for comma, colon, and tab separated values respectively. But any character can be assigned as the field delimiter.
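A quick sketch of -F on colon-delimited, /etc/passwd-style lines (the data is inlined so the example is self-contained):

```shell
# -F: splits each line on colons, so $1 is the account name
printf 'root:x:0:0\ndaemon:x:1:1\n' | awk -F: '{print $1}'
```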
5 condition-action pairs.
Awk programs are one or more sets of condition-action pairs, where the conditions are outside of the curly braces and the actions are inside the curly braces.
Condition-action pairs all follow these rules:
1. A condition is considered false if it evaluates to zero or the empty string; anything else is true (uninitialized variables are zero or empty string, depending on context, so they are false).
2. Either a condition or an action can be implied: true and print, respectively.
3. Braces without a condition are considered to have a true condition and are always executed if they are hit.
4. Any condition without an action will print the line if and only if the condition is met, i.e. NOT ZERO OR NULL.
Consider these three lines that simply cat the file:
awk '{print $0}' myfile   # missing condition always evaluates to true
awk '1' myfile            # missing action is print line if condition true
awk '2718281828' myfile   # both 1 and 2718281828 are not zero, so true
- The first line follows rule 3 above, and so the print $0 is always executed.
- The second line follows rule 4, as there is no action and only a condition. Because the condition is met, i.e. 1 evaluates as true, awk prints the line.
- The third line's condition is not zero, so the default action occurs: print.
It is interesting to note that 1 is a condition that is always true, but any non-zero number evaluates to true, hence the third line also just prints the file.
This quirk is useful if you want to modify the line before printing it and keep the script short. Your first condition{action} pair modifies the line; then you add the 1 at the end as the second condition{action} pair, which simply prints the newly modified line.
So these next four lines will change the whole line to lower case. The first two examples do so whenever the first field is exactly equal to "commands:". The last two do so whenever a case-insensitive version of the field is "commands:".
# ---condition----              {------action-------} condition (1)
awk '$1 == "commands:"          { $0 = tolower($0); } 1' commands.data
awk '$1 == "commands:"          { $0 = tolower($0); } {print $0;}' commands.data
awk 'tolower($1) == "commands:" { $0 = tolower($0); } 1' commands.data
awk 'tolower($1) == "commands:" { $0 = tolower($0); } {print}' commands.data
6 awk patterns
Inside of // marks is the awk matching pattern, or the negation !// for lines that do NOT match the pattern. For example /string1/, or the negation of it, !/string1/. Lines that match the pattern will cause the action(s) to be performed. Between the slashes, you can use full regex expressions.
6.1 Lines without a pattern "match all lines"
Lines without a condition evaluate as "true", i.e. the default decision when a pattern is not present is to match all lines. You can think of it as: no condition or pattern matches all lines, so it is always true. awk '{print $1}' prints the first field of absolutely every line.
6.2 matching patterns vs substitution actions vs ~
6.2.1 substitution
Substitutions are not matching patterns, but are actions. So in that respect, this section should be in the actions section. But they can be confused with patterns, so I am including it here.
Some actions are substitutions, such as sub(/this/, "that"), which are then followed by print, so {sub(/this/, "that"); print}, which only makes the change on the first "this". To catch all of them: {gsub(/this/, "that"); print}.
If you have more than one action, separate each with a ; semicolon: {action1; action2; action3}. So often you will see {gsub(/this/, "that"); print $0} as the two actions together.
So awk can drastically manipulate each line based on many conditions and then print the resulting modified line, or part thereof.
6.2.2 substitute end of line with string " EOL"
awk '{ gsub(/$/, "EOL");print}' temperature.data
6.2.3 matching patterns
The matching pattern is not the substitution action. Matching patterns are conditions and use slashes, /, while sub and gsub are the substitute and global substitute actions, respectively. Matching patterns check the whole line for a match and perform the action on each line that matches. So just within the // are regexps that will be checked against the whole line, also known as the current input record.
6.2.4 comparison operator ~
In addition to the matching-pattern behaviour that checks the whole line for a match, you can use ~, the comparison operator, which also lets you use regex in the /pattern/. This makes it easy to limit patterns to specific fields: the field, then ~, followed by the normal // pattern. So if field 3 has to match a pattern, do this:
awk [-Fc] '$1 == somevalue && $3 ~ /pattern/ {action1;} {action2;}' var=value file(s)
awk [-Fc] '$1 == somevalue && $3 ~/pattern/ {action1;} {action2;}' var=value file(s)
# both with or without the space are correct. I would use ~ /pattern/ as it
# seems clearer
#
# $1 has to be exactly somevalue, where $3 just has to contain that pattern.
# breaking it out:
#   1st condition: $1 == somevalue && $3 ~ /pattern/
#   1st action:    {action1}
#   2nd condition: missing, so evaluates as "true"
#   2nd action:    {action2}   # will always be done, as 2nd condition is always true
Another example shows three awk commands that all do the same thing: match all input records with the uppercase letter 'Z' somewhere in the first field:
awk '$1 ~/Z/ {print}' files-of-strings-to-search
awk '$1 ~/Z/' files-of-strings-to-search
awk '{ if ($1 ~ /Z/) print }' files-of-strings-to-search
When a regexp is enclosed in slashes, such as /fubar/, we call it a regexp constant, just as "fubar" is a string constant.
!~ works just like ~ but is the negation, so lines NOT matching are selected.
Here are 7 identical examples that only print lines that end in the string "Sunday". Remember the construct is awk '/pattern/{action}' file.
awk '($0 ~ /Sunday$/){print $0}' temperature.data
awk '$0 ~ /Sunday$/{print $0}' temperature.data  # brackets are optional
awk '$0 ~ /Sunday$/{print}' temperature.data     # default print is entire line $0
awk '/Sunday$/{print}' temperature.data          # default comparison is entire line
awk '(/Sunday$/){print}' temperature.data
awk '(/Sunday$/)' temperature.data               # default action is print
awk '/Sunday$/' temperature.data                 # brackets are optional
# if you don't specify what is being matched, awk defaults to the entire line
And now an example that only prints lines where field 3 contains the string "Sunday"
awk '($3 ~ /Sunday/){print $0}' temperature.data
awk '$3 ~ /Sunday/' temperature.data
6.2.5 Boolean operators in pattern
As mentioned above, the matching pattern does not need to be applied to the whole line, but rather can be a set of boolean combinations of field values. Those field values can be exact, as with a straight ==, or they can be regex, with a ~ character followed by the usual /pattern/.
For example if I want to match only lines where the 1st field is the word "sshd", AND the regex pattern myregex is found in the 2nd field, try this:
awk '$1 == "sshd" && $2 ~ /myregex/ {print $0}'  # strings in quotes
awk '$1 == 37 && $2 ~ /myregex/ {print $0}'      # numbers as numbers
awk '$2 < 100 && $2 ~ /myregex/ {print $0}'      # less-than compare plus match
This is easy when you remember that everything outside of the curly braces is the condition, and everything inside the curly braces is the action. So any condition would include complex conditions with boolean operators.
See section: boolean operators for comparisons later in this doc.
6.2.6 Other comparison operators
Straight from the gawk manual on comparison operators:
Expression | Result |
---|---|
x < y | True if x is less than y |
x <= y | True if x is less than or equal to y |
x > y | True if x is greater than y |
x >= y | True if x is greater than or equal to y |
x == y | True if x is equal to y |
x != y | True if x is not equal to y |
x ~ y | True if the string x matches the regexp denoted by y |
x !~ y | True if the string x does not match the regexp denoted by y |
subscript in array | True if the array "array" has an element with the subscript "subscript" |
6.3 anchoring patterns
A very good idea, if you can, is to anchor the pattern to either the beginning of the line, ^, or the end of the line, $.
GNU awk also has its own regexp operators for string and word boundaries. (Note: the Perl-style \A, \z and \b anchors are not awk regexp operators; gawk's equivalents are \` and \' for the start and end of the string, and \y for a word boundary.)
pattern | matches |
---|---|
^ | beginning of line |
$ | end of a line |
\` | beginning of the string (gawk) |
\' | end of the string (gawk) |
\< | beginning of a word (gawk) |
\> | end of a word (gawk) |
\y | word boundary (gawk) |
6.3.1 anchoring patterns when using the comparison ~ operator
If you want to anchor a search pattern to a particular field, you can use the ~ comparison operator. Assuming this is a file called student.data:
Name    Year Average
alice   3rd  78
bob     1st  57
john    4th  61
eric    4th  76
graham  4th  58
suzanne 2nd  75
johnson 2nd  43
nancy   2nd  88
nick    1st  100
someone nth  0
Then see the difference in output when running these four awk commands:
$ awk '/^n/ {print}' student.data
nancy 2nd 88
nick 1st 100
$ awk '$2 ~ /^n/ {print}' student.data
someone nth 0
$ awk '$2 ~ /n/ {print}' student.data
suzanne 2nd 75
johnson 2nd 43
nancy 2nd 88
someone nth 0
$ awk '/n/ {print}' student.data
john 4th 61
suzanne 2nd 75
johnson 2nd 43
nancy 2nd 88
nick 1st 100
someone nth 0
The first matches "n" at the beginning of the whole line, where the second matches "n" at the beginning of the 2nd field only. The third matches "n" anywhere in the second field, and finally the last matches "n" anywhere on the whole line.
6.4 ignore case in matching patterns
There are three ways to ignore case when matching patterns (that I know of):
1. Use the gawk IGNORECASE variable on the command line:
awk -v IGNORECASE=1 '/PatTeRn/ {action}' student.data
2. Use the tolower() function on the entire line before the ~ comparison:
awk 'tolower($0) ~ /pattern/' student.data
3. Set the variable IGNORECASE = 1 inside the script (gawk only):
awk 'BEGIN {IGNORECASE = 1} /PatTeRn/ {action}' student.data
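A runnable sketch of the tolower() approach, here applied to the first field only (inlined data instead of student.data):

```shell
# matches "Commands:" and "commands:" alike, and prints the 2nd field
printf 'Commands: LS\ncommands: cd\nnotes: vi\n' |
    awk 'tolower($1) == "commands:" {print $2}'
```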
6.5 awk special patterns, BEGIN and END
awk has two special patterns: BEGIN, which matches before the first line of input, and END, which matches after the last line. Like patterns, these are conditions that are followed by {actions} in braces. So you can have awk do some action before any input is read and some other action after the last line is processed. Typically BEGIN is used to initialize variables, and maybe print some headers. END is used to print some summary of variables that have been counting through each line processed. Both BEGIN and END are only executed once.
# Take the output of the df -l command and pipe it to this awk script
# i.e. df -l | awk -f awkscript-for-df
$1 != "tempfs" {
    used += $3;
    available += $4;
}
END {
    printf "%d GiB used\n%d GiB available\n", used/2^20, available/2^20;
}
Another use of END is to add lines to the end of a file:
awk '{print} END {print "* [[https://www.zintis.net][www.zintis.net]]"}' \
    projectplan.org > tmp && mv tmp projectplan.org
Put that in a bash for loop to add the line to all files in a directory:
for orgfile in *.org; do
    awk '{print} END {print "* [[https://www.zintis.net][www.zintis.net]]"}' \
        "$orgfile" > tmp && mv tmp "$orgfile"
done
7 awk action
After the awk matching pattern comes the action in curly braces '{ }'. The default action is print, which simply prints the whole line. You can also use print $0, as $0 is a field that holds the whole line.
7.1 grep and cat using awk
If a condition is missing, awk evaluates it as true. So to emulate cat:
awk '{print}' filename  # just like cat filename
# or even
awk 1 filename  # here the condition is 1, so always true, but the action
                # is missing, which defaults to {print}
                # notice that even the ' characters are optional if there
                # is no ambiguity
If an action is missing, awk defaults it to print, so you can emulate grep:
awk '/apache/' filename   # just like grep apache filename
To ignore all lines starting with # you could awk '!/^#/' file1.
If you want to change all occurrences of a string, be aware of where you put the print statement. Make sure it is a separate condition/action pair, otherwise you will only print lines where you made the substitution and ignore all other lines. For example, this will make the change of "4th" to "fourth" but will only print the lines that were changed:
# for every line that contains "4th" substitute it with "fourth"
# and print only those lines.
/4th/ {
    gsub(/4th/, "fourth");
    print;
}
Whereas this will print every line, lines with changes and lines without changes:
# for every line that contains "4th" substitute it with "fourth"
/4th/ { gsub(/4th/, "fourth"); }
{print}
A much better approach is the simpler one too:
# for lines containing "eric"
/eric/ {
    # substitute 4th for fourth
    gsub(/4th/, "fourth")
}
# print all lines, including the ones that substituted fourth for 4th
{print $0}
7.1.1 print action
As mentioned, the default action is print. You can print strings, or fields, or both. By separating the printed objects with a comma, the output will have a space between the objects. If you leave out the commas, the spaces in the output will also not be there.
$ echo "one two three" | awk '{print $1 $2 $3}'
onetwothree
$ echo "one two three" | awk '{print $3, $2, $1}'
three two one
7.1.2 math actions
The action can be a math expression that will be evaluated. You can use it with ONLY a BEGIN block, and avoid file input altogether, as a quick calculator. Note that awk has no built-in pi variable (an unset variable is just 0), so set pi yourself, e.g. with atan2:
awk 'BEGIN {pi = atan2(0, -1); print pi*5^2}'
awk 'BEGIN {print 451*log(2.718)}'
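Checking the arithmetic: atan2(0, -1) is pi, so pi * 5^2 is about 78.54. A version with formatted output:

```shell
# BEGIN-only awk program used as a calculator; no input file is read
awk 'BEGIN { pi = atan2(0, -1); printf "%.2f\n", pi * 5^2 }'
# → 78.54
```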
7.1.3 Multiple actions on a pattern match
Within the curly braces, you can have multiple actions, each separated by a semicolon. So:
NR == 1 {
    print "This is my data as I like to see it";
    print $0;
}
/looking for this string/
Source of this script is: opensource.com article
7.2 next command
As described above, awk works line by line. On each line, if patterns are matched to the line, all the actions get run. However, you can also have awk do multiple comparisons, and they all get run in the order they appear in the awk script, unless one of the actions includes the next command. In that case, the line is considered completed, and awk moves on to the next line of the input.
This is useful when you need to ignore all the remaining pattern comparisons and subsequent actions on a line. For example, say you are printing records that have a value greater than 60 in the third column. Easy:
$3 >= 60 {print $0}
But let's say you also want to flag records that are greater than 50 with a string "on probation"
$3 >= 60 {print $0}
$3 >= 50 {print $0, " On probation"}
This would cause records with values >= 60 to get printed twice, once with just the line, the second time with the words "On probation", which is NOT what you wanted.
You can add next as another action for the $3 >= 60 rule, in which case those records will not have "On probation" in their output.
$3 >= 60 {print $0; next;}
$3 >= 50 {print $0, " On probation";}
This can also be written for easier reading as:
$3 >= 60 {
    print $0;
    next;
}
$3 >= 50 {
    print $0, " On probation";
}
Finally, let's say if the value of column 3 is < 50, print "failed" but not the mark.
$3 >= 60 { print $0; next; }
$3 >= 50 { print $0, " On probation"; }
$3 < 50  { $3 = "failed"; print $0; }
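Run against made-up marks, the three rules behave like this (note the double space before "On probation": the OFS space plus the string's own leading space):

```shell
printf 'alice x 65\nbob x 55\ncarol x 40\n' |
    awk '$3 >= 60 {print $0; next}
         $3 >= 50 {print $0, " On probation"}
         $3 < 50  {$3 = "failed"; print $0}'
```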
7.3 awk records (a.k.a. lines)
I already mentioned BEGIN and END, but awk also automatically counts the records (lines) of input and keeps the count in the variable NR. You can recall the current record number using NR.
8 awk fields $1, $2 etc
Based on the field delimiter, awk takes a line of input and breaks it up into fields. awk automatically counts the number of fields in each line and assigns it to the variable NF for each line. You can recall this variable with NF. Similarly, for each line the last field can be referenced as $NF.
8.0.1 awk field variables
- $0 is the whole record (line)
- $1 is the first field
- $2 is the second field
- $n is the nth field
- $NF is the last field
- NF is just a number equal to the number of fields on this line
- NR is the number of records so far (i.e. the line number, if the record separator is \n)
- FNR is the file number of records (reset to 1 at the top of every new file when multiple files are processed by awk)
To print all fields excluding the first field this will work:
awk '{$1 = ""; print $0}'
which sets the first field to the null string then prints the whole line. i.e. it omits the first field.
To count lines that match a pattern, you can create your own variable and use it to count, like:
BEGIN { rows = 0 }
/pattern/ { rows += 1 }
END { print rows }
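The same counter as a one-liner, run on inlined data and counting lines that contain "4th":

```shell
# rows is uninitialized, i.e. 0, so the BEGIN block can even be omitted
printf '4th\n3rd\n4th\n' | awk '/4th/ { rows++ } END { print rows }'
# → 2
```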
$ cat file1
one
two
three
$ cat file2
four
five
six
$ awk '{ print FNR, FILENAME, " ==> ", $0 }' file1 file2
1 file1  ==>  one
2 file1  ==>  two
3 file1  ==>  three
1 file2  ==>  four
2 file2  ==>  five
3 file2  ==>  six
8.1 awk output field separator OFS
You can change the output field separator, OFS, to what you want, often to the newline character, in which case each field printed will be on a new line. So assuming your input field separator is a colon, as for example in PATH output, and you want to see each path entry on its own line:
echo "${PATH}" | awk 'BEGIN{FS=":";OFS="\n"} FNR==1{$1=$1;print;exit}'
echo "${PATH}" | awk '{gsub(/:/, "\n"); print}'
echo -e "${PATH//:/\\n}"   # also works, using the shell itself
But another way to output each field on a new line is using gsub(/ /,"\n"), or gsub(/:/,"\n") if your fields are separated by a colon with no spaces surrounding the ":".
$ echo "The fox jumped over the dog" | awk '{gsub(/ /,"\n"); print}'
The
fox
jumped
over
the
dog
A simplistic approach, if not so flexible, is to print the newline string wherever you need it, among the other fields. For instance:
awk '{print $1"\n"$2,$3,$4}' file
8.2 removing whitespace
Input lines often have trailing whitespace at the end. To remove it, you can use sub or gsub to delete one or more occurrences, +, of either space or tab, [ \t], that occur at the end of the line, $. So the regexp is /[ \t]+$/.
# remember that everything inside the curly braces is an action.
{gsub(/[ \t]+$/,"",$0); print;}
(Not awk related, but while here, see:
alias taka='tput setaf 5;echo -e "${PURPLE}${PATH//:/\n}"' )
Another good example from stackoverflow: stackoverflow.com. Here we are looking at lines that contain a comma, then stripping all blank characters from field 2, and printing fields 1 and 2 with a comma between them.
awk -F, '/,/{gsub(/ /, "", $2); print $1","$2} ' input.txt
Explanation:
-F,              use comma as field separator (so the thing before the first comma is $1, etc.)
/,/              operate only on lines with a comma (empty lines and lines without a comma are skipped)
gsub(/a/, b, c)  match the regular expression a, replace it with b, and do all
                 this with the contents of c, in this case the 2nd field
print $1","$2    print the contents of field 1, a comma, then field 2
input.txt        use input.txt as the source of lines to process
8.3 awk input field separator FS
The usual approach is to use -F: or -F, on the command line. You can also set the FS variable inside the script, e.g. FS=":" or FS=",", usually in the BEGIN block.
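Both forms in action on made-up records:

```shell
printf 'a:b:c\n' | awk 'BEGIN { FS = ":" } {print $2}'  # FS set in the script
printf 'a,b,c\n' | awk -v FS=',' '{print $3}'           # FS set with -v
```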
8.4 -f scriptfile
If you collect your awk commands in a separate file, then you include the -f file argument to awk, which then reads the awk commands in that file for every line of input from stdin.
9 Variables
9.1 System Variables, FS, RS, NR
Because you would want to specify separators in a script and not rely on the user of the script to remember to call it with a -F, parameter, you should always use these predefined system variables in an awk script, usually at the beginning, so in the BEGIN section.
FS = "\t"      # tab for tab separated values, exactly 1 tab
FS = "\t+"     # tab for tab separated values, one or more tabs
FS = "[':\t]"  # any one of a tic, colon, or tab will be seen as a delimiter
FS = ","       # for comma separated values
FS = "\n"      # good for multi-lined records, so each line is a field
RS = ""        # if the above is used, you will need a record separator.
               # If set to "" it will be a blank line. These two are often
               # combined to handle multi-line records.
ORS = "\n"     # the output record separator
OFS = "\n"     # output field separator, set to newline. Is a space by default.
NR             # number of records
NF             # number of fields
FILENAME       # the name of the current input file
FNR            # the number of the current record relative to the input file
ARGC           # the number of passed parameters
ARGV           # recalls the command line parameters
ENVIRON        # an array containing the shell environment vars and values
IGNORECASE     # to ignore the character case or not (gawk)

BEGIN {
    FS = ":";
    OFS = "\t";
}
{print $1, $NF};
9.2 User defined variables.
Simply start using them with an assignment statement, and recall the values later without using the $ sign:
# Using student records as a sample file source.
# sample data:
# alice 3rd 78
# bob 1st 57
# john 4th 61
# eric 4th 76
# graham 4th 58
# suzanne 2nd 75
# johnson 2nd 43

NR == 1 {
    print "Adjusting 4th year student marks up by 5%";
}
# count the number of 4th year students
$2 == "4th" {
    count++;
}
# add adjusted mark to all fourth year students
/4th/ {
    markadjust = $3 * 1.05;
    print "4th year student", $1, " Mark", $3, " Adjusted mark", markadjust;
}
# also print out the name and mark of third year students
/3rd/ {
    print "Third year student", $1, $3;
}
END {
    print "There were " count " 4th year students with adjusted marks"
}
9.3 comparisons
You can compare any variable to a value. The comparison is at the "pattern matching stage". Then any action can be taken based on the comparisons.
<    less than
>    greater than
<=   less than or equal to
>=   greater than or equal to
==   equal to
!=   not equal to
~    matches      # used to compare a pattern from a particular field
!~   does not match
So, to print lines only if they have six fields, and also, for the very first line only, to print the first six fields:
awk 'NR == 1 {print $1, $2, $3, $4, $5, $6}'
awk 'NF == 6 {print $0}'
Or more exacting, with all comments:
# run on hydro electric usage data as: awk -f electric-usage.awk *.csv
# if you want to run on all your electric data files.
BEGIN {FS = ","; ORS = "\n\n"}  # input is csv, output is double spaced

# the column headings are on line 2. Print them.
NR == 2 {print "\n---------------"; print $1, $4, $5, $6}

# only print lines that have six fields, and then only fields 1, 4, 5 & 6
NF == 6 {print $1, $4, $5, $6}
Similar to the above, but with tabbed output:
BEGIN {FS = ","; ORS = "\n"; print "\n---------------"; }  # input is csv

# the column headings start with "Months"
$1 == "Months" {printf "%s\t%s\t%s\n", $1, $2, $6; print "----------------"}

# only print lines where field 2 is greater than 15 celsius,
# and then only fields 1, 2 & 6.
# I could have used:
#   $2 > 15 || $1 == "Months" {print $1, $2, $6}
# but that prints the heading twice, as the heading is already printed
# earlier in the script. So instead, just:
$2 > 15 {printf "%s\t%d\t%d\n", $1, $2, $6}
9.4 boolean operators for comparisons
The typical || for OR and && for AND boolean operators allow you to combine several comparisons. For example, "if field 3 is < 100 AND the line has less than 8 fields, print the line" would be:
NF < 8 && $3 < 100 {print $0}
NF > 2 || $1 ~ /header/ {print $0}
9.5 conditional statement, if
{
    if ($3 < 50) print $1, "failed";
    if ($3 > 95) {
        print $1, "with distinction";
        scholars += 1;
    };
}
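Running the same if logic over inlined marks (a two-column variant, without the scholars counter):

```shell
printf 'bob 45\nnick 100\nalice 78\n' |
    awk '{ if ($2 < 50) print $1, "failed";
           if ($2 > 95) print $1, "with distinction" }'
```

alice matches neither branch, so nothing is printed for her.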
9.6 while loops
Let's say you want to average a variable number of numbers on each line. You can do a while loop up to the number of fields on the line:
Input data:
575,56,124,461
271,182,818,457
314,159,265,342,415,711
478,549,241,256
505,121,211,198,217
Your awk script could be (the input is comma separated, so FS must be set to ","):
BEGIN { FS = "," }
# these actions run from scratch on every line of input
{
    sum = 0;
    i = 1;
    while (i < (NF + 1)) {
        sum += $i;
        i++;
    }
    average = sum / NF;
    print $0, " average -->", average;
}
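Wired up as a one-liner with -F, the loop produces one average per line (tiny made-up data):

```shell
printf '10,20,30\n5,15\n' | awk -F, '{
    sum = 0; i = 1
    while (i <= NF) { sum += $i; i++ }
    print $0, "average -->", sum / NF
}'
```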
9.7 exiting awk after a pattern is seen
You can have awk stop reading input with the {exit} action. So let's say you want to stop reading input when you see the first instance of the string "<footer>". Do this:
awk '/<footer>/ {exit}; {print $0}'
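With inlined input, everything from the first <footer> onward is never printed (exit fires before the print rule runs on that line):

```shell
printf 'body line\n<footer>\nignored\n' | awk '/<footer>/ {exit}; {print $0}'
# → body line
```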
9.8 awk range from pattern to pattern
The desire to do something after a pattern has been seen, then stop doing it when another pattern is seen, is common enough that awk has a built-in construct for it. Simply specify the pattern that starts the action and the pattern after which the action stops, separated by a comma:
awk '/ifseenstartaction/,/stopactionafterseeing/'
awk '/startpattern/,/stoppattern/ {print $0, "pattern activated";}'
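A sketch with literal START/STOP markers standing in for the two patterns; the range is inclusive of both marker lines:

```shell
printf 'a\nSTART\nb\nSTOP\nc\n' | awk '/START/,/STOP/'
```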
10 awk options
- -v var=value, to assign a value to an awk variable on the command line
11 Awk Numbering and Calculations:
11.1 precede each line by its line number FOR THAT FILE (left alignment).
11.2 Using a tab (\t) instead of space will preserve margins.
awk '{print FNR "\t" $0}' files*
11.3 precede each line by its line number FOR ALL FILES TOGETHER, with tab.
awk '{print NR "\t" $0}' files*
11.4 number each line of a file (number on left, right-aligned)
Double the percent signs if typing from the DOS command prompt.
awk '{printf("%5d : %s\n", NR,$0)}'
11.5 number each line of file, but only print numbers if line is not blank
Remember caveats about Unix treatment of \r (mentioned above)
awk 'NF{$0=++a " :" $0};{print}' awk '{print (NF? ++a " :" :"") $0}'
11.6 count lines (emulates "wc -l")
awk 'END{print NR}'
11.7 print the sums of the fields of every line
awk '{s=0; for (i=1; i<=NF ; i++) s=s+$i; print s}' awk '{s=0; for (i=1; i<=NF; i++) s=s+$i; print $0, " ==> sum:", s}'
11.8 add all fields in all lines and print the sum
awk '{for (i=1; i<=NF; i++) s=s+$i}; END{print s}'
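Checking the grand total on a tiny made-up matrix: 1+2+3+4 is 10.

```shell
printf '1 2\n3 4\n' | awk '{for (i=1; i<=NF; i++) s=s+$i} END{print s}'
# → 10
```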
11.9 print every line after replacing each field with its absolute value
awk '{for (i=1; i<=NF; i++) if ($i < 0) $i = -$i; print }' awk '{for (i=1; i<=NF; i++) $i = ($i < 0) ? -$i : $i; print }'
11.10 print from nth field to the last field in the line
awk -v n=4 '{ for (i=n; i<=NF; i++) printf "%s%s", $i, (i<NF ? OFS : ORS)}' input
This is taken straight out of stackoverflow: "This will take n as the value of n and loop through that number through the last field NF, for each iteration it will print the current value, if that is not the last value in the line it will print OFS after it (space), if it is the last value on the line it will print ORS after it (newline)."
11.11 print the total number of fields ("words") in all lines
awk '{ total = total + NF }; END {print total}' file
wc -w file
11.12 print the total number of lines that contain "Beth"
awk '/Beth/{n++}; END {print n+0}' file
grep Beth file | wc -l
11.13 print the largest first field and the line that contains it
Intended for finding the largest value in field #1 (a numeric comparison when the fields are numbers), not the longest string.
awk '$1 > max {max=$1; maxline=$0}; END{ print max, maxline}'
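A runnable sketch with an invented input, where the first field is a count:

```shell
# max and maxline track the running maximum; END reports the winner
printf '%s\n' '3 cats' '9 dogs' '5 birds' |
  awk '$1 > max {max=$1; maxline=$0}; END{print max, maxline}'
```

The END block prints the maximum followed by the whole line it came from, here "9 9 dogs".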
11.14 print the number of fields in each line, followed by the line
awk '{ print NF ":" $0 } '
11.15 print the last field of each line
awk '{ print $NF }'
11.16 print the last field of the last line
awk '{ field = $NF }; END{ print field }'
11.17 print every line with more than 4 fields
condition is NF > 4, and there is no action, so it defaults to print $0.
awk 'NF > 4'
11.18 print every line where the value of the last field is > 4
condition is $NF > 4, and there is no action, so it defaults to print $0.
awk '$NF > 4'
TEXT CONVERSION AND SUBSTITUTION:
11.19 IN UNIX ENVIRONMENT: convert DOS newlines (CR/LF) to Unix format
awk '{sub(/\r$/,"");print}' # assumes EACH line ends with Ctrl-M
11.20 IN UNIX ENVIRONMENT: convert Unix newlines (LF) to DOS format
awk '{sub(/$/,"\r");print}'
11.21 IN DOS ENVIRONMENT: convert Unix newlines (LF) to DOS format
awk '1'   # under DOS shells you may need double quotes instead: awk "1"
worth testing in a DOS environment, which I do not have.
11.22 IN DOS ENVIRONMENT: convert DOS newlines (CR/LF) to Unix format
11.23 Cannot be done with DOS versions of awk, other than gawk:
gawk -v BINMODE="w" '1' infile >outfile
11.24 Use "tr" instead.
tr -d \r <infile >outfile # GNU tr version 1.22 or higher
11.25 delete leading whitespace (spaces, tabs) from front of each line
awk '{$1 = $1} {print}'
awk '{$1 = $1}1'
11.26 aligns all text flush left
awk '{sub(/^[ \t]+/, ""); print}'
11.27 delete trailing whitespace (spaces, tabs) from end of each line
awk '{sub(/[ \t]+$/, "");print}'
11.28 delete BOTH leading and trailing whitespace from each line
awk '{gsub(/^[ \t]+|[ \t]+$/,"");print}'
awk '{$1=$1;print}'   # also removes extra space between fields
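A quick comparison of the two forms, on an invented line with extra internal spaces:

```shell
# the gsub form strips only the leading and trailing blanks
printf '  hello   world  \n' | awk '{gsub(/^[ \t]+|[ \t]+$/,""); print}'
# the $1=$1 form rebuilds the line, also squeezing the run of spaces between words
printf '  hello   world  \n' | awk '{$1=$1; print}'
```

The first prints "hello   world" with the inner spacing intact; the second prints "hello world".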
11.29 align all text flush right on a 79-column width
awk '{printf "%79s\n", $0}' file*
Probably better to strip trailing white space first, and then print flush right. so:
awk '{printf "%79s\n", $0}' file*
awk '{sub(/[ \t]+$/, "")} {printf "%79s\n", $0}' file
11.30 center all text on a 79-character width
awk '{l=length();s=int((79-l)/2); printf "%"(s+l)"s\n",$0}' file*
awk '{sub(/[ \t]+$/, "")}{l=length();s=int((79-l)/2); printf "%"(s+l)"s\n",$0}'
Again, best to strip trailing blanks first, to get a true centered output.
11.31 insert 5 blank spaces at beginning of each line (make page offset)
awk '{sub(/^/, " ");print}'
11.32 substitute (find and replace) "foo" with "bar" on each line
awk '{sub(/foo/,"bar");print}'           # replaces only 1st instance
gawk '{$0=gensub(/foo/,"bar",4);print}'  # replaces only 4th instance
awk '{gsub(/foo/,"bar");print}'          # replaces ALL instances in a line
11.33 substitute "foo" with "bar" ONLY for lines which contain "baz"
awk '/baz/{gsub(/foo/, "bar")};{print}'  # prints all lines
awk '/baz/{gsub(/foo/, "bar");print}'    # prints only lines with baz
Be aware that the print statement can be part of an action, or its own action
11.34 substitute "foo" with "bar" EXCEPT for lines which contain "baz"
awk '!/baz/{gsub(/foo/, "bar")};{print}'
11.35 change "scarlet" or "ruby" or "puce" to "red"
awk '{gsub(/scarlet|ruby|puce/, "red"); print}'
11.36 reverse order of lines (emulates "tac")
awk '{a[i++]=$0} END {for (j=i-1; j>=0;) print a[j--] }' file*
11.37 if a line ends with a backslash, append the next line to it
(fails if there are multiple lines ending with backslash…)
awk '/\\$/ {sub(/\\$/,""); getline t; print $0 t; next}; 1' file*
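A runnable check with invented input, where the first line ends in a literal backslash:

```shell
# 'foo\' ends with a backslash, so the next line is appended to it
printf '%s\n' 'foo\' 'bar' 'baz' |
  awk '/\\$/ {sub(/\\$/,""); getline t; print $0 t; next}; 1'
```

The backslash is stripped, getline pulls in the following line, and the two are printed joined as "foobar"; the unrelated "baz" line passes through via the final 1.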
11.38 print and sort the login names of all users
awk -F ":" '{ print $1 | "sort" }' /etc/passwd
11.39 print the first 2 fields, in opposite order, of every line
awk '{print $2, $1}' file
11.40 switch the first 2 fields of every line
awk '{temp = $1; $1 = $2; $2 = temp; print}' file
Seems to me to be unnecessary, if you simply switch the fields then save the output to a new file, i.e. awk '{print $2, $1}' file > newfile
11.41 print every line, deleting the second field of that line
awk '{ $2 = ""; print }'
11.42 print in reverse order the fields of every line
awk '{for (i=NF; i>0; i--) printf("%s ",i);printf ("\n")}' file
awk '{for (i=NF; i>0; i--) printf("%s ",$i);printf ("\n")}' file
Don't forget that you want to print the field i, not just the number i, so $i
11.43 remove duplicate, consecutive lines (emulates "uniq")
awk 'a !~ $0; {a=$0}'
This example also removes the blank lines, after the first blank line.
11.44 very similarly, print only unique occurrences of each line
awk '!x[$0]++'
x is an array indexed by $0, and each entry starts out as 0.
11.45 print the first 10 lines of a file (emulates "head")
awk 'NR < 11'
11.46 print first line of file (emulates "head -1")
awk 'NR>1{exit};1'
# print the last 2 lines of a file (emulates "tail -2")
awk '{y=x "\n" $0; x=$0};END{print y}'
11.47 print the last line of a file (emulates "tail -1")
awk 'END{print}'
11.48 print only lines which match regular expression (emulates "grep")
awk '/regex/'
11.49 print only lines which do NOT match regex (emulates "grep -v")
awk '!/regex/'
11.50 print the line immediately before a regex
(but not the line itself i.e the line containing the regex)
awk '/regex/{print x};{x=$0}'
awk '/regex/{print (x=="" ? "match on line 1" : x)};{x=$0}'
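A runnable sketch with invented lines, where MATCH stands in for the regex:

```shell
# x always holds the previous line, so the match prints the line before it
printf '%s\n' 'alpha' 'beta' 'MATCH' 'gamma' |
  awk '/MATCH/{print x}; {x=$0}'
```

Only "beta" is printed, the line immediately preceding the MATCH line.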
11.51 print the line immediately after a regex,
(but not the line containing the regex)
awk '/regex/{getline;print}'
11.52 grep for AAA or BBB or CCC (lines that match any of them)
awk '/AAA/; /BBB/; /CCC/'
We have three condition/action pairs, each of which is missing the action, which defaults to print $0. A line matching more than one pattern is printed once per match; to require all three strings on one line (in any order), use awk '/AAA/ && /BBB/ && /CCC/'.
11.53 grep for AAA and BBB and CCC (in that order)
awk '/AAA.*BBB.*CCC/'
11.54 print only lines of 65 characters or longer
awk 'length > 64'
11.55 print only lines of less than 65 characters
awk 'length < 65'
11.56 print section of file from regular expression to end of file
awk '/regex/,0'
awk '/regex/,EOF'
11.57 print section of file based on line numbers (lines 8-12, inclusive)
awk 'NR==8,NR==12'
11.58 print line number 52
awk 'NR==52'
awk 'NR==52 {print;exit}'   # more efficient on large files
11.59 print section of file between two regular expressions (inclusive)
awk '/Iowa/,/Montana/' # case sensitive
SELECTIVE DELETION OF CERTAIN LINES:
11.60 delete ALL blank lines from a file (same as "grep '.' ")
awk NF
awk '/./'
A related trick joins every two lines into one line:
awk 'NR%2{printf "%s ",$0;next;}1' yourFile
so
one
two
three
four
will become
one two
three four
This works as follows. NR%2 evaluates to zero on every second (even) line. So when it is NOT zero, i.e. on every odd line, it prints the odd line, $0, followed by a space instead of a newline, and immediately goes to the next input line. On even lines the next condition/action pair is evaluated: 1, which is always true, so the default print action runs.
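The pairing can be checked directly with a few invented lines:

```shell
# odd lines get a trailing space via printf; even lines end the joined pair
printf '%s\n' 'one' 'two' 'three' 'four' |
  awk 'NR%2{printf "%s ",$0;next;}1'
```

Four input lines come out as two: "one two" and "three four".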
Changing emails on all my org files:
awk '/^\#\+EMAIL/ {sub(/somedomain DOT ca/, "gmail DOT com")};{print}' *.org
# note: keeping sub and print inside one action would print only the matching lines
12 awk nube one-liners
12.1 print every line
# print is the default action
awk '{print}'
awk '{print $0}'   # $0 is the whole line
awk '1'
See condition-action pairs for what the '1' above accomplishes.
12.2 print the first field of every line
awk '{print $1}'   # note: quoting it as "$1" would print the literal string $1 on every line
12.3 print only lines which match regular expression (emulates "grep")
awk '/regex/' awk '/^# /' # prints lines that start with a hash followed by a space.
12.4 print only lines which do NOT match regex (emulates "grep -v")
awk '!/regex/'
12.5 print every line that contains zintis
awk '/zintis/ {print}'
12.6 on every line that contains zintis, print the 1st and 2nd field
awk '/zintis/ {print $1, $2}'
with a space between them (the comma inserts the output field separator, a space by default). Lines that do not match 'zintis' do not get printed at all.
12.7 on every line, remove all characters up to (and including) "Discovered"
awk '{gsub(/^.*Discovered/, ""); print $0}'
12.8 print the first 2 fields, in opposite order, of every line
awk '{print $2, $1}' file
12.9 print every line, deleting the second field of that line
awk '{ $2 = ""; print }'
12.10 change all occurances of "kg" to "lbs"
awk '{gsub(/kg/, "lbs");print}'
12.11 search for ip addresses in file "junk"
awk '/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/' junk
13 awk beginner one-liners
13.1 change "scarlet" or "ruby" or "puce" to "red"
awk '{gsub(/scarlet|ruby|puce/, "red"); print}'
13.2 substitute "foo" with "bar" EXCEPT for lines which contain "baz"
awk '!/baz/{gsub(/foo/, "bar")};{print}'
13.3 substitute "foo" with "bar" ONLY for lines which contain "baz"
awk '/baz/{gsub(/foo/, "bar")};{print}'
13.4 substitute (find and replace) "foo" with "bar" on each line
awk '{sub(/foo/,"bar");print}'           # replaces only 1st instance
gawk '{$0=gensub(/foo/,"bar",4);print}'  # replaces only 4th instance
awk '{gsub(/foo/,"bar");print}'          # replaces ALL instances in a line
13.5 print the last line of a file (emulates "tail -1")
awk 'END{print}'
13.6 print every line, then add one more extra line at the end
awk '{print;} END {print "Extra line";}' junk.org
13.7 print the number of fields in each line, followed by the line
awk '{ print NF ":" $0 } '
14 awk intermediate one-liners
14.1 print in reverse order the fields of every line
remember that everything within the {} is an action(s)
awk '{for (i=NF; i>0; i--) printf("%s ",$i);printf ("\n")}' file
14.2 remove duplicate, consecutive lines (emulates "uniq")
awk 'a !~ $0; {a=$0}'
This works as follows: the condition is a !~ $0, which evaluates as true if a (the previous line) does not match $0, the whole current line, i.e. a fresh line. When true, the whole line is printed, because a !~ $0 has no action of its own and the implied action is print. The condition is NOT met whenever we have a duplicate line, so the line is NOT printed. The only other quirk to remember is that there is a second, separate pattern/action pair: {a=$0} has no condition, so it runs on every line and saves the current line into a. That comes from point 2) below: the implied condition is true and the implied action is print.
Useful to review:
- A condition is considered false if it evaluates to zero or the empty string; anything else is true (uninitialized variables are zero or the empty string, depending on context, so they are false).
- Either a condition or an action can be implied: true and print; respectively.
- Braces without a condition are considered to have a true condition and are always executed if they are hit.
- Any condition without an action will print the line if and only if the condition is met, i.e. NOT ZERO OR NULL.
14.3 switch the first 2 fields of every line
awk '{temp = $1; $1 = $2; $2 = temp; print}' file
14.4 IN UNIX ENVIRONMENT: convert Unix newlines (LF) to DOS format
awk '{sub(/$/,"\r");print}'
14.5 count number of rows (NR) a.k.a. lines (emulates "wc -l")
awk 'END{print NR}'
14.6 print the number of fields (NF) in each line, followed by the line
awk '{ print NF ":" $0 } '
14.7 print the last field of each line
awk '{ print $NF }'
14.8 print the last field of the last line
awk '{ field = $NF }; END{ print field }'
14.9 print every line with more than 4 fields
awk 'NF > 4'
14.10 print every line where the value of the last field is > 4
awk '$NF > 4'
HANDY ONE-LINERS FOR AWK, 22 July 2003, compiled by Eric Pement <pemente@northpark.edu>, version 0.22. Latest version of this file is usually at: http://www.student.northpark.edu/pemente/awk/awk1line.txt
USAGE:
awk 'pattern {print $1}'   # standard Unix shells
FILE SPACING:
14.11 double space a file
awk '1;{print ""}'
awk 'BEGIN{ORS="\n\n"};1'
These both accomplish the same thing.
14.12 double space a file which already has blank lines in it.
Output file should contain no more than one blank line between lines of text. NOTE: On Unix systems, DOS lines which have only CRLF (\r\n) are often treated as non-blank, and thus 'NF' alone will return TRUE.
awk 'NF{print $0 "\n"}'
Lines that are already blank, will have NF = 0 which evaluates as false. So if the condition is false, the action will NOT be done, so not printing the blank line with another \n . Lines that are NOT blank evaluate NF as 1 so true, and are printed with an extra \n at the end, i.e. double spaced.
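That reasoning can be checked with a made-up input that already contains a blank line:

```shell
# the middle input line is blank (NF is 0), so it is not double spaced again
printf '%s\n' 'alpha' '' 'beta' | awk 'NF{print $0 "\n"}'
```

The output has exactly one blank line between alpha and beta, rather than piling up extra blanks.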
14.13 triple space a file
awk '1;{print "\n"}'
15 Some more past examples
touch `awk '{print $1"-confg" }' /home/Netman1/zintis/expect/c.pwd `
==============================================================
awk -F: '{printf "%s\t%s\t%s\t%s\t%s\t%s\n", $1, $2, $3, $4, $5, $6}' newsspool
To print only lines from snp.txt where the second column differs from the previous line's second column, try:
awk 'a !~ $2; {a=$2}' snp.txt
Print lines where column 2 < 10
awk -F, '$2<10' file
conversely, print lines where column 2 is greater than or equal to 10
awk -F, '!($2<10)' file
The following awk script prints only duplicated lines in a file, and these lines only once. The line 'deux' comes three times, but will only be output once.
awk 'seen[$0]++ == 1' <filename>
Sample input file:
eins
deux
drei
quattro
deux
sechs
drei
acht
huit
neuf
six
deux
The output:
deux
drei
awk '!x[$0]++'
removes the duplicate lines, as explained below:
- x is an array and it's initialized to 0; the index of x is $0.
- if $0 is met for the first time, then 1 is added to the value of x[$0]; x[$0] is now 1.
- As ++ here is a "suffix ++", 0 is returned first and the increment happens afterwards.
- So !x[$0] is true, and $0 is printed by default.
- if $0 appears more than once, !x[$0] will be false, so $0 won't print.
In this awk script sorting is not necessary. All it does is create an (associative) array element with the entire line as the index, without a value (or 0 if you will). The exclamation mark negates that value so the outcome is 1 (true). A value of 1 in awk means perform the default action, which is {print $0}, so the entire line gets printed. Afterwards the ++ comes into action and 1 is added to the array value, which now becomes 1. So the next time the same line is encountered, the value returned by the array is 1, which is then negated to 0 by the exclamation mark, so nothing will get printed.
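A quick runnable confirmation of the whole mechanism, with an invented input containing repeats:

```shell
# duplicates are dropped wherever they occur; first-seen order is preserved
printf '%s\n' 'deux' 'eins' 'deux' 'drei' 'eins' | awk '!x[$0]++'
```

Each line prints only on its first appearance: deux, eins, drei.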
16 awk to split file into individual words
echo "The fox jumped over the dog" | awk '{gsub(/ /,"\n"); print}'
will produce this output:
The
fox
jumped
over
the
dog
So, breaking up a file into individual words, one per line would be:
awk '{gsub(/ /,"\n"); print}' filewithwords
Of course you could also just do it with echo.
17 awk to add a line before every line that matches a pattern
Say you want to add a line before every line that starts with #+AUTHOR
awk '/^\#\+AUTHOR/ {print "this is an extra line"}; {print};' *.org
The first statement in between {} prints the extra line. The second statement in between the {} prints the line from the file. The second statement runs every time. The first statement only runs if the line matches #+AUTHOR
18 Editing in place with awk
It turns out that any awk command takes stdin and outputs to stdout. But the original file remains unchanged. To make editing changes to the original file, you have to use a temporary file and bash to then replace the original file with what awk output. Be careful here, as you will clobber the original file, every time!
$ awk '{print $0}' file > tmp && mv tmp file

# another real-world example
for orgfile in *.org
do
  cp ${orgfile} ${orgfile}.bak
  awk '/^\#\+AUTHOR/{print "#+HTML_HEAD_EXTRA:<style>body {background-color: #222; color: lightgreen; } h3, h4 {color: teal} h1, h2 {color: #08f} pre { background-color: #111; color: #dcc; } code { background-color: #444; color: #ff0; } </style>"};{print};' ${orgfile} > tmp && mv tmp ${orgfile}
done
So I would rather make a backup first:
for orgfile in *.org
do
  cp ${orgfile} ${orgfile}.bak
  awk '/^\#\+AUTHOR/{print "this is an extra line"};{print};' ${orgfile} > tmp && mv tmp ${orgfile}
done
18.1 moving backup files back to their original names:
for backfile in *.bak; do mv ${backfile} $(basename ${backfile} .bak); done
19 awk cheat from 1996
tail +1 phones.crd | awk -F"\0" '{print}' >! junk
- awk [-Fc] '/pattern/ {action}' var=value file(s)
- awk [-Fc] -f scriptfile var=value file(s)
pattern is a string surrounded by two / or preceded by ! for negation. pattern is optional and when omitted, matches every line.
awk '{print $1}' file(s)
awk '/search string/' file(s)
19.1 awk like grep
The following line is the same as: grep 'search string' file(s)
awk '/search string/ {print}' file(s)
And since the default action is {print}
you could simplify that further with
awk '/search string/' files
19.2 awk like cat
Since the default action is {print}, and if you omit the search string, you might think that awk file is the same as cat file; however awk still expects a program in the single tics, ' ', ahead of the filename, so you must include the '{print}' here. So cat file is the same as:
awk '{print}' file(s)
19.3 selective fields
You can output only selected fields if you wish, with $1 being the first field and $NF being the last field on a line. So printing the first and fifth fields of every line:
awk '{print $1, " is first field and 5th field is ", $5}' file(s)
And to print the above on only lines that contain search string
you can
awk '/search string/ {print $1, " is first field and 5th field is ", $5}'
19.3.1 Formatted Strings
To format the output a bit, you use %d for decimal digits and %s for strings, like this:
awk '/string/ {printf "%d\t%d\t%s\n", $1, $2, $3}' file(s)
Or dealing with comma delimited fields into some table:
awk -F, '{printf "%d\t%d\t%s\n", $1, $2, $3 $4}' file(s)   # $3 $4 concatenates fields 3 and 4 for the %s
comma delimited fields, where
- first field is a decimal digit
- second field is a decimal digit
- third field is a string
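With the format string quoted, the tab-separated printf can be checked against a hypothetical input line of two numbers and a string:

```shell
# two %d conversions, one %s, separated by tabs
printf '7 8 high\n' | awk '/high/ {printf "%d\t%d\t%s\n", $1, $2, $3}'
```

The three fields come out tab-delimited on one line.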
print comma delimited fields on separate lines
awk -F, '{print $1; print $2; print $3; print $4}' file(s)
print everything between 4th and 5th comma
awk -F, '{print $4}' file
print the fourth field from "last" command output that has "system boot" and pulls out just the unique values (i.e. skips all repeated lines) good to show all kernel versions ever run on your system.
last | grep "system boot" | awk '{print $4}' | uniq
print everything between 4th and 5th tab
awk -F"\t" '{print $4}' file
print all lines where the 5th field does NOT contain "string" in it.
awk '$5 !~ /string/ {print}' file   # !~ is awk's does-not-match operator, not a typo
print all records except first two (NR is number of records, a.k.a. lines)
awk '{if (NR > 2) print}' file
print all records that have more than two fields (NF is number of fields)
awk '{if (NF > 2) print}' file
touch `awk '{print $1"-confg" }' /home/Netman1/zintis/expect/c.pwd `
==============================================================
awk -F: '{printf "%s\t%s\t%s\t%s\t%s\n", $1, $2, $3, $4, $5}' newsspool
-calc | psf -w -H " inode Calculations for newsspools" | lpr
That's all I had on awk back in 1997
20 cut on newer Linuxes
cut is like an awk light.
man cut
cut -f 3 file.txt              # 3rd field, tab-delimited (the default)
cut -d " " -f 3 file.txt       # 3rd field, space-delimited
cut -f 3 file.txt | cut -c 2   # 2nd character of the 3rd field
cut -c 15-20 file.txt          # characters 15 through 20 of each line
cut -d ',' -f 3 file.txt       # 3rd field, comma-delimited
cut -d ':' -f 3 file.txt       # 3rd field, colon-delimited
21 Building up an awk script (an example)
Let's say you have a file of temperature data in both Celsius and Fahrenheit scales. You want to convert all of the data to Celsius.
Here is the data file, called temperature.data
(the first entries)
Temperature Scale
39 F
102 F
45 F
6 C
9 C
2 C
Print just the second field:
awk '{print $2}' temperature.data
Output would be:
Scale
F
F
F
C
C
C
Print just the whole line if the second field was "F":
awk '$2 == "F" {print $0}' temperature.data
Gives you:
39 F
102 F
45 F
awk '{print ($2 == "F" ? ($1 - 32)/1.8 : $1) " Celsius"}' temperature.data
Gives you:
Temperature Celsius
3.88889 Celsius
38.8889 Celsius
7.22222 Celsius
6 Celsius
9 Celsius
2 Celsius
Breaking this down:
- will print every line as there is no pattern to match, just a print action
- within the print action there is a condition $2 == "F"
- If the condition is met, i.e. the second field is an "F", then print the first field, $1, less 32 divided by 1.8, else just straight $1 (after the :)
- print "Celsius"
But that messes up the first record, which is the heading "Temperature Scale", turning it into "Temperature Celsius". Obviously not what we intended. We want to keep the heading as is, so don't change the record when the record number is 1, NR==1.
First we can simply put the condition that the record number be greater than 1:
awk 'NR>1{print ($2 == "F" ? ($1 - 32)/1.8 : $1) " Celsius"}' temperature.data
Results in:
3.88889 Celsius
38.8889 Celsius
7.22222 Celsius
6 Celsius
9 Celsius
2 Celsius
Which simply removes the heading line altogether. So, a better approach is to let the default awk action (print the whole line) execute when NR==1, and then for all lines NR>1 do the math conversion for lines where $2 is "F"
awk 'NR==1; NR>1{print ($2 == "F" ? ($1 - 32)/1.8 : $1) " Celsius"}' temperature.data
Results in:
Temperature Scale
3.88889 Celsius
38.8889 Celsius
7.22222 Celsius
6 Celsius
9 Celsius
2 Celsius
To make it completely obvious, you can print only the first line like this:
awk 'NR==1' temperature.data
Gives you:
Temperature Scale
And see what happens when a condition is met by multiple "lines" separated by ";"
awk 'NR==1;NR<4' temperature.data
Gives you:
Temperature Scale
Temperature Scale
39 F
102 F
So you can see that for the first record, NR==1 is met so the line is printed. But the second condition is also met, NR<4, so the line is printed again. For subsequent lines, only NR<4 is met, so lines 2 and 3 are printed.
Adding a string to the end of each line:
awk '{ gsub(/$/, " EOF");print}' temperature.data
Results in:
Temperature Scale EOF
39 F EOF
102 F EOF
45 F EOF
6 C EOF
9 C EOF
2 C EOF
Combining with the temperature conversion:
awk 'gsub(/$/, " EOF");NR==1;NR>1{print ($2 == "F" ? ($1 - 32)/1.8 : $1) " Celsius"}' temperature.data
Gives us not quite what we wanted, but illustrates how awk runs multiple actions on each line:
Temperature Scale EOF
Temperature Scale EOF
39 F EOF
3.88889 Celsius
102 F EOF
38.8889 Celsius
45 F EOF
7.22222 Celsius
6 C EOF
6 Celsius
9 C EOF
9 Celsius
2 C EOF
2 Celsius
A better approach here would be just to add it to the end of "Celsius"
awk 'NR==1;NR>1{print ($2 == "F" ? ($1 - 32)/1.8 : $1) " Celsius EOF"}' temperature.data
Temperature Scale
3.88889 Celsius EOF
38.8889 Celsius EOF
7.22222 Celsius EOF
6 Celsius EOF
9 Celsius EOF
2 Celsius EOF
21.0.1 Using awk formatted print
The printf statement looks like: printf format, item1, item2, ...
The format definition is a single string that includes format specifications for each of the items that follow, and any other fixed strings you want to add.
From the example before that gave us lines like 38.8889 Celsius
# no format strings here
awk 'NR==1; NR>1{print ($2 == "F" ? ($1 - 32)/1.8 : $1) " Celsius"}' temperature.data
We can try printf("%.1f %s\n",...), where in this case we have one format string with two conversion specifiers, and two objects to print, separated by commas, these being:
- ($2=="F" ? ($1-32) / 1.8 : $1)
- "Celsius"
# format string is "%.1f %c\n" awk 'NR==1; NR>1{printf("%.1f %s\n",($2=="F" ? ($1-32) / 1.8 : $1),"Celsius")}' temperature.data # format string is "%.1f degrees %c\n" awk 'NR==1; NR>1{printf("%.1f degrees %c\n",($2=="F" ? ($1-32) / 1.8 : $1),"Celsius")}' temperature.data # format string is "%.1f degrees Celsius\n" awk 'NR==1; NR>1{printf("%.1f degrees Celsius\n",($2=="F" ? ($1-32) / 1.8 : $1))}' temperature.data
Breaking down the first example better: printf format, item1, item2, where in our case:
- format is "%.1f %s\n"
- item1 is either ($1-32) / 1.8 or $1, according to whether $2 is an "F"
- item2 is "Celsius"
Temperature Scale
3.9 Celsius
38.9 Celsius
7.2 Celsius
6.0 Celsius

Temperature Scale
3.9 degrees C
38.9 degrees C
7.2 degrees C
6.0 degrees C
9.0 degrees C
Notice that in the "degrees" output you are not seeing Celsius but C. That is because %c prints only one character from the string; %s would give you the whole string:
# format string is "%.1f %s\n"
awk 'NR==1; NR>1{printf("%.1f %s\n",($2=="F" ? ($1-32) / 1.8 : $1),"Celsius")}' temperature.data
Temperature Scale
3.9 Celsius
38.9 Celsius
7.2 Celsius
6.0 Celsius
9.0 Celsius
2.0 Celsius
21.0.2 Extra strings in the format definition
You can include any strings you want right in the format portion. For example we can add the string " degrees" in the format statement as follows:
As shown in an example above, a format string can carry extra fixed text:
# format string is "%.1f degrees %s\n"
awk 'NR==1; NR>1{printf("%.1f degrees %s\n",($2=="F" ? ($1-32) / 1.8 : $1),"Celsius")}' temperature.data
22 awk arrays
22.1 Features of arrays
In awk they are one-dimensional arrays for storing groups of related strings. There is no need to declare the size of the array ahead of time. awk arrays are associative, meaning that the index need NOT be a number, but can be any string. In that sense they are more like python dictionaries, where each entry is a pair: an index and the corresponding array element value.
The awk manual uses this array example:
Element "dog"      Value "chien"
Element "cat"      Value "chat"
Element "one"      Value "un"
Element "1"        Value "un"
Element "not yet"  Value "pas encore"
You can recall an array element using square brackets on the index, for
example french["not yet"]
would return "pas encore".
A reference to an array element that does not exist automatically creates it
.
22.1.1 check if element exists
Remember that awk automatically creates any element that does not exist, so to check whether one exists you must use this syntax: indx in array. So for example:
"dog" in french
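A minimal sketch of these array features in a BEGIN block, reusing the manual's french example:

```shell
awk 'BEGIN {
  french["dog"] = "chien"
  french["not yet"] = "pas encore"
  print french["not yet"]                             # recall by string index
  if ("dog" in french) print "dog is defined"         # membership test
  if (!("cat" in french)) print "cat is not defined"  # "in" does not create the element
}'
```

Using the in operator for the test is what keeps french["cat"] from being created as a side effect.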
22.2 arrays in the awk manual
I have barely scratched the surface here. Best to read gawk manual #arrays