Awk Tips and Cheat Sheet

Home

1 awk overview

awk, like sed, is a stream editor, allowing you to selectively alter characters in a stream. stdin and stdout are the default streams, but you can take a filename as an argument to awk in place of stdin. And you can redirect the output of awk to a new file. See Editing in place with awk for tips on doing that with multiple files.

Of course you must also be aware of the GNU Awk User's Guide.

2 awk construct

Awk programs are one or more sets of condition-action pairs, where the conditions are outside of the curly braces and the actions are inside the curly braces.

awk [options] 'condition {action;} condition {action1; action2;}' [file]
awk -f awkscript.awk datatoparsefile   # override stdin with a filename as input
df -h | awk -f awkscript.awk           # using stdin
awk [-Fc] 'condition {action}'

When you run awk directly from the command line, the awk "script" is wrapped in single quotation (tic) marks. Otherwise the awk commands are found in a file named by the -f <filename> option. When awk commands are in a separate awk file, the single quotation marks are NOT needed; the script is just a series of awk pattern and action lines.

An example awk script in a file, showing the series of awk pattern and action lines:

df -h | awk -f awkscript.awk  

3 awk script construct

As mentioned above, you can store awk commands in a file, and call that file using awk -f awkscript file-to-parse. With this approach, awkscript does NOT have to be executable. An alternative approach is to make awkscript.awk executable (chmod 755 awkscript.awk), but then the first line in awkscript.awk must be a shebang line that passes -f to awk, such as #!/usr/bin/awk -f (a plain #!/usr/bin/env awk will not work, because awk would then treat the script's path as the program text).

The script construct assumes the script will be fed one or more files; for each line of input from all files (or from stdin), awk evaluates each condition and takes all the actions for matching conditions until the end of the script, or until awk encounters the next keyword, which restarts the script from the top with the next line of input.

#!/usr/bin/awk -f
BEGIN {           # condition line has an opening brace
    action;
    action;
    action;
    ...
}
condition { action }  
condition {      # each condition line has to have the opening brace. 
    action;      # a condition can have multiple actions
    action2;
    action3;
} 
...
END {            # condition line has an opening brace
    action;
    action;
    action;
    ...
}

4 awk works line by line

awk takes input (stdin or a file), operates on it one line at a time, and produces output (stdout or a file).

4.1 awk syntax

The most common condition is a search pattern, /pattern/, that has been matched. So most often you will see awk examples that assume the awk syntax is always awk '/pattern/{print}', but it could easily be awk 'NF>3 {print}' or even simpler awk 'NF>3', which would only print lines that have more than 3 fields, as {print} is the default action, and $0, the whole line, is the default argument to the print statement.

So in these syntax examples, remember that the /pattern/ is just a common condition statement, and that really any condition is allowed.
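For example, feeding two lines through a condition-only program shows the default {print} action at work:

```shell
# only the second line has more than 3 fields, so only it is printed
printf 'one two\none two three four\n' | awk 'NF>3'
# prints: one two three four
```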

 condition { action }  # if condition is met perform the corresponding action

 # do action on all lines matching pattern
 awk [-Fc] '/pattern/ {action}' var=value file(s)
 # do action on all lines matching pattern, AND do action2 for ALL lines 
 awk [-Fc] '/pattern/ {action} {action2}' var=value file(s)
 # do action for every line of input, and if pattern is found also do action1
 # action2, and action3.  so for lines matching the pattern, all four actions
 # would be performed.
 awk [-Fc] '{action} /pattern/ {action1; action2; action3;}' var=value file(s)

 awk [-Fc] '$n == somevalue && $m ~ /pattern/ {action; action; action;} {action2;}' var=value file(s)
 awk '$2=="Ontario" {print $1, $2}'  # prints first two fields if the
                                     # 2nd field is exactly "Ontario"
 awk '$2 ~ "Ont" {print $1, $2}'  # prints first two fields if the
                                     # 2nd field contains "Ont" anywhere

 awk [-Fc] -f scriptfile var=value file(s)

 awk [-Fc] '/pattern1/ {action1} /pattern2/ {action2}' var=value file(s)

 awk '/search string/ {print}'  file(s)      # this and previous are identical
 awk '/search string/ {print $1, "this is awk output", $5}'  file(s)

 # awk understancds brackets too, but they are optional.
 awk '(/pattern1/){action1} (/pattern2/){action2}...'
 awk '(NR==1){print "Line 1 ", $0} (NR==2){print "Line 2 ", $0}...'
 awk  'NR==1{print "Line 1 ", $0}   NR==2{print "Line 2 ", $0}...'   # does exactly the same thing

 # now if we don't care about labelling with Line 1 and Line 2 the next
 # 3 lines do the same thing
 awk 'NR==1{print $0};NR==2{print $0}'   
 awk 'NR==1{print};NR==2{print}'   
 awk 'NR==1;NR==2' 

 # the ; between bare conditions like NR==1;NR==2 above is required, but
 # notice that the next two lines still do exactly the same thing as each
 # other, i.e. the ; is understood when you have actions in braces.

 awk 'NR==1{print $0} NR==2{print $0}'
 awk 'NR==1{print} NR==2{print}'

 awk 'NR==1{print $0} NR==2{print $NF}'  # print line 1, and print the last
                                         # field of line 2.  Nothing more.

 awk '$n == "string" {action}'  #nth field must exactly match the string
 awk 'tolower($n) == "string" {action}' # compare case insensitive field n
 NR == 5 {action}
 NR > 5 && $1 ~/someregex/ {action} # after line 5, if the 1st field contains
                                    # some regular exp, then perform action
 $n ~ "string" {action}    # if "string" is found in the nth field do action.

awk '/string/ {printf "%d\t%d\t%s\n", $1, $2, $3}' file(s)
awk -F, '{printf "%d\t%d\t%s\n", $1, $2, $3 $4}' file(s)  # comma delimited fields
awk -F, '{print $1; print $2; print $3; print $4}' file(s)  # prints fields 
                                                            # on separate lines
awk -F, '{print $4}' file      # prints everything between 4th and 5th comma
awk -F"\t" '{print $4}' file   # prints everything between 4th and 5th tab

awk '$5 !~ /string/ {print}' file        # prints all lines where the 5th field
                                         # does NOT contain "string" in it.
awk '{if (NR > 2) print}' file           # prints all records except first two

awk 'NR%2{printf "%s ",$0;next;}1' yourFile
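That last one-liner joins each pair of input lines onto one output line: on odd-numbered lines it prints the line with a trailing space and next skips to the following input line, then the bare 1 condition prints the even-numbered line with a newline. A quick demonstration:

```shell
# pairs of input lines are joined onto single output lines
printf '1\n2\n3\n4\n' | awk 'NR%2{printf "%s ",$0;next;}1'
# prints:
# 1 2
# 3 4
```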

4.2 awk options

Options entered on the awk command line are:

  • -F to specify a field separator/delimiter
  • -f awk-script-file file contains all the awk commands
  • -v assign a value to a variable.

Awk works very well when you have columnar data, or field-separated data. Not surprising then that the most common option is the -F flag which specifies a field delimiter. The default field delimiter is space, or actually any whitespace, but can be overridden with the -F flag.

For example awk -F, treats the input stream as CSV (Comma Separated Values) data. See also the section on the awk input field separator FS.

Other common delimiters are: -F"," and -F":" and -F"\t" for comma-, colon-, and tab-separated values respectively. But any character can be assigned as the field delimiter.
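For example, extracting the second comma-separated field:

```shell
# -F, splits each line on commas, so $2 is the second comma-delimited field
echo 'alpha,beta,gamma' | awk -F, '{print $2}'
# prints: beta
```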

5 condition-action pairs.

Awk programs are one or more sets of condition-action pairs, where the conditions are outside of the curly braces and the actions are inside the curly braces.

Condition action pairs all follow these rules:

  1. A condition is considered false if it evaluates to zero or the empty string, anything else is true (uninitialized variables are zero or empty string, depending on context, so they are false).
  2. Either a condition or an action can be implied: true and print, respectively
  3. Braces without a condition are considered to have a true condition and are always executed if they are hit
  4. Any condition without an action will print the line if and only if the condition is met i.e. NOT ZERO OR NULL

Consider these three lines that simply cat the file:

awk '{print $0}' myfile   # missing condition always evaluates to true
awk '1' myfile            # missing action is print line if condition true
awk '2718281828' myfile   # both 1 and 2718281828 are not zero, so true
  • The first line follows rule 3. above and so the print $0 is always executed
  • The second line follows rule 4. as there is no action and only a condition. Because the condition is met, i.e. 1 evaluates as true, awk prints the line.
  • The third line condition is not zero, so default action occurs, "print"

Interesting to note that 1 is a condition that is always true; indeed any nonzero number evaluates to true, hence the third line also just prints the file.

This quirk is useful if you want to modify the line before printing it and keep the script short: your first condition{action} pair modifies the line, then you add the 1 at the end as the second condition{action} pair, which simply prints the newly modified line.

So these next four lines will change the whole line to lower case. For the first two examples, it will do so whenever the first field is exactly equal to "commands:". For the last two examples, it will do so whenever a case-insensitive version of the field is "commands:".

#    ---condition----  {------action-------} condition (1)
awk '$1 == "commands:" { $0 = tolower($0); } 1'          commands.data
awk '$1 == "commands:" { $0 = tolower($0); } {print $0;}' commands.data
awk 'tolower($1) == "commands:" { $0 = tolower($0); } 1' commands.data
awk 'tolower($1) == "commands:" { $0 = tolower($0); } {print}' commands.data

6 awk patterns

Inside the // marks is the awk matching pattern, or the negation, !//, for lines that do NOT match the pattern. For example /string1/ or its negation !/string1/

Lines that match the pattern will cause the action(s) to be performed. Between the slashes, you can use full regex expressions.

6.1 Lines without a pattern "match all lines"

Lines without a condition evaluate as "true", i.e. the default decision when a pattern is not present is to match all lines. You can think of it as: no condition or pattern matches everything, so it is always true. awk '{print $1}' prints the first field of absolutely every line.

6.2 matching patterns vs substitution actions vs ~

6.2.1 substitution

Substitutions are not matching patterns, but are actions. So in that respect, this section should be in the actions section. But they can be confused with patterns so I am including it here.

Some actions are substitutions, such as sub(/this/, "that"), which is then followed by print, so {sub(/this/, "that"); print}, which only makes the change on the first "this" in each line. To catch all of them: {gsub(/this/, "that"); print}

If you have more than one action, separate each with a ; semicolon.

  • {action1; action2; action3}

So often you would see {gsub(/this/, "that"); print $0} as the two actions together.

So awk can drastically manipulate each line based on many conditions and then print the resulting modified line, or part thereof.

6.2.2 substitute end of line with string " EOL"

awk '{ gsub(/$/, " EOL"); print }' temperature.data

6.2.3 matching patterns

The matching pattern is not the substitution action. Matching patterns are conditions, and use slashes, / but sub and gsub are actions for substitute and global substitute, actions respectively.

Matching patterns check the whole line for a match and perform the action on each line that matches.

So just within the // are regexps that will be checked against the whole line also known as the current input record.

6.2.4 comparison operator ~

In addition to the matching pattern behaviour that checks the whole line for a match, you can use ~, the comparison operator, which also lets you use a regex in the /pattern/. This makes it easy to limit patterns to specific fields, followed by the normal // pattern. So if field 3 has to match a pattern, do this:

awk [-Fc] '$1 == somevalue && $3 ~ /pattern/ {action1;} {action2;}' var=value file(s)
awk [-Fc] '$1 == somevalue && $3 ~/pattern/ {action1;} {action2;}' var=value file(s)
# both with or without the space are correct.  I would use ~/pattern/ as it
# seems clearer
#
# $1 has to be exactly somevalue, where $3 just has to contain that pattern
# breaking it out:
1st condition: $1 == somevalue && $3 ~ /pattern/
1st action:  {action1}
2nd condition: missing, so evaluates as "true"
2nd action: {action2}       # will always be done, as 2nd condition is always true

Another example shows three awk commands that all do the same thing: match all input records with the uppercase letter 'Z' somewhere in the first field:

awk '$1 ~/Z/ {print}' files-of-strings-to-search
awk '$1 ~/Z/' files-of-strings-to-search
awk '{ if ($1 ~ /Z/) print }'  files-of-strings-to-search

When a regexp is enclosed in slashes, such as /fubar/, we call it a regexp constant, as "fubar" is a string constant.

!~ works just like ~ but is the negation, so lines NOT matching are selected.

Here are 7 identical examples that only print lines that end in the string "Sunday". Remember the construct is awk '/pattern/{action}' file

awk '($0 ~ /Sunday$/){print $0}' temperature.data
awk '$0 ~ /Sunday$/{print $0}'   temperature.data  # brackets are optional
awk '$0 ~ /Sunday$/{print}'      temperature.data  # default print is entire line $0
awk '/Sunday$/{print}'           temperature.data  # default comparison is entire line
awk '(/Sunday$/){print}'         temperature.data
awk '(/Sunday$/)'                temperature.data  # default action is print
awk '/Sunday$/'                  temperature.data  # brackets are optional
# if you don't specify what is being matched, awk defaults to the entire line

And now an example that only prints lines where field 3 contains the string "Sunday"

awk '($3 ~ /Sunday/){print $0}' temperature.data
awk '$3 ~ /Sunday/'            temperature.data

6.2.5 Boolean operators in pattern

As mentioned above, the matching pattern does not need to be applied to the whole line, but rather can be a set of boolean combinations of field values. Those field values can be exact, as with a straight == or they can be regex with a ~ character followed by the usual /pattern/.

For example if I want to match only lines where the 1st field is the word "sshd", AND the regex pattern myregex is found in the 2nd field, try this:

awk '$1 == "sshd" && $2 ~ /myregex/ {print $0}'  # strings in quotes.
awk '$1 == 37 && $2 ~ /myregex/ {print $0}'      # numbers as numbers.
awk '$2 < 100  && $2 ~ /myregex/ {print $0}'     # less than match compare.

This is easy when you remember that everything outside of the curly braces is the condition, and everything inside the curly braces is the action. So any condition would include complex conditions with boolean operators.

See section: boolean operators for comparisons later in this doc.

6.2.6 Other comparison operators

Straight from the gawk manual on comparison operators

Expression Result
x < y True if x is less than y
x <= y True if x is less than or equal to y
x > y True if x is greater than y
x >= y True if x is greater than or equal to y
x == y True if x is equal to y
x != y True if x is not equal to y
x ~ y True if the string x matches the regexp
  denoted by y
x !~ y True if the string x does not match the regexp
  denoted by y
subscript in array True if the array "array" has an element with
  the subscript "subscript"

6.3 anchoring patterns

A very good idea, if you can, is to anchor the pattern to either the beginning of the line, ^, or the end of the line, $.

GNU awk also provides \< for the beginning of a word, \> for the end of a word, and \y for a word boundary. (awk regexps have no \A or \z; note that in awk, \b is the backspace character, not a word boundary.)

pattern matches
^  beginning of line
$  end of a line
\< beginning of a word (gawk)
\> end of a word (gawk)
\y word boundary (gawk)

6.3.1 anchoring patterns when using the comparison ~ operator

If you want to anchor a search pattern to a particular field, you can use the ~ comparison operator.

Assuming this is a file called student.data:

Name Year Average
alice 3rd 78
bob 1st 57
john 4th 61
eric 4th 76
graham 4th 58
suzanne 2nd 75
johnson 2nd 43
nancy 2nd 88
nick 1st 100
someone nth 0

Then see the difference in output when running these four awk commands:

$ awk '/^n/ {print}' student.data
nancy 2nd 88
nick 1st 100

$ awk '$2 ~ /^n/ {print}' student.data
someone nth 0

$ awk '$2 ~ /n/ {print}' student.data
suzanne 2nd 75
johnson 2nd 43
nancy 2nd 88
someone nth 0

$ awk '/n/ {print}' student.data
john 4th 61
suzanne 2nd 75
johnson 2nd 43
nancy 2nd 88
nick 1st 100
someone nth 0

The first matches "n" at the beginning of the whole line, where the second matches "n" at the beginning of the 2nd field only. The third matches "n" anywhere in the second field, and finally the last matches "n" anywhere on the whole line.

6.4 ignore case in matching patterns

There are three ways to ignore case when matching patterns (that I know of):

  1. Use gawk's IGNORECASE variable on the command line, with -v IGNORECASE=1
  2. Use the tolower() function on the entire line before the comparison ~
    awk 'tolower($0) ~ /pattern/' student.data

  3. Set the variable IGNORECASE = 1 in a BEGIN block (gawk only).
awk 'BEGIN {IGNORECASE = 1} /PatTeRn/ {action}' student.data
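Of the three, the tolower() approach is the most portable, since IGNORECASE is a gawk extension. For example:

```shell
# lowercase the whole line before comparing, so the case of the input
# does not matter
echo 'HeLLo world' | awk 'tolower($0) ~ /hello/'
# prints: HeLLo world
```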

6.5 awk special patterns, BEGIN and END

awk has two special patterns: BEGIN, which always matches before the first line, and END, which matches after the last line. Like other patterns these are conditions, followed by {actions} in braces.

So you can have awk do some action on line 0 and some other action after the last line is processed.

Typically BEGIN is used to initialize variables, maybe print some headers.

END is used to print some summary of variables that have been counting through each line processed. Both BEGIN and END are only executed once.

# Take the output of the df -l command and pipe it to this awk script
# i.e. df -l | awk -f awkscript-for-df
$1 != "tmpfs" {
    used += $3;
    available += $4;
    }
END {
     printf "%d GiB used\n%d GiB available\n", used/2^20, available/2^20;
 }

Another use of END is to add lines to the end of a file:

awk '{print} END {print "* [[https://www.zintis.net][www.zintis.net]]"};' \
projectplan.org > tmp && mv tmp projectplan.org;

Put that in a bash for loop to add the line to all files in a directory:

for orgfile in *.org; do
    awk '{print} END {print "* [[https://www.zintis.net][www.zintis.net]]"}' \
    "$orgfile" > tmp && mv tmp "$orgfile"
done

7 awk action

After the awk matching pattern comes the action in curly braces { }. The default action is print, which simply prints the whole line. You can also use print $0, as $0 holds the whole line.

7.1 grep and cat using awk

If a condition is missing, awk evaluates it as true. So to emulate cat:

awk '{print}' filename  # just like cat filename
# or even
awk 1 filename   # here the condition is 1, so always true, but the action
                 # is missing, which defaults to {print}
                 # notice that even the ' characters are optional if there
                 # is no ambiguity

If an action is missing, awk defaults it to print, so you can emulate grep:

awk '/apache/' filename   # just like grep apache filename

To ignore all lines starting with # you could use awk '!/^#/' file1

If you want to change all occurrences of a string, be aware of where you put the print statement. Make sure it is a separate condition/action pair, otherwise you will only print lines where you made the substitution and ignore all other lines. For example, this will change "4th" to "fourth" but will only print the lines that were changed:

# for every line that contains "4th" substitute it with "fourth"
# and print only those lines.
/4th/ {
    gsub(/4th/, "fourth"); print;
}

Where this will print every line, with changes and lines without changes.

# for every line that contains "4th" substitute it with "fourth"
/4th/ {
    gsub(/4th/, "fourth");
}
{print}

A better approach is also the simpler one:

# for lines containing "eric"
/eric/ {
    # substitute 4th for fourth
    gsub(/4th/, "fourth")
}
# print all lines, including the ones that substituted fourth for 4th
{print $0}

7.1.1 print action

As mentioned, the default action is print. You can print strings, or fields, or both. By separating the printed objects with a comma, the output will have a space between the objects. If you leave out the commas, the objects are concatenated with no spaces between them.

echo "one two three" | awk '{print $1 $2 $3}'
onetwothree

echo "one two three" | awk '{print $3, $2, $1}'
three two one

7.1.2 math actions

The action can be a math expression that will be evaluated. You can use it with ONLY a BEGIN block, and avoid file input altogether, as a quick calculator:

awk 'BEGIN {pi = atan2(0, -1); print pi*5^2}'   # awk has no built-in pi variable
awk 'BEGIN {print 451*log(2.718)}'


7.1.3 Multiple actions on a pattern match

Within the curly braces, you can have multiple actions, each separated by a semicolon. So:

NR == 1 {
   print "This is my data as I like to see it";
   print $0;
}
/looking for this string/   # a condition with no action: matching lines are printed

Source of this script is: opensource.com article

7.2 next command

As described above, awk works line by line. On each line, if patterns are matched to the line, all the actions get run. You can also have awk do multiple comparisons, and they all get run in the order they appear in the awk script. Unless… one of the actions includes the next command. In that case, the line is considered completed, and awk moves on to the next line of input.

This is useful when you need to ignore all the remaining pattern comparisons and subsequent actions on a line. For example, say you are printing records that have a value greater than 60 in the third column. Easy:

$3 >= 60 {print $0}

But let's say you also want to flag records that are greater than 50 with a string "on probation"

$3 >= 60 {print $0}
$3 >= 50 {print $0, " On probation"}

This would cause records with values >= 60 to get printed twice, once with just the line, the second time with the words "on probation", which is NOT what you wanted.

You can add next as another action on the $3 >= 60 pair, in which case those records will not get "on probation" in their output.

$3 >= 60 {print $0; next;}
$3 >= 50 {print $0, " On probation";}

This can also be written for easier reading as:

$3 >= 60 {
    print $0;
    next;
}

$3 >= 50 {
    print $0, " On probation";
}

Finally, let's say that if the value of column 3 is < 50, we print "failed" but not the mark.

$3 >= 60 {
    print $0;
    next;
    }

$3 >= 50 {
     print $0, " On probation";
     }

$3 < 50 {
     $3 = "failed";
     print $0;
     }

7.3 awk records (a.k.a. lines)

I already mentioned BEGIN and END, but awk also automatically counts the records (lines) of the input as it reads them, and makes the current count available in the variable NR.
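For example, prefixing every line with its record number:

```shell
# NR is the number of the current record (line)
printf 'alpha\nbeta\n' | awk '{print NR, $0}'
# prints:
# 1 alpha
# 2 beta
```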

8 awk fields $1, $2 etc

Based on the field delimiter, awk takes a line of input and breaks it up into fields. awk automatically counts the number of fields in each line and assigns it to the variable NF for each line. Similarly, for each line the last field can be recalled with $NF.
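A quick demonstration of NF versus $NF:

```shell
# NF is the count of fields; $NF is the value of the last field
echo 'one two three' | awk '{print NF, $NF}'
# prints: 3 three
```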

8.0.1 awk field variables

  • $0 is the whole record (line)
  • $1 is the first field
  • $2 is the second field.
  • $n is the nth field.
  • $NF is the last field
  • NF is just a number equal to the number of fields on this line.
  • NR number of records so far (i.e. the number of lines read, if the record separator is \n)
  • FNR is the record number within the current file (reset to 1 at the top of every new file when multiple files are processed by awk)

To print all fields excluding the first field this will work:

  • awk '{$1 = ""; print $0}' which sets the first field to the null string then prints the whole line, i.e. it omits the first field.
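One caveat: emptying $1 makes awk rebuild the record, which leaves a leading output field separator behind; a sub() can strip it if unwanted:

```shell
# emptying $1 leaves a leading output field separator behind
echo 'one two three' | awk '{$1 = ""; print $0}'
# strip the leading separator too, if unwanted
echo 'one two three' | awk '{$1 = ""; sub(/^ /, ""); print $0}'
# prints: two three
```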

To count lines that match a pattern, you can create your own variable and use it to count, like:

BEGIN { rows = 0 }
/pattern/ { rows += 1 }
END { print rows }

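
The same idea as a one-liner, counting lines that contain "4th" (much like grep -c):

```shell
# increment rows on matching lines, print the total at the end
printf 'john 4th 61\nbob 1st 57\neric 4th 76\n' | awk '/4th/ { rows += 1 } END { print rows }'
# prints: 2
```
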
 $ cat file1
 one
 two
 three

 $ cat file2
 four
 five
 six

$ awk '{ print FNR, FILENAME, " ==> ", $0 }' file1 file2
1 file1 ==> one
2 file1 ==> two
3 file1 ==> three
1 file2 ==> four
2 file2 ==> five
3 file2 ==>  six

8.1 awk output field separator OFS

You can change the output field separator, OFS, to what you want, often to the newline character, in which case each field printed will be on a new line. So, assuming your input field separator is a colon, as for example in a PATH listing, and you want to see each path entry on its own line:

echo "${PATH}" | awk 'BEGIN{FS=":";OFS="\n"} FNR==1{$1=$1;print;exit}'
echo "${PATH}" | awk '{gsub(/:/, "\n"); print}'
echo -e "${PURPLE}${PATH//:/\\n}"  # also works, using the echo command itself
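The $1=$1 assignment in the first example looks like a no-op, but touching a field forces awk to rebuild $0 using the new OFS:

```shell
# assigning a field (even to itself) makes awk rejoin the record with OFS
echo 'one two three' | awk 'BEGIN { OFS = "-" } { $1 = $1; print }'
# prints: one-two-three
```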

But another way to output each field on a new line is using gsub(/ /, "\n"), or gsub(/:/, "\n") if your fields are separated by a colon with no spaces surrounding the ":".

$ echo "The fox jumped over the dog" | awk '{gsub(/ /,"\n"); print}'
The
fox
jumped
over
the
dog

A simplistic approach, if not so flexible, is to print the newline string wherever you need it, among the other fields. For instance:

  • awk '{print $1"\n"$2,$3,$4}' file

8.2 removing whitespace

Input lines often carry trailing whitespace. To remove it, you can use sub or gsub to delete one or more occurrences, +, of either space or tab, [ \t], anchored at the end of the line, $. So: /[ \t]+$/

# remember that everything inside the curly braces is an action.
{gsub(/[ \t]+$/,"",$0); print;} 

Not awk related, but while here, see:

  • alias taka='tput setaf 5;echo -e "${PURPLE}${PATH//:/\n}"'

Another good example from stackoverflow: stackoverflow.com Here we are looking at lines that contain a comma, and then stripping all blank characters from field 2, and printing fields 1 and 2 with a single space between them.

awk -F, '/,/{gsub(/ /, "", $2); print $1","$2} ' input.txt

Explanation:

 -F,     use comma as field separator 
         (so the thing before the first comma is $1, etc)

/,/      operate only on lines with a comma 
         (this means empty lines and lines without a comma are skipped)

 gsub(/a/, b, c)    match the regular expression a, replace it with b,
                    and do all this with the contents of c, in this case
                    the 2nd field.

print $1","$2   print the contents of field 1, a comma, then field 2

input.txt      use input.txt as the source of lines to process

8.3 awk input field separator FS

The usual approach is to use -F: or -F,… You can also set FS=":" or FS="," inside the script, usually in a BEGIN block. Note the variable is FS, with no $ sign; IFS is the shell's field separator, not awk's.

8.4 -f scriptfile

If you collect your awk commands in a separate file, then you include the -f file argument to awk, which reads the awk commands from that file and applies them to every line of input from stdin (or from the listed files).

9 Variables

9.1 System Variables, FS, RS, NR

Because you want to specify separators in the script itself, and not rely on the user of the script remembering to call it with a -F parameter, you should set these predefined system variables inside the awk script, usually at the beginning, in the BEGIN section.

FS = "\t"   # tab for tab separated values, exactly 1 tab
FS = "\t+"  # tab for tab separated values, one or more tabs
FS = "[':\t]" # any one of a tic, colon, or tab will be seen as a delimiter
FS = ","    # for comma separated values
FS = "\n"   # Good for multi-lined records, so each line is a field.
RS = ""     # if the above is used, you will need a record separator.
            # If set to "" it will be a blank line.  These two are often
            # combined to handle multi-line records.
ORS = "\n"  # the output record separator.  A newline is the default.

OFS = "\n"  # output field separator, set to newline.  Is a space by
            # default.
NR          # number of records
NF          # number of fields
FILENAME    # the name of the current input file
FNR         # the number of the current record relative to the input file
ARGC        # the number of passed parameters
ARGV        # recalls the command line parameters
ENVIRON     # an array containing the shell environment vars and values
IGNORECASE  # to ignore the character case or not

BEGIN {
  FS = ":";
  OFS = "\t";
}
{print $1, $NF}
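Run against a passwd-style line, that script prints the first and last colon-separated fields with a tab between them:

```shell
# FS splits on colons on input; OFS puts a tab between printed fields
echo 'root:x:0:0:root:/root:/bin/bash' | awk 'BEGIN { FS = ":"; OFS = "\t" } { print $1, $NF }'
# prints: root<TAB>/bin/bash
```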

9.2 User defined variables.

Simply start using them with an assignment statement, and recall the values later without using the $ sign:

# Using student records as a sample file source.
# sample data:

# alice 3rd 78
# bob 1st 57
# john 4th 61
# eric 4th 76
# graham 4th 58
# suzanne 2nd 75
# johnson 2nd 43

NR == 1 {
   print "Adjusting 4th year student marks up by 5%";
}

# count the number of 4th year students
$2 == "4th" {
    count++;
    }
# add adjusted mark to all fourth year students
/4th/ {
    markadjust = $3 * 1.05;
    print "4th year student", $1, "  Mark", $3, " Adjusted mark", markadjust;
}

# also print out the name and mark of third year students
/3rd/ {
    print "Third year student", $1, $3;
}

END {
    print "There were " count " 4th year students with adjusted marks"
}

9.3 comparisons

You can compare any variable to a value. The comparison is at the "pattern matching stage". Then any action can be taken based on the comparisons.

  • < less than
  • > greater than
  • <= less than or equal to
  • >= greater than or equal to
  • == equal to
  • != not equal to
  • ~ matches  # used to compare a pattern against a particular field.
  • !~ not matches

So, for the very first line only, print the first six fields; and print any line only if it has six fields:

awk 'NR == 1 {print $1, $2, $3, $4, $5, $6}'
awk 'NF == 6 {print $0}'

Or more exacting, with all comments:

# run on hydro electric usage data as awk -f electric-usage.awk *.csv
# if you want to run on all your electric data files.

BEGIN {FS = ","; ORS = "\n\n"}   # input is csv, output is double spaced.
# the column headings are on line 2.  Print them.
NR == 2 {print "\n---------------"; print $1, $4, $5, $6}

# only print lines that have six fields. and then only fields 1, 4, 5 & 6
NF == 6 {print $1, $4, $5, $6}

Similar to the above, but with tabbed output:

BEGIN {FS = ","; ORS = "\n" ; print "\n---------------";  }
# input is csv, output is double spaced.
# the column headings start with "Months"
$1 == "Months" {printf "%s\t%s%s\n", $1, $2, $6; print "----------------"}
# {print "\n---------------"; print $1, $2, $6}

# only print lines where field 2 is greater than 15 celsius
# and then, only print the month and have six fields. and then
# only fields 1, 4, 5 & 6
# I could do this:
# $2 > 15 || $1 == "Months" {print $1, $2, $6} but that prints the heading twice
# as the heading is already printed in the script earlier.
# so instead, just
$2 > 15 {printf "%s\t%d\t%d\n", $1, $2, $6}


9.4 boolean operators for comparisons

The typical || for or and && for and boolean operators allow you to combine several comparisons together. For example, if field 3 is < 100 AND the line has less than 8 fields, print the line would be:

NF < 8 && $3 < 100 {print $0}
NF > 2 || $1 ~ /header/ {print $0}

9.5 conditional statement, if

{
    if ($3 < 50) print $1, "failed";
    if ($3 > 95) {
       print $1, "with distinction";
       scholars += 1;
       };
 }
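Feeding a few student records through that kind of if-action:

```shell
# one action block with two independent if statements
printf 'alice 3rd 78\nnick 1st 100\nbob 1st 45\n' | \
awk '{ if ($3 < 50) print $1, "failed"; if ($3 > 95) print $1, "with distinction" }'
# prints:
# nick with distinction
# bob failed
```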

9.6 while loops

Let's say you want to average a variable number of numbers on each line. You can run a while loop up to the number of fields on the line:

Input data:

575,56,124,461
271,182,818,457
314,159,265,342,415,711
478,549,241,256
505,121,211,198,217

Your awk script could be (run it with -F, since the fields here are comma separated; note the actions must sit inside braces):

# these actions run from scratch on every line of input
{
    sum = 0;
    i = 1;
    while (i < (NF + 1)) {
        sum += $i;
        i++;
    }
    average = sum / NF;
    print $0, " average -->", average;
}
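
The same logic works as a one-liner (here with a for loop instead of while):

```shell
# sum the comma-separated fields of each line and append their average
printf '575,56,124,461\n' | awk -F, '{ sum = 0; for (i = 1; i <= NF; i++) sum += $i; print $0, " average -->", sum / NF }'
# prints: 575,56,124,461  average --> 304
```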

9.7 exiting awk after a pattern is seen

You can have awk stop looking at more lines with the {exit} action. So let's say you want to stop reading input when you see the first instance of the string "<footer>". Do this:

awk '/<footer>/ {exit}; {print $0}'
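Note that the footer line itself is not printed, because exit runs before the print pair is reached:

```shell
# everything before the footer is printed; the footer and beyond are not
printf 'line a\n<footer>\nline b\n' | awk '/<footer>/ {exit}; {print $0}'
# prints: line a
```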

9.8 awk range from pattern to pattern

The desire to do something after a pattern has been seen, then stop doing it when another pattern is seen, is common enough that awk has a built-in construct to do this.

Simply specify the pattern that starts the action, then the pattern after which the action stops, separated by a comma:

awk '/ifseenstartaction/,/stopactionafterseeing/'
awk '/startpattern/,/stoppattern/ {print $0, "pattern activated"}'

10 awk options

-v var=value     # assign value to the awk variable var before the program begins executing
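
For example, -v passes a shell-side value into awk before any input is read. A small sketch (limit is just an illustrative variable name):

```shell
# pass a shell value into awk with -v, then use it in a condition
printf '5 apples\n42 pears\n7 plums\n' | awk -v limit=10 '$1 > limit'
# prints: 42 pears
```

Unlike var=value assignments placed among the filename arguments, a -v variable is already set when a BEGIN block runs.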

11 Awk Numbering and Calculations:

11.1 precede each line by its line number FOR THAT FILE (left alignment).

11.2 Using a tab (\t) instead of space will preserve margins.

awk '{print FNR "\t" $0}' files*

11.3 precede each line by its line number FOR ALL FILES TOGETHER, with tab.

awk '{print NR "\t" $0}' files*

11.4 number each line of a file (number on left, right-aligned)

Double the percent signs if typing from the DOS command prompt.

awk '{printf("%5d : %s\n", NR,$0)}'

11.5 number each line of file, but only print numbers if line is not blank

Remember caveats about Unix treatment of \r (mentioned above)

awk 'NF{$0=++a " :" $0};{print}'
awk '{print (NF? ++a " :" :"") $0}'

11.6 count lines (emulates "wc -l")

awk 'END{print NR}'

11.7 print the sums of the fields of every line

awk '{s=0; for (i=1; i<=NF; i++) s=s+$i; print s}'
awk '{s=0; for (i=1; i<=NF; i++) s=s+$i; print $0, " ==>  sum:", s}' 

11.8 add all fields in all lines and print the sum

awk '{for (i=1; i<=NF; i++) s=s+$i}; END{print s}'

11.9 print every line after replacing each field with its absolute value

awk '{for (i=1; i<=NF; i++) if ($i < 0) $i = -$i; print }'
awk '{for (i=1; i<=NF; i++) $i = ($i < 0) ? -$i : $i; print }'

11.10 print from nth field to the last field in the line

awk -v n=4 '{ for (i=n; i<=NF; i++) printf "%s%s", $i, (i<NF ? OFS : ORS)}' input

This is taken straight out of stackoverflow: "This will take n as the value of n and loop through that number through the last field NF, for each iteration it will print the current value, if that is not the last value in the line it will print OFS after it (space), if it is the last value on the line it will print ORS after it (newline)."

11.11 print the total number of fields ("words") in all lines

awk '{ total = total + NF }; END {print total}' file
wc -w file

11.12 print the total number of lines that contain "Beth"

awk '/Beth/{n++}; END {print n+0}' file
grep Beth file | wc -l

11.13 print the largest first field and the line that contains it

Intended for finding the longest string in field #1

awk '$1 > max {max=$1; maxline=$0}; END{ print max, maxline}'

11.14 print the number of fields in each line, followed by the line

awk '{ print NF ":" $0 } '

11.15 print the last field of each line

awk '{ print $NF }'

11.16 print the last field of the last line

awk '{ field = $NF }; END{ print field }'

11.17 print every line with more than 4 fields

condition is NF > 4, and there is no action, so defaults to print $0.

awk 'NF > 4'

11.18 print every line where the value of the last field is > 4

condition is $NF > 4, and there is no action, so defaults to print $0.

awk '$NF > 4'

TEXT CONVERSION AND SUBSTITUTION:

11.19 IN UNIX ENVIRONMENT: convert DOS newlines (CR/LF) to Unix format

awk '{sub(/\r$/,"");print}'   # assumes EACH line ends with Ctrl-M

11.20 IN UNIX ENVIRONMENT: convert Unix newlines (LF) to DOS format

awk '{sub(/$/,"\r");print}'

11.21 IN DOS ENVIRONMENT: convert Unix newlines (LF) to DOS format

awk 1      # DOS shells use double quotes rather than single quotes, so the 1 program is typed unquoted there

worth testing in a DOS environment, which I do not have.

11.22 IN DOS ENVIRONMENT: convert DOS newlines (CR/LF) to Unix format

11.23 Cannot be done with DOS versions of awk, other than gawk:

gawk -v BINMODE="w" '1' infile >outfile

11.24 Use "tr" instead.

tr -d \r <infile >outfile            # GNU tr version 1.22 or higher

11.25 delete leading whitespace (spaces, tabs) from front of each line

awk '{$1 = $1} {print}'   # note: also collapses whitespace between fields
awk '{$1 = $1}1'  

11.26 aligns all text flush left

awk '{sub(/^[ \t]+/, ""); print}'

11.27 delete trailing whitespace (spaces, tabs) from end of each line

awk '{sub(/[ \t]+$/, "");print}'

11.28 delete BOTH leading and trailing whitespace from each line

awk '{gsub(/^[ \t]+|[ \t]+$/,"");print}'
awk '{$1=$1;print}'           # also removes extra space between fields

11.29 align all text flush right on a 79-column width

awk '{printf "%79s\n", $0}' file*

Probably better to strip trailing white space first, and then print flush right. so:

awk '                     {printf "%79s\n", $0}' file*
awk '{sub(/[ \t]+$/, "")} {printf "%79s\n", $0}' file

11.30 center all text on a 79-character width

awk '                   {l=length();s=int((79-l)/2); printf "%"(s+l)"s\n",$0}' file*
awk '{sub(/[ \t]+$/, "")}{l=length();s=int((79-l)/2); printf "%"(s+l)"s\n",$0}'                                                                  

Again, best to strip trailing blanks first, to get a true centered output.

11.31 insert 5 blank spaces at beginning of each line (make page offset)

awk '{sub(/^/, "     ");print}'

11.32 substitute (find and replace) "foo" with "bar" on each line

 awk '{sub(/foo/,"bar");print}'           # replaces only 1st instance
gawk '{$0=gensub(/foo/,"bar",4);print}'   # replaces only 4th instance
 awk '{gsub(/foo/,"bar");print}'          # replaces ALL instances in a line

11.33 substitute "foo" with "bar" ONLY for lines which contain "baz"

awk '/baz/{gsub(/foo/, "bar")};{print}'  # prints all lines 
awk '/baz/{gsub(/foo/, "bar");print}'    # prints only lines with baz

Be aware that the print statement can be part of an action, or its own action

11.34 substitute "foo" with "bar" EXCEPT for lines which contain "baz"

awk '!/baz/{gsub(/foo/, "bar")};{print}'

11.35 change "scarlet" or "ruby" or "puce" to "red"

awk '{gsub(/scarlet|ruby|puce/, "red"); print}'

11.36 reverse order of lines (emulates "tac")

awk '{a[i++]=$0} END {for (j=i-1; j>=0;) print a[j--] }' file*

11.37 if a line ends with a backslash, append the next line to it

(fails if there are multiple lines ending with backslash…)

awk '/\\$/ {sub(/\\$/,""); getline t; print $0 t; next}; 1' file*

11.38 print and sort the login names of all users

awk -F ":" '{ print $1 | "sort" }' /etc/passwd

11.39 print the first 2 fields, in opposite order, of every line

awk '{print $2, $1}' file

11.40 switch the first 2 fields of every line

awk '{temp = $1; $1 = $2; $2 = temp; print}' file

Seems to me to be unnecessary if you simply print the fields in swapped order and save the output to a new file, i.e. awk '{print $2, $1}' file > newfile (though the temp-variable version keeps any remaining fields).

11.41 print every line, deleting the second field of that line

awk '{ $2 = ""; print }'

11.42 print in reverse order the fields of every line

awk '{for (i=NF; i>0; i--) printf("%s ",i);printf ("\n")}' file
awk '{for (i=NF; i>0; i--) printf("%s ",$i);printf ("\n")}' file

Don't forget that you want to print the field i, not just the number i, so $i

11.43 remove duplicate, consecutive lines (emulates "uniq")

awk 'a !~ $0; {a=$0}'

This example also removes the blank lines, after the first blank line. Note that !~ is a regex match rather than a string comparison, so awk '$0 != a; {a=$0}' is a stricter emulation of "uniq".

11.44 very similarly, print only unique occurrences of each line

awk '!x[$0]++'

x here is an array indexed by $0, with elements initialized to 0 (see the fuller explanation further below).

11.45 print the first 10 lines of file (emulates "head")

awk 'NR < 11'

11.46 print first line of file (emulates "head -1")

awk 'NR>1{exit};1'
# print the last 2 lines of a file (emulates "tail -2")

awk '{y=x "\n" $0; x=$0};END{print y}'

11.47 print the last line of a file (emulates "tail -1")

awk 'END{print}'

11.48 print only lines which match regular expression (emulates "grep")

awk '/regex/'

11.49 print only lines which do NOT match regex (emulates "grep -v")

awk '!/regex/'

11.50 print the line immediately before a regex

(but not the line itself i.e the line containing the regex)

awk '/regex/{print x};{x=$0}'
awk '/regex/{print (x=="" ? "match on line 1" : x)};{x=$0}'

11.51 print the line immediately after a regex,

(but not the line containing the regex)

awk '/regex/{getline;print}'

11.52 grep for AAA and BBB and CCC (in any order)

awk '/AAA/; /BBB/; /CCC/'

We have three condition/action pairs, each of which is missing the action which defaults to print $0

11.53 grep for AAA and BBB and CCC (in that order)

awk '/AAA.*BBB.*CCC/'

11.54 print only lines of 65 characters or longer

awk 'length > 64'

11.55 print only lines of less than 65 characters

awk 'length < 65'

11.56 print section of file from regular expression to end of file

awk '/regex/,0'
awk '/regex/,EOF'

11.57 print section of file based on line numbers (lines 8-12, inclusive)

awk 'NR==8,NR==12'

11.58 print line number 52

awk 'NR==52'
awk 'NR==52 {print;exit}'          # more efficient on large files

11.59 print section of file between two regular expressions (inclusive)

awk '/Iowa/,/Montana/'             # case sensitive

SELECTIVE DELETION OF CERTAIN LINES:

11.60 delete ALL blank lines from a file (same as "grep '.' ")

awk NF
awk '/./'
# join every two lines into one

awk 'NR%2{printf "%s ",$0;next;}1' yourFile

This last example above will join every two lines into one line

one
two
three
four

will become

one two
three four

This works as follows. NR%2 evaluates to zero on every second (even) line. So when it is NOT zero, i.e. on every odd line, awk prints the line, $0, with a trailing space instead of a newline and immediately goes to the next input line. On even lines the next condition/action pair is reached, which is 1, always true, so the default print action runs.

Changing emails on all my org files:

awk  '/^\#\+EMAIL/  {sub(/somedomain DOT ca/, "gmail DOT com");print}'  *.org

A few more quick recipes (these also appear, with more commentary, in the 1996 cheat section below):

tail +1 phones.crd | awk -F"\0" '{print}' >! junk

 awk [-Fc] 'pattern {action}' var=value file(s)
 awk [-Fc] -f scriptfile var=value file(s)
 awk '{print $1}'  file(s)

awk '/search string/'  file(s)

this is the same as: grep 'search string' file(s)

awk '/search string/ {print}'  file(s)        # identical to the previous line

awk '/search string/ {print $1, "this is awk output", $5}'  file(s)
awk '/string/ {printf "%d\t%d\t%s\n", $1, $2, $3}' file(s)

awk -F, '{printf "%d\t%d\t%s\n", $1, $2, $3 $4}' file(s)    # comma delimited fields

awk -F, '{print $1; print $2; print $3; print $4}' file(s)  # prints fields on separate lines

awk -F, '{print $4}' file       # prints everything between 4th and 5th comma

awk -F"\t" '{print $4}' file    # prints everything between 4th and 5th tab

awk '$5 !~ /string/ {print}' file    # prints all lines where the 5th field does NOT contain "string"

awk '{if (NR > 2) print}' file       # prints all records except first two

12 awk nube one-liners

12.1 print every line

# print is the default action
awk '{print}'
awk '{print $0}'  # $0 is the whole line
awk '1'               

see condition-action pairs. for what the '1' above accomplishes

12.2 print the first field of every line

awk '{print $1}'    # don't quote $1, or awk prints the literal string "$1"

12.3 print only lines which match regular expression (emulates "grep")

awk '/regex/'
awk '/^# /'   # prints lines that start with a hash followed by a space.

12.4 print only lines which do NOT match regex (emulates "grep -v")

awk '!/regex/'

12.5 print every line that contains zintis

awk '/zintis/ {print}'

12.6 on every line that contains zintis, print the 1st and 2nd field

awk '/zintis/ {print $1, $2}'

with a space between them (comma). Lines that do not match 'zintis' do not get printed at all.

12.7 on every line, remove all characters up to (and including) "Discovered"

awk '{gsub(/^.*Discovered/, ""); print $0}'

12.8 print the first 2 fields, in opposite order, of every line

awk '{print $2, $1}' file

12.9 print every line, deleting the second field of that line

awk '{ $2 = ""; print }'

12.10 change all occurances of "kg" to "lbs"

awk '{gsub(/kg/, "lbs");print}'

12.11 search for ip addresses in file "junk"

awk '/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/' junk

13 awk beginner one-liners

13.1 change "scarlet" or "ruby" or "puce" to "red"

awk '{gsub(/scarlet|ruby|puce/, "red"); print}'

13.2 substitute "foo" with "bar" EXCEPT for lines which contain "baz"

awk '!/baz/{gsub(/foo/, "bar")};{print}'

13.3 substitute "foo" with "bar" ONLY for lines which contain "baz"

awk '/baz/{gsub(/foo/, "bar")};{print}'

13.4 substitute (find and replace) "foo" with "bar" on each line

awk '{sub(/foo/,"bar");print}'             # replaces only 1st instance
gawk '{$0=gensub(/foo/,"bar",4);print}'    # replaces only 4th instance
awk '{gsub(/foo/,"bar");print}'            # replaces ALL instances in a line

13.5 print the last line of a file (emulates "tail -1")

awk 'END{print}'

13.6 print every line, then add one more extra line at the end

awk '{print;} END {print "Extra line";}' junk.org

13.7 print the number of fields in each line, followed by the line

awk '{ print NF ":" $0 } '

14 awk intermediate one-liners

14.1 print in reverse order the fields of every line

remember that everything within the {} is an action(s)

awk '{for (i=NF; i>0; i--) printf("%s ",$i);printf ("\n")}' file

14.2 remove duplicate, consecutive lines (emulates "uniq")

awk 'a !~ $0; {a=$0}'

This works as follows: there are two condition/action pairs. The first is the bare condition a !~ $0, which is true when a (the saved previous line) does not match the current line $0, i.e. a fresh line; there is no action, so the implied action, print, runs (point 2 below).

The second pair, {a=$0}, has no condition, so its action runs on every line and saves the current line in a. When we hit a duplicate line the first condition is NOT met, so the line is NOT printed.

Note that !~ is a regex match rather than a string comparison, so awk '$0 != a; {a=$0}' is a stricter emulation of "uniq".

Useful to review:

  1. A condition is considered false if it evaluates to zero or the empty string, anything else is true (uninitialized variables are zero or empty string, depending on context, so they are false).
  2. Either a condition or an action can be implied, true & print; respectively
  3. Braces without a condition are considered to have a true condition and are always executed if they are hit
  4. Any condition without an action will print the line if and only if the condition is met i.e. NOT ZERO OR NULL
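
A tiny demonstration of points 3 and 4 (input invented for illustration):

```shell
# point 4: a bare condition with no action prints the line when true
printf 'one\ntwo\nthree\n' | awk 'length($0) > 3'
# prints: three

# point 3: braces with no condition run on every line
printf 'one\ntwo\n' | awk '{print NR, $0}'
# prints: 1 one
#         2 two
```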

14.3 switch the first 2 fields of every line

awk '{temp = $1; $1 = $2; $2 = temp; print}' file

14.4 IN UNIX ENVIRONMENT: convert Unix newlines (LF) to DOS format

awk '{sub(/$/,"\r");print}'

14.5 count number of rows (NR) a.k.a. lines (emulates "wc -l")

awk 'END{print NR}'

14.6 print the number of fields (NF) in each line, followed by the line

awk '{ print NF ":" $0 } '

14.7 print the last field of each line

awk '{ print $NF }'

14.8 print the last field of the last line

awk '{ field = $NF }; END{ print field }'

14.9 print every line with more than 4 fields

awk 'NF > 4'

14.10 print every line where the value of the last field is > 4

awk '$NF > 4'

HANDY ONE-LINERS FOR AWK 22 July 2003 compiled by Eric Pement <pemente@northpark.edu> version 0.22 Latest version of this file is usually at: http://www.student.northpark.edu/pemente/awk/awk1line.txt

USAGE:

awk 'pattern {print $1}'     # standard Unix shells

FILE SPACING:

14.11 double space a file

awk '1;{print ""}'
awk 'BEGIN{ORS="\n\n"};1'

These both accomplish the same thing.

14.12 double space a file which already has blank lines in it.

Output file should contain no more than one blank line between lines of text. NOTE: On Unix systems, DOS lines which have only CRLF (\r\n) are often treated as non-blank, and thus 'NF' alone will return TRUE.

awk 'NF{print $0 "\n"}'

Lines that are already blank, will have NF = 0 which evaluates as false. So if the condition is false, the action will NOT be done, so not printing the blank line with another \n . Lines that are NOT blank evaluate NF as 1 so true, and are printed with an extra \n at the end, i.e. double spaced.

14.13 triple space a file

awk '1;{print "\n"}'

15 Some more past examples

touch `awk '{print $1"-confg" }' /home/Netman1/zintis/expect/c.pwd `

awk -F: '{printf "%s\t%s\t%s\t%s\t%s\t%s\n", $1, $2, $3, $4, $5, $6}' newsspool


To print only lines from snp.txt where the second column differs from the previous line's second column (i.e. skipping consecutive duplicates), try:

awk 'a !~ $2; {a=$2}' snp.txt

Print lines where column 2 < 10

awk -F, '$2<10' file

conversely print lines where column 2 is greater than or equal to 10

awk -F, '!($2<10)' file

The following awk script prints only duplicated lines in a file, and these lines only once. The line 'deux' comes three times, but will only be output once.

awk 'seen[$0]++ == 1' <filename>

Sample input file:

eins
deux
drei
quattro
deux
sechs
drei
acht
huit
neuf
six
deux

The output:

deux
drei

The similar one-liner

awk '!x[$0]++'

removes the duplicate lines, as explained below:

  • x is an array whose elements are initialized to 0.
  • the index of x is $0, the whole line.
  • the first time a given $0 is seen, x[$0] is 0.
  • because ++ here is the postfix operator, the old value (0) is returned first and the increment happens afterwards.
  • so !x[$0] is true, and $0 is printed by default.
  • if $0 has appeared before, x[$0] is non-zero, so !x[$0] is false and $0 won't print.

In this awk script sorting is not necessary. All it does is create an (associative) array element with the entire line as the index, without a value (or 0 if you will).

The exclamation mark negates that value so the outcome is 1 (true). The value of 1 in awk means perform the default action which is {print $0} so the entire line gets printed.

Afterwards the ++ comes into action and 1 is added to the array value, which now becomes 1. So the next time the same line is encountered the value returned by the array is 1 which is then negated to 0 by the exclamation mark, so nothing will get printed.

16 awk to split file into individual words

echo "The fox jumped over the dog" | awk '{gsub(/ /,"\n"); print}'

will produce this output:

The
fox
jumped
over
the
dog

So, breaking up a file into individual words, one per line would be:

awk '{gsub(/ /,"\n"); print}' filewithwords

Of course you could also just do it with echo.

17 awk to add a line after every line that matches character

Say you want to add a line before every line that starts with #+AUTHOR

awk '/^\#\+AUTHOR/ {print "this is an extra line"}; {print};'     *.org

The first statement in between {} prints the extra line. The second statement in between the {} prints the line from the file. The second statement runs every time. The first statement only runs if the line matches #+AUTHOR

18 Editing in place with awk

It turns out that any awk command takes stdin and outputs to stdout, but the original file remains unchanged. To make editing changes to the original file, you have to use a temporary file and bash to replace the original file with awk's output. Be careful here, as you will clobber the original file every time!

        $ awk '{print $0}' file > tmp && mv tmp file
        # another real-world example

    for orgfile in *.org
           do
               cp ${orgfile} ${orgfile}.bak
               awk '/^\#\+AUTHOR/{print "#+HTML_HEAD_EXTRA:<style>body {background-color: #222; color: lightgreen; } h3, h4 {color: teal} h1, h2 {color: #08f}  pre { background-color: #111; color: #dcc; } code { background-color: #444; color: #ff0; } </style>"};{print};' ${orgfile} > tmp && mv tmp ${orgfile};
           done

    for codef in  *.org
           do
               awk '{gsub(/#+HTML_HEAD_EXTRA:<style>body {background-color: #222; color: lightgreen; } h3, h4 {color: teal} h1, h2 {color: #08f}  pre { background-color: #111; color: #dcc; } code { background-color: #444; color: #ff0; } </style>/,"#+HTML_HEAD_EXTRA:<style>body {background-color:\
                      #222; color: lightgreen; } h3, h4 {color: teal} h1, h2 \
                     {color: #08f}  pre { background-color: #111; color: #dcc; } \
                     code { background-color: #444; color: #ff0; } </style>"); print}' \
                     ${codef} > tmp && mv tmp ${codef};
           done


So I would rather make a backup first:

for orgfile in *.org
do
    cp ${orgfile} ${orgfile}.bak
    awk '/^\#\+AUTHOR/{print "this is an extra line"};{print};' ${orgfile} > tmp && mv tmp ${orgfile}
done


18.1 moving backup files back to their original names:

for backfile in *.bak; do
    mv ${backfile} $(basename ${backfile} .bak)
done


19 awk cheat from 1996

  • tail +1 phones.crd | awk -F"\0" '{print}' >! junk

- awk [-Fc] '/pattern/ {action}' var=value file(s) - awk [-Fc] -f scriptfile var=value file(s)

pattern is a string surrounded by two / or preceded by ! for negation. pattern is optional and when omitted, matches every line.

awk '{print $1}'  file(s)
awk '/search string/'  file(s)

19.1 awk like grep

The following line is the same as: grep 'search string' file(s)

awk '/search string/ {print}'  file(s)

And since the default action is {print} you could simplify that further with

awk '/search string/' files

19.2 awk like cat

Since the default action is {print}, and if you omit the search string, you might think that awk file is the same as cat file, however awk is still expecting something in the single tics, ' ' as the second parameter, so you must include the '{print}' here. So cat file is the same as:

awk '{print}' file(s)

19.3 selective fields

You can output only selected fields if you wish, with $1 being the first field. And $NF being the last field on a line.

So printing the first, and fifth fields of every line:

awk '{print $1, " is first field and 5th field is ", $5}'  file(s)

And to print the above on only lines that contain search string you can

awk '/search string/ {print $1, " is first field and 5th field is ", $5}'

19.3.1 Formatted Strings

To format the output a bit, you use %d for decimal digits, %s for strings like this:

awk '/string/ {printf "%d\t%d\t%s\n", $1, $2, $3}' file(s)

Or dealing with comma delimited fields into some table:

awk -F, '{printf "%d\t%d\t%s\n", $1, $2, $3 $4}' file(s)

comma delimited fields, where

  • first field is a decimal digit
  • second field is a decimal digit
  • third field is a string

print comma delimited fields on separate lines

awk -F, '{print $1; print $2; print $3; print $4}' file(s)

print everything between 4th and 5th comma

awk -F, '{print $4}' file

print the fourth field from "last" command output that has "system boot" and pulls out just the unique values (i.e. skips all repeated lines) good to show all kernel versions ever run on your system.

  • last | grep "system boot" | awk '{print $4}' | uniq

print everything between 4th and 5th tab

awk -F"\t" '{print $4}' file

print all lines where the 5th field does NOT contain "string" in it.

awk '$5 !~ /string/ {print}' file       # not a typo: !~ means "does not match"

print all records except first two (NR is number of records, a.k.a. lines)

awk '{if (NR > 2) print}' file

print all records that have more than two fields (NF is number of fields)

awk '{if (NF > 2) print}' file

touch `awk '{print $1"-confg" }' /home/Netman1/zintis/expect/c.pwd `
awk -F: '{printf "%s\t%s\t%s\t%s\t%s\n", $1, $2, $3, $4, $5}' newsspool -calc | psf -w -H " inode Calculations for newsspools" | lpr

That's all I had on awk back in 1997

20 cut on newer Linuxes

cut is like an awk light.

  • man cut
  • cut -f 3 file.txt
  • cut -d " " -f 3 file.txt
  • cut -f 3 file.txt | cut -c 2
  • cut -c 15-20 file.txt
  • cut -d ',' -f 3 file.txt
  • cut -d ':' -f 3 file.txt

21 Building up an awk script (an example)

Let's say you have a file of temperature data in both Celsius and Fahrenheit scales. You want to convert all of the data to Celsius.

Here is the data file, called temperature.data (first entries):

Temperature  Scale
39 F
102 F
45 F
6 C
9 C
2 C

Print just the second field:

awk '{print $2}'  temperature.data

Output would be:

Scale
F
F
F
C
C
C

Print just the whole line if the second field was "F":

awk '$2 == "F" {print  $0}'  temperature.data

Gives you:

39 F
102 F
45 F

Now convert field 1 to Celsius whenever field 2 is "F", using a ternary expression:

awk '{print ($2 == "F" ? ($1 - 32)/1.8 : $1) " Celsius"}'  temperature.data

Gives you:

Temperature Celsius
3.88889 Celsius
38.8889 Celsius
7.22222 Celsius
6 Celsius
9 Celsius
2 Celsius

Breaking this down:

  • will print every line as there is no pattern to match, just a print action
  • within the print action there is a condition $2 == "F"
  • If the condition is met, i.e. the second field is an "F", then print ($1 - 32) divided by 1.8, else just straight $1 (the part after the :)
  • print "Celsius"

But that messes up the first record, which is the heading "Temperature  Scale", turning it into "Temperature Celsius". Obviously not what we intended. So we want to keep it as is, i.e. don't change the record when the record number is 1, NR == 1.

First we can simply put the condition that the record number be greater than 1:

awk 'NR>1{print ($2 == "F" ? ($1 - 32)/1.8 : $1) " Celsius"}'  temperature.data

Results in:

3.88889 Celsius
38.8889 Celsius
7.22222 Celsius
6 Celsius
9 Celsius
2 Celsius

Which simply removes the heading line altogether. So, a better approach is to let the default awk action (print the whole line) execute when NR==1, and then for all lines NR>1 do the math conversion for lines where $2 is "F":

awk 'NR==1; NR>1{print ($2 == "F" ? ($1 - 32)/1.8 : $1) " Celsius"}'  temperature.data

Results in:

Temperature  Scale
3.88889 Celsius
38.8889 Celsius
7.22222 Celsius
6 Celsius
9 Celsius
2 Celsius

To make it completely obvious, you can print only the first line like this:

awk 'NR==1'  temperature.data

Gives you:

Temperature  Scale

And see what happens when a line is matched by multiple conditions, separated by ";":

awk 'NR==1;NR<4'  temperature.data

Gives you:

Temperature  Scale
Temperature  Scale
39 F
102 F

So you can see that for the first record, NR==1 is met, so the line is printed; but the second condition, NR<4, is also met, so the line is printed again. For subsequent lines, only NR<4 is met, so lines 2 and 3 are printed once each.

Adding a string to the end of each line:

awk '{ gsub(/$/, "  EOF");print}' temperature.data

Results in:

Temperature  Scale  EOF
39 F  EOF
102 F  EOF
45 F  EOF
6 C  EOF
9 C  EOF
2 C  EOF   

Combining with the temperature conversion:

awk 'gsub(/$/, "  EOF");NR==1;NR>1{print ($2 == "F" ? ($1 - 32)/1.8 : $1) " Celsius"}'  temperature.data

Gives us not quite what we wanted, but illustrates how awk runs multiple actions on each line:

Temperature  Scale  EOF
Temperature  Scale  EOF
39 F  EOF
3.88889 Celsius
102 F  EOF
38.8889 Celsius
45 F  EOF
7.22222 Celsius
6 C  EOF
6 Celsius
9 C  EOF
9 Celsius
2 C  EOF
2 Celsius

A better approach here would be just to add it to the end of "Celsius"

awk 'NR==1;NR>1{print ($2 == "F" ? ($1 - 32)/1.8 : $1) " Celsius  EOF"}'  temperature.data
Temperature  Scale
3.88889 Celsius  EOF
38.8889 Celsius  EOF
7.22222 Celsius  EOF
6 Celsius  EOF
9 Celsius  EOF
2 Celsius  EOF

21.0.1 Using awk formatted print

The printf statement looks like: printf format, item1, item2, ... The format definition is a single string that includes format definitions for each of the items that follow, and any other fixed strings you want to add

From the example before that gave us lines like 38.8889 Celsius

# no format strings here
awk 'NR==1; NR>1{print ($2 == "F" ? ($1 - 32)/1.8 : $1) " Celsius"}'  temperature.data

We can try printf("%.1f %s\n",...) where in this case we have a format string and two items to print, separated by commas, these being:

  1. ($2=="F" ? ($1-32) / 1.8 : $1)
  2. "Celsius"
# format string is "%.1f %c\n"
awk 'NR==1; NR>1{printf("%.1f %s\n",($2=="F" ? ($1-32) / 1.8 : $1),"Celsius")}'  temperature.data
# format string is "%.1f degrees %c\n"                                                                
awk 'NR==1; NR>1{printf("%.1f degrees %c\n",($2=="F" ? ($1-32) / 1.8 : $1),"Celsius")}'  temperature.data
# format string is "%.1f degrees Celsius\n"                                                                   
awk 'NR==1; NR>1{printf("%.1f degrees Celsius\n",($2=="F" ? ($1-32) / 1.8 : $1))}'  temperature.data

Breaking down the first example better printf format, item1, item2 where in our case:

  • format is "%.1f %s\n"
  • item1 is either ($1-32) / 1.8 or $1 according to if $2 is an "F"
  • item2 is "Celsius"

The first two variants produce:

    Temperature  Scale
     3.9 Celsius
     38.9 Celsius
     7.2 Celsius
     6.0 Celsius

    Temperature  Scale
    3.9 degrees C
    38.9 degrees C
    7.2 degrees C
    6.0 degrees C
    9.0 degrees C

Notice that you are not seeing Celsius but C. That is because only one character from the string is allowed, with %c. %s would give you the whole string

# format string is "%.1f %s\n"
awk 'NR==1; NR>1{printf("%.1f %s\n",($2=="F" ? ($1-32) / 1.8 : $1),"Celsius")}'  temperature.data
Temperature  Scale
3.9 Celsius
38.9 Celsius
7.2 Celsius
6.0 Celsius
9.0 Celsius
2.0 Celsius

21.0.2 Extra strings in the format definition

You can include any strings you want right in the format portion. For example we can add the string " degrees" in the format statement as follows:

As shown in an example above, a format string could be:

# format string is  "%.1f degrees %s\n"
awk 'NR==1; NR>1{printf("%.1f degrees %s\n",($2=="F" ? ($1-32) / 1.8 : $1),"Celsius")}'  temperature.data

22 awk arrays

22.1 Features of arrays

In awk, arrays are one-dimensional and store groups of related strings or numbers. There is no need to declare the size of the array ahead of time.

awk arrays are associative, meaning that the index need NOT be a number but can be any string. In that sense they are more like python dictionaries, where each entry is a pair: an index and the corresponding array element value.

The awk manual uses this array example:

Element "dog" Value "chien"
Element "cat" Value "chat"
Element "one" Value "un"
Element "1" Value "un"
Element "not yet" Value "pas encore"

You can recall an array element using square brackets on the index, for example french["not yet"] would return "pas encore".

A reference to an array element that does not exist automatically creates it.

22.1.1 check if element exists

Remember that awk automatically creates any element you reference, so to check whether an element exists without creating it, use this syntax: indx in array. For example:

"dog" in french
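
Putting the array pieces together in one runnable sketch (using the french example from the manual):

```shell
# build an associative array, look up an element, and test membership with "in"
awk 'BEGIN {
    french["dog"] = "chien"
    french["not yet"] = "pas encore"
    print french["not yet"]                  # pas encore
    if ("dog" in french) print "dog is there"
    if (!("cat" in french)) print "no cat"   # "in" does NOT create the element
}'
```

Note that the `in` test, unlike a plain reference such as french["cat"], does not create the element it checks for.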

22.2 arrays in the awk manual

I have barely scratched the surface here. Best to read gawk manual #arrays