
6.13. Variables

6.13.1 Numeric and String Constants

Numeric constants can be represented as integers, such as 243; floating-point numbers, such as 3.14; or numbers using scientific notation, such as .723E-1 or 3.4e7. Strings, such as "Hello world", are enclosed in double quotes.

Initialization and Type Coercion

Just mentioning a variable in your awk program causes it to exist. A variable can be a string, a number, or both. When it is set, it becomes the type of the expression on the right-hand side of the equal sign.

Uninitialized variables have the value zero or the null string (""), depending on the context in which they are used.


name = "Nancy"     name is a string
x++                x is a number; x is initialized to zero and incremented by 1
number = 35        number is a number


To coerce a string to be a number:


name + 0


To coerce a number to be a string:


number ""


All fields and array elements created by the split function are considered strings, unless they contain only a numeric value. If a field or array element is null, it has the string value of null. An empty line is also considered to be a null string.
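The coercion rules above can be tried out in a BEGIN block, which needs no input file. This is a minimal sketch that assumes a POSIX-compatible awk is installed as awk (used here in place of nawk); the variable names s, n, and unset are made up for the illustration.

```shell
awk 'BEGIN {
    s = "35abc"                   # hypothetical string with a numeric prefix
    n = 35
    print s + 0                   # arithmetic context: the leading digits are used (35)
    print length(n "")            # concatenation yields the two-character string "35"
    print ((n "") < "4")          # string comparison: "35" sorts before "4", so 1
    print (n < 4)                 # numeric comparison: 35 is not less than 4, so 0
    print unset + 0, "[" unset "]"    # an unset variable acts as 0 and as ""
}'
```

The last line prints 0 [], showing that the same uninitialized variable is zero in an arithmetic context and the null string in a string context.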

6.13.2 User-Defined Variables

User-defined variable names consist of letters, digits, and underscores, and cannot begin with a digit. Variables in awk are not declared. Awk infers data type by the context of the variable in the expression. If the variable is not initialized, awk initializes string variables to null and numeric variables to zero. If necessary, awk will convert a string variable to a numeric variable, and vice versa. Variables are assigned values with awk's assignment operators. See Table 6.11.

Table 6.11. Assignment Operators

Operator    Meaning        Equivalence
=           a = 5          a = 5
+=          a = a + 5      a += 5
-=          a = a - 5      a -= 5
*=          a = a * 5      a *= 5
/=          a = a / 5      a /= 5
%=          a = a % 5      a %= 5
^=          a = a ^ 5      a ^= 5
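The assignment operators can be exercised one after another in a BEGIN block. The sketch below assumes a POSIX-compatible awk installed as awk; the variable out is only a helper for collecting the intermediate values of a.

```shell
awk 'BEGIN {
    a = 5;   out = a              # plain assignment: 5
    a += 5;  out = out " " a      # 10
    a -= 4;  out = out " " a      # 6
    a *= 3;  out = out " " a      # 18
    a /= 2;  out = out " " a      # 9
    a %= 5;  out = out " " a      # 4
    a ^= 2;  out = out " " a      # 16
    print out
}'
```

The command prints 5 10 6 18 9 4 16, one value for each operator in the table.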


The simplest assignment takes the result of an expression and assigns it to a variable.

FORMAT


variable = expression


Example 6.86.

% nawk '$1 ~  /Tom/ {wage = $2 * $3; print wage}'  filename


EXPLANATION

Awk will scan the first field for Tom and when there is a match, it will multiply the value of the second field by the value of the third field and assign the result to the user-defined variable wage. Because multiplication is an arithmetic operation, awk treats the two fields as numbers and stores a numeric result in wage. (The % is the shell prompt and filename is an input file.)
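To watch the assignment at work without creating a file, the same program can read a few lines from a pipe. The three input records here (a name, an hourly rate, and hours worked) are invented for the sketch, and awk stands in for nawk.

```shell
# Hypothetical records: name, hourly rate, hours worked
printf 'Tom 10 40\nSue 12 38\nTom 11 20\n' |
awk '$1 ~ /Tom/ { wage = $2 * $3; print wage }'
```

Only the two Tom records match, so the output is 400 followed by 220.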

Increment and Decrement Operators

To add one to an operand, the increment operator is used. The expression x++ is equivalent to x = x + 1. Similarly, the decrement operator subtracts one from its operand. The expression x-- is equivalent to x = x - 1. This notation is useful in looping operations when you simply want to increment or decrement a counter. You can place the increment and decrement operators either before the operand, as in ++x, or after the operand, as in x++. If these expressions are used in assignment statements, their placement will make a difference in the result of the operation.


{x = 1;  y = x++ ; print x, y}


The ++ here is called a post-increment operator; y is assigned the value of 1, and then x is increased by 1, so that when all is said and done, y will equal 1, and x will equal 2.


{x = 1; y = ++x;  print x, y}


The ++ here is called a pre-increment operator; x is incremented first, and the value of two is assigned to y, so that when this statement is finished, y will equal 2, and x will equal 2.
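Both forms can be run side by side in a BEGIN block. This sketch assumes a POSIX-compatible awk installed as awk, and adds a post-decrement line that is not in the text above for comparison.

```shell
awk 'BEGIN {
    x = 1; y = x++; print x, y    # post-increment: y gets the old value
    x = 1; y = ++x; print x, y    # pre-increment: y gets the new value
    x = 5; y = x--; print x, y    # post-decrement: y gets the old value
}'
```

The three lines of output are 2 1, 2 2, and 4 5.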

User-Defined Variables at the Command Line

A variable can be assigned a value at the command line and passed into an awk script. For more on processing arguments and ARGV, see "Processing Command Arguments (nawk)" on page 239.

Example 6.87.

nawk -F: -f awkscript month=4 year=2004 filename


EXPLANATION

The user-defined variables month and year are assigned the values 4 and 2004, respectively. In the awk script, these variables may be used as though they were created in the script. Note: Assignments made this way take effect only when awk reaches them in the argument list, so the variables are not available in the BEGIN block, and if filename precedes the assignments, they are not set while that file is being read. (See "BEGIN Patterns" on page 208.)
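The timing of command-line assignments can be checked with a short experiment. In this sketch a lone dash names standard input as the input file, the single input line is a placeholder, and awk stands in for nawk; the square brackets are only there to make an empty value visible.

```shell
# The assignments come before the input operand (-), so they are set
# by the time the first record is read, but not yet in BEGIN.
printf 'x\n' |
awk 'BEGIN { print "BEGIN: [" month "]" }
           { print "main: [" month "] [" year "]" }' month=4 year=2004 -
```

The BEGIN line prints an empty pair of brackets, while the main action prints [4] and [2004].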

The -v Option (nawk)

The -v option provided by nawk assigns a value to a variable before the program starts, so the variable is available within a BEGIN statement. For each variable assigned at the command line, there must be a -v option preceding it.
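A one-line check shows the effect. The sketch assumes an awk that supports the -v option (any POSIX-compatible awk does) and reuses the month and year values from Example 6.87.

```shell
# Each variable needs its own -v; both are already set when BEGIN runs.
awk -v month=4 -v year=2004 'BEGIN { print month "/" year }'
```

The command prints 4/2004 without reading any input.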

Field Variables

Field variables can be used like user-defined variables, except they reference fields. New fields can be created by assignment. A field value that is referenced and has no value will be assigned the null string. If a field value is changed, the $0 variable is recomputed using the current value of OFS as a field separator. The number of fields allowed is usually limited to 100.

Example 6.88.

% nawk ' { $5 = 1000 * $3 / $2;  print } '  filename


EXPLANATION

If $5 does not exist, nawk will create it and assign the result of the expression 1000 * $3 / $2 to the fifth field ($5). If the fifth field exists, the result will be assigned to it, overwriting what is there.
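The creation of a new field can be observed on a single made-up record with only three fields. In this sketch awk stands in for nawk, and the record contents are hypothetical.

```shell
# The record has three fields; assigning to $5 creates $4 (empty) and $5.
printf 'widget 4 2\n' |
awk '{ $5 = 1000 * $3 / $2; print; print NF }'
```

The first output line is widget 4 2  500, with two spaces where the empty fourth field sits, and the second line shows that NF is now 5.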

Example 6.89.

% nawk ' $4 == "CA" { $4  = "California"; print}'  filename


EXPLANATION

If the fourth field ($4) is equal to the string CA, nawk will reassign the fourth field to California. The double quotes are essential. Without them, the strings become user-defined variables with an initial value of null.
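Changing a field also rebuilds $0 with OFS, which is easiest to see when the input separator is not a space. The colon-separated record below is invented for the sketch, and awk stands in for nawk.

```shell
# Default OFS is a space, so the colons disappear once $4 is reassigned.
printf 'Sam:42:Fresno:CA\n' |
awk -F: '$4 == "CA" { $4 = "California"; print }'

# Setting OFS to a colon keeps the original look of the record.
printf 'Sam:42:Fresno:CA\n' |
awk -F: -v OFS=: '$4 == "CA" { $4 = "California"; print }'
```

The first command prints Sam 42 Fresno California and the second prints Sam:42:Fresno:California.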

Built-In Variables

Built-in variables have uppercase names. They can be used in expressions and can be reset. See Table 6.12 for a list of built-in variables.

Example 6.90.

(The Employees Database)
% cat employees2
Tom Jones:4423:5/12/66:543354
Mary Adams:5346:11/4/63:28765
Sally Chang:1654:7/22/54:650000
Mary Black:1683:9/23/44:336500

(The Command Line)
% nawk -F: '$1 == "Mary Adams"{print NR, $1, $2, $NF}' employees2

(The Output)
2 Mary Adams 5346 28765


Table 6.12. Built-In Variables

Variable Name   Contents
ARGC            Number of command-line arguments
ARGIND          Index in ARGV of the current file being processed from the command line (gawk only)
ARGV            Array of command-line arguments
CONVFMT         Conversion format for numbers, %.6g by default (gawk and POSIX awk)
ENVIRON         An array containing the values of the current environment variables passed in from the shell
ERRNO           Contains a string describing a system error occurring from redirection when reading from the getline function or when using the close function (gawk only)
FIELDWIDTHS     A whitespace-separated list of field widths used instead of FS when splitting records of fixed field width (gawk only)
FILENAME        Name of current input file
FNR             Record number in current file
FS              The input field separator, by default a space
IGNORECASE      Turns off case sensitivity in regular expressions and string operations (gawk only)
NF              Number of fields in current record
NR              Number of records so far
OFMT            Output format for numbers
OFS             Output field separator
ORS             Output record separator
RLENGTH         Length of string matched by match function
RS              Input record separator
RSTART          Offset of string matched by match function
RT              The record terminator; gawk sets it to the input text that matched the character or regex specified by RS
SUBSEP          Subscript separator


EXPLANATION (Example 6.90)

The -F option sets the field separator to a colon. The print function prints the record number, the first field, the second field, and the last field ($NF).
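A few of the built-in variables can be inspected without the employees2 file by piping two of its records into awk. This is a sketch that uses awk in place of nawk; the second command calls match on a literal string chosen only to show how RSTART and RLENGTH are set.

```shell
# NR is the record number, NF the field count, $NF the last field.
printf 'Tom Jones:4423:5/12/66:543354\nMary Adams:5346:11/4/63:28765\n' |
awk -F: '{ print NR, NF, $NF }'

# match sets RSTART (1-based position) and RLENGTH (length of the match).
awk 'BEGIN { if (match("Mary Adams", /Ad+/)) print RSTART, RLENGTH }'
```

The first command prints 1 4 543354 and 2 4 28765; the second prints 6 2, because Ad begins at the sixth character and is two characters long.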

Example 6.91.

(The Employees Database)
% cat employees2
Tom Jones:4423:5/12/66:543354
Mary Adams:5346:11/4/63:28765
Sally Chang:1654:7/22/54:650000
Mary Black:1683:9/23/44:336500

(The Command Line)
% gawk -F: '{IGNORECASE=1}; \
  $1 == "mary adams"{print NR, $1, $2,$NF}' employees2

(The Output)
2 Mary Adams 5346 28765


EXPLANATION

The -F option sets the field separator to a colon. The gawk built-in variable, IGNORECASE, when set to a nonzero value, makes string comparisons and regular expression matching case-insensitive. The string mary adams will be matched, even though in the input file her name is spelled Mary Adams. The print function prints the record number, the first field, the second field, and the last field ($NF).
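IGNORECASE is specific to gawk. Where gawk is not available, a similar result can be had by folding the field to lowercase with the tolower function before comparing. This is a different technique from the one in Example 6.91, sketched here on two records from employees2 with awk standing in for nawk.

```shell
# Portable alternative to IGNORECASE: compare a lowercased copy of $1.
printf 'Tom Jones:4423:5/12/66:543354\nMary Adams:5346:11/4/63:28765\n' |
awk -F: 'tolower($1) == "mary adams" { print NR, $1, $2, $NF }'
```

The field itself is left unchanged, so the output still shows the original spelling: 2 Mary Adams 5346 28765.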

6.13.3 BEGIN Patterns

The BEGIN pattern is followed by an action block that is executed before awk processes any lines from the input file. In fact, a BEGIN block can be tested without any input file, because awk does not start reading input until the BEGIN action block has completed. The BEGIN action is often used to change the value of the built-in variables, OFS, RS, FS, and so forth, to assign initial values to user-defined variables and to print headers or titles as part of the output.

Example 6.92.

% nawk 'BEGIN{FS=":"; OFS="\t"; ORS="\n\n"}{print $1,$2,$3}' file


EXPLANATION

Before the input file is processed, the field separator (FS) is set to a colon, the output field separator (OFS) to a tab, and the output record separator (ORS) to two newlines. If there are two or more statements in the action block, they should be separated with semicolons or placed on separate lines (use a backslash to escape the newline character if at the shell prompt).
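The effect of the three separators is easier to see with real input. The sketch below feeds two shortened employees2 records through a pipe, with awk standing in for nawk, and places the BEGIN statements on separate lines.

```shell
printf 'Tom Jones:4423:5/12/66\nMary Adams:5346:11/4/63\n' |
awk 'BEGIN {
         FS = ":"          # split input fields on colons
         OFS = "\t"        # join printed fields with a tab
         ORS = "\n\n"      # end each printed record with a blank line
     }
     { print $1, $2, $3 }'
```

Each record is printed with tabs between the three fields and is followed by an empty line.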

Example 6.93.

% nawk 'BEGIN{print "MAKE YEAR"}'

MAKE YEAR


EXPLANATION

Awk will display MAKE YEAR. The print function is executed before awk opens any input file, so the output appears even though no input file has been named. When debugging awk scripts, you can test the BEGIN block actions before writing the rest of the program.

6.13.4 END Patterns

END patterns do not match any input lines, but execute any actions that are associated with the END pattern. END patterns are handled after all lines of input have been processed.

Example 6.94.

% nawk 'END{print "The number of records is " NR }' filename

The number of records is 4


EXPLANATION

The END block is executed after awk has finished processing the file. The value of NR is the number of the last record read.
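The same END block can be checked against input of a known size. Here four placeholder lines are piped in, and awk stands in for nawk.

```shell
# Four input records, so NR is 4 when the END block runs.
printf 'a\nb\nc\nd\n' |
awk 'END { print "The number of records is " NR }'
```

The output is The number of records is 4.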

Example 6.95.

% nawk '/Mary/{count++}END{print "Mary was found " count " times."}' employees

Mary was found 2 times.


EXPLANATION

For every input line from the file employees containing the pattern Mary, the user-defined variable, count, is incremented by 1. When all input lines have been read, the END block is executed to display the string Mary was found 2 times containing the final value of count.
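If no line contains Mary, count is never set and prints as the null string rather than as zero. One way to guard against that, sketched here with invented input and with awk standing in for nawk, is to add 0 to the variable so that it is used in a numeric context.

```shell
# No record matches, so count is uninitialized; count + 0 forces it to 0.
printf 'Tom Jones\nSally Chang\n' |
awk '/Mary/ { count++ }
     END { print "Mary was found " (count + 0) " times." }'
```

The command prints Mary was found 0 times. Without the + 0, the number would be missing from the sentence.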
