6.13. Variables

6.13.1 Numeric and String Constants

Numeric constants can be represented as integers, such as 243; floating-point numbers, such as 3.14; or numbers using scientific notation, such as .723E-1 or 3.4e7. Strings, such as "Hello world", are enclosed in double quotes.

Initialization and Type Coercion

Just mentioning a variable in your awk program causes it to exist. A variable can be a string, a number, or both. When it is set, it takes on the type of the expression on the right-hand side of the equal sign. Uninitialized variables have the value zero or the value "" (the null string), depending on the context in which they are used.

name = "Nancy"     name is a string
x++                x is a number; x is initialized to zero and incremented by 1
number = 35        number is a number

To coerce a string to be a number:

name + 0

To coerce a number to be a string:

number ""

All fields and array elements created by the split function are considered strings, unless they contain only a numeric value. If a field or array element is null, it has the string value of null. An empty line is also considered to be a null string.

6.13.2 User-Defined Variables

User-defined variables consist of letters, digits, and underscores, and cannot begin with a digit. Variables in awk are not declared; awk infers the data type from the context in which the variable is used in an expression. If a variable is not initialized, awk treats it as null when it is used as a string and as zero when it is used as a number. If necessary, awk will convert a string variable to a numeric variable, and vice versa. Variables are assigned values with awk's assignment operators. See Table 6.11.
The simplest assignment takes the result of an expression and assigns it to a variable.

FORMAT

variable = expression

Example 6.86.
% nawk '$1 ~ /Tom/ {wage = $2 * $3; print wage}' filename
EXPLANATION

Awk scans the first field for Tom and, when there is a match, multiplies the value of the second field by the value of the third field and assigns the result to the user-defined variable wage. Because the expression is arithmetic, wage is a numeric variable. (The % is the shell prompt and filename is an input file.)

Increment and Decrement Operators

To add one to an operand, the increment operator is used. The expression x++ is equivalent to x = x + 1. Similarly, the decrement operator subtracts one from its operand. The expression x-- is equivalent to x = x - 1. This notation is useful in looping operations when you simply want to increment or decrement a counter. You can place the increment and decrement operators either before the operand, as in ++x, or after the operand, as in x++. If these expressions are used in assignment statements, their placement makes a difference in the result of the operation.

{x = 1; y = x++ ; print x, y}

The ++ here is called a post-increment operator; y is assigned the value of 1, and then x is increased by 1, so that when all is said and done, y will equal 1, and x will equal 2.

{x = 1; y = ++x; print x, y}

The ++ here is called a pre-increment operator; x is incremented first, and the value of 2 is assigned to y, so that when this statement is finished, y will equal 2, and x will equal 2.

User-Defined Variables at the Command Line

A variable can be assigned a value at the command line and passed into an awk script. For more on processing arguments and ARGV, see "Processing Command Arguments (nawk)" on page 239.

Example 6.87.
nawk -F: -f awkscript month=4 year=2004 filename
EXPLANATION

The user-defined variables month and year are assigned the values 4 and 2004, respectively. In the awk script, these variables may be used as though they were created in the script. Note: Variables assigned this way are not available in the BEGIN block, because awk performs each assignment only when it reaches that argument, which happens after the BEGIN actions have run. If filename precedes the assignments, the variables are not set while filename is being processed either; they become available only in the END block. (See "BEGIN Patterns" on page 208.)

The -v Option (nawk)

The -v option provided by nawk processes a command-line assignment before the BEGIN block runs, so the variable is available within BEGIN. Each assignment passed at the command line must be preceded by its own -v option.

Field Variables

Field variables can be used like user-defined variables, except they reference fields. New fields can be created by assignment. A field that is referenced but has no value evaluates to the null string. If a field value is changed, the $0 variable is recomputed, using the current value of OFS as the field separator. In some implementations, the number of fields allowed is limited, typically to 100.

Example 6.88.
% nawk ' { $5 = 1000 * $3 / $2; print } ' filename
EXPLANATION

If $5 does not exist, nawk creates it and assigns the result of the expression 1000 * $3 / $2 to the fifth field ($5). If the fifth field exists, the result is assigned to it, overwriting what is there.

Example 6.89.
% nawk ' $4 == "CA" { $4 = "California"; print}' filename
EXPLANATION

If the fourth field ($4) is equal to the string CA, nawk reassigns the fourth field to California. The double quotes are essential. Without them, the strings become user-defined variables with an initial value of null.

Built-In Variables

Built-in variables have uppercase names. They can be used in expressions and can be reset. See Table 6.12 for a list of built-in variables.

Example 6.90.

(The Employees Database)
% cat employees2
Tom Jones:4423:5/12/66:543354
Mary Adams:5346:11/4/63:28765
Sally Chang:1654:7/22/54:650000
Mary Black:1683:9/23/44:336500

(The Command Line)
% nawk -F: '$1 == "Mary Adams"{print NR, $1, $2, $NF}' employees2

(The Output)
2 Mary Adams 5346 28765
EXPLANATION

The -F option sets the field separator to a colon. The print function prints the record number, the first field, the second field, and the last field ($NF).

Example 6.91.

(The Employees Database)
% cat employees2
Tom Jones:4423:5/12/66:543354
Mary Adams:5346:11/4/63:28765
Sally Chang:1654:7/22/54:650000
Mary Black:1683:9/23/44:336500

(The Command Line)
% gawk -F: '{IGNORECASE=1}; \
$1 == "mary adams"{print NR, $1, $2,$NF}' employees2

(The Output)
2 Mary Adams 5346 28765

EXPLANATION

The -F option sets the field separator to a colon. The gawk built-in variable IGNORECASE, when set to a nonzero value, makes gawk's string comparisons and regular expression matching case-insensitive. The string mary adams is matched, even though in the input file her name is spelled Mary Adams. The print function prints the record number, the first field, the second field, and the last field ($NF).

6.13.3 BEGIN Patterns

The BEGIN pattern is followed by an action block that is executed before awk processes any lines from the input file. In fact, a BEGIN block can be tested without any input file, because awk does not start reading input until the BEGIN action block has completed. The BEGIN action is often used to change the values of built-in variables such as OFS, RS, and FS, to assign initial values to user-defined variables, and to print headers or titles as part of the output.

Example 6.92.
% nawk 'BEGIN{FS=":"; OFS="\t"; ORS="\n\n"}{print $1,$2,$3}' file
EXPLANATION

Before the input file is processed, the field separator (FS) is set to a colon, the output field separator (OFS) to a tab, and the output record separator (ORS) to two newlines. If there are two or more statements in the action block, they should be separated with semicolons or placed on separate lines (use a backslash to escape the newline character if at the shell prompt).

Example 6.93.

% nawk 'BEGIN{print "MAKE YEAR"}'
MAKE YEAR

EXPLANATION

Awk displays MAKE YEAR. The print function is executed before awk opens the input file, so even though no input file has been given, awk still prints MAKE YEAR. When debugging awk scripts, you can test the BEGIN block actions before writing the rest of the program.

6.13.4 END Patterns

END patterns do not match any input lines, but execute any actions that are associated with the END pattern. END patterns are handled after all lines of input have been processed.

Example 6.94.

% nawk 'END{print "The number of records is " NR }' filename
The number of records is 4

EXPLANATION

The END block is executed after awk has finished processing the file. The value of NR is the number of the last record read.

Example 6.95.

% nawk '/Mary/{count++}END{print "Mary was found " count " times."}' employees
Mary was found 2 times.

EXPLANATION

For every input line from the file employees containing the pattern Mary, the user-defined variable count is incremented by 1. When all input lines have been read, the END block is executed and displays the string Mary was found 2 times., which includes the final value of count.
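The timing of BEGIN, END, and command-line variables can be seen together in one run. This is a minimal sketch, assuming a POSIX-compatible awk installed as awk; it writes a temporary copy of the employees2 data from Example 6.90, and the variable name who is hypothetical. It passes who with -v and year as a plain command-line assignment placed before the file name, following the POSIX rule that such an assignment takes effect after the BEGIN actions but before the following file is read.

```shell
#!/bin/sh
# Sketch: assumes a POSIX-compatible awk installed as "awk" and a
# throwaway copy of the employees2 data shown in Example 6.90.
tmp=$(mktemp)
printf '%s\n' 'Tom Jones:4423:5/12/66:543354' \
    'Mary Adams:5346:11/4/63:28765' \
    'Sally Chang:1654:7/22/54:650000' \
    'Mary Black:1683:9/23/44:336500' > "$tmp"

# who (hypothetical name) is set with -v, so BEGIN can see it;
# year=2004 is assigned only when awk reaches it, after BEGIN has run.
out=$(awk -F: -v who=Mary '
BEGIN { print "BEGIN: who=" who ", year=[" year "]" }
$1 ~ who { count++ }
END   { print "END: " who " found " count " times in " NR " records, year=" year }
' year=2004 "$tmp")
rm -f "$tmp"
printf '%s\n' "$out"
```

The first line of output should be BEGIN: who=Mary, year=[], showing that year is still empty while who is already set. The second should be END: Mary found 2 times in 4 records, year=2004, because by the time END runs the assignment has taken effect and NR holds the number of the last record read.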