6.26. awk Built-In Functions

Unless otherwise noted, the examples in this section use the following datafile, repeated periodically for your convenience.
6.26.1 String Functions

Example 6.173.
% nawk 'NR==1{gsub(/northwest/,"southeast", $1) ;print}' datafile
southeast NW Joel Craig 3.0 .98 3 4

EXPLANATION
If this is the first record (NR == 1), globally substitute the regular expression northwest with southeast, if northwest is found in the first field.

Example 6.174.
% nawk 'NR==1{print substr($3, 1, 3)}' datafile
Joe

EXPLANATION
If this is the first record, display the substring of the third field, starting at the first character, and extracting a length of 3 characters. The substring Joe is printed.

Example 6.175.
% nawk 'NR==1{print length($1)}' datafile
9

EXPLANATION
If this is the first record, the length (number of characters) in the first field is printed.

Example 6.176.
% nawk 'NR==1{print index($1,"west")}' datafile
6

EXPLANATION
If this is the first record, print the first position where the substring west is found in the first field. The string west starts at the sixth position (index) in the string northwest.
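The four string functions above can be tried without the datafile. The following is a minimal sketch, assuming a POSIX awk is installed as awk (rather than nawk) and using a single record reconstructed from the output of Example 6.173; the shell variable names are illustrative only.

```shell
# Sample record reconstructed from the output shown in Example 6.173.
record='northwest NW Joel Craig 3.0 .98 3 4'

# gsub returns the number of substitutions made; here it edits $1 in place.
subs=$(printf '%s\n' "$record" |
    awk '{ n = gsub(/northwest/, "southeast", $1); print n, $1 }')

# substr, length, and index applied to fields of the same record.
parts=$(printf '%s\n' "$record" |
    awk '{ print substr($3, 1, 3), length($1), index($1, "west") }')

echo "$subs"    # 1 southeast
echo "$parts"   # Joe 9 6
```

Capturing the return value of gsub is a convenient way to tell whether a substitution actually took place.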
Example 6.177.
% nawk '{if(match($1,/^no/)){print substr($1,RSTART,RLENGTH)}}' datafile
no
no
no

EXPLANATION
If the match function finds the regular expression /^no/ in the first field, it returns the index position of the leftmost matched character. The built-in variable RSTART is set to that index position and the RLENGTH variable is set to the length of the matched substring. The substr function returns the part of the first field that starts at position RSTART and is RLENGTH characters long.

Example 6.178.
% nawk 'BEGIN{split("10/14/04",now,"/");print now[1],now[2],now[3]}'
10 14 04

EXPLANATION
The string 10/14/04 is split into an array called now. The delimiter is the forward slash. The elements of the array are printed, starting at the first element of the array.

The following datafile2 is used for Example 6.179.
Example 6.179.
% nawk -F: '/north/{split($1, name, " ");\
    print "First name: "name[1];\
    print "Last name: " name[2];\
    print "\n--------------------"}' datafile2
First name: Joel
Last name: Craig

--------------------
First name: TJ
Last name: Nichols

--------------------
First name: Val
Last name: Shultz

--------------------

EXPLANATION
The input field separator is set to a colon (-F:). If the record contains the regular expression north, the first field is split into an array called name, where a space is the delimiter. The elements of the array are printed.
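Both match and split return values that the examples above do not use. The sketch below, assuming a POSIX awk installed as awk, prints those return values alongside RSTART and RLENGTH. The record Joel Craig:northwest is a hypothetical line in the colon-separated layout that Example 6.179 implies, not a line taken from datafile2.

```shell
# match returns the start position (0 if there is no match) and sets
# RSTART and RLENGTH as a side effect.
m=$(echo 'northwest' |
    awk '{ pos = match($1, /west/); print pos, RSTART, RLENGTH, substr($1, RSTART, RLENGTH) }')

# split returns the number of array elements it created.
s=$(awk 'BEGIN { n = split("10/14/04", now, "/"); print n, now[1], now[2], now[3] }')

# Hypothetical colon-separated record: the name is the first field.
names=$(echo 'Joel Craig:northwest' |
    awk -F: '/north/ { split($1, name, " "); print name[2] ", " name[1] }')

echo "$m"       # 6 6 4 west
echo "$s"       # 3 10 14 04
echo "$names"   # Craig, Joel
```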
Example 6.180.
% nawk '{line=sprintf("%10.2f%5s\n",$7,$2); print line}' datafile
      3.00   NW

      5.00   WE

      2.00   SW

      4.00   SO

      4.00   SE

      5.00   EA

      3.00   NE

      5.00   NO

      5.00   CT

EXPLANATION
The sprintf function formats the seventh and the second fields ($7, $2) using the formatting conventions of the printf function. The formatted string is returned, assigned to the user-defined variable line, and printed. Because the format string already ends in \n and print appends its own newline, each record is followed by a blank line.

The toupper and tolower Functions (gawk only)

The toupper function returns a string with all the lowercase characters translated to uppercase, and leaves nonalphabetic characters unchanged. Likewise, the tolower function translates all uppercase letters to lowercase. String literals must be quoted.

FORMAT
toupper (string)
tolower (string)

Example 6.181.
% awk 'BEGIN{print toupper("linux"), tolower("BASH 2.0")}'
LINUX bash 2.0

6.26.2 Time Functions with gawk

Gawk provides two functions for getting the time and formatting timestamps: systime and strftime.

The systime Function

The systime function returns the current time as the number of seconds since midnight UTC, January 1, 1970 (called the Epoch), not counting leap seconds.

FORMAT
systime()

Example 6.182.
% awk 'BEGIN{now=systime(); print now}'
939515282

EXPLANATION
The return value of the systime function is assigned to a user-defined variable, now. The value is the number of seconds since January 1, 1970, not counting leap seconds.

The strftime Function

The strftime function formats the time using the C library strftime function. The format specifications have the form %T, %D, and so on (see Table 6.14). The timestamp is in the same form as the return value from systime. If the timestamp is omitted, then the current time of day is used as the default.
FORMAT
strftime([format specification][,timestamp])

Example 6.183.
% awk 'BEGIN{now=strftime("%D", systime()); print now}'
10/09/04
% awk 'BEGIN{now=strftime("%T"); print now}'
17:58:03
% awk 'BEGIN{now=strftime("%m/%d/%y"); print now}'
10/09/04

EXPLANATION
The strftime function formats the time and date according to the format instruction provided as an argument. (See Table 6.14.) If systime is given as a second argument or no argument is given at all, the current time for this locale is assumed. If a second argument is given, it must be in the same format as the return value from the systime function.

The examples in the next sections use the following datafile database, repeated periodically for your convenience.
6.26.3 Command-Line Arguments

Example 6.184.
% cat argvs.sc
# Testing command-line arguments with ARGV and ARGC using a for loop.
BEGIN{
    for(i=0;i < ARGC;i++)
        printf("argv[%d] is %s\n", i, ARGV[i])
    printf("The number of arguments, ARGC=%d\n", ARGC)
}

% nawk -f argvs.sc datafile
argv[0] is nawk
argv[1] is datafile
The number of arguments, ARGC=2

EXPLANATION
The BEGIN block contains a for loop to process the command-line arguments. ARGC is the number of arguments and ARGV is an array that contains the actual arguments. Nawk does not count options as arguments. The only valid arguments in this example are the nawk command and the input file, datafile.
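The point that options are not counted can be checked with a one-liner. This is a minimal sketch, assuming a POSIX awk installed as awk; the operands one and two are placeholder names, and because the program has only a BEGIN block, awk never tries to open them as files. ARGV[0] is skipped because its value depends on how the interpreter was invoked.

```shell
# -F: is an option, so it appears in neither ARGC nor ARGV.
# The loop starts at 1 to skip ARGV[0], the interpreter's own name.
args=$(awk -F: 'BEGIN {
    for (i = 1; i < ARGC; i++)
        printf("%d=%s ", i, ARGV[i])
    print "ARGC=" ARGC
}' one two)

echo "$args"   # 1=one 2=two ARGC=3
```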
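Example 6.185.
1   % nawk 'BEGIN{name=ARGV[1]};\
        $0 ~ name {print $3 , $4}' "Derek" datafile
    nawk: can't open Derek
    source line number 1

2   % nawk 'BEGIN{name=ARGV[1]; delete ARGV[1]};\
        $0 ~ name {print $3, $4}' "Derek" datafile
    Derek Johnson

EXPLANATION
1. The BEGIN block assigns the value of ARGV[1], Derek, to the variable name. Derek is still in the ARGV array, however, so nawk tries to open it as an input file and fails.
2. After ARGV[1] is assigned to name, it is deleted from the ARGV array, so nawk opens only datafile. For each record that matches the value of name, the third and fourth fields are printed.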
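The technique in Example 6.185 can be reproduced with a throwaway input file, and the same result can be reached with the POSIX -v option, which avoids touching ARGV at all. The sketch below assumes awk and mktemp are available; the two records are illustrative stand-ins for the datafile, and only their name fields matter.

```shell
# Create a small stand-in for the datafile (illustrative records).
tmp=$(mktemp)
printf '%s\n' 'northwest NW Joel Craig 3.0 .98 3 4' \
              'southeast SE Derek Johnson 4.0 .7 4 17' > "$tmp"

# Approach from Example 6.185: copy the pattern out of ARGV[1], then delete
# that element so awk does not try to open "Derek" as an input file.
via_argv=$(awk 'BEGIN { name = ARGV[1]; delete ARGV[1] }
                $0 ~ name { print $3, $4 }' Derek "$tmp")

# Alternative: pass the pattern with -v and leave ARGV alone.
via_v=$(awk -v name=Derek '$0 ~ name { print $3, $4 }' "$tmp")

rm -f "$tmp"
echo "$via_argv"   # Derek Johnson
echo "$via_v"      # Derek Johnson
```

The -v form is usually easier to read, while the ARGV form keeps the pattern positional, which can be handy inside wrapper scripts.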
6.26.4 Reading Input (getline)

Example 6.186.
% nawk 'BEGIN{ "date" | getline d; print d}' datafile
Mon Jan 15 11:24:24 PST 2004

EXPLANATION
The UNIX/Linux date command is piped to the getline function. The results are stored in the variable d and printed.

Example 6.187.
% nawk 'BEGIN{ "date " | getline d; split( d, mon) ;print mon[2]}' datafile
Jan

EXPLANATION
The UNIX/Linux date command is piped to the getline function and the results are stored in d. The split function splits the string d into an array called mon. The second element of the array is printed.

Example 6.188.
% nawk 'BEGIN{ printf "Who are you looking for?" ; \
    getline name < "/dev/tty"};\

EXPLANATION
Input is read from the terminal, /dev/tty, and stored in the variable called name.

Example 6.189.
% nawk 'BEGIN{while(getline < "/etc/passwd" > 0 ){lc++}; print lc}' datafile
16

EXPLANATION
The while loop is used to loop through the /etc/passwd file one line at a time. Each time the loop is entered, a line is read by getline and the value of the variable lc is incremented. As long as the return value from getline is greater than 0 (i.e., a line has been read), the looping continues. When the loop exits, the value of lc is printed (i.e., the number of lines in the /etc/passwd file).

6.26.5 Control Functions

Example 6.190.
% nawk '{if ( $5 >= 4.5) next; print $1}' datafile
northwest
southwest
southeast
eastern
north

EXPLANATION
If the fifth field is greater than or equal to 4.5, the next line is read from the input file (datafile) and processing starts at the beginning of the awk script (after the BEGIN block). Otherwise, the first field is printed.
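The getline and next idioms from Examples 6.186 through 6.190 can be exercised on small, known inputs. This is a sketch under a few assumptions: a POSIX awk installed as awk, mktemp for a scratch file, and made-up sample values. It parenthesizes the getline expression and calls close, which are common defensive habits rather than something the examples above require.

```shell
tmp=$(mktemp)
printf 'a\nb\nc\n' > "$tmp"

# Parenthesizing (getline line < file) makes the comparison with 0 explicit;
# testing "> 0" also ends the loop if getline returns -1 for an unreadable file.
count=$(awk -v file="$tmp" 'BEGIN {
    while ((getline line < file) > 0)
        lc++
    close(file)
    print lc
}')

# Read one line from a command; close() releases the pipe.
word=$(awk 'BEGIN { cmd = "echo hello"; cmd | getline w; close(cmd); print w }')

# next abandons the current record, so the second rule never sees 5.3.
kept=$(printf '3.0\n5.3\n2.7\n' |
    awk '$1 >= 4.5 { next } { out = out sep $1; sep = " " } END { print out }')

rm -f "$tmp"
echo "$count"   # 3
echo "$word"    # hello
echo "$kept"    # 3.0 2.7
```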
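Example 6.191.
% nawk '{if ($2 ~ /S/){print ; exit 0}}' datafile
southwest SW Chris Foster 2.7 .8 2 18
% echo $status        (csh) or echo $? (sh or ksh)
0

EXPLANATION
If the second field contains an S, the record is printed and the awk program exits. The C shell status variable contains the exit value. If using the Bourne or Korn shells, the $? variable contains the exit status.

6.26.6 User-Defined Functions

Example 6.192.
(The Command Line)
% cat nawk.sc7
BEGIN{largest=0}
{maximum=max($5)}
function max ( num ) {
    if ( num > largest){ largest=num }
    return largest
}
END{ print "The maximum is " maximum "."}

% nawk -f nawk.sc7 datafile
The maximum is 5.7.

EXPLANATION
In the BEGIN block, the variable largest is initialized to 0. For each record, the fifth field is passed to the user-defined function max, and the return value is assigned to maximum. The function compares its argument, num, with the global variable largest; if num is greater, largest is set to num. The function then returns largest. In the END block, after all records have been processed, the value of maximum is printed.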
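As a variation on Example 6.192, the comparison can be written as a function that takes both operands as parameters rather than updating a global variable. The following is a sketch only: max2 is a hypothetical name, the input values are made up, and awk is assumed to be a POSIX awk.

```shell
# max2 (hypothetical helper) has no side effects: it only compares its arguments.
# Adding 0 forces each field to be treated as a number.
result=$(printf '3.0\n5.7\n4.5\n' | awk '
function max2(a, b) { return (a > b) ? a : b }
NR == 1 { largest = $1 + 0 }          # seed with the first value, not 0
        { largest = max2(largest, $1 + 0) }
END     { print "The maximum is " largest "." }')

echo "$result"   # The maximum is 5.7.
```

Seeding largest from the first record instead of 0 keeps the result correct when every value is negative.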
6.26.7 awk/gawk Command-Line Options

Awk has a number of command-line options. Gawk has two formats for command-line options: the GNU long format starting with a double dash (--) and a word; and the traditional short POSIX format, consisting of a dash and one letter. Gawk-specific options are used with the -W option or its corresponding long option. Any arguments provided to long options are either joined by an = sign (with no intervening spaces), or may be provided in the next command-line argument. The --help option (see Example 6.193) to gawk lists all the gawk options. See Table 6.15.

Example 6.193.
% awk --help
Usage: awk [POSIX or GNU style options] -f progfile [--] file ...
       awk [POSIX or GNU style options] [--] 'program' file ...
POSIX options:          GNU long options:
     -f progfile             --file=progfile
     -F fs                   --field-separator=fs
     -v var=val              --assign=var=val
     -m[fr] val
     -W compat               --compat
     -W copyleft             --copyleft
     -W copyright            --copyright
     -W help                 --help
     -W lint                 --lint
     -W lint-old             --lint-old
     -W posix                --posix
     -W re-interval          --re-interval
     -W source=program-text  --source=program-text
     -W traditional          --traditional
     -W usage                --usage
     -W version              --version

Report bugs to bug-gnu-utils@prep.ai.mit.edu,
with a Cc: to arnold@gnu.ai.mit.edu
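The short POSIX options -F and -v from the listing above work in any POSIX awk, so they make a portable starting point. A minimal sketch, assuming awk is on the path; the input line is a made-up passwd-style record and label is an illustrative variable name.

```shell
# -F sets the field separator; -v assigns a variable before the program runs.
line=$(echo 'root:x:0:0' | awk -F: -v label=user '{ print label, $1, NF }')

echo "$line"   # user root 4
```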