
6.26. awk Built-In Functions

Unless otherwise noted, the examples in this section use the following datafile, repeated periodically for your convenience.

% cat datafile
northwest    NW    Joel Craig       3.0    .98    3     4
western      WE    Sharon Kelly     5.3    .97    5    23
southwest    SW    Chris Foster     2.7    .8     2    18
southern     SO    May Chin         5.1    .95    4    15
southeast    SE    Derek Johnson    4.0    .7     4    17
eastern      EA    Susan Beal       4.4    .84    5    20
northeast    NE    TJ Nichols       5.1    .94    3    13
north        NO    Val Shultz       4.5    .89    5     9
central      CT    Sheri Watson     5.7    .94    5    13


6.26.1 String Functions

Example 6.173.

% nawk 'NR==1{gsub(/northwest/,"southeast", $1) ;print}' datafile

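southeast NW Joel Craig 3.0 .98 3 4


EXPLANATION

If this is the first record (NR == 1), every occurrence of the regular expression northwest in the first field is replaced with southeast, and the record is printed. Because a field was modified, the record is rebuilt with a single space (the default output field separator) between fields.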
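The difference between gsub and the related sub function can be seen in their return values. The following is a minimal sketch, assuming a POSIX-compatible awk (nawk, gawk, or mawk) installed as awk; the one-line input and the shell variable names are made up for illustration.

```shell
# Hypothetical one-line input; awk is assumed to be POSIX-compatible.
# gsub replaces every match in the target ($0 by default) and returns the count.
gsub_out=$(echo "northwest NW" | awk '{ n = gsub(/t/, "T"); print n, $0 }')
# sub replaces only the leftmost match, so it returns 0 or 1.
sub_out=$(echo "northwest NW" | awk '{ n = sub(/t/, "T"); print n, $0 }')
echo "$gsub_out"   # 2 norThwesT NW
echo "$sub_out"    # 1 norThwest NW
```

Testing the return value is a convenient way to find out whether any substitution took place.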

Example 6.174.

% nawk 'NR==1{print substr($3, 1, 3)}' datafile

Joe


EXPLANATION

If this is the first record, display the substring of the third field, starting at the first character, and extracting a length of 3 characters. The substring Joe is printed.

Example 6.175.

% nawk 'NR==1{print length($1)}' datafile

9


EXPLANATION

If this is the first record, the length (number of characters) in the first field is printed.

Example 6.176.

% nawk 'NR==1{print index($1,"west")}' datafile

6


EXPLANATION

If this is the first record, print the first position where the substring west is found in the first field. The string west starts at the sixth position (index) in the string northwest.
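The substr, length, and index functions are often combined. The sketch below, which assumes a POSIX-compatible awk and uses made-up input strings, finds where west begins and then extracts everything from that position to the end of the string. It also shows that index returns 0 when the substring is not present.

```shell
found=$(echo "northwest" | awk '{
    p = index($1, "west")            # 6: position where "west" begins
    if (p > 0)
        print substr($1, p), length($1) - p + 1   # rest of the string and its length
}')
# index returns 0 when the substring does not occur
missing=$(echo "north" | awk '{ print index($1, "west") }')
echo "$found"     # west 4
echo "$missing"   # 0
```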

% cat datafile
northwest    NW    Joel Craig       3.0    .98    3     4
western      WE    Sharon Kelly     5.3    .97    5    23
southwest    SW    Chris Foster     2.7    .8     2    18
southern     SO    May Chin         5.1    .95    4    15
southeast    SE    Derek Johnson    4.0    .7     4    17
eastern      EA    Susan Beal       4.4    .84    5    20
northeast    NE    TJ Nichols       5.1    .94    3    13
north        NO    Val Shultz       4.5    .89    5     9
central      CT    Sheri Watson     5.7    .94    5    13


Example 6.177.

% nawk '{if(match($1,/^no/)){print substr($1,RSTART,RLENGTH)}}' datafile

no

no

no


EXPLANATION

If the match function finds the regular expression /^no/ in the first field, the index position of the leftmost character is returned. The built-in variable RSTART is set to the index position and the RLENGTH variable is set to the length of the matched substring. The substr function returns the string in the first field starting at position RSTART, RLENGTH number of characters.
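The match function is not limited to patterns anchored at the start of a string. As a rough illustration (the input line and variable names are hypothetical, and a POSIX-compatible awk is assumed), the following extracts the first decimal number found anywhere in a record, and shows the values RSTART and RLENGTH take when nothing matches.

```shell
hit=$(echo "Joel Craig 3.0 .98" | awk '{
    if (match($0, /[0-9]+\.[0-9]+/))
        print RSTART, RLENGTH, substr($0, RSTART, RLENGTH)
}')
# When there is no match, match returns 0, RSTART is 0, and RLENGTH is -1.
miss=$(echo "Joel Craig" | awk '{ match($0, /[0-9]/); print RSTART, RLENGTH }')
echo "$hit"    # 12 3 3.0
echo "$miss"   # 0 -1
```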

Example 6.178.

% nawk 'BEGIN{split("10/14/04",now,"/");print now[1],now[2],now[3]}'

10 14 04


EXPLANATION

The string 10/14/04 is split into an array called now. The delimiter is the forward slash. The elements of the array are printed, starting at the first element of the array.
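The split function also returns the number of elements it created, which is useful for looping or for validating input. The following sketch, assuming a POSIX-compatible awk, rearranges the same date string; the "20" century prefix is an assumption made only for this illustration.

```shell
iso=$(awk 'BEGIN {
    n = split("10/14/04", now, "/")           # n is 3, the number of pieces
    print n, "20" now[3] "-" now[1] "-" now[2]
}')
echo "$iso"   # 3 2004-10-14
```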

The following datafile2 is used for Example 6.179.

% cat datafile2

Joel Craig:northwest:NW:3.0:.98:3:4

Sharon Kelly:western:WE:5.3:.97:5:23

Chris Foster:southwest:SW:2.7:.8:2:18

May Chin:southern:SO:5.1:.95:4:15

Derek Johnson:southeast:SE:4.0:.7:4:17

Susan Beal:eastern:EA:4.4:.84:5:20

TJ Nichols:northeast:NE:5.1:.94:3:13

Val Shultz:north:NO:4.5:.89:5:9

Sheri Watson:central:CT:5.7:.94:5:13


Example 6.179.

% nawk -F: '/north/{split($1, name, " ");\

  print "First name: "name[1];\

  print "Last name: " name[2];\

  print "\n--------------------"}' datafile2



First name: Joel

Last name: Craig

--------------------

First name: TJ

Last name: Nichols

--------------------

First name: Val

Last name: Shultz

--------------------


EXPLANATION

The input field separator is set to a colon (-F:). If the record contains the regular expression north, the first field is split into an array called name, where a space is the delimiter. The elements of the array are printed.

% cat datafile
northwest    NW    Joel Craig       3.0    .98    3     4
western      WE    Sharon Kelly     5.3    .97    5    23
southwest    SW    Chris Foster     2.7    .8     2    18
southern     SO    May Chin         5.1    .95    4    15
southeast    SE    Derek Johnson    4.0    .7     4    17
eastern      EA    Susan Beal       4.4    .84    5    20
northeast    NE    TJ Nichols       5.1    .94    3    13
north        NO    Val Shultz       4.5    .89    5     9
central      CT    Sheri Watson     5.7    .94    5    13


Example 6.180.

% nawk '{line=sprintf("%10.2f%5s\n",$7,$2); print line}' datafile

      3.00   NW

      5.00   WE

      2.00   SW

      4.00   SO

      4.00   SE

      5.00   EA

      3.00   NE

      5.00   NO

      5.00   CT


EXPLANATION

The sprintf function formats the seventh and the second fields ($7, $2) using the formatting conventions of the printf function. The formatted string is returned and assigned to the user-defined variable line and printed.
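Because sprintf returns the formatted string rather than printing it, the result can be measured or manipulated before it is output. A small sketch, assuming a POSIX-compatible awk; the format string and values are chosen only for illustration.

```shell
row=$(awk 'BEGIN {
    # Left-justified string, fixed-point number, zero-padded integer.
    s = sprintf("%-10s|%6.2f|%03d", "NW", 3, 7)   # nothing is printed yet
    print s
    print length(s)                               # 10 + 1 + 6 + 1 + 3 characters
}')
echo "$row"
```

The first line printed is the formatted string and the second is its length, 21.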

The toupper and tolower Functions (gawk and POSIX awk)

The toupper function returns a copy of a string with all the lowercase characters translated to uppercase, and leaves nonalphabetic characters unchanged. Likewise, the tolower function translates all uppercase letters to lowercase. String literals must be quoted.

FORMAT


toupper (string)

tolower (string)


Example 6.181.

% awk 'BEGIN{print toupper("linux"), tolower("BASH 2.0")}'

LINUX bash 2.0
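A common use of these functions is case-insensitive matching: convert the field to one case before comparing it with a pattern. The sketch below assumes an awk that provides toupper and tolower; the three input lines are made up.

```shell
upper=$(printf 'Northwest NW\nsouthern SO\nNORTH NO\n' |
    awk 'tolower($1) ~ /^north/ { print toupper($1), $2 }')
echo "$upper"
# NORTHWEST NW
# NORTH NO
```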


6.26.2 Time Functions with gawk

Gawk provides two functions for getting the time and formatting timestamps: systime and strftime.

The systime Function

The systime function returns the current time of day as the number of seconds since January 1, 1970 (called the Epoch), not counting leap seconds.

FORMAT


systime()


Example 6.182.

% awk 'BEGIN{now=systime(); print now}'

939515282


EXPLANATION

The return value of the systime function is assigned to a user-defined variable, now. The value is the number of seconds since January 1, 1970, not counting leap seconds.

The strftime Function

The strftime function formats the time using the C library strftime function. The format specifications are in the form %T %D, and so on (see Table 6.14). The timestamp is in the same form as the return value from systime. If the timestamp is omitted, then the current time of day is used as the default.

Table 6.14. Date and Time Format Specifications

For the following definitions, assume the current date and time as
Date: Sunday, October 17, 2004
Time: 15:26:26 PDT

Date Format    Definition
%a             Abbreviated weekday name (Sun)
%A             Full weekday name (Sunday)
%b             Abbreviated month name (Oct)
%B             Full month name (October)
%c             Date and time for locale (Sun Oct 17 15:26:26 2004)
%d             Day of month in decimal (17)
%D             Date as 10/17/04[a]
%e             Day of the month, padded with space if only one digit
%H             Hour for a 24-hour clock in decimal (15)
%I             Hour for a 12-hour clock in decimal (03)
%j             Day of the year since January 1 in decimal (291)
%m             Month in decimal (10)
%M             Minute in decimal (26)
%p             AM/PM notation assuming a 12-hour clock (PM)
%S             Second as a decimal number (26)
%U             Week number of the year (with the first Sunday as the first day of week one) as a decimal number (42)
%w             Weekday (Sunday is 0) as a decimal number (0)
%W             Week number of the year (with the first Monday as the first day of week one) as a decimal number (41)
%x             Date representation for locale (10/17/04)
%X             Time representation for locale (15:26:26)
%y             Year as two digits in decimal (04)
%Y             Year with century (2004)
%Z             Time zone (PDT)
%%             A literal percent sign (%)


[a] %D and %e are available only on some versions of gawk.

FORMAT


strftime([format specification][,timestamp])


Example 6.183.

% awk 'BEGIN{now=strftime("%D", systime()); print now}'

10/09/04



% awk 'BEGIN{now=strftime("%T"); print now}'

17:58:03



% awk 'BEGIN{now=strftime("%m/%d/%y"); print now}'

10/09/04


EXPLANATION

The strftime function formats the time and date according to the format specification provided as its first argument (see Table 6.14). If the second argument is systime(), or if the second argument is omitted, the current time is formatted for this locale. If a second argument is given, it must be in the same form as the return value from the systime function.
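Since the timestamp is just a count of seconds, a fixed value can be passed to strftime to format a time other than the current one. The following sketch assumes gawk is installed under the name gawk and that it accepts a third strftime argument (a gawk extension that requests UTC), which keeps the result independent of the local time zone. The shell variable names are hypothetical.

```shell
if command -v gawk >/dev/null 2>&1; then
    # Timestamp 0 is the Epoch itself; the third argument requests UTC.
    epoch=$(gawk 'BEGIN { print strftime("%m/%d/%y %H:%M:%S", 0, 1) }')
    # One day is 86400 seconds, so this is the day after the Epoch.
    nextday=$(gawk 'BEGIN { print strftime("%Y-%m-%d", 86400, 1) }')
else
    # This sketch needs gawk; other awk versions may lack strftime.
    epoch="gawk not found"
    nextday="gawk not found"
fi
echo "$epoch"     # 01/01/70 00:00:00 when gawk is available
echo "$nextday"   # 1970-01-02 when gawk is available
```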

The examples in the next sections use the following datafile database, repeated periodically for your convenience.

% cat datafile
Northwest    NW    Joel Craig       3.0    .98    3     4
Western      WE    Sharon Kelly     5.3    .97    5    23
Southwest    SW    Chris Foster     2.7    .8     2    18
Southern     SO    May Chin         5.1    .95    4    15
Southeast    SE    Derek Johnson    4.0    .7     4    17
Eastern      EA    Susan Beal       4.4    .84    5    20
Northeast    NE    TJ Nichols       5.1    .94    3    13
North        NO    Val Shultz       4.5    .89    5     9
central      CT    Sheri Watson     5.7    .94    5    13


6.26.3 Command-Line Arguments

Example 6.184.

% cat argvs.sc
# Testing command-line arguments with ARGV and ARGC using a for loop.

BEGIN{
     for(i=0;i < ARGC;i++)
         printf("argv[%d] is %s\n", i, ARGV[i])
     printf("The number of arguments, ARGC=%d\n", ARGC)
}



% nawk -f argvs.sc datafile

argv[0] is nawk

argv[1] is datafile

The number of arguments, ARGC=2


EXPLANATION

The BEGIN block contains a for loop to process the command-line arguments. ARGC is the number of arguments and ARGV is an array that contains the actual arguments. Nawk does not count options as arguments. The only valid arguments in this example are the nawk command and the input file, datafile.
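To see that options are not counted, the same loop can be run with an option and two operands. This is a minimal sketch assuming a POSIX-compatible awk; the operands first and second are placeholders that are never opened, because the program consists only of a BEGIN block. The loop starts at 1 because the value of ARGV[0] depends on how awk was invoked.

```shell
args=$(awk -F: 'BEGIN {
    for (i = 1; i < ARGC; i++)            # skip ARGV[0], the command name
        printf("ARGV[%d] is %s\n", i, ARGV[i])
    printf("ARGC=%d\n", ARGC)             # counts the command name plus two operands
}' first second)
echo "$args"
# ARGV[1] is first
# ARGV[2] is second
# ARGC=3
```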

% cat datafile
northwest    NW    Joel Craig       3.0    .98    3     4
western      WE    Sharon Kelly     5.3    .97    5    23
southwest    SW    Chris Foster     2.7    .8     2    18
southern     SO    May Chin         5.1    .95    4    15
southeast    SE    Derek Johnson    4.0    .7     4    17
eastern      EA    Susan Beal       4.4    .84    5    20
northeast    NE    TJ Nichols       5.1    .94    3    13
north        NO    Val Shultz       4.5    .89    5     9
central      CT    Sheri Watson     5.7    .94    5    13


Example 6.185.

1   % nawk 'BEGIN{name=ARGV[1]};\

    $0 ~ name {print $3 , $4}'  "Derek" datafile

    nawk: can't open Derek

    source line number 1



2   % nawk 'BEGIN{name=ARGV[1]; delete ARGV[1]};\

    $0 ~ name {print $3, $4}'  "Derek" datafile

    Derek Johnson


EXPLANATION

  1. The name "Derek" was assigned to the variable name in the BEGIN block. In the pattern-action block, nawk attempted to open "Derek" as an input file and failed.

  2. After assigning "Derek" to the variable name, ARGV[1] is deleted. When starting the pattern-action block, nawk does not try to open "Derek" as the input file, but opens datafile instead.
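An alternative that avoids editing ARGV is to pass the value with the -v option (see Table 6.15), so that it never appears among the file operands. The sketch below assumes a POSIX-compatible awk and feeds two records of the datafile on standard input rather than reading the file itself.

```shell
result=$(printf 'southeast SE Derek Johnson 4.0 .7 4 17\neastern EA Susan Beal 4.4 .84 5 20\n' |
    awk -v name="Derek" '$0 ~ name { print $3, $4 }')   # name is set before BEGIN runs
echo "$result"   # Derek Johnson
```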

6.26.4 Reading Input (getline)

Example 6.186.

% nawk 'BEGIN{ "date" | getline d; print d}' datafile

Mon Jan 15 11:24:24 PST 2004


EXPLANATION

The UNIX/Linux date command is piped to the getline function. The results are stored in the variable d and printed.

Example 6.187.

% nawk 'BEGIN{ "date " | getline d; split( d, mon) ;print mon[2]}' datafile

Jan


EXPLANATION

The UNIX/Linux date command is piped to the getline function and the results are stored in d. The split function splits the string d into an array called mon. The second element of the array is printed.
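The same pattern works with any command. In the following sketch, which assumes a POSIX-compatible awk, echo stands in for date so that the result is predictable; storing the command in a variable (here the hypothetical name cmd) makes it easy to close the pipe afterward.

```shell
word=$(awk 'BEGIN {
    cmd = "echo hello world"
    cmd | getline d        # read one line of the command output into d
    close(cmd)             # release the pipe once the line has been read
    split(d, w)            # with no third argument, split on whitespace
    print w[2]
}')
echo "$word"   # world
```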

Example 6.188.

% nawk 'BEGIN{ printf "Who are you looking for? " ; \

  getline name < "/dev/tty"}'


EXPLANATION

Input is read from the terminal, /dev/tty, and stored in the variable called name.

Example 6.189.

% nawk 'BEGIN{while((getline < "/etc/passwd") > 0){lc++}; print lc}' datafile

16


EXPLANATION

The while loop reads the /etc/passwd file one line at a time. Each time getline reads a line, it returns 1 and the value of the variable lc is incremented. The loop continues as long as the return value from getline is greater than 0; getline returns 0 at end of file and -1 if the file cannot be read. When the loop exits, the value of lc (the number of lines in the /etc/passwd file) is printed.
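The three possible return values of getline can be observed with a small experiment. This sketch assumes a POSIX-compatible awk and the mktemp utility; the temporary file and the path /no/such/file are stand-ins chosen for illustration.

```shell
tmpfile=$(mktemp)
printf 'one\ntwo\nthree\n' > "$tmpfile"
count=$(awk -v f="$tmpfile" 'BEGIN {
    while ((getline line < f) > 0)   # 1 = line read, 0 = end of file, -1 = error
        lc++
    close(f)
    print lc
}')
# Reading from a file that does not exist makes getline return -1.
missing_rc=$(awk 'BEGIN { print (getline line < "/no/such/file") }')
rm -f "$tmpfile"
echo "$count"        # 3
echo "$missing_rc"   # -1
```

Testing for a value greater than 0, rather than simply nonzero, keeps the loop from running forever when the file cannot be opened.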

6.26.5 Control Functions

Example 6.190.

% nawk '{if ( $5 > 4.5) next; print $1}' datafile

northwest

southwest

southeast

eastern

north


EXPLANATION

If the fifth field is greater than 4.5, the next line is read from the input file (datafile) and processing starts at the beginning of the awk script (after the BEGIN block). Otherwise, the first field is printed.
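The same filter can be written with a pattern instead of an if statement. The following sketch, assuming a POSIX-compatible awk, uses three records taken from the datafile; a record whose fifth field is exactly 4.5 is not skipped.

```shell
kept=$(printf 'northwest NW Joel Craig 3.0\nwestern WE Sharon Kelly 5.3\nnorth NO Val Shultz 4.5\n' |
    awk '$5 > 4.5 { next }     # skip the remaining rules for this record
         { print $1 }')
echo "$kept"
# northwest
# north
```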

% cat datafile
northwest    NW    Joel Craig       3.0    .98    3     4
western      WE    Sharon Kelly     5.3    .97    5    23
southwest    SW    Chris Foster     2.7    .8     2    18
southern     SO    May Chin         5.1    .95    4    15
southeast    SE    Derek Johnson    4.0    .7     4    17
eastern      EA    Susan Beal       4.4    .84    5    20
northeast    NE    TJ Nichols       5.1    .94    3    13
north        NO    Val Shultz       4.5    .89    5     9
central      CT    Sheri Watson     5.7    .94    5    13


Example 6.191.

% nawk '{if ($2 ~ /S/){print ; exit 0}}' datafile

southwest      SW       Chris   Foster  2.7    .8      2       18



% echo $status (csh) or echo $? (sh or ksh)

0


EXPLANATION

If the second field contains an S, the record is printed and the awk program exits. The C shell status variable contains the exit value. If using the Bourne or Korn shells, the $? variable contains the exit status.
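The exit statement can return a nonzero status, and an END block, if present, still runs after exit is called from a pattern-action block. The sketch below assumes a POSIX-compatible awk and a Bourne-style shell; the status value 3 and the input lines are arbitrary.

```shell
code=0
out=$(printf 'NW\nSW\nSE\n' |
    awk '$1 ~ /S/ { print; exit 3 }      # stop reading input at the first match
         END      { print "END ran" }') || code=$?
echo "$out"
# SW
# END ran
echo "$code"   # 3
```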

6.26.6 User-Defined Functions

Example 6.192.

(The Command Line)

% cat nawk.sc7

1   BEGIN{largest=0}

2   {maximum=max($5)}



3   function max ( num ) {

4       if ( num > largest){ largest=num }

        return largest

5   }

6   END{ print "The maximum is " maximum "."}



% nawk -f nawk.sc7 datafile

The maximum is 5.7.


EXPLANATION

  1. In the BEGIN block, the user-defined variable largest is initialized to 0.

  2. For each line in the file, the variable maximum is assigned the value returned from the function max. The function max is given $5 as its argument.

  3. The user-defined function max is defined. The function statements are enclosed in curly braces. Each time a new record is read from the input file, datafile, the function max will be called.

  4. It will compare the values in num and largest and return the larger of the two numbers.

  5. The function definition block ends.

  6. The END block prints the final value in maximum.
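A variation on this script is a max function that takes both values as parameters and keeps no global state. This is only a sketch, assuming a POSIX-compatible awk, and is not the script shown above; seeding the running maximum with the first value, instead of 0, also gives the right answer when every number is negative.

```shell
largest=$(printf '%s\n' -3 -1 -7 | awk '
function max(a, b) { return (a > b) ? a : b }   # compares its two arguments only
NR == 1 { m = $1 + 0; next }                     # seed with the first value
        { m = max(m, $1 + 0) }                   # adding 0 forces a numeric comparison
END     { print m }')
echo "$largest"   # -1
```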

6.26.7 awk/gawk Command-Line Options

Awk has a number of command-line options. Gawk has two formats for command-line options: the GNU long format, starting with a double dash (--) and a word; and the traditional short POSIX format, consisting of a dash and one letter. Gawk-specific options are used with the -W option or its corresponding long option. Any arguments provided to long options are either joined by an = sign (with no intervening spaces), or may be provided in the next command-line argument. The --help option to gawk (see Example 6.193) lists all the gawk options. See Table 6.15.

Example 6.193.

% awk --help

Usage: awk [POSIX or GNU style options] -f progfile [--] file ...

        awk [POSIX or GNU style options] [--] 'program' file ...

POSIX options:          GNU long options:

        -f progfile             --file=progfile

        -F fs                   --field-separator=fs

        -v var=val              --assign=var=val

        -m[fr] val

        -W compat               --compat

        -W copyleft             --copyleft

        -W copyright            --copyright

        -W help                 --help

        -W lint                 --lint

        -W lint-old             --lint-old

        -W posix                --posix

        -W re-interval          --re-interval

        -W source=program-text  --source=program-text

        -W traditional          --traditional

        -W usage                --usage

        -W version              --version

Report bugs to bug-gnu-utils@prep.ai.mit.edu,

with a Cc: to arnold@gnu.ai.mit.edu


Table 6.15. gawk Command-Line Options

Options
    Meaning

-F fs, --field-separator fs
    Specifies the input field separator, where fs is either a string or a regular expression; for example, FS=":" or FS="[\t:]".

-v var=value, --assign var=value
    Assigns a value to a user-defined variable, var, before the awk script starts execution. The variable is available to the BEGIN block.

-f scriptfile, --file scriptfile
    Reads awk commands from scriptfile.

-mf nnn, -mr nnn
    Sets memory limits to the value of nnn. With -mf, limits the maximum number of fields to nnn; with -mr, sets the maximum number of records. Not applicable for gawk.

-W traditional, -W compat, --traditional, --compat
    Runs in compatibility mode so that gawk behaves exactly as UNIX versions of awk. All gawk extensions are ignored. Both options do the same thing; --traditional is preferred.

-W copyleft, -W copyright, --copyleft
    Prints abbreviated version of copyright information.

-W help, -W usage, --help, --usage
    Prints the available awk options and a short summary of what they do.

-W lint, --lint
    Prints warnings about the use of constructs that may not be portable to traditional versions of UNIX awk.

-W lint-old, --lint-old
    Provides warnings about constructs that are not portable to the original version of UNIX awk.

-W posix, --posix
    Turns on compatibility mode with these additional restrictions: \x escape sequences are not recognized; newlines do not act as field separators when FS is assigned a single space; the keyword func is not accepted as a synonym for function; the operators ** and **= cannot be used in place of ^ and ^=; and fflush is not available.

-W re-interval, --re-interval
    Allows the use of interval expressions in regular expressions; that is, repetition counts in braces such as /a{2,3}/. (Bracketed character classes such as [[:alpha:]], described in "The POSIX Character Class" on page 176, do not require this option.)

-W source program-text, --source program-text
    Uses program-text as awk's source code, allowing awk commands at the command line to be intermixed with -f files; for example, awk -W source '{print $1}' -f cmdfile inputfile.

-W version, --version
    Prints version and bug reporting information.

--
    Signals the end of option processing.
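The short forms of the first two options in Table 6.15 can be combined on one command line. The following is a minimal sketch, assuming a POSIX-compatible awk; the variable name label and the input record are made up. It shows that a variable assigned with -v is already set when the BEGIN block runs.

```shell
opt_out=$(printf 'Joel Craig:northwest:NW\n' |
    awk -F: -v label="Region" '
        BEGIN { print label " report" }   # label is available before any input is read
              { print $2 }                # fields are split on the colon set by -F')
echo "$opt_out"
# Region report
# northwest
```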

