< Day Day Up > |
6.25. Odds and Ends

Some data (e.g., data read in from tape or from a spreadsheet) may not have obvious field separators but may instead have fixed-width columns. To preprocess this type of data, the substr function is useful.

6.25.1 Fixed Fields

In the following example, the fields are of a fixed width but are not separated by a field separator. The substr function is used to create fields.

Example 6.167.

% cat fixed
031291ax5633(408)987-0124
021589bg2435(415)866-1345
122490de1237(916)933-1234
010187ax3458(408)264-2546
092491bd9923(415)134-8900
112990bg4567(803)234-1456
070489qr3455(415)899-1426
% nawk '{printf substr($0,1,6)" ";printf substr($0,7,6)" ";\
print substr($0,13,length)}' fixed
031291 ax5633 (408)987-0124
021589 bg2435 (415)866-1345
122490 de1237 (916)933-1234
010187 ax3458 (408)264-2546
092491 bd9923 (415)134-8900
112990 bg4567 (803)234-1456
070489 qr3455 (415)899-1426

EXPLANATION

The first field is the substring of the record that starts at the first character and is 6 characters long. Next, a space is printed. The second field is the 6-character substring that starts at position 7, followed by a space. The last field is the substring that starts at position 13; because the length requested (the length of the whole line) is more than the number of characters remaining, substr returns everything through the end of the line. (The length function returns the length of the current line, $0, if it does not have an argument.)

Empty Fields

If the data is stored in fixed-width fields, it is possible that some of the fields are empty. In the following example, the substr function is used to preserve the fields, regardless of whether they contain data.

Example 6.168.

1  % cat db
   xxx xxx xxx abc xxx xxx a bbb xxx xx
   % cat awkfix
   # Preserving empty fields. Field width is fixed.
   {
2  f[1]=substr($0,1,3)
3  f[2]=substr($0,5,3)
4  f[3]=substr($0,9,3)
5  line=sprintf("%-4s%-4s%-4s", f[1], f[2], f[3])
6  print line
   }
   % nawk -f awkfix db
   xxx xxx xxx abc xxx xxx a bbb xxx xx

EXPLANATION

1 The db file holds records in three fixed-width columns, at character positions 1-3, 5-7, and 9-11. Some records have blank columns.
2-4 substr extracts each column by position: characters 1 through 3, 5 through 7, and 9 through 11. A blank column yields a string of blanks (or a shorter or null string at the end of a short line), so the field keeps its place instead of disappearing, as it would if awk split the line on whitespace.
5 sprintf formats the three fields, each left-justified in a field 4 characters wide, and assigns the result to the variable line. The format contains no newline because print adds one (the output record separator) when the line is printed.
6 The formatted line is printed.
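The substr calls in Examples 6.167 and 6.168 hard-code every column position. A minimal sketch of a more general approach, assuming a POSIX awk invoked as awk (nawk behaves the same way here), takes the column widths as a list and emits colon-separated fields, keeping blank columns as empty fields. The function name fixsplit and its arguments are hypothetical, not part of the text.

```shell
# fixsplit "W1 W2 ..." [FILE]
# Hypothetical helper: cut fixed-width records into colon-separated fields.
# Blank columns become empty fields; trailing padding blanks are trimmed.
fixsplit() {
    awk -v widths="$1" '
    BEGIN { n = split(widths, w, " ") }     # column widths, in order
    {
        pos = 1
        out = ""
        for (i = 1; i <= n; i++) {
            f = substr($0, pos, w[i])       # one column, by position
            sub(/ +$/, "", f)               # drop padding blanks
            out = out (i > 1 ? ":" : "") f
            pos += w[i]
        }
        print out
    }' "${2:--}"                            # read FILE, or stdin if none
}
```

For example, printf '031291ax5633(408)987-0124\n' | fixsplit "6 6 13" prints 031291:ax5633:(408)987-0124, and the record abc     xxx (middle column blank) split with "4 4 3" prints abc::xxx. Writing an explicit separator lets later awk commands use -F: without losing the empty fields.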
Numbers with $, Commas, or Other Characters

In the following example, the price field contains a dollar sign and a comma. The script must remove these characters before it can add up the prices to get the total cost. This is done with the gsub function.

Example 6.169.

% cat vendor
access tech:gp237221:220:vax789:20/20:11/01/90:$1,043.00
alisa systems:bp262292:280:macintosh:new updates:06/30/91:$456.00
alisa systems:gp262345:260:vax8700:alisa talk:02/03/91:$1,598.50
apple computer:zx342567:240:macs:e-mail:06/25/90:$575.75
caci:gp262313:280:sparc station:network11.5:05/12/91:$1,250.75
datalogics:bp132455:260:microvax2:pagestation maint:07/01/90:$1,200.00
dec:zx354612:220:microvax2:vms sms:07/20/90:$1,350.00
% nawk -F: '{gsub(/\$/,"");gsub(/,/,""); cost +=$7};\
END{print "The total is $" cost}' vendor
The total is $7474

EXPLANATION

The first gsub function globally replaces the literal dollar sign (\$) with the null string, and the second gsub function replaces every comma with the null string. Because no target is given, gsub operates on the whole record ($0), and awk then re-splits the record into fields, so $7 holds a plain number such as 1043.00. The user-defined variable cost accumulates the total: the seventh field is added to cost and the result is assigned back to cost. In the END block, the string The total is $ is printed, followed by the value of cost.[a]
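Example 6.169 runs gsub on the whole record, so dollar signs and commas are also stripped from every other field. A minimal sketch, assuming the amount always sits in one known field (field 7 in vendor), cleans only that field and prints the total with two decimal places. The function name total_cost is hypothetical.

```shell
# total_cost COLUMN [FILE...]
# Hypothetical helper: add up a currency field such as $1,043.00 in
# colon-separated records, leaving the other fields untouched.
total_cost() {
    col=$1
    shift
    awk -F: -v col="$col" '
    {
        amount = $col
        gsub(/[$,]/, "", amount)    # strip dollar signs and commas from this field only
        cost += amount
    }
    END { printf "The total is $%.2f\n", cost }' "$@"
}
```

Run as total_cost 7 vendor, this should print The total is $7474.00 for the data above. Working on a copy of the field, rather than on $0, means a comma inside a description field is never altered.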
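6.25.2 Multiline Records

In the sample data files used so far, each record is on a line by itself. In the following sample datafile, called checkbook, the records are separated by blank lines and the fields are separated by newlines. To process this file, the record separator (RS) is assigned a value of null, and the field separator (FS) is assigned the newline.

Example 6.170.

(The Input File)
% cat checkbook
1/1/04
#125
-695.00
Mortgage

1/1/04
#126
-56.89
PG&E

1/2/04
#127
-89.99
Safeway

1/3/04
+750.00
Paycheck

1/4/04
#128
-60.00
Visa

(The Script)
% cat awkchecker
1  BEGIN{RS=""; FS="\n";ORS="\n\n"}
2  {print NR, $1,$2,$3,$4}

(The Output)
% nawk -f awkchecker checkbook
1 1/1/04 #125 -695.00 Mortgage

2 1/1/04 #126 -56.89 PG&E

3 1/2/04 #127 -89.99 Safeway

4 1/3/04 +750.00 Paycheck

5 1/4/04 #128 -60.00 Visa

EXPLANATION

1 In the BEGIN block, the record separator (RS) is set to the null string, so records are separated by one or more blank lines. The field separator (FS) is set to a newline, so each line of a record is a field. The output record separator (ORS) is set to two newlines, so each printed record is followed by a blank line.
2 For each record, the record number (NR) and the first four fields are printed, separated by spaces. The fourth record (the paycheck deposit) has no check number, so it has only three fields and $4 is empty.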
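Not every record in checkbook has the same number of fields: the paycheck entry has no check number. A minimal sketch, assuming the amount is always the next-to-last line of a record and the payee the last line, keeps a running balance by counting fields from the end with $(NF-1) and $NF. The function name balance is hypothetical.

```shell
# balance [FILE...]
# Hypothetical helper: read blank-line-separated checkbook records
# (date, optional check number, amount, payee) and print a running balance.
balance() {
    awk 'BEGIN { RS = ""; FS = "\n" }     # paragraph mode, one field per line
    {
        amount = $(NF - 1)                # amount is next-to-last; check number may be missing
        bal += amount                     # "+750.00" and "-695.00" convert to numbers
        printf "%-10s %-10s %9.2f %9.2f\n", $1, $NF, amount, bal
    }' "$@"
}
```

Run as balance checkbook, each output line shows the date, payee, amount, and the balance so far. Indexing from NF keeps the script correct whether or not a record has a check number.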
6.25.3 Generating Form Letters

The following example is modified from a program in The AWK Programming Language.[4] The tricky part of this program is keeping track of what is actually being processed. The input file, data.form, contains just the data, with fields separated by colons. The other file, form.letter, is the actual form that will be used to create the letter. This file is read into awk's memory with getline, and each line of the form letter is stored in an array. The program gets its data from data.form, and the letter is created by substituting real data for the special strings beginning with # and @ (#1 through #4 and @date) found in form.letter. A temporary variable, temp, holds the actual line that will be displayed after the data has been substituted. This program lets you create a personalized form letter for each person listed in data.form.
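Example 6.171.

(The Awk Script)
% cat form.awk
   # form.awk is an awk script that requires access to 2 files: The
   # first file is called "form.letter." This file contains the
   # format for a form letter. The awk script uses another file,
   # "data.form," as its input file. This file contains the
   # information that will be substituted into the form letters in
   # place of the pound signs followed by numbers. Today's date
   # is substituted in place of "@date" in "form.letter."
1  BEGIN{ FS=":"; n=1
2     while((getline < "form.letter") > 0)
3        form[n++] = $0   # Store lines from form.letter in an array
4     "date" | getline d; split(d, today, " ")
         # Output of date is Fri Mar 2 14:35:50 PST 2004
5     thisday=today[2]". "today[3]", "today[6]
6  }
7  { for( i = 1; i < n; i++ ){
8        temp=form[i]
9        for ( j = 1; j <=NF; j++ ){ gsub("@date", thisday, temp)
10          gsub("#" j, $j , temp ) }
11       print temp } }

The form letter, form.letter, looks like this:

% cat form.letter
*********************************************************
Subject: Status Report for Project "#1"
To: #2
From: #3
Date: @date

This letter is to tell you, #2, that project "#1" is up to date.
We expect that everything will be completed and ready for
shipment as scheduled on #4.

Sincerely,
#3
**********************************************************

The file data.form is awk's input file. It contains the data that will replace #1 through #4 in form.letter; @date is replaced with today's date.

% cat data.form
Dynamo:John Stevens:Dana Smith, Mgr:4/12/2004
Gallactius:Guy Sterling:Dana Smith, Mgr:5/18/2004

(The Command Line)
% nawk -f form.awk data.form
*********************************************************
Subject: Status Report for Project "Dynamo"
To: John Stevens
From: Dana Smith, Mgr
Date: Mar. 2, 2004

This letter is to tell you, John Stevens, that project "Dynamo" is up to date.
We expect that everything will be completed and ready for
shipment as scheduled on 4/12/2004.

Sincerely,
Dana Smith, Mgr
**********************************************************
*********************************************************
Subject: Status Report for Project "Gallactius"
To: Guy Sterling
From: Dana Smith, Mgr
Date: Mar. 2, 2004

This letter is to tell you, Guy Sterling, that project "Gallactius" is up to date.
We expect that everything will be completed and ready for
shipment as scheduled on 5/18/2004.

Sincerely,
Dana Smith, Mgr
**********************************************************

EXPLANATION

1 In the BEGIN block, the field separator (FS) is set to a colon and the counter n is set to 1.
2 Each time through the loop, getline reads the next line of form.letter into $0. getline returns 1 while lines remain, 0 at end of file, and -1 on error, so the loop stops at end of file or on an error.
3 Each line of form.letter is stored in the array form, and n is incremented.
4 The output of the UNIX date command is piped to getline and assigned to the variable d. The split function breaks d into the array today, using whitespace as the separator.
5 thisday is assigned the month, day, and year, for example, Mar. 2, 2004.
6 This ends the BEGIN block.
7 For each line of data.form, the for loop steps through the stored lines of the form letter (n is one more than the number of lines stored).
8 Each stored line is copied into temp, so the line kept in the form array is never modified.
9 The inner loop runs once for each field of the current input line. gsub replaces @date in temp with thisday; this needs to happen only once per line, but repeating it is harmless.
10 gsub replaces #j, that is #1, #2, and so on, with the value of field j.
11 The line, with all substitutions made, is printed.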
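In form.awk, gsub treats its first argument as a regular expression and treats an & in the replacement as the matched text, so a data field such as AT&T would not be copied literally, and #1 would also match the start of #10 if a record had ten or more fields. A minimal sketch of an alternative, assuming the same colon-separated data and #n placeholders, substitutes plain strings with index and substr and works from the highest field number down. The names fill_form and replace_all are hypothetical, and the @date substitution is left out to keep the sketch short.

```shell
# fill_form TEMPLATE [DATAFILE...]
# Hypothetical variant of form.awk: replaces #1, #2, ... with the fields
# of each colon-separated data line, copying the field text literally.
fill_form() {
    template=$1
    shift
    awk -F: -v tmpl="$template" '
    # Replace every occurrence of the literal string from in s with to.
    function replace_all(s, from, to,    out, p) {
        out = ""
        while ((p = index(s, from)) > 0) {
            out = out substr(s, 1, p - 1) to
            s = substr(s, p + length(from))
        }
        return out s
    }
    BEGIN {
        while ((getline line < tmpl) > 0)   # load the template once
            form[++n] = line
        close(tmpl)
    }
    {
        for (i = 1; i <= n; i++) {
            temp = form[i]
            for (j = NF; j >= 1; j--)       # high numbers first, so #1 cannot clobber #10
                temp = replace_all(temp, "#" j, $j)
            print temp
        }
    }' "$@"
}
```

For example, with a template containing the lines To: #2 and From: #1, the data line AT&T:Guy Sterling produces To: Guy Sterling and From: AT&T, with the ampersand intact.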
6.25.4 Interaction with the Shell

Now that you have seen how awk works, you will find that awk is a very powerful utility when writing shell scripts. You can embed one-line awk commands or awk scripts within your shell scripts. The following is a sample of a Korn shell program with embedded awk commands.

Example 6.172.

   #!/bin/ksh
   # This Korn shell script will collect data for awk to use in
   # generating form letter(s). See above.
   print "Hello $LOGNAME. "
   print "This report is for the month and year:"
1  cal | nawk 'NR==1{print $0}'
   # Remove files left by a previous run. (Inside [[ ]] the ? in
   # formletter? is not expanded, so a file test there would not work.)
   rm -f data.form formletter? 2> /dev/null
   integer num=1
   while true
   do
      print "Form letter #$num:"
      read project?"What is the name of the project? "
      read sender?"Who is the status report from? "
      read recipient?"Who is the status report to? "
      read due_date?"What is the completion date scheduled? "
      echo "$project:$recipient:$sender:$due_date" > data.form
      print -n "Do you wish to generate another form letter? "
      read answer
      if [[ "$answer" != [Yy]* ]]
      then
         break
      else
2        nawk -f form.awk data.form > formletter$num
      fi
      (( num+=1 ))
   done
   nawk -f form.awk data.form > formletter$num

EXPLANATION

1 The output of cal is piped to nawk, which prints only the first line: the month and year.
2 nawk runs the form.awk script on data.form and redirects the letter to a file named formletter followed by the current number (formletter1, formletter2, and so on). When the user answers anything other than yes, the loop exits, and the last letter is generated by the nawk command that follows done.
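Example 6.172 writes data.form with echo, so an answer that contains a colon would silently shift the fields that form.awk sees. A minimal sketch, assuming the same field order as data.form (project, recipient, sender, due date), passes the shell values to awk with -v and refuses any value that contains the separator. The function name make_record is hypothetical.

```shell
# make_record PROJECT RECIPIENT SENDER DUE_DATE
# Hypothetical helper: print one data.form record, or fail if any
# value contains the colon used as the field separator.
make_record() {
    awk -v project="$1" -v to="$2" -v from="$3" -v due="$4" 'BEGIN {
        if ((project to from due) ~ /:/) {
            print "make_record: values must not contain a colon" > "/dev/stderr"
            exit 1
        }
        OFS = ":"                 # same separator form.awk expects
        print project, to, from, due
    }'
}
```

For example, make_record Dynamo "John Stevens" "Dana Smith, Mgr" 4/12/2004 > data.form writes the first record shown in Example 6.171. Note that -v assignments process backslash escape sequences in the value; reading the values from awk's ENVIRON array instead avoids that if answers may contain backslashes.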