3.1. Regular Expressions

3.1.1 Definition and Example

For users already familiar with the concept of regular expression metacharacters, this section may be bypassed. However, this preliminary material is crucial to understanding the variety of ways in which grep, sed, and awk are used to display and manipulate data.

What is a regular expression? A regular expression[1] is just a pattern of characters used to match the same characters in a search. In most programs, a regular expression is enclosed in forward slashes; for example, /love/ is a regular expression delimited by forward slashes, and the pattern love will be matched any time the same pattern is found in the line being searched. What makes regular expressions interesting is that they can be controlled by special metacharacters.

If you are new to the idea of regular expressions, let us look at an example that will help you understand what this whole concept is about. Suppose that you are working in the vi editor on an e-mail message to your friend. It looks like this:
% vi letter
------------------------------------------------------------------
Hi tom,
I think I failed my anatomy test yesterday. I had a terrible
stomachache. I ate too many fried green tomatoes.
Anyway, Tom, I need your help. I'd like to make the test up
tomorrow, but don't know where to begin studying. Do you
think you could help me? After work, about 7 PM, come to
my place and I'll treat you to pizza in return for your help. Thanks.
Your pal,
guy@phantom
~
~
~
~
----------------------------------------------------------------------

Now, suppose you find out that Tom never took the test either, but David did. You also notice that in the greeting, you spelled Tom with a lowercase t. So you decide to make a global substitution to replace all occurrences of tom with David, as follows:
% vi letter
------------------------------------------------------------------
Hi David,
I think I failed my anaDavidy test yesterday. I had a terrible
sDavidachache. I think I ate too many fried green Davidatoes.
Anyway, Tom, I need your help. I'd like to make the test up
Davidorrow, but don't know where to begin studying. Do you
think you could help me? After work, about 7 PM, come to
my place and I'll treat you to pizza in return for your help. Thanks.
Your pal,
guy@phanDavid
~
~
~
--> :1,$s/tom/David/g
----------------------------------------------------------------------
The regular expression in the search string is tom. The replacement string is David. The vi command reads "for lines 1 to the end of the file ($), substitute tom everywhere it is found on each line and replace it with David." Hardly what you want! Every tom embedded in a longer word (anatomy, stomachache, tomatoes, tomorrow, phantom) was replaced as well. And one of the occurrences of Tom was untouched because you only asked for tom, not Tom, to be replaced with David. So what to do? Enter the regular expression metacharacters.

3.1.2 Regular Expression Metacharacters

Metacharacters are characters that represent something other than themselves. The two types of metacharacters that you will learn about in this book are shell metacharacters and regular expression metacharacters. They serve different purposes.

Shell metacharacters are evaluated by the UNIX/Linux shell. For example, when you use the command rm *, the asterisk is a shell metacharacter, called a wildcard, and is evaluated by the shell to mean "Match on all filenames in the current working directory." The shell metacharacters are described for the shells in their respective chapters.

Regular expression metacharacters are evaluated by the programs that perform pattern matching, such as vi, grep, sed, and awk.[2] They are special characters that allow you to delimit a pattern in some way so that you can control what substitutions will take place. There are metacharacters to anchor a word to the beginning or end of a line. There are metacharacters that allow you to specify any characters, or some number of characters, to find both upper- and lowercase characters, digits only, and so forth. For example, to change the name tom or Tom to David, the following vi command would have done the job:
:1,$s/\<[Tt]om\>/David/g

This command reads, "From the first line to the last line of the file (1,$), substitute (s) the word Tom or tom with David," and the g flag says to do this globally (i.e., make the substitution if it occurs more than once on the same line). The regular expression metacharacters are \< and \> for the beginning and end of a word, and the pair of brackets, [Tt], which matches one of the characters enclosed within them (in this case, either T or t).

There are five basic metacharacters that all UNIX/Linux pattern-matching utilities recognize. Table 3.1 presents regular expression metacharacters that can be used in all versions of vi, ex, grep, egrep, sed, and awk. Additional metacharacters are described for each of the utilities where applicable.
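The difference between the two substitutions can also be tried outside of vi. The following is a minimal sketch in Python, not the vi command itself: it assumes Python's re module as a stand-in for vi's pattern matcher, and because re has no \< and \> it uses the word-boundary assertion \b as an approximation of both. The variable names (letter, naive, fixed) are illustrative only, and the letter is shortened.

```python
import re

# A shortened copy of the letter from the example above.
letter = """Hi tom,
I think I failed my anatomy test yesterday. I had a terrible
stomachache. I ate too many fried green tomatoes.
Anyway, Tom, I need your help. I'd like to make the test up
tomorrow, but don't know where to begin studying.
Your pal,
guy@phantom"""

# Naive pattern: replaces tom wherever it occurs, even inside other words,
# and misses the capitalized Tom.
naive = re.sub(r"tom", "David", letter)

# Word-bounded pattern: \b stands in for vi's \< and \> (an assumption of
# this sketch); [Tt] accepts either case of the first letter.
fixed = re.sub(r"\b[Tt]om\b", "David", letter)

print(naive.splitlines()[1])  # I think I failed my anaDavidy test yesterday. I had a terrible
print(fixed.splitlines()[0])  # Hi David,
print(fixed.splitlines()[3])  # Anyway, David, I need your help. I'd like to make the test up
```

With the word-bounded pattern, anatomy, stomachache, tomatoes, tomorrow, and phantom are left alone, because in each of them tom is surrounded on at least one side by another letter.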
Assuming that you know how the vi editor works, each metacharacter is described in terms of the vi search string. In the following examples, characters are highlighted to demonstrate what vi will find in its search.

Example 3.1. (A simple regular expression search)

% vi picnic
----------------------------------------------------------------
I had a lovely time on our little picnic.
Lovers were all around us. It is springtime. Oh
love, how much I adore you. Do you know
the extent of my love? Oh, by the way, I think
I lost my gloves somewhere out in that field of
clover. Did you see them? I can only hope love
is forever. I live for you. It's hard to get back in the
groove.
~
~
~
/love/
----------------------------------------------------------------

EXPLANATION

The regular expression is love. The pattern love is found by itself and as part of other words, such as lovely, gloves, and clover.

Example 3.2. (The beginning-of-line anchor (^))

% vi picnic
----------------------------------------------------------------
I had a lovely time on our little picnic.
Lovers were all around us. It is springtime. Oh
love, how much I adore you. Do you know
the extent of my love? Oh, by the way, I think
I lost my gloves somewhere out in that field of
clover. Did you see them? I can only hope love
is forever. I live for you. It's hard to get back in the
groove.
~
~
~
/^love/
----------------------------------------------------------------

EXPLANATION

The caret (^) is called the beginning-of-line anchor. Vi will find only those lines where the regular expression love is matched at the beginning of the line, i.e., love is the first set of characters on the line; it cannot be preceded by even one space.

Example 3.3. (The end-of-line anchor ($))

% vi picnic
----------------------------------------------------------------
I had a lovely time on our little picnic.
Lovers were all around us. It is springtime. Oh
love, how much I adore you. Do you know
the extent of my love?
Oh, by the way, I think
I lost my gloves somewhere out in that field of
clover. Did you see them? I can only hope love
is forever. I live for you. It's hard to get back in the
groove.
~
~
~
/love$/
----------------------------------------------------------------

EXPLANATION

The dollar sign ($) is called the end-of-line anchor. Vi will find only those lines where the regular expression love is matched at the end of the line, i.e., love is the last set of characters on the line and is directly followed by a newline.

Example 3.4. (Any single character (.))

% vi picnic
----------------------------------------------------------------
I had a lovely time on our little picnic.
Lovers were all around us. It is springtime. Oh
love, how much I adore you. Do you know
the extent of my love? Oh, by the way, I think
I lost my gloves somewhere out in that field of
clover. Did you see them? I can only hope love
is forever. I live for you. It's hard to get back in the
groove.
~
~
~
/l.ve/
----------------------------------------------------------------

EXPLANATION

The dot (.) matches any one character, except the newline. Vi will find those lines where the regular expression consists of an l, followed by any single character, followed by a v and an e. It finds combinations of love and live.

Example 3.5. (Zero or more of the preceding character (*))

% vi picnic
----------------------------------------------------------------
I had a lovely time on our little picnic.
Lovers were all around us. It is springtime. Oh
love, how much I adore you. Do you know
the extent of my love? Oh, by the way, I think
I lost my gloves somewhere out in that field of
clover. Did you see them? I can only hope love
is forever. I live for you. It's hard to get back in the
groove.
~
~
~
/o*ve/
----------------------------------------------------------------

EXPLANATION

The asterisk (*) matches zero or more of the preceding character.[a] It is as though the asterisk were glued to the character directly before it and controls only that character. In this case, the asterisk is glued to the letter o. It matches for only the letter o and as many consecutive occurrences of the letter o as there are in the pattern, even no occurrences of o at all. Vi searches for zero or more occurrences of the letter o followed by a v and an e, finding love, loooove, lve, and so forth.
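The five searches in Examples 3.1 through 3.5 can be replayed in a short script. This is a sketch under stated assumptions rather than a reproduction of vi: Python's re module stands in for vi's search (the two agree on the syntax of ^, $, ., and *), the line breaks of the picnic file are assumed, chosen so that one line begins with love and one ends with love, and the helper matching_lines is a hypothetical name introduced here.

```python
import re

# The picnic file, one list item per line (line breaks are an assumption).
picnic = [
    "I had a lovely time on our little picnic.",
    "Lovers were all around us. It is springtime. Oh",
    "love, how much I adore you. Do you know",
    "the extent of my love? Oh, by the way, I think",
    "I lost my gloves somewhere out in that field of",
    "clover. Did you see them? I can only hope love",
    "is forever. I live for you. It's hard to get back in the",
    "groove.",
]

def matching_lines(pattern):
    """Return the 1-based numbers of the lines that contain a match."""
    regex = re.compile(pattern)
    return [n for n, line in enumerate(picnic, start=1) if regex.search(line)]

print(matching_lines(r"love"))   # [1, 3, 4, 5, 6]
print(matching_lines(r"^love"))  # [3]
print(matching_lines(r"love$"))  # [6]
print(matching_lines(r"l.ve"))   # [1, 3, 4, 5, 6, 7]
print(matching_lines(r"o*ve"))   # [1, 2, 3, 4, 5, 6, 7, 8]

# The asterisk takes as many o's as are available before the v and e.
print(re.search(r"o*ve", "groove").group())  # oove
```

Note that o*ve matches on every line: because the asterisk allows zero occurrences of o, any ve is enough, including the one in forever and the one in live.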
Example 3.6. (A set of characters ([]))

% vi picnic
----------------------------------------------------------------
I had a lovely time on our little picnic.
Lovers were all around us. It is springtime. Oh
love, how much I adore you. Do you know
the extent of my love? Oh, by the way, I think
I lost my gloves somewhere out in that field of
clover. Did you see them? I can only hope love
is forever. I live for you. It's hard to get back in the
groove.
~
~
~
/[Ll]ove/
----------------------------------------------------------------

EXPLANATION

The square brackets match for one of a set of characters. Vi will search for the regular expression containing either an uppercase or lowercase l followed by an o, v, and e.

Example 3.7. (A range of characters ([-]))

% vi picnic
----------------------------------------------------------------
I had a lovely time on our little picnic.
Lovers were all around us. It is springtime. Oh
love, how much I adore you. Do you know
the extent of my love? Oh, by the way, I think
I lost my gloves somewhere out in that field of
clover. Did you see them? I can only hope love
is forever. I live for you. It's hard to get back in the
groove.
~
~
~
/ove[a-z]/
----------------------------------------------------------------

EXPLANATION

The dash between characters enclosed in square brackets matches one character in a range of characters. Vi will search for the regular expression containing an o, v, and e, followed by any character in the ASCII range between a and z. Since this is an ASCII range, the range cannot be represented as [z-a].

Example 3.8. (Not one of the characters in the set ([^]))

% vi picnic
----------------------------------------------------------------
I had a lovely time on our little picnic.
Lovers were all around us. It is springtime. Oh
love, how much I adore you. Do you know
the extent of my love? Oh, by the way, I think
I lost my gloves somewhere out in that field of
clover. Did you see them? I can only hope love
is forever.
I live for you. It's hard to get back in the
groove.
~
~
~
/ove[^a-zA-Z0-9]/
----------------------------------------------------------------

EXPLANATION

The caret inside square brackets is a negation metacharacter. Vi will search for the regular expression containing an o, v, and e, followed by any character not in the ASCII range between a and z, not in the range between A and Z, and not a digit between 0 and 9. For example, it will find ove followed by a comma, a space, a period, and so on, because those characters are not in the set.
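The three bracket forms from Examples 3.6 through 3.8 can be compared side by side by listing exactly what each pattern matches. The sketch below again assumes Python's re module in place of vi, and it joins the picnic text into a single string (so a word that ends a line in the vi display is followed by a space here). The name reversed_range_ok is illustrative only.

```python
import re

# The picnic text as one flat string.
picnic = " ".join([
    "I had a lovely time on our little picnic.",
    "Lovers were all around us. It is springtime. Oh",
    "love, how much I adore you. Do you know",
    "the extent of my love? Oh, by the way, I think",
    "I lost my gloves somewhere out in that field of",
    "clover. Did you see them? I can only hope love",
    "is forever. I live for you. It's hard to get back in the",
    "groove.",
])

# A set: either L or l, then ove.
print(re.findall(r"[Ll]ove", picnic))
# ['love', 'Love', 'love', 'love', 'love', 'love', 'love']

# A range: ove followed by one lowercase letter.
print(re.findall(r"ove[a-z]", picnic))
# ['ovel', 'over', 'oves', 'over']

# A negated set: ove followed by one character that is not a letter or digit.
print(re.findall(r"ove[^a-zA-Z0-9]", picnic))
# ['ove,', 'ove?', 'ove ', 'ove.']

# A reversed range is rejected; Python reports it as a pattern error.
try:
    re.compile(r"ove[z-a]")
    reversed_range_ok = True
except re.error:
    reversed_range_ok = False
print(reversed_range_ok)  # False
```

The second and third patterns partition the occurrences of ove between them: each one is followed either by a lowercase letter or by punctuation or a space, never both.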