
4.9. grep -E or egrep (GNU Extended grep)

The main advantage of using extended grep is that additional regular expression metacharacters (see Table 4.10) have been added to the basic set. With the -E option, GNU grep recognizes these new metacharacters.

Table 4.10. egrep's Regular Expression Metacharacters

Metacharacter  Function                                 Example         What It Matches
^              Beginning-of-line anchor                 ^love           Lines beginning with love.
$              End-of-line anchor                       love$           Lines ending with love.
.              Matches any one character                l..e            Lines containing an l, followed by any two characters, followed by an e.
*              Zero or more of the preceding character  " *love"        Lines containing zero or more spaces followed by love (the pattern starts with a space).
[ ]            Matches one character in the set         [Ll]ove         Lines containing love or Love.
[^ ]           Matches one character not in the set     [^A-KM-Z]ove    Lines containing one character outside the ranges A-K and M-Z, followed by ove (e.g., love or Love).

New with grep -E or egrep

+              One or more of the preceding character   [a-z]+ove       One or more lowercase letters followed by ove; would find move, approve, love, behoove, etc.
?              Zero or one of the preceding character   lo?ve           An l followed by zero or one o, followed by ve; would find love or lve.
a|b            Matches either a or b                    love|hate       Either expression, love or hate.
( )            Groups characters                        love(able|ly)   loveable or lovely.
                                                        (ov)+           One or more occurrences of ov.
x{m}           Repetition of x, exactly m times         o{5}            5 consecutive occurrences of o.
x{m,}          Repetition of x, at least m times        o{5,}           At least 5 consecutive occurrences of o.
x{m,n}[a]      Repetition of x, between m and n times   o{5,10}         Between 5 and 10 consecutive occurrences of o (basic grep: o\{5,10\}).
\w             Word character; [a-zA-Z0-9_]             l\w*e           An l followed by zero or more word characters, followed by an e.
\W             Non-word character; [^a-zA-Z0-9_]        \W\w*           A non-word character (\W) followed by zero or more word characters (\w).
\b             Word boundary                            \blove\b        Only the word love.

[a] The { } metacharacters are not supported on all versions of UNIX or all pattern-matching utilities; they usually work with vi and grep. They don't work with UNIX egrep at all.
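Several rows of Table 4.10 can be tried directly at the shell. The script below is a minimal sketch that assumes GNU grep and the C locale. The input words are invented for illustration, and the anchors in the first pattern were added only so that the result is exact.

```shell
#!/bin/sh
# Sketch: try several Table 4.10 metacharacters with GNU grep -E.
# The input words are made up for illustration.
export LC_ALL=C

# ?   zero or one of the preceding character (anchored to whole lines)
q=$(printf 'love\nlve\nloove\n' | grep -E '^lo?ve$')
echo "$q"        # love and lve

# +   one or more of the preceding character ("ove" has no letter before it)
p=$(printf 'move\nove\napprove\n' | grep -E '[a-z]+ove')
echo "$p"        # move and approve

# { } count the lines that contain at least 5 o's in a row
n=$(printf 'ooooo\noooo\nooooooo\n' | grep -cE 'o{5,}')
echo "$n"        # 2

# \b  love only as a whole word
w=$(printf 'love\nlovely\nin love\n' | grep -E '\blove\b')
echo "$w"        # love and in love
```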

4.9.1 grep -E and egrep Examples

The following examples illustrate how the extended set of regular expression metacharacters is used with grep -E and egrep. The grep examples presented earlier illustrate the standard metacharacters, which egrep also recognizes. With basic GNU grep (grep -G), you can use any of the additional metacharacters, provided that each one is preceded by a backslash.
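To check that claim, the sketch below runs one alternation twice: once in extended syntax and once in backslashed basic syntax. It then compares the two results. It assumes GNU grep, because the \| form is a GNU extension, and it writes a small hypothetical file, sample.txt, to the current directory.

```shell
#!/bin/sh
# Sketch: one alternation, two spellings. Assumes GNU grep.
export LC_ALL=C
printf 'NW\nWE\nEA\n' > sample.txt        # hypothetical three-line input

ere=$(grep -E 'NW|EA' sample.txt)          # extended syntax: bare |
bre=$(grep 'NW\|EA' sample.txt)            # basic syntax: backslashed \|

# egrep 'NW|EA' sample.txt selects the same lines, but GNU grep 3.8 and
# later warn that egrep is obsolescent, so grep -E is used here instead.

if [ "$ere" = "$bre" ]; then
    echo "both forms printed:"
    echo "$ere"                            # NW and EA
fi
```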

Most of the following examples show all three variants of grep performing the same task.

The examples in this section use the following datafile, repeated periodically for your convenience.

% cat datafile

northwest       NW      Charles Main            3.0     .98     3       34
western         WE      Sharon Gray             5.3     .97     5       23
southwest       SW      Lewis Dalsass           2.7     .8      2       18
southern        SO      Suan Chin               5.1     .95     4       15
southeast       SE      Patricia Hemenway       4.0     .7      4       17
eastern         EA      TB Savage               4.4     .84     5       20
northeast       NE      AM Main Jr.             5.1     .94     3       13
north           NO      Margot Weber            4.5     .89     5       9
central         CT      Ann Stephens            5.7     .94     5       13
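To follow along, you can recreate datafile with a here document. This is a sketch only. The columns below are separated by spaces, whereas the original file may use tabs, so column-sensitive output may look slightly different.

```shell
#!/bin/sh
# Sketch: recreate the sample datafile used throughout this section.
# Fields are separated by spaces here; the original may use tabs.
cat > datafile <<'EOF'
northwest    NW   Charles Main        3.0  .98  3  34
western      WE   Sharon Gray         5.3  .97  5  23
southwest    SW   Lewis Dalsass       2.7  .8   2  18
southern     SO   Suan Chin           5.1  .95  4  15
southeast    SE   Patricia Hemenway   4.0  .7   4  17
eastern      EA   TB Savage           4.4  .84  5  20
northeast    NE   AM Main Jr.         5.1  .94  3  13
north        NO   Margot Weber        4.5  .89  5  9
central      CT   Ann Stephens        5.7  .94  5  13
EOF
wc -l < datafile     # 9 lines
```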


Example 4.41.

1   % egrep 'NW|EA' datafile

    northwest         NW        Charles Main            3.0   .98     3    34

    eastern           EA        TB Savage               4.4   .84     5    20



2   % grep -E 'NW|EA' datafile

    northwest         NW        Charles Main            3.0   .98     3    34

    eastern           EA        TB Savage               4.4   .84     5    20



3   % grep 'NW|EA' datafile



4   % grep 'NW\|EA' datafile

    northwest         NW        Charles Main            3.0   .98     3    34

    eastern           EA        TB Savage               4.4   .84     5    20


EXPLANATION

  1. Prints the line if it contains either the expression NW or the expression EA. In this example, egrep is used. If you do not have the GNU version of grep, use egrep.

  2. In this example, GNU grep is used with the -E option to enable the extended metacharacters. This is the same as egrep.

  3. Regular grep does not normally support extended regular expressions; the vertical bar is an extended regular expression metacharacter used for alternation. Regular grep doesn't recognize it and searches for the explicit pattern 'NW|EA'. Nothing matches; nothing prints.

  4. With GNU regular grep (grep -G), a backslash before the vertical bar (\|) makes it the alternation metacharacter, just as the unescaped | is with egrep and grep -E. This is a GNU extension, so it may not work with other versions of grep.
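The four cases above can be summarized by counting matching lines. The sketch below is a minimal version that assumes GNU grep and uses a hypothetical cut-down copy of datafile, mini.txt, that holds only the first two columns of three lines.

```shell
#!/bin/sh
# Sketch: Example 4.41 on a cut-down copy of datafile. Assumes GNU grep.
export LC_ALL=C
printf 'northwest NW\nwestern WE\neastern EA\n' > mini.txt

c_ere=$(grep -cE 'NW|EA' mini.txt)    # | is alternation in an ERE
c_lit=$(grep -c 'NW|EA' mini.txt)     # | is an ordinary character in a BRE
c_gnu=$(grep -c 'NW\|EA' mini.txt)    # GNU extension: \| is alternation in a BRE
echo "$c_ere $c_lit $c_gnu"           # 2 0 2
```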

% cat datafile

northwest       NW      Charles Main            3.0     .98     3       34
western         WE      Sharon Gray             5.3     .97     5       23
southwest       SW      Lewis Dalsass           2.7     .8      2       18
southern        SO      Suan Chin               5.1     .95     4       15
southeast       SE      Patricia Hemenway       4.0     .7      4       17
eastern         EA      TB Savage               4.4     .84     5       20
northeast       NE      AM Main Jr.             5.1     .94     3       13
north           NO      Margot Weber            4.5     .89     5       9
central         CT      Ann Stephens            5.7     .94     5       13


Example 4.42.

% egrep '3+' datafile

% grep -E '3+' datafile

% grep '3\+' datafile

northwest             NW       Charles Main           3.0   .98    3     34

western               WE       Sharon Gray            5.3   .97    5     23

northeast             NE       AM Main Jr.            5.1   .94    3     13

central               CT       Ann Stephens           5.7   .94    5     13


EXPLANATION

Prints all lines containing one or more 3s.
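Because grep prints whole lines, '3+' selects exactly the same lines as a plain '3'. The difference appears only when you look at the matched text itself. Here is a minimal sketch using GNU grep's -o option, which prints only the matching part of each line, on a made-up input line.

```shell
#!/bin/sh
# Sketch: the + quantifier changes what matches, not which lines print.
# Assumes GNU grep (-o prints each match on its own line).
export LC_ALL=C
m=$(printf 'x333y\n' | grep -oE '3+')
s=$(printf 'x333y\n' | grep -o '3')
echo "$m"    # 333: one match covering the whole run
echo "$s"    # 3 printed three times: three separate matches
```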

Example 4.43.

% egrep '2\.?[0-9]' datafile

% grep -E '2\.?[0-9]' datafile

% grep '2\.\?[0-9]' datafile

western               WE       Sharon Gray            5.3   .97    5     23

southwest             SW       Lewis Dalsass          2.7   .8     2     18

eastern               EA       TB Savage              4.4   .84    5     20


EXPLANATION

Prints all lines containing a 2, followed by zero or one period, followed by a single digit in the range 0 through 9.
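To see what the optional period matches, you can run the pattern on a few values taken from datafile. This minimal sketch assumes GNU grep's -o option. The trailing "2." line is a made-up case that shows the digit after the period is still required.

```shell
#!/bin/sh
# Sketch: what 2\.?[0-9] matches in a few sample fields. Assumes GNU grep.
export LC_ALL=C
out=$(printf '2.7\n23\n20\n2.\n' | grep -oE '2\.?[0-9]')
echo "$out"   # 2.7, 23, and 20; "2." has no digit after the period
```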

Example 4.44.

% egrep '(no)+' datafile

% grep -E '(no)+' datafile

% grep '\(no\)\+' datafile

northwest             NW       Charles Main             3.0   .98    3     34

northeast             NE       AM Main Jr.              5.1   .94    3     13

north                 NO       Margot Weber             4.5   .89    5      9


EXPLANATION

Prints lines containing one or more occurrences of the pattern group no.
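Grouping is what makes + apply to the two-character sequence no instead of only to the o. The sketch below assumes GNU grep's -o option and a made-up input string. It prints the matches each pattern finds.

```shell
#!/bin/sh
# Sketch: (no)+ repeats the group; no+ repeats only the o. Assumes GNU grep.
export LC_ALL=C
grouped=$(echo 'nonono noo' | grep -oE '(no)+')
single=$(echo 'nonono noo' | grep -oE 'no+')
echo "$grouped"   # nonono, then no
echo "$single"    # no, no, no, then noo
```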

Example 4.45.

% grep -E '\w+\W+[ABC]' datafile

northwest             NW       Charles Main            3.0   .98    3     34

southern              SO       Suan Chin               5.1   .95    4     15

northeast             NE       AM Main Jr.             5.1   .94    3     13

central               CT       Ann Stephens            5.7   .94    5     13


EXPLANATION

Prints all lines containing one or more word characters (\w+), followed by one or more non-word characters (\W+), followed by one letter in the set A, B, or C.
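\w and \W are GNU extensions. If a script must also run elsewhere, the POSIX bracket expressions below are the usual portable spelling of the same classes. This minimal sketch assumes GNU grep and the C locale, and it uses a hypothetical two-line file, words.txt, so that both spellings can be compared.

```shell
#!/bin/sh
# Sketch: GNU \w/\W versus POSIX bracket expressions. Assumes GNU grep.
export LC_ALL=C
printf 'northwest NW Charles Main\nwestern WE Sharon Gray\n' > words.txt
gnu=$(grep -E '\w+\W+[ABC]' words.txt)
posix=$(grep -E '[[:alnum:]_]+[^[:alnum:]_]+[ABC]' words.txt)
echo "$gnu"                          # only the northwest line
[ "$gnu" = "$posix" ] && echo "same"
```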

Example 4.46.

% egrep 'S(h|u)' datafile

% grep -E 'S(h|u)' datafile

% grep 'S\(h\|u\)' datafile

western               WE       Sharon Gray           5.3   .97    5     23

southern              SO       Suan Chin             5.1   .95    4     15


EXPLANATION

Prints all lines containing S, followed by either h or u; i.e., Sh or Su.

Example 4.47.

% egrep 'Sh|u' datafile

% grep -E 'Sh|u' datafile

% grep 'Sh\|u' datafile

western               WE       Sharon Gray            5.3   .97    5     23

southwest             SW       Lewis Dalsass          2.7   .8     2     18

southern              SO       Suan Chin              5.1   .95    4     15

southeast             SE       Patricia Hemenway      4.0   .7     4     17


EXPLANATION

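Prints all lines containing the expression Sh or u. Because alternation has the lowest precedence of the regular expression operators, Sh|u means Sh or u, not S followed by h or u as in Example 4.46.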
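Examples 4.46 and 4.47 differ only in the parentheses. The difference is easiest to see when each pattern prints exactly what it matched. This minimal sketch assumes GNU grep's -o option and uses a made-up input string.

```shell
#!/bin/sh
# Sketch: S(h|u) versus Sh|u. Assumes GNU grep.
export LC_ALL=C
a=$(echo 'Sharon Suan sun' | grep -oE 'S(h|u)')
b=$(echo 'Sharon Suan sun' | grep -oE 'Sh|u')
echo "$a"    # Sh, then Su
echo "$b"    # Sh, then u (from Suan), then u (from sun)
```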

4.9.2 Anomalies with Regular and Extended Variants of grep

The variants of GNU grep supported by Linux are almost, but not quite, the same as their UNIX namesakes. For example, the version of egrep found in Solaris or BSD UNIX does not support three metacharacter sets: \{ \} for repetition, \( \) for tagging characters, and \< \>, the word anchors. Under Linux, all three are available with grep, grep -E, and GNU egrep; with grep -E and egrep, the braces and parentheses are written without backslashes. The following examples illustrate these differences, in case you are running bash or tcsh on a UNIX system other than Linux and want to use grep and its family in your shell scripts.

The examples in this section use the following datafile, repeated periodically for your convenience.

% cat datafile

northwest       NW      Charles Main            3.0     .98     3       34
western         WE      Sharon Gray             5.3     .97     5       23
southwest       SW      Lewis Dalsass           2.7     .8      2       18
southern        SO      Suan Chin               5.1     .95     4       15
southeast       SE      Patricia Hemenway       4.0     .7      4       17
eastern         EA      TB Savage               4.4     .84     5       20
northeast       NE      AM Main Jr.             5.1     .94     3       13
north           NO      Margot Weber            4.5     .89     5       9
central         CT      Ann Stephens            5.7     .94     5       13


Example 4.48.

(Linux GNU grep)

1   % grep '<north>' datafile    # Must use backslashes



2   % grep '\<north\>' datafile

    north             NO       Margot Weber            4.5   .89    5     9



3   % grep -E '\<north\>' datafile

    north             NO       Margot Weber            4.5   .89    5     9



4   % egrep '\<north\>' datafile

    north             NO       Margot Weber            4.5   .89    5     9



(Solaris egrep)

5   % egrep '\<north\>' datafile

    <no output; not recognized>


EXPLANATION

  1. No matter what variant of grep is being used, the word anchor metacharacters, < >, must be preceded by a backslash.

  2. This time, grep searches for north as a complete word. \< represents the beginning-of-word anchor and \> represents the end-of-word anchor, so northwest and northeast do not match.

  3. Grep with the -E option also recognizes the word anchors.

  4. The GNU form of egrep recognizes the word anchors.

  5. When using Solaris (SVR4), egrep does not recognize word anchors as regular expression metacharacters.
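GNU grep offers three ways to say "north as a whole word", and none of them is required by POSIX. The sketch below assumes GNU grep and a hypothetical file, regions.txt. It checks that \< \>, \b, and the -w option select the same line. Note that -w treats letters, digits, and the underscore as word characters.

```shell
#!/bin/sh
# Sketch: three GNU spellings of a whole-word match. Assumes GNU grep.
export LC_ALL=C
printf 'northwest NW\nnortheast NE\nnorth NO\n' > regions.txt
a=$(grep '\<north\>' regions.txt)
b=$(grep '\bnorth\b' regions.txt)
c=$(grep -w 'north' regions.txt)
echo "$a"     # north NO
[ "$a" = "$b" ] && [ "$b" = "$c" ] && echo "all agree"
```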

Example 4.49.

(Linux GNU grep)

1   % grep 'w(es)t.*\1' datafile

    grep: Invalid back reference



2   % grep 'w\(es\)t.*\1' datafile

    northwest         NW       Charles Main           3.0   .98    3     34



3   % grep -E 'w(es)t.*\1' datafile

    northwest         NW       Charles Main           3.0   .98    3     34



4   % egrep 'w(es)t.*\1' datafile

    northwest         NW       Charles Main           3.0   .98    3     34



(Solaris egrep)

5   % egrep 'w(es)t.*\1' datafile

    <no output; not recognized>


EXPLANATION

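  1. When using regular grep, the ( ) metacharacters must be backslashed. Without backslashes, the parentheses are treated as literal characters, so \1 refers to a group that does not exist and grep reports an error.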

  2. If the regular expression, w\(es\)t, is matched, the pattern, es, is saved and stored in memory register 1. The expression reads: if west is found, tag and save es, search for any number of characters (.*) after it, followed by es (\1) again, and print the line. The es in Charles is matched by the backreference.

  3. This is the same as the previous example, except that with grep -E the ( ) are not preceded by backslashes.

  4. The GNU egrep also uses the extended metacharacters, ( ), without backslashes.

  5. With Solaris, egrep doesn't recognize any form of tagging and backreferencing.
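A common practical use of tagging and backreferences is finding a doubled word. Here is a minimal sketch that assumes GNU grep: POSIX defines \1 only for basic regular expressions, so its use with grep -E is a GNU extension. The sample file, text.txt, is made up.

```shell
#!/bin/sh
# Sketch: find a doubled word with a backreference. Assumes GNU grep.
export LC_ALL=C
printf 'the the cat\nthe cat sat\n' > text.txt
ere=$(grep -E '\<([a-z]+) \1\>' text.txt)            # extended: bare ( )
bre=$(grep '\<\([a-z][a-z]*\) \1\>' text.txt)        # basic: \( \)
echo "$ere"    # the the cat
```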

% cat datafile

northwest       NW      Charles Main            3.0     .98     3       34
western         WE      Sharon Gray             5.3     .97     5       23
southwest       SW      Lewis Dalsass           2.7     .8      2       18
southern        SO      Suan Chin               5.1     .95     4       15
southeast       SE      Patricia Hemenway       4.0     .7      4       17
eastern         EA      TB Savage               4.4     .84     5       20
northeast       NE      AM Main Jr.             5.1     .94     3       13
north           NO      Margot Weber            4.5     .89     5       9
central         CT      Ann Stephens            5.7     .94     5       13


Example 4.50.

(Linux GNU grep)

1   % grep '\.[0-9]\{2\}[^0-9]' datafile

    northwest       NW       Charles Main    3.0     .98     3       34

    western         WE       Sharon Gray     5.3     .97     5       23

    southern        SO       Suan Chin       5.1     .95     4       15

    eastern         EA       TB Savage       4.4     .84     5       20

    northeast       NE       AM Main Jr.     5.1     .94     3       13

    north           NO       Margot Weber    4.5     .89     5        9

    central         CT       Ann Stephens    5.7     .94     5       13



2   % grep -E '\.[0-9]{2}[^0-9]' datafile

    northwest       NW       Charles Main    3.0     .98     3       34

    western         WE       Sharon Gray     5.3     .97     5       23

    southern        SO       Suan Chin       5.1     .95     4       15

    eastern         EA       TB Savage       4.4     .84     5       20

    northeast       NE       AM Main Jr.     5.1     .94     3       13

    north           NO       Margot Weber    4.5     .89     5        9

    central         CT       Ann Stephens    5.7     .94     5       13



3   % egrep  '\.[0-9]{2}[^0-9]' datafile

    northwest       NW       Charles Main    3.0     .98     3       34

    western         WE       Sharon Gray     5.3     .97     5       23

    southern        SO       Suan Chin       5.1     .95     4       15

    eastern         EA       TB Savage       4.4     .84     5       20

    northeast       NE       AM Main Jr.     5.1     .94     3       13

    north           NO       Margot Weber    4.5     .89     5        9

    central         CT       Ann Stephens    5.7     .94     5       13



(Solaris egrep)

4   % egrep  '\.[0-9]{2}[^0-9]' datafile

     <no output; not recognized with or without backslashes>


EXPLANATION

  1. The extended metacharacters { } are used for repetition. The GNU and UNIX versions of regular grep do not evaluate this metacharacter set unless the curly braces are preceded by backslashes. The whole expression reads: search for a literal period (\.), followed by exactly two digits ([0-9]\{2\}), followed by a nondigit ([^0-9]).

  2. With extended grep, grep -E, the repetition metacharacters {2} do not need to be preceded by backslashes as in the previous example.

  3. Because GNU egrep and grep –E are functionally the same, this command produces the same output as the previous example.

  4. This is the standard UNIX version of egrep. It does not recognize the curly braces as an extended metacharacter set either with or without backslashes.
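The interval from Example 4.50 can be checked on small inputs that have one, two, and three digits after the period. This minimal sketch assumes GNU grep and a hypothetical file, nums.txt. It confirms that the basic and extended spellings agree.

```shell
#!/bin/sh
# Sketch: exactly two digits after a period, in both syntaxes. Assumes GNU grep.
export LC_ALL=C
printf '.9 x\n.98 x\n.987 x\n' > nums.txt
ere=$(grep -E '\.[0-9]{2}[^0-9]' nums.txt)     # extended: bare { }
bre=$(grep '\.[0-9]\{2\}[^0-9]' nums.txt)      # basic: \{ \}
echo "$ere"   # .98 x  (".987" fails because a third digit follows)
```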
