7.6 The Substitution Operator

Perl's substitution operator s/···/···/ extends a match to a full match-and-replace. The general form is:

     $text =~ s/regex/replacement/modifiers

In short, the text first matched by the regex operand is replaced by the value of the replacement operand. If the /g modifier is used, the regex is repeatedly applied to the text following the match, with additional matched text replaced as well.
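To see what /g changes, here is a minimal sketch that applies the same substitution to the same string with and without /g. The sample strings and variable names are illustrative only:

```perl
use strict;
use warnings;

# Without /g: only the first match of "cat" is replaced.
my $once = "cat scat cattle";
$once =~ s/cat/dog/;
print "$once\n";    # dog scat cattle

# With /g: the regex is reapplied to the text after each match,
# so every "cat", even inside other words, is replaced.
my $all = "cat scat cattle";
$all =~ s/cat/dog/g;
print "$all\n";     # dog sdog dogtle
```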

As with the match operator, the target text operand and the connecting =~ are optional if the target is the variable $_. But unlike the match operator's m, the substitution's s is never optional.
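As an illustration of the implicit $_ target, this small sketch relies on foreach aliasing $_ to each array element, so a bare s/// edits the array in place. The dates are sample data:

```perl
use strict;
use warnings;

my @dates = ("2002-09-25", "1997-01-01");

# foreach aliases $_ to each element, so s/// with no =~ edits the array itself.
for (@dates) {
    s/-/./g;        # same as: $_ =~ s/-/./g;   (the "s" cannot be dropped)
}
print "@dates\n";   # 2002.09.25 1997.01.01
```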

We've seen that the match operator is fairly complex—how it works, and what it returns, is dependent upon the context it's called in, the target string's pos, and the modifiers used. In contrast, the substitution operator is simple: it always returns the same information (an indication of the number of substitutions done), and the modifiers that influence how it works are easy to understand.

You can use any of the core modifiers described in Section 7.2.3, and the substitution operator also supports two additional modifiers: /g and /e (described shortly).

7.6.1 The Replacement Operand

With the normal s/···/···/, the replacement operand immediately follows the regex operand, using a total of three instances of the delimiter rather than the two of m/···/. If the regex uses balanced delimiters (such as <···>), the replacement operand then has its own independent pair of delimiters (yielding a total of four). For example, s{···}{···} and s[···]/···/ and s<···>'···' are all valid. In such cases, the two sets may be separated by whitespace, and if so, by comments as well. Balanced delimiters are commonly used with /x or /e:

     $text =~ s{
          ...some big regex here, with lots of comments and such...
     } {
          ...a Perl code snippet to be evaluated to produce the replacement text...
     }ex;
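The skeleton above can be filled in with a concrete, if small, example. This sketch first shows the three delimiter pairings mentioned earlier, then a /gex substitution with a comment between the two operands. The strings and the doubling rule are made up for illustration. Note that Perl requires at least one whitespace character before such a comment, so that the "#" is not taken as the replacement's opening delimiter.

```perl
use strict;
use warnings;

my $s = "one two three";

# Balanced delimiters on the regex give the replacement its own delimiters.
(my $r1 = $s) =~ s{two}{2};        # braces, then braces
(my $r2 = $s) =~ s[three]/3/;      # brackets, then slashes
(my $r3 = $s) =~ s<one>'1';        # angle brackets, then single quotes

# With /x and /e, whitespace and a comment may sit between the two operands.
my $order = "Total: 12 apples, 30 pears";
$order =~ s{
    (\d+)          # regex comment (allowed by /x): capture a run of digits
}
    # Perl comment between the operands
{
    $1 * 2         # Perl code (because of /e): double each captured number
}gex;

print "$r1 | $r2 | $r3 | $order\n";
```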

Take care to separate in your mind the regex and replacement operands. The regex operand is parsed in a special regex-specific way, with its own set of special delimiters (see Section 7.2.1.2). The replacement operand is parsed and evaluated as a normal double-quoted string. The evaluation happens after the match (and with /g, after each match), so $1 and the like are available to refer to the proper match slice.
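The replacement is re-evaluated for each match, so under /g the variables $1 and $2 always refer to the current match. The following sketch swaps each key=value pair in a sample string:

```perl
use strict;
use warnings;

# Swap "key=value" into "value=key"; $1 and $2 are refreshed for every match.
my $pairs = "a=1, b=2, c=3";
$pairs =~ s/(\w+)=(\w+)/$2=$1/g;
print "$pairs\n";   # 1=a, 2=b, 3=c
```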

There are two situations where the replacement operand is not parsed as a double-quoted string:

  • When the replacement operand's delimiters are single quotes, it is parsed as a single-quoted string, which means that no variable interpolation is done.

  • If the /e modifier (discussed in the next section) is used, the replacement operand is parsed like a little Perl script instead of like a double-quoted string. The little Perl script is executed after each match, with its result being used as the replacement.
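The first case above can be contrasted with the ordinary behavior in a short sketch. It uses the all-single-quote form s'···'···', which Perl's documentation says disables interpolation of the replacement. The variable $amount and the template text are hypothetical:

```perl
use strict;
use warnings;

my $amount   = 42;
my $template = "Cost: PRICE";

# Normal delimiters: the replacement is a double-quotish string, so $amount interpolates.
(my $interp = $template) =~ s/PRICE/$amount/;

# Single-quote delimiters: the replacement is taken literally.
(my $literal = $template) =~ s'PRICE'$amount';

print "$interp\n";    # Cost: 42
print "$literal\n";   # Cost: $amount
```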

7.6.2 The /e Modifier

The /e modifier causes the replacement operand to be evaluated as a Perl code snippet, as if with eval {···}. The code snippet's syntax is checked to ensure it's valid Perl when the script is loaded, but the code is evaluated afresh after each match. After each match, the replacement operand is evaluated in a scalar context, and the result of the code is used as the replacement. Here's a simple example:

     $text =~ s/-time-/localtime/ge;

This replaces occurrences of '-time-' with the results of calling Perl's localtime function in a scalar context (which returns a textual representation of the current time, such as "Wed Sep 25 18:36:51 2002").
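Because the snippet runs afresh for each match, any side effects it has also happen once per match. This sketch numbers repeated words with a counter. The counter $n and the sample strings are illustrative. It then repeats the localtime example, whose output depends on the clock, so only the removal of the marker is predictable.

```perl
use strict;
use warnings;

# The replacement code runs once per match, so the counter advances each time.
my $n    = 0;
my $list = "item, item, item";
$list =~ s/item/'item' . ++$n/ge;
print "$list\n";    # item1, item2, item3

# The localtime example: scalar context yields a date string.
my $text = "Generated at -time-.";
$text =~ s/-time-/localtime/ge;
print "$text\n";
```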

Since the evaluation is done after each match, you can refer to the text just matched with the after-match variables like $1. For example, special characters that might not otherwise be allowed in a URL can be encoded using % followed by their two-digit hexadecimal representation. To encode all non-alphanumerics this way, you can use

     $url =~ s/([^a-zA-Z0-9])/sprintf('%%%02x', ord($1))/ge;

and to decode back to the original, you can use:

     $url =~ s/%([0-9a-f][0-9a-f])/pack("C", hex($1))/ige;

In short, sprintf('%%%02x', ord(character)) converts a character to its numeric URL representation, while pack("C", value) does the opposite; consult your favorite Perl documentation for more information.
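Putting the two substitutions together gives a round trip. This sketch encodes a sample string and decodes it back. It then shows that the /i on the decoder also accepts uppercase hex digits such as %2F. The input strings are made up for illustration.

```perl
use strict;
use warnings;

my $url = "a b&c=d";

# Encode: every non-alphanumeric becomes % plus two lowercase hex digits.
(my $encoded = $url) =~ s/([^a-zA-Z0-9])/sprintf('%%%02x', ord($1))/ge;
print "$encoded\n";   # a%20b%26c%3dd

# Decode: turn each %xx back into the character with that code.
(my $decoded = $encoded) =~ s/%([0-9a-f][0-9a-f])/pack("C", hex($1))/ige;
print "$decoded\n";   # a b&c=d

# /i lets the decoder accept uppercase hex as well.
(my $from_upper = "x%2Fy") =~ s/%([0-9a-f][0-9a-f])/pack("C", hex($1))/ige;
print "$from_upper\n";   # x/y
```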

7.6.2.1 Multiple uses of /e

Normally, repeating a modifier with an operator doesn't hurt (except perhaps to confuse the reader), but repeating the /e modifier actually changes how the replacement is done. Normally, the replacement operand is evaluated once, but if more than one 'e' is given, the result of the evaluation is itself evaluated as Perl code, once more for each additional 'e'. This is perhaps useful mostly for an Obfuscated Perl Contest.

Still, it can be useful. Consider interpolating variables into a string manually (such as if the string is read from a configuration file). That is, you have a string that looks like '··· $var ···' and you want to replace the substring '$var' with the value of the variable $var.

A simple approach uses:

     $data =~ s/(\$[a-zA-Z_]\w*)/$1/eeg;

Without any /e, this would simply replace the matched '$var' with itself, which is not very useful. With one /e, it evaluates the code $1, yielding '$var', which again effectively replaces the matched text with itself (again, not very useful). But with two /e, that '$var' is itself evaluated, yielding its contents. Thus, this mimics the interpolation of variables.
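The manual interpolation described above can be tried out with a small sketch. The variables $name and $count are hypothetical stand-ins for whatever a configuration file might reference. The template is single-quoted so that Perl does not interpolate it when the string is created.

```perl
use strict;
use warnings;

# Hypothetical variables that the template refers to by name.
my $name  = "World";
my $count = 3;

# Single-quoted, so '$name' and '$count' stay as literal text, as if read from a file.
my $data = 'Hello, $name! You have $count new messages.';

# First e:  the code $1 yields text such as '$name'.
# Second e: that text is evaluated as Perl, yielding the variable's value.
$data =~ s/(\$[a-zA-Z_]\w*)/$1/eeg;
print "$data\n";   # Hello, World! You have 3 new messages.
```

The second evaluation runs the matched text as Perl code. That is why this approach fits trusted templates. Here the regex limits that text to a single variable name.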

7.6.3 Context and Return Value

Recall that the match operator returns different values based upon the particular combination of context and /g. The substitution operator, however, has none of these complexities—it always returns either the number of substitutions performed or, if none were done, an empty string.

Conveniently, when interpreted as a Boolean (such as for the conditional of an if), the return value is taken as true if any substitutions are done, false if not.
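A short sketch of both uses of the return value follows. It captures the substitution count, checks the empty-string result when nothing matches, and uses the result directly as an if condition. The sample string is illustrative.

```perl
use strict;
use warnings;

my $text  = "a-b-c";
my $count = ($text =~ s/-/+/g);   # number of substitutions made
print "$count\n";                 # 2

my $none = ($text =~ s/x/y/g);    # nothing matched: an empty string (false)
print "no change\n" unless $none;

if ($text =~ s/^a/A/) {           # true because one substitution was done
    print "changed: $text\n";     # changed: A+b+c
}
```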