2.4 Quiz Answers

2.4.1 Quiz Answer

Answer to the question in Section 2.2.3.

How do [•⇥]* and •*|⇥* compare?

(•*|⇥*) allows either •* or ⇥* to match, which allows either some spaces (or nothing) or some tabs (or nothing). It doesn't, however, allow a combination of spaces and tabs.

On the other hand, [•⇥]* matches [•⇥] any number of times. With a string such as '⇥••' it matches three times, a tab the first time and spaces the rest.

[•⇥]* is logically equivalent to (•|⇥)*, although for reasons shown in Chapter 4, a character class is often much more efficient.
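The difference can be checked with a short Perl sketch. It writes the space as a literal space and the tab as \t, and anchors both ends so that each regex has to account for the whole string. The sample strings and the names @samples, %matched, and @steps are illustrative assumptions, not part of the original quiz.

```perl
use strict;
use warnings;

# Hypothetical sample strings; \t stands for the tab character.
my @samples = (
    [ 'spaces only', "   "  ],
    [ 'tabs only',   "\t\t" ],
    [ 'mixed',       "\t  " ],
);

my %matched;   # name => [ alternation result, class result ]
for my $sample (@samples) {
    my ($name, $s) = @$sample;
    # Anchoring both ends forces each regex to cover the entire string.
    my $alt   = ( $s =~ /^(?: *|\t*)\z/ ) ? 1 : 0;
    my $class = ( $s =~ /^[ \t]*\z/ )     ? 1 : 0;
    $matched{$name} = [ $alt, $class ];
    printf "%-12s alternation=%d class=%d\n", $name, $alt, $class;
}

# Applying the class once per character: a tab, then two spaces.
my @steps = "\t  " =~ /[ \t]/g;
print "class applied ", scalar(@steps), " times\n";
```

Only the mixed string separates the two: the alternation rejects it, while the character class accepts it.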

2.4.2 Quiz Answer

Answer to the question in Section 2.3.

Just what does $var =~ s/\bJeff\b/Jeff/i do?

It might be tricky because of the way I posed it. Had I used \bJEFF\b or \bjeff\b or perhaps \bjEfF\b as the regex, the intent might have been more obvious. Because of /i, the word "Jeff" will be found without regard to capitalization. It will then be replaced by 'Jeff', which has exactly the capitalization you see. (/i has no effect on the replacement text, although there are other modifiers examined in Chapter 7 that do.)

The end result is that "jeff", in any capitalization, is replaced by exactly 'Jeff'.
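A minimal sketch of this behavior follows. The input list is a hypothetical one, chosen to include a word ('Jeffrey') that the \b anchors should leave alone.

```perl
use strict;
use warnings;

# Hypothetical inputs: three capitalizations of the name, plus a longer word.
my @inputs = ('jeff', 'JEFF', 'jEfF', 'Jeffrey');
my @outputs;

for my $var (@inputs) {
    # Work on a copy so the original list stays untouched.
    (my $copy = $var) =~ s/\bJeff\b/Jeff/i;
    push @outputs, $copy;
    print "$var -> $copy\n";
}
```

The first three inputs all come out as 'Jeff'. 'Jeffrey' is unchanged, because there is no word boundary after its first four letters.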

2.4.3 Quiz Answer

Answer to the question in Section 2.3.5.2.

What does s/(?=s\b)(?<=\bJeff)/'/g do?

In this case, it doesn't matter in which order (?=s\b) and (?<=\bJeff) are arranged. Whether "checking on the right, then the left" or the other way around, the key is that both checks must succeed at the same position for the combination of the two checks to succeed. For example, in the string 'Thomas•Jefferson', both (?=s\b) and (?<=\bJeff) can match individually: the lookahead just before the final 's' of 'Thomas', and the lookbehind just after the 'Jeff' of 'Jefferson'. But since there is no one position where both can be successful, the combination of the two cannot match.

It's fine for now to use the somewhat vague phrase "combination of the two" to talk about this, as the meaning is fairly intuitive in this case. There are times, however, when exactly how a regex engine goes about applying a regex may not necessarily be quite so intuitive. Since how it works has immediate practical effects on what our regular expressions really mean, Chapter 4 discusses this in explicit detail.
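The claim that the order of the two lookarounds doesn't matter can be tried directly. In this sketch the input strings and the names %result, $right_first, and $left_first are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical inputs: one that should gain an apostrophe, one that should not.
my @inputs = ('Jeffs book', 'Thomas Jefferson');
my %result;   # input => [ lookahead-first result, lookbehind-first result ]

for my $text (@inputs) {
    (my $right_first = $text) =~ s/(?=s\b)(?<=\bJeff)/'/g;
    (my $left_first  = $text) =~ s/(?<=\bJeff)(?=s\b)/'/g;
    $result{$text} = [ $right_first, $left_first ];
    print "$text -> $right_first | $left_first\n";
}
```

Both orderings give the same text for each input, and 'Thomas Jefferson' passes through unmodified because no single position satisfies both checks.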

2.4.4 Quiz Answer

Answer to the question in Section 2.3.5.3.

Which "Jeffs" solutions would preserve case when applied with /i?

To preserve case, you've got to either replace the exact characters consumed (rather than just always inserting 'Jeff's'), or not consume any letters. The second solution listed in Table 2-1 takes the first approach, capturing what is consumed and using $1 and $2 to put it back. The last two solutions in the table take the "don't consume anything" approach. Since they don't consume text, they have nothing to preserve.

The first and third solutions hard-code the replacement string. If applied with /i, they don't preserve case. They end up incorrectly replacing JEFFS with Jeff's and Jeff'S, respectively. (The third consumes only the 'JEFF', so the original 'S' is left in place.)
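The sketch below applies five substitutions to 'JEFFS' with /i added. Table 2-1 is not reproduced in this section, so the five regexes here are an assumed reconstruction based on the descriptions above, and should be checked against the table itself. The names @solutions and @results are likewise illustrative.

```perl
use strict;
use warnings;

# Assumed reconstruction of the five Table 2-1 substitutions, each with /i added.
my @solutions = (
    sub { (my $t = shift) =~ s/\bJeffs\b/Jeff's/gi;       $t },  # hard-coded replacement
    sub { (my $t = shift) =~ s/\b(Jeff)(s)\b/${1}'$2/gi;  $t },  # puts back what it consumed
    sub { (my $t = shift) =~ s/\bJeff(?=s\b)/Jeff'/gi;    $t },  # hard-coded replacement
    sub { (my $t = shift) =~ s/(?<=\bJeff)(?=s\b)/'/gi;   $t },  # consumes nothing
    sub { (my $t = shift) =~ s/(?=s\b)(?<=\bJeff)/'/gi;   $t },  # consumes nothing
);

my @results = map { $_->('JEFFS') } @solutions;
printf "solution %d: %s\n", $_ + 1, $results[$_] for 0 .. $#results;
```

Under these assumptions, only the two hard-coded variants alter the capitalization of the input.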

2.4.5 Quiz Answer

Answer to the question in Section 2.3.6.

Does $text =~ s/(\d)((\d\d\d)+\b)/$1,$2/g "commaify" a number?

This won't work the way we want. It leaves results such as 281,421906. This is because the digits matched by (\d\d\d)+ are now actually part of the final match, and so are not left "unmatched" and available to the next iteration of the regex via the /g.

When one iteration ends, the next picks up the inspection of the text at the point where the previous match ended. We'd like that to be the point where the comma was inserted so we can go ahead and check to see whether additional commas need to be inserted later in the same number. But, in this case, that restarting point is at the end of all the digits. The whole point of using lookahead was to get the positional check without actually having the inspected text count toward the final "string that matched."
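To see the effect of consuming the digits, the following sketch runs a single /g pass of the regex from the question next to a lookaround variant. The lookaround form shown is an assumption about the Section 2.3.6 solution, which is not reproduced here. The sample number and the names $consumed and $peeked are illustrative.

```perl
use strict;
use warnings;

my $number = '281421906';   # sample value

# Regex from the question: the trailing digits become part of the match.
(my $consumed = $number) =~ s/(\d)((\d\d\d)+\b)/$1,$2/g;

# Assumed lookaround form: the trailing digits are inspected but not consumed.
(my $peeked = $number) =~ s/(?<=\d)(?=(\d\d\d)+\b)/,/g;

print "consuming: $consumed\n";
print "lookahead: $peeked\n";
```

The consuming version inserts one comma and then has nothing left to examine. The lookaround version can match again at later positions within the same number.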

Actually, this expression can still be used to solve this problem. If the expression is applied repeatedly by the host language, such as via a while loop, the newly-modified text is completely revisited each time. With each such application, one more comma is added (to each number in the target string, due to the /g modifier). Here's an example:


while ( $text =~ s/(\d)((\d\d\d)+\b)/$1,$2/g ) {
    # Nothing to do inside the body of the while -- we merely want to reapply the regex until it fails
}
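As a runnable variation, the sketch below adds a pass counter and a trace line to the loop body so that the comma-per-pass behavior is visible. The sample text and the $passes variable are assumptions added for illustration.

```perl
use strict;
use warnings;

# Hypothetical target text containing two numbers.
my $text   = 'population 281421906, area 3539225';
my $passes = 0;

# s///g returns the number of substitutions made, which is false once nothing changes.
while ( $text =~ s/(\d)((\d\d\d)+\b)/$1,$2/g ) {
    $passes++;
    print "after pass $passes: $text\n";
}
```

Each pass adds one comma to every number that still needs one, and the loop ends on the first pass that makes no substitution.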
