4.8 Quiz Answers
4.7.1 Quiz Answer
Answer to the question in Section 4.2.2.
Remember, the regex is tried completely each time, so fat|cat|belly|your matches the 'belly' in 'The dragging belly indicates your cat is too fat' rather than 'fat', even though fat is listed first among the alternatives.
Sure, the regex could conceivably match 'fat' and the other alternatives, but since they are not the earliest possible match (the match starting furthest to the left), they are not the one chosen. The entire regex is attempted completely from one spot before moving along the string to try again from the next spot, and in this case that means trying each alternative (fat, cat, belly, and your) at each position before moving on.
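The leftmost-match rule can be checked with a short Python sketch. It assumes Python's re module, a backtracking engine that behaves like the one described here; the variable names and the reordered second pattern are only for illustration.

```python
import re

text = "The dragging belly indicates your cat is too fat"

# 'fat' is listed first, but every alternative is tried at each position
# before the engine moves along the string, so the leftmost match wins.
m = re.search(r"fat|cat|belly|your", text)
print(m.group(0), m.start())

# Reordering the alternatives does not change where the match is found.
m2 = re.search(r"your|cat|fat|belly", text)
print(m2.group(0), m2.start())
```

Both searches report 'belly', at the same starting position.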
4.7.2 Quiz Answer
Answer to the question in Section 4.2.4.3.
When ^.*([0-9]+) is applied to 'Copyright 2003.', what is captured by the parentheses?
The desire is to get the last whole number, but it doesn't work. As before, .* is forced to relinquish some of what it had matched because the subsequent [0-9]+ requires a match to be successful. In this example, that means unmatching the final period and '3', which then allows [0-9] to match. That's governed by +, so matching just once fulfills its minimum, and now facing '.' in the string, it finds nothing else to match.
Unlike before, though, there's then nothing further that must match, so .* is not forced to give up the '0' or any other digits it might have matched. Were .* to do so, the [0-9]+ would certainly be a grateful and greedy recipient, but nope, first come first served. Greedy constructs give up something they've matched only when forced. In the end, $1 gets only '3'.
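A minimal sketch of this result, assuming Python's re module as the engine (its group(1) plays the role of $1 here):

```python
import re

m = re.match(r"^.*([0-9]+)", "Copyright 2003.")

captured = m.group(1)   # what $1 would hold
whole = m.group(0)      # the text matched by the entire regex

print(repr(captured))   # '3'
print(repr(whole))      # 'Copyright 2003'
```

The greedy .* keeps 'Copyright 200' and hands over only the single digit that [0-9]+ needs.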
If this feels counter-intuitive, realize that [0-9]+ is at most one match away from [0-9]*, which is in the same league as .* itself. Substituting that into ^.*([0-9]+), we get ^.*(.*) as our regex, which looks suspiciously like the ^Subject:•(.*).* example from Section 4.2.4.2, where the second .* was guaranteed to match nothing.
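The substitution argument can be tried directly. This is a sketch assuming Python's re module; the sample subject line is made up, and the second .* of the Subject regex is wrapped in parentheses here only so that what it matched can be inspected.

```python
import re

# After the substitution: the first .* takes everything, so the
# parenthesized .* is left with nothing.
m = re.match(r"^.*(.*)", "Copyright 2003.")
print(repr(m.group(1)))

# Same effect in the Subject example: group 2 stands in for the second .*
s = re.match(r"^Subject: (.*)(.*)", "Subject: quiz answers")
print(repr(s.group(1)), repr(s.group(2)))
```

In both cases the trailing .* succeeds by matching the empty string.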
4.7.3 Quiz Answer
Answer to the question in Section 4.4.4.1.
When matching [0-9]* against 'a•1234•num', would the position just before the '1' be part of a saved state?
The answer is "no." I posed this question because the mistake is commonly made. Remember, a component that has star applied can always match. If that's the entire regex, it can always match anywhere. This certainly includes the attempt when the transmission applies the engine the first time, at the start of the string. In this case, the regex matches at the very start of 'a•1234•num', before the 'a', matching nothing, and that's the end of it; it never even gets as far as the digits.
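A quick check of this behavior, as a sketch assuming Python's re module (the • in the text stands for a space, written as a plain space below):

```python
import re

m = re.search(r"[0-9]*", "a 1234 num")

# The star is satisfied by zero digits, so the very first attempt,
# at the start of the string, succeeds without consuming anything.
print(m.span(), repr(m.group(0)))   # (0, 0) ''
```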
In case you missed this, there's still a chance for partial credit. Had there been something in the regex after the [0-9]* that kept an overall match from happening before the engine got to the following point (▵ marks the current position in the string and in the regex):
    at 'a•▵1234···'    matching ▵[0-9]*···
then indeed, the attempt of the '1' also creates the state:
    at 'a•▵1234···'    matching [0-9]*▵···
4.7.4 Quiz Answer
Answer to the question in Section 4.5.6.1.1.
What does
(?>.*?) ···.
match?
It can never match anything. At best, it's a fairly complex way to accomplish nothing! *? is the lazy *, and governs a dot, so the first path it attempts is the skip-the-dot path, saving the try-the-dot state for later, if required. But the moment that state has been saved, it's thrown away because matching exits the atomic grouping, so the skip-the-dot path is the only one ever taken. If something is always skipped, it's as if it's not there at all.
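The effect can be seen by comparing a lazy .*? with and without atomic grouping. This is a sketch under two assumptions: Python's re module is used as the engine, where atomic groups are only available from version 3.11 on, and a literal c is appended after the group purely as a hypothetical continuation to make the difference visible.

```python
import re

subject = "abc"

try:
    # Atomic groups need Python 3.11+; older versions raise re.error here.
    atomic = re.compile(r"(?>.*?)c")
except re.error:
    atomic = None

# Without atomic grouping, backtracking can return to the saved
# try-the-dot states, so the lazy .*? grows until 'c' can match.
plain = re.match(r".*?c", subject)
print(plain.group(0))

if atomic is not None:
    # The saved state is thrown away on leaving the group, so at
    # position 0 the group matches nothing and 'c' cannot match 'a'.
    print(atomic.match(subject))
```

When the atomic version compiles, it behaves at each position exactly like the bare regex c, as if the group were not there at all.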