5.1 Regex Balancing Act
Writing a good regex involves striking a balance among several concerns:
• Matching what you want, but only what you want
• Keeping the regex manageable and understandable
• For an NFA, being efficient (creating a regex that leads the engine quickly to a match or a non-match, as the case may be)
These concerns are often context-dependent. If I'm working on the command line
and just want to grep something quickly, I probably don't care if I match a bit
more than I need, and I won't usually be too concerned to craft just the right
regex for it. I'll allow myself to be sloppy in the interest of time, since I can
quickly peruse the output for what I want. However, when I'm working on an
important program, it's worth the time and effort to get it right: a complex regular
expression is okay if that's what it takes. There is a balance among all these issues.
Efficiency is context-dependent, even in a program. For example, with an NFA,
something long like
^-(display|geometry|cemap|···|quick24|random|raw)$
to check command-line arguments is inefficient because of all that alternation, but
since it is only checking command-line arguments (something done perhaps a few
times at the start of the program) it wouldn't matter if it took 100 times longer than
needed. It's just not an important place to worry much about efficiency. Were it
used to check each line of a potentially large file, the inefficiency would penalize
you for the duration of the program.
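
To make the two situations concrete, here is a minimal sketch in Python, whose re module uses a backtracking engine. It assumes only the option names visible in the regex above (the elided ones stay elided), and the helper names is_known_option and count_option_lines are hypothetical, introduced just for illustration.

```python
import re

# Only the option names that appear in the text; the elided ones are omitted.
OPTION_NAMES = ["display", "geometry", "cemap", "quick24", "random", "raw"]

# Same shape as the regex in the text: an anchored alternation.
OPTION_RE = re.compile(r"^-(" + "|".join(re.escape(n) for n in OPTION_NAMES) + r")$")


def is_known_option(arg):
    """Return True if arg is one of the known options, such as '-raw'."""
    return OPTION_RE.match(arg) is not None


def count_option_lines(lines):
    """Apply the same regex to every line and count the matches.

    Here the cost of the alternation is paid once per line, so any
    inefficiency grows with the size of the input.
    """
    return sum(1 for line in lines if is_known_option(line))


# Case 1: a handful of command-line arguments, checked once at startup.
startup_args = ["-display", "-bogus", "-raw"]
known = [arg for arg in startup_args if is_known_option(arg)]

# Case 2: the same check applied to each line of a (potentially large) input.
sample_lines = ["-raw", "-bogus", "-random", "geometry"]
matches = count_option_lines(sample_lines)
```

In the first case the regex runs a few times in total, so its speed is irrelevant. In the second it runs once per line, which is where a more carefully crafted expression would start to pay for itself.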