
5.1 Regex Balancing Act

Writing a good regex involves striking a balance among several concerns:

  • Matching what you want, but only what you want

  • Keeping the regex manageable and understandable

  • For an NFA, being efficient (creating a regex that leads the engine quickly to a match or a non-match, as the case may be)

These concerns are often context-dependent. If I'm working on the command line and just want to grep something quickly, I probably don't care if I match a bit more than I need, and I usually won't be too concerned about crafting just the right regex for it. I'll allow myself to be sloppy in the interest of time, since I can quickly peruse the output for what I want. However, when I'm working on an important program, it's worth the time and effort to get it right: a complex regular expression is okay if that's what it takes. There is a balance among all these issues.

Efficiency is context-dependent, even in a program. For example, with an NFA, something long like 「^-(display|geometry|cemap|···|quick24|random|raw)$」 to check command-line arguments is inefficient because of all that alternation, but since it is only checking command-line arguments (something done perhaps a few times at the start of the program), it wouldn't matter if it took 100 times longer than needed. It's just not an important place to worry much about efficiency. Were it used to check each line of a potentially large file, the inefficiency would penalize you for the duration of the program.
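
To make the contrast concrete, here is a minimal sketch in Python (whose `re` module is a backtracking engine). It is an illustration under stated assumptions, not a benchmark: the option list contains only the names visible in the regex above (the elided ones are left out), and `is_known_option`, `count_matching_lines`, and the sample data are hypothetical names invented for this example.

```python
import re
import timeit

# Only the option names visible in the text; the elided ones are left out.
OPTIONS = ["display", "geometry", "cemap", "quick24", "random", "raw"]

# Anchored alternation with the same shape as the regex in the text.
ARG_RE = re.compile(r"^-(" + "|".join(OPTIONS) + r")$")


def is_known_option(arg: str) -> bool:
    """Return True if arg is a dash followed by one of the known names."""
    return ARG_RE.search(arg) is not None


def count_matching_lines(lines) -> int:
    """Apply the same check to every line, as one might for a large file."""
    return sum(1 for line in lines if is_known_option(line))


if __name__ == "__main__":
    # Case 1: a handful of command-line arguments, checked once at startup.
    args = ["-display", "-raw", "-bogus"]
    t_args = timeit.timeit(lambda: [is_known_option(a) for a in args], number=1)

    # Case 2: the same regex applied to each of many lines (made-up data).
    lines = ["-display", "some unrelated text", "-raw"] * 50_000
    t_lines = timeit.timeit(lambda: count_matching_lines(lines), number=1)

    print(f"checking {len(args)} arguments: {t_args:.6f} s")
    print(f"checking {len(lines)} lines:   {t_lines:.6f} s")
```

The per-match cost is the same in both cases; what changes is how many times it is paid. For the argument check, even a much slower regex would go unnoticed, while in the per-line loop any extra work in the alternation is multiplied by the number of lines.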
