8.1 Judging a Regex Package
The first thing most people look at when judging a regex package is the regex flavor
itself, but there are other technical issues as well. On top of that, "political"
issues like source code availability and licensing can be important. The next sections
give an overview of some points of comparison you might use when selecting
a regex package.
8.1.1 Technical Issues
Some of the technical issues to consider are:
Engine Type? Is the underlying engine an NFA or DFA? If an NFA, is it a POSIX
NFA or a Traditional NFA? (see Chapter 4)
Rich Flavor? How full-featured is the flavor? How many of the items in
Section 3.4 are supported? Are they supported well? Some things are more
important than others: lookaround and lazy quantifiers, for example, are more
important than possessive quantifiers and atomic grouping, because lookaround
and lazy quantifiers can't be mimicked with other constructs, whereas
possessive quantifiers and atomic grouping can be mimicked with lookahead
that allows capturing parentheses.
Unicode Support? How well is Unicode supported? Java strings support Unicode
intrinsically, but does \w know which Unicode characters are "word" characters?
What about \d and \s? Does \b understand Unicode? (Does its idea of a word
character match \w's idea of a word character?) Are Unicode properties
supported? How about blocks? Scripts? (Section 3.4.2.5) Which version of
Unicode's mappings do they support: Version 3.0? Version 3.1? Version 3.2?
Does case-insensitive matching work properly with the full breadth of Unicode
characters? For example, does a case-insensitive 'ß' really match 'SS'?
(Even in lookbehind?)
How Flexible? How flexible are the mechanics? Can the regex engine deal
only with String objects, or the whole breadth of CharSequence objects? Is it
easy to use in a multi-threaded environment?
How Convenient? The raw engine may be powerful, but are there extra
"convenience functions" that make it easy to do the common things without a
lot of cumbersome overhead? Does it, borrowing a quote from Perl, "make the
easy things easy, and the hard things possible?"
JRE Requirements? What version of the JRE does it require? Does it need the
latest version, which many may not be using yet, or can it run on even an old
(and perhaps more common) JRE?
Efficient? How efficient is it? The length of Chapter 6 tells you how much
there is to be said on this subject. How many of the optimizations described
there does it do? Is it efficient with memory, or does it bloat over time? Do
you have any control over resource utilization? Does it employ lazy evaluation
to avoid computing results that are never actually used?
Does it Work? When it comes down to it, does the package work? Are there
a few major bugs that are "deal-breakers?" Are there many little bugs that
would drive you crazy as you uncover them? Or is it a bulletproof, rock-solid
package that you can rely on?
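The Rich Flavor item claims that atomic grouping can be mimicked with lookahead plus capturing parentheses. Here is a minimal sketch of that trick, assuming a package that supports lookahead, backreferences, and (for comparison) native atomic groups; I use java.util.regex, and the class and field names are my own, purely for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative, not part of any regex package.
class AtomicMimicDemo {
    // Ordinary greedy quantifier: a+ may give back characters to let "ab" match.
    static final Pattern GREEDY = Pattern.compile("a+ab");

    // Native atomic group: whatever a+ matched is locked in, with no giving back.
    static final Pattern ATOMIC = Pattern.compile("(?>a+)ab");

    // Mimic: the lookahead captures what a+ would match, and the backreference
    // then consumes exactly that text. Once a lookahead has succeeded, the engine
    // does not backtrack into it, so the capture behaves atomically.
    static final Pattern MIMIC = Pattern.compile("(?=(a+))\\1ab");

    static boolean finds(Pattern p, String text) {
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "aaab";
        System.out.println("greedy: " + finds(GREEDY, text)); // true
        System.out.println("atomic: " + finds(ATOMIC, text)); // false
        System.out.println("mimic:  " + finds(MIMIC, text));  // false
    }
}
```

The mimic costs a capturing group, so in a larger expression any later groups are renumbered. That is one reason a native construct is more pleasant, even if not strictly necessary.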
Of course, this list is just the tip of the iceberg: each of these bullet points could be
expanded out to a full chapter on its own. We'll touch on them when comparing
packages later in this chapter.
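Several of the questions above can be answered by experiment rather than by reading documentation. The following is a rough sketch of such a probe, written against java.util.regex as an assumed package under test. The class and method names are hypothetical, and the Pattern.UNICODE_CHARACTER_CLASS flag it tries assumes Java 7 or later. Other packages would need their own equivalents of these calls.

```java
import java.util.regex.Pattern;

// Hypothetical probe class; adapt the calls to whichever package you are judging.
class FlavorProbe {
    // Does \w accept an accented letter (U+00E9, e with acute accent)?
    static boolean wordMatchesAccented(int flags) {
        return Pattern.compile("\\w", flags).matcher("\u00E9").matches();
    }

    // Does \d accept a non-ASCII digit (U+0663, Arabic-Indic digit three)?
    static boolean digitMatchesArabicIndic(int flags) {
        return Pattern.compile("\\d", flags).matcher("\u0663").matches();
    }

    // Does a case-insensitive sharp s (U+00DF) match "SS"?
    static boolean sharpSMatchesSS() {
        int flags = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
        return Pattern.compile("\u00DF", flags).matcher("SS").matches();
    }

    // Can the engine search any CharSequence, not only String objects?
    static boolean findsDigits(CharSequence text) {
        return Pattern.compile("\\d+").matcher(text).find();
    }

    public static void main(String[] args) {
        int unicode = Pattern.UNICODE_CHARACTER_CLASS; // assumption: Java 7 or later
        System.out.println("\\w accented, default flags: " + wordMatchesAccented(0));
        System.out.println("\\w accented, Unicode flag:  " + wordMatchesAccented(unicode));
        System.out.println("\\d Arabic-Indic, default:   " + digitMatchesArabicIndic(0));
        System.out.println("\\d Arabic-Indic, Unicode:   " + digitMatchesArabicIndic(unicode));
        System.out.println("sharp s matches SS:          " + sharpSMatchesSS());
        System.out.println("StringBuilder searchable:    " + findsDigits(new StringBuilder("order 66")));
    }
}
```

A probe like this answers only the questions it asks, of course, but it tells you what the package actually does on the JRE you actually have, which is not always what the documentation says.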
8.1.2 Social and Political Issues
Some of the non-technical issues to consider are:
Documented? Does it use Javadoc? Is the documentation complete? Correct?
Approachable? Understandable?
Maintained? Is the package still being maintained? What's the turnaround
time for bugs to be fixed? Do the maintainers really care about the package? Is
it being enhanced?
Support and Popularity? Is there official support, or an active user community
you can turn to for reliable support (and that you can provide support to,
once you become skilled in its use)?
Ubiquity? Can you assume that the package is available everywhere you go,
or do you have to include it whenever you distribute your programs?
Licensing? May you redistribute it when you distribute your programs? Are
the terms of the license something you can live with? Is the source code available
for inspection? May you redistribute modified versions of the source
code? Must you?
Well, there are certainly a lot of questions. Although this book can give you the
answers to some of them, it can't answer the most important question: which is
right for you? I make some recommendations later in this chapter, but only you
can decide which is best for you. So, to give you more background upon which to
base your decision, let's look at one of the most basic aspects of a regex package:
its object model.