
8.1 Judging a Regex Package

The first thing most people look at when judging a regex package is the regex flavor itself, but there are other technical issues as well. On top of that, "political" issues like source code availability and licensing can be important. The next sections give an overview of some points of comparison you might use when selecting a regex package.

8.1.1 Technical Issues

Some of the technical issues to consider are:

  • Engine Type? Is the underlying engine an NFA or DFA? If an NFA, is it a POSIX NFA or a Traditional NFA? (see Chapter 4)

  • Rich Flavor? How full-featured is the flavor? How many of the items in Section 3.4 are supported? Are they supported well? Some things are more important than others: lookaround and lazy quantifiers, for example, are more important than possessive quantifiers and atomic grouping, because lookaround and lazy quantifiers can't be mimicked with other constructs, whereas possessive quantifiers and atomic grouping can be mimicked with lookahead that allows capturing parentheses.

  • Unicode Support? How well is Unicode supported? Java strings support Unicode intrinsically, but does \w know which Unicode characters are "word" characters? What about \d and \s? Does \b understand Unicode? (Does its idea of a word character match \w's idea of a word character?) Are Unicode properties supported? How about blocks? Scripts? (see Section 3.4.2.5) Which version of Unicode's mappings does it support: Version 3.0? Version 3.1? Version 3.2? Does case-insensitive matching work properly with the full breadth of Unicode characters? For example, does a case-insensitive 'ß' really match 'SS'? (Even in lookbehind?)

  • How Flexible? How flexible are the mechanics? Can the regex engine deal only with String objects, or the whole breadth of CharSequence objects? Is it easy to use in a multi-threaded environment?

  • How Convenient? The raw engine may be powerful, but are there extra "convenience functions" that make it easy to do the common things without a lot of cumbersome overhead? Does it, borrowing a quote from Perl, "make the easy things easy, and the hard things possible?"

  • JRE Requirements? What version of the JRE does it require? Does it need the latest version, which many may not be using yet, or can it run on even an old (and perhaps more common) JRE?

  • Efficient? How efficient is it? The length of Chapter 6 tells you how much there is to be said on this subject. How many of the optimizations described there does it do? Is it efficient with memory, or does it bloat over time? Do you have any control over resource utilization? Does it employ lazy evaluation to avoid computing results that are never actually used?

  • Does it Work? When it comes down to it, does the package work? Are there a few major bugs that are "deal-breakers?" Are there many little bugs that would drive you crazy as you uncover them? Or is it a bulletproof, rock-solid package that you can rely on?
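The "Rich Flavor" point above says that atomic grouping can be mimicked with a lookahead that contains capturing parentheses. Here is a minimal sketch of that idea, assuming java.util.regex as the package under test (other packages may behave differently); the class name AtomicMimic and the sample text are purely illustrative. The lookahead captures whatever a+ would match, and the backreference \1 then consumes exactly that text, so nothing can be given back later.

```java
import java.util.regex.Pattern;

public class AtomicMimic {
    // Native atomic group: once a+ has matched, it never gives characters back.
    static final Pattern ATOMIC = Pattern.compile("(?>a+)a");

    // Mimic: the lookahead captures what a+ would match, then \1 consumes
    // exactly that text. The engine does not backtrack into a finished lookahead.
    static final Pattern MIMIC = Pattern.compile("(?=(a+))\\1a");

    // Ordinary greedy quantifier, which is free to backtrack.
    static final Pattern GREEDY = Pattern.compile("a+a");

    // True if the pattern matches anywhere in the text.
    static boolean found(Pattern p, String text) {
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "aaa";
        System.out.println("greedy: " + found(GREEDY, text)); // true: a+ gives back one 'a'
        System.out.println("atomic: " + found(ATOMIC, text)); // false: nothing is given back
        System.out.println("mimic:  " + found(MIMIC, text));  // false: same as atomic
    }
}
```

The mimic costs a capturing group, which is why the text stresses that the lookahead must allow capturing parentheses.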

Of course, this list is just the tip of the iceberg; each of these bullet points could be expanded into a full chapter on its own. We'll touch on them when comparing packages later in this chapter.
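Several of these questions can be answered by simply asking the package. The following is a rough sketch of such a probe, again assuming java.util.regex as the test subject; the class and method names are hypothetical, and the UNICODE_CHARACTER_CLASS flag assumes Java 7 or later. It checks whether \w recognizes a non-ASCII letter, whether Unicode properties are supported, whether a CharSequence other than String is accepted, and what a case-insensitive 'ß' does with 'SS'.

```java
import java.util.regex.Pattern;

public class UnicodeProbe {
    // U+00E9 (e with acute accent), written as an escape so the source
    // file's encoding does not matter.
    static final String E_ACUTE = "\u00e9";

    // Default behavior in java.util.regex: \w covers only [a-zA-Z_0-9].
    static boolean defaultWordMatches() {
        return Pattern.compile("\\w").matcher(E_ACUTE).matches();
    }

    // With UNICODE_CHARACTER_CLASS (assumes Java 7+), \w follows Unicode properties.
    static boolean unicodeWordMatches() {
        return Pattern.compile("\\w", Pattern.UNICODE_CHARACTER_CLASS)
                      .matcher(E_ACUTE).matches();
    }

    // Unicode property syntax: \p{L} is "any letter".
    static boolean propertyMatches() {
        return Pattern.compile("\\p{L}").matcher(E_ACUTE).matches();
    }

    // The matcher accepts any CharSequence, here a StringBuilder.
    static boolean acceptsCharSequence() {
        CharSequence text = new StringBuilder("caf").append(E_ACUTE);
        return Pattern.compile("caf\\p{L}").matcher(text).matches();
    }

    // Case-insensitive sharp s (U+00DF) against "SS". The answer depends on
    // how the engine folds case, so it is printed rather than assumed.
    static boolean sharpSMatchesSS() {
        return Pattern.compile("\u00df",
                               Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)
                      .matcher("SS").matches();
    }

    public static void main(String[] args) {
        System.out.println("\\w matches e-acute by default: " + defaultWordMatches());
        System.out.println("\\w matches e-acute with flag:  " + unicodeWordMatches());
        System.out.println("\\p{L} is supported:            " + propertyMatches());
        System.out.println("StringBuilder is accepted:     " + acceptsCharSequence());
        System.out.println("sharp s matches SS:            " + sharpSMatchesSS());
    }
}
```

A probe like this only samples a handful of characters; it shows how to ask the questions, not how thoroughly a package answers them.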

8.1.2 Social and Political Issues

Some of the non-technical issues to consider are:

  • Documented? Does it use Javadoc? Is the documentation complete? Correct? Approachable? Understandable?

  • Maintained? Is the package still being maintained? What's the turnaround time for bugs to be fixed? Do the maintainers really care about the package? Is it being enhanced?

  • Support and Popularity? Is there official support, or an active user community you can turn to for reliable support (and that you can provide support to, once you become skilled in its use)?

  • Ubiquity? Can you assume that the package is available everywhere you go, or do you have to include it whenever you distribute your programs?

  • Licensing? May you redistribute it when you distribute your programs? Are the terms of the license something you can live with? Is the source code available for inspection? May you redistribute modified versions of the source code? Must you?

Well, there are certainly a lot of questions. Although this book can give you the answers to some of them, it can't answer the most important question: which is right for you? I make some recommendations later in this chapter, but only you can decide which is best for you. So, to give you more background upon which to base your decision, let's look at one of the most basic aspects of a regex package: its object model.
