
8.2 Object Models

When looking at different regex packages in Java (or in any object-oriented language, for that matter), it's amazing to see how many different object models are used to achieve essentially the same result. An object model is the set of class structures through which regex functionality is provided, and can be as simple as one object of one class that's used for everything, or as complex as having separate classes and objects for each sub-step along the way. No single object model stands out as the clear, obvious choice for every situation, so a lot of variety has evolved.

8.2.1 A Few Abstract Object Models

Stepping back a bit now to think about object models helps prepare you to more readily grasp an unfamiliar package's model. This section presents several representative object models to give you a feel for the possibilities without getting mired in the details of an actual implementation.

Starting with the most abstract view, here are some tasks that need to be done in using a regular expression:


Setup . . .

[1] Accept a string as a regex; compile to an internal form.
[2] Associate the regex with the target text.

Actually apply the regex . . .

[3] Initiate a match attempt.

See the results . . .

[4] Learn whether the match is successful.
[5] Gain access to further details of a successful attempt.
[6] Query those details (what matched, where it matched, etc.).

These are the steps for just one match attempt; you might repeat them from [3] to find the next match in the target string.

Now, let's look at a few potential object models from among the infinite variety that one might conjure up. In doing so, we'll look at how they deal with matching the regex \s+(\d+) to the string 'May•16,•1998' to find out that '•16' is matched overall, and that '16' is matched within the first set of parentheses (within "group one"). Remember, the goal here is merely to get a general feel for some of the issues at hand; we'll see specifics soon.

8.2.1.1 An "all-in-one" model

In this conceptual model, each regular expression becomes an object that you then use for everything. It's shown visually in Figure 8-1 below, and in pseudocode here, as it processes all matches in a string:


     DoEverythingObj myRegex = new DoEverythingObj("\\s+(\\d+)"); // [1]
        . . .
     while (myRegex.findMatch("May 16, 1998")) { // [2], [3], [4]
        String matched = myRegex.getMatchedText(); // [6]
        String num     = myRegex.group(1); // [6]
        . . .
     }

As with most models in practice, the compilation of the regex is a separate step, so it can be done ahead of time (perhaps at program startup), and used later, at which point most of the steps are combined together, or are implicit. A twist on this might be to clone the object after a match, in case the results need to be saved for a while.
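To make the idea concrete, here is a minimal Java sketch of how such an all-in-one class might be built on top of java.util.regex. The class DoEverythingObj and its methods are hypothetical and simply mirror the pseudocode above. The sketch also assumes that a scan continues for as long as the same target string is passed to findMatch, which is one of several ways the implicit steps could be handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical "all-in-one" class; the name and methods follow the
// pseudocode in the text and are not part of any real package.
class DoEverythingObj {
    private final Pattern pattern;  // compiled form of the regex, step [1]
    private Matcher matcher;        // state of the current scan, if any
    private String target;          // target text of the current scan

    DoEverythingObj(String regex) {
        pattern = Pattern.compile(regex);
    }

    // Steps [2], [3], [4]: associate the target on first use, attempt the
    // next match, and report success. A different target restarts the scan.
    boolean findMatch(String text) {
        if (matcher == null || !text.equals(target)) {
            target = text;
            matcher = pattern.matcher(text);
        }
        return matcher.find();
    }

    // Step [6]: query details of the most recent successful match.
    String getMatchedText() { return matcher.group(); }
    String group(int n)     { return matcher.group(n); }

    public static void main(String[] args) {
        DoEverythingObj myRegex = new DoEverythingObj("\\s+(\\d+)");
        while (myRegex.findMatch("May 16, 1998")) {
            System.out.println("[" + myRegex.getMatchedText() + "] group 1: "
                               + myRegex.group(1));
        }
    }
}
```

Note how the one object carries both the compiled regex and the mutable match state, which is exactly what the next model separates.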

Figure 8-1. An "all-in-one" model

8.2.1.2 A "match state" model

This conceptual model uses two objects, a "Pattern" and a "Matcher." The Pattern object represents a compiled regular expression, while the Matcher object has all of the state associated with applying a Pattern object to a particular string. It's shown visually in Figure 8-2 below, and its use might be described as: "Convert a regex string to a Pattern object. Give a target string to the Pattern object to get a Matcher object that combines the two. Then, instruct the Matcher to find a match, and query the Matcher about the result." Here it is in pseudo-code:


     PatternObj myPattern = new PatternObj("\\s+(\\d+)"); // [1]
        . . .
     MatcherObj myMatcher = myPattern.makeMatcherObj("May 16, 1998"); // [2]
     while (myMatcher.findMatch()) { // [3], [4]
        String matched = myMatcher.getMatchedText(); // [6]
        String num     = myMatcher.group(1); // [6]
        . . .
     }

This might be considered conceptually cleaner, since the compiled regex is in an immutable (unchangeable) object, and all state is in a separate object. However, it's not necessarily clear that the conceptual cleanliness translates to any practical benefit. One twist on this is to allow the Matcher to be reset with a new target string, to avoid having to make a new Matcher with each string checked.
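Java's own java.util.regex package is organized along these lines, with Pattern and Matcher classes. The following short sketch shows the model, including the reset twist, using that package. The class MatchStateDemo and the method numbersIn are names invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchStateDemo {
    // Step [1]: compile once; a Pattern is immutable and can be shared.
    private static final Pattern NUMBER = Pattern.compile("\\s+(\\d+)");

    // Collects the text of group one for every match in each target string,
    // reusing a single Matcher via reset() rather than creating one per string.
    static List<String> numbersIn(String... targets) {
        List<String> found = new ArrayList<>();
        Matcher m = NUMBER.matcher("");      // step [2], with an empty target
        for (String target : targets) {
            m.reset(target);                 // step [2] again, with a new target
            while (m.find()) {               // steps [3] and [4]
                found.add(m.group(1));       // step [6]
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // prints [16, 1998, 3, 2001]
        System.out.println(numbersIn("May 16, 1998", "June 3, 2001"));
    }
}
```

All of the per-scan state lives in the Matcher, so the same compiled Pattern could be handed to other code without any risk of one scan disturbing another.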

Figure 8-2. A "match state" model

8.2.1.3 A "match result" model

This conceptual model is similar to the "all-in-one" model, except that the result of a match attempt is not a Boolean, but rather a Result object, which you can then query for the specifics on the match. It's shown visually in Figure 8-3 below, and might be described as: "Convert a regex string to a Pattern object. Give it a target string and receive a Result object upon success. You can then query the Result object for specifics." Here's one way it might be expressed in pseudo-code:


     PatternObj myPattern = new PatternObj("\\s+(\\d+)"); // [1]
        . . .
     ResultObj myResult = myPattern.findFirst("May 16, 1998"); // [2], [3], [5]
     while (myResult.wasSuccessful()) { // [4]
        String matched = myResult.getMatchedText(); // [6]
        String num     = myResult.group(1); // [6]
        . . .
        myResult = myPattern.findNext(); // [3], [5]
     }

This compartmentalizes the results of a match, which might be convenient at times, but results in extra overhead when only a simple true/false result is desired. One twist on this is to have the Pattern object return null upon failure, to save the overhead of creating a Result object that just says "no match."
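As a rough illustration of the null-on-failure twist, the sketch below wraps java.util.regex so that each attempt hands back a standalone result. The wrapper class ResultPattern and its findFirst and findNext methods are hypothetical, named after the pseudocode above; MatchResult and Matcher.toMatchResult() are the real java.util.regex pieces it relies on.

```java
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper; findFirst/findNext mirror the pseudocode in the text.
class ResultPattern {
    private final Pattern pattern;  // step [1]
    private Matcher matcher;        // remembers the position between calls

    ResultPattern(String regex) {
        pattern = Pattern.compile(regex);
    }

    // Steps [2], [3], [5]: a snapshot of the first match, or null on failure.
    MatchResult findFirst(String target) {
        matcher = pattern.matcher(target);
        return findNext();
    }

    // Steps [3], [5]: a snapshot of the next match, or null on failure.
    MatchResult findNext() {
        if (matcher == null) {
            throw new IllegalStateException("call findFirst first");
        }
        return matcher.find() ? matcher.toMatchResult() : null;
    }

    public static void main(String[] args) {
        ResultPattern myPattern = new ResultPattern("\\s+(\\d+)");
        for (MatchResult r = myPattern.findFirst("May 16, 1998");
             r != null;                                   // step [4]
             r = myPattern.findNext()) {
            System.out.println(r.group(1) + " at offset " + r.start(1)); // step [6]
        }
    }
}
```

Because each MatchResult is a snapshot, it can be kept around after later match attempts, at the cost of creating one object per successful match.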

Figure 8-3. A "match result" model

8.2.2 Growing Complexity

These conceptual models are just the tip of the iceberg, but give you a feel for some of the differences you'll run into. They cover only simple matches — when you bring in search-and-replace, or perhaps string splitting (splitting a string into substrings separated by matches of a regex), it can become much more complex.

Thinking about search-and-replace, for example, the first thought may well be that it's a fairly simple task, and indeed, a simple "replace this with that" interface is easy to design. But what if the "that" needs to depend on what's matched by the "this," as we did many times in examples in Chapter 2 (see Section 2.3.6)? Or what if you need to execute code upon every match, using the resulting text as the replacement? These, and other practical needs, quickly complicate things, which further increases the variety among the packages.
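To give a feel for how these needs show up in an interface, here is a small sketch of the three cases using java.util.regex: a replacement that refers to matched text, a replacement computed by code on every match, and string splitting. The class ReplaceDemo and its method names are invented for the illustration, and other packages may well expose the same operations quite differently.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceDemo {
    private static final Pattern NUMBER = Pattern.compile("\\s+(\\d+)");

    // The "that" depends on the "this": $1 in the replacement refers to
    // whatever group one matched.
    static String bracketNumbers(String text) {
        return NUMBER.matcher(text).replaceAll(" [$1]");
    }

    // Code runs on every match, and its result is used as the replacement.
    // Here each number is doubled.
    static String doubleNumbers(String text) {
        Matcher m = NUMBER.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int doubled = 2 * Integer.parseInt(m.group(1));
            // quoteReplacement keeps any '$' or '\' in the text literal
            m.appendReplacement(out, Matcher.quoteReplacement(" " + doubled));
        }
        m.appendTail(out);  // copy the text after the last match
        return out.toString();
    }

    // String splitting: substrings separated by matches of a regex.
    static String[] fields(String text) {
        return Pattern.compile(",\\s*").split(text);
    }

    public static void main(String[] args) {
        System.out.println(bracketNumbers("May 16, 1998")); // prints May [16], [1998]
        System.out.println(doubleNumbers("May 16, 1998"));  // prints May 32, 3996
        System.out.println(String.join("|", fields("May 16, 1998"))); // prints May 16|1998
    }
}
```

The second method needs a find loop plus two helper calls where the first needs a single call, which hints at why replacement interfaces vary so much from package to package.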
