
The Building Blocks of Regular Expressions

Regular expressions in Java 2 Standard Edition (J2SE) consist of two essential parts, which are embodied by two new Java classes in the java.util.regex package. The first part is a Pattern, and the second is a Matcher. Understanding these two classes is crucial to your ability to master regular expressions. Fortunately, they're easy concepts to understand.

I define these concepts in detail in the sections that follow, but at a general level, a pattern describes what you're searching for, and a matcher examines candidates that might match the pattern or description. For example, \s+ is a pattern describing one or more whitespace characters. Correspondingly, J2SE now provides the Pattern and Matcher classes.
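To make the \s+ example concrete, here is a minimal sketch of a pattern and a matcher working together. The class name WhitespaceSketch, the helper countWhitespaceRuns, and the sample string are illustrative names of my own choosing, not part of J2SE.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WhitespaceSketch {
    // Hypothetical helper: counts the runs of whitespace in a candidate string.
    static int countWhitespaceRuns(String candidate) {
        // The pattern describes "one or more whitespace characters".
        Pattern p = Pattern.compile("\\s+");
        // The matcher examines the candidate against that description.
        Matcher m = p.matcher(candidate);
        int runs = 0;
        while (m.find()) {
            runs++;
        }
        return runs;
    }

    public static void main(String[] args) {
        // Two runs: the double space and the tab.
        System.out.println(countWhitespaceRuns("one  two\tthree")); // prints 2
    }
}
```

Note that the double space counts as a single match, because + tells the pattern to take as many consecutive whitespace characters as it can.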

Note 

When I refer to a candidate or a candidate string, I mean the string that the regex will be acting on. Thus, for the pattern described in the preceding section, a candidate string might be <coach@influxs.com>, <john_john_smith@w3c.org>, or <hana@saez.com>.

Defining Patterns

Patterns are the actual descriptions used in regular expressions. Their power stems from their capability to describe text, as opposed to specifying it. They're an important part of the regex vernacular, and you need to understand them well to use regular expressions. Fortunately, they're easy to grasp if you refuse to be intimidated, and their somewhat off-putting syntax soon becomes intuitive.

A pattern allows you to describe the characteristics of the item you're looking for, without specifying the item explicitly. This can be especially helpful when you only know the traits of your targets, but you're unable to name them specifically.

Imagine parsing a document. You might want to find every capitalized word; or every word beginning with the letter Z; or every word beginning with a capital Z, followed by a vowel, unless that vowel is an a. You can't know beforehand exactly what those words will be for a given document, but you can describe them. That description is your pattern.
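The three descriptions above might be written as the patterns in the following sketch. The class name DescribeWords, the constant names, the findAll helper, and the sample sentence are assumptions made for illustration. I also assume that "vowel" means a lowercase e, i, o, or u, and that a modern JDK is available for the generic List.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DescribeWords {
    // Every capitalized word: an uppercase letter, then any word characters.
    static final String CAPITALIZED = "\\b[A-Z]\\w*\\b";
    // Every word beginning with the letter Z, in either case.
    static final String STARTS_WITH_Z = "\\b[Zz]\\w*\\b";
    // A capital Z followed by a lowercase vowel other than a (an assumption).
    static final String Z_THEN_VOWEL_NOT_A = "\\bZ[eiou]\\w*\\b";

    // Hypothetical helper: collects every match of regex in candidate.
    static List<String> findAll(String regex, String candidate) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(candidate);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String candidate = "Zoe and Zach saw a zebra at the Zurich zoo.";
        System.out.println(findAll(CAPITALIZED, candidate));        // [Zoe, Zach, Zurich]
        System.out.println(findAll(STARTS_WITH_Z, candidate));      // [Zoe, Zach, zebra, Zurich, zoo]
        System.out.println(findAll(Z_THEN_VOWEL_NOT_A, candidate)); // [Zoe, Zurich]
    }
}
```

None of the three patterns names a word explicitly. Each one describes a trait, and the matcher finds whatever words in the candidate happen to have it.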

I think of regular expressions as a police station. A pattern is the officer who takes a description of the suspects, and a matcher is the officer who rounds up and interrogates those suspects.

Defining Matchers

If you're familiar with Structured Query Language (SQL), it might help you to think of regular expressions as a sort of SQL for examining free-flowing text. A pattern is conceptually similar to the SQL query that's executed. A matcher corresponds to the ResultSet returned by that query.

A Matcher examines the results of applying a Pattern. If your pattern said, "Find every word starting with a lowercase a in the previous sentence," then you would examine the Matcher after applying your pattern. Your code might look like Listing 1-1. The output for Listing 1-1 is shown in Output 1-1, which follows the listing.

Listing 1-1: Finding Every Word That Starts with a Lowercase a
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindA {
  public static void main(String[] args) {

    String candidate =
      "A Matcher examines the results of applying a pattern.";

    // Define the pattern: a word boundary, a lowercase a,
    // any number of immediately trailing letters, digits,
    // or underscores, and then a closing word boundary.
    String regex = "\\ba\\w*\\b";
    Pattern p = Pattern.compile(regex);

    // Obtain a Matcher for the candidate string
    Matcher m = p.matcher(candidate);

    // Display the original input string
    System.out.println("INPUT: " + candidate);

    // Display the search pattern, followed by a blank line
    System.out.println("REGEX: " + regex);
    System.out.println();

    // Examine the Matcher and print every word
    // that starts with a lowercase a
    boolean found = false;
    while (m.find()) {
      found = true;
      System.out.println("MATCH: " + m.group());
    }

    // If there were no matches, say so
    if (!found) {
      System.out.println("NO MATCHES");
    }
  }
}

Output 1-1: Result of Running FindA
INPUT: A Matcher examines the results of applying a pattern.
REGEX: \ba\w*\b

MATCH: applying
MATCH: a

Again, it's not necessary that you be able to follow the code given in detail right now. I just want to establish a general sense of how things are done in J2SE regex. First, I define my Pattern:

   Pattern p = Pattern.compile(regex);

Then, I feed my candidate string to the Pattern and extract a Matcher:

   Matcher m = p.matcher(candidate);

Finally, I interrogate my Matcher:

   while (m.find()) { ... }
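The three steps can also be gathered into a single reusable method. The following is only a sketch: the class name ThreeSteps and the helper locate are hypothetical, and the use of Matcher.start() to report where each match begins goes slightly beyond what Listing 1-1 does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ThreeSteps {
    // Hypothetical helper: returns each match as "text@startIndex".
    static List<String> locate(String regex, String candidate) {
        // Step 1: define the Pattern.
        Pattern p = Pattern.compile(regex);
        // Step 2: feed the candidate to the Pattern and extract a Matcher.
        Matcher m = p.matcher(candidate);
        // Step 3: interrogate the Matcher until it finds nothing more.
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group() + "@" + m.start());
        }
        return found;
    }

    public static void main(String[] args) {
        String candidate =
            "A Matcher examines the results of applying a pattern.";
        for (String hit : locate("\\ba\\w*\\b", candidate)) {
            System.out.println(hit);
        }
    }
}
```

Each call to find() advances the Matcher to the next match, so group() and start() always describe the match that was found most recently.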

