The Matcher Object

Figure 2-2 illustrates the methods of the Matcher class. Please take a moment to study them.

Figure 2-2: The Matcher class

The following sections describe the various methods of the Matcher class. But first, let's briefly revisit the concept of groups, as they figure so prominently in the Matcher object.

Groups

Before you can take full advantage of the Matcher object, it's important that you understand the concept of a group, as some of the more powerful methods of Matcher deal with them. I discuss groups in even greater detail in Chapter 3, but you need an intuitive sense of them to take full advantage of the material in this chapter, so I provide a brief introduction here.

A group is exactly what it sounds like: a cluster of characters. Often, the term refers to a subportion of the original Pattern, though each group is, by definition, a subgroup of itself. You're probably already familiar with the concept of groups from your study of arithmetic. For example, the expression

6 * 7 + 4

has an implicit sense of grouping. You really read it as

 (6 * 7) + 4

where (6 * 7) is thought of as a clustering of numbers. Further, you can think of the expression as

( (6 * 7) + 4)

where you can consider ((6 * 7) + 4) another clustering of numbers, this one including the subcluster (6*7). Here, your group has a subgroup. Similarly, regex allows you to group a sequence of characters together. Why? I discuss that shortly. First, let's concentrate on how.

Remember that in regular expressions, you describe what you're looking for in general terms by using a Pattern object. Groups allow you to nest subdescriptions within your expression. As you examine a specific candidate String, the Matcher can keep track of submatches for that expression.

Creating a grouping of regex characters is very easy. You simply put the expression you want to think of as a group inside a pair of parentheses. That's it. Thus, the pattern (\w)(\d\d)(\w+) consists of four groups, ranging from 0 to 3. group(0), which is always the original expression itself, is (\w)(\d\d)(\w+).

group(1), which consists of an alphanumeric or underscore character, is (\w).

group(2), which consists of two digits, is (\d\d).

group(3), which consists of one or more alphanumeric or underscore characters, is (\w+).

For a specific candidate String, say X99SuperJava, group(0) is always the part of the candidate string that matches the original regex pattern, namely the pattern (\w)(\d\d)(\w+) itself. Here, that is the entire candidate, X99SuperJava.

The corresponding section of X99SuperJava for group(1) is X.

The corresponding section of X99SuperJava for group(2) is 99.

The corresponding section of X99SuperJava for group(3) is SuperJava.

OK, so you know how to designate groups and how to find the corresponding section in a candidate string. Now, why would you? A common reason for doing so is the ability to refer to subsections of the candidate string. For example, you may not know what this particular candidate string, namely X99SuperJava, is, but you can still write a program that rearranges it by creating a new String equal to group(3), appended to group(1), and appended to group(2). In this case, that rearranged String would be SuperJavaX99.
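The rearrangement just described can be sketched in a few lines. This is only an illustration: the class name GroupRearrangeSketch and the helper method rearrange are my own hypothetical names, and the decision to return non-matching input unchanged is an assumption made for the sake of the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Sketch: rebuilds a candidate as group(3) + group(1) + group(2).
 * The class and method names are illustrative assumptions.
 */
public class GroupRearrangeSketch {
    public static String rearrange(String candidate) {
        Pattern p = Pattern.compile("(\\w)(\\d\\d)(\\w+)");
        Matcher m = p.matcher(candidate);
        // matches() succeeds only if the whole candidate fits the pattern
        if (!m.matches()) {
            return candidate; // assumption: leave non-matching input as is
        }
        return m.group(3) + m.group(1) + m.group(2);
    }

    public static void main(String[] args) {
        System.out.println(rearrange("X99SuperJava")); // prints SuperJavaX99
    }
}
```

Notice that the code never needs to know what the candidate actually contains. It only needs to know which group each piece belongs to.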

Chapter 3 provides detailed examples of groups.

public Pattern pattern()

The pattern method returns the Pattern that created this particular Matcher object. Consider Listing 2-6.

Listing 2-6: Matcher Pattern Example
Start example
import java.util.regex.*;

public class MatcherPatternExample{
  public static void main(String args[]){
      test();
  }

  public static void test(){
     Pattern p = Pattern.compile("\\d");
     Matcher m1 = p.matcher("55");
     Matcher m2 = p.matcher("fdshfdgdfh");

     System.out.println(m1.pattern() == m2.pattern());
     //prints true
  }
}
End example

You should notice a few important things here. First, both Matcher objects successfully returned a Pattern, even though m2 wasn't a successful match. Second, the Matcher objects returned exactly the same Pattern object, because they were both created by that Pattern. Notice that the line

   System.out.println(m1.pattern() == m2.pattern());

did a == compare and not a .equals compare. This could only have worked if the actual object returned by m1 and m2 was, in fact, exactly the same object.

public Matcher reset()

The reset method clears all state information from the Matcher object it's called on. The Matcher is, in effect, reverted to the state it originally had when you first received a reference to it, as shown in Listing 2-7.

Listing 2-7: Matcher.reset Example
Start example
import java.util.regex.*;
/**
 * Demonstrates the usage of the
 * Matcher.reset() method
 */
public class MatcherResetExample{
  public static void main(String args[]){
      test();
  }
  public static void test(){
     //create a pattern, and extract a matcher
     Pattern p = Pattern.compile("\\d");
     Matcher m1 = p.matcher("01234");

     //exhaust the matcher
     while (m1.find()){
      System.out.println("\t\t" + m1.group());
     }
     //now reset the matcher to its original state
     m1.reset();
     System.out.println("After resetting the Matcher");
     //iterate through the matcher again.
     //this would not be possible without a cleared state
     while (m1.find()){
      System.out.println("\t\t" + m1.group());
     }
  }
}
End example

Output 2-2 shows the output of this method.

Output 2-2: Output for the Matcher.reset Example
Start example
        0
        1
        2
        3
        4
After resetting the Matcher
        0
        1
        2
        3
        4
End example

You wouldn't have been able to iterate through the elements of the Matcher again if it hadn't been reset.

public Matcher reset(CharSequence input)

The reset(CharSequence input) method clears the state of the Matcher object it's called on and replaces the candidate String with the new input. This has the same effect as creating a new Matcher object, except that it doesn't have as much of the associated overhead. This can lead to useful optimization, and it's one that I often use. Listing 2-8 demonstrates this method's usage.

Listing 2-8: Matcher.reset(CharSequence) Example
Start example
import java.util.regex.*;
/**
 * Demonstrates the usage of the
 * Matcher.reset(CharSequence) method
 */
public class MatcherResetCharSequenceExample{
  public static void main(String args[]){
      test();
  }
  public static void test(){
     String output="";
     //create a pattern, and extract a matcher
     Pattern p = Pattern.compile("\\d");
     Matcher m1 = p.matcher("01234");

     //exhaust the matcher
     while (m1.find()){
      System.out.println("\t\t" + m1.group());
     }
     //now reset the matcher with new data
     m1.reset("56789");
     System.out.println("After resetting the Matcher");
     //iterate through the matcher again.
     //this would not be possible without resetting it
     while (m1.find()){
      System.out.println("\t\t" + m1.group());
     }
  }
}
End example

Output 2-3 shows the output of this method.

Output 2-3: Output for the Matcher.reset(CharSequence) Example
Start example
        0
        1
        2
        3
        4
After resetting the Matcher
        5
        6
        7
        8
        9
End example
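To make the optimization concrete, here is a minimal sketch of one Matcher being reused across several inputs. The class name MatcherReuseSketch, the method countDigits, and the sample inputs are hypothetical names and data of my own; the only Matcher calls used are reset(CharSequence) and find().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Sketch: counts digits in several inputs with a single Matcher.
 * The class and method names are illustrative assumptions.
 */
public class MatcherReuseSketch {
    public static int countDigits(String[] inputs) {
        // create the Matcher once, against an empty placeholder input
        Matcher m = Pattern.compile("\\d").matcher("");
        int count = 0;
        for (String input : inputs) {
            // swap in the new candidate and clear any earlier state
            m.reset(input);
            while (m.find()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] lines = {"01234", "abc", "56789"};
        System.out.println(countDigits(lines)); // prints 10
    }
}
```

Whether this is measurably faster than calling p.matcher(input) each time depends on your workload, so treat it as something to profile rather than a guaranteed win.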

public int start()

The start method returns the starting index of the most recent successful match the Matcher object had. Listing 2-9 demonstrates the use of the start method. For the candidate My name is Bond. James Bond., the code in this listing finds the starting index of each occurrence of the word Bond.

Listing 2-9: Matcher.start() Example
Start example
import java.util.regex.*;
/**
 * Demonstrates the usage of the
 * Matcher.start() method
 */
public class MatcherStartExample{
  public static void main(String args[]){
      test();
  }
  public static void test(){
     //create a Matcher and use the Matcher.start() method
     String candidateString = "My name is Bond. James Bond.";
     String matchHelper[] =
      {"          ^","                      ^"};
     Pattern p = Pattern.compile("Bond");
     Matcher matcher = p.matcher(candidateString);

     //Find the starting point of the first 'Bond'
      matcher.find();
      int startIndex = matcher.start();
      System.out.println(candidateString);
      System.out.println(matchHelper[0] + startIndex);

     //Find the starting point of the second 'Bond'
      matcher.find();
      int nextIndex = matcher.start();
      System.out.println(candidateString);
      System.out.println(matchHelper[1] + nextIndex);
  }
}
End example

Output 2-4 shows the output of running the start() method.

Output 2-4: Output for the Matcher.start() Example
Start example
My name is Bond. James Bond.
          ^11
My name is Bond. James Bond.
                      ^23
End example

If you execute another find() method

matcher.find();

and then execute start()

int nonIndex = matcher.start(); //throws IllegalStateException

the start() method will throw an IllegalStateException, because the third call to find() didn't find a match. I'm surprised that it doesn't simply return a negative number to indicate an unsuccessful match. Use the boolean returned by the find() method to determine whether you should call methods such as start().
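One way to guard against the exception is to check the boolean result of find() before asking for an index. The following is a minimal sketch. The class name GuardedStartSketch and the method nextStartOrMinusOne are hypothetical, and returning -1 for "no match" is my own convention here, not something the Matcher class does.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Sketch: calls start() only when find() has succeeded.
 * The class and method names are illustrative assumptions.
 */
public class GuardedStartSketch {
    // Returns the start index of the next match, or -1 if there is none
    public static int nextStartOrMinusOne(Matcher matcher) {
        return matcher.find() ? matcher.start() : -1;
    }

    public static void main(String[] args) {
        Matcher matcher =
            Pattern.compile("Bond").matcher("My name is Bond. James Bond.");
        System.out.println(nextStartOrMinusOne(matcher)); // prints 11
        System.out.println(nextStartOrMinusOne(matcher)); // prints 23
        System.out.println(nextStartOrMinusOne(matcher)); // prints -1
    }
}
```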

public int start(int group)

This method allows you to specify which subgroup within a match you're interested in. If there are no matches, or if no matches have been attempted, this method throws an IllegalStateException. Listing 2-10 demonstrates the use of the start(int) method. But before examining the code, let's take a step back and consider what the code is actually trying to demonstrate.

Listing 2-10: Matcher.start(int) Example
Start example
import java.util.regex.*;
/**
 * Demonstrates the usage of the
 * Matcher.start(int) method
 */
public class MatcherStartParamExample{
  public static void main(String args[]){
      test();
  }
  public static void test(){
     //create a Pattern
      Pattern p = Pattern.compile("B(ond)");

     //create a Matcher and use the Matcher.start(int) method
     String candidateString = "My name is Bond. James Bond.";
     //create a helpful index for the sake of output
     String matchHelper[] =
                             {"          ^",
                              "           ^",
                              "                      ^",
                              "                       ^"};
     Matcher matcher = p.matcher(candidateString);
     //Find the starting point of the first 'B(ond)'
      matcher.find();
      int startIndex = matcher.start(0);
      System.out.println(candidateString);
      System.out.println(matchHelper[0] + startIndex);

      //find the starting point of the first subgroup (ond)
      int nextIndex = matcher.start(1);
      System.out.println(candidateString);
      System.out.println(matchHelper[1] + nextIndex);

     //Find the starting point of the second 'B(ond)'
      matcher.find();
      startIndex = matcher.start(0);
      System.out.println(candidateString);
      System.out.println(matchHelper[2] + startIndex);
      //find the starting point of the second subgroup (ond)
      nextIndex = matcher.start(1);
      System.out.println(candidateString);
      System.out.println(matchHelper[3] + nextIndex);
  }
}
End example

In Listing 2-10, the regex pattern is B(ond), which means that you have a subgroup within the pattern (the parentheses indicate a subgroup). The following is the portion of the candidate parsed when find() is called for the first time:

Thus, when you call the start(0) method, you're implicitly calling it only for the region that has already been parsed, which is outlined in the box. As far as the Matcher is concerned, this boxed region is the only one we can currently discuss. This is simply the nature of the find method, and it has nothing to do with the start(int) method yet.

The start(0) method returns the index of the first character in group(0), which is the B in Bond. group(0) is circled in the following image.

Similarly, when you call start(1), you're calling it only for the region that has already been parsed—again, the boxed region in the preceding image. This time, you're asking for the second grouping in the parsed region. The start(1) method returns the index of the first character in group(1), which is the o in Bond. group(1) is circled in the following image:

Next, you call matcher.find() again, which results in a new region of the candidate string coming under consideration, as shown in the following image:

Calling the start(0) method here implicitly calls it only for the new region that has already been parsed, which appears in the box in the preceding image. This is the only region the associated Matcher will consider. start(0) returns the index of the first character in group(0), which is the B in Bond. group(0) is circled in the following image:

Again, calling start(1) asks the Matcher to consider only the new region that has been parsed (again, the boxed region). This time, you're asking for the second grouping in the parsed region. start(1) returns the index of the first character in group(1), which is the o in Bond. group(1) is circled in the boxed region.

When you consider the process visually, it's easy to understand how the start(int) method interacts with groups, group numbers, and the find() method. find() parses just enough of the candidate string for all groups to match and works in that limited region. Keep this in mind as you read through Listing 2-10. Listing 2-10 is a fully working example of the algorithm discussed in this section. Please refer back to the preceding images as necessary when you read the example.

Output 2-5 shows the output of running the start() method.

Output 2-5: Output for the Matcher.start(int) Example
Start example
My name is Bond. James Bond.
          ^11
My name is Bond. James Bond.
          ^12
My name is Bond. James Bond.
                      ^23
My name is Bond. James Bond.
                       ^24
End example

If you execute another find() method

matcher.find();

and then execute start()

int nonIndex = matcher.start(0); //throws IllegalStateException

the start(int) method will throw an IllegalStateException because the find() method wasn't successful. Similarly, it will throw an IndexOutOfBoundsException if you try to refer to a group number that doesn't exist.
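The two failure modes are easy to confuse, so here is a small sketch that triggers each one. The class name StartGroupErrorsSketch and the helper describe are hypothetical names of my own; the helper simply reports either the index returned by start(int) or the name of the exception it threw.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Sketch: shows when start(int) throws each of its two exceptions.
 * The class and method names are illustrative assumptions.
 */
public class StartGroupErrorsSketch {
    // Reports the result of start(group), or the exception it raised
    public static String describe(Matcher m, int group) {
        try {
            return String.valueOf(m.start(group));
        } catch (IllegalStateException e) {
            return "IllegalStateException";
        } catch (IndexOutOfBoundsException e) {
            return "IndexOutOfBoundsException";
        }
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("B(ond)").matcher("My name is Bond.");

        // no match has been attempted yet
        System.out.println(describe(m, 1)); // prints IllegalStateException

        m.find();
        // a match exists and group 1 is defined by the pattern
        System.out.println(describe(m, 1)); // prints 12

        // a match exists, but the pattern has no group 2
        System.out.println(describe(m, 2)); // prints IndexOutOfBoundsException
    }
}
```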

public int end()

The end method returns the ending index of the last successful match the Matcher object had plus 1. If no matches exist, or if no matches have been attempted, this method throws an IllegalStateException. Listing 2-11 demonstrates the use of the end method.

Listing 2-11: Matcher.end() Example
Start example
import java.util.regex.*;
/**
 * Demonstrates the usage of the
 * Matcher.end() method
 */
public class MatcherEndExample{
  public static void main(String args[]){
      test();
  }
  public static void test(){
     //create a Matcher and use the Matcher.end() method
     String candidateString = "My name is Bond. James Bond.";
     String matchHelper[] =
      {"               ^","                           ^"};
     Pattern p = Pattern.compile("Bond");
     Matcher matcher = p.matcher(candidateString);

     //Find the end point of the first 'Bond'
      matcher.find();
      int endIndex= matcher.end();
      System.out.println(candidateString);
      System.out.println(matchHelper[0] + endIndex);

     //Find the end point of the second 'Bond'
      matcher.find();
      int nextIndex = matcher.end();
      System.out.println(candidateString);
      System.out.println(matchHelper[1] + nextIndex);
  }
}
End example

Output 2-6 shows the output of running the end method.

Output 2-6: Output for the Matcher.end() Example
Start example
My name is Bond. James Bond.
              ^15
My name is Bond. James Bond.
                          ^27
End example

If you execute another find method

matcher.find();

and then execute end

int nonIndex = matcher.end(); //throws IllegalStateException

the end method will throw an IllegalStateException, because there isn't a valid group to find the end of.

public int end(int group)

Like the start(int) method, this method allows you to specify which subgroup within a match you're interested in. It returns the index of the last character of that group's matching character sequence, plus 1. Listing 2-12 demonstrates the usage of the end(int) method.

Listing 2-12: Matcher.end(int) Example
Start example
import java.util.regex.*;
/**
 * Demonstrates the usage of the
 * Matcher.end(int) method
 */
public class MatcherEndParamExample{
  public static void main(String args[]){
      test();
  }
  public static void test(){
     //create a Pattern
      Pattern p = Pattern.compile("B(on)d");
     //create a Matcher and use the Matcher.end(int) method
     String candidateString = "My name is Bond. James Bond.";
     //create a helpful index for the sake of output
     String matchHelper[] =
                             {"               ^",
                              "              ^",
                              "                           ^",
                              "                          ^"};
     Matcher matcher = p.matcher(candidateString);
     //Find the end point of the first 'B(on)d'
      matcher.find();
      int endIndex = matcher.end(0);
      System.out.println(candidateString);
      System.out.println(matchHelper[0] + endIndex);

      //find the end point of the first subgroup (on)
      int nextIndex = matcher.end(1);
      System.out.println(candidateString);
      System.out.println(matchHelper[1] + nextIndex);

     //Find the end point of the second 'B(on)d'
      matcher.find();
      endIndex = matcher.end(0);
      System.out.println(candidateString);
      System.out.println(matchHelper[2] + endIndex);

      //find the end point of the second subgroup (on)
      nextIndex = matcher.end(1);
      System.out.println(candidateString);
      System.out.println(matchHelper[3] + nextIndex);
  }
}
End example

In the following example, the regex pattern is B(on)d, which means you have a subgroup within the pattern. The area that has been examined by the Matcher after find() is initially called is highlighted in the box shown in the following image:

By calling the end(0) method, you're implicitly calling it only for the region that has already been parsed, which is boxed in the preceding image. As far as the Matcher is currently concerned, this boxed region is the only one we can discuss at present.

The end(0) method returns the index of the last character in group(0) plus 1. Remember that group(0) is the entire expression B(on)d. In this region, the last character is the d in Bond, which is at position 14. Because end(int) adds 1 to that last index, 15 is returned. group(0) is circled in the following image:

Similarly, when you call end(1), you're calling it only for the region that has already been parsed—again, the boxed region. This time, you're asking for the second grouping in that region. The end(1) method returns the index of the last character in group(1) plus 1. The last character in group(1) is the n in Bond, because the pattern is B(on)d, and the index of that n is 13. Because end adds 1 to the index, 14 is returned. group(1) is circled in the following image:

Next, you call matcher.find() again, which results in a new region of the candidate String coming under consideration, as shown here:

Calling the end(0) method implicitly calls it only for the new region that has already been parsed, which is boxed in the preceding image. The end(0) method returns the index of the last character in group(0) plus 1. That last character is the d in Bond. The index of d is 26, and because end adds 1 to that number, 27 is returned. group(0) is circled in the following image:

Calling end(1) only considers the new region that has been parsed (again, the boxed region). This time, you're asking for the second grouping in the parsed region. The end(1) method returns the index of the last character in group(1) plus 1. That last character is the n in Bond, which is at index 25, as shown in the following image. Because end(int) adds 1 to that number, 26 is returned. The result of calling group(1) is as follows:

Please refer back to the preceding images as necessary when you read Listing 2-12. The listing is simply a fully working example of the steps you just went through.

Output 2-7 shows the output of running Listing 2-12.

Output 2-7: Output for the Matcher.end(int) Example
Start example
My name is Bond. James Bond.
              ^15
My name is Bond. James Bond.
             ^14
My name is Bond. James Bond.
                          ^27
My name is Bond. James Bond.
                         ^26
End example

If you execute another find() method

matcher.find();

and then execute end()

int nonIndex = matcher.end(0); //throws IllegalStateException

the end(int) method will throw an IllegalStateException if the find method isn't successful or if it isn't called in the first place. Similarly, it will throw an IndexOutOfBoundsException if you try to refer to a group number that doesn't exist.
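As a quick cross-check of the indexes discussed above, the following sketch prints each group's text together with its start(int) and end(int) values. The class name GroupSpanSketch and the helper span are hypothetical. The sketch also borrows the group(int) method, which is covered later in this chapter.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Sketch: reports text, start, and end for a group of the current match.
 * The class and method names are illustrative assumptions.
 */
public class GroupSpanSketch {
    // Formats group g as text[start,end), where end is exclusive
    public static String span(Matcher m, int g) {
        return m.group(g) + "[" + m.start(g) + "," + m.end(g) + ")";
    }

    public static void main(String[] args) {
        Matcher m =
            Pattern.compile("B(on)d").matcher("My name is Bond. James Bond.");
        while (m.find()) {
            System.out.println(span(m, 0) + " " + span(m, 1));
        }
        // prints:
        // Bond[11,15) on[12,14)
        // Bond[23,27) on[24,26)
    }
}
```

Because end(int) is one past the last matching character, subtracting start(int) from it gives the length of the group's text.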

public String group()

The group method can be a powerful and convenient tool in the war against jumbled code. It simply returns the substring of the candidate String that matches the original regex pattern. For example, say you want to extract occurrences of the pattern Bond

Pattern p = Pattern.compile("Bond");

from the candidate String My name is Bond. James Bond. You extract the Matcher

Matcher matcher = p.matcher("My name is Bond. James Bond.");

and call find() on it.

matcher.find();

Now the boxed region in the following image is ready to be scrutinized by the Matcher:

You can now extract the part of the candidate String that matches your criteria by using the group() method:

String tmp = matcher.group(); //returns "Bond"

This method extracts the matching part of the region under consideration. That area is circled in the following image:

A clumsier way of achieving the same result is to use the start and end methods to find the starting and ending indexes of the group within the candidate String, and use a String.substring method to extract that text.
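For comparison, here is a minimal sketch of that clumsier approach. The class name GroupVersusSubstringSketch and the method firstMatchViaIndexes are hypothetical names of my own, and returning null when nothing matches is an assumption made to keep the example short.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Sketch: extracts a match with start(), end(), and substring.
 * The class and method names are illustrative assumptions.
 */
public class GroupVersusSubstringSketch {
    // Returns the first match by slicing the candidate, or null if none
    public static String firstMatchViaIndexes(Pattern p, String candidate) {
        Matcher m = p.matcher(candidate);
        if (!m.find()) {
            return null;
        }
        // substring's end index is exclusive, which lines up with end()
        return candidate.substring(m.start(), m.end());
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("Bond");
        String candidate = "My name is Bond. James Bond.";

        Matcher m = p.matcher(candidate);
        m.find();
        System.out.println(m.group());                          // prints Bond
        System.out.println(firstMatchViaIndexes(p, candidate)); // prints Bond
    }
}
```

The sketch also hints at why end() returns the last index plus 1: that value can be handed directly to String.substring, whose second argument is exclusive.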

The group() method will throw an IllegalStateException if the find() method is unsuccessful or if it's never initially called. Listing 2-13 presents a complete working example of this method and the algorithm discussed.

Listing 2-13: The Matcher.group() Method
Start example
import java.util.regex.*;
/**
 * Demonstrates the usage of the
 * Matcher.group() method
 */
public class MatcherGroupExample{
  public static void main(String args[]){
      test();
  }
  public static void test(){
      //create a Pattern
      Pattern p = Pattern.compile("Bond");

      //create a Matcher and use the Matcher.group() method
      String candidateString = "My name is Bond. James Bond.";
      Matcher matcher = p.matcher(candidateString);
      //extract the group
      matcher.find();
      System.out.println(matcher.group());
  }
}

End example

public String group(int group)

This method is a more powerful counterpart to the group() method. It allows you to extract parts of a candidate String that match a subgroup within your pattern. Listing 2-14 demonstrates the use of the group(int) method.

Listing 2-14: Matcher.group(int) Method Example
Start example
import java.util.regex.*;
/**
 * Demonstrates the usage of the
 * Matcher.group(int) method
 */
public class MatcherGroupParamExample{
  public static void main(String args[]){
      test();
  }
  public static void test(){
     //create a Pattern
      Pattern p = Pattern.compile("B(ond)");

     //create a Matcher and use the Matcher.group(int) method
     String candidateString = "My name is Bond. James Bond.";
     Matcher matcher = p.matcher(candidateString);
     //Find groups 0 and 1 of the first find
      matcher.find();
      String group_0 = matcher.group(0);
      String group_1 = matcher.group(1);
      System.out.println("Group 0 " + group_0);
      System.out.println("Group 1 " + group_1);
      System.out.println(candidateString);

     //Find groups 0 and 1 of the second find
      matcher.find();
      group_0 = matcher.group(0);
      group_1 = matcher.group(1);
      System.out.println("Group 0 " + group_0);
      System.out.println("Group 1 " + group_1);
      System.out.println(candidateString);
  }
}

End example

In Listing 2-14, the regex pattern is again B(ond), which means you have a subgroup within the pattern. The portion of the candidate parsed when find() is called for the first time is shown here:

Thus, when you call the group(0) method, you're implicitly calling it only for the region that has already been parsed, which is boxed in the preceding image. As far as the Matcher is currently concerned, this boxed region is the only one we can discuss.

Calling group(0) returns Bond because that's the first group that matches your criteria in the region of the candidate String currently under inspection. Again, that area is shown in the box in the preceding image. The actual matching group is shown in the following image:

Similarly, when you call group(1), you're calling it only for the region that has already been parsed—again, the boxed area. This time, you're asking for the second grouping in the parsed region. group(1) is circled in the following image:

Next, you call matcher.find() again, which results in a new region of the candidate String coming under inspection, as shown here:

Calling the group(0) method implicitly calls it only for the new region that has already been parsed, which is boxed in the preceding image. The group(0) method returns the String Bond. group(0) is circled in the following image:

Calling group(1) only considers the new region that has been parsed (again, the boxed region). Within that region, group(1) refers to ond. group(1) is circled in the following image:

Listing 2-14 presents an example using the group(int) method, and Output 2-8 shows the output of this example.

Output 2-8: Output of the Matcher.group(int) Example
Start example
Group 0 Bond
Group 1 ond
My name is Bond. James Bond.
Group 0 Bond
Group 1 ond
My name is Bond. James Bond.
End example

If you execute another find() method

matcher.find();

and then execute group(0)

String tmp = matcher.group(0); //throws IllegalStateException

the group(0) method will throw an IllegalStateException because the find method call wasn't successful. Similarly, it will throw an IllegalStateException if find hadn't been called at all. If you try to refer to a group number that doesn't exist, it will throw an IndexOutOfBoundsException.

public int groupCount()

This method simply returns the number of groups that the Pattern defined. In Listing 2-15, the groupCount method displays the number of possible groups a given pattern might have.

Listing 2-15: MatcherGroupCountExample Example
Start example
import java.util.regex.*;
/**
 * Demonstrates the usage of the
 * Matcher.groupCount() method
 */
public class MatcherGroupCountExample{
  public static void main(String args[]){
     test();
  }
  public static void test(){
      //create a Pattern
      Pattern p = Pattern.compile("B(ond)");

      //create a Matcher and use the Matcher.groupCount() method
      String candidateString = "My name is Bond. James Bond.";
      Matcher matcher = p.matcher(candidateString);

      //extract the possible number of groups.
      //It's important to be aware that this
      //represents only the number of groups that
      //are possible: not the actual number of groups
      //found in the candidate string
      int numberOfGroups = matcher.groupCount();
      System.out.println("numberOfGroups ="+numberOfGroups);
  }
}
End example

There's a very important, and somewhat counterintuitive, subtlety to notice about this method. It returns the number of possible groups based on the original Pattern, without even considering the candidate String. Thus, it's not really information about the Matcher object; rather, it's information about the Pattern that helped spawn it. This can be tricky, because the fact that this method lives on the Matcher object could be interpreted to mean that it's providing feedback about the state of the Matcher. It just isn't. It's telling you how many groups are theoretically possible for the given Pattern. Note also that group(0), the entire expression, isn't included in the count, so for the pattern B(ond) in Listing 2-15, groupCount returns 1.
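To see that the candidate String plays no part in the result, consider the following sketch, which calls groupCount() on Matchers built against an empty input. The class name GroupCountSketch and the method groupsIn are hypothetical names used only for illustration.

```java
import java.util.regex.Pattern;

/**
 * Sketch: groupCount() depends on the Pattern, not on the candidate.
 * The class and method names are illustrative assumptions.
 */
public class GroupCountSketch {
    // Counts the groups in a regex; no match is ever attempted
    public static int groupsIn(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        // group(0), the whole expression, is not part of the count
        System.out.println(groupsIn("Bond"));                // prints 0
        System.out.println(groupsIn("B(ond)"));              // prints 1
        System.out.println(groupsIn("(\\w)(\\d\\d)(\\w+)")); // prints 3
    }
}
```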

public boolean matches()

This method is designed to help you match a candidate String against the Matcher's Pattern. It returns true if, and only if, the entire candidate String under consideration matches the pattern exactly.

Listing 2-16 demonstrates how you might use this method. Three strings, j2se, J2SE followed by a space, and J2SE, are compared to the Pattern J2SE.

Listing 2-16: Matcher.matches Example
Start example
import java.util.regex.*;
/**
 * Demonstrates the usage of the
 * Matcher.matches method
 */
public class MatcherMatchesExample{
  public static void main(String args[]){
      test();
  }
  public static void test(){
     //create a Pattern
      Pattern p = Pattern.compile("J2SE");

     //create the candidate Strings
     String candidateString_1 = "j2se";
     String candidateString_2 = "J2SE ";
     String candidateString_3 = "J2SE";

     //Attempt to match the candidate Strings.
     Matcher matcher_1 = p.matcher(candidateString_1);
     Matcher matcher_2 = p.matcher(candidateString_2);
     Matcher matcher_3 = p.matcher(candidateString_3);

     //display the output for first candidate
     String msg = ":" + candidateString_1 + ": matches?: ";
     System.out.println(msg + matcher_1.matches());

     //display the output for second candidate
     msg = ":" + candidateString_2 + ": matches?: ";
     System.out.println(msg + matcher_2.matches());

     //display the output for third candidate
     msg = ":" + candidateString_3 + ": matches?: ";
     System.out.println(msg + matcher_3.matches());
  }
}
End example

Only one of the three candidates successfully matches here. j2se is rejected because it is the wrong case. The second candidate, J2SE followed by a space, is rejected because of the space character after the E, which means that it isn't a perfect match. The only perfect match is J2SE.
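The strictness of matches() is easiest to appreciate next to find(), which is described in the next section and only requires that some part of the candidate match. The following sketch runs both against the same three candidates. The class name MatchesVersusFindSketch and its two helper methods are hypothetical.

```java
import java.util.regex.Pattern;

/**
 * Sketch: contrasts whole-string matching with substring searching.
 * The class and method names are illustrative assumptions.
 */
public class MatchesVersusFindSketch {
    private static final Pattern P = Pattern.compile("J2SE");

    // true only if the entire candidate matches the pattern
    public static boolean wholeMatch(String candidate) {
        return P.matcher(candidate).matches();
    }

    // true if any part of the candidate matches the pattern
    public static boolean containsMatch(String candidate) {
        return P.matcher(candidate).find();
    }

    public static void main(String[] args) {
        String[] candidates = {"j2se", "J2SE ", "J2SE"};
        for (String c : candidates) {
            System.out.println(":" + c + ": matches? " + wholeMatch(c)
                + ", find? " + containsMatch(c));
        }
        // prints:
        // :j2se: matches? false, find? false
        // :J2SE : matches? false, find? true
        // :J2SE: matches? true, find? true
    }
}
```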

public boolean find()

The find() method parses just enough of the candidate string to find a match. If such a substring is successfully found, then true is returned and find stops parsing the candidate. If no part of the candidate string matches the pattern, then find returns false.

Thus, for the pattern

Pattern p = Pattern.compile("Bond");

and candidate String My name is Bond. James Bond.

Matcher matcher = p.matcher("My name is Bond. James Bond.");

calling find() parses My name is Bond. James Bond. only as far as the first Bond, that is, through the substring My name is Bond, as follows:

The boxed section is the part of the candidate that has been parsed; thus, it's the part that calls to the start, end, or group methods will be concerned with. Why? Because the find method only had to parse up to the d in the first Bond to find a match. Having accomplished that mission, the find method doesn't waste resources parsing the rest of the candidate String.

Calling find is a necessary preamble to using methods such as start, end, and group. Without first invoking find, calling these methods will cause an IllegalStateException to be thrown.

One common use of this method is as a control condition in a while loop, so that the start, end, and group methods aren't called when they might throw an IllegalStateException. Listing 2-17 is an example of a simple regular expression that loops through the String I love Java. Java is my favorite language. Java Java Java. and finds the pattern Java.

Listing 2-17: Using the find() Method
Start example
import java.util.regex.*;
/**
 * Demonstrates the usage of the
 * Matcher.find method
 */
public class MatcherFindExample{
  public static void main(String args[]){
      test();
  }
  public static void test(){
     //create a Pattern
      Pattern p = Pattern.compile("Java");

     //create the candidate String
     String candidateString =
      "I love Java. Java is my favorite language. Java Java Java.";

     //Attempt to match the candidate String.
     Matcher matcher = p.matcher(candidateString);

     //loop through and display all matches
     while (matcher.find()){
        System.out.println(matcher.group());
     }
  }
}
End example

In this example, the candidate String is

     String candidateString =
      "I love Java. Java is my favorite language. Java Java Java.";

When the while loop is first entered, find() is immediately called on the Matcher, which results in the boxed area in the following image. Within that boxed region, the matching part of the region is circled, as shown in the images that follow.

The boxed area is the region parsed, and the circled part is the matching substring:


The boxed area is the next region parsed, and the circled part is the matching substring:


The boxed area is the next region parsed, and the circled part is the matching substring:


The boxed area is the next region parsed, and the circled part is the matching substring:


The boxed area is the next region parsed, and the circled part is the matching substring:

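If you'd rather see the five matches as numbers than as pictures, the start and end methods report where each circled substring sits in the candidate. This is a minimal sketch, not one of the chapter's listings; the class name FindPositionsSketch and the positions helper are hypothetical names chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindPositionsSketch {

    // Collects "start-end" for every match that find() locates, in order.
    public static List<String> positions(String regex, String candidate) {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(candidate);
        while (matcher.find()) {
            // start() is the index of the first matched character;
            // end() is the index just past the last matched character.
            result.add(matcher.start() + "-" + matcher.end());
        }
        return result;
    }

    public static void main(String[] args) {
        String candidate =
            "I love Java. Java is my favorite language. Java Java Java.";
        // prints [7-11, 13-17, 43-47, 48-52, 53-57]
        System.out.println(positions("Java", candidate));
    }
}
```

Each call to find() resumes where the previous match ended, which is why the start values only ever increase.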

public boolean find(int start)

The find(int start) method works exactly like its overloaded counterpart, except for where it starts searching. The int parameter start simply tells the Matcher the character index at which to start its search.

Thus, for the candidate String I love Java. Java is my favorite language. Java Java Java. and the pattern Java, if you only want to start searching at character index 11, you use the command find(11). The area parsed is boxed in the following image, and the actual matching group is circled:


If the index given is negative or greater than the length of the candidate string, then this method will throw an IndexOutOfBoundsException. Thus, for the preceding candidate string, calling find(59) will cause an IndexOutOfBoundsException, because the length of the string is only 58. Calling find(58) is legal, but it returns false because nothing is left to search.

You can also use this method to set the start of the searching point. Thus, you could execute find(11) to start searching at character 11, and then use find(0) to start searching at character 0.
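The boundary behavior of find(int) is worth trying out, because it's easy to be off by one. The following is a minimal sketch under the assumption that you're using the same candidate String as before; the class FindFromIndexSketch and its helper methods are invented for this illustration. The candidate is 58 characters long, so 58 is the largest legal argument.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindFromIndexSketch {

    static final String CANDIDATE =
        "I love Java. Java is my favorite language. Java Java Java.";

    // Returns the start index of the first match at or after 'from',
    // or -1 if find(from) returns false.
    public static int startOfMatchFrom(int from) {
        Matcher matcher = Pattern.compile("Java").matcher(CANDIDATE);
        return matcher.find(from) ? matcher.start() : -1;
    }

    // Returns true if find(from) throws IndexOutOfBoundsException.
    public static boolean throwsOutOfBounds(int from) {
        Matcher matcher = Pattern.compile("Java").matcher(CANDIDATE);
        try {
            matcher.find(from);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(CANDIDATE.length());     // 58
        System.out.println(startOfMatchFrom(11));   // 13
        System.out.println(startOfMatchFrom(0));    // 7
        System.out.println(startOfMatchFrom(58));   // -1, legal but nothing left
        System.out.println(throwsOutOfBounds(59));  // true
        System.out.println(throwsOutOfBounds(-1));  // true
    }
}
```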

Listing 2-18 provides an example for the candidate String I hate mice. I really hate MICE. and the pattern mice, in which the comparison is case insensitive. Because the comparison is case insensitive, both mice and MICE can match, which demonstrates that the match found by find(11) is, in fact, the one that begins after character number 11.

Listing 2-18: Using the find(int) Method
Start example
import java.util.regex.*;
/**
 * Demonstrates the usage of the
 * Matcher.find(int) method
 */
public class MatcherFindParamExample{
  public static void main(String args[]){
      test();
  }
  public static void test(){
     //create a Pattern
      Pattern p = Pattern.compile("mice", Pattern.CASE_INSENSITIVE);

     //create the candidate String
     String candidateString =
      "I hate mice. I really hate MICE.";

     //Attempt to match the candidate String.
     Matcher matcher = p.matcher(candidateString);

     //display the latter match
     System.out.println(candidateString);
     matcher.find(11);
     System.out.println(matcher.group());

     //display the earlier match
     System.out.println(candidateString);
     matcher.find(0);
     System.out.println(matcher.group());
  }
}
End example

When you execute the find(11) method, the search region starts at character 11, as illustrated in the following image:

Next, you execute find(0), which moves the search index back to 0. The following image illustrates the resulting search region:

public boolean lookingAt()

The lookingAt() method is a more relaxed version of the matches method. It simply compares as little of the String against the Pattern as necessary to achieve a match. If such a subsection exists, then this method returns true.

Thus, for the pattern J2SE

Pattern pattern = Pattern.compile("J2SE");

and the candidate J2SE is the only one for me

Matcher matcher_1 = pattern.matcher("J2SE is the only one for me");

the lookingAt method returns true. However, calling lookingAt() for the candidate string For me, it's J2SE, or nothing at all

Matcher matcher_2 = pattern.matcher("For me, it's J2SE, or nothing at all");

will return false, because the first part of For me, it's J2SE, or nothing at all doesn't match the pattern J2SE.

Like the matches method, the lookingAt method always starts looking at the candidate string at the beginning of the input sequence; unlike matches, the lookingAt method doesn't require that the entire input sequence be matched. If the match succeeds, then more information can be obtained by using the start, end, and group methods. Listing 2-19 provides an example of the lookingAt method's use.

Listing 2-19: Using the lookingAt Method
Start example
import java.util.regex.*;
/**
 * Demonstrates the usage of the
 * Matcher.lookingAt method
 */
public class MatcherLookingAtExample{
  public static void main(String args[]){
      test();
  }
  public static void test(){
     //create a Pattern
      Pattern p = Pattern.compile("J2SE");

     //create the candidate Strings
     String candidateString_1 = "J2SE is the only one for me";
     String candidateString_2 =
      "For me, it's J2SE, or nothing at all";
     String candidateString_3 = "J2SEistheonlyoneforme";

     //Attempt to match the candidate Strings.
     Matcher matcher = p.matcher(candidateString_1);
     //display the output for the candidate
     String msg = ":" + candidateString_1 + ": matches?: ";
     System.out.println(msg + matcher.lookingAt());
     matcher.reset(candidateString_2);
     //display the output for the candidates
     msg = ":" + candidateString_2 + ": matches?: ";
     System.out.println(msg + matcher.lookingAt());

     matcher.reset(candidateString_3);
     //display the output for the candidate
     msg = ":" + candidateString_3 + ": matches?: ";
     System.out.println(msg + matcher.lookingAt());

     /*
     *returns
     *:J2SE is the only one for me: matches?: true
     *:For me, it's J2SE, or nothing at all: matches?: false
     *:J2SEistheonlyoneforme: matches?: true
     */
  }
}
End example
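It can help to see matches, lookingAt, and find side by side on the same input. The following is a minimal sketch, not one of the chapter's listings; the class name MatchModesSketch and the modes helper are hypothetical. The Matcher is reset between calls so that each method starts from a clean state.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchModesSketch {

    // Returns the results of matches(), lookingAt(), and find(), in that order.
    public static boolean[] modes(String regex, String candidate) {
        Matcher matcher = Pattern.compile(regex).matcher(candidate);
        boolean whole = matcher.matches();     // entire candidate must match
        matcher.reset();
        boolean prefix = matcher.lookingAt();  // match must begin at index 0
        matcher.reset();
        boolean anywhere = matcher.find();     // match may begin anywhere
        return new boolean[] { whole, prefix, anywhere };
    }

    public static void main(String[] args) {
        // prints [true, true, true]
        System.out.println(Arrays.toString(modes("J2SE", "J2SE")));
        // prints [false, true, true]
        System.out.println(Arrays.toString(
            modes("J2SE", "J2SE is the only one for me")));
        // prints [false, false, true]
        System.out.println(Arrays.toString(
            modes("J2SE", "For me, it's J2SE, or nothing at all")));
    }
}
```

Reading each row left to right, the methods go from strictest to most relaxed.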

public Matcher appendReplacement (StringBuffer sb, String replacement)

There will be times when you'll prefer to use a StringBuffer instead of a String when working with regular expressions. This might be for performance, utility, or other reasons. Fortunately, the java.util.regex package offers the appendReplacement and appendTail methods for doing so. This section focuses on the appendReplacement method.

Simply speaking, appendReplacement allows you to build up a StringBuffer from the candidate string, with each match replaced as you go. Say you want to replace Bond with Smith in the string My name is Bond. James Bond. I would like a martini., and you want the results stored in a StringBuffer. To use appendReplacement, you must first create a Pattern and a corresponding Matcher. For this example's purposes, you'll use the pattern Bond:

Pattern pattern = Pattern.compile("Bond");

Also, you'll work with the candidate string My name is Bond. James Bond. I would like a martini.:

Matcher matcher =
 pattern.matcher("My name is Bond. James Bond. I would like a martini.");

Next, you call the find method, so that the Matcher can start to parse the candidate String. The first time you call find, the Matcher simply parses enough of the candidate String such that the first match, if any, is found. This parsed region is boxed in the following image:


Recall that the boxed region is the only part of the candidate string the Matcher is currently aware of.

Then you'll call the appendReplacement method, which populates the StringBuffer sb with everything in the boxed region shown in the preceding image, except that Bond is replaced with Smith. Therefore, your StringBuffer now contains My name is Smith.

One last thing bears mentioning. Internally, the Matcher object maintains an append position. This append position is state information maintained by the Matcher object for the sake of the StringBuffer object. It records the position in the candidate string up to which the last call to appendReplacement has read. Of course, the append position is initially 0, as shown in the following image:


After you call appendReplacement, the append position is moved forward to just after the match, as shown in the following image. This is the same position that the matcher.end() method would return.


Next, you call matcher.find() again, so that the current position under consideration becomes the boxed region highlighted in the following image:


You then call appendReplacement again, thus appending . James Smith to the StringBuffer. Remember, because this is a replacement method, it automatically replaces Bond with Smith. The content of the StringBuffer becomes My name is Smith. James Smith, and the append position is moved forward, as shown in the following image:


Your mission is accomplished. The complete code listing is displayed in Listing 2-20.

Listing 2-20: appendReplacement Method Example
Start example
import java.util.regex.*;
import java.util.*;
/**
 * Demonstrates usage of the
 * Matcher.appendReplacement method
 */
public class Scrap{
  public static void main(String args[]){
      test();
  }
  public static void test(){
     //create a Pattern
      Pattern p = Pattern.compile("Bond");
      //create a StringBuffer
      StringBuffer sb =new StringBuffer();

     //create the candidate Strings
     String candidateString =
     "My name is Bond. James Bond. I would like a martini.";

     String replacement = "Smith";
     //Attempt to find the first match in the candidate String
     Matcher matcher = p.matcher(candidateString);
     matcher.find();

     //populate the StringBuffer
     matcher.appendReplacement(sb,replacement);
     //Attempt to find the second match, reusing the same Matcher
     //so that its append position is preserved
     matcher.find();

     //populate the StringBuffer
     matcher.appendReplacement(sb,replacement);

     //display the output for the candidate
     String msg = sb.toString();

     System.out.println(msg.length());
     System.out.println(msg);
  }
}
End example

Special Notes

This appendReplacement method offers a lot of power. As you may know, with great power comes subtle distinctions. By using the expression $d, in which d is a number less than or equal to the number of groups in the previous match, you can actually embed and reorganize subgroups in your search. For example, say your pattern is (James) (Bond):

     Pattern p = Pattern.compile("(James) (Bond)");

and your candidate is My name is Bond. James Bond.

     String candidateString = "My name is Bond. James Bond.";

and you want to insert the middle name Waldo. Your replacement String might look like the following:

String replacement = "$1 Waldo $2";

where $1 refers to the first matching subgroup, James, and $2 refers to the second matching subgroup, Bond.

In this case, the StringBuffer will contain the value My name is Bond. James Waldo Bond (the final period of the candidate isn't part of the match, so appendReplacement doesn't append it). Listing 2-20 presents a complete working example.

Listing 2-20: Using appendReplacement with Subgroup Replacements
Start example
import java.util.regex.*;
import java.util.*;
/**
 * Demonstrates usage of the
 * Matcher.appendReplacement method, with
 * subgroup replacement.
 */
public class MatcherAppendReplacementGroupExample{
  public static void main(String args[]){
      test();
  }
  public static void test(){
     //create a Pattern
      Pattern p = Pattern.compile("(James) (Bond)");
      //create a StringBuffer
      StringBuffer sb =new StringBuffer();

     //create the candidate Strings
     String candidateString =
     "My name is Bond. James Bond.";

     String replacement = "$1 Waldo $2";
     //Attempt to match the first candidate String
     Matcher matcher = p.matcher(candidateString);
     matcher.find();

     //populate the StringBuffer
     matcher.appendReplacement(sb,replacement);

     //display the output for the candidate
     String msg = sb.toString();
     System.out.println(msg);
  }
}
End example

The appendReplacement method will throw an IllegalStateException if a find() has not been called, or if find returns false. It will throw an IndexOutOfBoundsException if the capturing group referred to by $1, $2, and so on doesn't exist in the part of the pattern currently being scrutinized by the Matcher.
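Both exceptions are easy to provoke on purpose. The following is a minimal sketch, not one of the chapter's listings; the class name AppendReplacementErrorsSketch and the tryReplace helper are invented for illustration. It reports either the buffer's contents or the name of the exception that was thrown.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AppendReplacementErrorsSketch {

    // Calls appendReplacement, optionally after find(), and returns either
    // the resulting buffer contents or the name of the exception thrown.
    public static String tryReplace(boolean callFind, String replacement) {
        Matcher matcher = Pattern.compile("(James) (Bond)")
                                 .matcher("My name is Bond. James Bond.");
        StringBuffer sb = new StringBuffer();
        try {
            if (callFind) {
                matcher.find();
            }
            matcher.appendReplacement(sb, replacement);
            return sb.toString();
        } catch (IllegalStateException e) {
            return "IllegalStateException";
        } catch (IndexOutOfBoundsException e) {
            return "IndexOutOfBoundsException";
        }
    }

    public static void main(String[] args) {
        // prints My name is Bond. James Waldo Bond
        System.out.println(tryReplace(true, "$1 Waldo $2"));
        // no find() first: prints IllegalStateException
        System.out.println(tryReplace(false, "$1"));
        // the pattern has only two groups: prints IndexOutOfBoundsException
        System.out.println(tryReplace(true, "$3"));
    }
}
```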

public StringBuffer appendTail(StringBuffer sb)

The appendTail method is a supplement to the appendReplacement method. It simply appends every remaining subsequence from the original candidate string to the StringBuffer. It reads from the append position, which I explained in the appendReplacement section, to the end of the candidate string.

In the appendReplacement example given earlier, you replaced Bond with Smith in the string My name is Bond. James Bond. I would like a martini.. When you finished, you had a StringBuffer that contained the value My name is Smith. James Smith.

That was as much as the appendReplacement method could accomplish for you, because it's based on a successful match, and there are no more successful matches to be found after the d in the second occurrence of the word Bond. The state of the Matcher after the second call to appendReplacement is shown in the following image:


Correspondingly, the StringBuffer created by using appendReplacement would only have contained the phrase My name is Smith. James Smith. The appendTail method simply appends the rest of the String, namely . I would like a martini. to the StringBuffer. That same StringBuffer is returned.
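In practice, the two methods are usually used together: appendReplacement inside a find loop, followed by a single call to appendTail. The following is a minimal sketch of that combination, not one of the chapter's listings; the class name AppendTailSketch and the replaceEvery helper are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AppendTailSketch {

    // Replaces every match of regex in candidate, building the result
    // in a StringBuffer one match at a time.
    public static String replaceEvery(String regex, String candidate,
                                      String replacement) {
        Matcher matcher = Pattern.compile(regex).matcher(candidate);
        StringBuffer sb = new StringBuffer();
        while (matcher.find()) {
            // Appends the text between the append position and the match,
            // then the replacement, and moves the append position forward.
            matcher.appendReplacement(sb, replacement);
        }
        // Appends whatever follows the last match.
        matcher.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String candidate =
            "My name is Bond. James Bond. I would like a martini.";
        // prints My name is Smith. James Smith. I would like a martini.
        System.out.println(replaceEvery("Bond", candidate, "Smith"));
    }
}
```

If the pattern never matches, the loop body never runs, and appendTail copies the entire candidate unchanged.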

public String replaceAll(String replacement)

This method is one of my favorite new additions, both for its functionality and for its intuitive application programming interface (API). The replaceAll method simply returns a String in which every match of the pattern in the candidate is replaced with the replacement.

Imagine that you have the String I love ice. Ice is my favorite. Ice Ice Ice., and you want to replace every occurrence of ice or Ice with the word Java. Your first step is to describe the word you want to look for. In this case, because you want to match both uppercase Ice and lowercase ice you'll use the regex pattern (i|I)ce:

Pattern pattern = Pattern.compile("(i|I)ce");

Next, use the candidate String to get a Matcher:

Matcher matcher = pattern.matcher("I love ice. Ice is my favorite. Ice Ice Ice.");

Finally, make the replacement:

String tmp = matcher.replaceAll("Java");

Now the string tmp holds the value I love Java. Java is my favorite. Java Java Java.. Listing 2-21 presents the complete code for this example.

Listing 2-21: replaceAll Method Example
Start example
import java.util.regex.*;
import java.util.*;
/**
 * Demonstrates usage of the
 * Matcher.replaceAll method
 */
public class MatcherReplaceAllExample{
  public static void main(String args[]){
      test();
  }
  public static void test(){
     //create a Pattern
      Pattern p = Pattern.compile("(i|I)ce");

     //create the candidate String
     String candidateString =
     "I love ice. Ice is my favorite. Ice Ice Ice.";

     Matcher matcher = p.matcher(candidateString);
     String tmp = matcher.replaceAll("Java");

     System.out.println(tmp);
  }
}

End example
Caution 

Using this method will change the state of your Matcher object. Specifically, the reset method will be called. Therefore, it's as if none of your earlier start, end, group, and find calls had been made.

Like the appendReplacement method, this replaceAll method can contain references to captured subgroups by using the $ symbol. For details, please see the appendReplacement documentation presented earlier in the chapter.
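As a quick illustration of the $ references, you can reorder the two subgroups of the earlier (James) (Bond) pattern. This is a minimal sketch, not one of the chapter's listings; the class name ReplaceAllGroupsSketch and the lastNameFirst helper are invented for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceAllGroupsSketch {

    // Rewrites every "James Bond" as "Bond, James" by swapping
    // the two captured subgroups in the replacement String.
    public static String lastNameFirst(String candidate) {
        Matcher matcher = Pattern.compile("(James) (Bond)").matcher(candidate);
        return matcher.replaceAll("$2, $1");
    }

    public static void main(String[] args) {
        // prints My name is Bond. Bond, James.
        System.out.println(lastNameFirst("My name is Bond. James Bond."));
    }
}
```

The first Bond is left alone because it isn't preceded by James, so the pattern as a whole doesn't match there.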

public String replaceFirst(String replacement)

The replaceFirst method is a more focused version of the replaceAll method. This method returns a String in which only the first match of the pattern in the candidate is replaced with the replacement.

Imagine that you have the candidate I love ice. Ice is my favorite. Ice Ice Ice., and you want to replace the first occurrence of ice or Ice with the word Java. Again, your first step is to describe the word you want to look for. In this case, because you want to match both uppercase Ice and lowercase ice, you use the regex pattern (i|I)ce:

Pattern pattern = Pattern.compile("(i|I)ce");

Next, use the candidate String to get a Matcher:

Matcher matcher = pattern.matcher("I love ice. Ice is my favorite. Ice Ice Ice.");

Finally, make the replacement:

String tmp = matcher.replaceFirst("Java");

The string tmp holds the value I love Java. Ice is my favorite. Ice Ice Ice.. Listing 2-22 presents the complete code for this example.

Listing 2-22: replaceFirst Method Example
Start example
import java.util.regex.*;
import java.util.*;
/**
 * Demonstrates usage of the
 * Matcher.replaceFirst method
 */
public class MatcherReplaceFirstExample{
  public static void main(String args[]){
      test();
  }
  public static void test(){
     //create a Pattern
      Pattern p = Pattern.compile("(i|I)ce");

     //create the candidate String
     String candidateString =
     "I love ice. Ice is my favorite. Ice Ice Ice.";

     Matcher matcher = p.matcher(candidateString);
     String tmp = matcher.replaceFirst("Java");

     System.out.println(tmp);
  }
}

End example
Caution 

Using this method will change the state of your Matcher object. Specifically, the reset method will be called. Therefore, remember that all start, end, group, and find calls will have to be re-executed.

Like the appendReplacement method, the replaceFirst method can contain references to captured subgroups by using the $ symbol. For details, please see the appendReplacement documentation presented earlier in the chapter.
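The reset mentioned in the caution has a visible consequence: replaceFirst always works from the start of the candidate, no matter how far earlier find calls have advanced. The following is a minimal sketch of that behavior, not one of the chapter's listings; the class name ReplaceFirstResetSketch and its helper methods are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceFirstResetSketch {

    static final String CANDIDATE =
        "I love ice. Ice is my favorite. Ice Ice Ice.";

    // Advances the Matcher past two matches, then calls replaceFirst.
    // Because replaceFirst resets the Matcher, the first ice is replaced.
    public static String replaceFirstAfterTwoFinds() {
        Matcher matcher = Pattern.compile("(i|I)ce").matcher(CANDIDATE);
        matcher.find(); // ice
        matcher.find(); // Ice
        return matcher.replaceFirst("Java");
    }

    // Reuses the Matcher after replaceFirst by resetting it explicitly,
    // then counts every match in the unchanged candidate.
    public static int countAfterReset() {
        Matcher matcher = Pattern.compile("(i|I)ce").matcher(CANDIDATE);
        matcher.replaceFirst("Java");
        matcher.reset();
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // prints I love Java. Ice is my favorite. Ice Ice Ice.
        System.out.println(replaceFirstAfterTwoFinds());
        // prints 5
        System.out.println(countAfterReset());
    }
}
```

Note that the candidate String itself is never modified; replaceFirst returns a new String, which is why all five matches are still found afterward.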

