One of the most obvious changes brought about by the introduction of regular expressions into J2SE is the addition of five powerful new methods in the String class. In the following sections I discuss these changes and offer direction on how you can use them in your future coding adventures.
There's one very important consideration that you have to keep in mind when you work with regular expressions and String objects: Special characters, such as the digit, \d, and the word token, \w, to name just a couple, have to be delimited twice when written in a Java String literal, because the Java compiler itself treats \ as an escape character. For example, to search for a digit, you must double the number of \ characters you use. Thus, \d becomes \\d when you use it in a Java String object.
This doesn't sound overly complicated, but it can be surprisingly difficult to deal with at times. For example, imagine that you want to replace every occurrence of the character d in I want to use a d character with \d. That is, you want the new String to say I want to use a \d character. How do you start?
Of course, you could try this:
String retval = tmp.replaceAll("d","\d");
which fails to compile with an illegal escape character error. OK, so you double up the \ characters to achieve the following:
String retval = tmp.replaceAll("d","\\d");
This manages to compile, but it returns the bizarre result of no change at all. What's going on here?
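Wait. The Java compiler turns \\d into the two-character String \d, and in a replacement String the \ character delimits whatever character follows it. So \d, as a replacement, doesn't mean a \ followed by a d; it just means a literal d. Well, of course that wouldn't have changed anything. Every d was replaced with a d. Try adding another \ character to delimit the \\d: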
String retval = tmp.replaceAll("d","\\\d");
This again fails to compile with an illegal escape character error. This is getting frustrating. Didn't the material in this book say to add a \ character when trying to delimit special characters?
Well, actually, it didn't. The material in this book said to double the number of \ characters. Because there are currently two \ characters, doubling them would create \\\\d as the expression. It looks weird, but try it anyway:
String retval = tmp.replaceAll("d","\\\\d");
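Amazingly, it works! But why did it work? Because there are two layers of delimiting. First, the Java compiler reads \\\\d as the three-character String \\d: the first \ acts as a delimiter for the second \, and the third \ acts as a delimiter for the fourth \ character. Then the replaceAll method reads \\ in the replacement String as a single literal \, which leaves \d in the result.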
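The two layers of delimiting described above can be sketched as follows. This is only an illustration: the class and method names are made up for this example, and the second method assumes Java 5 or later, where Matcher.quoteReplacement is available to add the replacement-level delimiters for you.

```java
import java.util.regex.Matcher;

class BackslashReplacementDemo {

    // Spells out both layers of delimiting by hand.
    // The source text "\\\\d" compiles to the String \\d,
    // which replaceAll reads as a literal \ followed by d.
    static String escapeByHand(String input) {
        return input.replaceAll("d", "\\\\d");
    }

    // Lets Matcher.quoteReplacement (Java 5+) add the replacement-level delimiters.
    // The source text "\\d" compiles to the two-character String \d.
    static String escapeWithHelper(String input) {
        return input.replaceAll("d", Matcher.quoteReplacement("\\d"));
    }

    public static void main(String[] args) {
        String tmp = "I want to use a d character";
        System.out.println(escapeByHand(tmp));     // I want to use a \d character
        System.out.println(escapeWithHelper(tmp)); // I want to use a \d character
    }
}
```

Either form produces the same String; the helper version is simply harder to get wrong when the replacement text comes from somewhere other than a literal.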
OK, that's all clear now. Try to swap out the $ in I want to use a $ character so that the resulting String reads I want to use a \$ character. See the FAQs section at the end of this chapter for the solution.
The String.matches method is probably the regex method you'll use most often. It compares the given String to a candidate regex and returns true only if the regex matches the entire String. For example, for the String
String num = "4";
comparing 4 to \d, which represents a single digit, will return true:
num.matches("\\d"); // returns true
However, comparing "4 " (that is, 4 followed by a space) to a digit, \d, will return false. Similarly, comparing "4" (that is, 4 with no space after it) to "\d " (that is, a digit followed by a space) will also return false.
The point here is that, when you use this method, you have to be careful that the regular expression describes the entirety of the String and does not describe anything that is not a part of the String. Even a space, per the preceding example, can throw your match off-kilter.
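A short sketch of the whole-String rule, using the same candidates as the preceding example. The class and method names are hypothetical and exist only for this illustration.

```java
class MatchesDemo {

    // True only when the whole candidate is exactly one digit.
    static boolean isSingleDigit(String candidate) {
        return candidate.matches("\\d");
    }

    // True only when the whole candidate is one digit followed by one space.
    static boolean isDigitThenSpace(String candidate) {
        return candidate.matches("\\d ");
    }

    public static void main(String[] args) {
        System.out.println(isSingleDigit("4"));     // true
        System.out.println(isSingleDigit("4 "));    // false: the regex doesn't describe the space
        System.out.println(isDigitThenSpace("4"));  // false: the regex describes a space the String lacks
        System.out.println(isDigitThenSpace("4 ")); // true
    }
}
```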
Behind the scenes, this method instantiates a Pattern object and simply makes a pass through to the Pattern.matches method discussed earlier. If you're going to be doing a lot of matches operations, you'll probably find it more efficient to explicitly create Pattern and Matcher objects, and use them directly.
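If you do end up running many matches operations against the same regex, one possible arrangement is to compile the Pattern once and reuse it. The sketch below assumes a made-up task (counting candidates that are a single digit) and made-up names; it isn't the only way to structure this.

```java
import java.util.regex.Pattern;

class PrecompiledMatchDemo {

    // Compiled once and reused for every candidate.
    private static final Pattern SINGLE_DIGIT = Pattern.compile("\\d");

    // Counts the candidates that consist of exactly one digit.
    static int countSingleDigits(String[] candidates) {
        int count = 0;
        for (String candidate : candidates) {
            if (SINGLE_DIGIT.matcher(candidate).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] candidates = {"4", "42", "x", "7"};
        System.out.println(countSingleDigits(candidates)); // 2
    }
}
```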
If the regular expression passed in is invalid, then this method will throw a PatternSyntaxException error. If the regular expression is null, matches will throw a NullPointerException.
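Both failure modes can be seen in a small sketch. The helper name tryMatch and the Strings it returns are invented for this example, and the unclosed group "(" is just one convenient invalid regex.

```java
import java.util.regex.PatternSyntaxException;

class InvalidRegexDemo {

    // Reports what happened when the candidate was compared to the regex.
    static String tryMatch(String candidate, String regex) {
        try {
            return candidate.matches(regex) ? "match" : "no match";
        } catch (PatternSyntaxException e) {
            return "invalid regex";
        } catch (NullPointerException e) {
            return "null regex";
        }
    }

    public static void main(String[] args) {
        System.out.println(tryMatch("4", "\\d")); // match
        System.out.println(tryMatch("4", "("));   // invalid regex
        System.out.println(tryMatch("4", null));  // null regex
    }
}
```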
The String.replaceFirst method replaces the first substring that matches the regex with the String represented by the second parameter of this method. Thus, for the String tmp
String tmp = "I want to eat 5 hamburgers, 7 days a week";
the command
String newTmp = tmp.replaceFirst("\\d","900");
sets newTmp to I want to eat 900 hamburgers, 7 days a week.
Behind the scenes, this method instantiates Pattern and Matcher objects, and simply makes a pass through to the Matcher.replaceFirst method discussed earlier. If you're going to be doing a lot of replaceFirst operations, you'll probably find it more efficient to explicitly create Pattern and Matcher objects, and use them directly.
Note: If you explicitly create Pattern and Matcher objects and use them directly, you may want to optimize your patterns by putting in end-of-line $ and beginning-of-line ^ characters where appropriate.
If the regular expression passed in is invalid, then this method will throw a PatternSyntaxException error. If the regular expression is null, replaceFirst will throw a NullPointerException.
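The pass-through described in this section can be sketched by writing the same replacement both ways. The class and method names are hypothetical; the point is only that the two routes give the same answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceFirstDemo {

    // Compiled once so that repeated calls don't recompile the regex.
    private static final Pattern DIGIT = Pattern.compile("\\d");

    // The convenience route through String.
    static String viaString(String input) {
        return input.replaceFirst("\\d", "900");
    }

    // The explicit route through Pattern and Matcher.
    static String viaMatcher(String input) {
        Matcher matcher = DIGIT.matcher(input);
        return matcher.replaceFirst("900");
    }

    public static void main(String[] args) {
        String tmp = "I want to eat 5 hamburgers, 7 days a week";
        System.out.println(viaString(tmp));  // I want to eat 900 hamburgers, 7 days a week
        System.out.println(viaMatcher(tmp)); // I want to eat 900 hamburgers, 7 days a week
    }
}
```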
The String.replaceAll method replaces every substring that matches the regex with the String represented by the second parameter of this method. Thus, for the String tmp
String tmp = "I want to eat 5 hamburgers, 7 days a week";
the command
String newTmp = tmp.replaceAll("\\d","900");
sets newTmp to I want to eat 900 hamburgers, 900 days a week.
Behind the scenes, this method instantiates Pattern and Matcher objects, and simply makes a pass through to the Matcher.replaceAll method discussed earlier. If you're going to be doing a lot of replaceAll operations, you'll probably find it more efficient to explicitly create Pattern and Matcher objects, and use them directly.
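When many Strings need the same replaceAll treatment, one option is to compile the Pattern once and reset a single Matcher for each input. The following is a sketch under that assumption; the class name, method name, and the extra "Room 12" input are made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceAllDemo {

    private static final Pattern DIGIT = Pattern.compile("\\d");

    // Replaces every digit in every input, reusing one Pattern and one Matcher.
    static String[] replaceDigits(String[] inputs, String replacement) {
        String[] results = new String[inputs.length];
        Matcher matcher = DIGIT.matcher("");
        for (int i = 0; i < inputs.length; i++) {
            // reset points the existing Matcher at the next input.
            results[i] = matcher.reset(inputs[i]).replaceAll(replacement);
        }
        return results;
    }

    public static void main(String[] args) {
        String[] inputs = {"I want to eat 5 hamburgers, 7 days a week", "Room 12"};
        System.out.println(Arrays.toString(replaceDigits(inputs, "900")));
        // [I want to eat 900 hamburgers, 900 days a week, Room 900900]
    }
}
```

Note that \d describes a single digit, so each digit of 12 is replaced separately.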
If the regular expression passed in is invalid, then this method will throw a PatternSyntaxException error. If the regular expression is null, replaceAll will throw a NullPointerException.
The String.split(String regex) method can be particularly helpful if you need to break up a String into an array of substrings based on some criteria. In concept, it's similar to the StringTokenizer. However, it's much more powerful, and more resource intensive, than StringTokenizer because it allows your program to use a regular expression as the splitting criterion.
If the regex can't be found anywhere in the String, then a String array is returned that contains exactly one String, namely the original String. If the regex can be found, then the returned array contains the substrings that lie between its occurrences, in the order in which they appear. Because trailing empty strings are dropped, a String that consists of nothing but matches, such as a lone comma split on a comma, produces an empty array.
Thus, calling the split(",") method on the String "Hello,Dolly" will return a String array consisting of two elements. The first element of the array will contain Hello, and the second will contain Dolly.
There are some subtleties you should be aware of when working with this method. If the String had been "Hello,Dolly,", with a trailing comma character after the y in Dolly, then this method would still have returned a two-element String array consisting of Hello and Dolly. The implicit behavior is that trailing empty strings aren't returned.
If the String had been "Hello,,,Dolly", then the resulting String array would have had four elements, because empty strings between matches are kept. The return value of the split method, as applied to the pattern ",", is as follows:
// "Hello,,,Dolly".split(",") is equal to {"Hello","","","Dolly"}
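The cases discussed so far can be tried side by side. This sketch uses a hypothetical helper, splitOnComma, and adds one input with no comma at all to show the single-element result.

```java
import java.util.Arrays;

class SplitDemo {

    // Splits the input on every comma, with the default handling of trailing empty strings.
    static String[] splitOnComma(String input) {
        return input.split(",");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnComma("Hello,Dolly")));   // [Hello, Dolly]
        System.out.println(Arrays.toString(splitOnComma("Hello,Dolly,")));  // [Hello, Dolly]
        System.out.println(Arrays.toString(splitOnComma("Hello,,,Dolly"))); // [Hello, , , Dolly]
        System.out.println(Arrays.toString(splitOnComma("HelloDolly")));    // [HelloDolly]
    }
}
```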
Behind the scenes, this method instantiates a Pattern object and simply makes a pass through to the Pattern.split method discussed earlier. If you're going to be doing a lot of split operations, you'll probably find it more efficient to explicitly create Pattern objects and use them directly.
If the regular expression passed in is invalid, then this method will throw a PatternSyntaxException error. If the regular expression is null, split will throw a NullPointerException.
The String.split(String regex, int limit) method returns an array containing substrings of the String object it was called on. Those substrings are the text surrounding the matches of the regular expression described by the first parameter, regex. The actual number of elements in the array is controlled by the second parameter, limit. The following sections explain what the different values of limit can mean.
If you specify that the second parameter, limit, should equal 0, then this method returns an array containing as many matching substrings as possible, and trailing empty strings are discarded. Thus, the pattern "," will return an array consisting of two elements when split against the candidate "Hello,Dolly".
Similarly, split will return two elements when matched against "Hello,Dolly,", which has a trailing comma after the y in Dolly:
String[] tmp = "Hello,Dolly,".split(",",0);
However, you may not always want this behavior. For example, there may be times when you want to limit the number of elements returned.
Use a positive limit if you're interested in only a set number of matches. You should use that number plus 1 as the limit, because the final element holds whatever remains of the String. To split "Hello,Dolly,You,Are,My,Favorite" when you want only the first two tokens, you would use this:
String[] tmp = "Hello,Dolly,You,Are,My,Favorite".split(",",3);
The first two elements of the resulting String array are as follows:
// tmp[0] is "Hello"
// tmp[1] is "Dolly"
The interesting behavior here is that a third element is returned, containing the unsplit remainder:
// tmp[2] is "You,Are,My,Favorite"
Using a positive limit can potentially lead to performance enhancements, because the regex engine can stop searching when it meets the specified number of matches.
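The number-plus-1 rule can be wrapped in a small helper. The helper below, firstTokens, is hypothetical and assumes that count is a positive number; it splits with a limit of count + 1 and then throws away the leftover element.

```java
import java.util.Arrays;

class PositiveLimitDemo {

    // Returns at most the first `count` comma-separated tokens (count assumed positive).
    static String[] firstTokens(String input, int count) {
        // The extra element, if present, holds the unsplit remainder.
        String[] parts = input.split(",", count + 1);
        return Arrays.copyOf(parts, Math.min(count, parts.length));
    }

    public static void main(String[] args) {
        String input = "Hello,Dolly,You,Are,My,Favorite";
        System.out.println(Arrays.toString(input.split(",", 3)));     // [Hello, Dolly, You,Are,My,Favorite]
        System.out.println(Arrays.toString(firstTokens(input, 2)));   // [Hello, Dolly]
        System.out.println(Arrays.toString(firstTokens("Hello", 2))); // [Hello]
    }
}
```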
Using a negative number, any negative number, for the limit tells the regex engine that you want to return as many matches as possible and that you want trailing empty strings, if any, to be returned. Thus, splitting the candidate "Hello,Dolly" on the regex pattern "," with a negative limit results in
//tmp == {"Hello","Dolly"};
However, for the String "Hello,Dolly, ", which has a trailing space after the comma following Dolly, the method call
String tmp[] = "Hello,Dolly, ".split(",", -1);
results in
//tmp is equal to {"Hello","Dolly"," "};
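One detail is worth checking for yourself: a trailing space is not an empty string, so it survives with a limit of 0 as well. The difference between a limit of 0 and a negative limit only shows up when the String ends in matches, such as trailing commas. The sketch below uses a made-up helper, countParts, to compare the two.

```java
class NegativeLimitDemo {

    // Number of elements produced by splitting the input on commas with the given limit.
    static int countParts(String input, int limit) {
        return input.split(",", limit).length;
    }

    public static void main(String[] args) {
        // A trailing space is kept either way.
        System.out.println(countParts("Hello,Dolly, ", 0));  // 3
        System.out.println(countParts("Hello,Dolly, ", -1)); // 3

        // Trailing empty strings are dropped with 0 and kept with a negative limit.
        System.out.println(countParts("Hello,Dolly,,", 0));  // 2
        System.out.println(countParts("Hello,Dolly,,", -1)); // 4
    }
}
```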
Notice that the actual value of the negative limit doesn't matter. Thus
p.split("Hello,Dolly", -1);
is exactly equivalent to
p.split("Hello,Dolly", -100);
Behind the scenes, this method instantiates a Pattern object and simply makes a pass through to the Pattern.split method discussed earlier. If you're going to be doing a lot of split operations, you'll probably find it more efficient to explicitly create the Pattern object and use it directly.
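Creating the Pattern object explicitly might look like the following sketch. The class name and the constant COMMA are assumptions made for this example; the call itself is the Pattern.split method that takes the input and a limit.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PatternSplitDemo {

    // Compiled once and shared by every split operation.
    private static final Pattern COMMA = Pattern.compile(",");

    static String[] split(String input, int limit) {
        return COMMA.split(input, limit);
    }

    public static void main(String[] args) {
        // Any negative limit behaves the same way.
        System.out.println(Arrays.toString(split("Hello,Dolly", -1)));   // [Hello, Dolly]
        System.out.println(Arrays.toString(split("Hello,Dolly", -100))); // [Hello, Dolly]
        System.out.println(Arrays.equals(split("Hello,Dolly,", -1),
                                         split("Hello,Dolly,", -100))); // true
    }
}
```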
If the regular expression passed in is invalid, then this method will throw a PatternSyntaxException error. If the regular expression is null, split will throw a NullPointerException.