You'll notice that the Pattern class doesn't have a public constructor. This means that you can't write the following type of code:
Pattern p = new Pattern("my regex");//wrong!
To get a reference to a Pattern object, you must use the static method compile(String regex). Thus, your first line of regex code might look like the following:
Pattern p = Pattern.compile("my regex");
The parameter for this method is a String that represents a regular expression. When passing a String to a method that expects a regular expression, it's important to escape any \ characters that the regular expression might have by placing another \ character in front of each of them. This is because String literals use the \ character to escape special characters in a character sequence, regardless of whether that character sequence is a regular expression. This was true long before regular expressions were a part of Java. Thus, the regular expression \d becomes \\d. To match a single digit, your regular expression code becomes the following:
Pattern p = Pattern.compile("\\d");
The point here is that the regular expression \d becomes the String literal \\d.
Escaping the String parameter can sometimes be tricky, so it's important to understand it well. By and large, it means that you double every \ character that is already present in the regular expression. It doesn't mean that you simply add a single \ character to the String.
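As a rough check of the doubling rule, the following sketch prints what the regex engine actually receives. The class name EscapeDemo and the helper matchesWhole are hypothetical names chosen for this illustration only.

```java
import java.util.regex.Pattern;

// EscapeDemo is a hypothetical class name used only for this illustration.
class EscapeDemo {
    // In Java source, "\\d" denotes the two-character String \d (a digit, in regex terms).
    static final String DIGIT_REGEX = "\\d";
    // The regex for one literal backslash is \\, which takes four backslashes in source.
    static final String BACKSLASH_REGEX = "\\\\";

    // True only if the whole candidate matches the regex.
    static boolean matchesWhole(String regex, String candidate) {
        return Pattern.compile(regex).matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(DIGIT_REGEX + " has length " + DIGIT_REGEX.length());
        System.out.println(matchesWhole(DIGIT_REGEX, "7"));
        System.out.println(matchesWhole(BACKSLASH_REGEX, "\\"));
    }
}
```

Printing the String is often the quickest way to see the regular expression as the engine sees it, with the source-level escaping already removed.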
The compile method will throw a java.util.regex.PatternSyntaxException if the regular expression itself is badly formed. For example, if you pass in a String that contains [4, the compile method will throw a PatternSyntaxException at runtime, because the syntax of the regular expression [4 is illegal.
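Because PatternSyntaxException is an unchecked exception, the compiler won't force you to handle it. If the regex comes from user input or a configuration file, one possible approach is to validate it up front. This is a minimal sketch; SyntaxCheckDemo and isValidRegex are hypothetical names.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// SyntaxCheckDemo is a hypothetical class name used only for this illustration.
class SyntaxCheckDemo {
    // Returns true if the regex compiles, false if compile throws PatternSyntaxException.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // getDescription and getIndex report what went wrong and roughly where.
            System.out.println("Bad regex: " + e.getDescription()
                    + " near index " + e.getIndex());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidRegex("[4"));  // unclosed character class
        System.out.println(isValidRegex("[4]")); // well formed
    }
}
```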
The compile(String regex) method returns a Pattern object.
The compile(String regex, int flags) method is a more powerful form of the compile method. The first parameter, regex, is a String that represents a regular expression. For details on how this String must be escaped, see the previous compile(String regex) method entry.
The flexibility of this compile method is fully realized by using the second parameter, int flags. For example, if you want a match to be successful regardless of the case of the candidate String, then your pattern might look like the following:
Pattern p = Pattern.compile(regex,Pattern.CASE_INSENSITIVE);
You can combine flags by using the | operator. For example, to achieve case-insensitive, Unicode-aware matching with a pattern that includes a comment, you might use the following:
Pattern p = Pattern.compile("t # a compound flag example", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE | Pattern.COMMENTS);
The compile(String regex, int flags) method returns a Pattern object.
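To see what the combined flags change in practice, the sketch below compiles the same text with and without them. With Pattern.COMMENTS set, whitespace and everything from # to the end of the line are ignored, so the pattern is effectively just t. FlagsDemo and its method names are hypothetical.

```java
import java.util.regex.Pattern;

// FlagsDemo is a hypothetical class name used only for this illustration.
class FlagsDemo {
    static final String REGEX = "t # a compound flag example";

    // Compiled with flags: the comment and whitespace are ignored, and case doesn't matter.
    static boolean matchesWithFlags(String candidate) {
        int flags = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE | Pattern.COMMENTS;
        return Pattern.compile(REGEX, flags).matcher(candidate).matches();
    }

    // Compiled without flags: every character of REGEX is part of the pattern.
    static boolean matchesWithoutFlags(String candidate) {
        return Pattern.compile(REGEX).matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesWithFlags("T"));
        System.out.println(matchesWithoutFlags("T"));
    }
}
```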
The pattern() method returns the regular expression from which this pattern was compiled. This is a simple String that represents the regex you passed in.
This method can be misleading in two ways. First, the String that is returned doesn't reflect any flags that were set when the pattern was compiled. Second, the regex String you typed in your source code isn't always the pattern String you get back out. Specifically, the escaping used in the original String literal isn't shown. Thus, if your original code was
Pattern p = Pattern.compile("\\d");
then you should expect your output to be \d, with a single \ character.
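Both caveats can be observed with a short sketch along these lines. PatternTextDemo and compiledText are hypothetical names, and the CASE_INSENSITIVE flag is included only to show that it leaves no trace in the returned String.

```java
import java.util.regex.Pattern;

// PatternTextDemo is a hypothetical class name used only for this illustration.
class PatternTextDemo {
    // Compiles \d with a flag and returns the text reported by pattern().
    static String compiledText() {
        Pattern p = Pattern.compile("\\d", Pattern.CASE_INSENSITIVE);
        return p.pattern();
    }

    public static void main(String[] args) {
        String text = compiledText();
        System.out.println(text);          // the regex, with a single backslash
        System.out.println(text.length()); // two characters: \ and d
    }
}
```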
Remember that you create a Pattern object by compiling a description of what you're looking for. A Pattern lists the features of what you're looking for. Speaking purely conceptually, your patterns might look like the following:
Pattern p = Pattern.compile("She must have red hair and be smarter than I am");
Correspondingly, you'll need to compare that description against candidates. That is, you'll want to examine a given String to see whether it matches the description you provided.
The Matcher object is designed specifically to help you do this sort of interrogation. The Pattern.matcher(CharSequence input) method returns the Matcher that will help get details about how your candidate String compares with the description you passed in.
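As a concrete, if less romantic, version of the same idea, the following sketch compiles a description ("one or more digits") and asks a Matcher how a candidate compares with it. MatcherDemo and firstNumber are hypothetical names, and the regex is just an example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// MatcherDemo is a hypothetical class name used only for this illustration.
class MatcherDemo {
    // The description: one or more digits.
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    // Returns the first run of digits in the candidate, or null if there is none.
    static String firstNumber(String candidate) {
        Matcher m = NUMBER.matcher(candidate);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("Room 101"));
        System.out.println(firstNumber("no digits here"));
        // matches() asks a stricter question: does the whole candidate fit the description?
        System.out.println(NUMBER.matcher("Room 101").matches());
    }
}
```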
Earlier, I discussed the constant flags that you can use in compiling your regex pattern. The flags method simply returns an int that represents those flags.
To see if your Pattern object is using a given flag (for example, the Pattern.COMMENTS flag), simply extract the flags
int flgs = myPattern.flags();
and then & that value with the Pattern.COMMENTS flag:
boolean isUsingCommentFlag = (Pattern.COMMENTS == (Pattern.COMMENTS & flgs));
Similarly, to see if you're using CASE_INSENSITIVE, do the following:
boolean isUsingCaseInsensitiveFlag = (Pattern.CASE_INSENSITIVE == (Pattern.CASE_INSENSITIVE & flgs));
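Since the same bit test works for every flag constant, you might wrap it in a small helper. This is only a sketch; FlagCheckDemo and hasFlag are hypothetical names, not part of java.util.regex.

```java
import java.util.regex.Pattern;

// FlagCheckDemo is a hypothetical class name used only for this illustration.
class FlagCheckDemo {
    // True if the given flag constant was passed to compile for this Pattern.
    static boolean hasFlag(Pattern p, int flag) {
        return (p.flags() & flag) == flag;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("a", Pattern.CASE_INSENSITIVE | Pattern.COMMENTS);
        System.out.println(hasFlag(p, Pattern.COMMENTS));
        System.out.println(hasFlag(p, Pattern.CASE_INSENSITIVE));
        System.out.println(hasFlag(p, Pattern.MULTILINE));
    }
}
```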
Very often, you'll find that all you need to know about a String is whether it matches a given regular expression exactly. You don't want to have to create a Pattern object, extract its Matcher object, and interrogate that Matcher.
The static utility method matches(String regex, CharSequence input) is designed to help you do exactly that. Internally, it creates the Pattern and Matcher objects you need, compares the regex to the input, and returns a boolean indicating whether the input matches the regex exactly. Usage might look something like the PatternMatchesTest example shown here:
import java.util.regex.*;

public class PatternMatchesTest {
    public static void main(String[] args) {
        String regex = "ad*";
        String input = "add";
        // true only if the entire input matches the regex
        boolean isMatch = Pattern.matches(regex, input);
        System.out.println(isMatch); // prints true
    }
}
If you're going to be doing a lot of comparisons, then it's more efficient to explicitly create a Pattern object and do your matches manually. However, if you're not going to be doing a lot of comparisons, then matches is a handy utility method.
The Pattern.matches(String regex, CharSequence input) method is also used internally by the String class. As of J2SE 1.4, String has a new method called matches, which internally defers to this one. Thus, you might already be using this method without being aware of it.
Of course, this method can throw a PatternSyntaxException if the regex pattern under consideration isn't well formed.
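The trade-off between the one-off utility method and a reusable Pattern can be sketched as follows. MatchesDemo, oneOff, and countMatches are hypothetical names; the regex ad* is the one used in the example above.

```java
import java.util.regex.Pattern;

// MatchesDemo is a hypothetical class name used only for this illustration.
class MatchesDemo {
    // Compiled once and reused for every comparison.
    private static final Pattern AD_STAR = Pattern.compile("ad*");

    // One-off check: the regex is compiled again on every call.
    static boolean oneOff(String input) {
        return Pattern.matches("ad*", input);
    }

    // Many checks: reuse the precompiled Pattern and create only a Matcher per input.
    static int countMatches(String... inputs) {
        int count = 0;
        for (String input : inputs) {
            if (AD_STAR.matcher(input).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(oneOff("add"));
        System.out.println("add".matches("ad*")); // String.matches gives the same answer
        System.out.println(countMatches("a", "add", "bad", "ad"));
    }
}
```

Note that "bad" is not counted: matches requires the entire input to fit the regex, not just part of it.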
The split(CharSequence input) method can be particularly helpful if you need to break up a String into an array of substrings based on some criteria. In concept, it's similar to StringTokenizer. However, it's much more powerful, and more resource intensive, than StringTokenizer, because it allows your program to use regular expressions as the splitting criteria.
If the pattern can't be found in the input, a String array is returned that contains exactly one String, namely the original input.
If the pattern can be found, then the returned String array contains the substrings of the input that lie around each match of the pattern. Thus, for the pattern
Pattern p = Pattern.compile(",");
the split method for Hello,Dolly will return a String array consisting of two elements. The first element of the array will contain the String Hello, and the second will contain the String Dolly. That String array is obtained as follows:
String tmp[] = p.split("Hello,Dolly");
In this case, the value returned is
//tmp =={ "Hello", "Dolly"}
There are some subtleties you should be aware of when working with this method. If the candidate String had been Hello,Dolly, with a trailing comma character after the y in Dolly, then this method would still have returned two elements: a String array consisting of Hello and Dolly. The implicit behavior is that trailing empty strings aren't returned.
If the input String had been Hello,,,Dolly, the resulting String array would have four elements, because empty strings between matches are kept. The return value of the split method, as applied to the Pattern, is
//tmp == {"Hello", "", "", "Dolly"}
By contrast, the matches method requires the entire input to match, as though an invisible ^ were placed before the pattern and a $ after it.
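The split behaviors described above (no match, a simple match, a trailing comma, and consecutive commas) can be checked with a short sketch like the following. SplitDemo and splitOnComma are hypothetical names.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// SplitDemo is a hypothetical class name used only for this illustration.
class SplitDemo {
    private static final Pattern COMMA = Pattern.compile(",");

    static String[] splitOnComma(String input) {
        return COMMA.split(input);
    }

    public static void main(String[] args) {
        // No comma in the input: one element, the original input.
        System.out.println(Arrays.toString(splitOnComma("Hello Dolly")));
        // One comma: two elements.
        System.out.println(Arrays.toString(splitOnComma("Hello,Dolly")));
        // Trailing comma: the trailing empty string is dropped, so still two elements.
        System.out.println(Arrays.toString(splitOnComma("Hello,Dolly,")));
        // Consecutive commas: the empty strings in the middle are kept, so four elements.
        System.out.println(Arrays.toString(splitOnComma("Hello,,,Dolly")));
    }
}
```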
This method works in exactly the same way as Pattern.split(CharSequence input), with one variation. The second parameter, limit, allows you to control how many elements are returned:
Limit == 0
If you specify that the second parameter, limit, should equal 0, then this method behaves exactly like its overloaded counterpart, split(CharSequence input).
Limit > 0
Use a positive limit if you're interested in only a certain number of tokens. You should use that number plus 1 as the limit. Say the Pattern p has been compiled for the String "," as previously. To split the String Hello,Dolly,You,Are,My,Favorite when you only want the first two tokens, you would use this:
String[] tmp = p.split("Hello,Dolly,You,Are,My,Favorite", 3);
The values of the first two elements of the resulting array would be this:
//tmp[0] = "Hello", tmp[1] = "Dolly";
The interesting behavior here is that a third element is returned, holding the unsplit remainder of the input, in this case
//tmp[2] = "You,Are,My,Favorite";
Using a positive limit can potentially lead to performance enhancements, because the regex engine can stop searching once it has produced the requested number of elements.
Limit < 0
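One way to package the "number plus 1" idea is a small helper that asks for one extra element and then discards the unsplit remainder. This is a sketch under that assumption; SplitLimitDemo, splitWithLimit, and firstTokens are hypothetical names.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// SplitLimitDemo is a hypothetical class name used only for this illustration.
class SplitLimitDemo {
    private static final Pattern COMMA = Pattern.compile(",");

    static String[] splitWithLimit(String input, int limit) {
        return COMMA.split(input, limit);
    }

    // Returns at most the first n tokens (n >= 1 assumed), splitting no more than necessary.
    static String[] firstTokens(String input, int n) {
        // With limit n + 1, element n (if present) is the unsplit remainder of the input.
        String[] parts = COMMA.split(input, n + 1);
        return Arrays.copyOf(parts, Math.min(n, parts.length));
    }

    public static void main(String[] args) {
        String input = "Hello,Dolly,You,Are,My,Favorite";
        System.out.println(Arrays.toString(splitWithLimit(input, 3)));
        System.out.println(Arrays.toString(firstTokens(input, 2)));
    }
}
```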
Using a negative number (any negative number) for the limit tells the regex engine that you want to return as many matches as possible and that you want trailing empty strings, if any, to be returned. Thus, for the regex pattern "," and the candidate String Hello,Dolly the command
String tmp[] = p.split("Hello,Dolly", -1);
results in
//tmp == {"Hello","Dolly"};
However, for the String Hello,Dolly, with a trailing comma after Dolly, the method call
String tmp[] = p.split("Hello,Dolly,", -1);
results in
//tmp == {"Hello","Dolly",""};
Notice that the actual value of the negative limit doesn't matter. Thus,
p.split("Hello,Dolly", -1);
is exactly equivalent to
p.split("Hello,Dolly", -100);
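The difference between a zero limit and a negative limit only shows up when the input ends with one or more matches of the pattern. The following sketch compares the two on such an input; NegativeLimitDemo and split are hypothetical names.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// NegativeLimitDemo is a hypothetical class name used only for this illustration.
class NegativeLimitDemo {
    private static final Pattern COMMA = Pattern.compile(",");

    static String[] split(String input, int limit) {
        return COMMA.split(input, limit);
    }

    public static void main(String[] args) {
        String input = "Hello,Dolly,";
        // Limit 0 drops the trailing empty string.
        System.out.println(Arrays.toString(split(input, 0)));
        // Any negative limit keeps it.
        System.out.println(Arrays.toString(split(input, -1)));
        System.out.println(Arrays.equals(split(input, -1), split(input, -100)));
    }
}
```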