Thus far, you've worked almost exclusively with regular expressions, but not really with Java. Now it's time to consider how the two interact. The following examples differ from the preceding ones in that they incorporate Java code with regular expressions. They offer a more complete picture of how you can use some J2SE regex syntax.
Some of the regular expressions you'll see here are slightly more advanced than in the examples you've seen previously, as they build on the fundamentals discussed thus far in the chapter. For example, Listing 1-2 combines groups with quantifiers.
Listing 1-2. MatchPhoneNumber.java

    import java.util.regex.*;

    public class MatchPhoneNumber{
        public static void main(String args[]){
            isPhoneValid(args[0]);
        }

        /**
         * Confirms that the format for the given phone number is valid.
         * @param phone is a String representing the phone number.
         * @return true if the phone number format is acceptable.
         */
        public static boolean isPhoneValid(String phone){
            boolean retval=false;
            String phoneNumberPattern = "(\\d-)?(\\d{3}-)?\\d{3}-\\d{4}";
            retval= phone.matches(phoneNumberPattern);

            //prepare a message indicating success or failure
            String msg = " NO MATCH: pattern:" + phone
                + "\r\n regex: " + phoneNumberPattern;
            if (retval){
                msg = " MATCH : pattern:" + phone
                    + "\r\n regex: " + phoneNumberPattern;
            }
            System.out.println(msg +"\r\n");
            return retval;
        }
    }

Don't be discouraged if the patterns themselves aren't completely clear to you right now. An intuitive understanding will develop as you continue to read this book. Focus on the concepts and become comfortable with how the Java code and the regex complement each other.
There are only two pieces of information you need to take full advantage of the following examples:
Any regex metacharacter that begins with a backslash (\) needs to be delimited (escaped) once again when it's used in Java code. Thus, \d becomes \\d and \s becomes \\s in your Java code. Correspondingly, a more complex expression such as (\d-)?(\d{3}-)?\d{3}-\d{4}\s becomes (\\d-)?(\\d{3}-)?\\d{3}-\\d{4}\\s in Java code. Every \ character is doubled to produce \\ when it appears in a Java String literal.
In this book, when I talk about a regular expression in and of itself, I don't use the double delimiting mechanism. However, I do when working with specific coding examples.
The String.matches(String regex) method was added to the String class in J2SE 1.4. It compares the String it's called on to the given regular expression, regex, and returns true only if the regex pattern matches the String exactly. To match exactly means that the String in question can't contain any characters, not even invisible characters such as newlines and spaces, that aren't accounted for in the regex pattern.
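Both points can be tried out in a few lines. The following is a minimal sketch, not one of the chapter's listings: the class name EscapingDemo, the constant SEVEN_DIGITS, and the method isSevenDigitNumber are hypothetical names chosen for illustration.

```java
// Hypothetical demo class; the names are illustrative and not from the chapter.
class EscapingDemo {

    // The regex \d{3}-\d{4} written as a Java String literal.
    // Each backslash is doubled so that the regex engine receives a single \.
    static final String SEVEN_DIGITS = "\\d{3}-\\d{4}";

    // Returns true only if the entire input matches the pattern.
    static boolean isSevenDigitNumber(String input) {
        return input.matches(SEVEN_DIGITS);
    }

    public static void main(String[] args) {
        // The literal "\\d" holds two characters: a backslash and the letter d.
        System.out.println("\\d".length());                       // 2

        System.out.println(isSevenDigitNumber("111-2222"));       // true
        System.out.println(isSevenDigitNumber("111-2222 "));      // false: trailing space
        System.out.println(isSevenDigitNumber("111-2222\n"));     // false: trailing newline
        System.out.println(isSevenDigitNumber("call 111-2222"));  // false: extra leading text
    }
}
```

The last three calls fail for the same reason: String.matches requires the pattern to account for every character in the String.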
The code in Listing 1-2 determines whether the given phone number is well formatted. It takes advantage of two metacharacters introduced in Table 1-6. Specifically, it uses the range quantifier, {n,m}, which indicates that the previous character or class must be repeated at least n times and no more than m times (the listing uses the exact-count form, {n}). It also uses the ? character, which indicates that the previous character, class, or group must be present zero or one time.
The pattern as a whole checks for seven digits preceded by optional country and area codes. Output 1-2 shows the result of running the program, and Table 1-19 dissects the pattern.
Output 1-2. Running MatchPhoneNumber

    C:\RegEx\Examples\chapter1>java MatchPhoneNumber "1-999-111-2222"
     MATCH : pattern:1-999-111-2222
     regex: (\d-)?(\d{3}-)?\d{3}-\d{4}

    C:\RegEx\Examples\chapter1>java MatchPhoneNumber "999-111-2222"
     MATCH : pattern:999-111-2222
     regex: (\d-)?(\d{3}-)?\d{3}-\d{4}

    C:\RegEx\Examples\chapter1>java MatchPhoneNumber "1-111-2222"
     MATCH : pattern:1-111-2222
     regex: (\d-)?(\d{3}-)?\d{3}-\d{4}

    C:\RegEx\Examples\chapter1>java MatchPhoneNumber "111-2222"
     MATCH : pattern:111-2222
     regex: (\d-)?(\d{3}-)?\d{3}-\d{4}

    C:\RegEx\Examples\chapter1>java MatchPhoneNumber "1.999-111-2222"
     NO MATCH: pattern:1.999-111-2222
     regex: (\d-)?(\d{3}-)?\d{3}-\d{4}

    C:\RegEx\Examples\chapter1>java MatchPhoneNumber "999 111-2222"
     NO MATCH: pattern:999 111-2222
     regex: (\d-)?(\d{3}-)?\d{3}-\d{4}

    C:\RegEx\Examples\chapter1>java MatchPhoneNumber "1 111 2222"
     NO MATCH: pattern:1 111 2222
     regex: (\d-)?(\d{3}-)?\d{3}-\d{4}

    C:\RegEx\Examples\chapter1>java MatchPhoneNumber "111-JAVA"
     NO MATCH: pattern:111-JAVA
     regex: (\d-)?(\d{3}-)?\d{3}-\d{4}

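The quantifiers used in the phone number pattern can also be tried in isolation. The sketch below is an illustration only: the class QuantifierDemo and its three patterns are hypothetical and were chosen to show the range form {n,m}, the exact-count form {n}, and the optional ? applied to a group.

```java
// Hypothetical demo class; the patterns are illustrative and not from the chapter.
class QuantifierDemo {

    // {n,m}: at least two and no more than four digits.
    static final String TWO_TO_FOUR_DIGITS = "\\d{2,4}";

    // {n}: exactly three digits, the form used in the phone number pattern.
    static final String EXACTLY_THREE_DIGITS = "\\d{3}";

    // ?: the group (\d-) may be present zero or one time.
    static final String OPTIONAL_PREFIX = "(\\d-)?\\d{3}";

    public static void main(String[] args) {
        System.out.println("1".matches(TWO_TO_FOUR_DIGITS));      // false: too few
        System.out.println("12".matches(TWO_TO_FOUR_DIGITS));     // true
        System.out.println("1234".matches(TWO_TO_FOUR_DIGITS));   // true
        System.out.println("12345".matches(TWO_TO_FOUR_DIGITS));  // false: too many

        System.out.println("999".matches(EXACTLY_THREE_DIGITS));  // true
        System.out.println("99".matches(EXACTLY_THREE_DIGITS));   // false

        System.out.println("999".matches(OPTIONAL_PREFIX));       // true: prefix absent
        System.out.println("1-999".matches(OPTIONAL_PREFIX));     // true: prefix present once
        System.out.println("1-1-999".matches(OPTIONAL_PREFIX));   // false: prefix present twice
    }
}
```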
The code in Listing 1-3 determines if the zip code meets the criterion of being well formatted. It checks for five digits optionally followed by a hyphen and four digits. Output 1-3 shows the result of running the program. Table 1-20 dissects the pattern.
Table 1-20. The pattern \d{5}(-\d{4})?

| Regex | Description |
|---|---|
| \d | A digit |
| { | Repeated exactly |
| 5 | Five times |
| } | End repetition |
| ( | Open group |
| - | Consisting of a hyphen |
| \d | A digit |
| { | Repeated exactly |
| 4 | Four times |
| } | End repetition |
| ) | The end of this group |
| ? | Look for zero or one of the preceding |

\* In English: Look for five digits, optionally followed by a hyphen and four digits.
Listing 1-3. MatchZipCodes.java

    import java.util.regex.*;
    import java.io.*;

    public class MatchZipCodes{
        public static void main(String args[]){
            isZipValid(args[0]);
        }

        /**
         * Confirms that the format for the given zip code is valid.
         * @param zip is a String representing the zip code.
         * @return true if the zip code format is acceptable.
         */
        public static boolean isZipValid(String zip){
            boolean retval=false;
            String zipCodePattern = "\\d{5}(-\\d{4})?";
            retval = zip.matches(zipCodePattern);

            //prepare a message indicating success or failure
            String msg = " NO MATCH: pattern:" + zip
                + "\r\n regex: " + zipCodePattern;
            if (retval){
                msg = " MATCH : pattern:" + zip
                    + "\r\n regex: " + zipCodePattern;
            }
            System.out.println(msg +"\r\n");
            return retval;
        }
    }

Output 1-3. Running MatchZipCodes

    C:\RegEx\Examples\chapter1>java MatchZipCodes "45643-4443"
     MATCH : pattern:45643-4443
     regex: \d{5}(-\d{4})?

    C:\RegEx\Examples\chapter1>java MatchZipCodes "45643"
     MATCH : pattern:45643
     regex: \d{5}(-\d{4})?

    C:\RegEx\Examples\chapter1>java MatchZipCodes "443"
     NO MATCH: pattern:443
     regex: \d{5}(-\d{4})?

    C:\RegEx\Examples\chapter1>java MatchZipCodes "45643-44435"
     NO MATCH: pattern:45643-44435
     regex: \d{5}(-\d{4})?

    C:\RegEx\Examples\chapter1>java MatchZipCodes "45643 44435"
     NO MATCH: pattern:45643 44435
     regex: \d{5}(-\d{4})?

The code in Listing 1-4 checks the format of a given date. It confirms that the given date consists of one or two digits followed by a hyphen, followed by one or two digits, followed by a hyphen, followed by four digits. The pattern checks the format only, so a string such as 15-42-1999 still matches. Output 1-4 shows the result of running the program. Table 1-21 dissects the pattern.
Listing 1-4. MatchDates.java

    import java.util.regex.*;
    import java.io.*;

    public class MatchDates{
        public static void main(String args[]){
            isDateValid(args[0]);
        }

        /**
         * Confirms that given date format consists of one or two digits
         * followed by a hyphen, followed by one or two digits, followed
         * by a hyphen, followed by four digits
         * @param date is a String representing the date.
         * @return true if date format is acceptable.
         */
        public static boolean isDateValid(String date){
            boolean retval=false;
            String datePattern ="\\d{1,2}-\\d{1,2}-\\d{4}";
            retval = date.matches(datePattern);

            //prepare a message indicating success or failure
            String msg = " NO MATCH: pattern:" + date
                + "\r\n regexLength: " + datePattern;
            if (retval){
                msg = " MATCH : pattern:" + date
                    + "\r\n regexLength: " + datePattern;
            }
            System.out.println(msg +"\r\n");
            return retval;
        }
    }

Output 1-4. Running MatchDates

    C:\RegEx\Examples\chapter1>java MatchDates "04-02-1999"
     MATCH : pattern:04-02-1999
     regexLength: \d{1,2}-\d{1,2}-\d{4}

    C:\RegEx\Examples\chapter1>java MatchDates "15-42-1999"
     MATCH : pattern:15-42-1999
     regexLength: \d{1,2}-\d{1,2}-\d{4}

    C:\RegEx\Examples\chapter1>java MatchDates "April fourth nineteen ninety nine"
     NO MATCH: pattern:April fourth nineteen ninety nine
     regexLength: \d{1,2}-\d{1,2}-\d{4}

    C:\RegEx\Examples\chapter1>java MatchDates "15-42-20002"
     NO MATCH: pattern:15-42-20002
     regexLength: \d{1,2}-\d{1,2}-\d{4}

    C:\RegEx\Examples\chapter1>java MatchDates "02-02-20002"
     NO MATCH: pattern:02-02-20002
     regexLength: \d{1,2}-\d{1,2}-\d{4}

    C:\RegEx\Examples\chapter1>java MatchDates "04-02-02"
     NO MATCH: pattern:04-02-02
     regexLength: \d{1,2}-\d{1,2}-\d{4}

    C:\RegEx\Examples\chapter1>java MatchDates "04-02-garbage"
     NO MATCH: pattern:04-02-garbage
     regexLength: \d{1,2}-\d{1,2}-\d{4}

The code in Listing 1-5 determines if the given name meets the criterion of being well formatted. It looks for a first name token, an optional middle name token, and finally a last name token. For this example's purposes, a name token consists of a capital letter followed by one or more lowercase letters.
Listing 1-5. MatchNameFormats.java

    import java.util.regex.*;
    import java.io.*;

    public class MatchNameFormats{
        public static void main(String args[]){
            isNameValid(args[0]);
        }

        /**
         * Confirms that the format for the given name is valid.
         * @param name is a String representing the name.
         * @return true if the name format is acceptable.
         */
        public static boolean isNameValid(String name){
            boolean retval=false;
            String nameToken ="\\p{Upper}(\\p{Lower}+\\s?)";
            String namePattern = "("+nameToken+"){2,3}";
            retval = name.matches(namePattern);

            //prepare a message indicating success or failure
            String msg = "NO MATCH: pattern:" + name
                + "\r\n regex :" + namePattern;
            if (retval){
                msg = "MATCH pattern:" + name
                    + "\r\n regex :" + namePattern;
            }
            System.out.println(msg +"\r\n");
            return retval;
        }
    }

This example is interesting because it takes advantage of Java's robustness to a degree that the previous example didn't. Specifically, you define what you mean when you say a "name token":
String nameToken ="\\p{Upper}(\\p{Lower}+\\s?)";
Then you use that definition later:
String namePattern = "("+nameToken+"){2,3}";
Note: \p{Upper} and \p{Lower} are described shortly. They simply mean any uppercase character and any lowercase character, respectively.
Defining the name token once and reusing it helps to keep the regex pattern from becoming overwhelming, and it also helps to isolate errors. As the examples in this book grow more ambitious, you'll start to see that coupling regular expressions with Java's powerful language can offer benefits that would be cumbersome, or at best cryptic, to achieve using regular expressions alone. Listing 1-5 shows the program MatchNameFormats.java, Output 1-5 shows the result of running the program, and Table 1-22 dissects the pattern.
Output 1-5. Running MatchNameFormats

    C:\RegEx\Examples\chapter1>java MatchNameFormats "John Smith"
    MATCH pattern:John Smith
     regex :(\p{Upper}(\p{Lower}+\s?)){2,3}

    C:\RegEx\Examples\chapter1>java MatchNameFormats "John McGee"
    MATCH pattern:John McGee
     regex :(\p{Upper}(\p{Lower}+\s?)){2,3}

    C:\RegEx\Examples\chapter1>java MatchNameFormats "John Willliam Smith"
    MATCH pattern:John Willliam Smith
     regex :(\p{Upper}(\p{Lower}+\s?)){2,3}

    C:\RegEx\Examples\chapter1>java MatchNameFormats "John Q Smith"
    NO MATCH: pattern:John Q Smith
     regex :(\p{Upper}(\p{Lower}+\s?)){2,3}

    C:\RegEx\Examples\chapter1>java MatchNameFormats "John allen Smith"
    NO MATCH: pattern:John allen Smith
     regex :(\p{Upper}(\p{Lower}+\s?)){2,3}

    C:\RegEx\Examples\chapter1>java MatchNameFormats "John"
    NO MATCH: pattern:John
     regex :(\p{Upper}(\p{Lower}+\s?)){2,3}

A few questions naturally arise from this example:
Why did John Q Smith fail? Because Q is not a name token as you've defined name tokens (i.e., a capital letter followed by one or more lowercase letters).
Why did John allen Smith fail? Because allen doesn't start with a capital letter.
Why did John fail? Although John is a valid name token, the pattern requires two or three name tokens, and John is only one.
Why did John McGee pass? McGee isn't an uppercase letter followed by one or more lowercase letters. Try to puzzle this one out on your own. It's answered in the "FAQs" section at the end of the chapter.
This example uses the composition technique mentioned at the beginning of this chapter. That is, it uses previously defined patterns to compose a new pattern. If you think about it, this is a very engineer-like thing to do: Build small blocks, then use those blocks to build more complicated pieces.
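One practical benefit of composition is that each building block can be checked on its own before it's combined with others. The following minimal sketch reuses the two pattern strings from Listing 1-5. The class NameTokenDemo and the helper methods isNameToken and isName are hypothetical names introduced only for this illustration.

```java
// Hypothetical demo class; only the two pattern strings come from Listing 1-5.
class NameTokenDemo {

    // The small building block: one capital letter, one or more lowercase
    // letters, and an optional trailing whitespace character.
    static final String NAME_TOKEN = "\\p{Upper}(\\p{Lower}+\\s?)";

    // The composed pattern: two or three name tokens in a row.
    static final String NAME_PATTERN = "(" + NAME_TOKEN + "){2,3}";

    // Tests the building block by itself.
    static boolean isNameToken(String s) {
        return s.matches(NAME_TOKEN);
    }

    // Tests the composed pattern.
    static boolean isName(String s) {
        return s.matches(NAME_PATTERN);
    }

    public static void main(String[] args) {
        // Check the block first.
        System.out.println(isNameToken("John"));   // true
        System.out.println(isNameToken("John "));  // true: optional trailing space
        System.out.println(isNameToken("Q"));      // false: no lowercase letters
        System.out.println(isNameToken("allen"));  // false: no leading capital

        // Then check the composition.
        System.out.println(isName("John Smith"));  // true: two tokens
        System.out.println(isName("John"));        // false: only one token
    }
}
```

If the composed pattern misbehaves, testing the token separately shows whether the problem lies in the block or in the way the blocks are combined.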
The code in Listing 1-6 simply determines if the given address meets the criterion of being well formatted. It takes advantage of the name and zip code patterns created earlier, and it adds its own address pattern. Output 1-6 shows the result of running the program. Table 1-23 dissects the pattern.
Listing 1-6. MatchAddress.java

    import java.util.regex.*;
    import java.io.*;

    public class MatchAddress{
        public static void main(String args[]){
            isAddressValid(args[0]);
        }

        /**
         * Confirms that the format for the given address is valid.
         * @param addr is a String representing the address
         * @return true if the address format is acceptable.
         */
        public static boolean isAddressValid(String addr){
            boolean retval = false;

            //use the name pattern created earlier.
            String nameToken ="\\p{Upper}(\\p{Lower}+\\s?)";
            String namePattern = "("+nameToken+"){2,3}";

            //use the zip code pattern created earlier.
            String zipCodePattern = "\\d{5}(-\\d{4})?";

            //construct an address pattern
            String addressPattern = "^" + namePattern + "\\w+ .*, \\w+ "
                + zipCodePattern +"$";
            retval= addr.matches(addressPattern);

            //prepare a message indicating success or failure
            String msg = "NO MATCH\npattern:\n " + addr
                + "\nregexLength:\n " + addressPattern;
            if (retval){
                msg = "MATCH\npattern:\n " + addr
                    + "\nregexLength:\n " + addressPattern;
            }
            System.out.println(msg +"\r\n");
            return retval;
        }
    }

Output 1-6. Running MatchAddress

    C:\RegEx\chapter_1\Examples\chapter1> java MatchAddress "John Smith 888 Luck Street, NY 64332"
    MATCH
    pattern:
     John Smith 888 Luck Street, NY 64332
    regexLength:
     ^(\p{Upper}(\p{Lower}+\s?)){2,3}\w+ .*, \w+ \d{5}(-\d{4})?$

    C:\RegEx\chapter_1\Examples\chapter1> java MatchAddress "John A. Smith 888 Luck Street, NY 64332-4453"
    NO MATCH
    pattern:
     John A. Smith 888 Luck Street, NY 64332-4453
    regexLength:
     ^(\p{Upper}(\p{Lower}+\s?)){2,3}\w+ .*, \w+ \d{5}(-\d{4})?$

    C:\RegEx\chapter_1\Examples\chapter1> java MatchAddress "John Allen Smith 888 Luck Street, NY 64332-4453"
    MATCH
    pattern:
     John Allen Smith 888 Luck Street, NY 64332-4453
    regexLength:
     ^(\p{Upper}(\p{Lower}+\s?)){2,3}\w+ .*, \w+ \d{5}(-\d{4})?$

    C:\RegEx\chapter_1\Examples\chapter1> java MatchAddress "888 Luck Street, NY 64332"
    NO MATCH
    pattern:
     888 Luck Street, NY 64332
    regexLength:
     ^(\p{Upper}(\p{Lower}+\s?)){2,3}\w+ .*, \w+ \d{5}(-\d{4})?$

    C:\RegEx\chapter_1\Examples\chapter1> java MatchAddress "P.O. BOX 888 Luck Street, NY 64332-4453"
    NO MATCH
    pattern:
     P.O. BOX 888 Luck Street, NY 64332-4453
    regexLength:
     ^(\p{Upper}(\p{Lower}+\s?)){2,3}\w+ .*, \w+ \d{5}(-\d{4})?$

    C:\RegEx\chapter_1\Examples\chapter1> java MatchAddress "John Allen Smith 888 Luck st., NY"
    NO MATCH
    pattern:
     John Allen Smith 888 Luck st., NY
    regexLength:
     ^(\p{Upper}(\p{Lower}+\s?)){2,3}\w+ .*, \w+ \d{5}(-\d{4})?$

I discussed the code in Listing 1-7 in the "Groups and Back References" section earlier. The point in reintroducing it here is to demonstrate how regular expressions actually interact with Java code.
Listing 1-7. MatchDuplicateWords.java

    import java.util.regex.*;
    import java.io.*;

    public class MatchDuplicateWords{
        public static void main(String args[]){
            hasDuplicate(args[0]);
        }

        /**
         * Checks whether the given phrase contains duplicate words.
         * @param phrase is a String representing the phrase.
         * @return true if the phrase contains at least one pair of
         * duplicate words.
         */
        public static boolean hasDuplicate(String phrase){
            boolean retval=false;
            String duplicatePattern = "\\b(\\w+) \\1\\b";

            // Compile the pattern
            Pattern p = null;
            try{
                p = Pattern.compile(duplicatePattern);
            }
            catch (PatternSyntaxException pex){
                pex.printStackTrace();
                System.exit(0);
            }

            //count the number of matches.
            int matches = 0;

            //get the matcher
            Matcher m = p.matcher(phrase);
            String val=null;

            //find all matching Strings
            while (m.find()){
                retval = true;
                val = ":" + m.group() +":";
                System.out.println(val);
                matches++;
            }

            //prepare a message indicating success or failure
            String msg = " NO MATCH: pattern:" + phrase
                + "\r\n regex: " + duplicatePattern;
            if (retval){
                msg = " MATCH : pattern:" + phrase
                    + "\r\n regex: " + duplicatePattern;
            }
            System.out.println(msg +"\r\n");
            return retval;
        }
    }

As you read this example, notice that it uses a Pattern and Matcher, and not the String.matches(regex) method, as most of the examples in the previous sections have. Try to guess why this approach has been taken. For the answer, look in the "FAQs" section at the end of this chapter. Output 1-7 shows the result of running the program. The pattern is dissected in Table 1-24.
Table 1-24. The pattern \b(\w+) \1\b

| Regex | Description |
|---|---|
| \b | A word boundary |
| ( | Followed by a group consisting of |
| \w | An alphanumeric or underscore character |
| + | Repeated one or more times |
| ) | Close group |
| <space> | Followed by a space |
| \1 | Followed by the exact group of characters captured previously |
| \b | Followed by a word boundary |

\* In English: Look for a word boundary, followed by a group of alphanumeric characters, followed by a space, followed by the exact same group of alphanumeric characters found previously, followed by a word boundary. In short, look for duplicate words.
Output 1-7. Running MatchDuplicateWords

    C:\RegEx\Examples\chapter1>java MatchDuplicateWords "pizza pizza"
    :pizza pizza:
     MATCH : pattern:pizza pizza
     regex: \b(\w+) \1\b

    C:\RegEx\Examples\chapter1>java MatchDuplicateWords "Faster pussycat kill kill"
    :kill kill:
     MATCH : pattern:Faster pussycat kill kill
     regex: \b(\w+) \1\b

    C:\RegEx\Examples\chapter1>java MatchDuplicateWords "The mayor of of simpleton"
    :of of:
     MATCH : pattern:The mayor of of simpleton
     regex: \b(\w+) \1\b

    C:\RegEx\Examples\chapter1>java MatchDuplicateWords "Never Never Never Never Never"
    :Never Never:
    :Never Never:
     MATCH : pattern:Never Never Never Never Never
     regex: \b(\w+) \1\b

    C:\RegEx\Examples\chapter1>java MatchDuplicateWords "222 2222"
     NO MATCH: pattern:222 2222
     regex: \b(\w+) \1\b

    C:\RegEx\Examples\chapter1>java MatchDuplicateWords "sara sarah"
     NO MATCH: pattern:sara sarah
     regex: \b(\w+) \1\b

    C:\RegEx\Examples\chapter1>java MatchDuplicateWords "Faster pussycat kill, kill"
     NO MATCH: pattern:Faster pussycat kill, kill
     regex: \b(\w+) \1\b

    C:\RegEx\Examples\chapter1>java MatchDuplicateWords ". ."
     NO MATCH: pattern:. .
     regex: \b(\w+) \1\b

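The Matcher can report more than the whole match. As a minimal sketch built on the same duplicate-word pattern, the following reads the captured group and the position of the match. The class DuplicateGroupDemo and the method firstRepeatedWord are hypothetical names used only for this illustration; the Pattern and Matcher calls are the standard java.util.regex ones.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the pattern string comes from Listing 1-7.
class DuplicateGroupDemo {

    // Compile once and reuse; the pattern is the one from Listing 1-7.
    static final Pattern DUPLICATE = Pattern.compile("\\b(\\w+) \\1\\b");

    // Returns the first repeated word in the phrase, or null if there is none.
    static String firstRepeatedWord(String phrase) {
        Matcher m = DUPLICATE.matcher(phrase);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        Matcher m = DUPLICATE.matcher("The mayor of of simpleton");
        while (m.find()) {
            // group() is the whole match; group(1) is what (\w+) captured.
            System.out.println("match: " + m.group()
                + ", word: " + m.group(1)
                + ", start: " + m.start()
                + ", end: " + m.end());
        }
        // prints: match: of of, word: of, start: 10, end: 15

        System.out.println(firstRepeatedWord("sara sarah")); // null: no duplicate
    }
}
```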