
FAQs

Q: 

The \b metacharacter seems to act inconsistently in regular expressions as I write them. What's going on?

In regex, \b means a word boundary. In a Java string literal, however, \b is the escape sequence for a backspace. Here's the rule: the string literal "\b" contains a single backspace character, whereas the string literal "\\b" contains a backslash followed by b, which the regex engine reads as a word boundary.
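The difference is easy to check in a few lines. The following is a minimal sketch; the class name WordBoundaryDemo, its fields, and the sample text are made up for illustration and do not come from the chapter's examples.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // The source text "\\bcat\\b" reaches the regex engine as \bcat\b (word boundaries).
    static final Pattern WHOLE_WORD = Pattern.compile("\\bcat\\b");

    // The source text "\bcat\b" is a backspace character, "cat", and another backspace.
    static final Pattern WITH_BACKSPACES = Pattern.compile("\bcat\b");

    // Counts how many times "cat" appears as a whole word in the text.
    static int countWholeWord(String text) {
        Matcher m = WHOLE_WORD.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("\b".length());   // one character: the backspace
        System.out.println("\\b".length());  // two characters: a backslash and b
        System.out.println(countWholeWord("cat concat cat's catalog"));
        System.out.println(WITH_BACKSPACES.matcher("cat").find());
    }
}
```

In the sample text, only the standalone cat and the cat in cat's count as whole words. The pattern built from "\bcat\b" finds nothing in "cat", because it looks for literal backspace characters on either side.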

Q: 

When should I use the String.matches method instead of the Pattern and Matcher objects directly?

Use the String.matches method when the entire string must match the pattern. For example, if you want exactly seven consecutive digits and nothing else is acceptable, then use String.matches with the pattern \d{7}. In general, if you're prepared to narrow the definition of acceptable input, or if you're willing to define every possible variation, then use the String.matches method. On the other hand, if you're looking for the existence of a substring, you're better served by the Pattern and Matcher objects (for example, through Matcher.find).
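As a rough illustration of the two cases, the sketch below checks the seven-digit pattern both ways. The class and method names are hypothetical, chosen only for this example.

```java
import java.util.regex.Pattern;

class MatchVersusFind {
    // Compiled once so the substring search can reuse it.
    private static final Pattern SEVEN_DIGITS = Pattern.compile("\\d{7}");

    // True only when the whole string is exactly seven digits.
    static boolean isExactlySevenDigits(String s) {
        return s.matches("\\d{7}");
    }

    // True when seven consecutive digits appear anywhere in the string.
    static boolean containsSevenDigits(String s) {
        return SEVEN_DIGITS.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isExactlySevenDigits("5551212"));
        System.out.println(isExactlySevenDigits("call 5551212 now"));
        System.out.println(containsSevenDigits("call 5551212 now"));
    }
}
```

The second call returns false because of the surrounding text, while the third returns true because Matcher.find only needs the digits to occur somewhere in the input.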

Q: 

Is using the String.matches method less resource-intensive than using the Pattern and Matcher objects?

No. The String.matches method simply calls the Pattern.matches method, which in turn creates and uses both a Pattern object and a Matcher object.
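One practical consequence is that a pattern applied to many strings is usually better compiled once and reused. The sketch below shows the three equivalent ways of asking the same question, then a reused Pattern. The names ReusePattern, allFormsAgree, and countFiveDigitCodes are assumptions made for this example.

```java
import java.util.regex.Pattern;

class ReusePattern {
    // Compiled a single time and shared by every call to countFiveDigitCodes.
    private static final Pattern FIVE_DIGITS = Pattern.compile("\\d{5}");

    // The three forms below ask the same question and should always agree.
    static boolean allFormsAgree(String regex, String input) {
        boolean viaString = input.matches(regex);
        boolean viaPattern = Pattern.matches(regex, input);
        boolean viaMatcher = Pattern.compile(regex).matcher(input).matches();
        return viaString == viaPattern && viaPattern == viaMatcher;
    }

    // Reuses the precompiled pattern instead of recompiling it per input.
    static int countFiveDigitCodes(String... inputs) {
        int count = 0;
        for (String input : inputs) {
            if (FIVE_DIGITS.matcher(input).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(allFormsAgree("\\d{5}", "12345"));
        System.out.println(countFiveDigitCodes("12345", "1234", "abcde", "90210"));
    }
}
```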

Q: 

Can I modify a String by applying a regular expression to it?

Absolutely not. Strings are immutable objects in Java, and thus they cannot be changed. However, you can create a new String object that has the requested changes. Thus, if you have

String tmp = "Hello";

and you want to change the e to an X by doing the following:

String newTmp = tmp.replaceFirst("e","X");

the value of tmp is still Hello, but the value of newTmp is HXllo.
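The same holds for the other regex-based String methods. The sketch below is only an illustration: the class name ImmutableReplace and the helper maskDigits are invented here, and they use the standard String.replaceFirst and String.replaceAll methods.

```java
class ImmutableReplace {
    // Returns a new String with every digit replaced by '#'.
    // The argument itself is never modified.
    static String maskDigits(String s) {
        return s.replaceAll("\\d", "#");
    }

    public static void main(String[] args) {
        String tmp = "Hello";
        String newTmp = tmp.replaceFirst("e", "X");
        System.out.println(tmp);     // the original is untouched
        System.out.println(newTmp);  // the new String carries the change

        String account = "ID 4421";
        System.out.println(maskDigits(account));
        System.out.println(account); // still the original text
    }
}
```

To keep the changed text, assign the result to a variable. Calling tmp.replaceFirst("e", "X") and discarding the return value has no visible effect.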

Q: 

Why did the pattern (\p{Upper}(\p{Lower}+\s?)){2,3} match John McGee in the NameFormat.java example?

Because the group (\p{Upper}(\p{Lower}+\s?)) may repeat two or three times, and John, Mc, and Gee each satisfy one repetition. John consists of an uppercase letter, followed by one or more lowercase letters, followed by one space. Mc consists of an uppercase letter, followed by one or more lowercase letters, followed by no space, and Gee has the same shape as Mc. That makes three repetitions, which {2,3} allows. As a test, try running John Janis McGee, which needs four repetitions, through the NameFormat.java program.

This isn't exactly what you may have had in mind, but it seems permissible in this case. It's very important to be precise and do a lot of testing when working with regular expressions, or unexpected results are sure to follow.
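The behavior can be reproduced without NameFormat.java. The following sketch is not that program; it is a stand-in with assumed names (NamePatternDemo, ORIGINAL, STRICT) that applies the same pattern with Matcher.matches. The STRICT pattern is a hypothetical tighter variant that requires whitespace between name parts.

```java
import java.util.regex.Pattern;

class NamePatternDemo {
    // The pattern from the question: two or three repetitions of
    // "uppercase letter, lowercase letters, optional whitespace".
    static final Pattern ORIGINAL =
        Pattern.compile("(\\p{Upper}(\\p{Lower}+\\s?)){2,3}");

    // Hypothetical stricter variant: each later part must be preceded by whitespace.
    static final Pattern STRICT =
        Pattern.compile("\\p{Upper}\\p{Lower}+(\\s\\p{Upper}\\p{Lower}+){1,2}");

    static boolean matchesOriginal(String name) {
        return ORIGINAL.matcher(name).matches();
    }

    static boolean matchesStrict(String name) {
        return STRICT.matcher(name).matches();
    }

    public static void main(String[] args) {
        String[] names = {"John Smith", "John McGee", "John Janis McGee", "John"};
        for (String name : names) {
            System.out.println(name + ": original=" + matchesOriginal(name)
                + ", strict=" + matchesStrict(name));
        }
    }
}
```

With the original pattern, John McGee matches as three repetitions, and John Janis McGee fails a whole-string match because it needs four. The stricter variant rejects John McGee, which may or may not be what an application wants. Note that Matcher.find would still locate John Janis Mc inside the longer name.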

Q: 

What type of regex engine does Java use?

J2SE uses a traditional nondeterministic finite automaton (NFA) engine. This means that when the engine reaches a fork in the road, it chooses one path, remembers where the other path is in case things don't work out, and goes from there.

The advantage is that an efficient expression can lead the engine to a match very quickly. The disadvantage is that an inefficient expression can send the engine on a wild goose chase before it finally finds the match.
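Alternation is one place where this fork-and-remember behavior is directly visible. In the sketch below, the class name AlternationOrderDemo and the helper firstMatch are made up for illustration. The engine tries the alternatives from left to right and returns to the second one only if the first leads to a dead end.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationOrderDemo {
    // Returns the text of the first match found in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The first alternative succeeds, so the engine never needs the second one.
        System.out.println(firstMatch("Java|JavaScript", "JavaScript"));

        // With the alternatives swapped, the longer one is tried first.
        System.out.println(firstMatch("JavaScript|Java", "JavaScript"));

        // A whole-string match forces the engine to abandon "Java" and
        // fall back to the remembered alternative.
        System.out.println("JavaScript".matches("Java|JavaScript"));
    }
}
```

The first call reports Java, the second reports JavaScript, and the whole-string match succeeds only after backtracking. Ordering alternatives so that the likely path comes first is one way to keep the engine from doing extra work.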

