Team LiB

Accessing Subgroups

So how does all of this actually work? Are there little fairies running around under the hood of the regex engine, keeping track of all these groups? Well, yes and no.

Although there are no fairies under the hood that I know of, the regex engine does internally track all subgroups by storing the matching sections of the candidate String in memory. Thus, because you defined the pattern as \w(\d), the regex will keep track of any single digit when that digit is preceded by an alphanumeric or underscore character. That's what the regex thinks you mean for it to do when you put the expression \d in parentheses.

The engine provides access to these captured groups based on their numeric index. Captured groups are indexed from left to right, in the order of their opening parentheses, and group(0) always refers to the match for the expression in its entirety. Thus, in the preceding example, group(0) refers to the part of the candidate string that matches the entire expression \w(\d), whereas group(1) refers to the part of the candidate string that matches the (\d) part of the expression.
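To make the indexing concrete, here is a minimal sketch using java.util.regex. The candidate string A5, the class name SubgroupDemo, and the helper firstMatchGroups are assumptions made up for illustration; only the pattern \w(\d) comes from the text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SubgroupDemo {

    // Hypothetical helper: returns group(0) through group(groupCount())
    // for the first match in the candidate, or an empty array if nothing matches.
    static String[] firstMatchGroups(String regex, String candidate) {
        Matcher matcher = Pattern.compile(regex).matcher(candidate);
        if (!matcher.find()) {
            return new String[0];
        }
        // groupCount() does not count group 0, hence the + 1
        String[] groups = new String[matcher.groupCount() + 1];
        for (int i = 0; i <= matcher.groupCount(); i++) {
            groups[i] = matcher.group(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        // "A5" is an assumed candidate string, chosen only for this sketch
        String[] groups = firstMatchGroups("\\w(\\d)", "A5");
        System.out.println("group(0) = " + groups[0]); // A5, the whole match
        System.out.println("group(1) = " + groups[1]); // 5, the (\d) subgroup
    }
}
```

Note that group(0) is always available after a successful match, even when the pattern contains no parentheses at all; groupCount() reports only the groups you defined yourself.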

For example, if your pattern was (\w)(\d)(\w)(\w) and your candidate string was J2SE, then group 0 would have matched the entire candidate J2SE. Group 1 would have matched J, group 2 would have matched 2, group 3 would have matched S, and group 4 would have matched E.

Correspondingly, if your pattern stayed (\w)(\d)(\w)(\w) but your candidate string was R2D2, then group 0 would have matched the entire candidate R2D2. Group 1 would have matched R, group 2 would have matched 2, group 3 would have matched D, and group 4 would have matched 2.
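The two walkthroughs above can be checked with a short sketch along the following lines. The class name FourGroupsDemo and the helper describe are hypothetical names introduced here; the pattern and both candidate strings are the ones from the text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FourGroupsDemo {

    // Four capturing groups, numbered 1 to 4 by the order of their opening parentheses
    static final Pattern PATTERN = Pattern.compile("(\\w)(\\d)(\\w)(\\w)");

    // Hypothetical helper: lists every group index with the text it captured
    static String describe(String candidate) {
        Matcher matcher = PATTERN.matcher(candidate);
        if (!matcher.matches()) {
            return candidate + ": no match";
        }
        StringBuilder result = new StringBuilder();
        for (int i = 0; i <= matcher.groupCount(); i++) {
            if (i > 0) {
                result.append(", ");
            }
            result.append(i).append('=').append(matcher.group(i));
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("J2SE")); // 0=J2SE, 1=J, 2=2, 3=S, 4=E
        System.out.println(describe("R2D2")); // 0=R2D2, 1=R, 2=2, 3=D, 4=2
    }
}
```

In the R2D2 case, group 4 can capture the trailing 2 because \w matches digits as well as letters and the underscore.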

