What does the pattern <((?i)TITLE>)(.*?)</\1 break down to?
How do I know if my regex is too complex?
Answers
The answer is given in Table 4-1. Of particular interest is the subgroup (.*?). Notice that it uses a reluctant quantifier, so it matches as little as possible before reaching the next closing </title> tag. Given <title>first title</title><title>second title</title>, the pattern extracts only first title. With a greedy quantifier in its place, it would extract first title</title><title>second title.
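To see the difference between the reluctant and greedy forms, here is a minimal sketch that assumes the pattern is used with java.util.regex. The class name TitleExtractor, the helper firstTitle, and the greedy variant of the pattern are illustrative names introduced for this example only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class TitleExtractor {
    // The pattern from the question: group 2 uses the reluctant quantifier *?.
    static final Pattern RELUCTANT = Pattern.compile("<((?i)TITLE>)(.*?)</\\1");

    // The same pattern with a greedy quantifier, for comparison only.
    static final Pattern GREEDY = Pattern.compile("<((?i)TITLE>)(.*)</\\1");

    // Returns the text captured by group 2 of the first match, or null if nothing matches.
    static String firstTitle(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String html = "<title>first title</title><title>second title</title>";

        // Reluctant: stops at the first closing tag.
        System.out.println(firstTitle(RELUCTANT, html)); // first title

        // Greedy: runs on to the last closing tag.
        System.out.println(firstTitle(GREEDY, html)); // first title</title><title>second title
    }
}
```

The only change between the two patterns is the ? after the *, which is what makes the reluctant version stop at the first closing tag rather than the last.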
The first goal of any regex pattern is, of course, that it works accurately and efficiently enough. The second goal is that it be legible. How do you know if it's legible? My advice is to comment it in as much detail as you feel it needs, and then pass it to a few developers who are likely to have to decipher it. If they can follow it (or better yet, if they're able to modify it), then it's probably clear enough. If not, then you may want to consider refactoring.
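One way to comment a pattern in place, assuming java.util.regex, is the Pattern.COMMENTS flag, which ignores whitespace in the pattern and treats everything from # to the end of the line as a comment. The sketch below applies it to the TITLE pattern from the first question; the class name CommentedPattern and the helper extract are hypothetical names for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class CommentedPattern {
    // Each piece of the pattern sits on its own line with a trailing comment.
    // Under Pattern.COMMENTS, the padding spaces and the # comments are ignored.
    static final Pattern TITLE = Pattern.compile(
        "<              # opening angle bracket\n" +
        "((?i)TITLE>)   # group 1: tag name and closing bracket, any case\n" +
        "(.*?)          # group 2: title text, as little as possible\n" +
        "</\\1          # closing tag: repeats whatever group 1 captured\n",
        Pattern.COMMENTS);

    // Returns the text captured by group 2 of the first match, or null if nothing matches.
    static String extract(String input) {
        Matcher m = TITLE.matcher(input);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("<title>first title</title>")); // first title
    }
}
```

Because whitespace is ignored in this mode, a pattern that needs to match a literal space has to escape it (for example with \\x20 in the Java string), which is worth keeping in mind before converting an existing pattern.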