Validating an EDI Document

This next example is taken from a posting on the Sun site. A programmer needs help validating an Electronic Data Interchange (EDI) document. He needs to make sure that the String ISA always occurs before the String IEA, and that each occurs only once. He provided the sample input ISA*XX*XXXXXXXXXXXXXXX*XX*XXXXXXXXXXXXXXX*030130*0912*~IEA*1*000005900~.
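
Before reaching for a regex, it can help to pin the requirement down in plain code. What follows is a minimal sketch, not something from the original posting: the class name EdiOrderCheck and the method occursOnceInOrder are hypothetical, and the sketch assumes that plain substring occurrences are what count.

```java
final class EdiOrderCheck {
    // Returns true when `first` and `second` each occur exactly once in `doc`
    // and `first` starts before `second`. Hypothetical helper for illustration.
    static boolean occursOnceInOrder(String doc, String first, String second) {
        int a = doc.indexOf(first);
        int b = doc.indexOf(second);
        if (a < 0 || b < 0) {
            return false; // one of the markers is missing
        }
        boolean firstOnce = doc.indexOf(first, a + 1) < 0;
        boolean secondOnce = doc.indexOf(second, b + 1) < 0;
        return firstOnce && secondOnce && a < b;
    }

    public static void main(String[] args) {
        String sample =
            "ISA*XX*XXXXXXXXXXXXXXX*XX*XXXXXXXXXXXXXXX*030130*0912*~IEA*1*000005900~";
        System.out.println(occursOnceInOrder(sample, "ISA", "IEA"));
    }
}
```

A check like this is only a reference point for what the pattern has to express. The rest of the example works toward saying the same thing as a regex.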

This problem is a candidate for the push technique, because it's fairly clear that I'll have to push the data into a pattern. To simplify the problem, I decide to deal in the abstract a bit. Instead of the strings ISA and IEA, I decide to use the @ sign and the # sign. Furthermore, I decide that everything (all the stuff in between @ and #) is a number. These are just logical placeholders, for my own benefit. I want to be able to abstract away some of the messy details.

Note

If you happened to like mathematics in school, you'll notice that this is similar to the algebraic technique of factoring out messy subexpressions and referring to them using a simple variable.

Now I'll see if I can take this anywhere with the reasoning in Table 5-6.

Table 5-6: Pulling a General Regex Pattern from @45#78
Step	What I Did	Why I Did It	Justification	Resulting Pattern
Step 1	Nothing	Initial state	N/A	@45#78
Step 2	Substituted [^@] for 4	To get a more generic description	The only distinguishing feature of 4 is that it's not @, hence [^@].	@[^@]5#78
Step 3	Substituted [^@] for 5	To get a more generic description	The only distinguishing feature of 5 is that it's not @.	@[^@][^@]#78
Step 4	Swapped in [^@]* for [^@][^@]	To get a more generic description	[^@]* is a superset of [^@].	@[^@]*#78
Step 5	Swapped in ([^@][^#]) for 7	To get a more generic description	The only distinguishing feature of 7 is that it's not @ or #.	@[^@]*#([^@][^#])8
Step 6	Swapped in ([^@][^#]) for 8	To get a more generic description	The only distinguishing feature of 8 is that it's not @ or #.	@[^@]*#([^@][^#])([^@][^#])
Step 7	Swapped in ([^@][^#])* for ([^@][^#])([^@][^#])	To get a more generic description	([^@][^#])* is a superset of ([^@][^#])([^@][^#]).	@[^@]*#([^@][^#])*
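
The result of Step 7 is easy to try out against a few strings. The following is a minimal sketch, assuming the Step 7 pattern reads @[^@]*#([^@][^#])* with both quantifiers in place. The class name AbstractPatternCheck and the extra test strings are hypothetical.

```java
import java.util.regex.Pattern;

final class AbstractPatternCheck {
    // Assumed reading of the Step 7 pattern, with both * quantifiers in place.
    static final Pattern ABSTRACT = Pattern.compile("@[^@]*#([^@][^#])*");

    // True only when the whole input matches the pattern.
    static boolean matchesFully(String input) {
        return ABSTRACT.matcher(input).matches();
    }

    public static void main(String[] args) {
        // The first string is the one from the table; the rest are made up.
        String[] inputs = {"@45#78", "@45#789", "@45@#78", "#45@78"};
        for (String input : inputs) {
            System.out.println(input + " -> " + matchesFully(input));
        }
    }
}
```

Under full matching, @45#78 is accepted, while @45@#78 and #45@78 are rejected, which is the intent. @45#789 is rejected as well, because the trailing group consumes characters two at a time. That is worth keeping in mind when moving back to real data.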

I think I've taken that about as far as I can. Now I'll start stepping away from the abstract and heading back toward what I actually wanted. Table 5-7 breaks down my reasoning.

Table 5-7: Pulling an EDI Regex out of @[^@]*#([^@][^#])*
Step	What I Did	Why I Did It	Justification	Resulting Pattern
Step 8	Nothing	Initial state	N/A	@[^@]*#([^@][^#])*
Step 9	Substituted ISA for @	To get a more specific description	@ was always just a stand-in for ISA.	ISA[^ISA]*#([^ISA][^#])*
Step 10	Substituted IEA for #	To get a more specific description	# was always just a stand-in for IEA.	ISA[^ISA]*IEA([^ISA][^IEA])*
Step 11	Added ?: inside ([^ISA][^IEA])	To improve efficiency	I don't need a capturing group.	ISA[^ISA]*IEA(?:[^ISA][^IEA])*
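
To see how the Step 11 pattern behaves, it can be run against the sample input from the posting. The following is a minimal sketch, assuming the pattern reads ISA[^ISA]*IEA(?:[^ISA][^IEA])* with the quantifiers in place. The class name EdiPatternCheck, its two helper methods, and the second input string are hypothetical.

```java
import java.util.regex.Pattern;

final class EdiPatternCheck {
    // Assumed reading of the Step 11 pattern, with the * quantifiers in place.
    static final Pattern EDI = Pattern.compile("ISA[^ISA]*IEA(?:[^ISA][^IEA])*");

    // True when the pattern matches somewhere in the input (Matcher.find).
    static boolean foundIn(String input) {
        return EDI.matcher(input).find();
    }

    // True only when the pattern matches the entire input (Matcher.matches).
    static boolean matchesFully(String input) {
        return EDI.matcher(input).matches();
    }

    public static void main(String[] args) {
        String sample =
            "ISA*XX*XXXXXXXXXXXXXXX*XX*XXXXXXXXXXXXXXX*030130*0912*~IEA*1*000005900~";
        // Hypothetical input: the letter S appears between the two markers.
        String withLetter = "ISA*SENDER*~IEA*1*1~";

        System.out.println("sample, find():        " + foundIn(sample));
        System.out.println("sample, matches():     " + matchesFully(sample));
        System.out.println("withLetter, find():    " + foundIn(withLetter));
    }
}
```

Two things stand out when this runs. First, find() locates a match in the sample, but matches() does not, because an odd number of characters follows IEA and the final group consumes them in pairs. Second, [^ISA] is a character class, so it excludes the single characters I, S, and A rather than the string ISA. The sample happens to contain none of those letters between the markers, but the hypothetical input with SENDER in it is not matched at all. Both points suggest the pattern still needs testing against more realistic data before it can be trusted.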