2.1 About the Examples

This chapter takes a few sample problems (validating user input, working with email headers, converting plain text to HTML) and wanders through the regular expression landscape with them. As I develop them, I'll "think out loud" to offer a few insights into the thought processes that go into crafting a regex. During our journey, we'll see some constructs and features that egrep doesn't have, and we'll take plenty of side trips to look at other important concepts as well.

Toward the end of this chapter, and in subsequent chapters, I'll show examples in a variety of languages including Java and Visual Basic .NET, but the examples throughout most of this chapter are in Perl. Any of these languages, and most others for that matter, allow you to employ regular expressions in much more complex ways than egrep, so using any of them for the examples would allow us to see interesting things. I choose to start with Perl primarily because it has the most ingrained, easily accessible regex support among the popular languages. Also, Perl provides many other concise data-handling constructs that alleviate much of the "dirty work" of our example tasks, letting us concentrate on regular expressions.

Just to quickly demonstrate some of these powers, recall the file-check example from Section 1.1, where I needed to ensure that each file contained 'ResetSize' exactly as many times as 'SetSize'. The utility I used was Perl, and the command was:

% perl -0ne 'print "$ARGV\n" if s/ResetSize//ig != s/SetSize//ig' *

(I don't expect that you understand this yet; I hope merely that you'll be impressed with the brevity of the solution.)

I like Perl, but it's important not to get too caught up in its trappings here. Remember, this chapter concentrates on regular expressions. As an analogy, consider the words of a computer science professor in a first-year course: "You're going to learn computer-science concepts here, but we'll use Pascal to show you." [1]
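As an aside on the ResetSize/SetSize one-liner shown earlier in this section, here is a rough sketch of the same idea spelled out longhand. It is not the book's code: the helper name counts_differ and the sample string are made up for illustration, and it uses sub and my, which this chapter has not introduced. The point is only that a substitution with the g modifier reports how many replacements it made, so deleting every match doubles as counting them.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the book): true when the number of
# "ResetSize" occurrences in $text differs from the number of
# remaining "SetSize" occurrences, ignoring case.
sub counts_differ {
    my ($text) = @_;

    # s///g returns the number of substitutions made (false if none);
    # the "|| 0" turns "none" into a plain zero.
    my $reset = ($text =~ s/ResetSize//ig) || 0;

    # Every ResetSize has been deleted, so only standalone SetSize remain.
    my $set = ($text =~ s/SetSize//ig) || 0;

    return $reset != $set;
}

# Made-up sample text: one ResetSize, two standalone SetSize.
print "mismatch\n" if counts_differ("SetSize(1); ResetSize(); SetSize(2);");
```

In the one-liner, the -n option wraps the code in a loop over the input, and -0 changes the record separator so that a text file is normally read as one chunk; $ARGV holds the name of the file currently being read.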
Since this chapter doesn't assume that you know Perl, I'll be sure to introduce enough to make the examples understandable. (Chapter 7, which looks at all the nitty-gritty details of Perl, does assume some basic knowledge.) Even if you have experience with a variety of programming languages, normal Perl may seem quite odd at first glance because its syntax is very compact and its semantics thick. In the interest of clarity, I won't take advantage of much that Perl has to offer, instead presenting programs in a more generic, almost pseudo-code style. While not "bad," the examples are not the best models of The Perl Way of programming. But we will see some great uses of regular expressions.

2.1.1 A Short Introduction to Perl

Perl is a powerful scripting language first developed in the late 1980s, drawing ideas from many other programming languages and tools. Many of its concepts of text handling and regular expressions are derived from two specialized languages called awk and sed, both of which are quite different from a "traditional" language such as C or Pascal.

Perl is available for many platforms, including DOS/Windows, MacOS, OS/2, VMS, and Unix. It has a powerful bent toward text handling, and is a particularly common tool used for Web-related processing. See www.perl.com for information on how to get a copy of Perl for your system.

This book addresses the Perl language as of Version 5.8, but the examples in this chapter are written to work with versions as early as Version 5.005.

Let's look at a simple example:

$celsius = 30;
$fahrenheit = ($celsius * 9 / 5) + 32; # calculate Fahrenheit
print "$celsius C is $fahrenheit F.\n"; # report both temperatures

When executed, this produces:

30 C is 86 F.

Simple variables, such as $fahrenheit and $celsius, always begin with a dollar sign, and can hold a number or any amount of text. (In this example, only numbers are used.) Comments begin with # and continue for the rest of the line.
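To illustrate the point that the same kind of variable can hold text as well as a number, here is a small variation on the example above. The variable names $city and $report and the value "Kyoto" are made up for illustration.

```perl
# A sketch with made-up values: simple variables hold text or numbers alike.
$city       = "Kyoto";                   # text (hypothetical example value)
$celsius    = 30;                        # a number, as in the example above
$fahrenheit = ($celsius * 9 / 5) + 32;   # 30 * 9 / 5 + 32 is 86
$report     = "In $city, $celsius C is $fahrenheit F.\n";  # text built from both
print $report;
```

When run, this prints: In Kyoto, 30 C is 86 F.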
If you're used to languages such as C, C#, Java, or VB.NET, perhaps most surprising is that in Perl, variables can appear within a double-quoted string. With the string "$celsius C is $fahrenheit F.\n", each variable is replaced by its value. In this case, the resulting string is then printed. (The \n represents a newline.)

Perl offers control structures similar to those of other popular languages:
$celsius = 20;
while ($celsius <= 45)
{
    $fahrenheit = ($celsius * 9 / 5) + 32; # calculate Fahrenheit
    print "$celsius C is $fahrenheit F.\n";
    $celsius = $celsius + 5;
}
The body of the code controlled by the while loop is executed repeatedly so long as the condition (the $celsius <= 45 in this case) is true. Putting this into a file, say temps, we can run it directly from the command line. Here's how a run looks:
% perl -w temps
20 C is 68 F.
25 C is 77 F.
30 C is 86 F.
35 C is 95 F.
40 C is 104 F.
45 C is 113 F.
The -w option is not required, nor does it have anything directly to do with regular expressions. It tells Perl to check your program more carefully and issue warnings about items it thinks to be dubious (such as using uninitialized variables and the like; variables do not normally need to be predeclared in Perl). I use it here merely because it is good practice to always do so.

Well, that's it for the general introduction to Perl. We'll move on now to see how Perl allows us to use regular expressions.