6.3 Benchmarking

Because this chapter talks a lot about speed and efficiency, and I often mention benchmarks I've done, I'd like to mention a few principles of benchmarking. I'll also show simple ways to benchmark in a few languages.

Basic benchmarking is simply timing how long it takes to do some work. To do the timing, get the system time, do the work, get the system time again, and report the difference between the times as the time it took to do the work. As an example, let's compare ^(a|b|c|d|e|f|g)+$ with ^[a-g]+$. We'll first look at benchmarking in Perl, but will see it in other languages in a bit. Here's a simple (but as we'll see, somewhat lacking) Perl script:

     use Time::HiRes 'time'; # So time() gives a high-resolution value.



     $StartTime = time();

     "abababdedfg" =~ m/^(a|b|c|d|e|f|g)+$/;

     $EndTime = time();

     printf("Alternation takes %.3f seconds.\n", $EndTime - $StartTime);



     $StartTime = time();

     "abababdedfg" =~ m/^[a-g]+$/;

     $EndTime = time();

     printf("Character class takes %.3f seconds.\n", $EndTime - $StartTime);

It looks (and is) simple, but there are some important points to keep in mind while constructing a test for benchmarking:

Time only "interesting" work Time as much of the "work" as possible, but as little "non-work" as possible. If there is some initialization or other setup that must done, do it before the starting time is taken. If there's cleanup, do it after the ending time is taken.
Do "enough" work Often, the time it takes to do what you want to test is very short, and a computer's clock doesn't have enough granularity to give meaning to the timing.
When I run the simple Perl test on my system, I get
```
     Alternation takes 0.000 seconds.
     Character class takes 0.000 seconds.
```
which really doesn't tell me anything other than both are faster than the shortest time that can be measured. So, if something is fast, do it twice, or 10 times, or even 10,000,000 times—whatever is required to make "enough" work. What is "enough" depends on the granularity of the system clock, but most systems now have clocks accurate down to 1/100^th of a second, and in such cases, timing even half a second of work may be sufficient for meaningful results.
Do the "right" work Doing a very fast operation ten million times involves the overhead of ten million updates of a counter variable in the block being timed. If possible, it's best to increase the amount of real work being done in a way that doesn't increase the overhead work. In our Perl example, the regular expressions are being applied to fairly short strings: if applied to much longer strings, they'd do more "real" work each time.

So, taking these into account, here's another version:

     use Time::HiRes 'time'; # So time() gives a high-resolution value.

     $TimesToDo = 1000;                     # Simple setup

     $TestString = "abababdedfg" x 1000;    # Makes a huge string



     $Count = $TimesToDo;

     $StartTime = time();

     while (--$Count > 0) {

          $TestString =~ m/^(a|b|c|d|e|f|g)+$/;

     }

     $EndTime = time();

     printf("Alternation takes %.3f seconds.\n", $EndTime - $StartTime);



     $Count = $TimesToDo;

     $StartTime = time();

     while (--$Count > 0) {

          $TestString =~ m/^[a-g]+$/;

     }

     $EndTime = time();

     printf("Character class takes %.3f seconds.\n", $EndTime - $StartTime);

Notice how the $TestString and $Count are initialized before the timing starts? ($TestString is initialized with Perl's convenient x operator, which replicates the string on its left as many times as the number on its right.) On my system, with Perl 5.8, this prints:

     Alternation takes 7.276 seconds.
     Character class takes 0.333 seconds.

So, with this test case, one is about 11x faster than the other. The benchmark should be executed a few times, with the fastest times taken, to lessen the impact of sporadic background system activity.

6.3.1 Know What You're Measuring

It might be interesting to see what happens when the initialization is changed to:

     my $TimesToDo = 1000000;

     my $TestString = "abababdedfg";

Now, the test string is 1,000 times shorter, but the test is done 1,000x more times. Conceptually, the amount of "work" should be the same, but the results are quite different:

     Alternation takes 18.167 seconds.
     Character class takes 5.231 seconds.

Both are slower than before, but the difference between the two is now a factor of only 3.5x. Why the relative difference ? After the change, there's more "non-work" overhead than before. That is, the update and testing of $Count, and the setup of the regex engine, now happen 1,000x more than before.

These tests are about 5.3 and 4.1 seconds slower than the first tests. Most of the extra time is probably the overhead of working with $Count. The fact that ^(a|b|c|d|e|f|g)+$ is hit relatively harder (5.3 seconds slower than the first time, rather than 4.1 seconds slower) may reflect additional pre-match (or earlymatch) setup by the regex engine before getting into the main part of the match.

In any case, the point of this change is to illustrate that the results are strongly influenced by how much real work vs. non-work overtime is part of the timing.

6.3.2 Benchmarking with Java

Benchmarking Java can be a slippery science, for a number of reasons. Let's first look at a somewhat naïve example, and then look at why it's naïve, and at what can be done to make it less so. The listing below shows the benchmark example with Java, using Sun's java.util.regex.

Notice how the regular expressions are compiled in the initialization part of the program? We want to benchmark the matching speed, not the compile speed.

Speed is dependent upon which virtual machine (VM) is used. Sun standard JRE ^[4] comes with two virtual machines, a client VM optimized for fast startup, and a server VM optimized for heavy-duty long-haul work.

^[4] I had to use a shorter string for this test to run on my Linux system, as a longer string somehow tickles a problem with the VM, causing the test to abort. Engineers at Sun tell me it's due to an unexpected interaction between the aggressively optimizing C compiler used to build the VM (gcc), and an overly conservative use of Linux's stack-monitoring hooks. It may be fixed as early as Java 1.4.1. To compensate for the shortened string in the current test, I've increased the number of times the loop executes the match, so these results should be comparable to the original.

Benchmarking with java.util.regex

     import java.util.regex.*;

     public class JavaBenchmark {

       public static void main(String [] args)

      {

         Matcher regex1 = Pattern.compile("^(a|b|c|d|e|f|g)+$").matcher("");

         Matcher regex2 = Pattern.compile("^[a-g]+$").matcher("");

         long timesToDo = 4000;



        StringBuffer temp = new StringBuffer();

        for (int i = 250; i > 0; i--)

             temp.append("abababdedfg");

        String testString = temp.toString();



       // Time first one . . .

       long count = timesToDo;

       long startTime = System.currentTimeMillis();

       while (--count > 0)

            regex1.reset(testString).find();

       double seconds = (System.currentTimeMillis() - startTime)/1000.0;

       System.out.println("Alternation takes " + seconds + " seconds");



       // Time second one . . .

       count = timesToDo;

       startTime = System.currentTimeMillis();

       while (--count > 0)

            regex2.reset(testString).find();

       seconds = (System.currentTimeMillis() - startTime)/1000.0;

       System.out.println("Character Class takes " + seconds + " seconds");

     }

   }

On my system, running the benchmark on the client VM produces:

     Alternation takes 19.318 seconds
     Character Class takes 1.685 seconds

while the server VM yields:

     Alternation takes 12.106 seconds
     Character Class takes 0.657 seconds

What makes benchmarking slippery, and this example somewhat naïve, is that the timing can be highly dependent on how well the automatic pre-execution compiler works, or how the run-time compiler interacts with the code being tested. Some VM have a JIT (Just-In-Time compiler), which compiles code on the fly, just before it's needed.

Sun's Java 1.4 has what I call a BLTN (Better-Late-Than-Never) compiler, which kicks in during execution, compiling and optimizing heavily-used code on the fly. The nature of a BLTN is that it doesn't "kick in" until it senses that some code is "hot" (being used a lot). A VM that's been running for a while, such as in a server environment, will be "warmed up," while our simple test ensures a "cd" server (nothing yet optimized by the BLTN).

One way to see "warmed up" times is to run the benchmarked parts in a loop:

     // Time first one . . .

     for (int i = 4; i > 0; i--)

     {

          long count = timesToDo;

          long startTime = System.currentTimeMillis();

          while (--count > 0)

                regex1.reset(testString).find();

          double seconds = (System.currentTimeMillis() - startTime)/1000.0;

          System.out.println("Alternation takes " + seconds + " seconds");

     }

If the extra loop runs enough times (say, for 10 seconds), the BLTN will have optimized the hot code, leaving the last times reported as representative of a warmedup system. Testing again with the server VM, these times are indeed a bit faster by about 8% and 25%:

     Alternation takes 11.151 seconds
     Character Class takes 0.483 seconds

Another issue that makes Java benchmarking slippery is the unpredictable nature of thread scheduling and garbage collection. Again, running the test long enough helps amortize their unpredictable influence.

6.3.3 Benchmarking with VB.NET

The benchmark example in VB.NET is shown in the listing below. On my system, it produces:

     Alternation takes 13.311 seconds
     Character Class takes 1.680 seconds

The .NET Framework allows a regex to be compiled to an even more efficient form, by providing RegexOptions.Compiled as a second argument to each Regex constructor (see Section 9.1.1.4). Doing that results in:

     Alternation takes 5.499 seconds
     Character Class takes 1.157 seconds

Both tests are faster using the Compiled option, but alternation sees a greater relative benefit (its almost 3x faster when Compiled, but the class version is only about 1.5x faster), so it seems that alternation benefits from the more efficient compilation relatively more than a character class does.

Benchmarking with VB.NET

     Option Explicit On

     Option Strict On



     Imports System.Text.RegularExpressions



     Module Benchmark

     Sub Main()

        Dim Regex1 as Regex = New Regex("^(a|b|c|d|e|f|g)+$")

        Dim Regex2 as Regex = New Regex("^[a-g]+$")

        Dim TimesToDo as Integer = 1000

        Dim TestString as String = ""

        Dim I as Integer

        For I = 1 to 1000

             TestString = TestString & "abababdedfg"

        Next



        Dim StartTime as Double = Timer()

        For I = 1 to TimesToDo

             Regex1.Match(TestString)

        Next

        Dim Seconds as Double = Math.Round(Timer() - StartTime, 3)

        Console.WriteLine("Alternation takes " & Seconds & " seconds")



        StartTime = Timer()

        For I = 1 to TimesToDo

             Regex2.Match(TestString)

        Next

        Seconds = Math.Round(Timer() - StartTime, 3)

        Console.WriteLine("Character Class takes " & Seconds & " seconds")

     End Sub

     End Module

6.3.4 Benchmarking with Python

The benchmark example in Python is shown in the listing below.

For Python's regex engine, I had to cut the size of the string a bit because the original causes an internal error ("maximum recursion limit exceeded") within the regex engine. To compensate, I increased the number of times the test is done by a proportional amount.

On my system, the benchmark produces:

     Alternation takes 10.357 seconds
     Character Class takes 0.769 seconds

Benchmarking with Python

     import re

     import time

     import fpformat



     Regex1 = re.compile("^(a|b|c|d|e|f|g)+$")

     Regex2 = re.compile("^[a-g]+$")



     TimesToDo = 1250;

     TestString = ""

     for i in range(800):

          TestString += "abababdedfg"



     StartTime = time.time()

     for i in range(TimesToDo):

          Regex1.search(TestString)

     Seconds = time.time() - StartTime

     print "Alternation takes " + fpformat.fix(Seconds,3) + " seconds"



     StartTime = time.time()

     for i in range(TimesToDo):

          Regex2.search(TestString)

     Seconds = time.time() - StartTime

     print "Character Class takes " + fpformat.fix(Seconds,3) + " seconds"

6.3.5 Benchmarking with Ruby

Here's the benchmark example in Ruby:

     TimesToDo=1000

     testString=""

     for i in 1..1000

          testString += "abababdedfg"

     end



     Regex1 = Regexp::new("^(a|b|c|d|e|f|g)+$");

     Regex2 = Regexp::new("^[a-g]+$");



     startTime = Time.new.to_f

     for i in 1..TimesToDo

          Regex1.match(testString)

     end

     print "Alternation takes .3f seconds\n" % (Time.new.to_f - startTime);



     startTime = Time.new.to_f

     for i in 1..TimesToDo

          Regex2.match(testString)

     end

     print "Character Class takes .3f seconds\n" % (Time.new.to_f - startTime);

On my system, it produces:

     Alternation takes 16.311 seconds
     Alternation takes 16.311 seconds

6.3.6 Benchmarking with Tcl

Here's the benchmark example in Tcl:

     set TimesToDo 1000

     set TestString ""

     for {set i 1000} {$i > 0} {incr i -1} {

          append TestString "abababdedfg"

     }



     set Count $TimesToDo

     set StartTime [clock clicks -milliseconds]

     for {} {$Count > 0} {incr Count -1} {

          regexp {^(a|b|c|d|e|f|g)+$} $TestString

     }

     set EndTime [clock clicks -milliseconds]

     set Seconds [expr ($EndTime - $StartTime)/1000.0] 

     puts [format "Alternation takes .3f seconds" $Seconds]



     set Count $TimesToDo

     set StartTime [clock clicks -milliseconds]

     for {} {$Count > 0} {incr Count -1} {

          regexp {^[a-g]+$} $TestString 

     }

     set EndTime [clock clicks -milliseconds]

     set Seconds [expr ($EndTime - $StartTime)/1000.0]

     puts [format "Character class takes .3f seconds" $Seconds]

On my system, this benchmark produces:

     Alternation takes 0.362 seconds
     Character class takes 0.352 seconds

Wow, they're both about the same speed! Well, recall from the table on Section 4.1.3 that Tcl has a hybrid NFA/DFA engine, and these regular expressions are exactly the same to a DFA engine. Most of what this chapter talks about simply does not apply to Tcl. See the sidebar in Section 6.4.4.1.1 for more.

< Free Open Study >