When you have a hammer, everything starts to look like a nail. It's important to be aware that not all text-parsing problems require a regex solution. For example, say you need to break a comma-delimited String into its various components. Of course, it's easy enough to write a regex that does this. However, you don't need regular expressions for this problem; the StringTokenizer is enough.
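As a minimal sketch of the point above, the following splits a comma-delimited String with StringTokenizer and no regex at all. The class name CsvSplit, the helper split, and the sample input are made up for illustration; trimming each token is an added assumption, not something the text requires.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class CsvSplit {
    /** Splits a comma-delimited line into its trimmed components. */
    static List<String> split(String line) {
        List<String> parts = new ArrayList<>();
        // "," is the only delimiter; the delimiters themselves are not returned
        StringTokenizer tokenizer = new StringTokenizer(line, ",");
        while (tokenizer.hasMoreTokens()) {
            parts.add(tokenizer.nextToken().trim());
        }
        return parts;
    }

    public static void main(String[] args) {
        // prints [alpha, beta, gamma]
        System.out.println(split("alpha, beta,gamma"));
    }
}
```

One thing to keep in mind is that StringTokenizer skips empty fields, so "a,,b" yields two tokens rather than three. If empty fields matter, a different approach may be a better fit.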
Along the same lines, you don't have to limit yourself exclusively to a regex or a traditional Java solution; you can mix and match. For example, say you need to parse a log file and identify the type and frequency of the Exceptions in it. You could probably write a regex that does this for you in a single line. I'm not smart or patient enough to do this, but there are probably plenty of people who are.
However, I would contend that this probably isn't the correct approach in Java. By the time you're done writing, testing, and documenting the regex, the other programmers on your team will be trembling in fear at the thought of maintaining your code. It's probably easier to write a programmatic solution that takes advantage of regular expressions than to write a pure regex solution. Such a solution is presented in Listing 4-11.
The code in Listing 4-11 is self-sufficient and should be ready to go as is. However, it requires two small accommodations. First, you'll have to define a regexCache.txt file to hold your regex pattern, as discussed previously for Listing 4-7. Second, you'll have to define a regex entry in that file for the key exRegex. The value of the entry should be \s([a-zA-Z.]*\.[a-zA-Z.]*Exception). Then, to run the code, you simply have to point to a log file that contains exceptions.
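Because getRegex looks for a line of the form key=value, the entry in regexCache.txt would presumably read exRegex=\s([a-zA-Z.]*\.[a-zA-Z.]*Exception). To get a feel for what that pattern captures before running the full listing, here is a small sketch that applies it to a few log lines. The class name ExRegexDemo, the count helper, and the log excerpt are invented for illustration, and the sketch uses newer Java syntax (generics, Map.merge) than the listing does.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExRegexDemo {
    // The exRegex value from the text, written as a Java string literal,
    // so each backslash is doubled.
    static final String EX_REGEX = "\\s([a-zA-Z.]*\\.[a-zA-Z.]*Exception)";

    /** Counts each exception name found in the log text, in order of first appearance. */
    static Map<String, Integer> count(String log) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher matcher = Pattern.compile(EX_REGEX).matcher(log);
        while (matcher.find()) {
            // group(1) is the exception name without the leading whitespace
            counts.merge(matcher.group(1), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical log excerpt, made up for this example
        String log =
              "10:01 ERROR java.lang.NullPointerException: name was null\n"
            + "10:02 WARN  java.io.IOException: disk full\n"
            + "10:03 ERROR java.lang.NullPointerException: id was null\n";
        // prints {java.lang.NullPointerException=2, java.io.IOException=1}
        System.out.println(count(log));
    }
}
```

Note that the pattern requires a whitespace character before the exception name and at least one dot inside it, so an unqualified name such as IOException on its own would not be counted.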
    import java.util.regex.*;
    import java.util.*;
    import java.io.*;
    import java.nio.*;
    import java.nio.channels.*;

    /**
     * This class parses the log, and identifies all of the
     * exceptions thrown, as well as the frequency with which
     * they occurred.
     *
     * @author M Habibi
     */
    public class LogParser{
        /**
         * the key of the regex that applies the exception
         * pattern
         */
        public static final String REGEX_KEY="exRegex";

        /**
         * The name of the file that contains the regex keys.
         * This should probably be extracted from a properties file,
         * but we'll leave it as is for right now.
         */
        public static final String REGEX_KEY_FILE="regexCache.txt";

        /**
         * Runs the program from the command line.
         * @param args If the name of the log file
         * is passed in, then it is used. Otherwise,
         * the code looks for a file named 'server.log'
         */
        public static void main(String args[]){
            String logFile = "server.log";
            if (args != null && args.length == 1){
                logFile = args[0];
            }
            examineLog(logFile);
        }

        /**
         * parses the log, and identifies all of the
         * exceptions thrown, as well as the frequency with which
         * they occurred.
         *
         * @param logFile the name of the log file to examine.
         * @param regexCacheFile the name of the file containing
         * the regex cache.
         * @param regexKey the key of the regex entry in the
         * regexCacheFile.
         * @return a Map containing the names of the exceptions
         * found as keys, and their frequency as values
         */
        public static Map examineLog(
            String logFile, String regexCacheFile, String regexKey){
            //create a map that will preserve the order of the
            //exceptions as they occur
            Map retval = new LinkedHashMap();

            //extract the regex
            String regex = getRegex(regexCacheFile,regexKey);

            //get the contents of the log file
            String fileContent = readFile(logFile);

            //compile the pattern, and mark the time
            //for its execution
            Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
            long startTime = System.currentTimeMillis();
            Matcher matcher = pattern.matcher(fileContent);

            //seek out matches.
            while (matcher.find()){
                String exceptionName = matcher.group(1);
                incrementMapCount(retval,exceptionName);
            }
            long endTime = System.currentTimeMillis();
            long totalTime = endTime - startTime;

            //convert the total processing time to seconds
            totalTime = totalTime/(long)1000;
            System.out.println("totalTime = " + totalTime);

            //display the output
            System.out.println("retval = " + retval);
            return retval;
        }

        /**
         * parses the log, and identifies all of the
         * exceptions thrown, as well as the frequency with which
         * they occurred.
         *
         * @param logFile the name of the log file to examine
         * @return a Map containing the names of the exceptions
         * found as keys, and their frequency as values
         */
        public static Map examineLog(String logFile){
            Map map = examineLog(logFile,REGEX_KEY_FILE,REGEX_KEY);
            return map;
        }

        /**
         * Extracts the contents of the given file at the given key.
         * This particular extraction process is specifically
         * expecting the content of the file to be a
         * non-Java-delimited regex pattern.
         *
         * @param fileName the name of the file that
         * has the regex pattern.
         * @param key the key that defines the regex in the file
         * @return the regex pattern stored under the key, or
         * null if the file or the key could not be found
         * @author Mehran Habibi
         **/
        private static String getRegex(String fileName, String key){
            String retval = null;

            //get content of the file
            String content = readFile(fileName);

            //if the file has content, then try to find the key
            if (content != null) {
                //look for a beginning of line, followed by the key,
                //followed by an equal sign, and capture everything between
                //that key and the end of the line.
                String keyRegex = "^"+key+"=(.*)$";
                Pattern pattern = Pattern.compile(keyRegex,Pattern.MULTILINE);
                Matcher matcher = pattern.matcher(content);
                if (matcher.find())
                    retval = matcher.group(1);
            }
            return retval;
        }

        /**
         * Increments the count of the exception name, or creates
         * a new entry for it.
         * @param map the map expected to hold String keys of the
         * exception name, and Integer values to track the count.
         * @param exceptionName the name of the exception being
         * tracked
         */
        private static void incrementMapCount
            (Map map, String exceptionName) {
            Integer currentCount = (Integer)map.get(exceptionName);
            if (currentCount == null){
                map.put(exceptionName, new Integer(1));
            }
            else{
                currentCount = new Integer(currentCount.intValue() + 1);
                map.put(exceptionName,currentCount);
            }
        }

        /**
         * Returns the content of the file in question as a string.
         * @param fileName the name of the file in question
         * @return A string containing the file's content, or
         * null if the file could not be read
         */
        private static String readFile(String fileName){
            String retval = null;
            try{
                //open a connection to the file
                FileInputStream fis = new FileInputStream(fileName);
                FileChannel fc = fis.getChannel();

                //create a byte buffer big enough
                //to hold the content of the file,
                //and read the file content into it
                ByteBuffer bb = ByteBuffer.allocate((int)fc.size());
                fc.read(bb);
                bb.flip();

                //save the contents as a string
                retval = new String(bb.array());

                //clean up
                fc.close();
                bb = null;
                fis = null;
                fc = null;
            }
            catch(Exception e){
                e.printStackTrace();
            }
            return retval;
        }
    }
Listing 4-11 is a long example, but it's worthwhile to look over. There's not a lot going on here in terms of regex complexity, but the code does a reasonable job of using a Java-based, code-oriented solution: It's not caught up in being a regex solution. In fact, the regular expressions are treated like simply another tool, much like the Map or the FileChannel. This is, I think, the way it should be.
The Pattern object is compiled once in the examineLog method, the important regex pattern is externalized, and the approach is object oriented and logic based. You don't need to be a regex expert to figure out what's going on here, which is the point. Incidentally, when I ran it on my 1500 MHz laptop with 256MB of RAM, it executed in 4 seconds, even though the log was 9.26MB.