Building on the previous example, I decide to provide a utility for searching the content of a file and returning all matching strings within that file. I'll use FileChannels for the actual file I/O. Although a discussion of FileChannels is beyond the scope of this book, in my opinion they're the best way to access files in Java.
My strategy is to use a FileChannel to open a file, read its content into a String, release the FileChannel, and then use the searchString method to parse the String. This is faster than reading through the file line by line and examining its content, though it is memory intensive. Listing 5-7 shows the code for doing this.
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;

    /**
     * Extracts the content of a file.
     * @param fileName the name of the file to extract
     * @throws IOException if the file cannot be opened or read
     * @return String representing the contents of the file
     */
    public static String getFileContent(String fileName)
        throws IOException {
        String retval = null;
        //get access to the FileChannel
        FileInputStream fis =
            new FileInputStream(fileName);
        FileChannel fc = fis.getChannel();
        try {
            //get the file content
            retval = getFileContent(fc);
        } finally {
            //close up shop, even if the read fails
            fc.close();
        }
        return retval;
    }

    /**
     * Extracts the content of an open FileChannel.
     * @param fc the FileChannel to read from
     * @throws IOException if the channel cannot be read
     * @return String representing the contents of the file
     */
    private static String getFileContent(FileChannel fc)
        throws IOException {
        //read the contents of the FileChannel
        ByteBuffer bb = ByteBuffer.allocate((int)fc.size());
        fc.read(bb);
        //save the contents as a string
        //(decoded with the platform's default charset)
        bb.flip();
        return new String(bb.array());
    }
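Two details of this approach may deserve hardening in production code: a single `read` call is not guaranteed to fill the buffer, and `new String(byte[])` decodes with the platform's default charset. The following is a minimal sketch of one way to address both, not a replacement for the listing. The class name `FileContentSketch`, the method name `readAll`, and the explicit charset parameter are assumptions introduced here for illustration, and the sketch relies on `java.nio.file` APIs that are newer than the ones used elsewhere in this chapter.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper; not part of the chapter's utility class. */
final class FileContentSketch {

    /** Reads the whole file into memory and decodes it with the given charset. */
    static String readAll(Path file, Charset charset) throws IOException {
        // try-with-resources closes the channel even if a read fails
        try (FileChannel fc = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = fc.size();
            if (size > Integer.MAX_VALUE) {
                throw new IOException("File too large for one buffer: " + size);
            }
            ByteBuffer bb = ByteBuffer.allocate((int) size);
            // keep reading until the buffer is full or the channel reports end of file
            while (bb.hasRemaining() && fc.read(bb) != -1) {
                // nothing to do; read() advances the buffer position
            }
            bb.flip();
            // decode explicitly rather than relying on the platform default
            return charset.decode(bb).toString();
        }
    }
}
```

A caller would pass the encoding it expects, for example `FileContentSketch.readAll(path, StandardCharsets.UTF_8)`. The memory cost is unchanged: the entire file still lives in memory as both bytes and characters while it is decoded.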
Next, I need to provide a method that will load the file, search it, and return the results. Given the two previous methods, this becomes fairly easy, as shown in Listing 5-8.
    public static Map searchFile(
        String file,
        String searchPattern,
        int flags
    ) throws IOException
    {
        //load the file, then hand its content to searchString
        String fileContent = getFileContent(file);
        Map retval = searchString(fileContent,searchPattern,flags);
        return retval;
    }
I take the program out for a spin and compare it to grep. To be honest, it seems to lack a bit in the comparison. The grep program returns the entire line containing a match, whereas this method returns only the matching token. That's not terrible, because the client could request the entire line by using the correct regex pattern. But it's not really as friendly as it could be, especially for the average user.
I decide to "pad" the pattern to capture an entire line whenever the original search pattern has no punctuation in it, on the assumption that a pattern without punctuation contains no regex. Listing 5-9 shows my modified searchFile method.
    public static Map searchFile(
        String file,
        String searchPattern,
        int flags
    ) throws IOException
    {
        String fileContent = getFileContent(file);
        //if the search pattern doesn't have any punctuation
        //then assume it's not a regular expression and extract
        //the entire line in which it was found
        String[] regexTokens = searchPattern.split("\\p{Punct}");
        if (regexTokens.length == 1)
        {
            //note: ^ and $ match at line boundaries only when
            //flags includes Pattern.MULTILINE
            searchPattern = "^.*"+ searchPattern+".*$";
        }
        Map retval = searchString(fileContent,searchPattern,flags);
        return retval;
    }
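Two subtleties in the padding approach are worth seeing in isolation. First, `^` and `$` anchor to individual lines only under `Pattern.MULTILINE`; without that flag they anchor to the whole input. Second, splitting on punctuation can miss a pattern whose only punctuation is at the end, such as `abc.`, because `String.split` drops trailing empty strings. The sketch below is one possible variant under those observations, not the chapter's method: the class `LinePaddingSketch` and its methods are hypothetical names, and it returns a plain `List` rather than the `Map` that searchString produces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper illustrating whole-line padding. */
final class LinePaddingSketch {
    private static final Pattern PUNCT = Pattern.compile("\\p{Punct}");

    /** Pads the pattern to a whole line unless it contains any punctuation. */
    static String padToWholeLine(String searchPattern) {
        if (PUNCT.matcher(searchPattern).find()) {
            return searchPattern; // assume the caller supplied a real regex
        }
        return "^.*" + searchPattern + ".*$";
    }

    /** Returns every match; MULTILINE is always added so ^ and $ work per line. */
    static List<String> matchingLines(String content, String searchPattern, int flags) {
        Pattern p = Pattern.compile(padToWholeLine(searchPattern),
                                    flags | Pattern.MULTILINE);
        Matcher m = p.matcher(content);
        List<String> hits = new ArrayList<>();
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }
}
```

Forcing `Pattern.MULTILINE` is a design choice made for this sketch; it also changes how `^` and `$` behave in patterns the caller wrote by hand, which may or may not be what a given client expects.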
At this point, there should be some reasonable questions on your mind. Isn't this supposed to be a regex book? There wasn't anything particularly regex-like about the search file and search string methods; they were pretty much straight Java code, which you already know how to write. What's going on?
The point here is that regex is just a tool. It doesn't change the fact that you're still writing Java code, and you need to follow good, modular, object-oriented principles, even as you're working with regular expressions. Regex allows you to bridge trouble spots you might never have crossed otherwise, but it's just a tool. Like any well-built engine, the java.util.regex engine announces its excellence by humming quietly along and not forcing you to worry about it.
Another valid question at this point is, what if the content of the file you're trying to parse is too large to make reading all of it into memory a practical option? In general, you have two paths you can take here. You can use one of the newer NIO features (available since J2SE 1.4), such as MappedByteBuffers, or you can split the file into manageable sections and parse each of those in turn.
If you decide to use MappedByteBuffers for regex, Listing 5-10 contains an example showing how. I'm hesitant, however, to advocate MappedByteBuffers with regex too strongly for three reasons. First and foremost, their behavior is very system dependent, so you should probably rule them out if you need platform independence. Second, even within a given platform, their behavior isn't well defined. Thus, depending on what else you're doing with your operating system, you could get inconsistent results. Third, you need to consider the fact that, if the entire file can't be loaded into memory at one time, trying to apply a pattern that might have wildcards in it is going to be a tricky affair.
    public static boolean getFileContentUsingMappedByteBuffer
    (
        String fileName
    ) throws IOException
    {
        boolean retval = false;
        RandomAccessFile raf = new RandomAccessFile(fileName,"rwd");
        FileChannel fc = raf.getChannel();
        //map the entire file into memory
        MappedByteBuffer mbb =
            fc.map(FileChannel.MapMode.READ_WRITE,0,fc.size());
        //view the mapped bytes as chars (two bytes per char);
        //cb can now be handed to Pattern.matcher
        CharSequence cb = mbb.asCharBuffer();
        //the mapping stays valid after the channel is closed
        raf.close();
        return retval;
    }
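To make the mapped-buffer idea concrete, here is a hedged sketch of running a regex directly over mapped bytes. Because `asCharBuffer` treats every two bytes as one char, it suits only 16-bit encoded text. The sketch instead assumes a single-byte encoding (ISO-8859-1, which also covers plain ASCII) and wraps the buffer in a small `CharSequence`. The names `Latin1View` and `MappedSearchSketch` are hypothetical, the file is mapped read-only, and the sketch uses `java.nio.file` APIs newer than those in the listing.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical read-only view of ISO-8859-1 bytes as characters. */
final class Latin1View implements CharSequence {
    private final ByteBuffer bytes;
    private final int start;
    private final int end;

    Latin1View(ByteBuffer bytes, int start, int end) {
        this.bytes = bytes;
        this.start = start;
        this.end = end;
    }

    @Override public int length() { return end - start; }

    @Override public char charAt(int index) {
        if (index < 0 || index >= length()) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        // absolute get: one byte maps to one char in ISO-8859-1
        return (char) (bytes.get(start + index) & 0xFF);
    }

    @Override public CharSequence subSequence(int from, int to) {
        if (from < 0 || to > length() || from > to) {
            throw new IndexOutOfBoundsException(from + ".." + to);
        }
        return new Latin1View(bytes, start + from, start + to);
    }

    @Override public String toString() {
        StringBuilder sb = new StringBuilder(length());
        for (int i = 0; i < length(); i++) {
            sb.append(charAt(i));
        }
        return sb.toString();
    }
}

/** Hypothetical search helper built on the view above. */
final class MappedSearchSketch {
    static List<String> findAll(Path file, String regex, int flags) throws IOException {
        try (FileChannel fc = FileChannel.open(file, StandardOpenOption.READ)) {
            // map() rejects sizes above Integer.MAX_VALUE
            ByteBuffer mapped = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
            Matcher m = Pattern.compile(regex, flags)
                               .matcher(new Latin1View(mapped, 0, mapped.limit()));
            List<String> hits = new ArrayList<>();
            while (m.find()) {
                hits.add(m.group()); // only matched text is copied to the heap
            }
            return hits;
        }
    }
}
```

Only the matched substrings are copied into ordinary strings; the rest of the file is read through the mapping as the matcher walks it. The caveats about platform-dependent mapping behavior still apply, and multi-byte encodings such as UTF-8 would need a different view or a decoding step.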
You may very well need to reconsider your patterns, and break the file up into logical blocks based on your insight into its structure. One strategy might be to check the size of the file and divide that by 10, 100, or whatever fraction is easily loadable given your system's memory limitations, and then search that portion. Although this isn't ideal, it is more predictable than the corresponding mapped-memory approach. The bottom line is that regardless of the regex flavor or provider you use, very large files require special treatment.
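As a rough illustration of the block-by-block strategy, the sketch below reads a file in fixed-size blocks, searches only the complete lines in each block, and carries any trailing partial line over to the next block. It rests on several assumptions that are mine rather than the chapter's: matches never span a line break, the text is in a single-byte encoding (ISO-8859-1), and the names `ChunkedSearchSketch`, `findAll`, and `blockSize` are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that searches a file one block at a time. */
final class ChunkedSearchSketch {

    static List<String> findAll(Path file, Pattern pattern, int blockSize)
            throws IOException {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<String> hits = new ArrayList<>();
        String carry = ""; // partial line left over from the previous block
        try (FileChannel fc = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer bb = ByteBuffer.allocate(blockSize);
            while (fc.read(bb) != -1) {
                bb.flip();
                // single-byte charset, so a block never splits a character
                String block = carry + StandardCharsets.ISO_8859_1.decode(bb);
                bb.clear();
                // search up to and including the last newline; keep the rest
                int cut = block.lastIndexOf('\n') + 1;
                collect(pattern, block.substring(0, cut), hits);
                carry = block.substring(cut);
            }
        }
        collect(pattern, carry, hits); // final line without a trailing newline
        return hits;
    }

    private static void collect(Pattern pattern, CharSequence text, List<String> hits) {
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
    }
}
```

Memory use is bounded by the block size plus the longest line, since a line with no newline keeps accumulating in `carry`. Patterns that need to match across line breaks would require a different splitting rule based on the file's structure.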