
Batch Reads and Writes

Another important I/O consideration is the willingness to read the entire contents of a file into memory while you work. Although this principle can sometimes backfire with extremely large files, I've found that it generally works out well for me. The most expensive part of your operations is often going to be the I/O transaction time. It may be eminently reasonable to trade memory for I/O, and thus save repeated I/O calls, depending on your situation.

Generally speaking, I'm better off reading the content into memory, manipulating it there, and then writing it back to the file as needed. I've found this more efficient than reading a bit of a file, making some changes, writing those out, reading a bit more, and so on.
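
Here's a minimal sketch of that batch approach, assuming Java 11 or later for Files.readString and Files.writeString; the file name and the replacement pattern are placeholders for illustration:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class BatchEdit {
    public static void main(String[] args) throws IOException {
        Path file = Path.of("data.txt"); // hypothetical input file

        // Read the entire file into memory in one I/O call.
        String content = Files.readString(file);

        // Manipulate the content in memory; here, a simple
        // regex replacement stands in for the real work.
        String updated = content.replaceAll("\\bcolour\\b", "color");

        // Write the result back in one I/O call.
        Files.writeString(file, updated);
    }
}

The whole round trip costs exactly two I/O transactions, no matter how many edits you make in between.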

Note 

If I'm working with very large files, I've sometimes found it necessary to ignore the preceding advice and read the data a part at a time. That's because a pattern can easily describe a section that requires a match at the beginning of the file as well as at the end, thus potentially forcing you to keep the entire file in memory. With extremely large files, this is simply not possible.

In these sorts of scenarios, I'll optimize my expressions to the best of my ability and apply them a section at a time. If that's not sufficient, I'll write a custom Java program that's uniquely designed to parse the specific file. The first pass might look for the opening sequence, the second pass might look within that section for a submatch, and so on.
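
As a rough sketch of that first pass, the following streams a large file a line at a time, so it never resides entirely in memory; the file name and the "opening sequence" pattern are hypothetical stand-ins:

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SectionScan {
    public static void main(String[] args) throws IOException {
        Path file = Path.of("huge.log"); // hypothetical large file
        // Hypothetical opening sequence marking the start of a section.
        Pattern opening = Pattern.compile("<record id=\"(\\d+)\">");

        // Stream the file line by line rather than loading it whole.
        try (BufferedReader reader = Files.newBufferedReader(file)) {
            String line;
            while ((line = reader.readLine()) != null) {
                Matcher m = opening.matcher(line);
                if (m.find()) {
                    // First pass: record where each section begins.
                    // A second pass could apply a submatch within it.
                    System.out.println("Section opens at record " + m.group(1));
                }
            }
        }
    }
}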

