BufferedFileReader

ENGITEX

On a modern laptop, the parsing strategy makes little performance difference for a small file (up to a few MB).
Large files, however, should always be parsed in a smart way. Two common examples of large files are:
- detailed CAD models (mostly binary);
- outputs of scientific and engineering simulations (sometimes text).

As an example, a 4 GB file can be read line-by-line (for a text file) or byte-by-byte (in the binary case), but this approach does not allow any vectorized operations on the data (e.g., if an efficient string search has to be performed), and thus only naive data processing algorithms can be implemented. Moreover, the huge number of read-from-disk operations will strongly degrade performance.
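To make the contrast concrete, here is a sketch (in Python rather than the project's C#; the pattern and chunk size are arbitrary) of the same substring search done byte-by-byte and on buffered chunks. Only the chunked version can hand whole blocks to an optimized search routine (`bytes.find`); the byte-by-byte version is limited to a per-byte loop:

```python
import io

def count_naive(stream, pattern: bytes) -> int:
    # Byte-by-byte reading: one read call per byte, no fast search possible.
    count, window = 0, b""
    while (b := stream.read(1)):
        window = (window + b)[-len(pattern):]
        if window == pattern:
            count += 1
    return count

def count_buffered(stream, pattern: bytes, chunk_size: int = 64 * 1024) -> int:
    # Chunked reading: few read calls, optimized substring search per chunk.
    count, tail = 0, b""
    while (chunk := stream.read(chunk_size)):
        data = tail + chunk           # keep an overlap so matches spanning
        pos = data.find(pattern)      # chunk boundaries are not lost
        while pos != -1:
            count += 1
            pos = data.find(pattern, pos + 1)
        tail = data[-(len(pattern) - 1):] if len(pattern) > 1 else b""
    return count

data = b"abc" * 1000
print(count_naive(io.BytesIO(data), b"abc"))     # 1000
print(count_buffered(io.BytesIO(data), b"abc"))  # 1000
```

Both functions count overlapping occurrences; the `tail` of `len(pattern) - 1` bytes carried between chunks guarantees that a match straddling a chunk boundary is neither missed nor counted twice.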

The other limiting case is reading the entire file into RAM in one go. Although this would make more advanced processing algorithms possible, a conventional laptop does not have enough free RAM to hold the entire file.

The right way to read such a file is to read it into a buffer in chunks. An optimal buffer size usually follows from:
- the available RAM;
- the optimal data size for the processing methods used;
- the size of a data structure in the file that should preferably be read within one read operation (e.g., data under a tag, a sub-mesh, etc.).
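The third point can be sketched as follows (in Python; a hypothetical newline-delimited record format stands in for tags or sub-meshes): a buffered reader that yields only complete records, so no structure is ever split across two read operations seen by the caller.

```python
import io

def read_records(stream, buffer_size: int = 1 << 16, delimiter: bytes = b"\n"):
    """Yield complete delimiter-terminated records from chunked reads."""
    tail = b""
    while (chunk := stream.read(buffer_size)):
        data = tail + chunk
        records = data.split(delimiter)
        tail = records.pop()          # last piece may be an incomplete record
        yield from records
    if tail:                          # file may not end with a delimiter
        yield tail

stream = io.BytesIO(b"rec1\nrec2\nrec3")
print(list(read_records(stream, buffer_size=4)))  # [b'rec1', b'rec2', b'rec3']
```

Note that the effective buffer must be at least as large as the largest record; otherwise the incomplete `tail` grows until the record is complete, which is correct but defeats the purpose of a fixed-size buffer.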

A plot accompanying this page shows how parsing performance (run-time) changes with buffer size.

Implementing data parallelism when reading a large file can further improve performance. Note, however, that parallelism without buffering can, under certain circumstances, make disk access a bottleneck and reduce the benefits of parallelism. An example of parallel parsing of a binary CAD file is shown in Program.cs
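The project's example is in C#; the idea can be sketched in Python as follows (file name, worker count, and the per-byte counting task are illustrative). Each worker buffers and processes its own byte range of the file through its own handle, so workers never contend for a shared file position:

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def count_in_range(path: str, start: int, length: int, value: bytes) -> int:
    # Each worker opens its own file handle, seeks to its byte range,
    # and buffers only that range.
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(length).count(value)

def parallel_count(path: str, value: bytes, workers: int = 4) -> int:
    size = os.path.getsize(path)
    step = -(-size // workers)  # ceiling division: bytes per worker
    ranges = [(s, min(step, size - s)) for s in range(0, size, step)]
    # Per-byte operations need no overlap between ranges; multi-byte
    # records spanning a range boundary would need an overlap of
    # (record size - 1) bytes between adjacent ranges.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda r: count_in_range(path, r[0], r[1], value),
                           ranges)
        return sum(results)

# Demo on a small temporary file standing in for a large binary file.
with tempfile.NamedTemporaryFile(delete=False) as tf:
    tf.write(b"\x00\xff" * 5000)
    demo_path = tf.name
print(parallel_count(demo_path, b"\xff"))  # 5000
os.remove(demo_path)
```

Threads are used here because the work is I/O-bound; for CPU-heavy parsing, a process pool (or, in C#, `Parallel.For`-style tasks) would avoid contention on the interpreter, and whether parallel reads pay off at all depends on the storage device, as noted above.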

See also [Class description and API]


Related

Wiki: Class description and API
Wiki: Home