
BufferedFileReader


For a small file (up to a few MB), the parsing strategy makes little difference to performance on a modern laptop.
Large files, however, should still be parsed in a smart way. Two examples of large files (both text and binary) are:
- detailed CAD models;
- outputs of scientific and engineering simulations.

As an example, a 4 GB file can be read line by line (for a text file) or byte by byte (for a binary file),
but this approach does not allow any vectorized operations on the data, so only a very naive data processing algorithm can be used. Moreover, the huge number of read-from-disk operations severely degrades performance.
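
To make the cost concrete, here is a minimal sketch of the naive byte-by-byte approach using only the standard Java I/O classes (the class name and the file name "model.bin" are placeholders, not part of the project):

```java
import java.io.FileInputStream;
import java.io.IOException;

public class NaiveByteReadExample {
    public static void main(String[] args) throws IOException {
        long checksum = 0;
        // Unbuffered FileInputStream.read() returns one byte (0-255) or -1 at end-of-file.
        // Every call may touch the disk, which is what makes this approach slow for
        // multi-gigabyte files, and only trivial per-byte processing is possible.
        try (FileInputStream in = new FileInputStream("model.bin")) {
            int b;
            while ((b = in.read()) != -1) {
                checksum += b;
            }
        }
        System.out.println("Checksum: " + checksum);
    }
}
```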

The other limiting case is reading the entire file into RAM in one go.
Although this would make more advanced processing algorithms possible, a conventional laptop does not have enough free RAM to hold the entire file.
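
For comparison, the whole-file approach is a one-liner with the standard library (again with a placeholder file name). For a 4 GB file it fails outright, because a single Java array cannot hold more than about 2 GB, and even smaller files may not fit into the free heap:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class WholeFileReadExample {
    public static void main(String[] args) throws IOException {
        // Reads the entire file into RAM in one go; throws OutOfMemoryError
        // when the file does not fit into a byte[] or into the available heap.
        byte[] all = Files.readAllBytes(Paths.get("model.bin"));
        System.out.println("Read " + all.length + " bytes");
    }
}
```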

The right way to read such a file is to read it into a buffer in chunks (see the sketch after this list). The optimal buffer size is usually determined by:
- the available RAM;
- the optimal data size for the processing methods used;
- the size of a data structure in the file that should preferably be read within one read operation (e.g. data under a tag, a sub-mesh, etc.).
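
A minimal sketch of the chunked-reading pattern with the standard library (the 64 MB buffer size and the file name are placeholder assumptions; the project classes described below wrap this pattern with a more convenient API):

```java
import java.io.FileInputStream;
import java.io.IOException;

public class ChunkedReadExample {
    // Placeholder buffer size; tune it to the available RAM and the data layout.
    private static final int BUFFER_SIZE = 64 * 1024 * 1024;

    public static void main(String[] args) throws IOException {
        byte[] buffer = new byte[BUFFER_SIZE];
        long total = 0;
        try (FileInputStream in = new FileInputStream("model.bin")) {
            int n;
            // Each iteration pulls one chunk into RAM; the chunk can then be
            // processed with bulk/vectorized operations instead of per-byte logic.
            while ((n = in.read(buffer, 0, buffer.length)) != -1) {
                total += n;
                // processChunk(buffer, n);  // application-specific parsing goes here
            }
        }
        System.out.println("Read " + total + " bytes in chunks of up to " + BUFFER_SIZE + " bytes");
    }
}
```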

[Plot: parsing performance (run-time) as a function of buffer size]

Implementing data parallelism for reading a large file can further improve performance.
Note, however, that parallelism without buffering can, under certain circumstances, make disk access a bottleneck and reduce the benefits of parallelism.
An example of parallel parsing of a CAD binary file is shown in BufferedFileReaderExample.java.
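
The sketch below is not the project's BufferedFileReaderExample.java; it only illustrates the general pattern, with the class name, chunk size and file name being placeholder assumptions. The disk is read sequentially in buffered chunks, and only the CPU-bound parsing of each chunk is handed to a thread pool, so disk access does not become the bottleneck.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelChunkParsingExample {
    public static void main(String[] args)
            throws IOException, InterruptedException, ExecutionException {
        final int chunkSize = 16 * 1024 * 1024; // placeholder chunk size
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        List<Future<Long>> results = new ArrayList<>();

        try (FileInputStream in = new FileInputStream("model.bin")) {
            byte[] buffer = new byte[chunkSize];
            int n;
            // Single reader thread: disk access stays sequential and buffered.
            while ((n = in.read(buffer, 0, buffer.length)) != -1) {
                final byte[] chunk = Arrays.copyOf(buffer, n);
                // CPU-bound parsing of each chunk runs in parallel.
                results.add(pool.submit(() -> parseChunk(chunk)));
            }
        }

        long total = 0;
        for (Future<Long> f : results) {
            total += f.get();
        }
        pool.shutdown();
        System.out.println("Parsed " + total + " bytes");
    }

    // Placeholder for application-specific parsing of one chunk.
    private static long parseChunk(byte[] chunk) {
        return chunk.length;
    }
}
```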

The main classes are:

BinaryFileReader - mirrors the API of FileReader for text files: https://docs.oracle.com/javase/8/docs/api/java/io/FileReader.html
int BinaryFileReader.read(byte[] buff, int offset, int n) returns -1 if end-of-file is reached or an error occurred.
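
A possible usage sketch; only the read method above is documented here, so the constructor taking a file path (by analogy with new FileReader(String)) and the 8 KB buffer size are assumptions:

```java
// Assumption: BinaryFileReader(String path) exists, mirroring FileReader.
BinaryFileReader reader = new BinaryFileReader("model.bin");
byte[] buff = new byte[8192];
int n;
// read(...) fills at most n bytes into buff starting at offset
// and returns the number of bytes read, or -1 on end-of-file or error.
while ((n = reader.read(buff, 0, buff.length)) != -1) {
    // process buff[0..n-1]
}
```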

BufferedLineIterator - allows simple iteration over the lines of a text file.

BufferedLineReader - does much the same, i.e. holds a list of strings (lines), but gives the user more flexibility by returning the entire list rather than a single line.

BufferedDataIterator - allows iterating over data in a binary file.

BufferedByteReader - returns a byte array via byte[] byteArray = getBytes(appendByteIndex, skip);
If appendByteIndex > -1, the byte array begins with the byte at position (appendByteIndex + skip) in the file;
otherwise it begins with the byte at position skip. Normally it is recommended to either "append" or "skip", not both.
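
A usage sketch based on the description above; the constructor, the parameter types and the concrete offsets are assumptions:

```java
// Assumption: BufferedByteReader(String path) exists; only
// getBytes(appendByteIndex, skip) and its offset semantics are described above.
BufferedByteReader reader = new BufferedByteReader("model.bin");

// "skip" mode: appendByteIndex == -1, so the returned array starts
// at absolute file position 1024.
byte[] header = reader.getBytes(-1, 1024);

// "append" mode: continue from position 4096 with no extra skip,
// so the returned array starts at file position 4096 + 0.
byte[] next = reader.getBytes(4096, 0);
```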

None of these readers/iterators has to be closed with a method like close(). The actual data retrieval is always from a buffer in RAM,
and the underlying file is not kept open for long.

