FileIO

Overview

At the lowest level, OODB uses FileIO.dll to read data from and write data to the database file(s). FileIO may also be useful on its own in other development tasks.
FileIO classes:
a) Reading is done with the BufferedFileReader classes. Because they buffer their input, they are particularly useful for large files and/or parallel data processing. Both binary and text files are supported.
b) Writing a binary file is done with BinaryFileWriter. It accepts either a byte array or a single value of a value type, e.g. a float.
All classes allow easy reading/writing from/to a specific position in the file.

Buffering

For a small file (up to several MB), the choice of parsing approach makes little difference to performance on a modern laptop.
Large files, however, should always be parsed with care. Two common examples of large files are:

  • detailed CAD models (mostly binary);
  • outputs of scientific and engineering simulations (sometimes text).

As an example, a 4 GB file can be read line by line (for a text file) or byte by byte (for a binary file), but this approach does not allow vectorized operations on the data (e.g. an efficient string search), so only naive data processing algorithms can be implemented. Moreover, the huge number of read-from-disk operations will severely degrade performance.

The opposite extreme is reading the entire file into RAM in one go. Although this would make more advanced processing algorithms possible, a conventional laptop often does not have enough free RAM to hold the entire file.

The right way to read such a file is to read it chunk by chunk into a buffer. The optimal buffer size is usually determined by the following:

  • available RAM;
  • optimal data size for the methods used in data processing;
  • size of a data structure in the file that should preferably be read within one read operation (e.g. data under a tag, a sub-mesh, etc.).
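
The chunked approach described above can be sketched with the standard .NET FileStream class. This is only an illustration of the general technique, not the FileIO implementation: the helper name ReadInChunks, the 4096-byte buffer size and the temporary test file are assumptions made for the example.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of FileIO): reads a file in chunks of at most
// bufferSize bytes and hands each chunk to a callback together with its valid length.
// Returns the total number of bytes read.
static long ReadInChunks(string fileName, int bufferSize, Action<byte[], int> process)
{
    var buffer = new byte[bufferSize];
    long bytesRead = 0;
    using (var fs = new FileStream(fileName, FileMode.Open, FileAccess.Read))
    {
        int n;
        // Read may return fewer bytes than requested; 0 means end of file.
        while ((n = fs.Read(buffer, 0, buffer.Length)) > 0)
        {
            process(buffer, n);
            bytesRead += n;
        }
    }
    return bytesRead;
}

// Usage: count the zero bytes in a 10000-byte temporary file, 4096 bytes at a time.
string tmpPath = Path.GetTempFileName();
File.WriteAllBytes(tmpPath, new byte[10000]);

long zeroCount = 0;
long totalBytes = ReadInChunks(tmpPath, 4096, (chunk, count) =>
{
    for (int i = 0; i < count; i++)
    {
        if (chunk[i] == 0) zeroCount++;
    }
});

Console.WriteLine($"{totalBytes} bytes read, {zeroCount} zero bytes");
File.Delete(tmpPath);
```

Only one buffer of bufferSize bytes is held in memory at a time, while each callback still receives a block large enough for array-based processing.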

Class description and API

Import

using FileIO.BinaryFileWriter;
using FileIO.BufferedFileReader;

Reading

The BufferedFileReader library contains two classes for reading text (ASCII) files and two classes for reading binary files:

1. Class BufferedLineIterator allows simple iteration over the lines:

BufferedLineIterator bli = new BufferedLineIterator(fname, charlim, encoding);
/* fname - file name, string
  charlim - buffer size in bytes, int
  encoding - one of the available encodings, e.g. Encoding.UTF8 */
while (bli.hasNext())
{
    Console.WriteLine(bli.next());
}

2. Class BufferedLineReader does more or less the same, i.e. it holds a list of strings (lines), but gives the user more flexibility by returning the entire list rather than a single line:

BufferedLineReader blr = new BufferedFileReader.BufferedLineReader(fname, charlim, encoding);
/* fname - file name, string
 charlim - buffer size in bytes, int
 encoding - one of the available encodings, e.g. Encoding.UTF8
*/
while (!blr.isEOF())
{
    List<String> lines = blr.getLines(lastFunctionBeginLineIndex);
    /* lastFunctionBeginLineIndex - int, if > -1,
    the method will return the next list of lines beginning with the line that has index lastFunctionBeginLineIndex in the current list
    */

    // ... data processing ...

}

3. BufferedDataIterator allows iterating over data in a binary file like so:

BufferedDataIterator bdi = new BufferedDataIterator(fname, bytelim);
/* fname - file name, string
 bytelim - buffer size in bytes, int
 */

ushort var1 = bdi.getUInt16();
short var2 = bdi.getInt16();
uint var3 = bdi.getUInt32();
int var4 = bdi.getInt32();
ulong var5 = bdi.getUInt64();
long var6 = bdi.getInt64();
char ch1 = bdi.getChar8();
char ch2 = bdi.getChar16();
float f = bdi.getFloat();
double d = bdi.getDouble();

bdi.skip(n); // n - int, number of bytes to skip forward

4. Finally, BufferedByteReader is another class for binary files:

BufferedByteReader bbr = new BufferedByteReader(fname, bytelim);
/* fname - file name, string
 bytelim - buffer size in bytes, int
 */

byte[] byteArray = bbr.getBytes(appendByteIndex, skip);
/*
 appendByteIndex - int,
 skip - int,
 if appendByteIndex > -1, the byte array will begin with the byte at position (appendByteIndex + skip) in the file,
 otherwise with the byte at position skip;
 normally it is recommended to either "append" or "skip", not both
 */

Writing

A BinaryFileWriter object is initialized as

BinaryFileWriter bfw = new BinaryFileWriter(fname);
// fname - String, path to file

The write operation then follows as

bfw.write(bytes, fileOffset, arrOffset, lengthBytes);
/*
bytes - byte array byte[]
fileOffset - int, byte index position with respect to the 0th file byte
arrOffset - int, first byte index in the array for writing
lengthBytes - total number of bytes to write
*/
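
To make the meaning of the four write parameters concrete, the following sketch performs the same kind of positioned write with the standard .NET FileStream class. It is not the BinaryFileWriter implementation; the helper name WriteAt and the temporary file are assumptions made for illustration.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of FileIO): writes lengthBytes bytes taken from
// bytes[arrOffset...] into the file, starting at byte position fileOffset.
static void WriteAt(string fileName, byte[] bytes, int fileOffset, int arrOffset, int lengthBytes)
{
    // OpenOrCreate keeps the existing file content instead of truncating it.
    using (var fs = new FileStream(fileName, FileMode.OpenOrCreate, FileAccess.Write))
    {
        fs.Seek(fileOffset, SeekOrigin.Begin);
        fs.Write(bytes, arrOffset, lengthBytes);
    }
}

// Usage: overwrite bytes 4..7 of an 8-byte file with one float.
string tmpFile = Path.GetTempFileName();
File.WriteAllBytes(tmpFile, new byte[8]);

byte[] payload = BitConverter.GetBytes(1.5f);   // a float occupies 4 bytes
WriteAt(tmpFile, payload, 4, 0, payload.Length);

byte[] fileBytes = File.ReadAllBytes(tmpFile);
float roundTrip = BitConverter.ToSingle(fileBytes, 4);
Console.WriteLine(roundTrip);
File.Delete(tmpFile);
```

The file keeps its original length and its first four bytes are untouched; only the addressed range changes.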

General remarks:

  • Objects of the described reader classes do not have to be "closed" like a regular file stream, because reading is always buffered and the file does not stay open.
  • Read methods throw IOException if something goes wrong.

Text encoding

BufferedDataIterator can read either a char encoded in a single byte with
char ch = bdi.getChar8();
or a char encoded in 2 bytes with
char ch = bdi.getChar16();

BufferedLineReader assumes that chars are encoded in a single byte.

BinaryFileWriter writes a char as 2 bytes.
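
The difference between a 1-byte and a 2-byte char can be seen with standard .NET calls alone. The sketch below uses Encoding.ASCII and BitConverter purely as an illustration; it makes no claim about how FileIO encodes chars internally or which byte order it uses.

```csharp
using System;
using System.Text;

char c = 'A';

// Single-byte representation: valid for ASCII characters only.
byte[] oneByte = Encoding.ASCII.GetBytes(new[] { c });

// Two-byte representation: the UTF-16 code unit that a .NET char holds,
// in the byte order of the current platform.
byte[] twoBytes = BitConverter.GetBytes(c);

// Decoding has to match the width that was used for encoding.
char fromOne = (char)oneByte[0];
char fromTwo = BitConverter.ToChar(twoBytes, 0);

Console.WriteLine($"{oneByte.Length} byte: {fromOne}, {twoBytes.Length} bytes: {fromTwo}");
```

Reading a 2-byte char with a 1-byte read, or the reverse, would shift every value that follows it in the file, so the read width must match the width used when the data was written.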

