JRecord Wiki

Read Cobol data files in Java

Status: Beta

Brought to you by: bruce_a_martin

File Organisation

JRecord File Structures / File Organisation

On Linux / Windows, most text files have \n or \r\n marking the end of each line.
On the mainframe the situation is quite different; the 2 main ways of organising records are:

FB - Fixed-length records - every record has the same fixed length; there are no \n (or \r) line markers
VB - Variable-length records - each record starts with a Record Length, followed by the record's text or data. Again, there are no \n (or \r) line markers

FB File (constant record length, no EOL marker):

 (Record  1 --------------------)
 (Record  2 --------------------)
 (Record  3 --------------------)
 (Record  4 --------------------)
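
As a rough illustration of the FB layout above, here is a minimal sketch in plain Java (it does not use JRecord). The class name `FixedLengthSketch`, the method `readRecords` and the 8-byte record length are made up for the example, and `readNBytes` needs Java 11 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not part of JRecord.
final class FixedLengthSketch {

    /** Cuts the stream into records of exactly recordLength bytes. */
    static List<byte[]> readRecords(InputStream in, int recordLength) throws IOException {
        List<byte[]> records = new ArrayList<>();
        while (true) {
            byte[] record = in.readNBytes(recordLength); // Java 11+
            if (record.length == 0) {
                return records; // clean end of file
            }
            if (record.length < recordLength) {
                throw new IOException("Truncated final record: " + record.length + " bytes");
            }
            records.add(record);
        }
    }

    public static void main(String[] args) throws IOException {
        // Three 8-byte records stored back to back, with no \n between them.
        byte[] file = "RECORD01RECORD02RECORD03".getBytes(StandardCharsets.US_ASCII);
        for (byte[] record : readRecords(new ByteArrayInputStream(file), 8)) {
            System.out.println(new String(record, StandardCharsets.US_ASCII));
        }
    }
}
```

The reader never inspects the record content; the record length alone decides where one record ends and the next begins.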

VB File (length at the start of the record):

 (Record Length)(Record  1 ----------------------------)
 (Record Length)(Record  2 ----------)
 (Record Length)(Record  3 ----------------------)
 (Record Length)(Record  4 ---------------)
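
The VB layout above can be sketched in plain Java as follows (again, this is not JRecord code). The sketch assumes the usual mainframe convention of a 4-byte prefix: a 2-byte big-endian length that counts the prefix itself, followed by 2 reserved bytes. Other Cobol compilers use different prefixes, so treat the prefix handling as an assumption. The names `LengthPrefixedSketch`, `readRecords` and `withPrefix` are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not part of JRecord.
final class LengthPrefixedSketch {
    // Assumption: 2-byte length (including the prefix) + 2 reserved bytes.
    private static final int PREFIX_LENGTH = 4;

    /** Reads records that each start with the assumed 4-byte length prefix. */
    static List<byte[]> readRecords(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        List<byte[]> records = new ArrayList<>();
        while (true) {
            int high = data.read();
            if (high < 0) {
                return records; // clean end of file
            }
            int totalLength = (high << 8) | data.readUnsignedByte();
            data.readUnsignedShort(); // skip the 2 reserved bytes
            if (totalLength < PREFIX_LENGTH) {
                throw new IOException("Invalid record length: " + totalLength);
            }
            byte[] record = new byte[totalLength - PREFIX_LENGTH];
            data.readFully(record); // throws EOFException if the record is cut short
            records.add(record);
        }
    }

    /** Builds one record (prefix + data) so the demo has something to read. */
    static byte[] withPrefix(String text) {
        byte[] body = text.getBytes(StandardCharsets.US_ASCII);
        int total = body.length + PREFIX_LENGTH;
        byte[] out = new byte[total];
        out[0] = (byte) (total >>> 8);
        out[1] = (byte) total;
        System.arraycopy(body, 0, out, PREFIX_LENGTH, body.length);
        return out;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        for (String text : new String[] {"A LONGER RECORD", "SHORT"}) {
            byte[] record = withPrefix(text);
            file.write(record, 0, record.length);
        }
        for (byte[] record : readRecords(new ByteArrayInputStream(file.toByteArray()))) {
            System.out.println(record.length + " bytes: "
                    + new String(record, StandardCharsets.US_ASCII));
        }
    }
}
```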

For Linux / Windows Cobols there are normally three types:

FB - Exactly the same as the Mainframe
VB - Similar to the mainframe, each Cobol does it differently though
Line-Sequential - Standard Linux / Windows Text file

In JRecord, I try to handle all these different file types. The setFileOrganisation method controls which IO routine is used.
JRecord includes the following options:

Constants.IO_DEFAULT - JRecord will decide the actual method based on other values (The Default Value). It is generally better to explicitly set the File-Organisation (or file-Structure).
Constants.IO_STANDARD_TEXT_FILE - Standard Windows/*nix/Mac text file using \n, \r\n etc. as a record (or line) delimiter.
Constants.IO_UNICODE_TEXT - Standard Windows/*nix/Mac Unicode / double-byte text file using \n, \r\n etc. as a record (or line) delimiter. It ensures records are stored in character format (instead of bytes).
Constants.IO_FIXED_LENGTH - Every record (or line) has the same fixed length, based on the maximum schema length.
Constants.IO_FIXED_LENGTH_CHAR - Fixed length character file (typically used for Fixed-Length unicode files).
Constants.IO_VB - Mainframe VB (Variable Record length file). Records consist of a Record-Length followed by the Record-Data.
Constants.IO_VB_DUMP - Raw Block format of a Mainframe-VB file. You get this format if you specify RECFM=U when reading it on the mainframe.
Constants.IO_VB_GNU_COBOL - GNU (open-Cobol) VB format.

If you are unsure which structure to use, the Record Editor Generate will read your file, work out the FileOrganisation to use
and generate Java~JRecord code for you.

Behind the scenes, JRecord has an interface, IByteReader, whose key method, read, reads one line (or record) from the input file as an array of bytes.

public abstract byte[] read() throws IOException;

This leads to a logical class structure of

            + - Windows/Linux_Text_File_Reader Class    
            ! - FB-Reader                      Class                                   
            ! - Mainframe VB                   Class
IByteReader + - Fujitsu Cobol VB               Class        
            ! - GNU_Cobol VB                   Class
            + - Various Other IO               Class
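
To show why one read method is enough for very different file layouts, here is a small self-contained sketch. It is not JRecord's code: `ByteReaderSketch`, `textReader`, `fixedReader` and `readAll` are hypothetical names, and returning null at end of file is an assumption made for this sketch. It needs Java 11 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an IByteReader-style interface.
interface ByteReaderSketch {
    /** Returns the next record, or null at end of file (assumption of this sketch). */
    byte[] read() throws IOException;
}

final class ByteReaderDemo {

    /** Text file reader: records end with \n, an optional preceding \r is dropped. */
    static ByteReaderSketch textReader(InputStream in) {
        return () -> {
            int b = in.read();
            if (b < 0) {
                return null;
            }
            ByteArrayOutputStream line = new ByteArrayOutputStream();
            while (b >= 0 && b != '\n') {
                line.write(b);
                b = in.read();
            }
            byte[] bytes = line.toByteArray();
            if (bytes.length > 0 && bytes[bytes.length - 1] == '\r') {
                bytes = Arrays.copyOf(bytes, bytes.length - 1);
            }
            return bytes;
        };
    }

    /** FB reader: every record is recordLength bytes; a short final record is returned as-is. */
    static ByteReaderSketch fixedReader(InputStream in, int recordLength) {
        return () -> {
            byte[] record = in.readNBytes(recordLength); // Java 11+
            return record.length == 0 ? null : record;
        };
    }

    /** The calling code is identical whichever reader it is given. */
    static List<String> readAll(ByteReaderSketch reader) throws IOException {
        List<String> records = new ArrayList<>();
        byte[] record;
        while ((record = reader.read()) != null) {
            records.add(new String(record, StandardCharsets.US_ASCII));
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        byte[] text = "line 1\r\nline 2\n".getBytes(StandardCharsets.US_ASCII);
        byte[] fixed = "REC1REC2".getBytes(StandardCharsets.US_ASCII);
        System.out.println(readAll(textReader(new ByteArrayInputStream(text))));
        System.out.println(readAll(fixedReader(new ByteArrayInputStream(fixed), 4)));
    }
}
```

The point of the design is that code which processes records only depends on the interface; choosing a file organisation amounts to choosing which implementation sits behind it.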

Why does the Mainframe use FB / VB?

The mainframe (and Cobol) use FB / VB for 2 reasons:

Performance - checking whether each character is a \n takes a lot of processing
Support for binary data - FB/VB let you read/write binary data directly to the file, because a data byte that happens to equal \n is never mistaken for an end-of-line marker
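
The binary data point can be seen in a small plain-Java sketch (not JRecord code). The record layout here is invented for the example: a 4-character key followed by a 2-byte big-endian binary quantity. The quantities 10 and 2570 are chosen deliberately because their binary form contains the byte 0x0A, which is the same byte as \n.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not part of JRecord.
final class BinaryFieldSketch {
    static final int KEY_LENGTH = 4;
    static final int RECORD_LENGTH = KEY_LENGTH + 2; // key + 2-byte binary quantity

    /** Builds one record; key must be exactly 4 ASCII characters. */
    static byte[] record(String key, int quantity) {
        return ByteBuffer.allocate(RECORD_LENGTH)
                .put(key.getBytes(StandardCharsets.US_ASCII))
                .putShort((short) quantity) // big-endian by default
                .array();
    }

    /** Two records: quantity 10 is 0x00 0x0A, quantity 2570 is 0x0A 0x0A. */
    static byte[] sampleFile() {
        return ByteBuffer.allocate(2 * RECORD_LENGTH)
                .put(record("AAAA", 10))
                .put(record("BBBB", 2570))
                .array();
    }

    /** Treats the file as text and splits on every \n byte. */
    static List<byte[]> splitOnNewline(byte[] file) {
        List<byte[]> pieces = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < file.length; i++) {
            if (file[i] == '\n') {
                pieces.add(Arrays.copyOfRange(file, start, i));
                start = i + 1;
            }
        }
        if (start < file.length) {
            pieces.add(Arrays.copyOfRange(file, start, file.length));
        }
        return pieces;
    }

    /** Treats the file as FB and cuts it every recordLength bytes. */
    static List<byte[]> splitFixed(byte[] file, int recordLength) {
        List<byte[]> records = new ArrayList<>();
        for (int pos = 0; pos + recordLength <= file.length; pos += recordLength) {
            records.add(Arrays.copyOfRange(file, pos, pos + recordLength));
        }
        return records;
    }

    /** Decodes the binary quantity that follows the key. */
    static int quantity(byte[] record) {
        return ByteBuffer.wrap(record, KEY_LENGTH, 2).getShort();
    }

    public static void main(String[] args) {
        byte[] file = sampleFile();
        System.out.println("Split on \\n:  " + splitOnNewline(file).size() + " pieces");
        List<byte[]> records = splitFixed(file, RECORD_LENGTH);
        System.out.println("Fixed length: " + records.size() + " records");
        for (byte[] record : records) {
            System.out.println("  quantity = " + quantity(record));
        }
    }
}
```

Splitting on \n breaks the two records into three damaged pieces, while the fixed-length split recovers both records and their quantities intact.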

You will find that newer binary protocols like Protocol Buffers
and Avro have their own VB-like formats for exactly the same reasons as on the mainframe.