
#10 Reading large files

Milestone: v1.0_(example)
Status: pending
Labels: None
Priority: 1
Updated: 2017-12-02
Created: 2017-07-22
Private: No

I am using the JRecord library for reading large files and displaying them in a UI. There are no issues with files from 200 to 800 MB. However, when the file is larger, i.e. 1 GB or more, I get a memory overflow in the read method. How can I approach this? One way is to read a few lines and fetch the next set whenever required. But how can I handle reading backwards?

AbstractLineReader reader = iobuilder.newReader(datafile);
AbstractLine line;
while ((line = reader.read()) != null) {
    lines.add(line);
}

Discussion

  • Bruce Martin

    Bruce Martin - 2017-07-23

    I really need to know more about the file to give a definitive answer,
    i.e.:

    • Is it a Cobol file (I presume yes)?
    • Is it a standard text file or a Cobol binary file?
    • Is it one file or a generic file?
    • Is the program to run on a server, a powerful PC, or a netbook?
    • What is the fileStructure, i.e. do System.out.println(line.getLayout().getFileStructure());
    • How long are the records?
    • How big are the files going to be?

    There are solutions that will work well in some cases


    Some background

    There is an overhead of 40-odd bytes per line. If lines are small this can
    be very significant and needs to be looked at (e.g. if the Record-Length=20, 66% of your memory is
    wasted; while if the Record-Length=1000, only 4% is wasted).

    Storing multiple records in an array of bytes and generating Lines only when needed will save some space when the
    record length is small.
    E.g. if the records are of similar length you could store them as:

    Array of bytes:  <- Max Record Length -><- Max Record Length -><- Max Record Length -><- Max Record Length -><- Max Record Length ->
                     |      Record 1       ||      Record 2       ||      Record 3       ||      Record 4       ||      Record 5       |
    

    If the file is a mainframe fixed-width file, you can calculate the byte address of each record as

    Record_Position = (Record_Number - 1) * Record_Length

    and then use random-access files to read in records (I suggest reading in several thousand records at a time).
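
    The positional read described above can be sketched with a plain java.io.RandomAccessFile. This is an illustrative helper, not part of JRecord; the record length of 300 is just an example value, and the trimming at end of file is one possible way to handle a short final read.

    ```java
    import java.io.IOException;
    import java.io.RandomAccessFile;

    public class FixedWidthReader {
        // Illustrative value; substitute your file's actual record length.
        static final int RECORD_LENGTH = 300;

        /**
         * Reads 'count' records starting at the 1-based 'recordNumber',
         * using Record_Position = (Record_Number - 1) * Record_Length.
         */
        static byte[] readRecords(RandomAccessFile file, long recordNumber, int count)
                throws IOException {
            long position = (recordNumber - 1) * RECORD_LENGTH;
            byte[] buffer = new byte[count * RECORD_LENGTH];
            file.seek(position);
            int read = file.read(buffer);   // may be < buffer.length near end of file
            if (read >= 0 && read < buffer.length) {
                byte[] trimmed = new byte[read];
                System.arraycopy(buffer, 0, trimmed, 0, read);
                return trimmed;
            }
            return buffer;
        }
    }
    ```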


    Possible Solutions

    RecordEditor

    Use the FileView class of the RecordEditor to store the data. It already optimises the data storage.

    Advantages:

    • Already exists and is tested.
    • Has good optimisation for Cobol-style fixed-width records.
    • FileView implements the TableModel interface.

    Disadvantages:

    • Has a GPL license so can only be used internally (and not sold as part of a package).
    • Might not work so well on a server (not sure really).
    • RecordEditor does not use the current JRecord (it split off 7/8 years ago and the code has diverged).
      It does not have IOBuilders, for example. I can help with this if need be; there would only be a couple of calls.

    Fixed-Width file

    Read records in as needed using a random-access file. I suggest reading records in the hundreds / thousands at a time.

    Low Level JRecord Routines

    JRecord's low-level ByteReaders do keep track of the bytes read, precisely for this sort of purpose.
    ByteReaders have a method:

    public long getBytesRead()
    

    You could represent the file like:

    List<FileBlockDetails> fileData = ...;

    class FileBlockDetails {
       long blocksPositionInFile;
       int blockLength, numberOfRecordsInBlock;
       WeakReference<byte[]> blockData;
    }


    Either provide more details on the data / application, or tell me which way you want to go
    and I will investigate and give more detailed info.
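
    As a rough sketch of how such a block index could be built: the loop below groups records into 256-record blocks and tracks a running byte count the way getBytesRead() would report it after each read(). BlockIndexBuilder and its record-length-array input are illustrative stand-ins, not JRecord API.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    public class BlockIndexBuilder {
        static final int RECORDS_PER_BLOCK = 256;

        public static class BlockEntry {
            public final long positionInFile;   // where the block starts in the file
            public int blockLength = 0;         // total bytes in the block
            public int recordsInBlock = 0;
            BlockEntry(long pos) { this.positionInFile = pos; }
        }

        /**
         * Builds a block index from a sequence of record lengths,
         * mimicking what getBytesRead() would return after each read().
         */
        public static List<BlockEntry> buildIndex(int[] recordLengths) {
            List<BlockEntry> index = new ArrayList<>();
            long bytesRead = 0;
            BlockEntry current = null;
            for (int len : recordLengths) {
                if (current == null || current.recordsInBlock == RECORDS_PER_BLOCK) {
                    current = new BlockEntry(bytesRead);  // block starts where the last read ended
                    index.add(current);
                }
                bytesRead += len;                         // running total, like getBytesRead()
                current.blockLength += len;
                current.recordsInBlock++;
            }
            return index;
        }
    }
    ```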

     
    • Immanuel Stephen

      Please find the answers to your questions:
      • Is it a Cobol file (I presume yes)? Yes, it's a COBOL file transferred from the mainframe to my desktop.
      • Is it a standard text file or a Cobol binary file? Currently trying with a standard text file. In future I could use a binary file too.
      • Is it one file or a generic file? It is one file.
      • Is the program to run on a server, powerful PC, netbook? This application will run on the regular laptop I use, with 8 GB RAM.
      • What is the fileStructure? The file structure is 9.
      • How long are the records? The file I use has a record length of 300 bytes.
      • How big are the files going to be? As large as 1 or 1.5 GB.
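
      A quick back-of-the-envelope calculation on those numbers shows why reading the whole file as Line objects overflows: with the 40-odd bytes of per-line overhead mentioned earlier, a 1.5 GB file of 300-byte records needs roughly 1.7 GB of heap for the record data alone, before any other structures (and the JVM heap is typically much smaller than the machine's 8 GB of RAM):

      ```java
      public class MemoryEstimate {
          public static void main(String[] args) {
              long fileSize = 1_500_000_000L;  // ~1.5 GB, per the answers above
              int recordLength = 300;          // bytes per record
              int lineOverhead = 40;           // approximate per-Line overhead

              long records = fileSize / recordLength;
              long asLines = records * (recordLength + lineOverhead);
              System.out.println(records);     // 5000000 records
              System.out.println(asLines);     // 1700000000 bytes, ~1.7 GB of heap
          }
      }
      ```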

       
  • Bruce Martin

    Bruce Martin - 2017-07-23
    • assigned_to: Bruce Martin
     
  • Bruce Martin

    Bruce Martin - 2017-07-25

    Have you looked at the RecordEditor? It should be able to display the file.

    You can read the file as byte array records like:

            String copybookName = "???";
            int fileType = Constants.IO_BIN_TEXT;
            ICobolIOBuilder iob = JRecordInterface1.COBOL
                    .newIOBuilder(copybookName)
                        .setFileOrganization(fileType)
                        .setSplitCopybook(CopybookLoader.SPLIT_NONE)
                    ;  
            LayoutDetail schema = iob.getLayout();
    
            AbstractByteReader byteReader = ByteIOProvider.getInstance().getByteReader(schema);
            byte[] record = null;
    
            byteReader.open("fileName");
            while ((record = byteReader.read()) != null) {
                // Store the record
                System.out.println("Bytes Read So far: " + byteReader.getBytesRead());
            }
    

    You can get the bytes read as you go.
    I would not read the whole file in one go, but wait until the data is needed.

    You can create a line from an array of bytes using the iobuilder

        AbstractLine newLine = iob.newLine(record);
          // where record is an array of bytes;
    

    You can represent the file as an ArrayList of FileBlocks,

    where a FileBlock is:

    public class FileBlock {
       final static int recordLength = 300; // or whatever it is
       final static int recordsInBlock = 256;
       long blocksPositionInFile;
       int numberOfRecordsInBlock = 0;
       private WeakReference<byte[]> blockData
               = new WeakReference<byte[]>(new byte[recordLength * recordsInBlock]);

       public byte[] get(int idx) {
           byte[] data = getData();
           byte[] ret = new byte[recordLength];
           System.arraycopy(data, idx * recordLength, ret, 0, recordLength);
           return ret;
       }

       private byte[] getData() {
           byte[] data = blockData.get();
           if (data == null) {
               // doRead does a random read to re-fetch the block's bytes from the file
               data = doRead(blocksPositionInFile, numberOfRecordsInBlock * recordLength);
               blockData = new WeakReference<byte[]>(data);
           }
           return data;
       }
    }
    

    By using WeakReference, Java can reclaim the block if needed and you can re-read the block
    when required. An alternative might be to increase the block size to 300,000+ bytes and store it in a compressed format (gzip or snappy).
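
    The compressed-block alternative can be sketched with the JDK's built-in deflate support (snappy would need an external library). The helper names below are illustrative; fixed-width Cobol data is typically repetitive, so it usually compresses well:

    ```java
    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.util.zip.Deflater;
    import java.util.zip.DeflaterOutputStream;
    import java.util.zip.InflaterInputStream;

    public class CompressedBlock {
        /** Compresses a block of record bytes using deflate. */
        static byte[] compress(byte[] block) throws IOException {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (DeflaterOutputStream d
                     = new DeflaterOutputStream(out, new Deflater(Deflater.BEST_SPEED))) {
                d.write(block);
            }
            return out.toByteArray();
        }

        /** Re-inflates a block; the original length must be stored alongside it. */
        static byte[] decompress(byte[] compressed, int originalLength) throws IOException {
            byte[] result = new byte[originalLength];
            try (InflaterInputStream in
                     = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
                int off = 0, n;
                while (off < originalLength
                        && (n = in.read(result, off, originalLength - off)) > 0) {
                    off += n;
                }
            }
            return result;
        }
    }
    ```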


     
    • Immanuel Stephen

      I am yet to take a look at the RecordEditor, but my intention is to develop a custom UI based on Eclipse RCP. Also, in the FileBlock approach suggested, if I create an ArrayList of FileBlocks, won't that cause a memory-overflow error?

       

      Last edit: Immanuel Stephen 2017-07-27
  • Bruce Martin

    Bruce Martin - 2017-07-28

    "If I create an ArrayList of FileBlocks, won't that cause a memory-overflow error?" No, because of the WeakReference.

    The WeakReference is a special object; the data it points to remains available for garbage collection. Alternatively, SoftReference is a less aggressive form of WeakReference.

    One word of warning: hold onto the first 2 byte-blocks separately (with strong references) so they do not get garbage-collected.
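
    A minimal sketch of that warning, assuming the block-list design above: keep strong references to the first couple of blocks alongside the SoftReferences, so those blocks can never be collected. Class and field names here are illustrative:

    ```java
    import java.lang.ref.SoftReference;
    import java.util.ArrayList;
    import java.util.List;

    public class PinnedBlockList {
        private static final int PINNED_COUNT = 2;

        private final List<SoftReference<byte[]>> blocks = new ArrayList<>();
        // Strong references to the first blocks so the GC never discards them.
        private final List<byte[]> pinned = new ArrayList<>();

        public void add(byte[] blockData) {
            if (pinned.size() < PINNED_COUNT) {
                pinned.add(blockData);            // never collectable
            }
            blocks.add(new SoftReference<>(blockData));
        }

        /** Returns the block if still in memory, or null if it must be re-read. */
        public byte[] get(int idx) {
            return blocks.get(idx).get();
        }
    }
    ```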

     
  • Bruce Martin

    Bruce Martin - 2017-12-02
    • status: open --> pending
     
