
#10 Reading large files

Milestone: v1.0_(example)
Status: pending
Labels: None
Priority: 1
Updated: 2017-12-02
Created: 2017-07-22
Private: No

I am using the JRecord library for reading large files and displaying them in a UI. There are no issues with files from 200 to 800 MB. However, when the file is larger, i.e. 1 GB or more, I get a memory overflow in the read method. How can I approach this? One way is to read a few lines and fetch the next set whenever required. But how can I handle reading backwards?

AbstractLineReader reader = iobuilder.newReader(datafile);
AbstractLine line;
while ((line = reader.read()) != null) {
    lines.add(line);
}

Discussion

  • Bruce Martin

    Bruce Martin - 2017-07-23

    I really need to know more about the file to give a definitive answer,
    i.e.:

    • Is it a Cobol file (I presume yes)?
    • Is it a standard text file or a Cobol binary file?
    • Is it one file or a generic file?
    • Is the program to run on a server, a powerful PC, or a netbook?
    • What is the fileStructure, i.e. do System.out.println(line.getLayout().getFileStructure());
    • How long are the records?
    • How big are the files going to be?

    There are solutions that will work well in some cases


    Some background

    There is an overhead of 40-odd bytes per line. If lines are small this can
    be very significant and needs to be looked at (e.g. if the Record-Length=20, 66% of your memory is
    wasted; while if the Record-Length=1000, only 4% is wasted).

    Storing multiple records in an array of bytes and generating Lines only when needed will save some space when the
    record length is small.
    E.g. if the records are of similar length you could store them as:

    Array of bytes:  <- Max Record Length -><- Max Record Length -><- Max Record Length -><- Max Record Length -><- Max Record Length ->
                     |      Record 1       ||      Record 2       ||      Record 3       ||      Record 4       ||      Record 5       |
    

    If the file is a mainframe fixed-width file, you can calculate the byte address of each record as

    Record_Position = (Record_Number - 1) * Record_Length

    and then use random-access files to read in records (I suggest reading in several thousand records at a time).
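
    The positional read described above can be sketched with a plain java.io.RandomAccessFile. This is an illustrative helper, not part of JRecord; the record length of 300 is just an example value, and the trimming at end of file is one possible way to handle a short final read.

    ```java
    import java.io.IOException;
    import java.io.RandomAccessFile;

    public class FixedWidthReader {
        // Illustrative value; substitute your file's actual record length.
        static final int RECORD_LENGTH = 300;

        /**
         * Reads 'count' records starting at the 1-based 'recordNumber',
         * using Record_Position = (Record_Number - 1) * Record_Length.
         */
        static byte[] readRecords(RandomAccessFile file, long recordNumber, int count)
                throws IOException {
            long position = (recordNumber - 1) * RECORD_LENGTH;
            byte[] buffer = new byte[count * RECORD_LENGTH];
            file.seek(position);
            int read = file.read(buffer);   // may be < buffer.length near end of file
            if (read >= 0 && read < buffer.length) {
                byte[] trimmed = new byte[read];
                System.arraycopy(buffer, 0, trimmed, 0, read);
                return trimmed;
            }
            return buffer;
        }
    }
    ```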


    Possible Solutions

    RecordEditor

    Use the FileView class of the RecordEditor to store the data. It already optimises the data storage.

    Advantages:

    • Already exists and is tested.
    • Has good optimisation for Cobol-style fixed-width records.
    • FileView implements the TableModel interface.

    Disadvantages:

    • Has a GPL license so can only be used internally (and not sold as part of a package).
    • Might not work so well on a server (not sure really).
    • RecordEditor does not use the current JRecord (it split off 7/8 years ago and the code has diverged).
      It does not have IOBuilders, for example. I can help with this if need be; there would only be a couple of calls.

    Fixed-Width file

    Read records in as needed using a random-access file. I suggest reading records in the hundreds / thousands at a time.

    Low Level JRecord Routines

    JRecord's low-level ByteReaders do keep track of the bytes read, precisely for this sort of purpose.
    ByteReaders have a method:

    public long getBytesRead()
    

    You could represent the file like:

    List<FileBlockDetails> fileData = ...;

    class FileBlockDetails {
       long blocksPositionInFile;
       int blockLength, numberOfRecordsInBlock;
       WeakReference<byte[]> blockData;
    }


    Either provide more details on the data / application, or tell me which way you want to go
    and I will investigate and give more detailed info.
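
    As a rough sketch of how such a block index could be built: the loop below groups records into 256-record blocks and tracks a running byte count the way getBytesRead() would report it after each read(). BlockIndexBuilder and its record-length-array input are illustrative stand-ins, not JRecord API.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    public class BlockIndexBuilder {
        static final int RECORDS_PER_BLOCK = 256;

        public static class BlockEntry {
            public final long positionInFile;   // where the block starts in the file
            public int blockLength = 0;         // total bytes in the block
            public int recordsInBlock = 0;
            BlockEntry(long pos) { this.positionInFile = pos; }
        }

        /**
         * Builds a block index from a sequence of record lengths,
         * mimicking what getBytesRead() would return after each read().
         */
        public static List<BlockEntry> buildIndex(int[] recordLengths) {
            List<BlockEntry> index = new ArrayList<>();
            long bytesRead = 0;
            BlockEntry current = null;
            for (int len : recordLengths) {
                if (current == null || current.recordsInBlock == RECORDS_PER_BLOCK) {
                    current = new BlockEntry(bytesRead);  // block starts where the last read ended
                    index.add(current);
                }
                bytesRead += len;                         // running total, like getBytesRead()
                current.blockLength += len;
                current.recordsInBlock++;
            }
            return index;
        }
    }
    ```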

     
    • Immanuel Stephen

      Please find the answers to your questions:
      • Is it a Cobol file (I presume yes)? Yes, it's a COBOL file transferred from the mainframe to my desktop.
      • Is it a standard text file or a Cobol binary file? Currently trying with a standard text file. In future I could use a binary file too.
      • Is it one file or a generic file? It is one file.
      • Is the program to run on a server, powerful PC, netbook? This application will run on the regular laptop I use, with 8 GB RAM.
      • What is the fileStructure? The file structure is 9.
      • How long are the records? The file I use has a record length of 300 bytes.
      • How big are the files going to be? As large as 1 or 1.5 GB.
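
      A quick back-of-the-envelope calculation on those numbers shows why reading the whole file as Line objects overflows: with the 40-odd bytes of per-line overhead mentioned earlier, a 1.5 GB file of 300-byte records needs roughly 1.7 GB of heap for the record data alone, before any other structures (and the JVM heap is typically much smaller than the machine's 8 GB of RAM):

      ```java
      public class MemoryEstimate {
          public static void main(String[] args) {
              long fileSize = 1_500_000_000L;  // ~1.5 GB, per the answers above
              int recordLength = 300;          // bytes per record
              int lineOverhead = 40;           // approximate per-Line overhead

              long records = fileSize / recordLength;
              long asLines = records * (recordLength + lineOverhead);
              System.out.println(records);     // 5000000 records
              System.out.println(asLines);     // 1700000000 bytes, ~1.7 GB of heap
          }
      }
      ```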

       
  • Bruce Martin

    Bruce Martin - 2017-07-23
    • assigned_to: Bruce Martin
     
  • Bruce Martin

    Bruce Martin - 2017-07-25

    Have you looked at the RecordEditor? It should be able to display the file.

    You can read the file as byte array records like:

            String copybookName = "???";
            int fileType = Constants.IO_BIN_TEXT;
            ICobolIOBuilder iob = JRecordInterface1.COBOL
                    .newIOBuilder(copybookName)
                        .setFileOrganization(fileType)
                        .setSplitCopybook(CopybookLoader.SPLIT_NONE)
                    ;  
            LayoutDetail schema = iob.getLayout();
    
            AbstractByteReader byteReader = ByteIOProvider.getInstance().getByteReader(schema);
            byte[] record = null;
    
            byteReader.open("fileName");
            while ((record = byteReader.read()) != null) {
                // Store the record
                System.out.println("Bytes Read So far: " + byteReader.getBytesRead());
            }
    

    You can get the bytes read as you go.
    I would not read the whole file in one go, but wait until the data is needed.

    You can create a line from an array of bytes using the iobuilder

        AbstractLine newLine = iob.newLine(record);
          // where record is an array of bytes;
    

    You can represent the file as an ArrayList of FileBlocks,

    where a FileBlock is:

    public class FileBlock {
       final static int recordLength = 300; // or whatever it is
       final static int recordsInBlock = 256;
       long blocksPositionInFile;
       int numberOfRecordsInBlock = 0;
       private WeakReference<byte[]> blockData
               = new WeakReference<byte[]>(new byte[recordLength * recordsInBlock]);

       public byte[] get(int idx) {
           byte[] data = getData();
           byte[] ret = new byte[recordLength];
           System.arraycopy(data, idx * recordLength, ret, 0, recordLength);
           return ret;
       }

       private byte[] getData() {
           byte[] data = blockData.get();
           if (data == null) {
               // doRead does a random read to re-fetch the block's bytes from the file
               data = doRead(blocksPositionInFile, numberOfRecordsInBlock * recordLength);
               blockData = new WeakReference<byte[]>(data);
           }
           return data;
       }
    }
    

    By using WeakReference, Java can reclaim the block if needed and you can re-read the block
    when required. An alternative might be to increase the block size to 300,000+ bytes and store it in a compressed format (gzip or snappy).
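
    The compressed-block alternative can be sketched with the JDK's built-in deflate support (snappy would need an external library). The helper names below are illustrative; fixed-width Cobol data is typically repetitive, so it usually compresses well:

    ```java
    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.util.zip.Deflater;
    import java.util.zip.DeflaterOutputStream;
    import java.util.zip.InflaterInputStream;

    public class CompressedBlock {
        /** Compresses a block of record bytes using deflate. */
        static byte[] compress(byte[] block) throws IOException {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (DeflaterOutputStream d
                     = new DeflaterOutputStream(out, new Deflater(Deflater.BEST_SPEED))) {
                d.write(block);
            }
            return out.toByteArray();
        }

        /** Re-inflates a block; the original length must be stored alongside it. */
        static byte[] decompress(byte[] compressed, int originalLength) throws IOException {
            byte[] result = new byte[originalLength];
            try (InflaterInputStream in
                     = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
                int off = 0, n;
                while (off < originalLength
                        && (n = in.read(result, off, originalLength - off)) > 0) {
                    off += n;
                }
            }
            return result;
        }
    }
    ```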


     
    • Immanuel Stephen

      I am yet to take a look at the RecordEditor, but my intention is to develop a custom UI based on Eclipse RCP. Also, in the FileBlock approach suggested, if I create an ArrayList of FileBlocks, won't that cause a memory-overflow error?

       

      Last edit: Immanuel Stephen 2017-07-27
  • Bruce Martin

    Bruce Martin - 2017-07-28

    "If I create an ArrayList of FileBlocks, won't that cause a memory-overflow error?" No, because of the WeakReference.

    The WeakReference is a special object; the data it points to remains available for garbage collection. Alternatively, SoftReference is a less aggressive form of WeakReference.

    One word of warning: hold onto the first 2 byte-blocks separately (with strong references) so they do not get garbage-collected.
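
    A minimal sketch of that warning, assuming the block-list design above: keep strong references to the first couple of blocks alongside the SoftReferences, so those blocks can never be collected. Class and field names here are illustrative:

    ```java
    import java.lang.ref.SoftReference;
    import java.util.ArrayList;
    import java.util.List;

    public class PinnedBlockList {
        private static final int PINNED_COUNT = 2;

        private final List<SoftReference<byte[]>> blocks = new ArrayList<>();
        // Strong references to the first blocks so the GC never discards them.
        private final List<byte[]> pinned = new ArrayList<>();

        public void add(byte[] blockData) {
            if (pinned.size() < PINNED_COUNT) {
                pinned.add(blockData);            // never collectable
            }
            blocks.add(new SoftReference<>(blockData));
        }

        /** Returns the block if still in memory, or null if it must be re-read. */
        public byte[] get(int idx) {
            return blocks.get(idx).get();
        }
    }
    ```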

     
  • Bruce Martin

    Bruce Martin - 2017-12-02
    • status: open --> pending
     
