opencsv / Support Requests / #95 Out of memory issue with Parsing ?

Andrew Rucker Jones - 2021-09-14

There is nothing wrong with the way you're using opencsv. The graph you sent is also not inherently disturbing: the operating system and some applications (like Java) will use all of the memory they can get in order to be speedy, but they function just fine with less. If an application requests more memory than is currently free, they just clean up and hand over the memory requested.

An out-of-memory error is, of course, not the same as what I just described. You could start with the stack trace to determine where the application is running out of memory, because it will likely happen in exactly the place where the application continues to request memory. Beyond that, you would have to look into the JVM to determine where the memory is being held and not released.

You might consider using the Iterator form of CsvToBean, since it parses only one line at a time and creates one object at a time. As a result, it's slower, but if you always release the object created after you use it, memory usage should stay low.

Let us know if we can be of any other assistance.

Scott Conway - 2021-09-16

This! In fact, we warn about it on our SourceForge page.

Time vs. memory: The classic trade-off. If memory is not a problem, read using CsvToBean.parse() or CsvToBean.stream(), which will read all beans at once and are multi-threaded. If your memory is limited, use CsvToBean.iterator() and iterate over the input. Only one bean is read at a time, making multi-threading impossible and slowing down reading, but only one object is in memory at a time (assuming you process and release the object for the garbage collector immediately).
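
The trade-off described above can be sketched without opencsv at all. The following is a minimal, JDK-only illustration, not the library's implementation: `Row`, `toRow`, `readAll`, `iterate` and `sumLazily` are hypothetical names, and the naive `split(",")` stands in for real CSV parsing. It only shows why the eager form keeps every object reachable while the lazy form holds one at a time.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class EagerVsLazy {

    /** Hypothetical record type standing in for a mapped bean. */
    static final class Row {
        final String name;
        final int value;

        Row(String name, int value) {
            this.name = name;
            this.value = value;
        }
    }

    /** Naive "name,value" parsing; an assumption for this sketch, not real CSV handling. */
    static Row toRow(String line) {
        String[] parts = line.split(",", -1);
        return new Row(parts[0], Integer.parseInt(parts[1].trim()));
    }

    /** Reads the next non-empty line, or null at end of input. */
    static String nextNonEmptyLine(BufferedReader reader) {
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.isEmpty()) {
                    return line;
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Eager form: every Row stays reachable for as long as the list is. */
    static List<Row> readAll(BufferedReader reader) {
        List<Row> rows = new ArrayList<>();
        String line;
        while ((line = nextNonEmptyLine(reader)) != null) {
            rows.add(toRow(line));
        }
        return rows;
    }

    /** Lazy form: one line is read and one Row is created per call to next(). */
    static Iterator<Row> iterate(BufferedReader reader) {
        return new Iterator<Row>() {
            private String pending = nextNonEmptyLine(reader);

            @Override
            public boolean hasNext() {
                return pending != null;
            }

            @Override
            public Row next() {
                if (pending == null) {
                    throw new NoSuchElementException();
                }
                Row row = toRow(pending);
                pending = nextNonEmptyLine(reader);
                return row;
            }
        };
    }

    /** Consumes rows one by one; each Row becomes garbage right after it is used. */
    static long sumLazily(BufferedReader reader) {
        long sum = 0;
        Iterator<Row> it = iterate(reader);
        while (it.hasNext()) {
            sum += it.next().value;
        }
        return sum;
    }

    public static void main(String[] args) {
        String csv = "a,1\nb,2\n\nc,3\n";
        System.out.println(readAll(new BufferedReader(new StringReader(csv))).size()); // prints 3
        System.out.println(sumLazily(new BufferedReader(new StringReader(csv))));      // prints 6
    }
}
```

With the lazy form, memory stays low only if the caller does not collect the rows into its own list or map; doing so brings back the memory profile of the eager form.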

Dimitri SCOLE - 2021-09-16

Very interesting! Thanks, you can close :)

Andrew Rucker Jones - 2021-09-16

status: open --> closed

assigned_to: Andrew Rucker Jones

Dimitri SCOLE - 2021-11-05

Well, even with a single 120 MB file, the parsing takes 1 GB of RAM. Is that normal?

- Andrew Rucker Jones - 2021-11-05
  
  There's no way for us to know that about your environment. It depends too much on your data and usage. You might have to fire up a profiler and figure out where all of that memory is going.
  
- Scott Conway - 2021-11-05
  
  Depending on your garbage collection strategy, I can totally see it. Looking at the pics you sent, I hope the String and the char[] are one and the same (internally, a String has a char array inside of it).
  
  Think of it this way: you have a file that's 120 MB in size. We read it in one line at a time as a String and parse it into an array of Strings until we have finished a complete record. Then we create an object out of that array of Strings and start over. So a lot of Strings are created to get you that one object, and they are all waiting to be garbage collected.
  
  If you are using the iterator, you can try making your eden and survivor spaces larger, so that hopefully a complete record is parsed and its object is created before all the Strings are promoted to tenured memory, which is only cleaned up when a full GC is performed.
  
  - Dimitri SCOLE - 2021-11-05
    
    Yeah, I'm trying this with the iterator, but I see exactly the same RAM usage: it goes over 1 GB :(
    
    BUT if I use this GC, it works very well: -XX:+UseParNewGC
    
        try (BufferedReader br = Files.newBufferedReader(path, charset)) {
            CsvToBean csvToBean = new CsvToBeanBuilder(br)
                    .withType(typeParameterClass)
                    .withIgnoreLeadingWhiteSpace(true)
                    .withFieldAsNull(CSVReaderNullFieldIndicator.EMPTY_QUOTES)
                    .withFieldAsNull(CSVReaderNullFieldIndicator.EMPTY_SEPARATORS)
                    .withSeparator(separator)
                    .withFilter(this::skipEmptyLines)
                    .build();
            Iterator it = csvToBean.iterator();
            while (it.hasNext()) {
                Object type = it.next();
            }
            return null;
        }
    
    Last edit: Dimitri SCOLE 2021-11-05
    
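
Before tuning eden and survivor sizes as suggested in the replies above, it can help to see how the running JVM has actually laid out its heap and which collector is active. The sketch below uses only the standard `java.lang.management` API; the class name `HeapPoolReport` and its methods are hypothetical. Pool names differ by collector (for example "G1 Eden Space" versus "PS Eden Space"), so the output is meant for reading, not for parsing. On HotSpot, the young generation is usually sized with options such as `-Xmn` or `-XX:SurvivorRatio`; whether changing them helps in this case is an assumption that would need to be measured.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

class HeapPoolReport {

    /** One line per heap pool (for example eden, survivor, old generation). */
    static List<String> heapPoolLines() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip metaspace, code cache and other non-heap pools
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            long max = usage.getMax(); // -1 means the maximum is undefined
            lines.add(String.format("%s: used=%d KiB, committed=%d KiB, max=%s",
                    pool.getName(),
                    usage.getUsed() / 1024,
                    usage.getCommitted() / 1024,
                    max < 0 ? "undefined" : (max / 1024) + " KiB"));
        }
        return lines;
    }

    /** One line per collector, with how often it ran and for how long in total. */
    static List<String> collectorLines() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(String.format("%s: collections=%d, time=%d ms",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return lines;
    }

    public static void main(String[] args) {
        heapPoolLines().forEach(System.out::println);
        collectorLines().forEach(System.out::println);
    }
}
```

Calling these two methods before and after the parsing loop shows whether memory piles up in the old generation and how many collections were triggered along the way.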

Dimitri SCOLE - 2021-11-05

Yup, this is exactly what I'm currently using with IntelliJ

https://ibb.co/gZ7pnxq
https://ibb.co/LvRVWVJ

Is it possible that the objects we create from the file are too big?

Last edit: Dimitri SCOLE 2021-11-05

- Andrew Rucker Jones - 2021-11-05
  
  Right, but that only says something is using memory. It doesn't say what objects are consuming the memory.
  
  I'll also say up front right now that I'm not willing to interpret profiler results. If you have a concrete bug or inefficiency to report, great.
  
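
A memory graph only shows that something is using memory. To see which objects are holding it, one option is a heap dump that can be opened in a heap analysis tool. The sketch below assumes a HotSpot-based JVM, because `HotSpotDiagnosticMXBean` lives in `com.sun.management` and is not part of the portable Java API. The class name `HeapDumper`, the method `dumpLiveObjects` and the file name are hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

class HeapDumper {

    /**
     * Writes a heap dump of the current JVM into a fresh temporary directory
     * and returns its path. The file name should end in ".hprof".
     */
    static Path dumpLiveObjects(String fileName) {
        try {
            Path dir = Files.createTempDirectory("heapdump");
            Path target = dir.resolve(fileName); // must not exist yet
            HotSpotDiagnosticMXBean diagnostics =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            // "true" restricts the dump to objects that are still reachable
            diagnostics.dumpHeap(target.toString(), true);
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dump = dumpLiveObjects("after-parsing.hprof");
        System.out.println("Heap dump written to " + dump);
    }
}
```

Taking the dump while the parsing loop is running, rather than after it has finished, makes it more likely that the dump shows what the application is actually retaining at its peak.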

Dimitri SCOLE - 2021-11-05

Well, I don't want you to interpret the profiles :)

I'm just worried that there is probably a limit in the library when parsing files of around 50 MB using CsvToBeanBuilder.

My question is: have you already seen this with other users (this high memory usage with opencsv, for files of 50 MB or 100 MB)?

Maybe I will be able to contact somebody else and hear about their experience with your library.

Dimitri SCOLE - 2021-11-12

Last edit: Dimitri SCOLE 2021-11-12
