Out-of-memory issue with parsing?
Hello,
I'm trying to parse files between 200 KB and 120 MB in size. I parse each file like this:
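import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.util.List;

import com.opencsv.bean.CsvToBeanBuilder;
import com.opencsv.enums.CSVReaderNullFieldIndicator;

public <T> List<T> getParsedVehicles(File importFile, Charset charset, Class<T> typeParameterClass, char separator) {
    try (BufferedReader br = Files.newBufferedReader(importFile.toPath(), charset)) {
        return new CsvToBeanBuilder<T>(br)
                .withType(typeParameterClass)
                .withIgnoreLeadingWhiteSpace(true)
                // The original called withFieldAsNull twice; the second call replaces the first,
                // so BOTH is used here on the assumption that both empty forms should become null.
                .withFieldAsNull(CSVReaderNullFieldIndicator.BOTH)
                .withSeparator(separator)
                .build()
                .parse(); // parse() loads every bean into one in-memory list
    } catch (RuntimeException | IOException exception) {
        String errorMessage = String.format("Unable to parse the file [%s]", importFile.getPath());
        LOGGER.warn(errorMessage, exception);
        throw new ParserException(errorMessage, exception);
    }
}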
I'm wondering whether this is the right way to use the library: we discovered that our application, which parses some files multiple times per day, restarts due to out-of-memory issues. (Datadog)
There is nothing wrong with the way you're using opencsv. The graph you sent is also not inherently disturbing, since operating systems and some applications (like Java) will use all of the memory they can get in order to be speedy, but they function just fine if they have less, or if applications request more memory than is currently free; they just clean up and hand over the memory requested.
An out-of-memory error is, of course, not the same as what I just described. You could start with the stack trace to determine where the application is running out of memory, because it will likely happen in exactly the place where the application continues to request memory. Beyond that, you would have to look into the JVM to determine where the memory is being held and not released.
You might consider using the Iterator form of CsvToBean, since it parses only one line at a time and creates one object at a time. As a result, it's slower, but if you always release the object created after you use it, memory usage should stay low.
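As a rough illustration of the one-record-at-a-time pattern described above, here is a minimal sketch. The class name, `processOneAtATime`, and `lazyRows` are hypothetical names made up for this example, and `lazyRows` is only a naive stand-in for opencsv so the sketch runs without the library. With opencsv, the `CsvToBean<T>` returned by `build()` is itself an `Iterable<T>`, so it could be passed to `processOneAtATime` instead of calling `parse()`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Consumer;
import java.util.regex.Pattern;

class StreamingParseSketch {

    /**
     * Hands each record to the handler and keeps no reference afterwards,
     * so a record is eligible for garbage collection once the handler returns.
     * With opencsv this could be called as (assuming the beans are of type T):
     *   processOneAtATime(new CsvToBeanBuilder<T>(br).withType(cls).build(), handler);
     */
    public static <T> long processOneAtATime(Iterable<T> records, Consumer<? super T> handler) {
        long count = 0;
        for (T record : records) {
            handler.accept(record);
            count++;
        }
        return count;
    }

    /** Naive stand-in for opencsv (assumption): lazily splits each line, no quoting support. */
    public static Iterable<String[]> lazyRows(BufferedReader reader, char separator) {
        String splitRegex = Pattern.quote(String.valueOf(separator));
        return () -> new Iterator<String[]>() {
            private String pending = readLine();

            private String readLine() {
                try {
                    return reader.readLine();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }

            @Override
            public boolean hasNext() {
                return pending != null;
            }

            @Override
            public String[] next() {
                if (pending == null) {
                    throw new NoSuchElementException();
                }
                String[] fields = pending.split(splitRegex, -1);
                pending = readLine(); // only one line is held at any time
                return fields;
            }
        };
    }

    public static void main(String[] args) {
        BufferedReader reader = new BufferedReader(new StringReader("a;1\nb;2\nc;3\n"));
        long[] sum = {0};
        long rows = processOneAtATime(lazyRows(reader, ';'), f -> sum[0] += Long.parseLong(f[1]));
        System.out.println(rows + " rows, sum = " + sum[0]); // prints "3 rows, sum = 6"
    }
}
```

The point of the design is that the caller never builds a `List` of all beans: if the handler writes each record to a database or aggregates it into a small summary, peak memory depends on the size of one record rather than on the size of the file.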
Let us know if we can be of any other assistance.
This! In fact, we warn about it on our SourceForge page.
Very interesting! Thanks, you can close :)
Well, even with a single 120 MB file, the parsing takes 1 GB of RAM. Is that normal?
There's no way for us to know that about your environment. It depends too much on your data and usage. You might have to fire up a profiler and figure out where all of that memory is going.
Depending on your garbage collection strategies I can totally see it, and looking at the pics you sent I hope the String and the char[] are one and the same (internally a String has a char array inside of it).
Think of it this way: you have a file that's 120 MB in size. We read in one line at a time as a string and parse it into an array of strings until we have finished a complete record, then we create an object out of that array of strings and start over. So there are a lot of strings created to get you that one object, all waiting to be garbage collected.
If you are using the iterator, you can try tuning your eden and survivor spaces to be larger, so that hopefully a complete record is parsed and its object created before all the strings are promoted to tenured memory, which is only cleaned up when a full GC is performed.
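Before and after tuning, it can help to check how large the heap pools actually are. The sketch below uses only the standard `java.lang.management` API; the class name `HeapPoolReport` is made up for this example. On HotSpot, the young generation is commonly sized with flags such as `-Xmn` and `-XX:SurvivorRatio`, and the pool names printed here depend on which collector is active, so treat the output as informational only.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

class HeapPoolReport {

    /** Returns pool name -> maximum size in bytes (-1 if undefined) for every heap pool. */
    public static Map<String, Long> heapPoolMaxBytes() {
        Map<String, Long> pools = new LinkedHashMap<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip metaspace, code cache, and other non-heap pools
            }
            MemoryUsage usage = pool.getUsage(); // may be null if the pool is no longer valid
            pools.put(pool.getName(), usage == null ? -1L : usage.getMax());
        }
        return pools;
    }

    public static void main(String[] args) {
        heapPoolMaxBytes().forEach((name, max) ->
                System.out.printf("%-25s max = %s%n", name,
                        max < 0 ? "undefined" : (max / (1024 * 1024)) + " MiB"));
    }
}
```

Running this once with the default settings and once with the tuned flags shows whether the eden and survivor pools really changed size.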
Yeah, I'm trying this with the iterator, but I see exactly the same RAM usage: it goes over 1 GB :(
BUT if I use this GC, it works very well: -XX:+UseParNewGC
Last edit: Dimitri SCOLE 2021-11-05
Yup, this is exactly what I'm currently using with IntelliJ
https://ibb.co/gZ7pnxq
https://ibb.co/LvRVWVJ
Is it possible that the objects we create from the file are too big?
Last edit: Dimitri SCOLE 2021-11-05
Right, but that only says something is using memory. It doesn't say what objects are consuming the memory.
I'll also say up front right now that I'm not willing to interpret profiler results. If you have a concrete bug or inefficiency to report, great.
Well, I don't want you to interpret the profiles :)
I'm just worried that there may be a limit in the library when parsing files of around 50 MB with CsvToBeanBuilder.
My question is: have you already seen this with other users (this heavy memory use with opencsv, for files of 50 MB or 100 MB)?
Maybe I will be able to contact somebody else and hear about their experience with your library.
Last edit: Dimitri SCOLE 2021-11-12