#95 Out of memory issue with parsing?

v1.0 (example)
closed
None
5
2021-11-12
2021-09-14
No

Hello,

I'm trying to parse files between 200 KB and 120 MB in size. I parse the file like this:

    public <T> List<T> getParsedVehicles(File importFile, Charset charset, Class<T> typeParameterClass, char separator) {
        Path path = Paths.get(importFile.getPath());
        try (BufferedReader br = Files.newBufferedReader(path, charset)) {
            // parse() reads the whole file and returns every bean in a single list
            return new CsvToBeanBuilder<T>(br)
                .withType(typeParameterClass)
                .withIgnoreLeadingWhiteSpace(true)
                // A second withFieldAsNull() call replaces the first one, so BOTH is
                // used to cover EMPTY_QUOTES and EMPTY_SEPARATORS together
                .withFieldAsNull(CSVReaderNullFieldIndicator.BOTH)
                .withSeparator(separator)
                .build()
                .parse();
        } catch (RuntimeException | IOException exception) {
            String errorMessage = String.format("Unable to parse the file [%s]", importFile.getPath());
            LOGGER.warn(errorMessage, exception);
            throw new ParserException(errorMessage, exception);
        }
    }

I'm wondering whether this is the right way to use the library, since we discovered that our application, which parses files multiple times per day, restarts due to out-of-memory issues (seen in Datadog).

1 Attachments

Discussion

  • Andrew Rucker Jones

    There is nothing wrong with the way you're using opencsv. The graph you sent is also not inherently disturbing, since the operating system and some applications (like Java) will use all of the memory they can get in order to be speedy, but they function just fine if they have less, or if applications request more memory than is currently free; they just clean up and hand over the memory requested.

    An out-of-memory error is, of course, not the same as what I just described. You could start with the stack trace to determine where the application is running out of memory, because it will likely happen in exactly the place where the application continues to request memory. Beyond that, you would have to look into the JVM to determine where the memory is being held and not released.

    You might consider using the Iterator form of CsvToBean, since it parses only one line at a time and creates one object at a time. As a result, it's slower, but if you always release the object created after you use it, memory usage should stay low.

    Let us know if we can be of any other assistance.

     
  • Scott Conway

    Scott Conway - 2021-09-16

    This! In fact, we warn about it on our SourceForge page.

    Time vs. memory: The classic trade-off. If memory is not a problem, read using CsvToBean.parse() or CsvToBean.stream(), which will read all beans at once and are multi-threaded. If your memory is limited, use CsvToBean.iterator() and iterate over the input. Only one bean is read at a time, making multi-threading impossible and slowing down reading, but only one object is in memory at a time (assuming you process and release the object for the garbage collector immediately).
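
    The trade-off described above can be sketched in plain Java. This is only an illustration of the pattern, not opencsv code: the class StreamingCsvSketch, the Row bean, and the methods readAll and iterate are hypothetical names, and the naive split on the separator ignores quoting. The eager method keeps every bean reachable in a list, while the iterator creates one bean per call to next().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.regex.Pattern;

class StreamingCsvSketch {

    /** Hypothetical two-column bean; real beans would mirror the CSV columns. */
    static final class Row {
        final String id;
        final String name;

        Row(String id, String name) {
            this.id = id;
            this.name = name;
        }

        /** Naive split on the separator; no quoting or escaping is handled. */
        static Row fromLine(String line, char separator) {
            String[] fields = line.split(Pattern.quote(String.valueOf(separator)), -1);
            return new Row(fields[0], fields.length > 1 ? fields[1] : null);
        }
    }

    /** Lazy form: one line is read and one Row is created per call to next(). */
    static Iterator<Row> iterate(final BufferedReader reader, final char separator) {
        return new Iterator<Row>() {
            private String nextLine = readLine();

            private String readLine() {
                try {
                    return reader.readLine();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }

            @Override
            public boolean hasNext() {
                return nextLine != null;
            }

            @Override
            public Row next() {
                if (nextLine == null) {
                    throw new NoSuchElementException();
                }
                Row row = Row.fromLine(nextLine, separator);
                nextLine = readLine();
                return row;
            }
        };
    }

    /** Eager form: every Row stays reachable until the caller drops the list. */
    static List<Row> readAll(BufferedReader reader, char separator) {
        List<Row> rows = new ArrayList<>();
        iterate(reader, separator).forEachRemaining(rows::add);
        return rows;
    }

    public static void main(String[] args) {
        String csv = "1;alpha\n2;beta\n3;gamma\n";
        Iterator<Row> it = iterate(new BufferedReader(new StringReader(csv)), ';');
        int count = 0;
        while (it.hasNext()) {
            Row row = it.next(); // process the row here, then let it go out of scope
            count++;
        }
        System.out.println(count);
    }
}
```

    With the lazy form, memory stays low only if the loop body does not store the rows somewhere else, for example in a collection that outlives the loop.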

     
  • Dimitri SCOLE

    Dimitri SCOLE - 2021-09-16

    Very interesting! Thanks, you can close this :)

     
  • Andrew Rucker Jones

    • status: open --> closed
    • assigned_to: Andrew Rucker Jones
     
  • Dimitri SCOLE

    Dimitri SCOLE - 2021-11-05

    Well, even with a single 120 MB file the parsing takes 1 GB of RAM. Is that normal?

     
    • Andrew Rucker Jones

      There's no way for us to know that about your environment. It depends too much on your data and usage. You might have to fire up a profiler and figure out where all of that memory is going.

       
    • Scott Conway

      Scott Conway - 2021-11-05

      Depending on your garbage collection strategies I can totally see it, and looking at the pics you sent I hope the String and the char[] are one and the same (internally a String has a char array inside of it).

      Think of it this way: you have a file that's 120 MB in size. We read in one line at a time as a string and parse it into an array of strings until we have finished a complete record, then we create an object out of that array of strings and start over. So a lot of strings are created to get you that one object, and they are all waiting to be garbage collected.

      If you are using the iterator, you can try making your eden and survivor spaces larger, so that hopefully a complete record is parsed and its object is created before all the strings are promoted to tenured memory, which is only reclaimed when a full GC is performed.
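
      One way to see how the heap is divided before tuning anything is to ask the JVM for its memory pools. The sketch below uses only the standard java.lang.management API; the class name HeapPoolReport and the method describeHeapPools are made up for illustration. Pool names depend on the collector in use (for example "G1 Eden Space" or "PS Eden Space"), so the output differs between JVMs.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

class HeapPoolReport {

    /** Returns one line per heap pool: name, used bytes, max bytes (-1 if undefined). */
    static List<String> describeHeapPools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip metaspace, code cache and other non-heap pools
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            lines.add(String.format("%-25s used=%d max=%d",
                    pool.getName(), usage.getUsed(), usage.getMax()));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Call this before and after a parsing run to compare pool usage.
        describeHeapPools().forEach(System.out::println);
    }
}
```

      This only reports how much each pool holds at one moment; it does not say which objects are in it, which still takes a profiler or a heap dump.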

       
      • Dimitri SCOLE

        Dimitri SCOLE - 2021-11-05

        Yeah, I'm trying this with the iterator, but I see exactly the same RAM usage; it goes over 1 GB :(

        BUT if I use this GC it works very well: -XX:+UseParNewGC

        try (BufferedReader br = Files.newBufferedReader(path, charset)) {
            CsvToBean<T> csvToBean = new CsvToBeanBuilder<T>(br)
                .withType(typeParameterClass)
                .withIgnoreLeadingWhiteSpace(true)
                // A second withFieldAsNull() call replaces the first one, so BOTH is
                // used to cover EMPTY_QUOTES and EMPTY_SEPARATORS together
                .withFieldAsNull(CSVReaderNullFieldIndicator.BOTH)
                .withSeparator(separator)
                .withFilter(this::skipEmptyLines)
                .build();

            // iterator() creates one bean per call to next()
            Iterator<T> it = csvToBean.iterator();
            while (it.hasNext()) {
                T bean = it.next(); // discarded immediately for this memory test
            }

            return null;
        }
        
         

        Last edit: Dimitri SCOLE 2021-11-05
  • Dimitri SCOLE

    Dimitri SCOLE - 2021-11-05

    Yup, this is exactly what I'm currently using with IntelliJ

    https://ibb.co/gZ7pnxq
    https://ibb.co/LvRVWVJ

    Is it possible that the objects we create from the file are too big?

     

    Last edit: Dimitri SCOLE 2021-11-05
    • Andrew Rucker Jones

      Right, but that only says something is using memory. It doesn't say what objects are consuming the memory.

      I'll also say up front right now that I'm not willing to interpret profiler results. If you have a concrete bug or inefficiency to report, great.

       
  • Dimitri SCOLE

    Dimitri SCOLE - 2021-11-05

    Well, I don't want you to interpret the profiles :)

    I'm just worried that the library probably has a limit when parsing files of around 50 MB using the CsvToBeanBuilder.

    My question is: have you already seen this with other users (this heavy memory use with opencsv, on files of 50 MB or 100 MB)?

    Maybe I will be able to contact somebody else and hear about their experience with your library.

     
  • Dimitri SCOLE

    Dimitri SCOLE - 2021-11-12
     

    Last edit: Dimitri SCOLE 2021-11-12
