
#245 CSVReader readAll method exposes an attack risk

v1.0 (example)
closed-out-of-date
None
5
2023-12-10
2023-08-23
jack
No

The readAll method of OpenCSV's CSVReader class was used to parse an attack file (the file contains only one line but many columns). As a result, the service's memory usage grew by dozens of times and the service ran out of memory. OpenCSV imposes no limits and reports no error. Our OpenCSV version is 5.6.

Discussion

  • Scott Conway

    Scott Conway - 2023-08-23

    Hello JC, I am a little confused by your wording, as one line with many columns is the quintessential definition of a CSV file. Do you mean multiple rows?

    If so, this is not a bug but a known issue. readAll literally does just that: it reads the entirety of the file/data into memory. If you have too many rows you will run out of memory, simple as that. If you are dealing with large files we recommend you use the Iterator or read the records yourself one at a time.
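
A minimal sketch of the record-at-a-time pattern recommended above, using only `java.io` so it runs without OpenCSV. With OpenCSV itself the equivalent would be calling `readNext()` in a loop or iterating the `CSVReader` (it implements `Iterable<String[]>`). The class `StreamingRows` and method `forEachRow` are hypothetical, and the naive `split(",")` assumes no quoted fields and no embedded newlines. This bounds memory to one record at a time; it does not help when a single record is itself very large.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

/**
 * Hypothetical sketch: hand each row to a callback instead of collecting
 * every row in a List (the readAll pattern). Assumes no quoted fields and
 * no embedded newlines, so a plain split on ',' is enough here.
 */
final class StreamingRows {
    /** Reads one physical line at a time; only the current row is held in memory. */
    static long forEachRow(Reader source, Consumer<String[]> handler) {
        long rows = 0;
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                handler.accept(line.split(",", -1)); // -1 keeps trailing empty fields
                rows++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        long[] total = {0};
        long n = forEachRow(new StringReader("a,1\nb,2\nc,3\n"),
                row -> total[0] += Long.parseLong(row[1]));
        System.out.println(n + " rows, sum " + total[0]); // 3 rows, sum 6
    }
}
```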

     
  • jack

    jack - 2023-08-24

    My friend, we used a DoS attack file containing a single line of data (49 MB). When it is parsed with readNext, memory usage grows by more than 1 GB. Is it reasonable for memory to grow by dozens of times? In addition, OpenCSV does not expose any way to validate and reject input in advance (for example, by checking the size of a row).
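
OpenCSV's `CSVReaderBuilder` accepts any `java.io.Reader`, so one possible workaround for the missing size check is to wrap the input in a reader that fails fast on oversized lines. The `LineLengthLimitingReader` class below is hypothetical, not an OpenCSV feature. Its limit applies per physical line, so a quoted field spanning several lines is checked line by line rather than as a whole record.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical guard: throws as soon as any physical line exceeds maxLineChars. */
final class LineLengthLimitingReader extends FilterReader {
    private final int maxLineChars;
    private int currentLineChars = 0;

    LineLengthLimitingReader(Reader in, int maxLineChars) {
        super(in);
        this.maxLineChars = maxLineChars;
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        if (c != -1) track((char) c);
        return c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        for (int i = off; i < off + n; i++) track(buf[i]); // loop is skipped when n == -1
        return n;
    }

    /** A line terminator resets the counter; any other char counts toward the limit. */
    private void track(char c) throws IOException {
        if (c == '\n' || c == '\r') {
            currentLineChars = 0;
        } else if (++currentLineChars > maxLineChars) {
            throw new IOException("line longer than " + maxLineChars + " characters");
        }
    }

    /** Test helper: reads everything through the given reader. */
    static String drain(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[8];
        int n;
        while ((n = r.read(buf, 0, buf.length)) != -1) sb.append(buf, 0, n);
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(drain(new LineLengthLimitingReader(new StringReader("a,b\nc,d\n"), 5)).length()); // 8
        try {
            drain(new LineLengthLimitingReader(new StringReader("x".repeat(100)), 10));
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage()); // rejected: line longer than 10 characters
        }
    }
}
```

Under these assumptions it would be used as `new CSVReaderBuilder(new LineLengthLimitingReader(fileReader, limit)).build()`, where `fileReader` and `limit` are placeholders; `readNext()` declares `IOException`, so the guard's exception should surface there instead of buffers growing without bound.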

     
  • Scott Conway

    Scott Conway - 2023-08-25

    Ahhh, that is different. Your title stated you were using readAll, not readNext. But a single line of data, a single record: I understand a little better now.

    First off, upgrade to 5.8 and let me know how that works. Also, out of curiosity, try both the CSVParser and the RFC4180Parser.

    In 5.8 I put in some memory optimizations and then removed MOST of them. While they did cut down the number of memory allocations, they also made parsing take twice as long. The strings I had been creating before lived in the very short-lived eden space and so did not cause much in the way of garbage collection, whereas with fewer objects the resized StringBuffers ended up in the slower garbage cleanup (though not a full GC).
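
To illustrate the resizing cost described above, this sketch appends characters one at a time to a plain JDK `StringBuilder` and records each time its backing array is replaced. It does not use OpenCSV's internal buffers, and the exact growth policy is a JDK implementation detail, so it shows only the general effect: the total allocated across all resizes exceeds the final size, and every discarded array becomes garbage. `GrowthDemo` is a hypothetical name.

```java
/** Sketch: count StringBuilder regrowths while appending one char at a time. */
final class GrowthDemo {
    /** Returns {resizes, totalCharsAllocated} for appending n chars to a default StringBuilder. */
    static long[] measure(int n) {
        StringBuilder sb = new StringBuilder(); // default initial capacity (16 chars)
        int cap = sb.capacity();
        long resizes = 0;
        long allocated = cap;                   // the initial backing array
        for (int i = 0; i < n; i++) {
            sb.append('x');
            if (sb.capacity() != cap) {         // a new, larger backing array was allocated
                cap = sb.capacity();
                resizes++;
                allocated += cap;               // the old array is now garbage
            }
        }
        return new long[] {resizes, allocated};
    }

    public static void main(String[] args) {
        long[] r = measure(1_000_000);
        System.out.println(r[0] + " resizes, " + r[1] + " chars allocated for 1000000 appended");
    }
}
```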

    That said, I can see there being multiple reallocations, because we are line-oriented and merge lines together into a single record. You may have a single record, but a column in that record could contain multiple newlines, which requires multiple reads to build the final data, so it is impossible to calculate the correct buffer size beforehand.
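
A simplified sketch of line-oriented record assembly, assuming RFC 4180 style quoting: physical lines are joined until the count of `"` characters is even, meaning no quoted field is left open. It is not OpenCSV's parser (`RecordAssembler` and `nextRecord` are hypothetical), but it shows why the final record length is only known after the last continuation line has been read.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

/**
 * Simplified sketch, not OpenCSV's parser: join physical lines until the
 * number of '"' characters is even. Doubled quotes ("") inside a field add
 * two, so they do not change the parity.
 */
final class RecordAssembler {
    /** Returns the next logical record, or null at end of input. */
    static String nextRecord(BufferedReader in) throws IOException {
        String line = in.readLine();
        if (line == null) return null;
        StringBuilder record = new StringBuilder(line);
        int quotes = countQuotes(line);
        // An odd count means a quoted field continues on the next physical line.
        while (quotes % 2 != 0) {
            String next = in.readLine();
            if (next == null) break;          // unterminated quote: return what we have
            // readLine() drops the terminator, so '\n' is re-inserted ('\r' is lost).
            record.append('\n').append(next); // the buffer may regrow on every append
            quotes += countQuotes(next);
        }
        return record.toString();
    }

    private static int countQuotes(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '"') n++;
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(
                new StringReader("id,note\n1,\"line one\nline two\"\n2,plain\n"));
        String r;
        while ((r = nextRecord(in)) != null) {
            System.out.println("[" + r + "]");
        }
    }
}
```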

    BUT you are saying it is a single line, with no newlines or carriage returns(?). If that is true, please send me a (compressed) CSV file and a sample of your test program if possible, so I can run it in a profiler and see what is happening.

    The only other thing I can think of: if you can, turn it off by setting KeepCarriageReturn to false in the CSVReaderBuilder. That should be the default, so if you are not setting it to true it should already be false. If it is false, we use the standard reader to get the next line. If it is true, we use the reader to read a single character at a time to ensure carriage returns are preserved in the data, so yeah, a lot of garbage collection would go into that.
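
The trade-off described above can be sketched with plain `java.io`: `BufferedReader.readLine()` strips the `\r\n` terminator, whereas reading one character at a time can keep the `\r` in the data, at the cost of one `read()` call and one append per character. This is an illustration, not OpenCSV's implementation; `CarriageReturnDemo` and `readLineKeepingCR` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Contrast sketch: line-at-a-time reading versus char-at-a-time reading that keeps '\r'. */
final class CarriageReturnDemo {
    /** Reads up to the next '\n' (consumed, not returned); any '\r' before it stays in the result. */
    static String readLineKeepingCR(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {          // one call per character
            if (c == '\n') return sb.toString();
            sb.append((char) c);                 // '\r' is appended like any other char
        }
        return sb.length() == 0 ? null : sb.toString(); // null signals end of input
    }

    public static void main(String[] args) throws IOException {
        String data = "a\r\nb";
        String viaReadLine = new BufferedReader(new StringReader(data)).readLine();
        String viaChars = readLineKeepingCR(new StringReader(data));
        System.out.println(viaReadLine.length() + " vs " + viaChars.length()); // 1 vs 2
    }
}
```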

     
  • Scott Conway

    Scott Conway - 2023-12-10
    • status: open --> closed-out-of-date
    • assigned_to: J.C. Romanda --> Scott Conway
     
  • Scott Conway

    Scott Conway - 2023-12-10

    closed for lack of response.

     

