CSVReader readAll method has attack risks
The CSVReader.readAll method of OpenCSV was used to parse an attack file (the file contains only one line but many columns). As a result, the service's memory expanded by dozens of times and overflowed. The OpenCSV component imposes no restrictions and reports no error. Our OpenCSV version is 5.6.
Hello JC, I am a little confused by your wording, as one line with many columns is the quintessential definition of a CSV file. Do you mean multiple rows?
If so, this is not a bug but a known issue. The readAll method literally does just that: it reads the entirety of the file/data into memory. If you have too many rows you will run out of memory, simple as that. If you are dealing with large files we recommend you use the iterator or read the records yourself one at a time, as in the sketch below.
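As a minimal sketch of what I mean (the file name is just a placeholder), assuming OpenCSV 5.x:

    import com.opencsv.CSVReader;
    import java.io.FileReader;

    public class StreamingRead {
        public static void main(String[] args) throws Exception {
            // Read one record at a time instead of loading everything with readAll().
            try (CSVReader reader = new CSVReader(new FileReader("large.csv"))) {
                String[] record;
                while ((record = reader.readNext()) != null) {
                    // Only the current record is held in memory here.
                    System.out.println(record.length + " columns");
                }
            }
        }
    }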
My friend, we used a DoS attack file with only one line of data (49 MB). When readNext is used for parsing, memory use expands to more than 1 GB. Is it reasonable for memory use to grow by dozens of times? In addition, OpenCSV does not expose any capability to intercept this in advance by validating the size of a row of data.
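To illustrate the kind of pre-validation we are asking for (this helper and its limit are hypothetical, not part of the OpenCSV API), a caller currently has to cap the physical line size themselves before parsing:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    // Hypothetical pre-check, not part of OpenCSV: reject files whose first
    // physical line exceeds a caller-chosen limit before parsing begins.
    public class LineSizePreCheck {
        private static final int MAX_LINE_CHARS = 10 * 1024 * 1024; // assumption: 10 MB cap

        public static void validate(String path) throws IOException {
            try (BufferedReader br = new BufferedReader(new FileReader(path))) {
                int ch;
                long count = 0;
                while ((ch = br.read()) != -1 && ch != '\n') {
                    if (++count > MAX_LINE_CHARS) {
                        throw new IOException("Line exceeds " + MAX_LINE_CHARS
                                + " characters; refusing to parse");
                    }
                }
            }
        }
    }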
Ahhh, that is different. Your title stated you were using readAll, not readNext. But a single line of data, a single record, I understand a little better now.
First off, upgrade to 5.8 and let me know how that works. Also, out of curiosity, try both the CSVParser and the RFC4180Parser.
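If it helps, both parsers can be wired in through the CSVReaderBuilder, roughly like this (a sketch; the file name is a placeholder):

    import com.opencsv.CSVParser;
    import com.opencsv.CSVParserBuilder;
    import com.opencsv.CSVReader;
    import com.opencsv.CSVReaderBuilder;
    import com.opencsv.RFC4180Parser;
    import com.opencsv.RFC4180ParserBuilder;
    import java.io.FileReader;

    public class ParserComparison {
        public static void main(String[] args) throws Exception {
            // Default parser, built explicitly.
            CSVParser csvParser = new CSVParserBuilder().build();
            try (CSVReader r1 = new CSVReaderBuilder(new FileReader("attack.csv"))
                    .withCSVParser(csvParser).build()) {
                String[] rec = r1.readNext();
                System.out.println("CSVParser columns: " + (rec == null ? 0 : rec.length));
            }

            // RFC 4180 compliant parser.
            RFC4180Parser rfcParser = new RFC4180ParserBuilder().build();
            try (CSVReader r2 = new CSVReaderBuilder(new FileReader("attack.csv"))
                    .withCSVParser(rfcParser).build()) {
                String[] rec = r2.readNext();
                System.out.println("RFC4180Parser columns: " + (rec == null ? 0 : rec.length));
            }
        }
    }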
In 5.8 I had put in some memory optimizations and then removed MOST of them. While I did cut down the number of memory allocations, it also took twice as long to run: all the strings I was creating before lived in the very short-lived eden space and thus were not causing much in the way of garbage collection, whereas by creating fewer objects, the resizing of the StringBuffers was going into the slower garbage cleanup, though not a full GC.
That said, I can see there being multiple reallocations, as we are line oriented and merge lines together into a single record. You may have a single record, but a column in that record could contain multiple newlines, thus requiring multiple reads to build the final data; so it would be impossible to calculate the correct buffer sizes beforehand.
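For example, the following is a single record with two columns, yet it spans three physical lines because of the newlines inside the quoted column:

    1,"a value that
    continues on the
    next physical line"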
BUT you are saying it is a single line, with no newlines/carriage returns(?). If this is true, then if possible please send me a (compressed) CSV file and a sample of your test program so I can run it in a profiler to see what is happening.
The only other thing I can think of: if you can turn it off, set keepCarriageReturn to false in the CSVReaderBuilder. That should be the default, so if you are not setting it to true it should already be false. When it is false we use the standard reader to get the next line; when it is true we read a single character at a time to ensure carriage returns are preserved in the data, so yeah, a lot of garbage collection would go on in that path.
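In builder form that would look roughly like this (a sketch; the file name is a placeholder):

    import com.opencsv.CSVReader;
    import com.opencsv.CSVReaderBuilder;
    import java.io.FileReader;

    public class KeepCRExample {
        public static void main(String[] args) throws Exception {
            // Explicitly leave keepCarriageReturn off so lines are fetched with the
            // standard readLine() instead of character-by-character reads.
            try (CSVReader reader = new CSVReaderBuilder(new FileReader("attack.csv"))
                    .withKeepCarriageReturn(false)
                    .build()) {
                String[] record = reader.readNext();
                System.out.println(record == null ? 0 : record.length);
            }
        }
    }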
Closed for lack of response.