CSVReader readAll method has attack risks
The CSVReader.readAll method of OpenCSV was used to parse an attack file (the file contains only one line but many columns). As a result, the service's memory expanded by dozens of times and overflowed. The OpenCSV component imposes no restrictions and reports no error. Our OpenCSV version is 5.6.
Hello JC, I am a little confused by your wording, as one line with many columns is the quintessential definition of a CSV file. Do you mean multiple rows?
If so, this is not a bug but a known issue. The readAll method literally does just that: it reads the entirety of the file/data into memory. If you have too many rows you will run out of memory, simple as that. If you are dealing with large files we recommend you use the iterator or read the records yourself one at a time, as in the sketch below.
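As a minimal sketch of what I mean (the file name is just a placeholder), assuming OpenCSV 5.x:

    import com.opencsv.CSVReader;
    import java.io.FileReader;

    public class StreamingRead {
        public static void main(String[] args) throws Exception {
            // Read one record at a time instead of loading everything with readAll().
            try (CSVReader reader = new CSVReader(new FileReader("large.csv"))) {
                String[] record;
                while ((record = reader.readNext()) != null) {
                    // Only the current record is held in memory here.
                    System.out.println(record.length + " columns");
                }
            }
        }
    }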
My friend, we used a DoS attack file with only one line of data (49 MB). When readNext is used for parsing, memory use expands to more than 1 GB. Is it reasonable for memory use to grow by dozens of times? In addition, OpenCSV does not expose any capability to intercept this in advance by validating the size of a row of data.
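To illustrate the kind of pre-validation we are asking for (this helper and its limit are hypothetical, not part of the OpenCSV API), a caller currently has to cap the physical line size themselves before parsing:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    // Hypothetical pre-check, not part of OpenCSV: reject files whose first
    // physical line exceeds a caller-chosen limit before parsing begins.
    public class LineSizePreCheck {
        private static final int MAX_LINE_CHARS = 10 * 1024 * 1024; // assumption: 10 MB cap

        public static void validate(String path) throws IOException {
            try (BufferedReader br = new BufferedReader(new FileReader(path))) {
                int ch;
                long count = 0;
                while ((ch = br.read()) != -1 && ch != '\n') {
                    if (++count > MAX_LINE_CHARS) {
                        throw new IOException("Line exceeds " + MAX_LINE_CHARS
                                + " characters; refusing to parse");
                    }
                }
            }
        }
    }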
Ahhh, that is different. Your title stated you were using readAll, not readNext. But a single line of data, a single record, I understand a little better now.
First off, upgrade to 5.8 and let me know how that works. Also, out of curiosity, try both the CSVParser and the RFC4180Parser.
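If it helps, both parsers can be wired in through the CSVReaderBuilder, roughly like this (a sketch; the file name is a placeholder):

    import com.opencsv.CSVParser;
    import com.opencsv.CSVParserBuilder;
    import com.opencsv.CSVReader;
    import com.opencsv.CSVReaderBuilder;
    import com.opencsv.RFC4180Parser;
    import com.opencsv.RFC4180ParserBuilder;
    import java.io.FileReader;

    public class ParserComparison {
        public static void main(String[] args) throws Exception {
            // Default parser, built explicitly.
            CSVParser csvParser = new CSVParserBuilder().build();
            try (CSVReader r1 = new CSVReaderBuilder(new FileReader("attack.csv"))
                    .withCSVParser(csvParser).build()) {
                String[] rec = r1.readNext();
                System.out.println("CSVParser columns: " + (rec == null ? 0 : rec.length));
            }

            // RFC 4180 compliant parser.
            RFC4180Parser rfcParser = new RFC4180ParserBuilder().build();
            try (CSVReader r2 = new CSVReaderBuilder(new FileReader("attack.csv"))
                    .withCSVParser(rfcParser).build()) {
                String[] rec = r2.readNext();
                System.out.println("RFC4180Parser columns: " + (rec == null ? 0 : rec.length));
            }
        }
    }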
In 5.8 I had put in some memory optimizations and then removed MOST of them. While I did cut down the number of memory allocations, it also took twice as long to run: all the strings I was creating before lived in the very short-lived eden space and thus were not causing much in the way of garbage collection, whereas by creating fewer objects, the resizing of the StringBuffers was going into the slower garbage cleanup, though not a full GC.
That said, I can see there being multiple reallocations, as we are line oriented and merge lines together into a single record. You may have a single record, but a column in that record could contain multiple newlines, thus requiring multiple reads to build the final data; so it would be impossible to calculate the correct buffer sizes beforehand.
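For example, the following is a single record with two columns, yet it spans three physical lines because of the newlines inside the quoted column:

    1,"a value that
    continues on the
    next physical line"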
BUT you are saying it is a single line, with no newlines/carriage returns(?). If this is true, then if possible please send me a (compressed) CSV file and a sample of your test program so I can run it in a profiler to see what is happening.
The only other thing I can think of: if you can turn it off, set keepCarriageReturn to false in the CSVReaderBuilder. That should be the default, so if you are not setting it to true it should already be false. When it is false we use the standard reader to get the next line; when it is true we read a single character at a time to ensure carriage returns are preserved in the data, so yeah, a lot of garbage collection would go on in that path.
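In builder form that would look roughly like this (a sketch; the file name is a placeholder):

    import com.opencsv.CSVReader;
    import com.opencsv.CSVReaderBuilder;
    import java.io.FileReader;

    public class KeepCRExample {
        public static void main(String[] args) throws Exception {
            // Explicitly leave keepCarriageReturn off so lines are fetched with the
            // standard readLine() instead of character-by-character reads.
            try (CSVReader reader = new CSVReaderBuilder(new FileReader("attack.csv"))
                    .withKeepCarriageReturn(false)
                    .build()) {
                String[] record = reader.readNext();
                System.out.println(record == null ? 0 : record.length);
            }
        }
    }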
Closed for lack of response.