
#270 RFC4180Parser is unpredictable with wrongly quoted csv

v1.0 (example)
open
None
5
2026-02-22
2026-02-18
Papegaaij
No

RFC 4180 is quite clear about how fields must be quoted: a field can either be quoted or unquoted. An unquoted field is not allowed to contain double quotes. A quoted field is allowed to contain double quotes, but only if each is escaped by another double quote.
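To illustrate with data from the attached tests: the line below without a leading space is fully compliant, while in the second line the space before the second field makes that field unquoted, so its embedded quotes violate the rules:

```
"val1","{x:""123"", y: ""456""}",""     compliant: every field quoted, inner quotes doubled
"val1", "{x:""123"", y: ""456""}",""    malformed: the space makes field 2 unquoted
```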

The current parsing tries to be lenient about quoting errors, which I do understand given that many CSV documents are incorrectly quoted. However, we would like the parser to be strict and reject any incorrectly quoted document. The reason is that incorrect quoting can lead to unpredictable behavior, such as fields being split where they should not be.

I've created a patch that introduces strict quotes in the RFC4180Parser (and Builder). This option defaults to disabled, keeping the old behavior. If switched on, the parser will reject any document containing quotes that do not conform to the specification: unquoted fields may not contain quotes, and inside a quoted field all quotes must be escaped.
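The rule itself is simple to state in code. Below is a minimal, self-contained sketch of the strict check (class and method names are hypothetical illustrations, not the actual patch):

```java
// Hypothetical sketch of a strict RFC 4180 field-quoting check.
// The real patch lives in RFC4180Parser; names here are illustrative.
public class StrictQuoteCheck {

    // Returns null if the raw field text is legally quoted per RFC 4180,
    // otherwise an error message in the style of the tests below.
    public static String validateField(String raw) {
        if (raw.length() >= 2 && raw.startsWith("\"") && raw.endsWith("\"")) {
            String inner = raw.substring(1, raw.length() - 1);
            // Inside a quoted field, every quote must be doubled.
            for (int i = 0; i < inner.length(); i++) {
                if (inner.charAt(i) == '"') {
                    if (i + 1 < inner.length() && inner.charAt(i + 1) == '"') {
                        i++; // skip the escaping quote
                    } else {
                        return "The field [" + inner
                                + "] contains a single, unescaped double quote.";
                    }
                }
            }
            return null;
        }
        // An unquoted field may not contain quotes at all.
        if (raw.indexOf('"') >= 0) {
            return "The unquoted field [" + raw + "] contains quotes.";
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(validateField("\"a\"\"b\""));     // null: correctly escaped
        System.out.println(validateField("quote at end\"")); // error: quote in unquoted field
    }
}
```

Note this checks a single already-split field; the actual parser has to apply the same rules while still deciding where fields begin and end.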

1 Attachments

Discussion

  • Scott Conway

    Scott Conway - 2026-02-20
    • assigned_to: Scott Conway
     
  • Scott Conway

    Scott Conway - 2026-02-20

OK - I will try to take a look at this, but it will take some time, as work and life are working together to take up my free time.

I want to thank you for sending tests - the RFC4180ParserSpec.groovy additions. I will be looking at those to evaluate your claim that the RFC4180Parser is not compliant with the RFC 4180 specification. Given the parser's age (nine years) and the amount of use it has had, my main concern is that this is only the second compliance "bug" I can remember... and they were both this year! The other was that the CRLF stated in RFC 4180 should also cover a bare CR, as old Mac files used that as a line ending. So forgive my initial skepticism while I work through your first test.

If you are in a time crunch, or we cannot come to an agreement about the specification, then I give you the same suggestion as in Bug #267: create your own parser that extends the RFC4180Parser and adds withStrictQuotes and the code that processes it, then create your own builder that extends the RFC4180ParserBuilder and adds withStrictQuotes. The only drawback is that you will have to hard-code the exception messages, but that will give you a parser that works the way you want.

     
  • Scott Conway

    Scott Conway - 2026-02-20

Okay - I took a quick copy of your first test and modified it to use the existing parser, so it always expects the lenient result, and I now see the issues you are talking about. I am shocked this has stayed in so long without someone bringing it up... well, maybe not - most likely all the data before was programmatically generated to the RFC 4180 specification, so no one ever tried a hand-written or partially malformed csv file. But it does break both rules 6 and 7, so I will need to do some soul searching on the best way to handle this, but I will probably go your route.

That said, it will take some time, as I do want to do some additional testing to feel good about why this took so long to find... plus the job and home-life demands.

    Thank you for the tests! A good test explains far more than pages of explanation and your first test nailed it!

    Scott :)

P.S. Just to show, here is the modified first test.

    @Unroll
    def 'no strict quote parsing of #input should yield error #expectedError'(String input, String expectedError, String[] lenientResult) {
        given:
        RFC4180ParserBuilder builder = new RFC4180ParserBuilder()

        when:
        CSVReader csvReader = new CSVReaderBuilder(new StringReader(input)).withCSVParser(builder.build()).build()

        then:
        csvReader.readAll().get(0) == lenientResult

        where:
        input                                      | expectedError                                                                        | lenientResult
        '''"val1", "{x:""123"", y: ""456""}",""''' | '''The unquoted field [ "{x:""123""] contains quotes.'''                             | ['''val1''', ''' "{x:"123"''', ''' y: "456"}"''', ""]
        '''quote at end",""'''                     | '''The unquoted field [quote at end"] contains quotes.'''                            | ['''quote at end"''', ""]
        '''""double quote at start"",""'''         | '''The field ["double quote at start"] contains a single, unescaped double quote.''' | ['''"double quote at start"''', ""]
        '''"single loose " quote"",""'''           | '''The field [single loose " quote"] contains a single, unescaped double quote.'''   | ['''single loose " quote"''', ""]
    }
    
     
  • Papegaaij

    Papegaaij - 2026-02-20

    Hi Scott,
Thanks for taking the time to look into this, and for all the time you've already spent on this library. I guess nobody tried to put a space before a quoted field until a colleague of mine did :) It took us quite some time to find out what was actually happening inside the parser, but once we found the cause, it was easy to construct some failing tests. RFC 4180 is actually quite strict about quotes, but I guess most CSV files are generated and either quote correctly or don't quote at all. There was already a bug about this before: #157. RFC4180ParserSpec contains a test case for it.

Tests are great for showing these kinds of issues, especially the Spock ones. It's a nice framework to work with; we use it in our application as well.

Take your time to get this properly resolved; for us it's not that time-critical. We do want this resolved eventually, but I expect most people using the CSV import will use generated CSV files, which will probably be RFC 4180 compliant. For non-compliant files it is still very likely you will get an error, but chances are the error is a bit vague or not entirely correct. For example, the row might end up with more columns than expected and be skipped for that reason.

    Best regards,
    Emond

     
  • Scott Conway

    Scott Conway - 2026-02-22

Sigh - I had a little free time and wanted to see if it was possible to use the validators we added a few years back in response to all the requests for custom processing and validation - https://opencsv.sourceforge.net/#processors_and_validators. I knew it would not work, but wanted to try the LineValidator.

The issue is that the LineValidator validates a single line read by Java - not a single record/row. So a string with a newline inside it requires two Java reads, and thus the validator is called twice, once per segment, rather than once with the entire record. That is because the reader builds the resulting array incrementally to reduce memory overhead.
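A tiny sketch of the underlying effect (illustrative only; it uses a plain BufferedReader rather than opencsv, and the class name is made up):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: shows why a line-based validator sees two segments
// for one CSV record whose quoted field contains a newline.
public class MultiLineRecord {

    // Collect the physical lines a reader would hand to a LineValidator.
    public static List<String> physicalLines(String csv) throws IOException {
        BufferedReader reader = new BufferedReader(new StringReader(csv));
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // One logical CSV record, but the embedded newline splits it into
        // two physical lines - and therefore two validator calls.
        String record = "\"first\npart\",second";
        System.out.println(physicalLines(record));
    }
}
```

Each element of the returned list is only a fragment of the record, so neither call to the validator ever sees a string that parses as valid CSV on its own.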

    Just giving a heads up.

    :)

     

