
#267 RFC4180ParserBuilder does not add quotes around field with CR

v1.0 (example)
open
None
5
2026-02-02
2026-01-30
No

When using the RFC-compliant parser to write CSV, it correctly quotes values if they contain \n, but it does not quote values that contain only \r. According to my reading of the RFC, this behavior is incorrect.

The RFC 4180 grammar allows a CR inside a field only if that field is quoted.
Reference: https://datatracker.ietf.org/doc/html/rfc4180
Relevant section of the grammar:

   escaped = DQUOTE *(TEXTDATA / COMMA / CR / LF / 2DQUOTE) DQUOTE
   non-escaped = *TEXTDATA

Version: 5.12.0, JDK 21.
Minimal example (https://www.jdoodle.com/ia/1PuU):

import com.opencsv.CSVWriterBuilder;
import com.opencsv.RFC4180ParserBuilder;
import com.opencsv.RFC4180Parser;
import com.opencsv.ICSVWriter;

import java.io.StringWriter;


public class MyClass {
  public static void main(String[] args) {

    RFC4180Parser rfc4180Parser = new RFC4180ParserBuilder().build();
    StringWriter stringWriter = new StringWriter();
    ICSVWriter csvWriter = new CSVWriterBuilder(stringWriter)
      .withParser(rfc4180Parser)
      .build();

    try {
        csvWriter.writeNext(new String[] {
          "case1", "Hello there"
        }, false);
        csvWriter.writeNext(new String[] {
          "case2", "Hello\nthere"
        }, false);
        csvWriter.writeNext(new String[] {
          "case3", "Hello\r\nthere"
        }, false);
        csvWriter.writeNext(new String[] {
          "case4", "Hello\rthere"
        }, false);

        csvWriter.flush();
        csvWriter.close();
    } catch (Exception e) {
        System.out.println("Error:" + e.getMessage());
    }
    String output = stringWriter.toString();
    System.out.println(output);
  }
}

Discussion

  • Scott Conway

    Scott Conway - 2026-02-01
    • assigned_to: Scott Conway
     
  • Scott Conway

    Scott Conway - 2026-02-01

    So where does the RFC 4180 spec state that fields with escape characters must be enclosed in quotes?

    Rule 6 states only CRLF:

    6. Fields containing line breaks (CRLF), double quotes, and commas
      should be enclosed in double-quotes. For example:

      "aaa","b CRLF
      bb","ccc" CRLF
      zzz,yyy,xxx

      where

      CR = %x0D ;as per section 6.1 of RFC 2234 [2]
      LF = %x0A ;as per section 6.1 of RFC 2234 [2]

      CRLF = CR LF ;as per section 6.1 of RFC 2234 [2]

    In cases like this I recommend setting applyQuotesToAll to true instead of false in the writeNext call.

     
  • Scott Conway

    Scott Conway - 2026-02-01

    And for further clarification: a large number of operating systems and programming languages use LF as the newline, not CRLF. But I have yet to identify one that solely uses CR.

     
  • Guido Josquin

    Guido Josquin - 2026-02-01

    Thanks for getting back so quickly. CR on its own was only used as a line ending in classic Mac OS and some quite old console applications, so it's mostly a thing of the past. That said, working with many clients of all backgrounds and countries, we've seen these line endings come in, and even much weirder situations too, like non-ASCII whitespace characters. This is why we tested the case. In the example fiddle, the output of System.out.println also shows both LF and CR as a newline, making the CSV appear invalid. I wondered about interoperability with other CSV clients and languages, which might expect this to be quoted. This is why we use the RFC-based parser, so we never have to argue what the "correct" CSV format is.

    Regarding the spec, the descriptive text indeed references only CRLF; it does not describe what happens to a lone LF or a lone CR. However, the grammar section I shared above clears this up: LF and CR are both meant to be escaped in quotes. A non-escaped field may consist only of TEXTDATA, which covers specific ranges of "normal" characters:

    escaped = DQUOTE *(TEXTDATA / COMMA / CR / LF / 2DQUOTE) DQUOTE
    non-escaped = *TEXTDATA
     ...
    TEXTDATA =  %x20-21 / %x23-2B / %x2D-7E
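
To make the two readings concrete, here is a minimal, self-contained sketch that compares them on the cases from the ticket, plus a TAB. It does not use or model opencsv; the class and method names (QuotingReadings, needsQuotesPerProse, needsQuotesPerGrammar) are hypothetical and exist only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class QuotingReadings {

    /** Literal reading of rule 6's prose: quote on CRLF, double quote, or comma. */
    public static boolean needsQuotesPerProse(String field) {
        return field.contains("\r\n") || field.indexOf('"') >= 0 || field.indexOf(',') >= 0;
    }

    /** Reading of the ABNF: a field is "non-escaped" only if every character is TEXTDATA. */
    public static boolean needsQuotesPerGrammar(String field) {
        return field.chars().anyMatch(ch -> !isTextData(ch));
    }

    /** TEXTDATA = %x20-21 / %x23-2B / %x2D-7E */
    public static boolean isTextData(int ch) {
        return (ch >= 0x20 && ch <= 0x21)
            || (ch >= 0x23 && ch <= 0x2B)
            || (ch >= 0x2D && ch <= 0x7E);
    }

    public static void main(String[] args) {
        // Same inputs as the minimal example in the ticket, plus a TAB.
        Map<String, String> cases = new LinkedHashMap<>();
        cases.put("plain", "Hello there");
        cases.put("LF", "Hello\nthere");
        cases.put("CRLF", "Hello\r\nthere");
        cases.put("CR", "Hello\rthere");
        cases.put("TAB", "Hello\tthere");
        cases.forEach((name, value) -> System.out.printf(
            "%-5s prose=%-5b grammar=%b%n",
            name, needsQuotesPerProse(value), needsQuotesPerGrammar(value)));
    }
}
```

The two predicates disagree on a lone LF, a lone CR, and TAB. Note that for TAB the grammar predicate only says the value cannot be a non-escaped field; read strictly, the ABNF does not list TAB inside an escaped field either.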
    
     

    Last edit: Guido Josquin 2026-02-01
  • Scott Conway

    Scott Conway - 2026-02-01

    Yes - in cases like this my advice is to just turn quotes on all the time. It is still legal CSV, and it protects you from oddities in character sets from older operating systems or non-ASCII character sets.

     
  • Scott Conway

    Scott Conway - 2026-02-01

    When in doubt - quotes all about :D

     
  • Guido Josquin

    Guido Josquin - 2026-02-01

    Appreciate the advice, we will do that. That said, we would prefer our CSV output to follow the spec without needing to add quotes to every data point. All respect to free and open source software, I know there's not always capacity to address everything. I reported the bug because I think it would make sense for this library to implement the spec as written, and others may look here for the same issue. I think it should be confirmed as a bug, or rejected if it is intended behavior. Whether it's actually going to be fixed or not is of course another matter!

     
  • Scott Conway

    Scott Conway - 2026-02-02

    Oh Guido, Guido, Guido, Guido, Guido. You are going back to stating something that is NOT in the specification. Rule 6 specifies CRLF, comma, or double quote; it does not state CR, LF, or CRLF.

    It is not just because of the limited capacity, though I thank you for actually acknowledging that. For years before I created the RFC4180Parser, I got literal hate mail saying that opencsv, which existed BEFORE the RFC 4180 specification, did not support it. I kept showing people how to configure the existing CSVParser to successfully parse the examples they gave, until the day someone gave an example I could not. Since then I have been very much a stickler about the specification. Especially given the number of downloads opencsv gets each month, I don't want to piss off a thousand developers by adding a non-backwards-compatible change to make Guido happy.

    That said, what I absolutely allow and completely encourage is the extension of opencsv. Just create your own parser class that extends RFC4180Parser and overrides the isSurroundWithQuotes method in AbstractCSVParser with code that also checks for a carriage return. Or, if the protected scope of isSurroundWithQuotes makes that impossible, extend AbstractCSVParser and just copy the code from RFC4180Parser. In the latter case, though, let me know and I will happily open up the scope of the method to public. I get several requests per year to open up scope on methods so people can write their own extensions, and I almost always happily do so quickly. At that point you can pass your own parser into the CSVWriter and get the data the way you wanted.

     
  • Guido Josquin

    Guido Josquin - 2026-02-02

    It's no biggy. We have different interpretations of the spec, and the spec itself is not quite self-consistent. Rule 6 as written contradicts the ABNF grammar in the same spec. In fact, if we followed the ABNF to the letter, \t (TAB) would be entirely illegal inside a CSV value, because it is neither in TEXTDATA nor listed among the characters allowed inside quotes. With a spec like this, almost any reasonable implementation is going to be non-compliant...

    Your suggestion to implement isSurroundWithQuotes is a very nice one. We can achieve exactly what we are looking for this way. I included the code below, and updated the doodle here:
    https://www.jdoodle.com/ia/1PC5

    Lastly, let's close on the notion that anyone sending hate mail is a knob. Hate mail to someone who is offering public services for free... I don't even know if there is a word for that. I'm glad we could have a little civilized discussion and it's always okay to disagree on something. Thank you for your time!

    Code snippet

        char separator = ICSVParser.DEFAULT_SEPARATOR;
        char quoteChar = ICSVParser.DEFAULT_QUOTE_CHARACTER;
        CSVReaderNullFieldIndicator nullFieldIndicator = CSVReaderNullFieldIndicator.NEITHER;
    
        RFC4180Parser rfc4180Parser = new RFC4180Parser(quoteChar, separator, nullFieldIndicator) {
            /**
             * Alternative interpretation of RFC 4180 that also quotes values containing a lone \r or \n.
             * Additionally, any character that is not strictly allowed by the ABNF grammar is also quoted.
             * Otherwise, this follows the original implementation from AbstractCSVParser.
             * See https://sourceforge.net/p/opencsv/source/ci/master/tree/src/main/java/com/opencsv/AbstractCSVParser.java#l182
             */
            @Override
            protected boolean isSurroundWithQuotes(String value, boolean forceSurround) {
                if (value == null) {
                    return nullFieldIndicator.equals(CSVReaderNullFieldIndicator.EMPTY_QUOTES);
                } else if (value.isEmpty() && nullFieldIndicator.equals(CSVReaderNullFieldIndicator.EMPTY_SEPARATORS)) {
                    return true;
                } else if (forceSurround) {
                    return true;
                }
    
                return value.chars().anyMatch(ch ->
                    !(
                        (ch >= 0x20 && ch <= 0x21) ||
                        (ch >= 0x23 && ch <= 0x2B) ||
                        (ch >= 0x2D && ch <= 0x7E)
                    )
                );
            }
        };
    
     
