#59 Inconsistent Defaults

closed-wont-fix
Scott Conway
None
5
2011-02-06
2011-02-04
Clemens Drews
No

CSV written with the writer does not read with the reader.
Atta

Discussion

  • Clemens Drews
    Clemens Drews
    2011-02-04

    junit test highlighting the problem

     
    Attachments
  • Clemens Drews
    Clemens Drews
    2011-02-04

    output from the writer does not read with the reader. See attached junit.
    cause is: CSVWriter.DEFAULT_ESCAPE_CHARACTER != CSVReader.DEFAULT_ESCAPE_CHARACTER.
    Simple fix:
    CSVWriter line 50:
    - public static final char DEFAULT_ESCAPE_CHARACTER = '"';
    + public static final char DEFAULT_ESCAPE_CHARACTER = '\\';

     
  • Clemens Drews
    Clemens Drews
    2011-02-04

    • summary: Inconsisten Defaults --> Inconsistent Defaults
     
  • Scott Conway
    Scott Conway
    2011-02-05

    An interesting problem. I agree with you that if we create a default CSVWriter and CSVReader then what we read from what we write should equal.

    The issue is though that for writing you can escape a double quote with another double quote. So I cannot just make the change you suggest because there unit test in CSVWriter will fail. I will try and take a look at it this weekend and see if I can come up with a solution. I will almost definitely take your unit test and merge it with the CSVWriterTest.

     
  • Scott Conway
    Scott Conway
    2011-02-06

    I looked back at the code and looked at the problem and while I still agree with you in principle that whatever a default CSVWriter writes a default CSVReader should be able to read in practice but we should not change the default Reader or Writer.

    The reasons for this are:

    1. OpenCSV is already out and being used. Modifying the default behavior will force people to modify existing code and this we cannot allow. I should have caught on to this when I initally tried to change the code and three unit test failed but I was hoping to come up with a solution that could change the default behavior but still allow existing test to pass. Alas it was not so.

    2. Writing is different from reading. When writing there is only one thing we need to escape and that is the quotation mark. Everything else is up to the developer calling the writer to escape out themselves. Even then the developer can either turn off escaping the quotation or chosing their own escape character but it is considered a valid default to escape a double quote with another double quote.

    3. CSVReader should be able to handle double quote regardless of its own escape character. Looking through the code I found the following in CSVParser:

    if (c == this.escape) {
    if( isNextCharacterEscapable(nextLine, inQuotes || inField, i) ){
    sb.append(nextLine.charAt(i+1));
    i++;
    }
    } else if (c == quotechar) {
    if( isNextCharacterEscapedQuote(nextLine, inQuotes || inField, i) ){
    sb.append(nextLine.charAt(i+1));
    i++;
    }

    So if the character is not a escape character (because that was handled above) but its a quote character and the next character is one too then handle the current double quote as a escape character.

    Here again I will try and incorporate your unit test in but if it fails I believe its because of an failure in the above code in CSVParser not because of CSVReader or Writer.

     
  • Scott Conway
    Scott Conway
    2011-02-06

    Looking at your example I now see the issue. It isn't about the quote being used as a escape character its that you write out \\ and read in \ because the escape character for the default writer is " and the default escape character for the reader is \. Here again I have to go back to the first point of my previous argument. If we change the default behavior then other people will have to change existing code and we don't want that.

    sorry about that.

    Sincerely

    Scott :)

     
  • Scott Conway
    Scott Conway
    2011-02-06

    • assigned_to: nobody --> sconway
    • status: open --> closed-wont-fix