#53 Incorrect handling of line break, if quot-char=esc-char

Status: closed-rejected
Owner: Scott Conway
Labels: None
Priority: 5
Updated: 2010-12-25
Created: 2010-09-21
Creator: Patrick Eggert
Private: No

According to RFC 4180, "Common Format and MIME Type for Comma-Separated Values (CSV) Files", the quotation character and the escape character are both the double quote ("). In this case CSVParser does not read lines correctly:

Example CSV content:
"abc", "de
f", "ghi"

In this case opencsv reads two records (first: abc, de; second: f, ghi) instead of one record with three fields.

The problem seems to be that CSVParser.parseLine first checks whether a character is the escape character and only then whether it is the quotation character. If the two conditions are swapped (as in the attached file), it seems to work fine.
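
To make the effect of the check order concrete, here is a minimal sketch. It is not the actual CSVParser.parseLine code: it assumes a simplified scanner that only tracks whether a quoted field is still open when the line ends. The class name CheckOrderSketch, the method endsInsideQuotes, and the escapeFirst flag are hypothetical names invented for this illustration.

```java
final class CheckOrderSketch {

    // True if the character after position i exists and equals 'expected'.
    private static boolean nextIs(String line, int i, char expected) {
        return i + 1 < line.length() && line.charAt(i + 1) == expected;
    }

    // Scans one physical line and reports whether the scanner still believes
    // it is inside a quoted field at the end (i.e. the record continues on
    // the next line). 'escapeFirst' selects which test is tried first.
    static boolean endsInsideQuotes(String line, char quote, char escape, boolean escapeFirst) {
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (escapeFirst && c == escape) {
                // Escape branch: skip an escaped quote/escape, never toggle the quote state.
                if (inQuotes && (nextIs(line, i, quote) || nextIs(line, i, escape))) {
                    i++;
                }
            } else if (c == quote) {
                // Quote branch: a doubled quote inside quotes is a literal, otherwise toggle.
                if (inQuotes && nextIs(line, i, quote)) {
                    i++;
                } else {
                    inQuotes = !inQuotes;
                }
            } else if (c == escape) {
                if (inQuotes && (nextIs(line, i, quote) || nextIs(line, i, escape))) {
                    i++;
                }
            }
        }
        return inQuotes;
    }

    public static void main(String[] args) {
        String firstLine = "\"abc\", \"de"; // first physical line of the reported example
        System.out.println(endsInsideQuotes(firstLine, '"', '"', true));  // false
        System.out.println(endsInsideQuotes(firstLine, '"', '"', false)); // true
    }
}
```

Under these assumptions, when the quote and escape characters are identical and the escape test runs first, every quote is consumed by the escape branch, the quoted state is never entered, and the line break is treated as the end of the record.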

Discussion

  • Patrick Eggert
    2010-09-21

    Attachments: CSVParser.java

  • This is the same issue I described in issue #3030747 (my example used a comma instead of a line break, but the issue is the same).

    The problem with swapping the order of the two conditions is that while it fixes the issue you're seeing, it creates some others.

    For example,
    a,b""c""d,e
    would parse as
    {a, bcd, e}
    when it should parse as
    {a,b"c"d,e}
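
The two results above can be reproduced with a small sketch. It is not the attached file or the uploaded patch. It assumes one plausible way a quote-first check can go wrong, namely that every quote character simply toggles the quoted state. The class DoubledQuoteSketch, the method split, and the flag doubledQuoteIsLiteral are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class DoubledQuoteSketch {

    // Splits a single line on commas. When 'doubledQuoteIsLiteral' is false,
    // every quote just toggles the quoted state; when true, two consecutive
    // quotes are emitted as one literal quote character.
    static List<String> split(String line, boolean doubledQuoteIsLiteral) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                boolean doubled = i + 1 < line.length() && line.charAt(i + 1) == '"';
                if (doubledQuoteIsLiteral && doubled) {
                    field.append('"');
                    i++; // skip the second quote of the pair
                } else {
                    inQuotes = !inQuotes;
                }
            } else if (c == ',' && !inQuotes) {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());
        return fields;
    }

    public static void main(String[] args) {
        String line = "a,b\"\"c\"\"d,e";
        System.out.println(split(line, false)); // [a, bcd, e]
        System.out.println(split(line, true));  // [a, b"c"d, e]
    }
}
```

This sketch only covers the example line. It does not distinguish an empty quoted field ("") from a doubled quote, which a real parser has to do.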

    I've uploaded a patch that fixes the issue without creating any new ones (or at least without breaking anything tested by the unit tests).

     
  • Scott Conway
    2010-12-25

    • assigned_to: nobody --> sconway
    • status: open --> closed-rejected
     
  • Scott Conway
    2010-12-25

    I am closing this as rejected. As with 3030747, when creating a CSVParser the escape, separator, and quote characters must all be different. I have modified CSVParser to throw an UnsupportedOperationException if any of them are the same.
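
A guard of this kind could look roughly like the following sketch. It is an assumption about the shape of such a check, not the code that was committed to CSVParser. The class DelimiterCheckSketch, the method requireDistinct, and the exception message are hypothetical.

```java
final class DelimiterCheckSketch {

    // Rejects any configuration in which two of the three special characters
    // coincide, mirroring the behaviour described in the comment above.
    static void requireDistinct(char separator, char quote, char escape) {
        if (separator == quote || separator == escape || quote == escape) {
            throw new UnsupportedOperationException(
                    "The separator, quote, and escape characters must be different");
        }
    }

    public static void main(String[] args) {
        requireDistinct(',', '"', '\\'); // accepted: all three differ
        try {
            requireDistinct(',', '"', '"'); // quote == escape, as in this ticket
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing fast in the constructor makes the unsupported configuration visible immediately, rather than letting it surface later as silently split records.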

    That being said, CSVParser does allow a quote character to be escaped by doubling it (two quote characters, one right after the other), so you do not have to define a Reader, Writer, or Parser whose escape character is equal to the quote character.

    :)
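
For reference, the doubled-quote convention from RFC 4180 can be handled without any separate escape character. The following is a minimal sketch of that idea, not opencsv's implementation. The class Rfc4180Sketch and the method parse are hypothetical, and the sketch assumes "\n" line endings and a comma separator only.

```java
import java.util.ArrayList;
import java.util.List;

final class Rfc4180Sketch {

    // Parses CSV text into records. Inside a quoted field, "" yields one
    // literal quote and a line break is ordinary field content.
    static List<List<String>> parse(String text) {
        List<List<String>> records = new ArrayList<>();
        List<String> record = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inQuotes) {
                if (c != '"') {
                    field.append(c); // includes embedded line breaks
                } else if (i + 1 < text.length() && text.charAt(i + 1) == '"') {
                    field.append('"'); // doubled quote -> literal quote
                    i++;
                } else {
                    inQuotes = false; // closing quote
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                record.add(field.toString());
                field.setLength(0);
            } else if (c == '\n') {
                record.add(field.toString());
                field.setLength(0);
                records.add(record);
                record = new ArrayList<>();
            } else {
                field.append(c);
            }
        }
        // Flush the last record when the text does not end with a line break.
        if (field.length() > 0 || !record.isEmpty()) {
            record.add(field.toString());
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        String text = "\"say \"\"hi\"\"\",\"de\nf\",ghi\nx,y";
        for (List<String> r : parse(text)) {
            System.out.println(r.size() + " fields: " + r);
        }
    }
}
```

With this input the first record has three fields, the second of which contains the embedded line break, and the second record has two fields.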