Menu

#125 "\" escape charecter in data

v1.0 (example)
closed-fixed
5
2017-05-03
2016-01-17
No

Hi,
I found that my data contains some fields that look like this "\", this caused multilinereader to remain inQuotes and broke the data reading sequence
my solution was to allow escape char when in field by modifying line 355 in CSVParser.java
from: if (c == this.escape) {
to: if (c == this.escape && !inField) {

Sample data line:
"00001364N","RECEPT RECT 80 S STR PC .140","EM-POM ","PO","RM3220802125500","41113","AIRBORN INC (CONNECTORS)","\",,"4",,"20010725","20120919",""

I noticed there is another bug on data containing \ in the bugs list, this could be related

Please tell me if my local fix is incorrect

Many thanks for this tool
Much appreciated

Discussion

  • Scott Conway

    Scott Conway - 2016-01-18

    Hello Jaakov. In this case the error is in the data not the code. The problem is you have a field that contains only a single escape character, which I believe is what you want in the final output, but becasue the escape character is not escaped opencsv thinks you are escaping the double quote so it is picking up other fields as part of the field it should have closed.

    This is expected behavior as there are fields that you would want a quote character inside of it (i.e. "Drill with 1/4\" bit").

    What you need to do is if you want the escape character as part of your data then you need to escape it "\"

    If whatever is generating the data cannot do that on a couple of occasions, when the users data does not need escaping, I tell them to set the escape character to an impossible value. You can try the null character and see if that works.

    Out of curiousity I did try your fix just to see what would happen but since it was expected behavior to escape quotes and escape characters about a half dozen tests in CSVParserTest failed.

    I will look at bug 97 and see if that is still valid.

    I will keep this open for a while in case you want to reply.

    Good Luck

    Scott Conway :)

     
  • Jaakov Jalink

    Jaakov Jalink - 2016-01-19

    Hi Scott,
    Many thanks for the quick reply
    I agree I could have changed the escape character to avoid this issue
    I agree my bypass to resolve my specific issue is not accurate and is not the correct solution

    As you guessed I have no control over input format and CSV generator has no control over the fields data so I can't change the source to match openCSV behaviour

    However, if we consider excel as the standard creating CSV files and we consider that in many cases we can't change the way source CSV data is formatted then I would check how excel handles a field with \ or even \" as data
    I created a file in excel with the following data (consider space as cell separator for this example please):
    \" \ \ " ",
    CSV version on save is:
    "\""",\,\,"""",""","
    My conclusion is that excel translates as follows:
    \ = \
    " = ""
    If field data is surrounded by quotes the quote is immediately followed by comma or EOL or EOF
    Default escape character in this case seems to be " (making quote and escape characters identical)

    I tried using " as both quote and escape char, it's not currently supported and requires
    It seems like OpenCSV doesn't work according to these rules and defaults at the moment
    Maybe this behaviour should be changed to fit the excel default standard

    Regards,
    Jaakov

     
  • Scott Conway

    Scott Conway - 2016-01-20

    Hello Jaakov.

    This one is going to be a tough nut to crack. The issue is that the excel csv format treat quotes around the field as optional - which is okay except that they do it on a field per field basis so some fields have quotes and some don't. To make it worse they are not escaping the escape character. I tried several variations of the options and could not get it to work.

    I am going to keep this bug open but really it is an feature request - to support the excel csv format. To do this we need to extend the existing CSVParser class and make an ExcelCSVParser class and have the CSVParserBuilder return either a CSVParser or an ExcelCSVParser depending on methods called.

    This needs to be done because there is no way I can think of to modify the existing CSVParser without breaking existing tests, which means that existing users could have issues, and I absolutely refuse to do that. You can read the Submission section of the wiki to see how passionite I am about protecting the existing users.

    This is going to be an long term project as I am the only one on this project. Short term I would recommend the Apache Commons CSV - https://commons.apache.org/proper/commons-csv/ They support several different formats one of which is Excel. If you do try it out let me know how it works.

    Sincerely

    Scott :)

     
  • Arend v. Reinersdorff

    Hi,

    I'd like to point out that using double quotes as both quote and escape character ist more widespread than just Excel:

    1. CSVWriter also uses double quotes " as both the default quote and escape character. By default CSVWriter consistently uses quotes for all fields.
    2. RFC 4180 which aims to standardize CSV also uses double quotes " as both the quote and escape character. RFC 4180 treats quotes as optional like Excel, it only requires quotes for fields with special characters (double quote, comma, linebreaks).
     
  • Scott Conway

    Scott Conway - 2016-03-31

    Hello Arend.

    Good points on both. I have already downloaded several links on RFC4180 specs and using Jaakov's example string and some examples I have created using Excel I have created a failing a JUnit test case and have started coded a Excel CSV Parser. At a 5000 foot level it looks easier and I had hoped to have something out by now but between work and Boy Scouts I have not had the free time to code up the parser yet. Right now the best ETA I can give is June-July as April and early May are still very busy for me.

     
  • Xihui Chen

    Xihui Chen - 2016-05-22

    I believe this is a bug, because if I put \ in the csv content using CSVWriter and read it back with CSVReader, it cannot read it back. Here is my code:
    @Test
    public void testCSVWriter() throws IOException {
    CSVWriter writer = new CSVWriter(new FileWriter("./src/test/resources/writerTest.csv"));
    // feed in your array (or convert your data to an array)
    String[] entries = "first#se\#cond#third\".split("#");
    writer.writeNext(entries);
    entries = "second#se\#cond#third\".split("#");
    writer.writeNext(entries);

        writer.close();
    }
    

    @Test
    public void testCSVReader() throws IOException {
    CSVReader reader = new CSVReader(new FileReader("./src/test/resources/writerTest.csv"));
    String[] nextLine;
    while ((nextLine = reader.readNext()) != null) {
    // nextLine[] is an array of values from the line
    System.out.println("new line: " + Arrays.toString(nextLine));
    }
    }

     
  • Xihui Chen

    Xihui Chen - 2016-05-22

    BTW, older version of opencsv doesn't have this problem:

    <dependency>
        <groupId>net.sf.opencsv</groupId>
        <artifactId>opencsv</artifactId>
        <version>1.7</version>
    </dependency>
    
     
  • Joshua Turner

    Joshua Turner - 2016-10-20

    At the risk of "me too"-ing the thread, I can confirm that the behaviour of CSVWriter and CSVReader is asymmetrical. At issue is that the default behaviour of CSVReader is to using the backslash as an escape character, whereas CSVWriter does not.

    Affected use cases are CSV files containing JSON snippets, and CSV files containing Windows paths.

    The ABNF grammar in RFC4180 is quite specific; CSV does not have an escape character apart from the delimiter character.

    A workaround that addresses the issue temporarily is to construct the CSVReader with a delimiter that is known to never appear in the content:

    new au.com.bytecode.opencsv.CSVReader(myReader, ',' , '\"', '\u263A');
    
     
  • Scott Conway

    Scott Conway - 2017-03-04
    • status: open --> closed-fixed
     
  • Scott Conway

    Scott Conway - 2017-03-04

    This is fixed in the 3.9 release.

     
  • ffunnyman

    ffunnyman - 2017-04-08

    It looks like it is still an issue. For the escape char CSVReader defaults to ICSVParser.DEFAULT_ESCAPE_CHARACTER which is \ while CSVWriter defaults to "

     
  • Scott Conway

    Scott Conway - 2017-04-10

    Those were the original defaults set up way back when. We cannot change those now as it would distrupt all the current users (A painful scenario we were fully reminded of when we upgraded OpenCSV from java5 to Java7).

     
  • Sean Stallbaum

    Sean Stallbaum - 2017-05-03

    Scott,

    It looks like in v3.9 the logic is still in place to disallow the quote & escape character to be equal, which is allowed (by default) in the CSVWriter, would it make sense to remove / alter it so that we can indeed have the quote & escape characters be the same?

     

Log in to post a comment.

MongoDB Logo MongoDB