"\" escape charecter in data
Brought to you by:
aruckerjones,
sconway
Hi,
I found that my data contains some fields that look like this "\", this caused multilinereader to remain inQuotes and broke the data reading sequence
my solution was to allow escape char when in field by modifying line 355 in CSVParser.java
from: if (c == this.escape) {
to: if (c == this.escape && !inField) {
Sample data line:
"00001364N","RECEPT RECT 80 S STR PC .140","EM-POM ","PO","RM3220802125500","41113","AIRBORN INC (CONNECTORS)","\",,"4",,"20010725","20120919",""
I noticed there is another bug on data containing \ in the bugs list, this could be related
Please tell me if my local fix is incorrect
Many thanks for this tool
Much appreciated
Hello Jaakov. In this case the error is in the data not the code. The problem is you have a field that contains only a single escape character, which I believe is what you want in the final output, but becasue the escape character is not escaped opencsv thinks you are escaping the double quote so it is picking up other fields as part of the field it should have closed.
This is expected behavior as there are fields that you would want a quote character inside of it (i.e. "Drill with 1/4\" bit").
What you need to do is if you want the escape character as part of your data then you need to escape it "\"
If whatever is generating the data cannot do that on a couple of occasions, when the users data does not need escaping, I tell them to set the escape character to an impossible value. You can try the null character and see if that works.
Out of curiousity I did try your fix just to see what would happen but since it was expected behavior to escape quotes and escape characters about a half dozen tests in CSVParserTest failed.
I will look at bug 97 and see if that is still valid.
I will keep this open for a while in case you want to reply.
Good Luck
Scott Conway :)
Hi Scott,
Many thanks for the quick reply
I agree I could have changed the escape character to avoid this issue
I agree my bypass to resolve my specific issue is not accurate and is not the correct solution
As you guessed I have no control over input format and CSV generator has no control over the fields data so I can't change the source to match openCSV behaviour
However, if we consider excel as the standard creating CSV files and we consider that in many cases we can't change the way source CSV data is formatted then I would check how excel handles a field with \ or even \" as data
I created a file in excel with the following data (consider space as cell separator for this example please):
\" \ \ " ",
CSV version on save is:
"\""",\,\,"""",""","
My conclusion is that excel translates as follows:
\ = \
" = ""
If field data is surrounded by quotes the quote is immediately followed by comma or EOL or EOF
Default escape character in this case seems to be " (making quote and escape characters identical)
I tried using " as both quote and escape char, it's not currently supported and requires
It seems like OpenCSV doesn't work according to these rules and defaults at the moment
Maybe this behaviour should be changed to fit the excel default standard
Regards,
Jaakov
Hello Jaakov.
This one is going to be a tough nut to crack. The issue is that the excel csv format treat quotes around the field as optional - which is okay except that they do it on a field per field basis so some fields have quotes and some don't. To make it worse they are not escaping the escape character. I tried several variations of the options and could not get it to work.
I am going to keep this bug open but really it is an feature request - to support the excel csv format. To do this we need to extend the existing CSVParser class and make an ExcelCSVParser class and have the CSVParserBuilder return either a CSVParser or an ExcelCSVParser depending on methods called.
This needs to be done because there is no way I can think of to modify the existing CSVParser without breaking existing tests, which means that existing users could have issues, and I absolutely refuse to do that. You can read the Submission section of the wiki to see how passionite I am about protecting the existing users.
This is going to be an long term project as I am the only one on this project. Short term I would recommend the Apache Commons CSV - https://commons.apache.org/proper/commons-csv/ They support several different formats one of which is Excel. If you do try it out let me know how it works.
Sincerely
Scott :)
Hi,
I'd like to point out that using double quotes as both quote and escape character ist more widespread than just Excel:
Hello Arend.
Good points on both. I have already downloaded several links on RFC4180 specs and using Jaakov's example string and some examples I have created using Excel I have created a failing a JUnit test case and have started coded a Excel CSV Parser. At a 5000 foot level it looks easier and I had hoped to have something out by now but between work and Boy Scouts I have not had the free time to code up the parser yet. Right now the best ETA I can give is June-July as April and early May are still very busy for me.
I believe this is a bug, because if I put \ in the csv content using CSVWriter and read it back with CSVReader, it cannot read it back. Here is my code:
@Test
public void testCSVWriter() throws IOException {
CSVWriter writer = new CSVWriter(new FileWriter("./src/test/resources/writerTest.csv"));
// feed in your array (or convert your data to an array)
String[] entries = "first#se\#cond#third\".split("#");
writer.writeNext(entries);
entries = "second#se\#cond#third\".split("#");
writer.writeNext(entries);
@Test
public void testCSVReader() throws IOException {
CSVReader reader = new CSVReader(new FileReader("./src/test/resources/writerTest.csv"));
String[] nextLine;
while ((nextLine = reader.readNext()) != null) {
// nextLine[] is an array of values from the line
System.out.println("new line: " + Arrays.toString(nextLine));
}
}
BTW, older version of opencsv doesn't have this problem:
At the risk of "me too"-ing the thread, I can confirm that the behaviour of CSVWriter and CSVReader is asymmetrical. At issue is that the default behaviour of CSVReader is to using the backslash as an escape character, whereas CSVWriter does not.
Affected use cases are CSV files containing JSON snippets, and CSV files containing Windows paths.
The ABNF grammar in RFC4180 is quite specific; CSV does not have an escape character apart from the delimiter character.
A workaround that addresses the issue temporarily is to construct the CSVReader with a delimiter that is known to never appear in the content:
This is fixed in the 3.9 release.
It looks like it is still an issue. For the escape char CSVReader defaults to ICSVParser.DEFAULT_ESCAPE_CHARACTER which is \ while CSVWriter defaults to "
Those were the original defaults set up way back when. We cannot change those now as it would distrupt all the current users (A painful scenario we were fully reminded of when we upgraded OpenCSV from java5 to Java7).
Scott,
It looks like in v3.9 the logic is still in place to disallow the quote & escape character to be equal, which is allowed (by default) in the CSVWriter, would it make sense to remove / alter it so that we can indeed have the quote & escape characters be the same?