When data from a reader contains a carriage return and a line feed (\r\n), CSVReader strips the \r. This is a problem if you want to transform data while maintaining information about its original formatting. I'm not familiar enough with the codebase to submit a patch, but it seems code around lines 169 and 234 of CSVReader.java and line 280 of CSVParser.java may be eating our carriage returns.
I tested this against version 3.1 and was able to reproduce the issue with the attached code.
Here is the console output of the code for those who don't want to compile.
Entire string: one,'delicious
line',rite
Normal string replacement: one,'delicious<CRLF>line',rite
Replacement after readAll(), \r\n: delicious
line
Replacement after readAll(), \n: delicious<CRLF>line
Please let me know what I can do to help.
Good catch. I did a quick grep of the code and only saw code for the carriage return in the CSVWriter. I also checked the CSVReader code all the way back to version 1.8 and the reader code is essentially unchanged (no carriage return logic). I was about to conclude that this was how it was designed when I came across the following in the original docs:
<faq id="what-features">
<question>What features does opencsv support?</question>
<answer>
opencsv supports all the basic csv-type things you're likely to want to do:
Now, technically it could be argued that Glen Smith was thinking of \n rather than \r\n when he made the multi-line remark, but I don't think so, since that would, like you said, go against the spirit of opencsv.
Seeing that there were no unit tests for this, I think it was missing from the beginning.
I was able to recreate the issue with a unit test.
@Test
public void bug106ParseLineWithCarriageReturnNewLineStrictQuotes() throws IOException {
}
Needless to say, nextLine[1] was 123\n4567 instead of 123\r\n4567.
The issue is the BufferedReader. We use it to read the CSV line by line, then we patch the lines together. BufferedReader's readLine() strips the line terminator, so it does not differentiate between \r\n and \n.
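To illustrate the behaviour described above, here is a minimal sketch that uses only the JDK. It is not the opencsv code: the class name LineTerminatorDemo and the helper rejoinWithReadLine are made up for illustration, and joining the lines with \n is an assumption based on the test result reported above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class LineTerminatorDemo {

    // Hypothetical helper: reads every line with BufferedReader.readLine()
    // and joins the lines with "\n", roughly the way a line-by-line reader
    // would patch a multi-line field back together.
    static String rejoinWithReadLine(String input) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(new StringReader(input))) {
            String line = br.readLine();
            while (line != null) {
                sb.append(line);
                line = br.readLine();
                // Only add a separator when another line follows.
                if (line != null) {
                    sb.append('\n');
                }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String fromCrLf = rejoinWithReadLine("123\r\n4567");
        String fromLf = rejoinWithReadLine("123\n4567");
        // readLine() drops the terminator, so both inputs end up identical.
        System.out.println(fromCrLf.equals(fromLf));   // prints "true"
        System.out.println(fromCrLf.contains("\r"));   // prints "false"
    }
}
```

Because readLine() returns the line content without its terminator, the caller has no way to tell afterwards whether the original line ended in \r\n or \n.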
It will take a little while to work this one out because I don't want to sacrifice too much performance on this. Coincidentally, my latest project with openCSV is a set of performance tests so I can find bottlenecks in the openCSV code. Once I have that done I will have a baseline I can use to test the fixes and make sure I pick the optimal one.
:)
Assigning to me.
Mathew - I have a fix and it is merged into the trunk. If you can, please do a git pull of the trunk, build it, and let me know if this works for you.
By default the code will work as it does now. To keep carriage returns, use the CSVReaderBuilder to build your reader with the withKeepCarriageReturn method, like so:
CSVReaderBuilder builder = new CSVReaderBuilder(reader);
CSVReader csvReader = builder.withKeepCarriageReturn(true).build();
I am leaning more and more on the builders because otherwise I would have to double the number of constructors for each parameter I add.
:)
The fix has been released in version 3.3.