#106 CSVParser replaces carriage returns with line feeds.

Milestone: v1.0 (example)

Status: closed-fixed

Owner: Scott Conway

Labels: None

Priority: 3

Updated: 2015-03-08

Created: 2015-01-22

Creator: Mathew Woodyard

Private: No

When data from a reader contains a carriage return and a line feed (\r\n), CSVReader strips the \r. This is a problem if you want to transform data while maintaining information about its original formatting. I'm not familiar enough with the codebase to submit a patch, but it seems code around lines 169 and 234 of CSVReader.java and line 280 of CSVParser.java may be eating our carriage returns.

I tested this against version 3.1 and was able to reproduce the issue with the attached code.

Here is the console output of the code for those who don't want to compile.
Entire string: one,'delicious
line',rite
Normal string replacement: one,'delicious<CRLF>line',rite
Replacement after readAll(), \r\n: delicious
line
Replacement after readAll(), \n: delicious<CRLF>line

Please let me know what I can do to help.

1 Attachments

DeliciousCarriageReturns.java

Discussion

Scott Conway - 2015-01-25

Good catch. I did a quick grep of the code and only saw code for the carriage return in the CSVWriter. I also checked the CSVReader code all the way back to version 1.8 and the reader code is essentially unchanged (no carriage return logic). I was about to conclude that this was how it was designed when I came across the following in the original docs:

<faq id="what-features">
<question>What features does opencsv support?</question>
<answer>
opencsv supports all the basic csv-type things you're likely to want to do:

<ul>
<li>Arbitrary numbers of values per line</li>
<li>Ignoring commas in quoted elements</li>
<li>Handling quoted entries with embedded carriage returns (ie entries that span multiple lines)</li>
<li>Configurable separator and quote characters (or use sensible defaults)</li>
<li>Read all the entries at once, or use an Iterator style model</li>
<li>Creating csv files from String[] (ie. automatic escaping of embedded quote chars)</li>
</ul>
</answer>
</faq>

Technically, it could be argued that Glen Smith was thinking of \n rather than \r when he wrote the multi-line remark, but I don't think so, because that would, as you said, go against the spirit of opencsv.

Since there were no unit tests covering this, I think it has been missing from the beginning.

I was able to recreate the issue with a unit test.

@Test
public void bug106ParseLineWithCarriageReturnNewLineStrictQuotes() throws IOException {
    StringBuilder sb = new StringBuilder(CSVParser.INITIAL_READ_SIZE);
    sb.append("\"a\",\"123\r\n4567\",\"c\"").append("\n"); // "a","123\r\n4567","c"

    CSVReader c = new CSVReader(new StringReader(sb.toString()), CSVParser.DEFAULT_SEPARATOR, CSVParser.DEFAULT_QUOTE_CHARACTER, true);

    String[] nextLine = c.readNext();
    assertEquals(3, nextLine.length);
    assertEquals("a", nextLine[0]);
    assertEquals(1, nextLine[0].length());
    assertEquals("123\r\n4567", nextLine[1]);
    assertEquals("c", nextLine[2]);
}

Needless to say, nextLine[1] was 123\n4567 instead of 123\r\n4567.

The issue is the BufferedReader. We use it to read the CSV line by line and then patch the lines together, but the BufferedReader does not differentiate between \r\n and \n.
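The effect can be seen with the JDK alone. The following is a minimal sketch, not opencsv code: the class name ReadLineDemo and the helper rejoin are made up for illustration. It reads input with BufferedReader.readLine(), which returns each line without its terminator, and joins the lines back together with \n, so any \r that preceded a \n is lost.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class ReadLineDemo {

    // Hypothetical helper: reads the input line by line and joins the
    // lines with "\n", roughly how a line-oriented reader might stitch
    // a multi-line quoted field back together.
    static String rejoin(String input) {
        try (BufferedReader reader = new BufferedReader(new StringReader(input))) {
            StringBuilder sb = new StringBuilder();
            boolean first = true;
            String line;
            // readLine() returns the line content without "\n", "\r" or "\r\n"
            while ((line = reader.readLine()) != null) {
                if (!first) {
                    sb.append('\n');
                }
                sb.append(line);
                first = false;
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String original = "\"123\r\n4567\"";
        String rejoined = rejoin(original);
        System.out.println("original has CR: " + original.contains("\r"));
        System.out.println("rejoined has CR: " + rejoined.contains("\r"));
    }
}
```

Because readLine() discards the terminator, the caller cannot tell afterwards whether the line ended in \r\n or \n.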

It will take a little while to work this one out because I don't want to sacrifice too much performance on this. Coincidentally, my latest project with openCSV is a set of performance tests so I can find bottlenecks in the openCSV code. Once I have that done I will have a baseline I can use to test the fixes and make sure I pick the optimal one.

:)

Scott Conway - 2015-01-25

assigned_to: Scott Conway

Priority: 5 --> 3

Scott Conway - 2015-01-25

Assigning to me.


Scott Conway - 2015-02-22

Mathew - I have a fix and it is merged into the trunk. If you can, please do a git pull of the trunk, build it, and let me know if this works for you.

By default the code will behave as it does now. To keep carriage returns, use the CSVReaderBuilder to build your reader with the withKeepCarriageReturn method, like so:

CSVReaderBuilder builder = new CSVReaderBuilder(reader);
CSVReader csvReader = builder.withKeepCarriageReturn(true).build();
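For illustration only, here is a rough sketch of what an opt-in "keep carriage return" flag could mean at the line-reading level. This is not the actual opencsv implementation: the class KeepCrLineReader, its readLine method, and the keepCr parameter are all hypothetical names. It assumes lines end in \n and does not treat a lone \r as a terminator.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class KeepCrLineReader {

    // Hypothetical helper, not part of opencsv.
    // Reads characters up to the next '\n' (which is consumed but not returned).
    // When keepCr is false, a trailing '\r' is dropped as well.
    // Returns null once the input is exhausted.
    static String readLine(Reader reader, boolean keepCr) {
        try {
            int c = reader.read();
            if (c == -1) {
                return null;
            }
            StringBuilder sb = new StringBuilder();
            while (c != -1 && c != '\n') {
                sb.append((char) c);
                c = reader.read();
            }
            int last = sb.length() - 1;
            if (!keepCr && last >= 0 && sb.charAt(last) == '\r') {
                sb.deleteCharAt(last);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String data = "123\r\n4567\n";
        String kept = readLine(new StringReader(data), true);
        String stripped = readLine(new StringReader(data), false);
        System.out.println("kept ends with CR: " + kept.endsWith("\r"));
        System.out.println("stripped ends with CR: " + stripped.endsWith("\r"));
    }
}
```

Reading one character at a time is only reasonable on top of a buffered Reader; a real implementation would need to be measured against the performance baseline mentioned earlier.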

I am leaning more and more on the builders because otherwise I would have to double the number of constructors for each parameter I add.

:)


Scott Conway - 2015-03-08

status: open --> closed-fixed

Scott Conway - 2015-03-08

The fix has been released in version 3.3.
