CSVReader cannot properly parse a simple CSV file with empty quoted cells:
"col1";"col2"
"";1
"";2
The third line is returned as ";2
JUnit test to reproduce the error:
@Test
public void testASingleQuoteAsDataElementWithEmptyField2() throws IOException {
    StringBuilder sb = new StringBuilder(CSVParser.INITIAL_READ_SIZE);
    sb.append("\"\";1").append("\n"); // "";1
    sb.append("\"\";2").append("\n"); // "";2
    CSVReader c = new CSVReader(new StringReader(sb.toString()), ';', '\"');
    // First line: an empty quoted field followed by 1
    String[] nextLine = c.readNext();
    assertEquals(2, nextLine.length);
    assertEquals(0, nextLine[0].length());
    assertEquals("1", nextLine[1]);
    // Second line: an empty quoted field followed by 2
    nextLine = c.readNext();
    assertEquals(2, nextLine.length);
    assertEquals(0, nextLine[0].length());
    assertEquals("2", nextLine[1]);
}
I would like to read CSV files according to RFC4180 (http://tools.ietf.org/html/rfc4180).
My patch is attached:
- three quote modes control the parsing: STRICT_QUOTE, TRICKY_QUOTE and RFC_4180
- the CSV file mentioned above can be parsed in RFC_4180 mode
Thank you for your great software,
Béla Boros
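To illustrate the RFC 4180 quoting rules the report refers to, here is a minimal, self-contained sketch of a single-line field splitter. It is not the attached patch and not openCSV code: the class name Rfc4180LineSketch and the method parseLine are hypothetical, and the sketch assumes a single-character separator and quote, no multi-line fields, and no validation of malformed input.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of openCSV or of the attached patch.
class Rfc4180LineSketch {

    // Splits one line into fields using RFC 4180-style quoting:
    // a doubled quote is a literal quote only INSIDE a quoted field,
    // so an opening quote immediately followed by a closing quote is an empty field.
    static List<String> parseLine(String line, char separator, char quote) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == quote) {
                    if (i + 1 < line.length() && line.charAt(i + 1) == quote) {
                        field.append(quote); // doubled quote inside quotes: literal quote
                        i++;                 // skip the second quote of the pair
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    field.append(c);         // separators inside quotes are plain data
                }
            } else if (c == quote) {
                inQuotes = true;             // opening quote
            } else if (c == separator) {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());        // last field on the line
        return fields;
    }
}
```

Under these assumptions, parseLine("\"\";2", ';', '"') yields two fields, an empty string and 2, which is the result the JUnit test above expects for the third line of the file.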
With its default settings, openCSV takes the empty double-quoted string as an escaped double quote (also per RFC_4180), so it then picks up the separator as part of the field.
To get the results you want, turn off strict quotes. The following test passes and is, I believe, what you want.
@Test
public void issue93ParsingEmptyDoubleQuoteField() throws IOException {
    CSVParserBuilder builder = new CSVParserBuilder();
    CSVParser parser = builder.withStrictQuotes(false).build();
    // "",2
    String[] nextLine = parser.parseLineMulti("\"\",2");