A cell value ending in \ (backslash) causes issues in the CSVReader
If the value of a CSV data cell is a backslash, or if the value ends in a backslash, the CSVReader continues parsing until another backslash is found, which results in multiple cells being treated as one cell.
Unit Test:
import static org.junit.Assert.assertEquals;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.util.ArrayList;
import java.util.List;
import org.junit.Test;
import au.com.bytecode.opencsv.CSVReader;
import au.com.bytecode.opencsv.CSVWriter;
public class DataReaderTest {
    @Test // this one does not work with opencsv-2.3
    public void defaultWriterDefaultReader() throws Exception {
        File file = new File("./tmp/testing.csv");
        file.getParentFile().mkdirs();
        CSVWriter writer = new CSVWriter(new BufferedWriter(new FileWriter(file)));
        writer.writeAll(getTestData());
        writer.close();
        CSVReader reader = new CSVReader(new FileReader(file));
        List<String[]> list = reader.readAll();
        reader.close();
        assertEquals(4, list.size());
    }

    @Test
    public void customWriterDefaultReader() throws Exception {
        File file = new File("./tmp/testing.csv");
        file.getParentFile().mkdirs();
        CSVWriter writer = new CSVWriter(new BufferedWriter(new FileWriter(file)), ',', '"', '\\');
        writer.writeAll(getTestData());
        writer.close();
        CSVReader reader = new CSVReader(new FileReader(file));
        List<String[]> list = reader.readAll();
        reader.close();
        assertEquals(4, list.size());
    }

    @Test
    public void defaultWriterCustomReader() throws Exception {
        File file = new File("./tmp/testing.csv");
        file.getParentFile().mkdirs();
        CSVWriter writer = new CSVWriter(new BufferedWriter(new FileWriter(file)));
        writer.writeAll(getTestData());
        writer.close();
        CSVReader reader = new CSVReader(new FileReader(file), ',', '"', '\0');
        List<String[]> list = reader.readAll();
        reader.close();
        assertEquals(4, list.size());
    }

    @Test
    public void CustomWriterCustomReader() throws Exception {
        File file = new File("./tmp/testing.csv");
        file.getParentFile().mkdirs();
        CSVWriter writer = new CSVWriter(new BufferedWriter(new FileWriter(file)), ',', '"', '\\');
        writer.writeAll(getTestData());
        writer.close();
        CSVReader reader = new CSVReader(new FileReader(file), ',', '"', '\\');
        List<String[]> list = reader.readAll();
        reader.close();
        assertEquals(4, list.size());
    }

    private List<String[]> getTestData() {
        List<String[]> list = new ArrayList<String[]>();
        list.add(new String[] {"quote\"", "escape\\", "normal"});
        list.add(new String[] {"double \"quote\"", "middle \\escape", "regular"});
        list.add(new String[] {"typical", "end escape\\", "ordinary"});
        list.add(new String[] {"one", "two", "three"});
        return list;
    }
}
Since the OpenCSV project seems to have been defunct and non-responsive for 2+ years, I've created a "reboot" of it, called simplecsv, with some differences. I tried your tests above against it. simplecsv passes the tests, as long as you specify allowUnbalancedQuotes() (one of the options I added) for the third test above. I added those tests to the simplecsv CsvReaderTest. You can take a look here: https://github.com/quux00/simplecsv
Very nice, thank you. I'll take a look at it.
On Tue, Dec 3, 2013 at 12:20 AM, Michael Peterson quux444@users.sf.net wrote:
Related Bugs: #97
I have a csv file which may contain unbalanced quotes within csv cells. I was hoping that I could use simplecsv to handle the parsing. However, when a cell contains a single double quote, the parser (which I've configured with the 'allowUnbalancedQuotes' option) merges the cell with the rest of the cells in the line.
Specific example: (uses separator character: '|', output prints out the array of tokens for the given line)
input: blah|this is a long name for this" record|blah2
output: blah, this is a long name for this" record|blah2
correct output: blah, this is a long name for this" record, blah2
Is this happening because of the same problem underlying this bug report? (symptoms are similar). Is there a fix available? If not, is there a way to correctly parse the file using simplecsv? Thanks in advance for looking into this.
For the input you are showing, you need to escape the inside quote, either with the escape character (usually \) or with an extra double quote.
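To make the two options concrete, here is a minimal sketch in plain Java with no CSV library involved. The class and method names are hypothetical, and wrapping the whole cell in quotes is an assumption made for the illustration, not something the parser is confirmed to require.

```java
// Illustration only: two common ways to protect a quote inside a cell.
// QuoteEscaping, doubled and escaped are hypothetical names, not library API.
final class QuoteEscaping {

    /** Wraps the cell in quotes and doubles every embedded quote character. */
    static String doubled(String cell, char quote) {
        String q = String.valueOf(quote);
        return q + cell.replace(q, q + q) + q;
    }

    /** Wraps the cell in quotes and prefixes embedded quote/escape characters with the escape character. */
    static String escaped(String cell, char quote, char escape) {
        StringBuilder sb = new StringBuilder().append(quote);
        for (char c : cell.toCharArray()) {
            if (c == quote || c == escape) {
                sb.append(escape);
            }
            sb.append(c);
        }
        return sb.append(quote).toString();
    }

    public static void main(String[] args) {
        String cell = "this is a long name for this\" record";
        // Same line as in the report, with '|' as the separator.
        System.out.println("blah|" + doubled(cell, '"') + "|blah2");
        System.out.println("blah|" + escaped(cell, '"', '\\') + "|blah2");
    }
}
```

Either form leaves the parser a way to tell the inner quote apart from a quote that opens or closes the cell.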
Sorry it took so long to get to these; I am still catching up. Fortunately, Jaakov created #127, which is basically the same issue.
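The problem is that when you create the file you write a field whose value is escape\ (escape\\ in the Java source). Written out, it ends in a single backslash, so what you have done is inadvertently escaped the character that follows it, and the reader no longer ends the field at the comma.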
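The effect can be reproduced with a deliberately simplified splitter. This is a sketch under stated assumptions, not the CSVReader implementation: the class and method names are hypothetical, doubled quotes are not handled, and '\0' is taken to mean "no escape character", as in the third test above.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: a stripped-down line splitter, not opencsv's parser.
final class EscapeSketch {

    /** Splits one line into fields; pass '\0' as escape to disable escape handling. */
    static List<String> split(String line, char sep, char quote, char escape) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (escape != '\0' && c == escape && i + 1 < line.length()) {
                cur.append(line.charAt(++i)); // take the next character literally
            } else if (c == quote) {
                inQuotes = !inQuotes;         // opening or closing quote
            } else if (c == sep && !inQuotes) {
                fields.add(cur.toString());   // separator outside quotes ends the field
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());
        return fields;
    }

    public static void main(String[] args) {
        String unescaped = "\"escape\\\",\"normal\"";   // on disk: "escape\","normal"
        String escaped = "\"escape\\\\\",\"normal\"";   // on disk: "escape\\","normal"
        System.out.println(split(unescaped, ',', '"', '\\').size()); // fields merge
        System.out.println(split(unescaped, ',', '"', '\0').size()); // escape disabled
        System.out.println(split(escaped, ',', '"', '\\').size());   // backslash escaped on write
    }
}
```

With the backslash as the escape character, the trailing backslash swallows the closing quote and the two cells merge; disabling the escape character on the reader, or escaping the backslash on the writer, keeps them separate.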
Good luck.
Scott Conway :)