Tracker: Bugs

9 Opencsv parsing error for double quotes - ID: 3402853
Last Update: Comment added ( kazemnaderi )

I am using opencsv 2.3.
It handles parsing over 600,000 records in a csv file all fine except for the first record below:

"id","email","profile_id","first_name","last_name"
"475592","rbst218@yahoo.com","bc1er1163","\""CHia Sia Ta",""
"12345",''kazem.naderi@tt.com","bc234a","kazem","naderi"

It seems opencsv gets confused with the value of first_name (disymmetric double quotes at the beginnnig of the value)

You can save the above two lines in a file let's say badProfiles.csv and then using the code below try to parse the record. You will see that it cannot parse it correctly

public static List<UpakneeUserBean> parseCvSIntoBeans2(String filePath)
throws FileNotFoundException {
CSVReader reader = new CSVReader(new FileReader(filePath));
HeaderColumnNameMappingStrategy<UserBean> strat = new HeaderColumnNameMappingStrategy<UserBean>();
strat.setType(UserBean.class);
CsvToBean<UserBean> csv = new CsvToBean<UserBean>();
return csv.parse(strat, reader);
}

I have attached the UserBean.java file to this bug report.

Thanks for your help


kazem ( kazemnaderi ) - 2011-09-01 16:02:28 PDT

9

Open

None

Scott Conway

None

None

Public


Comments ( 6 )

Date: 2011-09-28 10:13:28 PDT
Sender: kazemnaderi

Hello,

Could you please let me know when you can fix this issue?


Thanks
Kazem



Date: 2011-09-08 10:35:11 PDT
Sender: kazemnaderi

Hi,
I do not think what you currently have in your system is a bug as a double
quote used to escape another double quote. By default your parser allows
using double quotes as text qualifier. In these cases usually double quotes
within a field get escape with another double quote (As Excel does the
same). Please see the references below:

http://creativyst.com/Doc/Articles/CSV/CSV01.htm
Fields that contain double quote characters must be surounded by
double-quotes, and the embedded double-quotes must each be represented by a
pair of consecutive double quotes.
So, John "Da Man" Doe would convert to "John ""Da Man""",Doe, 120 any
st.,...

http://www.csvreader.com/csv_format.php
In an Excel escaped CSV file, in fields containing a double quote, the
double quote must be escaped by replacing the single double quote with two
double quotes.
unix escaping VS Excel Escaping

http://en.wikipedia.org/wiki/Comma-separated_values
More sophisticated CSV implementations permit commas and other special
characters in a field value. Many implementations use " (double quote)
characters around values that contain reserved characters (such as commas,
double quotes, or newlines);

Going back to the issue we are dealing with I think still the parser has a
bug. The same way it accepts double quotes as an escape char for another
double quote it should be able to parse the user.CSV file successfully with
out any need to use a backslash.

Thanks again for you help on this

Best Regards
Kazem


Date: 2011-09-07 17:03:26 PDT
Sender: sconwayProject Admin

Ahh in that case the name should be "\\\"CHia Sia Ta". You can check out
the updated file from subversion to see the test. By default the
backslash is the escape character so you use a double backslash to create a
single backslash for output then a \" to get the ". because your input had
"" opencsv incorrectly assumed that you finished one filed and started
another - this caused it to think you had one line (with a field
containing a newline character) instead of two and that is why the assert
on the number of lines failed.

There is another defect in the system about allowing a double quote to
escape another double quote regardless of escape character but I have not
had a chance to look at it yet.

BTW thanks for the compliment. TDD has saved me many times and I am a firm
believer in it.

Let me know if this works for you and I will close out the defect.

:)


Date: 2011-09-07 10:49:26 PDT
Sender: kazemnaderi

Hello,

Thanks for looking into this. Sorry that the second line had a typo.
Before I continue I'd like to tell you that I am impresses with the way you
stick to Test Driven Development. I am a fan.

Getting back to the issue, I can see that you corrected it in user.csv.
The error comes from the first line. The first name in our system is
\"CHia Sia Ta (strange for being a first name). The backslash and the
double quote are part of the first name.

In the csv file the first name is specified as: ,"\""CHia Sia Ta",
As you can see the double quote is escaped with another double quote and I
believe this is correct.
This is what we have as the user.csv:

"id","email","profile_id","first_name","last_name"
"475592","rbst218@yahoo.com","bc1er1163","\""CHia Sia Ta",""
"12345","kazem.naderi@tt.com","bc234a","kazem","naderi"


The Parser fails on parsing the first name and your unit test fails on:

assertEquals(2, list.size());


I appreciate your help on this
Kazem


Date: 2011-09-05 20:49:51 PDT
Sender: sconwayProject Admin

Please check out the latest trunk version (106) and in the Integration test
there is now an issue3402853. Please look at the data in user.csv and the
unit test and tell me what should it be?

There were two problems with the data you put in the bug. The first was a
simple typo - for kazem email address you accidentally used two single
quotes instead of the double quote.

But for the first line you need to tell me what you are expecting the
output to be because the string you provided is definitly an error.

If you want the name to be "CHia Sia Ta then remove the extranous double
quote "\"CHia Sia Ta" if you want it to be ""CHia Sia Ta then you need to
escape both quotes "\"\"CHia Sia Ta"


Date: 2011-09-01 16:06:52 PDT
Sender: kazemnaderi

I changed the priority because this error breaks our scheduled job to parse
the CSV file and insert the result into DB. Currently we have to by pass
records similar to the one reported in this bug report entry


Attached File ( 1 )

Filename Description Download
UserBean.java Download

Changes ( 4 )

Field Old Value Date By
priority 8 2011-09-28 10:13:28 PDT kazemnaderi
assigned_to nobody 2011-09-07 10:49:26 PDT kazemnaderi
priority 5 2011-09-01 16:06:52 PDT kazemnaderi
File Added 422607: UserBean.java 2011-09-01 16:02:31 PDT kazemnaderi