Tracker: Bugs

5 parser - ID: 3439691
Last Update: Comment added ( hanbo2854 )

input: String line = "hi, \"hello, hanbo\", 30";
output:
token_1:hi
token_2: "hello, hanbo
token_3: 30

CSVParser csvParser = new CSVParser(
CSVParser.DEFAULT_SEPARATOR, CSVParser.DEFAULT_QUOTE_CHARACTER,
CSVParser.DEFAULT_ESCAPE_CHARACTER,
CSVParser.DEFAULT_STRICT_QUOTES,
false);

The input is not well formed for the parser, but I think the output is also wrong.
The parser should either throw an exception or output token_2: hello, hanbo

How can this be explained?
Thank you.
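
The output listing above does not make it obvious whether the space after each comma is part of the token. A minimal sketch of a helper that prints each token in brackets so leading whitespace stays visible. TokenDump and describe are hypothetical names, not part of the CSV library, and the array in main only stands in for the parser result shown in this report:

```java
// Hypothetical helper, not part of the CSV library discussed here.
final class TokenDump {

    // Formats each token as token_N:[value] so that leading or trailing
    // whitespace inside a token is visible in the printed output.
    static String describe(String[] tokens) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            out.append("token_").append(i + 1)
               .append(":[").append(tokens[i]).append("]\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the tokens reported in this ticket (assumed from the listing).
        String[] reported = {"hi", " \"hello, hanbo", " 30"};
        System.out.print(describe(reported));
    }
}
```

Printed this way, token_2 shows up as [ "hello, hanbo], which makes the leading space and the missing closing quote easy to spot.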


hanbo ( hanbo2854 ) - 2011-11-17 19:55:00 PST

5

Open

None

Nobody/Anonymous

None

None

Public


Comments ( 2 )

Date: 2011-11-20 22:31:08 PST
Sender: hanbo2854

Lines 228-244 in file CSVParser.java

Updated as follows:
// the tricky case of an embedded quote in the middle: a,bc"d"ef,g
if (!strictQuotes) {
    if (i > 2 // not on the beginning of the line
            && nextLine.charAt(i - 1) != this.separator // not at the beginning of an escape sequence
            && nextLine.length() > (i + 1)
            && (nextLine.charAt(i + 1) != this.separator
                    || !inQuotes) // modified; not at the end of an escape sequence
    ) {
        if (ignoreLeadingWhiteSpace && sb.length() > 0
                && isAllWhiteSpace(sb)) {
            sb.setLength(0); // discard white space leading up to quote
        } else {
            sb.append(c);
            continue; // added
        }
    }
}

With this change it works:
input: 1, \"2\",3
output:
token_1:1
token_2: "2"
token_3:3

But I think this input string is invalid for the CSVParser
when ignore leading white space is set to false.


Date: 2011-11-18 13:24:59 PST
Sender: sconway (Project Admin)

Good catch. This is a bug, but not in the way you think it is.

The problem is that in the last parameter you set ignore leading white
space to false. In your input line there is a space between the separator
and the quote (\"), so if you look at your result string for the second
token it is actually <space><quote>hello, hanbo

What it should be is <space><quote>hello, hanbo<quote>. Because a quote did
not start the token, a quote does not end the token (it is the comma
afterwards that does that). So the closing quote should be taken as a
literal part of the token, same as the first quote.

If you turn on ignore leading white space (which is the default) then you
would get the string as you expected it.

Time permitting, I will try to look into this before the holidays.
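
The behaviour described in this comment can be sketched in a few lines. This is a simplified, self-contained illustration and not the CSVParser implementation: QuoteSketch and split are hypothetical names, escape characters and strict quotes are not handled, and the sketch assumes that leading white space is only discarded when it directly precedes a quote.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the quoting rule described above; not the library code.
final class QuoteSketch {
    private static final char SEPARATOR = ',';
    private static final char QUOTE = '"';

    static List<String> split(String line, boolean ignoreLeadingWhiteSpace) {
        List<String> tokens = new ArrayList<>();
        StringBuilder sb = new StringBuilder();
        boolean inQuotes = false;         // separators are literal while true
        boolean dropClosingQuote = false; // true if the opening quote started the token
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == QUOTE) {
                if (!inQuotes) {
                    boolean atTokenStart = sb.length() == 0
                            || (ignoreLeadingWhiteSpace && isAllWhiteSpace(sb));
                    if (atTokenStart) {
                        sb.setLength(0);      // discard white space leading up to quote
                        dropClosingQuote = true;
                    } else {
                        sb.append(c);         // quote did not start the token: keep it
                        dropClosingQuote = false;
                    }
                } else if (!dropClosingQuote) {
                    sb.append(c);             // matching literal quote is kept as well
                }
                inQuotes = !inQuotes;
            } else if (c == SEPARATOR && !inQuotes) {
                tokens.add(sb.toString());    // separator outside quotes ends the token
                sb.setLength(0);
            } else {
                sb.append(c);
            }
        }
        tokens.add(sb.toString());
        return tokens;
    }

    private static boolean isAllWhiteSpace(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isWhitespace(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String line = "hi, \"hello, hanbo\", 30";
        System.out.println(split(line, false));
        System.out.println(split(line, true));
    }
}
```

Under these assumptions, with the flag set to false the second token comes out as <space><quote>hello, hanbo<quote>, and with the flag set to true it comes out as hello, hanbo. In both cases the third token keeps its leading space because no quote follows it.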


Attached File ( 1 )

Filename Description
ParserTest.java code file

Change ( 1 )

Field Old Value Date By
File Added 428720: ParserTest.java 2011-11-17 19:55:02 PST hanbo2854