RFC4180Parser is broken when a multi-line quoted field contains line ends...
Brought to you by: aruckerjones, sconway
When a multi-line quoted field contains a line (other than its last line) that ends with an escaped double quote (""), the parser terminates the field unexpectedly.
First, this works as expected:
a,"b""x
c",d
It gives you 1 record with 3 fields: the 1st field is a, the 3rd field is d, and the 2nd field (multi-line) is:
b"x
c
However, with a small change to the CSV (removing that extra x), parsing breaks:
a,"b""
c",d
It becomes 2 records, each with 2 fields:
First record: a, b"
Second record: c", b
In this case it's broken.
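For comparison, the expected behavior can be illustrated with a small standalone sketch. This is not opencsv's RFC4180Parser code; the class name Rfc4180Sketch and its parse method are hypothetical. It reads the whole input as one character stream, so a "" pair inside a quoted field is treated as an escaped quote regardless of whether a line break follows it, and line breaks inside quotes stay part of the field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only, not opencsv's implementation.
public class Rfc4180Sketch {

    // Parses the entire CSV text into records of fields.
    public static List<List<String>> parse(String text) {
        List<List<String>> records = new ArrayList<>();
        List<String> record = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < text.length() && text.charAt(i + 1) == '"') {
                        // "" inside quotes is an escaped quote, even when the
                        // next character is a line break.
                        field.append('"');
                        i += 2;
                        continue;
                    }
                    // A lone quote closes the quoted section.
                    inQuotes = false;
                } else {
                    // Everything else, including \r and \n, is field content.
                    field.append(c);
                }
            } else if (c == '"') {
                // Lenient: a quote outside a quoted section opens one.
                inQuotes = true;
            } else if (c == ',') {
                record.add(field.toString());
                field.setLength(0);
            } else if (c == '\n' || c == '\r') {
                // Unquoted line break ends the record; \r\n counts as one break.
                if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;
                }
                record.add(field.toString());
                field.setLength(0);
                records.add(record);
                record = new ArrayList<>();
            } else {
                field.append(c);
            }
            i++;
        }
        // Flush the last record if the input does not end with a line break.
        if (field.length() > 0 || !record.isEmpty()) {
            record.add(field.toString());
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        String works = "a,\"b\"\"x\nc\",d"; // first sample from the report
        String broke = "a,\"b\"\"\nc\",d";  // second sample from the report
        for (String input : new String[] {works, broke}) {
            List<List<String>> records = parse(input);
            System.out.println(records.size() + " record(s): " + records);
        }
    }
}
```

Under this sketch both samples yield a single record with 3 fields; the second field of the second sample is b" followed by a line break and c.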
At first I was going to say we had fixed this, but looking through the history, that was a different issue (bug 165). I was able to reproduce this with a simple unit test and will look at fixing it for the next release.
Okay, I have found a solution that makes the test pass and, more importantly, does not cause any of the existing tests to fail.
I have merged it into trunk and it will go out in the next release.
Thanks for the fix, Scott. I just ran into this issue as well. Do you know roughly when the next release is planned, or is there a patch or something I can apply in the meantime?
Sorry, no patches. In the short term, you can clone the project and build the jar file yourself.
I was going to do a release last weekend but I caught a cold. I am hoping for this weekend if life works out.
Scott :)
Haha. Thanks Scott!
The fix has been released; please download version 4.5 and test it out.
Thanks again! I've verified the fix works for me.
Thanks for the quick verification.