#45 Why and how does CSVReader take multiple lines as single line when it discovers \" in a field?

v1.0 (example)
closed
5
2017-03-04
2016-09-18
No

I am using au.com.bytecode.opencsv.CSVReader to read a CSV file and print all the records one by one. The code is behaving strangely: it prints a group of lines together as a single line, and then prints the next set of lines correctly.

Link to the CSV File

Please download the CSV file from the link above. My code treats everything from the first non-header line up to the line just above the one with the content below as a single line:

12/4/13: Changed AO to Chief Financial officer.","07/18/2016",

Also, my first data line contains \" in one of the fields. You can use Ctrl+F with \" to find it. If I remove the \ from the field, it works fine. My question is: what logic is CSVReader using to end the first line as described above? Why does it end the line just before the line that has the content below:

12/4/13: Changed AO to Chief Financial officer.","07/18/2016",

It starts a new line from '12/4/13.........'. Also, the individual lines below that are read as separate lines perfectly.

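Code for your reference:

import au.com.bytecode.opencsv.CSVReader;
import au.com.bytecode.opencsv.bean.ColumnPositionMappingStrategy;
import java.io.FileReader;
import java.util.Arrays;

// Separator ',', quote character '"', skip 1 header line
CSVReader reader = new CSVReader(new FileReader(fileNameWithLocation), ',', '"', 1);

ColumnPositionMappingStrategy<DomainObj> mappingStrategy =
        new ColumnPositionMappingStrategy<DomainObj>();
mappingStrategy.setType(DomainObj.class);

String[] nextLine;

// readNext() returns one parsed record per call, or null at end of file
while ((nextLine = reader.readNext()) != null)
{
    log.debug("Next line : " + Arrays.toString(nextLine));
}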

Discussion

  • Scott Conway

    Scott Conway - 2016-09-18

    Hello Satej

    The reason it reads multiple lines is that we need to allow for data that has newlines in the fields. So in quoted data, when you reach the end of the line and the field has not been closed (no closing quotation mark), opencsv will read the next line and keep filling in that field. You can see that this is the case in your file by looking at the line above the one you listed: put together, you will see the two really do make one row of data.

    ,,"440063","DSH440063B","39066","DSH","True","01/01/2014","10/01/2016","12",,,"JOHNSON CITY MEDICAL CENTER","Regional Cancer Center @ Johnson City Medical Center","2205 Pavilion Drive","Suite 101","Kingsport","TN","37660","4641",,,,,,,,,,,,,,,,,,"Shane E. Hilton","Chief Financial Officer","4234311038",,"Trish Tanner","Corp. Director, Consumer Health Svcs","4233023532",,"TRISH TANNER","SYSTEM SERVICES DIRECTOR, PHARMACY SERVICES","10/10/2013","4233023532",,,,,,,,,,,,,,"08/07/2015","False",,"12/3/13 I'm not sure that AO/SBO is at high enough level, pls chk
    12/4/13: Changed AO to Chief Financial officer.","07/18/2016",

    Notice that the line above ended with pls chk but no closing quote, so opencsv will read the next line and append its first part to the open field.

    Quotes that are part of the data must be escaped - hence the \".

    Hope that helps.

    Scott Conway :)
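
To make the mechanism above concrete, here is a minimal sketch in plain Java of how a reader might group physical lines into records by tracking quote state. It is a simplified model, not opencsv's actual code: the class name QuoteStateSketch, both method names, and the sample lines are made up for illustration, and the rule that the escape character only takes effect before a quote or another escape character is an assumption. It suggests one possible reading of the behaviour in the question: if a backslash sits directly before a field's closing quote, that quote is consumed as data, the open/closed state is inverted from then on, and the record ends at the first line with an odd number of quotes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified, hypothetical model of quote tracking; not opencsv's implementation.
class QuoteStateSketch {

    // Returns true if, after scanning the line, we are still inside a quoted field.
    static boolean endsInsideQuotes(String line, boolean startInQuotes, char quote, char escape) {
        boolean inQuotes = startInQuotes;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            boolean hasNext = i + 1 < line.length();
            if (escape != '\0' && c == escape && hasNext
                    && (line.charAt(i + 1) == quote || line.charAt(i + 1) == escape)) {
                i++; // assumption: the escaped character is data and never toggles state
            } else if (c == quote) {
                inQuotes = !inQuotes;
            }
        }
        return inQuotes;
    }

    // Joins physical lines until a line ends outside a quoted field.
    static List<String> groupRecords(List<String> lines, char quote, char escape) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (String line : lines) {
            if (current.length() > 0) {
                current.append('\n');
            }
            current.append(line);
            inQuotes = endsInsideQuotes(line, inQuotes, quote, escape);
            if (!inQuotes) {
                records.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            records.add(current.toString()); // unterminated trailing record
        }
        return records;
    }

    public static void main(String[] args) {
        // Made-up sample: line 1 has a field ending in a backslash,
        // lines 3 and 4 together hold one genuine multi-line field.
        List<String> lines = Arrays.asList(
                "\"a\",\"C:\\dir\\\",\"c\"",
                "\"x\",\"y\",\"z\"",
                "\"p\",\"multi",
                "line\",\"q\"");
        System.out.println(groupRecords(lines, '"', '\\').size()); // prints 2
        System.out.println(groupRecords(lines, '"', '\0').size()); // prints 3
    }
}
```

With backslash as the escape character, the first three sample lines collapse into one record and the fourth is left dangling. With the escape character set to '\0' the backslash is plain data, so the first two lines are separate records and the last two form the intended multi-line record.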

     
  • Satej Laxman Koli

    Thanks a lot, Scott, for responding quickly. I understood that because of the \" it will keep on adding lines, but this should happen until it finds a closing \". As the file does not have any closing \", it should have gone to the end of the file and made it a single line. If you could elaborate on why it ended the line at 'pls chk' only, that would be great. (Also, I haven't fully understood the last part of the answer :) )

    Also, what would be the ideal code change to fix this without changing the CSV file? I have made the change below and it's working fine for me:

    CSVReader reader = new CSVReader(new FileReader(fileNameWithLocation), ',', '"', '\0',1);

    Got the above code from here: http://stackoverflow.com/questions/14819626/how-to-read-a-string-containing-a-using-opencsv

    It takes \ as part of the field, which is what we expect.

    Thanks :)

    Satej....

     
  • Satej Laxman Koli

    Also, I would like to add that the line below is treated as a complete new line:
    12/4/13: Changed AO to Chief Financial officer.","07/18/2016",

     
  • Scott Conway

    Scott Conway - 2016-09-21

    Ahhh - now your question is making a little more sense. I apologize on two counts: first, for not responding quickly, as work and life have both been a little busy the last couple of days; and second, for just looking at the line in the data file you sent without taking the whole question into context.

    Honestly, I am surprised it made it that far. My guess is that if you looked through the parsed lines you would see that it messed up at the first line and never recovered. The reader you got from the previous article looks okay to me: what is going on when you create the reader that way is that you null out the escape character. The other option, if you were the one who generated the data file, would be to escape the backslash character (make it \\) so opencsv knows that the backslash is supposed to be part of the field instead of escaping the quote character.

    I was going to suggest upgrading to opencsv 3.8 but I am sure it would have the same issue.

    Scott :)

     
  • Satej Laxman Koli

    Thanks a lot Scott :) Unfortunately, I am not the generator of the file. But let me see how this can be resolved. And yes, I had checked it with 3.8 also and it did not work.

    I appreciate your valuable suggestions and thank you again for the guidance.

     
  • Scott Conway

    Scott Conway - 2017-01-31

    Hello Satej - We have just pushed out opencsv 3.9, so please give that a try. Use the RFC4180Parser, as it does not have or support an escape character, so I do not believe it will experience the issue.
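
As a rough illustration of why a parser without an escape character would not be thrown off by a trailing backslash, here is a minimal sketch of RFC 4180 style field splitting in plain Java. It is not opencsv's RFC4180Parser: the class Rfc4180Sketch and the method parseRecord are hypothetical names, the separator and quote character are hard-coded, and the input is assumed to be one complete record. Under these rules the only way to put a quote inside a quoted field is to double it, and a backslash is ordinary data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of RFC 4180 style parsing; not opencsv's RFC4180Parser.
class Rfc4180Sketch {

    // Splits one complete record into fields using ',' and '"'.
    static List<String> parseRecord(String record) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < record.length() && record.charAt(i + 1) == '"') {
                        field.append('"'); // doubled quote is a literal quote
                        i++;
                    } else {
                        inQuotes = false;  // closing quote
                    }
                } else {
                    field.append(c);       // backslashes, commas and newlines are data
                }
            } else if (c == '"') {
                inQuotes = true;           // opening quote
            } else if (c == ',') {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());
        return fields;
    }

    public static void main(String[] args) {
        // Made-up sample: the second field ends with a backslash.
        System.out.println(parseRecord("\"a\",\"C:\\dir\\\",\"c\""));
        // A doubled quote inside a quoted field yields one literal quote.
        System.out.println(parseRecord("\"say \"\"hi\"\"\",x"));
    }
}
```

In the first sample the backslash before the closing quote stays in the field and the quote still closes it, so the record splits into three fields.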

     
  • Scott Conway

    Scott Conway - 2017-03-04
    • status: open --> closed
    • assigned_to: Scott Conway
     
  • Scott Conway

    Scott Conway - 2017-03-04

    The RFC4180Parser in the 3.9 release should fix this issue.

     
