#140 Data loss when line separator is "\r\n"

v1.0 (example)
closed-rejected
None
2
2017-06-16
2017-02-13
Aman Kumar
No

There is no option to customize the line separator for CSVReader. Because of this I’m facing a problem of data loss.
I’m working on a Windows machine. I have a CSV file of 3 columns and 1000 records with the default field separator, i.e. “,”. Each record is on a single line and the line separator is “\r\n” (as I can see while debugging the code). When I read this file using CSVReader and write it to another file using CSVWriter, I get 1000 records of 3 columns, but the length of the file is different from the source file. If I debug this file, I see the line separator is “\n” instead of the original “\r\n”. This is because by default CSVReader ignores “\r”, so it removes every “\r” from every line.
There is a keep_carriage_return option for CSVReader. If I use this option, the last column of every record in the resulting file contains “\r”, so it corrupts the last column's data for every record and still changes the line separator from “\r\n” to “\n”.
CSVReader and CSVWriter need constructors that also accept a line separator.

Discussion

  • Scott Conway

    Scott Conway - 2017-03-19

    Please send us a data sample that shows the issue you are having so we can better diagnose the problem and/or create a unit test to recreate it.

     
  • Aman Kumar

    Aman Kumar - 2017-03-28

    Hi Scott, please find the source file. I've parsed this file using opencsv with the delimiter ",".

     
  • Aman Kumar

    Aman Kumar - 2017-03-28

    Hi Scott,

    Please find the result. You can see the data are the same, but the source and result files differ in length. I'm also using '"' as the quote char. There is a KEEP_CR option; if I use it, the last column's data comes back with the '"' (quote char).

     
  • Scott Conway

    Scott Conway - 2017-04-09

    Sorry it has been a while, but work has been very demanding lately and will probably continue that way for a while. Okay, looking at your files I see what the problem is, and it's not data loss (that was what was throwing me initially), because the values of the three columns are there. The newline that denotes the end of a line is NOT data. It is a separator that tells the system that one line has ended and another is beginning, so what follows should be treated as a new set of data.

    The problem you are having is that you have a file that uses a different newline sequence than your Java system properties are set for (System.getProperty("line.separator")), and you want to preserve that in the output file (possibly without changing your system properties each time).

    We can and cannot help you here.

    Sorry for the vague answer, but if you know what your line feed character is, then the CSVWriter has a constructor where you pass in the line feed character and it prints that at the end of each line.

    If you do not know what your source is using but want to find out, that is where we really cannot help you. For us reading and writing are separate events, and it is up to the person consuming openCSV to know the format of the file. If you were on a Unix/Linux/Mac system I would say use the file command:

    Macintosh-657:Downloads scott$ file source.txt
    source.txt: ASCII text, with CRLF line terminators
    Macintosh-657:Downloads scott$ file result.txt
    result.txt: ASCII text
    Macintosh-657:Downloads scott$

    But on Windows the only way I can think of is to open a file reader, read until a newline character, and see whether the character before it was a carriage return.
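
    The approach described above can be sketched in plain Java. This is a minimal illustration, not part of openCSV: the class name LineEndingDetector and the method detect are hypothetical, and the sketch assumes the first line terminator in the input is representative of the whole file (a line break inside a quoted field at the start of the file would mislead it).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper, not an openCSV class.
final class LineEndingDetector {

    /**
     * Returns the first line terminator found in the input:
     * "\r\n", "\n", or "\r". Returns null if there is none.
     */
    static String detect(Reader in) {
        try {
            int prev = -1;
            int c;
            while ((c = in.read()) != -1) {
                if (prev == '\r') {
                    // A CR was seen; it is CRLF only if LF follows directly.
                    return c == '\n' ? "\r\n" : "\r";
                }
                if (c == '\n') {
                    return "\n";
                }
                prev = c;
            }
            // Input ended; a trailing lone CR still counts as a terminator.
            return prev == '\r' ? "\r" : null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Sample data stands in for a real file; for a file, wrap a
        // FileReader in a BufferedReader before passing it to detect.
        String sample = "a,b,c\r\n1,2,3\r\n";
        String lineEnd = detect(new StringReader(sample));
        System.out.println("\r\n".equals(lineEnd) ? "CRLF" : "not CRLF");
    }
}
```

    The detected string could then be handed to the CSVWriter constructor that takes a line end, so the output keeps the terminator of the source.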

    Hope that helps you find the solution you are seeking.

    Scott :)

    P.S. The keepCR was added to the CSVReader because the readLine method in the BufferedReader that we use does not care what the system line feed is: "Reads a line of text. A line is considered to be terminated by any one of a line feed ('\n'), a carriage return ('\r'), or a carriage return followed immediately by a linefeed."
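
    The readLine behaviour quoted above can be seen with the JDK alone. The sketch below is only an illustration (the class name ReadLineDemo and the method readLines are hypothetical): it feeds text with mixed terminators to BufferedReader.readLine and collects the returned lines, none of which contain the terminator that ended them.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not an openCSV class.
final class ReadLineDemo {

    /** Splits text into lines exactly as BufferedReader.readLine does. */
    static List<String> readLines(String text) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // readLine has already dropped "\r\n", "\n", or "\r".
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Three records, each ended by a different terminator.
        List<String> lines = readLines("a,b\r\nc,d\ne,f\r");
        System.out.println(lines); // prints [a,b, c,d, e,f]
    }
}
```

    Because the terminator is discarded at this stage, the original line ending cannot be recovered from the returned lines themselves.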

     
  • Andrew Rucker Jones

    If there are no further comments on this ticket, I will close it in a couple of days.

     
  • Andrew Rucker Jones

    • status: open --> closed-rejected
    • assigned_to: Scott Conway
     