There is no option to customize the line separator for CSVReader. Because of this I’m facing a problem of data loss.
I’m working on a Windows machine. I have a CSV file of 3 columns and 1000 records with the default separator, i.e. “,”. Each record is on a single line and the line separator is “\r\n” (as I can see while debugging). When I read this file using CSVReader and write it to another file using CSVWriter, I get 1000 records of 3 columns, but the length of the file is different from the source file. If I debug the output file, I see the line separator is “\n” instead of the original “\r\n”. This is because by default CSVReader ignores “\r”, so it removes every “\r” from every line.
There is a keep_carriage_return option for CSVReader. If I use this option, then in the resulting file the last column of every record contains “\r”. So it corrupts the last column data of every record and still changes the line separator from “\r\n” to “\n”.
CSVReader and CSVWriter need constructors that also accept a line separator.
Please send us a data sample that shows the issue you are having so we can better diagnose the problem and/or create a unit test to recreate it.
Hi Scott, please find the source file attached. I've parsed this file using opencsv with the delimiter ",".
Hi Scott,
Please find the result attached. You can see the data are the same, but there is some difference in length between the source and result files. I'm also using '"' as the quote char. There is a KEEP_CR option; if I use this option, it takes the last column data with the '"' (quote char).
Sorry it has been a while, but work has been very demanding lately and will probably continue that way for a while. Okay, looking at your files I see what the problem is, and it's not data loss. That was what was throwing me initially, because the values of the three columns are all there. The newline that denotes the end of a line is NOT data. It is a separator that tells the system that one line has ended and another is beginning, so what follows should be treated as a new set of data.
The problem you are having is that you have a file with a different newline sequence than the one your Java system properties are set for (System.getProperty("line.separator")), and you want to preserve that in the output file (possibly without changing your system properties each time).
We can and cannot help you here.
Sorry for the vague answer, but if you know what your line ending is, then the CSVWriter has a constructor where you pass in the line ending and it prints that at the end of each line.
If you do not know what your source is using but want to find out, that is where we really cannot help you. For us reading and writing are separate events, and it is up to the person consuming openCSV to know the format of the file. If you were on a Unix/Linux/Mac system I would say use the file command:
Macintosh-657:Downloads scott$ file source.txt
source.txt: ASCII text, with CRLF line terminators
Macintosh-657:Downloads scott$ file result.txt
result.txt: ASCII text
Macintosh-657:Downloads scott$
But on Windows the only way I can think of is to open a file reader, read until a newline character, and see whether the character before it was a carriage return.
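A minimal sketch of that check, using only the Java standard library. The class name LineEndingSniffer and the method detectLineEnding are hypothetical names chosen for illustration; they are not part of opencsv. The sketch reports the first line terminator it sees, so it assumes the file uses one terminator consistently and that the first line break is not inside a quoted field.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper, not part of opencsv.
class LineEndingSniffer {

    /**
     * Returns the first line terminator found in the input:
     * "\r\n", "\n", or "\r". Returns null if there is none.
     */
    static String detectLineEnding(Reader in) throws IOException {
        int prev = -1;
        int c;
        while ((c = in.read()) != -1) {
            if (c == '\n') {
                // A line feed: check whether a carriage return came right before it.
                return prev == '\r' ? "\r\n" : "\n";
            }
            if (prev == '\r') {
                // A carriage return that was not followed by a line feed.
                return "\r";
            }
            prev = c;
        }
        // End of input: a trailing lone carriage return still counts.
        return prev == '\r' ? "\r" : null;
    }

    public static void main(String[] args) throws IOException {
        // In-memory sample standing in for the real file.
        String sample = "a,b,c\r\nd,e,f\r\n";
        String ending = detectLineEnding(new StringReader(sample));
        System.out.println("CRLF detected: " + "\r\n".equals(ending));
    }
}
```

For a real file, a buffered reader (for example one obtained from java.nio.file.Files.newBufferedReader) can be passed in place of the StringReader. The detected string can then be handed to whatever writes the output file.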
Hope that helps you find the solution you are seeking.
Scott :)
P.S. The keepCR option was added to the CSVReader because the readLine method in the BufferedReader that we use does not care what the system line separator is: "Reads a line of text. A line is considered to be terminated by any one of a line feed ('\n'), a carriage return ('\r'), or a carriage return followed immediately by a linefeed."
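The readLine behaviour quoted above can be seen with a small standard-library sketch. It does not use opencsv at all; ReadLineDemo and readLines are hypothetical names for illustration. The same two records are fed in with three different terminators, and in each case readLine returns the lines without any terminator, so the original line ending is not recoverable from the returned strings.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of opencsv.
class ReadLineDemo {

    /** Reads all lines from the text using BufferedReader.readLine. */
    static List<String> readLines(String text) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line); // the terminator has already been stripped
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        List<String> crlf = readLines("a,b\r\nc,d\r\n"); // Windows style
        List<String> lf = readLines("a,b\nc,d\n");       // Unix style
        List<String> cr = readLines("a,b\rc,d\r");       // lone carriage return

        // All three inputs produce the same list of lines.
        System.out.println(crlf.equals(lf) && lf.equals(cr));
    }
}
```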
If there are no further comments on this ticket, I will close it in a couple of days.