We are using OpenCSV's CSVReader to parse a large message into an array of strings. Initially we used the default constructor. With the default constructor, '\' characters are missing from the parsed strings, because it uses '\' as the default escape character.
We went through the thread at http://sourceforge.net/p/opencsv/support-requests/5/ and modified the code as suggested there, passing '\0' as the escape character in the hope that the reader would then accept and parse all characters.
But passing '\0' as the escape character caused another, bigger issue. Our input string contains a NUL character (it is displayed as 'NUL' in Notepad++ and as '^@' in the logs on our Unix box). Whenever this character appears, CSVReader stops reading the content that follows it.
Now the problem is worse. Earlier we were only losing backslash characters, but after this code change the part of the message after the NUL character is missing altogether.
Can someone help us parse all characters using CSVReader?
I created the following unit test and it passes.
I am leaning towards the problem being in the reader you are using.
Please send a sample file with two lines of maybe two fields each: the first with your ^@ character and the other with a null (the actual alphabetic characters can be random so no security concerns are raised). Then send a sample program that tries to parse it so I can see the reader you are using and the settings you are using.
Thanks
Another thing I recommend, to rule out the reader: once you get the test above working, comment out the CSVReader. Just have a simple program that calls the reader and writes the output so you can see what your reader is doing.
If that works, wrap your reader in a BufferedReader (which is what CSVReader does), call the readLine method, print that output, and see if that duplicates the issue.
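As a rough sketch of that check, using only the JDK and no opencsv classes: the class name ReaderCheck, the helper names, and the sample text are made up for illustration, and a StringReader stands in for whatever reader you actually use. It reads each line through a BufferedReader and prints it with NUL and backslash made visible, since the console may show NUL as a blank.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for the diagnostic; not part of opencsv.
class ReaderCheck {

    /** Reads every line through a BufferedReader, with no CSV parsing involved. */
    static List<String> readLines(Reader source) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    /** Replaces NUL and backslash with visible markers so the console cannot hide them. */
    static String visible(String text) {
        return text.replace("\\", "<BACKSLASH>").replace("\0", "<NUL>");
    }

    public static void main(String[] args) {
        // Made-up sample: first line contains a NUL and a backslash.
        String sample = "\"abc\0def\",\"x\\y\"\n\"second\",\"line\"\n";
        for (String line : readLines(new StringReader(sample))) {
            System.out.println(visible(line));
        }
    }
}
```

If the markers show up in every line here but fields go missing once CSVReader is put back, the reader itself is ruled out and the parser settings are the place to look.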
package com.test.csvreader;
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import au.com.bytecode.opencsv.CSVParser;
import au.com.bytecode.opencsv.CSVReader;
public class TestCSVReader {
}
Hi
Please find the sample program I used to test the string with the 'NUL' character. It contains three print statements:
1) Prints the actual message in the file.
2) Prints the parsed output when the default escape character '\' is used. In this case any backslash characters present are lost.
3) Prints the parsed output when '\0' is used as the escape character. In this output you can see that CSVReader stopped parsing after the 'NUL' character in the file.
Note: When the file is opened in Notepad++ we can see 'NUL', but when the actual message is written to the console it appears as ' ' (blank).
Please let me know if you need more details.
Thanks very much in advance for your help.
Now I see what you are doing. Whatever character you use as the escape character, if it appears in your original input and you want it kept, then you need to escape it. So in the case of the null you have to have two nulls to have a single null show up in the output. What your code was doing was escaping the second set of quotes; this confused the parser and caused it to lose the rest of the data.
I attached a copy of your file with the double quote and wrote the following test to show what I was seeing.
private static final String TEST_FILE = "src/test/java/integrationTest/SR34/NULSpecialChar2.log";
private static final String DOUBLE_NULL_FILE = "src/test/java/integrationTest/SR34/NULSpecialChar3.log";
I did not see the ^@ in the file you sent. Was that purposeful, or did it get translated into a space at the end of the last field during upload/download?
Hi Scott,
http://en.wikipedia.org/wiki/Control_character
As per your recent reply, if we use '\0' as the escape character, there should be two 'NUL' characters in the file in order for one to appear in the parser output.
As I already mentioned, if I use the default escape character '\', 'NUL' characters are retained in the output but '\' is trimmed from it, and CSVReader is at least able to parse the whole message, whereas in the 'NUL' case CSVReader stops parsing after it encounters the NUL character.
So whichever character I pass as the escape parameter is either trimmed from the output or stops the parsing at that character.
Is there any way we can tell CSVReader to accept all kinds of characters?
Or, as you suggest, do I need to modify the file to have two NULs wherever it has one, so that one is consumed as the escape and the other appears in the output?
Please let me know. Thank you very much for all the assistance you are providing.
I understand a little better what you are asking. I am sorry, but opencsv requires four things: a Reader object, a separator character (default , ) so we can tell that a new field has started, a quote character (default " ) which tells us that everything between a pair of them is one field, and an escape character for dealing with quote or unprintable characters.
There is really no way to make any of them optional - sorry.
For what you are doing, if you cannot easily expect the source of the file to add the escape characters you need, I would consider writing a preprocessor program that adds them in where needed.
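A minimal sketch of such a preprocessor, in plain Java with no opencsv dependency: the class and method names are made up for illustration. It assumes the escape character only ever appears in the input as literal data; if the source already escapes some quotes with it, those occurrences would be doubled too and would need separate handling.

```java
// Hypothetical preprocessor; not part of opencsv.
class EscapePreprocessor {

    /**
     * Returns a copy of the input in which every occurrence of the escape
     * character is doubled, so that an escape-aware parser emits one copy.
     * Assumption: the input never uses the escape character as an escape.
     */
    static String doubleEscapes(String input, char escape) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == escape) {
                out.append(escape); // extra copy that the parser will consume
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // With the default escape character '\', each backslash is doubled.
        System.out.println(doubleEscapes("\"C:\\temp\",\"ok\"", '\\'));
    }
}
```

The same method can be called with '\0' as the second argument if NUL is kept as the escape character. The result can be wrapped in a StringReader and handed to CSVReader in place of the original reader.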
Thanks, Scott, for your quick response. It seems we have to either write a preprocessor program or have the product itself publish the message with two backslashes wherever it has one (if we go with the default escape character '\'). The same seems to be happening already for double quotes: wherever the source system has one double quote ("), the product publishes it as two double quotes (""), so that with '"' as the default quote character the parser treats it as one double quote.