#33 Error reading Unicode encoded text file

Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2011-04-12
Created: 2011-04-12
Creator: Gary Lynch
Private: No

I'm trying to read a CSV file that is encoded using 16-bit Unicode characters rather than 8-bit ASCII. The error I'm getting is listed below. Is there any way to make CsvJdbc work with Unicode?

Thanks
Gary

java.sql.SQLException: Unexpected '"' in position 1. Line= "P a r t N o " , " D e s c r i p t i o n " , " C o u n t " , " L o c a t i o n "
at org.relique.jdbc.csv.CsvRawReader.parseCsvLine(CsvRawReader.java:302)
at org.relique.jdbc.csv.CsvRawReader.getNextDataLine(CsvRawReader.java:223)
at org.relique.jdbc.csv.CsvRawReader.<init>(CsvRawReader.java:140)
at org.relique.jdbc.csv.CsvStatement.executeQuery(CsvStatement.java:382)
at TestCSV.main(TestCSV.java:19)
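
The spaced-out letters in the error message are what one would expect if UTF-16 bytes were decoded one byte per character. The following is a minimal sketch of that effect using only the Java standard library. It is not CsvJdbc code: the class and method names are hypothetical, and the little-endian byte order is an assumption, since the ticket does not say which byte order the file uses.

```java
import java.nio.charset.StandardCharsets;

public class Utf16Misread {
    // Encodes the text as UTF-16 (little-endian, an assumption) and then
    // decodes the bytes as ISO-8859-1, i.e. one byte per character, which is
    // roughly what a reader expecting an 8-bit encoding would do.
    static String misread(String text) {
        byte[] utf16 = text.getBytes(StandardCharsets.UTF_16LE);
        return new String(utf16, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String wrong = misread("\"PartNo\",\"Count\"");
        // Every second character is NUL; show it as a space so it is visible.
        System.out.println(wrong.replace('\0', ' '));
    }
}
```

Each ASCII character comes back followed by a NUL character, so a parser that expects a separator or more field text right after a quote sees an unexpected character instead.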

Discussion

  • Gary Lynch

    Gary Lynch - 2011-04-12

    Unicode encoded text file

     
  • Mario Frasca

    Mario Frasca - 2011-06-13

    The driver does work with Unicode, but only for the UTF-8 encoding.
    See the sample5.csv test data file.
    What you have run into is that the UTF-16 encoding is not supported.

    I don't know which is more reasonable: requiring you to save your data in UTF-8, or adding code to (recognize and) handle UTF-16. A simple `encoding` property would probably suffice. Let me know how important this is to you, and whether you are able to program it yourself and provide me with a patch. If your patch includes unit tests, it will be easier for me to include it in the driver.

     
  • Gary Lynch

    Gary Lynch - 2011-06-13

    Hi, thanks for the response.

    Unfortunately the data comes from an external system that will only output in UTF-16.

    I am now using opencsv (http://opencsv.sourceforge.net), which is sufficient for my needs and allows me to specify an encoding in the InputStreamReader.

    Thanks again.
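
A minimal sketch of the InputStreamReader approach mentioned above, using only the Java standard library. The class and method names are hypothetical, and the sample data is made up for illustration. Java's "UTF-16" charset uses the byte order mark to pick the byte order when decoding and assumes big-endian if none is present.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Utf16Lines {
    // Reads all lines from a stream, decoding the bytes as UTF-16.
    static List<String> readLines(InputStream in) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_16))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Illustrative in-memory data; a real program would open a FileInputStream.
        byte[] sample = "\"PartNo\",\"Count\"\n\"A1\",\"3\"\n"
                .getBytes(StandardCharsets.UTF_16);
        for (String line : readLines(new ByteArrayInputStream(sample))) {
            System.out.println(line);
        }
    }
}
```

The same InputStreamReader can be handed to any CSV parser that accepts a java.io.Reader instead of being read line by line.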

     
  • Simon Chenery

    Simon Chenery - 2012-01-06

    Set the database property "charset" to define the character set of
    the files you are reading. I successfully queried your example
    Sample_Schedule.txt CSV file using the following code:

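    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import java.util.Properties;

    Class.forName("org.relique.jdbc.csv.CsvDriver");
    Properties props = new Properties();
    props.put("charset", "UTF-16");          // decode the files as UTF-16
    props.put("skipLeadingLines", "3");      // skip the first three lines of each file
    props.put("fileExtension", ".txt");      // look for .txt files
    props.put("suppressHeaders", "true");    // do not read column names from the file
    props.put("headerline", "C1,C2,C3,C4");  // supply the column names instead
    // /tmp is the directory containing Sample_Schedule.txt
    Connection conn = DriverManager.getConnection("jdbc:relique:csv:/tmp", props);
    Statement stmt = conn.createStatement();
    ResultSet results = stmt.executeQuery("SELECT * FROM Sample_Schedule");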

    I added a description of the charset property to the csvjdbc web page.

    Files changed:
    website/www/index.html

     