UCanAccess / Discussion / Help: How to pass the DB character encoding to the driver?

i-blis - 2015-10-28

I want to read an .mdb file whose data is windows-1256 encoded. When using Jackcess directly, I invoke the setCharset method on the DatabaseBuilder object and get the rows properly decoded.

When using the UCanAccess driver (either directly from Java or with a GUI client like SQLWorkBench), I get nothing but garbage: the driver probably assumes that the data is UTF-8.

I was wondering whether there is an undocumented connection property I could set. I grepped the source files and skimmed through the tests but couldn't find anything. Looking at DefaultJackcessOpener.java, it seems that you don't set any options (apart from handling the read-only/read-write stuff).

Is there any way to pass the DB character encoding to the driver? Or, in the worst case, to decode afterwards? Any ideas, anyone?
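
The kind of garbage described above can be reproduced without any database at all. The following is a minimal sketch, assuming only that the column bytes are windows-1256 and that something along the way decodes them as UTF-8; the class and method names are made up for illustration, and it says nothing about what UCanAccess does internally.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WrongCharsetDemo {

    // Decodes raw bytes with the given charset; malformed input becomes U+FFFD.
    public static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        Charset windows1256 = Charset.forName("windows-1256");
        // Arabic sample text, written with Unicode escapes so the result does
        // not depend on the encoding of this source file.
        String original = "\u0627\u0644\u0642\u0631\u0622\u0646";

        // Simulate the bytes as they would sit in a windows-1256 encoded column.
        byte[] stored = original.getBytes(windows1256);

        String asUtf8 = decode(stored, StandardCharsets.UTF_8);
        String as1256 = decode(stored, windows1256);

        System.out.println("decoded as UTF-8:        " + asUtf8);
        System.out.println("decoded as windows-1256: " + as1256);
        System.out.println("round trip intact:       " + original.equals(as1256));
    }
}
```

Decoding with the wrong charset is lossy once replacement characters appear, which is why fixing the charset at open time is preferable to trying to re-decode the strings afterwards.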

Marco Amadei - 2015-10-28

Yes, your idea is the way to go. You can pass a different opener implementation as a connection parameter. See the UCanAccess site for how to do it, and please let me know your findings.

Last edit: Marco Amadei 2015-10-28

- i-blis - 2015-10-29
  
  Thank you for the swift answer. I am giving it a shot right now.
  
  package net.ucanaccess.jdbc;

  import java.io.File;
  import java.io.IOException;
  import java.nio.charset.Charset;
  import com.healthmarketscience.jackcess.Database;
  import com.healthmarketscience.jackcess.DatabaseBuilder;
  import net.ucanaccess.jdbc.JackcessOpenerInterface;

  public class JackcessWithCharsetOpener implements JackcessOpenerInterface {
      public Database open(File f, String cs) throws IOException {
          DatabaseBuilder db = new DatabaseBuilder(f);
          // dbd.setAutoSync(false); // do we want this?
          db.setCharset(Charset.forName(cs));
          try {
              db.setReadOnly(false);
              return db.open();
          } catch (IOException e) {
              db.setReadOnly(true);
              return db.open();
          }
      }
  }
  
  Does it sound right?
  
  Also I was wondering, how do I pass arguments from the connection string to my custom class?
  
  "jdbc:ucanaccess:///path/to/dbfile.mdb;jackcessOpener=net.ucanaccess.jdbc.JackcessWithCharsetOpener"
  
  With properties?
  
  Last edit: i-blis 2015-10-29
  
Marco Amadei - 2015-10-29

Yes, it sounds right (but I wouldn't use the net.ucanaccess.jdbc package name), and if you want you can also put the
jackcessOpener=net.ucanaccess.jdbc.JackcessWithCharsetOpener
entry into the Properties passed to the DriverManager.getConnection method.
Cheers, Marco
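
The Properties variant mentioned here could look roughly like the sketch below. The jackcessOpener key and the jdbc:ucanaccess:// URL prefix come from this thread; the package name com.example and the helper names are placeholders, and connect() will only work with the UCanAccess jars and the opener class on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

public class OpenerConnectionSketch {

    // Builds the connection properties carrying the custom opener class name.
    public static Properties openerProperties(String openerClassName) {
        Properties props = new Properties();
        props.setProperty("jackcessOpener", openerClassName);
        return props;
    }

    // Opens a connection using the standard JDBC call that accepts Properties.
    // Not exercised in main(), because it needs the driver at runtime.
    public static Connection connect(String dbPath, String openerClassName) throws SQLException {
        String url = "jdbc:ucanaccess://" + dbPath;
        return DriverManager.getConnection(url, openerProperties(openerClassName));
    }

    public static void main(String[] args) {
        // Placeholder class name; use the fully qualified name of your own opener.
        Properties props = openerProperties("com.example.JackcessWithCharsetOpener");
        System.out.println("jackcessOpener=" + props.getProperty("jackcessOpener"));
    }
}
```

Keeping the opener name in Properties rather than in the URL is mostly a matter of taste; GUI clients that only accept a URL would still need the ;jackcessOpener=... suffix.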

i-blis - 2015-10-29

Thank you again, Marco.

Sorry for hijacking your namespace; I thought you would want to merge the class into your code base, as many users may want to be able to set the charset.

My question regarding passing arguments was: how do I include the charset argument (in my case "windows-1256") in the connection string (so I can use this with any client)?

Thanks in advance.

I'll post the whole thing once I get it working.

Regards, Igor.

Last edit: i-blis 2015-10-29

Marco Amadei - 2015-10-29

Hard-code the charset in your code; don't take it from the cs parameter. Your implementation is tied to one specific charset.

i-blis

I made a custom implementation for the charset. It gets loaded properly. Still, I can't get the rows to be properly decoded. Setting the charset in exactly the same way when using Jackcess directly gets the rows properly decoded. Any ideas?

Here's the code again:

package ucaextension;

import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import com.healthmarketscience.jackcess.Database;
import com.healthmarketscience.jackcess.DatabaseBuilder;

import net.ucanaccess.jdbc.JackcessOpenerInterface;

public class JackcessWithCharsetW1256Opener implements JackcessOpenerInterface {
    public Database open(File f, String pwd) throws IOException {
        DatabaseBuilder db = new DatabaseBuilder(f);
        db.setCharset(Charset.forName("windows-1256"));
        try {
            db.setReadOnly(false);
            return db.open();
        } catch (IOException e) {
            db.setReadOnly(true);
            return db.open();
        }
    }
}

Marco Amadei - 2015-10-30

Hmm... it should work. Could you upload a small part of your DB (just copy a table for testing with some fake data, but including any characters you see improperly decoded)?
If I have to do something more, I'm close to a release and can do it soon.

Last edit: Marco Amadei 2015-10-30

i-blis - 2015-10-30

Oh, sure: https://curl.io/get/cmggvjx1/1bfca37a09849ef5c147c854796bf789fd617ec0

Column "bk" of table "0bok" is Arabic text encoded as Windows-1256. It would be nice if you could have a look... Thanks again.

Notes:

First row's bk should be "القرآن ونقض مطاعن الرهبان" (this is what I get with Jackcess) and I get "�� " with UCanAccess.

The mdb file is an Access 97 file, and Jackcess can open it only in read-only mode, not in read-write mode.

Last edit: i-blis 2015-10-30

- Gord Thompson - 2015-10-30
  
  Your code seems to be working fine for me. Did you remember to append
  
  ;jackcessOpener=ucaextension.JackcessWithCharsetW1256Opener
  
  to your connection URL?
  
  Last edit: Gord Thompson 2015-10-30
  
i-blis - 2015-11-02

Yes, I did. As I said, it finds the class. But decoding doesn't happen.

Did you do something special when building and deploying?

Thanks again.

PS: I now notice that decoding works fine when using SQLWorkBench but not when calling the driver directly from Java (which is basically what I am doing). Any ideas or pointers?

- Gord Thompson - 2015-11-02
  
  "Did you do something special when building and deploying?"
  
  No, I just added your class to my project and modified the connection URL.
  
  One thing to check is the character encoding of your .java file(s). I've sometimes gotten strange String results when running my code in Eclipse if my .java files were encoded as "Cp1252". Changing the encoding of the .java files to "UTF-8" usually makes such problems go away.
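
The source-encoding effect described above can be imitated in a few lines. This is only an illustrative sketch with made-up names: it takes the UTF-8 bytes of a string and re-reads them under another charset, which is roughly what happens when a compiler or IDE assumes the wrong encoding for a .java file. It also prints the JVM defaults, which are worth checking in this situation.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class SourceEncodingDemo {

    // Re-reads the UTF-8 bytes of a string as if they had been written in
    // another charset, imitating a wrong source-file encoding assumption.
    public static String misread(String text, Charset assumed) {
        return new String(text.getBytes(StandardCharsets.UTF_8), assumed);
    }

    public static void main(String[] args) {
        // Arabic sample text as Unicode escapes, so this file itself is ASCII-only.
        String literal = "\u0627\u0644\u0642\u0631\u0622\u0646";

        System.out.println("JVM default charset:  " + Charset.defaultCharset());
        System.out.println("file.encoding:        " + System.getProperty("file.encoding"));
        System.out.println("read as windows-1252: " + misread(literal, Charset.forName("windows-1252")));
        System.out.println("read as UTF-8:        " + misread(literal, StandardCharsets.UTF_8));
    }
}
```

When compiling from the command line, the source encoding can be stated explicitly with javac -encoding UTF-8, and writing non-ASCII literals as Unicode escapes sidesteps the question entirely.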
  
  - Abdelrazek Nageh - 2019-02-06
    
    Thank you, Gord Thompson. I tried changing the encoding of my *.java files from UTF-8 to Arabic > windows-1256 (I'm using Notepad++), and that worked.
    
    Last edit: Abdelrazek Nageh 2019-02-06
    
i-blis - 2015-11-03

In fact, my JVM code is in Clojure (not Java), and I have never had any encoding problems. As I said, when using Jackcess directly the rows get decoded properly (and display fine).

Thanks for your help.

I'll post my findings once I have identified the culprit.

- Gord Thompson - 2015-11-03
  
  Perhaps try compiling your custom opener into a .class file and adding it to your CLASSPATH ...?
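
A quick way to confirm that a compiled opener really is visible on the classpath is to ask the class loader for it by name. The helper below is a hypothetical diagnostic, not part of UCanAccess; the opener class name is the one used earlier in this thread.

```java
public class ClasspathCheck {

    // Returns true if the named class can be found by this class's loader.
    public static boolean isLoadable(String className) {
        try {
            // initialize=false: locate the class without running its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or one of its dependencies could not be linked.
            return false;
        }
    }

    public static void main(String[] args) {
        String opener = "ucaextension.JackcessWithCharsetW1256Opener";
        System.out.println(opener + " loadable: " + isLoadable(opener));
    }
}
```

If this prints false from the same launch configuration that runs the application, the opener is not on the classpath that configuration actually uses.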
  
Gintaras - 2017-12-12

I am seeing strange behaviour when trying this solution,
setting UTF-8 in the following class:

package kketsy.model;

import com.healthmarketscience.jackcess.Database;
import com.healthmarketscience.jackcess.DatabaseBuilder;
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import net.ucanaccess.jdbc.JackcessOpenerInterface;

public class JackcessWithCharsetUTF8Opener implements JackcessOpenerInterface {
    @Override
    public Database open(File file, String string) throws IOException {
        DatabaseBuilder db = new DatabaseBuilder(file);
        db.setCharset(Charset.forName("UTF-8"));
        try {
            db.setReadOnly(false);
            return db.open();
        } catch (IOException e) {
            db.setReadOnly(true);
            return db.open();
        }
    }
}

I do get an error:

0 [main] DEBUG com.healthmarketscience.jackcess.impl.DatabaseImpl - Could not find expected index on table MSysObjects (Db=IncomeData.accdb)
20 [main] DEBUG com.healthmarketscience.jackcess.impl.DatabaseImpl - Could not find expected index on table MSysObjects (Db=IncomeData.accdb)
40 [main] DEBUG com.healthmarketscience.jackcess.impl.DatabaseImpl - Could not find expected index on table MSysObjects (Db=IncomeData.accdb)
40 [main] DEBUG com.healthmarketscience.jackcess.impl.DatabaseImpl - Could not find expected index on table MSysObjects (Db=IncomeData.accdb)

If I change

db.setCharset(Charset.forName("UTF-8"));

to:

db.setCharset(null);

the error messages no longer appear, but some of the data with UTF-8 encoding shows up as question marks.

Could you please help me out? In the MS Access DB all letters/symbols display correctly, but in the SQL result set some become (???????).

I am using NetBeans 8.2 | UCanAccess 4.0.2 | .accdb Access 2007-2013 file format

Let me know if you need any more information
- Gord Thompson - 2017-12-13
  
  Modern versions of Access save text values as Unicode but not with UTF-8 encoding, so forcing Jackcess to use UTF-8 is not going to work for a well-formed "Access 2007-2013" database file. Can you post a sample database that we can use to reproduce the issue?
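
The difference between "Unicode" and "UTF-8" can be shown with plain Java strings. The sketch below assumes, purely for illustration, a two-byte little-endian Unicode encoding (UTF-16LE) for the stored text; the real on-disk format has further details that Jackcess takes care of, and the class name is invented.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UnicodeNotUtf8Demo {

    // Encodes text with one charset and decodes the resulting bytes with another.
    public static String transcode(String text, Charset storedAs, Charset readAs) {
        return new String(text.getBytes(storedAs), readAs);
    }

    public static void main(String[] args) {
        // Four Lithuanian letters, written as Unicode escapes.
        String text = "\u0105\u010D\u0119\u0117";

        String forcedUtf8 = transcode(text, StandardCharsets.UTF_16LE, StandardCharsets.UTF_8);
        String matching = transcode(text, StandardCharsets.UTF_16LE, StandardCharsets.UTF_16LE);

        System.out.println("forced UTF-8 matches original:     " + text.equals(forcedUtf8));
        System.out.println("matching charset matches original: " + text.equals(matching));
    }
}
```

Text that is already stored as Unicode needs no charset override at all, which is consistent with the advice to leave Jackcess on its default here.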
  
Gintaras - 2017-12-13

Hello, the database contains clients' information, so I cannot share the data. Is there a different way to check it?

- Gord Thompson - 2017-12-13
  
  We wouldn't need the whole database, just one table with one row, along with a Minimal, Complete, and Verifiable Example that illustrates the issue of characters displaying as ??????. If Access is displaying the characters properly then the strings are almost certainly encoded correctly as Unicode (but not UTF-8), in which case Jackcess should be able to retrieve them using its default charset.
  
  Last edit: Gord Thompson 2017-12-13
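
One common source of literal question marks in Java, not confirmed as the cause in this thread, is converting a correct Unicode string to a charset that cannot represent its characters, for example when writing to a console or file with a narrow default encoding. The sketch below, with an invented class name, shows that effect in isolation and may help when putting together a minimal test case.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QuestionMarkDemo {

    // Encodes text to the target charset and decodes it back again.
    // Characters the charset cannot represent are replaced during encoding.
    public static String roundTrip(String text, Charset target) {
        return new String(text.getBytes(target), target);
    }

    public static void main(String[] args) {
        // Four Lithuanian letters, written as Unicode escapes.
        String text = "\u0105\u010D\u0119\u0117";

        System.out.println("via US-ASCII: " + roundTrip(text, StandardCharsets.US_ASCII));
        System.out.println("via UTF-8 survives: " + text.equals(roundTrip(text, StandardCharsets.UTF_8)));
        System.out.println("default charset: " + Charset.defaultCharset());
    }
}
```

Printing the code points of a retrieved string (for instance with String.codePoints()) is a way to tell whether the question marks are already in the data returned by the driver or only appear when the string is displayed.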
  