I want to read an mdb file whose data is windows-1256 encoded. When using Jackcess directly, I invoke the setCharset method on the DatabaseBuilder object and the rows are decoded properly.
When using the UCanAccess driver (either directly from Java or with a GUI client like SQLWorkBench), I get nothing but garbage: the driver probably assumes that the data is UTF-8.
I was wondering whether there is an undocumented connection property I could set. I grepped the source files and skimmed through the tests but couldn't find anything. Looking at DefaultJackcessOpener.java, it seems that you don't set any options (apart from handling the read-only/read-write mode).
Is there a way to pass the DB character encoding to the driver? Or, in the worst case, to decode afterwards? Any ideas, anyone?
Yes, your idea is right. You can pass a different opener implementation as a connection parameter. See the UCanAccess site for how to do it, and please let me know your findings.
Last edit: Marco Amadei 2015-10-28
Thank you for the swift answer. I am giving it a shot right now.
Does it sound right?
Also I was wondering, how do I pass arguments from the connection string to my custom class?
"jdbc:ucanaccess:///path/to/dbfile.mdb;jackcessOpener=net.ucanaccess.jdbc.JackcessWithCharsetOpener"
With properties?
Last edit: i-blis 2015-10-29
Yes, it sounds right (but I wouldn't use the net.ucanaccess.jdbc package name), and if you want you can also put the
jackcessOpener=net.ucanaccess.jdbc.JackcessWithCharsetOpener
entry into the Properties object that you pass to the DriverManager.getConnection method.
Cheers Marco
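A minimal sketch of the Properties variant described above, next to the URL-parameter variant. The database path and the opener class name com.example.access.Windows1256Opener are hypothetical placeholders; only the jackcessOpener key comes from this thread. Actually opening the connection requires the UCanAccess jar and the opener class on the classpath, so main only prints the settings.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

final class OpenerConnectionSketch {

    // Hypothetical values, for illustration only.
    static final String DB_URL = "jdbc:ucanaccess:///path/to/dbfile.mdb";
    static final String OPENER_CLASS = "com.example.access.Windows1256Opener";

    /** Builds the Properties object carrying the jackcessOpener entry. */
    static Properties openerProperties(String openerClassName) {
        Properties props = new Properties();
        props.setProperty("jackcessOpener", openerClassName);
        return props;
    }

    /** Expresses the same setting as a connection-URL parameter instead. */
    static String urlWithOpener(String baseUrl, String openerClassName) {
        return baseUrl + ";jackcessOpener=" + openerClassName;
    }

    /** Opens the connection; needs the UCanAccess driver at run time. */
    static Connection connect() throws SQLException {
        return DriverManager.getConnection(DB_URL, openerProperties(OPENER_CLASS));
    }

    public static void main(String[] args) {
        // Print both forms without touching the database.
        System.out.println(urlWithOpener(DB_URL, OPENER_CLASS));
        System.out.println(openerProperties(OPENER_CLASS));
    }
}
```

The URL form works with any JDBC client that lets you edit the connection string, while the Properties form keeps the URL short when the connection is created from code.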
Thank you again, Marco.
Sorry for hijacking your namespace; I thought you would want to merge the class into your code base, as many users may want to be able to set the charset.
My question regarding passing arguments was: how do I include the charset argument (in my case "windows-1256") in the connection string (so I can use this with any client)?
Thanks in advance.
I'll post the whole thing once I get it working.
Regards, Igor.
Last edit: i-blis 2015-10-29
Hard-code the charset in your code rather than reading it from a connection-string parameter: your implementation is tied to one specific charset.
I made a custom implementation for the charset. It gets loaded properly, but the rows are still not decoded properly. Setting the charset in exactly the same way when using Jackcess directly does decode the rows properly. Any ideas?
Here's the code again:
Hmm... it should work. Could you upload a small part of your db (just copy one table for testing, with fake data, but including any characters you see improperly decoded)?
If I have to change something on my side, I'm close to a release and can do it soon.
Last edit: Marco Amadei 2015-10-30
Oh, sure: https://curl.io/get/cmggvjx1/1bfca37a09849ef5c147c854796bf789fd617ec0
Column "bk" of table "0bok" is Arabic text encoded as Windows-1256. It would be nice if you could have a look... Thanks again.
Notes:
First row's bk should be "القرآن ونقض مطاعن الرهبان" (this is what I get with Jackcess) and I get "������ ���� ����� �������" with UCanAccess.
The mdb file is an Access 97 file, and Jackcess can open it only in read-only mode, not in read-write mode.
Last edit: i-blis 2015-10-30
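For context, the "�" characters are U+FFFD, the replacement character a decoder emits for bytes it cannot interpret. The sketch below is plain JDK code (no Jackcess or UCanAccess involved) and assumes only that the JVM ships the windows-1256 charset. It illustrates the general effect of decoding windows-1256 bytes as UTF-8; it is not a claim about what UCanAccess does internally.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetMismatchDemo {

    static final Charset WINDOWS_1256 = Charset.forName("windows-1256");

    /** Encodes text into the bytes a windows-1256 data file would hold. */
    static byte[] storedBytes(String text) {
        return text.getBytes(WINDOWS_1256);
    }

    /** Decodes raw bytes with the given charset; malformed input becomes U+FFFD. */
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // First word of the title quoted above, written with Unicode escapes
        // so that the encoding of this source file does not matter.
        String word = "\u0627\u0644\u0642\u0631\u0622\u0646";
        byte[] raw = storedBytes(word);

        // Decoding with the charset the bytes were written in restores the text.
        System.out.println(decode(raw, WINDOWS_1256).equals(word));

        // Decoding the same bytes as UTF-8 does not: the bytes are not valid
        // UTF-8 sequences, so the decoder substitutes replacement characters.
        System.out.println(decode(raw, StandardCharsets.UTF_8).equals(word));
    }
}
```

Once U+FFFD has been substituted the original bytes are gone, which is why the charset has to be right at read time rather than patched up afterwards.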
Your code seems to be working fine for me. Did you remember to append the jackcessOpener parameter to your connection URL?
Last edit: Gord Thompson 2015-10-30
Yes, I did. As I said, it finds the class. But decoding doesn't happen.
Did you do something special when building and deploying?
Thanks again.
PS: I now notice that decoding works fine when using SQLWorkBench but not when calling the driver directly from Java (which is basically what I am doing). Any ideas or pointers?
No, I just added your class to my project and modified the connection URL.
One thing to check is the character encoding of your .java file(s). I've sometimes gotten strange String results when running my code in Eclipse if my .java files were encoded as "Cp1252". Changing the encoding of the .java files to "UTF-8" usually makes such problems go away.
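Thank you, Gord Thompson. I changed the encoding of my *.java files from UTF-8 to Arabic > windows-1256 (I'm using Notepad++) and that worked.
Last edit: Abdelrazek Nageh 2019-02-06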
In fact my JVM code is in Clojure (not Java) and I never had any encoding problem. As I said, when leveraging Jackcess directly the rows get decoded properly (and display fine).
Thanks for your help.
I'll post my findings once I have identified the culprit.
Perhaps try compiling your custom opener into a .class file and adding it to your CLASSPATH ...?
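I am seeing strange behaviour when trying this solution.
After setting UTF-8 in the following class:
I get this error:
0 [main] DEBUG com.healthmarketscience.jackcess.impl.DatabaseImpl - Could not find expected index on table MSysObjects (Db=IncomeData.accdb)
20 [main] DEBUG com.healthmarketscience.jackcess.impl.DatabaseImpl - Could not find expected index on table MSysObjects (Db=IncomeData.accdb)
40 [main] DEBUG com.healthmarketscience.jackcess.impl.DatabaseImpl - Could not find expected index on table MSysObjects (Db=IncomeData.accdb)
40 [main] DEBUG com.healthmarketscience.jackcess.impl.DatabaseImpl - Could not find expected index on table MSysObjects (Db=IncomeData.accdb)
If I change
db.setCharset(Charset.forName("UTF-8"));
to:
db.setCharset(null);
the error messages no longer appear, but some of the data with UTF-8 encoding shows up as question marks.
Could you please help me out? In the MS Access db all letters/symbols display correctly, but in the SQL recordset some become (???????).
I am using NetBeans 8.2 | UCanAccess 4.0.2 | .accdb Access 2007-2013 file format |
Let me know if you need any more information.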
Modern versions of Access save text values as Unicode but not with UTF-8 encoding, so forcing Jackcess to use UTF-8 is not going to work for a well-formed "Access 2007-2013" database file. Can you post a sample database that we can use to reproduce the issue?
Hello, the database contains client information, so I cannot share the data. Is there a different way to check it?
We wouldn't need the whole database, just one table with one row, along with a Minimal, Complete, and Verifiable Example that illustrates the issue of characters displaying as ??????. If Access is displaying the characters properly then the strings are almost certainly encoded correctly as Unicode (but not UTF-8), in which case Jackcess should be able to retrieve them using its default charset.
Last edit: Gord Thompson 2017-12-13
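One distinction that may help when diagnosing this: in Java, literal ? characters typically come from a lossy encoding step, where text is converted to a charset that lacks the character (for example when writing to a console or file), rather than from reading the database. The sketch below is plain JDK code and illustrates only that general behaviour. Whether it is the cause in this case is an assumption that a sample database would be needed to confirm.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class QuestionMarkDemo {

    /** Encodes text to the target charset and decodes it back again. */
    static String roundTrip(String text, Charset target) {
        // String.getBytes replaces characters the charset cannot represent
        // with the charset's replacement byte, which is '?' for ISO-8859-1.
        byte[] encoded = text.getBytes(target);
        return new String(encoded, target);
    }

    public static void main(String[] args) {
        // U+0101 (a with macron) and U+0161 (s with caron), written as
        // Unicode escapes; neither exists in ISO-8859-1.
        String text = "\u0101\u0161";

        // A Unicode encoding preserves the text.
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text));

        // A single-byte charset without these characters turns them into '?'.
        System.out.println(roundTrip(text, StandardCharsets.ISO_8859_1));
    }
}
```

If the strings compare equal to the expected text inside the program but print as question marks, the output encoding is the more likely suspect than the driver.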