From: Rob H. <rh...@me...> - 2017-06-10 07:31:02
Hi,

I am debugging a project to convert a forum from WebWiz to phpBB. WebWiz stores its data in an Access DB and our phpBB forum will be on MySQL.

I am using mdb-tools version 0.7.1 on Ubuntu 16.04 LTS 64-bit. According to mdb-ver, the mdb file I am working with is JET4.

The problem I am trying to solve involves forum post text that includes some characters outside the basic character set. A specific example is the "half space" character, whose UTF-8 representation I believe is the 3-byte sequence E2 80 89.

My problem is that when I use mdb-export, these characters end up being converted to â€‰ (Hex: C3 A2 E2 82 AC E2 80 B0). If I open this database in M$Access and use its export tool, I end up with the expected UTF-8 representation of these characters in my output file (E2 80 89).

I've Googled extensively and tried various permutations of the MDB_JET3_CHARSET and MDBICONV environment variables without any change to the output. For example, the following command:

    mdb-export test-forum.mdb tblThread

produces exactly the same output as:

    MDB_JET3_CHARSET="UTF-8" mdb-export test-forum.mdb tblThread

Other tries include:

    MDB_JET3_CHARSET=UTF-8 mdb-export test-forum.mdb tblThread
    MDB_JET3_CHARSET="utf-8" mdb-export test-forum.mdb tblThread
    MDB_JET3_CHARSET=utf-8 mdb-export test-forum.mdb tblThread
    MDB_JET3_CHARSET="utf-8" mdb-export test-forum.mdb tblThread
    MDB_JET3_CHARSET="CP1252" mdb-export test-forum.mdb tblThread
    MDB_JET3_CHARSET=CP1252 mdb-export test-forum.mdb tblThread

In each case, the output is the same: normal text is exported correctly, but the extended characters seem to be double-encoded.

As the original DB is 200MB, I have created a stripped-down copy containing just one row in this table, with the "message" field containing text that includes a number of these special characters.

Is this a bug? I'm happy to PM a copy of my test DB (377K) if anyone wants to investigate further.

Cheers,

--
Rob Hills
Waikiki, Western Australia
Mobile: +61 (412) 904-357
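
For reference, the two byte sequences quoted above are consistent with the double-encoding guess: if the UTF-8 bytes E2 80 89 are misread as CP1252 and the resulting characters are then encoded as UTF-8 again, the result is C3 A2 E2 82 AC E2 80 B0. The following is only a minimal Python sketch of that arithmetic. It does not show what mdb-export does internally, and the helper names double_encode and repair are hypothetical, made up for this illustration.

```python
def double_encode(text: str) -> bytes:
    """Simulate the suspected fault: UTF-8 bytes misread as CP1252,
    then encoded as UTF-8 a second time."""
    utf8_bytes = text.encode("utf-8")
    misread = utf8_bytes.decode("cp1252")
    return misread.encode("utf-8")


def repair(data: bytes) -> str:
    """Undo one round of the CP1252 double-encoding simulated above."""
    return data.decode("utf-8").encode("cp1252").decode("utf-8")


def hex_dump(data: bytes) -> str:
    """Format bytes as space-separated upper-case hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


thin_space = "\u2009"  # the character described as "half space" in the post

print(hex_dump(thin_space.encode("utf-8")))   # E2 80 89
print(hex_dump(double_encode(thin_space)))    # C3 A2 E2 82 AC E2 80 B0
print(repair(double_encode(thin_space)) == thin_space)  # True
```

As a stopgap, the same round trip could in principle be applied to exported text, but it is not a general fix: CP1252 leaves bytes 81, 8D, 8F, 90 and 9D undefined, so text whose UTF-8 encoding contains any of those bytes cannot pass through this path cleanly.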