From: Rob H. <rh...@me...> - 2017-06-15 10:44:08
Hi All,

In case anyone else comes across the same problem in the future, we have found a solution.

It's a bit unintuitive, but when I tried setting the MDBICONV variable to CP1252, our extended characters exported correctly.

So, the command that worked for us is:

MDBICONV="CP1252" mdb-export test-forum.mdb tblThread

On 10/06/17 15:08, Rob Hills wrote:
> I am debugging a project to convert a Forum from WebWiz to phpBB.
> WebWiz stores its data in an Access DB and our phpBB forum will be on
> MySQL.
>
> I am using mdb-tools version 0.7.1 on Ubuntu 16.04LTS 64-bit.
> According to mdb-ver, the mdb file I am working with is JET4.
>
> The problem I am trying to solve involves forum post text that
> includes some characters outside the basic character set. A specific
> example is the "half space" character whose UTF-8 representation I
> believe is the 3-byte sequence E2 80 89. My problem is that when I
> use mdb-export, these characters end up being converted to â€‰ (Hex:
> C3 A2 E2 82 AC E2 80 B0).
>
> If I open this database in M$Access and use its export tool, I end up
> with the expected UTF-8 representation of these characters in my
> output file (E2 80 89).
>
> I've Googled extensively and tried various permutations of the
> MDB_JET3_CHARSET and MDBICONV environment variables without any change
> to the output.
>
> For example the following command
>
> mdb-export test-forum.mdb tblThread
>
> produces exactly the same output as:
>
> MDB_JET3_CHARSET="UTF-8" mdb-export test-forum.mdb tblThread
> ...

Hope this helps someone else...

-- 
Rob Hills
Waikiki, Western Australia
Mobile: +61 (412) 904-357
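For anyone wondering why the byte sequences in this thread look the way they do, here is a small Python sketch of the encoding arithmetic. It is not taken from mdb-tools. It only shows that reading the UTF-8 bytes E2 80 89 as CP1252 and writing them out again as UTF-8 gives exactly the 8 bytes quoted above, and that encoding those characters as CP1252 gets the original 3 bytes back. One possible reading, not confirmed anywhere in this thread, is that the text was stored in the database in this misread form, which would explain why MDBICONV="CP1252" produced the right bytes. The helper name hex_bytes is made up for this illustration.

```python
def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated upper-case hex (hypothetical helper)."""
    return " ".join(f"{b:02X}" for b in data)


# U+2009 is the character whose UTF-8 form is E2 80 89.
thin_space_utf8 = "\u2009".encode("utf-8")

# Misread those three bytes as CP1252: E2 -> 'â', 80 -> '€', 89 -> '‰'.
misread = thin_space_utf8.decode("cp1252")

# Writing the misread characters out as UTF-8 gives the double-encoded form.
double_encoded = misread.encode("utf-8")

# Encoding the same characters as CP1252 instead restores the original bytes.
recovered = double_encoded.decode("utf-8").encode("cp1252")

print("original UTF-8: ", hex_bytes(thin_space_utf8))
print("double-encoded: ", hex_bytes(double_encoded))
print("recovered:      ", hex_bytes(recovered))
```

If the recovered bytes match the original ones for your own problem characters, the same CP1252 setting is worth trying on your export.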