From: אריאל ק. A. K. <kid...@gm...> - 2011-09-21 09:55:21
I hope this is the right place to post such a problem. I also hope my diagnosis is correct, that is, that this really is an encoding problem; I'm not sure.

I have a large mdb file in which one of the fields contains strings like

0007-20101223-214033-שמות-בגדר_שם.mp3

or

0007-20110714-213442-יום_טוב_שני_של_גלויות.mp3

That is, part English, part numbers and part Hebrew (yes, that's Hebrew, in case you can't see it in your browser).

When I use mdb-export to extract data from this file, I get the numbers correctly, but only them. The Hebrew and English parts are simply missing (even the '3' in the 'mp3' suffix). For instance, when I extract the latter example I get only

0007-20110714-213442

I'll add that other fields contain only Hebrew (e.g. יום טוב שני של גלויות, יב' תמוז, תשע'א for the example above), and they seem to be extracted correctly. That is, I get some gibberish which I guess is the correct data that my terminal simply can't display.

I thought it might be an encoding problem, so I played a bit with MDB_ICONV, MDB_JET_CHARSET, MDB_JET3_CHARSET and MDB_JET4_CHARSET, but it made no difference. The file seems to be JET4 (so mdb-ver claims). I have no idea what encoding it uses (I don't know how to find out. Any ideas?), but my guess is UTF-8 (only a guess).

I'll be grateful for any help!

Ariel.