From: אריאל ק. A. K. <kid...@gm...> - 2011-09-23 06:03:17
2011/9/21 Nirgal <con...@ni...>:
> Jet4 always uses Unicode (UCS-2) internally.
>
> Output should be UTF-8, unless you set the env var MDBICONV (there is no underscore).
>

I tried again and again, and MDBICONV seems to have no effect (a strange fact in itself). Any more ideas, please?
Maybe I'm wrong, and it isn't an encoding problem. Can something else cause mdb-export to ignore half of the field?

[ak@ch ~/]$ setenv MDBICONV UTF-8
[ak@ch ~/]$ mdb-export -QHd^ WebStructure.mdb FilePaths | grep '^130419\^'
130419^9817^0113-20000101-010645-45_^Hebrew|HWomen|HinuchYeladimShlomBayit|HinuchYeladim|R0113-5|R0113-2^01/01/00 00:00:00^84^20223203^0113^1^0^45 ��ז� �ט��י ��ט��, ��' ��ך, ךי'ב^0^0^0^0
[ar@ch ~/]$ setenv MDBICONV iso-8859-1
[ar@ch ~/]$ mdb-export -QHd^ WebStructure.mdb FilePaths | grep '^130419\^'
130419^9817^0113-20000101-010645-45_^Hebrew|HWomen|HinuchYeladimShlomBayit|HinuchYeladim|R0113-5|R0113-2^01/01/00 00:00:00^84^20223203^0113^1^0^45 ��ז� �ט��י ��ט��, ��' ��ך, ךי'ב^0^0^0^0
[ar@ch ~/]$ setenv MDBICONV nothingatall
[ar@ch ~/]$ mdb-export -QHd^ WebStructure.mdb FilePaths | grep '^130419\^'
130419^9817^0113-20000101-010645-45_^Hebrew|HWomen|HinuchYeladimShlomBayit|HinuchYeladim|R0113-5|R0113-2^01/01/00 00:00:00^84^20223203^0113^1^0^45 ��ז� �ט��י ��ט��, ��' ��ך, ךי'ב^0^0^0^0
[ar@ch ~/]$

See? MDBICONV has no effect. The 10th field (it's Hebrew) looks the same every time (even if your terminal doesn't show Hebrew, you can see there's no difference), and the 3rd field is still truncated. Only the numbers appear.

Any help, please?!?

>
> On Wednesday 21 September 2011 09:55:12 אריאל קלגסבלד Ariel Klagsbald wrote:
>> I hope this is the place to post such a problem. I also hope my
>> diagnosis is correct (that it really is an encoding problem; I'm not
>> sure).
>>
>> Well, I have a large mdb file in which one of the fields contains strings like
>>
>> 0007-20101223-214033-שמות-בגדר_שם.mp3
>>
>> or
>>
>> 0007-20110714-213442-יום_טוב_שני_של_גלויות.mp3
>>
>> That is, part English, part numbers and part Hebrew (yes, that's
>> Hebrew, in case you can't see it in your browser).
>>
>> When I use mdb-export to extract data from this file, I get the
>> numbers correctly, but only the numbers. The Hebrew and English parts are
>> simply missing (even the '3' in the 'mp3' suffix). That is, when I
>> extract the second example I get only
>>
>> 0007-20110714-213442
>>
>> I should add that other fields contain only Hebrew (e.g.
>> יום טוב שני של גלויות, יב' תמוז, תשע'א
>> in the example above), and they seem to be extracted correctly. That
>> is, I get some gibberish which I guess is the correct data, only my
>> terminal can't display it.
>>
>> I thought it might be an encoding problem, so I've played a bit with
>> MDB_ICONV, MDB_JET_CHARSET, MDB_JET3_CHARSET and MDB_JET4_CHARSET, but
>> it made no difference.
>> The file seems to be JET4 (so mdb-ver claims). I have no idea what
>> encoding it uses (I don't know how to find out. Any ideas?), but I
>> guess it's UTF-8 (only a guess).
>>
>> I'll be grateful for any help!
>> Ariel.
>>
>> _______________________________________________
>> mdbtools-dev mailing list
>> mdb...@li...
>> https://lists.sourceforge.net/lists/listinfo/mdbtools-dev
>>
>
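Nirgal's explanation (Jet4 keeps text as UCS-2 internally, and mdb-export is expected to emit UTF-8 unless MDBICONV says otherwise) can be illustrated outside mdbtools with the standard `iconv` utility. This is only a sketch of the conversion involved, not what mdb-export does internally. Assumptions: the helper names `ucs2le_to_utf8` and `is_valid_utf8` are hypothetical, the byte order is taken to be little-endian (the thread only says "UCS2"), and the local `iconv` must accept the `UCS-2LE` encoding name (glibc and GNU libiconv do). The second helper speaks to the "how do I find out the encoding?" question, at least to the extent of checking whether output is well-formed UTF-8.

```shell
#!/bin/sh
# Hypothetical helper: read raw UCS-2 little-endian bytes on stdin
# (assumed here to be the Jet4 internal text form) and write UTF-8 to stdout.
ucs2le_to_utf8() {
    iconv -f UCS-2LE -t UTF-8
}

# Hypothetical helper: exit 0 only if stdin is well-formed UTF-8.
# iconv exits non-zero when it meets an invalid byte sequence.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# U+05D0 (alef) and U+05D1 (bet) as UCS-2LE bytes: d0 05 d1 05 (octal escapes).
# Prints the bytes d7 90 d7 91, which is their UTF-8 encoding.
printf '\320\005\321\005' | ucs2le_to_utf8 | od -An -tx1
```

To apply the check to the export from the post, one could pipe `mdb-export -QHd^ WebStructure.mdb FilePaths` into `is_valid_utf8` and look at the exit status; zero means the bytes are at least valid UTF-8, though not that the text is complete or correct.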