From: VAN D. W. <Wim...@eu...> - 2002-10-17 15:57:14
Thanks. Unfortunately, the Unicode::Map8 module won't compile on our Solaris 8 box, but I found a workaround by someone else and managed to get this function to do just what I want. In case anybody needs it:

# Converts a UTF-8 byte string to Latin-1.
# Longer sequences are decoded first, so a character produced by one pass
# is never re-read as part of a sequence by a later pass.
# Code points above 0xFF have no Latin-1 equivalent and come back as wide characters.
sub utf8_to_latin1 {
    my $string = shift;

    # 4-byte sequences: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    $string =~ s/([\xF0-\xF7])([\x80-\xBF])([\x80-\xBF])([\x80-\xBF])/
        chr((ord($1) & 0x07) << 18 | (ord($2) & 0x3F) << 12
          | (ord($3) & 0x3F) << 6  | (ord($4) & 0x3F))/ge;

    # 3-byte sequences: 1110xxxx 10xxxxxx 10xxxxxx
    $string =~ s/([\xE0-\xEF])([\x80-\xBF])([\x80-\xBF])/
        chr((ord($1) & 0x0F) << 12 | (ord($2) & 0x3F) << 6 | (ord($3) & 0x3F))/ge;

    # 2-byte sequences: 110xxxxx 10xxxxxx (e.g. C3 A9 becomes 0xE9, which is é)
    $string =~ s/([\xC0-\xDF])([\x80-\xBF])/
        chr((ord($1) & 0x1F) << 6 | (ord($2) & 0x3F))/ge;

    return $string;
}

Regards,
Wim

-----Original Message-----
From: Mar...@ml... [mailto:Mar...@ml...]
Sent: 17 October 2002 15:16
To: per...@li...
Subject: Re: Dumping Unicode values into ascii text

>Hi,
>I'm trying to fetch values that are Unicode (latin1) encoded, and dump them into a CSV text file.
>I uploaded them myself, and I'm surprised at how difficult I find it to retrieve them again.
>The value I'm interested in holds an "é". I uploaded it like this:
>my $u=Unicode::String::latin1($site);
>and then used $u as the value of the attribute to be uploaded.
>This worked perfectly and the values are perfectly readable with an LDAP browser or the console (we're using iPlanet DS here). I think they are readable because the LDAP browser decodes Unicode as well.
>Now the question is, how can I get it back in its original format?
>The original word was Brétigny; the way it's stored now is BrÃ©tigny.
>I tried using
>$site = Unicode::String::latin1("$u")->utf8
>and
>$site = Unicode::String::utf8("$u")
>and other combinations, but the problem is I don't even know which specific format I have to decode the Unicode to...

You should convert the string to ISO-8859-1. Try the Perl module Unicode::MapUTF8; the following lines will do the work:

use Unicode::MapUTF8 qw(from_utf8);
my $the_latin_encoded_string = from_utf8({ -string => $the_utf8_string, -charset => "ISO-8859-1" });

Of course, this only works when the input from the LDAP server is encoded the right way. To test it, you can add a value manually in UTF-8 format, e.g. héllò, and read it back out with Perl...

>Any help or pointers would be much appreciated.
>Best regards,
>--
>Wim Van Dijck
>MIS - Internet Team - Eurocontrol
>Support bacteria - they're the only culture some people have.

greets
Martin
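A short illustration of the arithmetic (a hypothetical sketch, not part of the original thread): the stored "BrÃ©tigny" holds the byte pair C3 A9 where é should be. Keeping the low 5 bits of the lead byte and the low 6 bits of the continuation byte, then joining them, gives (0x03 << 6) | 0x29 = 0xE9, which is é in Latin-1. This is the same computation as the two-byte substitution in utf8_to_latin1 above; the helper name decode_two_byte is invented for this example.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): decode one two-byte UTF-8
# sequence by hand, equivalent to the masks used in utf8_to_latin1.
sub decode_two_byte {
    my ($lead, $cont) = @_;
    # Lead byte 110xxxxx: keep the low 5 bits and move them into bits 6..10.
    # Continuation byte 10xxxxxx: keep the low 6 bits.
    return (($lead & 0x1F) << 6) | ($cont & 0x3F);
}

# "Brétigny" as stored in the directory: é is the UTF-8 pair C3 A9.
my $stored = "Br\xC3\xA9tigny";

# Replace every two-byte sequence with the single character it encodes.
(my $latin1 = $stored) =~ s/([\xC0-\xDF])([\x80-\xBF])/chr(decode_two_byte(ord($1), ord($2)))/ge;

printf "0x%02X\n", decode_two_byte(0xC3, 0xA9);      # prints 0xE9
print length($stored), " bytes -> ", length($latin1), " bytes\n";   # prints 9 bytes -> 8 bytes
```

Only pairs with lead byte C2 or C3 land in the Latin-1 range (0x80 to 0xFF); anything higher has no Latin-1 equivalent.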