From: Knut S. <knu...@se...> - 2001-06-05 08:21:18
> Date: Fri, 1 Jun 2001 19:21:40 -0700 (PDT)
> From: Christoph Neumann <en...@ap...>
> To: LDAP Mailing List <per...@li...>
> Subject: International Characters
>
> I chose the directoryString (UTF-8 format) type since it should allow
> international characters. However, when I try to insert the string
> 'im k=E4ppele 8' I get the error:
>
> apuhomestreet: value #0 contains invalid data

Hi Christoph,

did you encode your data as a UTF-8 string? The 0xE4 above (ä = ae) looks like you are trying to add a Latin-1 string, and that is not a legal UTF-8 byte sequence.

I also had this problem with OpenLDAP 2.x, but not with Netscape or ControlData. It looks to me as if it depends on the schema: ControlData (a real X.500 directory) defines attributes with more than one legal encoding (T.61, Latin-1, UTF-8) and prefixes the value with '{<encoding>}', as is done for passwords. The server tries to determine the right encoding if it is not supplied, and falls back to T.61 if it is unclear. I don't know whether this conforms to any LDAP spec.

-Knut
__________________________________________
SecureNet GmbH - http://www.secure-net.de/
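Knut's diagnosis can be checked mechanically. The following is a minimal Python sketch (not taken from the thread; `is_valid_utf8` is a hypothetical helper) that encodes the value the way Christoph's client apparently did, as Latin-1, and shows why a UTF-8 validator rejects it: 0xE4 has the bit pattern 1110xxxx, which announces a three-byte UTF-8 sequence, but the next byte 0x70 ('p') is not a continuation byte (10xxxxxx). The same text encoded as UTF-8 carries ä as the two bytes C3 A4.

```python
# The value as apparently sent in the add request: Latin-1 bytes, not UTF-8.
latin1_bytes = "im käppele 8".encode("latin-1")
utf8_bytes = "im käppele 8".encode("utf-8")

def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (what LDAPv3 requires)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# 0xE4 = 11100100: a lead byte announcing a 3-byte sequence, so the next
# byte must look like 10xxxxxx; 0x70 ('p') = 01110000 does not.
lead = latin1_bytes[4]
print(hex(lead), bin(lead))          # 0xe4 0b11100100
print(is_valid_utf8(latin1_bytes))   # False

# Encoded as UTF-8, the ä becomes the two bytes c3 a4.
print(utf8_bytes.hex(" "))           # 69 6d 20 6b c3 a4 70 70 65 6c 65 20 38
print(is_valid_utf8(utf8_bytes))     # True
```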
From: Chris R. <chr...@me...> - 2001-06-05 09:00:55
Knut Sander <knu...@se...> wrote:
>> Date: Fri, 1 Jun 2001 19:21:40 -0700 (PDT)
>> From: Christoph Neumann <en...@ap...>
>> To: LDAP Mailing List <per...@li...>
>> Subject: International Characters
>
>> I chose the directoryString (UTF-8 format) type since it should allow
>> international characters. However, when I try to insert the string
>> 'im k=E4ppele 8' I get the error:
>>
>> apuhomestreet: value #0 contains invalid data
>
> Hi Christoph,
>
> did you encode your data as a UTF-8 string? The 0xE4 above (ä = ae) looks
> like you are trying to add a Latin-1 string, and that is not a legal UTF-8
> byte sequence.
>
> I also had this problem with OpenLDAP 2.x, but not with Netscape or
> ControlData. It looks to me as if it depends on the schema: ControlData
> (a real X.500 directory) defines attributes with more than one legal
> encoding (T.61, Latin-1, UTF-8) and prefixes the value with '{<encoding>}',
> as is done for passwords. The server tries to determine the right encoding
> if it is not supplied, and falls back to T.61 if it is unclear. I don't
> know whether this conforms to any LDAP spec.

That isn't legal for the standard syntaxes in LDAPv3: everything *must* be UTF-8. It would be OK for LDAPv2, except that UTF-8 is not a valid character-set encoding for LDAPv2.

Netscape will typically accept any illegal garbage values and return them verbatim, so it isn't a good test of what is correct or not :-(

Cheers,

Chris
From: Christoph N. <en...@ap...> - 2001-06-05 17:06:15
On Tue, 5 Jun 2001, Knut Sander wrote:
> > I chose the directoryString (UTF-8 format) type since it should allow
> > international characters. However, when I try to insert the string
> > 'im k=E4ppele 8' I get the error:
> >
> > apuhomestreet: value #0 contains invalid data
>
> Hi Christoph,
>
> did you encode your data as a UTF-8 string? The 0xE4 above (ä = ae) looks
> like you are trying to add a Latin-1 string, and that is not a legal UTF-8
> byte sequence.

Hm... that seems to be correct. I checked the output from the "debug". This is the string I am sending to the server:

    0040 04 13: STRING = 'apuhomestreet'
    004F 31 14: SET {
    0051 04 12: STRING
    0053      : 69 6D 20 6B E4 70 70 65 6C 65 20 38 __ __ __ __ im k.ppele 8
    005F      : }

Any recommendation on which encoding I should use in LDAP to support international characters? Is UTF-8 really the way to go?

If UTF-8 is the way to go, how should I go about converting data that is in ISO-8859-1 to UTF-8? A quick search on CPAN turned up "Unicode::MapUTF8" and the "use utf8" pragma in Perl 5.7. Does anyone have experience with either of these?

Also, where might I find good documentation on how these character sets are defined?

Thanks for all the help and insight.

- Christoph
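The debug dump is a BER trace of the add request. Below is a rough Python sketch of how those two elements are built, assuming (because 'apuhomestreet' is 13 bytes and is shown as `04 13`) that the dump prints tags in hex and lengths in decimal. `ber_octet_string` and `ber_set` are illustrative helpers, not functions of any LDAP library, and only short-form lengths (under 128 bytes) are handled. The sketch shows that the Latin-1 value yields the 12-byte STRING inside a 14-byte SET seen in the dump, whereas the UTF-8 value would be 13 bytes inside a 15-byte SET.

```python
def ber_octet_string(value: bytes) -> bytes:
    """BER-encode an OCTET STRING (tag 0x04) with a short-form definite length."""
    if len(value) >= 128:
        raise ValueError("long-form lengths are not handled in this sketch")
    return bytes([0x04, len(value)]) + value

def ber_set(*elements: bytes) -> bytes:
    """BER-encode a SET (constructed, tag 0x31) around already-encoded elements."""
    body = b"".join(elements)
    if len(body) >= 128:
        raise ValueError("long-form lengths are not handled in this sketch")
    return bytes([0x31, len(body)]) + body

attr_type = ber_octet_string(b"apuhomestreet")
print("type:", attr_type[:2].hex(" "))  # tag 0x04, length 0x0d (13 decimal)

for charset in ("latin-1", "utf-8"):
    vals = ber_set(ber_octet_string("im käppele 8".encode(charset)))
    # vals[0] = 0x31 (SET), vals[1] = SET length, vals[2] = 0x04, vals[3] = STRING length
    print(charset, "SET length:", vals[1], "STRING length:", vals[3])
```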
From: Knut S. <knu...@se...> - 2001-06-05 17:43:07
Christoph Neumann wrote:
> On Tue, 5 Jun 2001, Knut Sander wrote:
> > > I chose the directoryString (UTF-8 format) type since it should allow
> > > international characters. However, when I try to insert the string
> > > 'im k=E4ppele 8' I get the error:
> > >
> > > apuhomestreet: value #0 contains invalid data
> >
> > Hi Christoph,
> >
> > did you encode your data as a UTF-8 string? The 0xE4 above (ä = ae) looks
> > like you are trying to add a Latin-1 string, and that is not a legal UTF-8
> > byte sequence.
>
> Hm... that seems to be correct. I checked the output from the "debug".
> This is the string I am sending to the server:
> 0040 04 13: STRING = 'apuhomestreet'
> 004F 31 14: SET {
> 0051 04 12: STRING
> 0053      : 69 6D 20 6B E4 70 70 65 6C 65 20 38 __ __ __ __ im k.ppele 8
> 005F      : }

OK: E4 for ä (ae) is Latin-1; UTF-8 needs 2 bytes for this character.

> Any recommendation on which encoding I should use in LDAP to support
> international characters? Is UTF-8 really the way to go?

I think it is the way =)

> If UTF-8 is the way to go, how should I go about converting data that is
> in ISO-8859-1 to UTF-8? A quick search on CPAN turned up
> "Unicode::MapUTF8" and the "use utf8" pragma in Perl 5.7. Does anyone
> have experience with either of these?
>
> Also, where might I find good documentation on how these character sets
> are defined?

I used Unicode::String; take a look at the example in perldoc Unicode::String. It works well and is easy to handle. I have had good experiences building the encoding/decoding into the application-specific LDAP layer (you always need one for larger applications =)

'use utf8' in Perl 5.6/5.7 may now do this job on the fly, but I have not played with it yet, because I can't use 5.6 on production systems at the moment =(. Some pointers on this would be welcome =)

- Knut
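Knut's point that UTF-8 needs two bytes for ä follows from a simple rule: every ISO-8859-1 byte is the Unicode code point with the same value, so bytes below 0x80 are copied unchanged and bytes 0x80-0xFF become a lead byte 110000xx followed by a continuation byte 10xxxxxx. Here is a hand-rolled Python sketch of that conversion, applied to the bytes from Christoph's dump. It only illustrates the kind of conversion Knut delegates to Unicode::String in Perl; it is not that module's API, and `latin1_to_utf8` is a hypothetical name.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Convert ISO-8859-1 bytes to UTF-8 by hand.

    Each Latin-1 byte equals its Unicode code point, so 0x00-0x7F stay one
    byte and 0x80-0xFF become the two-byte sequence 110000xx 10xxxxxx.
    """
    out = bytearray()
    for b in data:
        if b < 0x80:
            out.append(b)                  # ASCII range: copied unchanged
        else:
            out.append(0xC0 | (b >> 6))    # lead byte: top 2 bits of the code point
            out.append(0x80 | (b & 0x3F))  # continuation byte: low 6 bits
    return bytes(out)

raw = bytes.fromhex("69 6D 20 6B E4 70 70 65 6C 65 20 38")  # value from the dump
converted = latin1_to_utf8(raw)
print(converted.hex(" "))  # 69 6d 20 6b c3 a4 70 70 65 6c 65 20 38
```

In practice the standard codec path, `raw.decode("latin-1").encode("utf-8")`, produces the same bytes; the explicit loop just makes the two-byte layout visible.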
From: Christoph N. <en...@ap...> - 2001-06-05 17:51:25
On Tue, 5 Jun 2001, Knut Sander wrote:
> Christoph Neumann wrote:
> > On Tue, 5 Jun 2001, Knut Sander wrote:
> > > > I chose the directoryString (UTF-8 format) type since it should allow
> > > > international characters. However, when I try to insert the string
> > > > 'im k=E4ppele 8' I get the error:
> > > >
> > > > apuhomestreet: value #0 contains invalid data
> > >
> > > Hi Christoph,
> > >
> > > did you encode your data as a UTF-8 string? The 0xE4 above (ä = ae) looks
> > > like you are trying to add a Latin-1 string, and that is not a legal UTF-8
> > > byte sequence.
> >
> > Hm... that seems to be correct. I checked the output from the "debug".
> > This is the string I am sending to the server:
> > 0040 04 13: STRING = 'apuhomestreet'
> > 004F 31 14: SET {
> > 0051 04 12: STRING
> > 0053      : 69 6D 20 6B E4 70 70 65 6C 65 20 38 __ __ __ __ im k.ppele 8
> > 005F      : }
>
> OK: E4 for ä (ae) is Latin-1; UTF-8 needs 2 bytes for this character.
>
> > Any recommendation on which encoding I should use in LDAP to support
> > international characters? Is UTF-8 really the way to go?
>
> I think it is the way =)
>
> > If UTF-8 is the way to go, how should I go about converting data that is
> > in ISO-8859-1 to UTF-8? A quick search on CPAN turned up
> > "Unicode::MapUTF8" and the "use utf8" pragma in Perl 5.7. Does anyone
> > have experience with either of these?
> >
> > Also, where might I find good documentation on how these character sets
> > are defined?
>
> I used Unicode::String; take a look at the example in perldoc
> Unicode::String. It works well and is easy to handle. I have had good
> experiences building the encoding/decoding into the application-specific
> LDAP layer (you always need one for larger applications =)

Great! I checked it out. It looks like a great library.

> 'use utf8' in Perl 5.6/5.7 may now do this job on the fly, but I have not
> played with it yet, because I can't use 5.6 on production systems at
> the moment =(.

You can find more information about the new pragma at:

http://search.cpan.org/doc/JHI/perl-5.7.1/lib/utf8.pm

which also links to:

http://search.cpan.org/doc/JHI/perl-5.7.1/pod/perlunicode.pod

> Some pointers on this would be welcome =)
>
> - Knut
From: Christoph N. <en...@ap...> - 2001-06-05 19:00:14
Hello all,

Not to drift too far off-topic, but I'm extending the LDAP schema for the university where I work. Is there a compelling reason to use the ASCII "IA5String" syntax (1.3.6.1.4.1.1466.115.121.1.26) instead of the UTF-8 "directoryString" syntax (1.3.6.1.4.1.1466.115.121.1.15)? Is there some sort of performance gain or other benefit to using the ASCII string? Or is the advantage simply that the directory enforces the data format?

- Christoph
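One concrete reading of "having the directory enforce the data": a server validating IA5String accepts only 7-bit characters (0x00-0x7F), while directoryString accepts any well-formed UTF-8. The Python sketch below models only those encoding rules. The helper names are hypothetical, real servers may apply further checks (such as rejecting empty values), and the sketch says nothing about performance.

```python
def is_ia5_string(value: bytes) -> bool:
    """IA5String: every byte must be a 7-bit IA5/ASCII character (0x00-0x7F)."""
    return all(b < 0x80 for b in value)

def is_directory_string(value: bytes) -> bool:
    """Simplified directoryString check: the value must be well-formed UTF-8."""
    try:
        value.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

samples = {
    "ascii": b"Main Street 8",
    "utf-8": "im käppele 8".encode("utf-8"),
    "latin-1": "im käppele 8".encode("latin-1"),
}
for name, value in samples.items():
    print(name, "IA5String:", is_ia5_string(value),
          "directoryString:", is_directory_string(value))
```

With these rules the ASCII sample passes both checks, the UTF-8 "käppele" passes only the directoryString check, and the Latin-1 bytes pass neither, which matches the "invalid data" error that started this thread.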