From: Jens V. <je...@zo...> - 2002-04-11 12:39:40
michael,
thanks for the answer, that helped a bit.
in handling these various kinds of strings (both UTF-8 encoded unicode and
latin-1 encoded unicode for web browser consumption) i always end up
running into trouble at some point because in some situations strings get
encoded more than once. does anyone know of a quick and fast test to
determine whether a string is already encoded in a certain encoding? my
knowledge of regular expressions (which i assume it would take for that) is
extremely limited at best.
jens
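
A rough sketch of such a test, for what it is worth. It is written for a current Python 3, not the Python of this thread, and the function names `is_valid_utf8` and `looks_double_encoded` are made up for illustration. No regular expression is needed: trying to decode the bytes is usually enough. The double-encoding check is only a heuristic and assumes the second pass misread the UTF-8 bytes as latin-1.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def looks_double_encoded(data: bytes) -> bool:
    """Heuristic: True if data looks like UTF-8 that was misread as
    latin-1 and then encoded to UTF-8 a second time."""
    try:
        # Undo one (assumed) latin-1 -> UTF-8 pass, then check that
        # what remains is still valid UTF-8.
        once = data.decode("utf-8").encode("latin-1")
        once.decode("utf-8")
    except (UnicodeDecodeError, UnicodeEncodeError):
        return False
    # Pure ASCII round-trips unchanged, so it is no evidence either way.
    return once != data


name = "Ströder"
utf8_bytes = name.encode("utf-8")
latin1_bytes = name.encode("iso-8859-1")
# Simulate the accident: UTF-8 bytes treated as latin-1 and encoded again.
twice = utf8_bytes.decode("iso-8859-1").encode("utf-8")

print(is_valid_utf8(utf8_bytes))        # True
print(is_valid_utf8(latin1_bytes))      # False
print(looks_double_encoded(utf8_bytes)) # False
print(looks_double_encoded(twice))      # True
```

Two caveats: latin-1 text can occasionally happen to be valid UTF-8, and text that really contains sequences such as "Ã¶" would be flagged wrongly, so tracking the encoding at the point where a string enters the program is more reliable than guessing afterwards.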
On Wednesday, April 10, 2002, at 01:17, Michael Ströder wrote:
> Jens,
>
> Sorry for answering that late.
>
> Jens Vagelpohl wrote:
>> i have a product that uses python-ldap and i'm trying to make sure
>> everything works when non-ascii characters are used in a DN. from what i
>> have been reading about OpenLDAP it either wants pure ASCII passed to it
>> (for search terms, DNs etc) or UTF-8-encoded unicode strings.
>
> Depends on the attribute. BTW: ASCII is a proper subset of UTF-8. Or better
> said: the characters covered by ASCII are mapped to the very same
> encoding in UTF-8.
>
>> my question is: does python-ldap do any automatic string conversions?
>
> No! And I refused a patch which does. It cannot be done without applying
> knowledge about the schema (syntax of an attribute). Review the archives.
>
>> i get search results just fine using a non-ascii search term when i do
>> not convert the term myself and hand it to ldap.search_s, but i never get
>> results if i convert the string by myself and then hand it to the
>> search_s method.
>
> If you have a Unicode object with an LDAP search filter then you have to
> encode that before calling method search_s().
>
> Example (valid on my Linux console with ISO-8859-1):
>
> filter = unicode('cn=*Ströder*','iso-8859-1')
> l.search_s(search_root,ldap.SCOPE_SUB,filter.encode('utf-8'))
>
> Note that filter is a Unicode object created by passing a string and the
> known character set to the unicode() function.
>
> Ciao, Michael.
>
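
For readers on a current Python 3, here is a small sketch of the encoding step from the example above, without any LDAP call. It assumes Python 3, where `str` is already Unicode, so the `unicode()` call from the quoted example is not needed. The variable names are illustrative only. Whether `search_s` expects text or bytes depends on the python-ldap and Python versions in use, so check the documentation for your installation before passing the result along.

```python
# In Python 3 a plain string literal is already Unicode text.
search_filter = "cn=*Ströder*"

# The same text yields different bytes depending on the encoding chosen.
utf8_filter = search_filter.encode("utf-8")
latin1_filter = search_filter.encode("iso-8859-1")

print(utf8_filter)    # b'cn=*Str\xc3\xb6der*'
print(latin1_filter)  # b'cn=*Str\xf6der*'

# For pure ASCII text the ASCII and UTF-8 encodings are byte-for-byte equal.
ascii_only = "cn=admin"
print(ascii_only.encode("ascii") == ascii_only.encode("utf-8"))  # True
```

The non-ASCII character is the only place where the two byte strings differ, which is why a latin-1 encoded filter handed to a server expecting UTF-8 will not match.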