From: <mi...@st...> - 2002-04-11 16:51:26
Jens Vagelpohl wrote:
> in handling these various kinds of strings (both UTF-8 encoded unicode
> and latin-1 encoded unicode for web browser consumption) i always end up
> running into trouble at some point because in some situations strings
> get encoded more than once.

Especially when implementing web applications you have to take great care
to define the charset used. Use <form accept-charset="utf-8" ..> or similar
to also define the charset of the form input data. (This does not prevent
e.g. StarOffice from sending ISO-8859-1 data.) Also set the charset of the
output in the HTTP header *and* the <head> section.

> does anyone know of a quick and fast test to
> determine whether a string is already encoded in a certain encoding? my
> knowledge of regular expressions (which i assume it would take for that)
> is extremely limited at best.

Hmm, in some situations a

  try:
    unicode()
  except UnicodeError:

might help. But I do not recommend such a solution (although it is
sometimes used in web2ldap), and you have to apply specific knowledge
about the character sets/encodings involved.

From my personal experience it is a lot of work to make an application
Unicode-aware. But you should try to clean up your application design.
It's worth the effort.

Ciao, Michael.
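
A minimal sketch of the try/except test mentioned above, written for Python 3, where unicode() no longer exists and bytes.decode() takes its place. The helper names looks_like_utf8 and to_text are hypothetical, and the ISO-8859-1 fallback is an assumption about the input, not something the test can confirm.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def to_text(data: bytes, fallback: str = "iso-8859-1") -> str:
    """Decode as UTF-8 if possible, otherwise use the assumed fallback charset.

    The fallback is a guess: every byte sequence decodes as ISO-8859-1,
    so a successful fallback decode proves nothing about the real encoding.
    """
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode(fallback)


if __name__ == "__main__":
    utf8_bytes = "Käse".encode("utf-8")
    latin1_bytes = "Käse".encode("iso-8859-1")
    print(looks_like_utf8(utf8_bytes))
    print(looks_like_utf8(latin1_bytes))
    print(to_text(latin1_bytes))
```

The test only works in one direction. Pure ASCII input passes as UTF-8 and as ISO-8859-1 alike, and a string that was encoded to UTF-8 twice is still valid UTF-8, so this check cannot detect the double-encoding problem from the original question. That is why knowledge of where the data came from is still needed.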