From: Damyan I. <di...@cr...> - 2005-11-24 09:31:51
Erik Günther wrote:
> Thu 2005-11-24 at 09:05 +0200, Damyan Ivanov wrote:
>
>>Shimon Rura wrote:
>>
>>>Erik,
>>>
>>>Unfortunately, I don't think there is a perfect solution to this. The
>>
>>Whatever encoding a browser uses to send data, it is mandatory to supply
>>a correct Content-Type header, right? Can't this be used when determining
>>the request encoding?
>
> Hmmm I did some checks on that and the only Content-Type headers are from
> the server to the browser. The other way around I can only find
> Accept-Charset. Those aren't the same.

You're right. I should have checked this beforehand. I see a "Content-Type:
application/x-www-form-urlencoded" header for POST requests, but even in it
there's no charset mentioned. :-(

Here's an excerpt from RFC 2070 - Internationalization of the Hypertext
Markup Language. No ideal solution, though :-/

5.2. Form submission

   The HTML 2.0 form submission mechanism, based on the
   "application/x-www-form-urlencoded" media type, is ill-equipped with
   regard to internationalization. In fact, since URLs are restricted
   to ASCII characters, the mechanism is akward even for ISO-8859-1
   text. Section 2.2 of [RFC1738] specifies that octets may be encoded
   using the "%HH" notation, but text submitted from a form is composed
   of characters, not octets. Lacking a specification of a character
   encoding scheme, the "%HH" notation has no well-defined meaning.

   The best solution is to use the "multipart/form-data" media type
   described in [RFC1867] with the POST method of form submission. This
   mechanism encapsulates the value part of each name-value pair in a
   body-part of a multipart MIME body that is sent as the HTTP entity;
   each body part can be labeled with an appropriate Content-Type,
   including if necessary a charset parameter that specifies the
   character encoding scheme. The changes to the DTD necessary to
   support this method of form submission have been incorporated in the
   DTD included in this specification.
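(An aside on the "%HH" point in the excerpt: the snippet below is a minimal Python sketch, not pkit code, of why the notation is ambiguous without a charset. The form strings and the helper decode_value are made up for illustration; only urllib.parse.unquote_to_bytes is a real library call.)

```python
from urllib.parse import unquote_to_bytes

# The same name as a browser would percent-encode it when the form page
# was served as UTF-8 versus as ISO-8859-1.
utf8_form = "name=G%C3%BCnther"
latin1_form = "name=G%FCnther"


def decode_value(pair, charset):
    """Percent-decode the value of one name=value pair with the given charset."""
    _, value = pair.split("=", 1)
    # %HH only yields octets; the charset decides which characters they are.
    return unquote_to_bytes(value).decode(charset)


right = decode_value(utf8_form, "utf-8")        # 'Günther'
wrong = decode_value(utf8_form, "iso-8859-1")   # 'GÃ¼nther'
also_right = decode_value(latin1_form, "iso-8859-1")  # 'Günther'
```

Guessing the charset wrong the other way round (decoding latin1_form as UTF-8) does not even produce text; it raises UnicodeDecodeError.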
   A less satisfactory solution is to add a MIME charset parameter to
   the "application/x-www-form-urlencoded" media type specifier sent
   along with a POST method form submission, with the understanding
   that the URL encoding of [RFC1738] is applied on top of the
   specified character encoding, as a kind of implicit
   Content-Transfer-Encoding.

   One problem with both solutions above is that current browsers do
   not generally allow for bookmarks to specify the POST method; this
   should be improved. Conversely, the GET method could be used with
   the form data transmitted in the body instead of in the URL. Nothing
   in the protocol seems to prevent it, but no implementations appear
   to exist at present.

   How the user agent determines the encoding of the text entered by
   the user is outside the scope of this specification.

   NOTE -- Designers of forms and their handling scripts should be
   aware of an important caveat: when the default value of a field (the
   VALUE attribute) is returned upon form submission (i.e. the user did
   not modify this value), it cannot be guaranteed to be transmitted as
   a sequence of octets identical to that in the source document --
   only as a possibly different but valid encoding of the same sequence
   of text elements. This may be true even if the encoding of the
   document containing the form and that used for submission are the
   same. Differences can occur when a sequence of characters can be
   represented by various sequences of octets, and also when a
   composite sequence (a base character plus one or more combining
   diacritics) can be represented by either a different but equivalent
   composite sequence or by a fully precomposed character.
   For instance, the UCS-2 sequence 00EA+0323 (LATIN SMALL LETTER E
   WITH CIRCUMFLEX ACCENT + COMBINING DOT BELOW) may be transformed
   into 1EC7 (LATIN SMALL LETTER E WITH CIRCUMFLEX ACCENT AND DOT
   BELOW), into 0065+0302+0323 (LATIN SMALL LETTER E + COMBINING
   CIRCUMFLEX ACCENT + COMBINING DOT BELOW), as well as into other
   equivalent composite sequences.

> * New option in Config.xml that means use this charset if possible
>   output_charset = "UTF-8"
>
> * When a request arrives we first look in the session for _encoding to
> see what encoding the request is most likely in and then change the
> encoding to default_input_charset. This only needs to be done on requests
> with parameters (when QUERY_STRING or REQUEST_METHOD='post'). If no
> session or _encoding exists then use output_charset.
>
> If output_charset, default_input_charset and default_output_charset are
> utf-8 then there are pretty small chances that a conversion ever is
> needed.
>
> If no output_charset exists in the config file then use the same behavior
> we have today, with no input_conversion.

To me (and I don't use pkit extensively), your proposal seems appropriate.
It should work except when the browser supports neither cookies nor UTF-8,
which I guess is a very uncommon situation and is not handled right now
either.

Greetings,
dam

-- 
Damyan Ivanov                         Creditreform Bulgaria
di...@cr...                           http://www.creditreform.bg/
phone: +359(2)928-2611, 929-3993      fax: +359(2)920-0994
mob. +359(88)856-6067                 da...@ja.../Gaim
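
P.S. For what it's worth, the charset selection in the proposal quoted above could look roughly like this. It is only a Python sketch of how I read it, not pkit code: the function names and the dict-based config and session are invented for illustration, and ISO-8859-1 as default_input_charset is just an example value. Only output_charset, default_input_charset and _encoding come from the proposal itself.

```python
def source_charset(config, session):
    """Charset the request parameters are assumed to arrive in, or None."""
    output_charset = config.get("output_charset")
    if output_charset is None:
        # Option absent from the config: behave as today, no input conversion.
        return None
    if session and session.get("_encoding"):
        # The session remembers which encoding this client was served in.
        return session["_encoding"]
    # No session or no _encoding: assume the client answers in output_charset.
    return output_charset


def convert_param(raw, config, session):
    """Recode one raw parameter value (bytes) to default_input_charset.

    Meant to be called only for requests that actually carry parameters.
    """
    source = source_charset(config, session)
    target = config.get("default_input_charset")
    if source is None or target is None or source.lower() == target.lower():
        return raw  # nothing to convert
    return raw.decode(source).encode(target)


config = {"output_charset": "UTF-8", "default_input_charset": "ISO-8859-1"}
from_session = convert_param(b"G\xc3\xbcnther", config, {"_encoding": "UTF-8"})
untouched = convert_param(b"G\xfcnther", {}, None)
```

With everything set to utf-8 the function returns the bytes unchanged, which matches the remark that a conversion would rarely be needed.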