From: Chris H. <ch...@ha...> - 2003-10-13 20:29:29
See embedded...

Chris

"Craig Raw" replied:

> Hi Chris,
>
> Thanks for your reply. Character set issues are much more complex than one
> would think, as each piece of software the information passes through has a
> chance to screw up the information, from the browser to the web server to
> the database. This is complicated by having the System.out and log writing
> also defaulting to the default character set, which may in turn lose
> information. In the end, I found the mysql connection needed to have certain
> flags set to enable Unicode support (useUnicode=true) and the character set
> (characterEncoding=ISO-8859-1), a fact initially hidden by the logs.
>
> As to your solution, it certainly does not seem to hurt, but doesn't seem to
> be necessary either, so I can't update your list of working browsers. Is
> there a special case where one needs to do this?

As you probably know, HTTP makes no provision for a GET request to tell the
server which encoding the browser used. The page author and servlet writer
therefore have to find some way of ensuring that the encoding used can
reliably be predicted.

It is a common misconception that, if you specify the character encoding
used to encode the page, the browser will always use that same encoding when
generating a request from that page. This is not so; most browsers permit
users to select a different character encoding with which to view the page.
The browser will then use that encoding to return the data, and the server
has no way of knowing (with GET) that a different encoding has been applied.

When using POST you do at least have an HTTP header to tell you what
encoding was used, but it is still possible for the user to select an
inappropriate encoding, or one that the server is not capable of decoding.
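To make the point about undeclared encodings concrete, here is a minimal Java sketch. It is not from the original thread: the class name FormEncodingDemo and its two helper methods are hypothetical, and it only uses the standard java.net.URLEncoder and java.net.URLDecoder classes to imitate what a browser and a server each do. It shows that the same character is percent-encoded differently depending on the charset, and that decoding with the wrong charset does not recover it.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class: imitates a browser encoding a form value and a
// server decoding it, with the charset chosen independently on each side.
public class FormEncodingDemo {

    // Percent-encode a value as a browser would when submitting in `charset`.
    public static String encode(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    // Decode a percent-encoded value as a server would, assuming `charset`.
    public static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String original = "\u00e9"; // the character 'é' (0xE9 in ISO-8859-1)

        // The bytes on the wire differ depending on the browser's encoding.
        System.out.println(encode(original, "ISO-8859-1")); // %E9
        System.out.println(encode(original, "UTF-8"));      // %C3%A9

        // Nothing in the query string says which encoding was used, so the
        // server only recovers the character if it guesses correctly.
        String sent = encode(original, "UTF-8");
        System.out.println(decode(sent, "UTF-8").equals(original));      // true
        System.out.println(decode(sent, "ISO-8859-1").equals(original)); // false
    }
}
```

The last line is the failure case described above: the request was produced in one encoding and the server assumed another.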
I believe the use of 'accept-charset' is the _only_ way of reliably getting
international characters back from a web page using GET. It also provides a
foolproof method for use with POST, since it blocks attempts by the user to
change the encoding.

>
> Thanks again,
> Craig
>
>
> ----- Original Message -----
> From: "Chris Haynes" <ch...@ha...>
> To: <jet...@li...>
> Sent: Sunday, October 12, 2003 10:13 AM
> Subject: Re: [Jetty-support] Character encoding in request
>
>
> > Right,
> >
> > There does appear to be a better way.
> >
> > I described it here and on one of the W3C International lists a few
> > weeks ago and invited feedback from a wider range of browsers than I
> > had available (feedback anyone?).
> >
> > In the FORM element add the attribute
> >
> > <form ... accept-charset="UTF-8" >
> > Then in the Servlet use
> > request.setCharacterEncoding( "UTF-8" )
> > before accessing any request parameters (as you have been doing).
> >
> > This seems to work for both POST and GET for all the browsers I have
> > access to.
> >
> > It also seems to encode the request in UTF-8 regardless of the
> > encoding you used in the page itself - which is good.
> >
> > Let me know if it works for you.
> >
> > If nobody reports any problems in the next week, I'll update the FAQ.
> >
> > Chris
> >
> >
> > ----- Original Message -----
> > From: "Craig Raw" <cr...@qu...>
> > To: <jet...@li...>
> > Sent: Saturday, October 11, 2003 4:49 PM
> > Subject: [Jetty-support] Character encoding in request
> >
> >
> > > Hi,
> > >
> > > I am having problems retrieving non US-ASCII characters from a POST form
> > > submission in Jetty with JBoss 3.2.1. The characters in question are those
> > > such as 'é' (0xE9) and similar around this hex range. All of these
> > > characters are coming through from the ServletRequest as '?' characters.
> > >
> > > This problem does not occur on a Windows box with default charset Cp1252.
> > > The server with the problem is running Redhat 7.3 with a default charset of
> > > ASCII. The form submission is of type POST. The same problem happens if I
> > > enter in the request parameters on the browser URL field (with %HH escaping)
> > > to perform a GET request.
> > >
> > > I have read http://jetty.mortbay.org/jetty/doc/international.html, but it
> > > does not offer any advice beyond using POST.
> > >
> > > I have read http://www.anassina.com/struts/i18n/i18n.html and tried the
> > > SetCharacterEncoding filter mentioned, calling
> > > request.setCharacterEncoding( "UTF-8" ) before any request parameters are
> > > read. Thereafter, request.getCharacterEncoding() does return "UTF-8", but
> > > the '?' question marks come through as before.
> > >
> > > Any help would be greatly appreciated.
> > >
> > > Thanks
> > > Craig