From: Lukáš M. <lma...@gm...> - 2008-03-20 08:37:05
Hello,

we've been using and testing Wayback at WebArchiv.cz for several years, and we're aware that IA has put a lot of effort into i18n, especially in the latest releases. In particular, we appreciate the support for language properties and for configuring individual JSP pages. Nevertheless, we're still facing issues with UTF-8 encoding. I'd like to hear from others (in non-ASCII countries) how they solved this issue.

In general, with each new release we always have to make the following changes (assuming, as we usually do, that our language properties are stored in UTF-8):

1. Convert all JSPs to UTF-8.
2. Add the meta tag <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> to the JSPs so that the browser can recognize the correct encoding.
3. Add the directive <%@ page language="java" pageEncoding="utf-8" contentType="text/html;charset=utf-8"%> to each JSP so that the server sends the response in UTF-8.
4. If we also want to send Unicode text from a form to the server, we have to implement a filter that sets the request encoding (req.setCharacterEncoding(encoding);).

With these changes we're able to customize each release, but it might help other non-English-speaking countries if this were incorporated into Wayback. Or is there another intended way to handle this issue?

Thanks in advance for your reply.

Best Regards
--
Lukas Matejka
WebArchiv.cz
CZ National Library
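Step 1 can be scripted. The following is a minimal Java sketch, not part of Wayback, that re-encodes a JSP file from a legacy charset to UTF-8. The class name JspToUtf8 is hypothetical, and the source charset is an assumption the caller must supply (for Czech files perhaps windows-1250 or ISO-8859-2). After conversion, the file's pageEncoding from step 3 has to say utf-8 so the container reads it correctly.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: re-encode JSP source files from a legacy charset to UTF-8. */
class JspToUtf8 {

    /** Decode the bytes with the legacy charset, then re-encode the text as UTF-8. */
    static byte[] transcode(byte[] source, Charset from) {
        String text = new String(source, from);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Rewrite one file in place. Run it only once per file: a file that is already UTF-8 would be garbled. */
    static void convertFile(Path jsp, Charset from) throws IOException {
        byte[] converted = transcode(Files.readAllBytes(jsp), from);
        Files.write(jsp, converted);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java JspToUtf8 <source-charset> <file.jsp>...");
            return;
        }
        Charset from = Charset.forName(args[0]); // assumed legacy charset, e.g. windows-1250
        for (int i = 1; i < args.length; i++) {
            convertFile(Paths.get(args[i]), from);
            System.out.println("converted " + args[i]);
        }
    }
}
```

For an ASCII-compatible source charset, a pure-ASCII JSP comes out byte-for-byte identical, so running the sketch over the stock Wayback templates is harmless; only customized, localized JSPs actually change.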
From: Brad T. <br...@ar...> - 2008-04-01 02:24:38
Hi Lukas,

Thanks for the detailed report of the problem. I'm not sure I understand all the issues you've mentioned.

#2 and #3 are straightforward and should have been obvious before now, thanks! To make sure I've understood and corrected the issue, I changed the META tag's declared encoding to "utf-8" and added the <%@ page ... %> directive, as you suggested, to .../templates/UI-header.jsp, which is referenced by all the other HTML-generating JSP files.

I'm not sure I understand what you mean by your first point, though. I just checked and didn't see any non-ASCII characters in any of the .jsp files included with Wayback, so I'm not sure what converting them to UTF-8 would change. A colleague just suggested that Windows may add a special header (a byte-order mark, or BOM) at the start of text files to indicate their encoding. Is this the change you're talking about? Am I missing something else obvious?

I'm also not sure what you're saying with #4. I understand that there can sometimes be complications with user-submitted data arriving at the server in the wrong encoding, and thus losing information, but in this case, wouldn't the best option be to assume that the encoding declared in the user's HTTP request is correct? Said another way, wouldn't it be too late to alter the encoding after the request has been received by the server? Can you illustrate this problem further, or point me at some online docs describing the problem and solution?

Thanks again!

Brad
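For reference, the UTF-8 byte-order mark is the three bytes EF BB BF, while the UTF-16 marks FE FF and FF FE are two bytes. A minimal Java sketch for checking whether a JSP file starts with a BOM might look like this; the class BomCheck is hypothetical and not part of Wayback, and whether a particular JSP container honours or strips a BOM is worth testing rather than assuming.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper: report whether a file begins with a byte-order mark. */
class BomCheck {

    /** Returns the charset implied by a leading BOM, or null if there is none. */
    static Charset detectBom(byte[] b) {
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;      // EF BB BF: 3-byte UTF-8 BOM
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return StandardCharsets.UTF_16BE;   // FE FF: UTF-16 big-endian
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            return StandardCharsets.UTF_16LE;   // FF FE: UTF-16 little-endian (UTF-32LE not distinguished here)
        }
        return null;
    }

    /** Number of BOM bytes to skip if a caller wants to strip it. */
    static int bomLength(byte[] b) {
        Charset cs = detectBom(b);
        if (cs == null) {
            return 0;
        }
        return cs == StandardCharsets.UTF_8 ? 3 : 2;
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            Charset cs = detectBom(Files.readAllBytes(Paths.get(name)));
            System.out.println(name + ": " + (cs == null ? "no BOM" : "BOM for " + cs.name()));
        }
    }
}
```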
From: Lukáš M. <lma...@gm...> - 2008-04-01 09:44:39
Hi Brad,

please see my comments below.

> Hi Lukas,
>
> Thanks for the detailed report of the problem. I'm not sure I
> understand all the issues you've mentioned.
>
> #2 and #3 are straightforward and should have been obvious before
> now, thanks! To make sure I've understood and corrected the issue, I
> changed the META tag's declared encoding to "utf-8" and added the
> <%@ page ... %> directive, as you suggested, to
> .../templates/UI-header.jsp, which is referenced by all the other
> HTML-generating JSP files.

In my experience, the <%@ page ... %> directive must be added to every JSP that is included, not only to the header; it probably tells the writer which output encoding to use. The META tag is needed only in the header file ../templates/UI-header.jsp.

> I'm not sure I understand what you mean by your first point, though.
> I just checked and didn't see any non-ASCII characters in any of the
> .jsp files included with Wayback, so I'm not sure what converting them
> to UTF-8 would change. A colleague just suggested that Windows may add
> a special header at the start of text files to indicate their
> encoding. Is this the change you're talking about? Am I missing
> something else obvious?

Yes, you're absolutely right: your JSPs are ASCII only. This was meant only for the case where somebody wants to use another JSP as a customization in their own language.

> I'm also not sure what you're saying with #4. I understand that there
> can sometimes be complications with user-submitted data arriving at
> the server in the wrong encoding, and thus losing information, but in
> this case, wouldn't the best option be to assume that the encoding
> declared in the user's HTTP request is correct? Said another way,
> wouldn't it be too late to alter the encoding after the request has
> been received by the server?

Let's assume that I want to send non-ASCII data in a form from the client to the server. (This is a rare situation for us, because most URLs are in ASCII; on the other hand, if we used full-text search, this would be useful.) We have to tell the JVM's reader to process the input as UTF-8 and make sure all requests pass through the filter. Frankly speaking, I'm not sure why the reader doesn't use the UTF-8 encoding from the HTTP request, but I think some browsers, such as old versions of Internet Explorer, don't send the request encoding at all.

> Can you illustrate this problem further, or point me at some online
> docs describing the problem and solution?

I found some:
http://cagan327.blogspot.com/2006/05/utf-8-encoding-fix-tomcat-jsp-etc.html

Regards
Lukas Matejka
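The mechanism behind #4 can be illustrated with the JDK alone. Browsers typically encode form fields in the charset of the page that contains the form and often do not declare that charset in the request; in that case the Servlet specification has the container decode POST parameters as ISO-8859-1. The raw bytes are untouched until parameters are first parsed, so calling setCharacterEncoding in a filter before anything reads a parameter is not too late. The sketch below uses a hypothetical class FormEncodingDemo and assumes UTF-8 as the fallback, matching the pages configured in steps 2 and 3. It decodes the same percent-encoded value both ways. In Tomcat, query-string (GET) parameters are, as far as I know, decoded according to the connector's URIEncoding setting instead, so such a filter mainly affects POST bodies.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

/**
 * Hypothetical JDK-only illustration of what a character-encoding filter decides.
 * In a real javax.servlet.Filter, doFilter() would make the equivalent call before
 * chain.doFilter(...) and before any parameter is read:
 *   request.setCharacterEncoding(resolveEncoding(request.getCharacterEncoding(), "UTF-8"));
 */
class FormEncodingDemo {

    /** Keep the encoding the client declared; otherwise use the (assumed) fallback. */
    static String resolveEncoding(String declared, String fallback) {
        return (declared != null) ? declared : fallback;
    }

    /** Decode one application/x-www-form-urlencoded value with the given charset. */
    static String decodeParam(String raw, String charset) {
        try {
            return URLDecoder.decode(raw, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // "čeština" as a browser sends it from a UTF-8 page: č = C4 8D, š = C5 A1
        String raw = "%C4%8De%C5%A1tina";
        // No charset declared: the servlet default ISO-8859-1 produces mojibake
        System.out.println(decodeParam(raw, "ISO-8859-1"));
        // With the filter's fallback the original text is recovered
        System.out.println(decodeParam(raw, resolveEncoding(null, "UTF-8")));
    }
}
```

Preferring a declared charset and falling back only when none is sent keeps Brad's point (trust the request when it says something) while covering clients that declare nothing.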