From: <ji...@co...> - 2007-02-28 12:20:50
[ http://jira.codehaus.org/browse/JETTY-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_88762 ]

Filip Jirsák commented on JETTY-244:
------------------------------------

Commenting out the whole if statement was the first thing I tried, and it doesn't work: it sometimes throws an ArrayIndexOutOfBoundsException. That is because there is no test for the buffer length in the case of a one-byte Unicode character:

    if ((code & 0xffffff80) == 0)
    {
        // 1b
        buffer[bytes++]=(byte)(code);
    }

If the whole if statements in the multibyte cases (// 2b and bigger) were removed, the one-byte case would need a condition on the buffer length:

    if ((code & 0xffffff80) == 0)
    {
        // 1b
        if (bytes >= buffer.length)
        {
            chunk=i;
            break;
        }
        buffer[bytes++]=(byte)(code);
    }

In my patch I chose another solution. In my code the chunk is shortened only when we are not at the end of the buffer. This option needs one more condition in the multibyte cases, but no condition in the one-byte case. I think there will be many more one-byte characters in an HTTP environment, so no condition in the one-byte case and two conditions in the multibyte cases should be more efficient.

> OutputWriter handle multibyte UTF-8 chars wrong
> -----------------------------------------------
>
>                 Key: JETTY-244
>                 URL: http://jira.codehaus.org/browse/JETTY-244
>             Project: Jetty
>          Issue Type: Bug
>          Components: HTTP
>            Reporter: Filip Jirsák
>         Assigned To: Greg Wilkins
>             Fix For: 6.1.2rc1
>
>         Attachments: OutputWriter-utf-8.diff, ServletTest1java
>
>
> There is a problem in the way multibyte UTF-8 characters are handled at the end of a chunk in the method org.mortbay.jetty.AbstractGenerator.OutputWriter.write(char[] s,int offset, int length).
> When a multibyte UTF-8 character (for example á - \u00E1) is the last character that can fit into the "bytes" buffer, it is written to the output twice. It is written once at the end of the buffer, but then this code
>     if (chunk-i>buffer.length-bytes)
>         chunk=buffer.length-bytes+i;
> cuts the chunk. (That is right in the other places: we have spent two or more bytes of the "bytes" buffer, so we must reduce the number of chars that can still fit into the buffer.) But when this cut occurs at the end of the "for (int i = 0; i < chunk; i++)" cycle, the shortened chunk makes it look as if the last char had not been written into the buffer. So it is written again in the next cycle of the OutputWriter.write() call.
> I think the condition
>     if (chunk-i>buffer.length-bytes)
>         chunk=buffer.length-bytes+i;
> should properly be
>     if (chunk-i>buffer.length-bytes && buffer.length-bytes>0)
>         chunk=buffer.length-bytes+i;
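
To make the trade-off discussed in the comment concrete, here is a minimal, self-contained sketch of a chunked UTF-8 encoder with no check in the one-byte case and two checks in the multibyte cases. It is not the Jetty code and not the attached patch: the class ChunkedUtf8Sketch, the method encode and the bufferSize parameter are hypothetical names made up for illustration. As an assumption of this sketch, the chunk is shortened to i+1 plus the free bytes, so the char just written always counts as consumed and cannot be emitted twice. Only BMP chars are handled (surrogate pairs are not combined).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ChunkedUtf8Sketch {

    // Hypothetical helper: encodes s[offset..offset+length) to UTF-8 through a
    // fixed-size byte buffer that is flushed after every chunk.
    static byte[] encode(char[] s, int offset, int length, int bufferSize) {
        if (bufferSize < 3) {
            // A 3-byte character must fit into an empty buffer, otherwise no progress is possible.
            throw new IllegalArgumentException("bufferSize must be at least 3");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[bufferSize];

        while (length > 0) {
            int bytes = 0;
            // Invariant inside the loop: chunk - i <= buffer.length - bytes,
            // i.e. there is at least one free byte per char still in the chunk.
            int chunk = Math.min(length, buffer.length);

            for (int i = 0; i < chunk; i++) {
                int code = s[offset + i];

                if ((code & 0xffffff80) == 0) {
                    // 1b: no bounds check needed, the invariant guarantees room.
                    buffer[bytes++] = (byte) code;
                } else {
                    int need = (code & 0xfffff800) == 0 ? 2 : 3;

                    // Condition 1: does the whole character still fit?
                    if (bytes + need > buffer.length) {
                        chunk = i; // this char starts the next chunk
                        break;
                    }
                    if (need == 2) {
                        buffer[bytes++] = (byte) (0xc0 | (code >> 6));
                    } else {
                        buffer[bytes++] = (byte) (0xe0 | (code >> 12));
                        buffer[bytes++] = (byte) (0x80 | ((code >> 6) & 0x3f));
                    }
                    buffer[bytes++] = (byte) (0x80 | (code & 0x3f));

                    // Condition 2: restore the invariant for the chars after this one.
                    // Using i + 1 keeps the char just written inside the chunk,
                    // even when the buffer is now completely full.
                    int free = buffer.length - bytes;
                    if (chunk - i - 1 > free) {
                        chunk = i + 1 + free;
                    }
                }
            }

            out.write(buffer, 0, bytes); // flush this chunk
            offset += chunk;
            length -= chunk;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String text = "a\u00e1";
        // With a 3-byte buffer, \u00e1 is the last character that fits: the case from the report.
        byte[] actual = encode(text.toCharArray(), 0, text.length(), 3);
        System.out.println(Arrays.equals(actual, text.getBytes(StandardCharsets.UTF_8)));
    }
}
```

The design choice mirrors the argument in the comment: the one-byte branch stays free of bounds checks because every multibyte write re-establishes the invariant, so the extra work lands only on the rarer multibyte characters.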