[jetty-discuss] [jira] Commented: (JETTY-244) OutputWriter handle multibyte UTF-8 chars wrong

    [ http://jira.codehaus.org/browse/JETTY-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_88762 ]

Filip Jirsák commented on JETTY-244:
------------------------------------

Commenting out the whole if statement was the first thing I tried, and it doesn't work: it sometimes throws an ArrayIndexOutOfBoundsException. That is because there is no test for the buffer length in the case of a one-byte Unicode character:

    if ((code & 0xffffff80) == 0)
    {
        // 1b
        buffer[bytes++]=(byte)(code);
    }

If the if statements in the multibyte cases (// 2b and bigger) were removed, there would have to be a buffer-length condition in the one-byte case:

    if ((code & 0xffffff80) == 0)
    {
        // 1b
        if (bytes >= buffer.length) {
            chunk=i;
            break;
        }
        buffer[bytes++]=(byte)(code);
    }
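
To show that one-byte check in context, here is a minimal, self-contained sketch of a chunked encoder in which every case tests the remaining space before writing and nothing shortens the chunk afterwards. It is not the Jetty source and not the attached patch: the class name Utf8ChunkSketch, the helpers encodeChunk and encodeAll, and the checks in the 2b and 3b branches are assumptions made for illustration, and surrogate pairs are deliberately left out.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration only; not the Jetty AbstractGenerator code.
class Utf8ChunkSketch {

    // Encodes chars from s[offset..offset+length) into buffer until it is full.
    // Returns the number of chars consumed; bytesOut[0] gets the bytes written.
    static int encodeChunk(char[] s, int offset, int length, byte[] buffer, int[] bytesOut) {
        int bytes = 0;
        int chunk = length;
        for (int i = 0; i < chunk; i++) {
            int code = s[offset + i];
            if ((code & 0xffffff80) == 0) {
                // 1b: the check discussed above, stop before overrunning the buffer
                if (bytes >= buffer.length) { chunk = i; break; }
                buffer[bytes++] = (byte) code;
            } else if ((code & 0xfffff800) == 0) {
                // 2b: assumed check that both bytes fit
                if (bytes + 2 > buffer.length) { chunk = i; break; }
                buffer[bytes++] = (byte) (0xc0 | (code >> 6));
                buffer[bytes++] = (byte) (0x80 | (code & 0x3f));
            } else {
                // 3b: assumed check that all three bytes fit (surrogates not handled)
                if (bytes + 3 > buffer.length) { chunk = i; break; }
                buffer[bytes++] = (byte) (0xe0 | (code >> 12));
                buffer[bytes++] = (byte) (0x80 | ((code >> 6) & 0x3f));
                buffer[bytes++] = (byte) (0x80 | (code & 0x3f));
            }
        }
        bytesOut[0] = bytes;
        return chunk;
    }

    // Drives encodeChunk the way a caller would: flush the buffer, then
    // continue with the first char that did not fit.
    static byte[] encodeAll(String text, int bufferSize) {
        if (bufferSize < 3) {
            throw new IllegalArgumentException("buffer must hold at least one 3-byte char");
        }
        char[] s = text.toCharArray();
        byte[] buffer = new byte[bufferSize];
        int[] written = new int[1];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int offset = 0;
        while (offset < s.length) {
            int consumed = encodeChunk(s, offset, s.length - offset, buffer, written);
            out.write(buffer, 0, written[0]);
            offset += consumed;
        }
        return out.toByteArray();
    }
}
```

Calling Utf8ChunkSketch.encodeAll("ab\u00e1cd", 4) fills the 4-byte buffer exactly with "ab" plus the two bytes of \u00e1; the one-byte check then stops the loop at "c", which starts the next chunk. The cost is one extra comparison per one-byte character, which is the trade-off weighed in the next paragraph.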

In my patch I chose another solution. In my code the chunk is shortened only when we are not at the end of the buffer. With this option there must be one more condition in the multibyte cases, but there is no need for a condition in the one-byte case. I think that in an HTTP environment there will be many more one-byte characters, so no condition in the one-byte case and two conditions in the multibyte cases is more efficient.

> OutputWriter handle multibyte UTF-8 chars wrong
> -----------------------------------------------
>
>                 Key: JETTY-244
>                 URL: http://jira.codehaus.org/browse/JETTY-244
>             Project: Jetty
>          Issue Type: Bug
>          Components: HTTP
>            Reporter: Filip Jirsák
>         Assigned To: Greg Wilkins
>             Fix For: 6.1.2rc1
>
>         Attachments: OutputWriter-utf-8.diff, ServletTest1java
>
>
> There is a problem in the way multibyte UTF-8 characters are handled at the end of a chunk in the method org.mortbay.jetty.AbstractGenerator.OutputWriter.write(char[] s,int offset, int length).
> When a multibyte UTF-8 character (for example á, \u00E1) is the last character that fits into the "bytes" buffer, it is written to the output twice. It is written once at the end of the buffer, but then this code
> if (chunk-i>buffer.length-bytes)
>   chunk=buffer.length-bytes+i;
> cuts the chunk. (That is right in the other places: we have spent two or more bytes from the "bytes" buffer, so we must reduce the number of chars that can fit into the buffer.) But when this cut occurs at the end of the "for (int i = 0; i < chunk; i++)" loop, shortening the chunk makes it look as if we didn't write the last char into the buffer. So it is written again in the next call of OutputWriter.write().
> I think the condition
> if (chunk-i>buffer.length-bytes)
>   chunk=buffer.length-bytes+i;
> should properly be
> if (chunk-i>buffer.length-bytes && buffer.length-bytes>0)
>   chunk=buffer.length-bytes+i;
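
The duplication described in the issue can be reproduced with a small stand-alone sketch. This is a reconstruction under assumptions, not the Jetty source: the class ChunkCutDemo, its methods, the initial clamp of chunk to the buffer size, and the fit check before the two-byte write are all hypothetical, and only characters below U+0800 are handled. The `guarded` flag switches between the original cut and the condition proposed above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical reconstruction for illustration; not the Jetty AbstractGenerator code.
class ChunkCutDemo {

    // Returns the number of chars the caller should treat as consumed;
    // bytesOut[0] gets the number of bytes written into buffer.
    static int writeChunk(char[] s, int offset, int length, byte[] buffer,
                          int[] bytesOut, boolean guarded) {
        int bytes = 0;
        int chunk = Math.min(length, buffer.length); // assumed initial clamp
        for (int i = 0; i < chunk; i++) {
            int code = s[offset + i];
            if ((code & 0xffffff80) == 0) {
                // 1b
                buffer[bytes++] = (byte) code;
            } else {
                // 2b (assumes code < 0x800); assumed check that both bytes fit
                if (bytes + 2 > buffer.length) { chunk = i; break; }
                buffer[bytes++] = (byte) (0xc0 | (code >> 6));
                buffer[bytes++] = (byte) (0x80 | (code & 0x3f));
                int free = buffer.length - bytes;
                // Original cut, optionally with the proposed "free > 0" guard.
                if (chunk - i > free && (!guarded || free > 0)) {
                    chunk = free + i;
                }
            }
        }
        bytesOut[0] = bytes;
        return chunk;
    }

    // Repeatedly calls writeChunk and flushes, as a caller of write() would.
    static String encode(String text, int bufferSize, boolean guarded) {
        char[] s = text.toCharArray();
        byte[] buffer = new byte[bufferSize];
        int[] written = new int[1];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int offset = 0;
        while (offset < s.length) {
            int consumed = writeChunk(s, offset, s.length - offset, buffer, written, guarded);
            out.write(buffer, 0, written[0]);
            if (consumed == 0) {
                // Avoid looping forever when the cut reports no progress.
                throw new IllegalStateException("no progress at offset " + offset);
            }
            offset += consumed;
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

With a 4-byte buffer and the input "ab\u00e1", encode(..., false) returns the text with \u00e1 doubled, because the cut sets chunk back to 2 after the character has already been written; encode(..., true) returns the input unchanged. In this sketch the guard alone does not protect a one-byte character that follows the full buffer (for example "ab\u00e1c"), which is the additional bounds check discussed in the comment at the top of this message.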

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://jira.codehaus.org/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira