From: oleg <oo...@ua...> - 2020-09-03 11:52:48
|
Hello!

We are having some difficulties when using the ns_http command with sites that use an 8-bit encoding.

The ns_http command does not convert the received data, so we must use the 'encoding convertfrom' command. Sometimes the converted strings are corrupted. For example, take a server with output encoding iso-8859-2:
if the server passes 'äöüŁ', then after conversion we get 'äöüŁ' (correct);
if the server passes 'ÄÖÜŁ', then after conversion we get 'ÄÖ#' (corrupted).
See the attached ns_http.test1 for an example (test 1.2 fails).

Such strings can be found in any 8-bit encoding (to see this, run the attached http_charsets.test with the 'pairsTest' constraint enabled). The source of the ns_http command (tclhttp.c) shows that the problem is the use of Tcl_NewStringObj on binary input data (8-bit chars).

Two solutions come to mind:
1) use Tcl_NewByteArrayObj instead of Tcl_NewStringObj;
2) call Tcl_ExternalToUtf before Tcl_NewStringObj, i.e. a built-in 'encoding convertfrom'.

The attached tclhttp.c.binary-externaltoutf patch modifies the ns_http command:
1) the -binary switch is added to the queue/wait/run sub-commands so that Tcl_NewByteArrayObj is used on text pages;
2) without -binary, the text page is converted according to the Content-Type header.

Note that the second change requires TCLHTTP_USE_EXTERNALTOUTF to be defined at compile time.

The fixed ns_http command can be tested with the attached ns_http.test2 (see 1.2.1 and 1.2.2). More intensive testing of the changes can be done with http_charsets.test (note the commented-out pairsTest constraint). I also replaced 'nstest::http-0.9 -encoding xxx' with 'ns_http run' in the existing encoding.test (see encoding_ns_http.test). All data transformations are performed successfully without explicit decoding.

Automatic data decoding is convenient to use, but it changes the behavior of ns_http on 8-bit inputs. These changes could break existing code if someone uses ns_http to interact with 8-bit sites (with a risk of data corruption). To use the patched version of ns_http, either remove the 'encoding convertfrom' call or add the -binary switch.

It should be noted that the -binary switch followed by 'encoding convertfrom' will also be useful for 8-bit sites with a missing or incorrect Content-Type.

Regards,
Oleg Oleinick.

PS. Attached files:
ns_http.test1 - tests for the current version, shows corruption of 8-bit text;
ns_http.test2 - tests for the patched version, shows the correct receipt of 8-bit text;
tclhttp.c.binary-externaltoutf.patch - patch for the ns_http command, adds the -binary switch and automatic decoding of text data;
http_charsets.test - tests for ns_http, suitable for both the current and the patched version;
encoding_ns_http.test - like the existing encoding.test, with 'nstest::http-0.9 -encoding xxx' replaced by the new 'ns_http run'. |
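One way to see why 'ÄÖÜŁ' breaks while 'äöüŁ' survives is to model what happens when raw iso-8859-2 bytes are parsed as if they were UTF-8. The Python sketch below is only a simplified model and not the tclhttp.c code. It assumes that a string object parses its bytes as UTF-8, that a byte which does not start a valid sequence is kept as a single character, and that a later 'encoding convertfrom' keeps only the low 8 bits of each character. All function names are hypothetical.

```python
def lenient_utf8_decode(data: bytes) -> list:
    """Parse bytes as UTF-8; a byte that does not start a valid
    sequence is kept as one code point (assumed, simplified model)."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if 0xC0 <= b <= 0xDF:
            need, cp = 1, b & 0x1F      # lead byte of a 2-byte sequence
        elif 0xE0 <= b <= 0xEF:
            need, cp = 2, b & 0x0F      # lead byte of a 3-byte sequence
        elif 0xF0 <= b <= 0xF7:
            need, cp = 3, b & 0x07      # lead byte of a 4-byte sequence
        else:
            need, cp = 0, b             # ASCII, stray continuation, or invalid lead
        tail = data[i + 1:i + 1 + need]
        if need and len(tail) == need and all(0x80 <= t <= 0xBF for t in tail):
            for t in tail:
                cp = (cp << 6) | (t & 0x3F)
            i += 1 + need
        else:
            cp = b                      # fall back to the single byte
            i += 1
        out.append(cp)
    return out


def string_obj_roundtrip(data: bytes, charset: str) -> str:
    """Model: bytes stored as a string, then decoded from `charset`."""
    code_points = lenient_utf8_decode(data)
    raw = bytes(cp & 0xFF for cp in code_points)  # keep low 8 bits only
    return raw.decode(charset, errors="replace")


def byte_array_roundtrip(data: bytes, charset: str) -> str:
    """Model: bytes kept as a byte array, then decoded from `charset`."""
    return data.decode(charset)


if __name__ == "__main__":
    for text in ("äöüŁ", "ÄÖÜŁ"):
        wire = text.encode("iso-8859-2")
        print(text, "->", string_obj_roundtrip(wire, "iso-8859-2"),
              "/", byte_array_roundtrip(wire, "iso-8859-2"))
```

In this model the bytes 0xDC 0xA3 ('ÜŁ') happen to form a valid two-byte UTF-8 sequence (U+0723), whose low byte is 0x23 ('#'), which reproduces the 'ÄÖ#' result. The lowercase string contains no such accidental pair, so it passes through unchanged.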
From: Zoran V. <zv...@ar...> - 2020-09-04 09:02:57
|
On Thu, 3 Sep 2020 14:52:44 +0300 oleg <oo...@ua...> wrote:

Hi! Thanks for looking into this.

> The ns_http command does not convert the received data, so we must use
> the 'encoding convertfrom' command.

I see you added the Ns_GetTypeEncoding(cType) test for non-binary content and then use Tcl_ExternalToUtfDString() to convert. This seems OK to me. What is the purpose of signalling binary content all the way up from the command line? For this we have the Ns_IsBinaryMimeType(cType) test.

> Sometimes converted strings become
> corrupted. For example, there is a server with output encoding
> iso-8859-2

I guess this is out of scope, as we rely on Tcl to do the encoding conversions. If the result comes out bad, then you must post a bug to the Tcl project.

Cheers
Zoran |
From: oleg <oo...@ua...> - 2020-09-04 12:07:25
|
On Fri, 4 Sep 2020 13:55:05 +0200 Zoran Vasiljevic <zv...@ar...> wrote:

> Does the -binary option alone solve your problem?

Yes. In fact, I have been using -binary for two years.

Oleg. |
From: Zoran V. <zv...@ar...> - 2020-09-04 12:31:25
|
On Fri, 4 Sep 2020 15:07:32 +0300 oleg <oo...@ua...> wrote:

> Yes. In fact, I have been using -binary for two years.

OK. I see no problem with that. Actually, I cannot see any other way of handling such cases, to be honest. If this is still open when I come back from holidays in about two weeks, I will put that in.

Cheers
Zoran |
From: Gustaf N. <ne...@wu...> - 2020-09-04 09:39:29
|
Hi Oleg,

since HTTP has means to include encodings, which NaviServer uses when acting as a server, it should behave the same way when acting as a client and not burden the application with digging into the content-type charsets to call the right conversion. A "-binary" flag still makes sense in cases where no content-type is given, or to let the developer overrule other mechanisms. The use of the "-binary" flag plus convertfrom/to should always be applicable.

Having NaviServer versions that lead to different results depending on compile flags is not a good idea.

To get a more detailed understanding, I have to dig into your examples to see whether this is indeed a problem on the Tcl side or in NaviServer, but for this I need a certain block of time, which is hard for me to get right now.

-g |
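The client-side behavior discussed in this thread (decode according to the Content-Type charset unless the caller asks for raw bytes) can be sketched in a few lines of Python. This is only an illustration of the idea, not the NaviServer implementation. The function name decode_body and its binary parameter are hypothetical, and returning raw bytes when no usable charset is found is an assumption made for this sketch.

```python
from email.message import Message


def decode_body(body: bytes, content_type=None, binary=False):
    """Return text decoded per the Content-Type charset, or the raw
    bytes when `binary` is set or no usable charset is available."""
    if binary:
        return body                      # caller wants to decode explicitly
    msg = Message()
    if content_type is not None:
        msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # None when no charset parameter
    if charset is None:
        return body                      # assumption: leave bytes untouched
    try:
        return body.decode(charset)
    except (LookupError, UnicodeDecodeError):
        return body                      # unknown or wrong charset: keep bytes


if __name__ == "__main__":
    wire = "ÄÖÜŁ".encode("iso-8859-2")
    print(decode_body(wire, "text/html; charset=iso-8859-2"))
    # Missing Content-Type: take the bytes and decode them by hand.
    print(decode_body(wire, binary=True).decode("iso-8859-2"))
```

The second call mirrors the case where -binary followed by an explicit conversion stays useful, namely a site with a missing or incorrect Content-Type.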