From: oleg <oo...@ua...> - 2020-09-03 11:52:48
|
Hello! We are having some difficulties when using the ns_http command with sites using 8-bit encoding. The ns_http command does not convert the received data, so we must use the 'encoding convertfrom' command. Sometimes converted strings become corrupted. For example, there is a server with output encoding iso-8859-2: if the server passes 'äöüŁ', then after conversion we get 'äöüŁ' (correct); if the server passes 'ÄÖÜŁ', then after conversion we get 'ÄÖ#' (corrupted). See attached ns_http.test1 for example (test 1.2 fails). Such strings can be found in any 8-bit encoding (to see run attached http_charsets.test with 'pairsTest' constraint enabled). The source for the ns_http command (tclhttp.c) shows that the problem is using the Tcl_NewStringObj on binary input data (8-bit chars). Two solutions come up: 1) Using Tcl_NewByteArrayObj instead of Tcl_NewStringObj; 2) Using Tcl_ExternalToUtf before using Tcl_NewStringObj, i.e. built-in 'encoding convertfrom'. Attached tclhttp.c.binary-externaltoutf patch modifies the ns_http command: 1) the -binary switch is added to the queue/wait/run sub-commands to use of Tcl_NewByteArrayObj on text pages; 2) without -binary the text page will be converted according to the Content-Type header. Note that the second change requires the TCLHTTP_USE_EXTERNALTOUTF to be defined at compile time. The fixed ns_http command can be tested with the attached ns_http.test2 (see 1.2.1 and 1.2.2). More intensive testing of changes can be done with the http_charsets.test (note commented pairsTest constraint). Also I replaced the 'nstest :: http-0.9 -encoding xxx' with 'ns_http run' in existing encoding.test (see encoding_ns_http.test). All data transformations are successfully performed without explicit decoding. Automatic data decoding is convenient to use, but it changes the behavior of ns_http on 8-bit inputs. These changes could break existing code if someone uses ns_http to inter with 8-bit sites (with risk of data corruption). To use the patched version of ns_http, either remove the 'encoding convertfrom' or add the -binary switch. It should be noted that the -binary switch followed by 'encoding convertfrom' will also be useful for 8-bit sites with missing or incorrect Content-Type. Regards, Oleg Oleinick. PS. Attached files: ns_http.test1 - tests for the current version, shows corruption of 8-bit text; ns_http.test2 - tests for the patched version, shows the correct receipt of 8-bit text; tclhttp.c.binary-externaltoutf.patch - patch for changing the ns_http command, adds the -binary switch and text data auto-decoding; http_charsets.test - tests for ns_http, suitable for both the current and the patched version; encoding_ns_http.test - like existing encoding.test, with 'nstest :: http-0.9 -encoding xxx' replaces by new 'ns_http run'; |