From: Gustaf N. <ne...@wu...> - 2020-09-04 09:39:29
Hi Oleg,

since HTTP has means to declare encodings, which NaviServer uses when acting as a server, NaviServer should behave the same way when acting as a client, without burdening the application with digging into the Content-Type charset to call the right conversion. A "-binary" flag still makes sense in cases where no Content-Type is given, or to let the developer override the other mechanisms. Using the "-binary" flag plus 'encoding convertfrom/convertto' should always remain possible.

Having NaviServer builds that produce different results depending on compile flags is not a good idea.

To get a more detailed understanding, I have to dig into your examples to see whether this is indeed a problem on the Tcl side or in NaviServer ... but for this I need a block of time, which is hard for me to find right now.

-g

On 03.09.20 13:52, oleg wrote:
> Hello!
>
> We are having some difficulties when using the ns_http command with
> sites that use an 8-bit encoding.
>
> The ns_http command does not convert the received data, so we must use
> the 'encoding convertfrom' command. Sometimes the converted strings become
> corrupted. For example, with a server whose output encoding is
> iso-8859-2:
> if the server sends 'äöüŁ', then after conversion we get 'äöüŁ'
> (correct);
> if the server sends 'ÄÖÜŁ', then after conversion we get 'ÄÖ#'
> (corrupted).
> See the attached ns_http.test1 for an example (test 1.2 fails).
>
> Such strings can be found in any 8-bit encoding (to see them, run the
> attached http_charsets.test with the 'pairsTest' constraint enabled).
> The source of the ns_http command (tclhttp.c) shows that the problem is
> the use of Tcl_NewStringObj on binary input data (8-bit chars).
>
> Two solutions come to mind:
> 1) Using Tcl_NewByteArrayObj instead of Tcl_NewStringObj;
> 2) Using Tcl_ExternalToUtf before Tcl_NewStringObj, i.e. a built-in
> 'encoding convertfrom'.
>
> The attached tclhttp.c.binary-externaltoutf patch modifies the ns_http
> command:
> 1) the -binary switch is added to the queue/wait/run sub-commands to use
> Tcl_NewByteArrayObj on text pages;
> 2) without -binary, the text page is converted according to the
> Content-Type header.
>
> Note that the second change requires TCLHTTP_USE_EXTERNALTOUTF to
> be defined at compile time.
>
> The fixed ns_http command can be tested with the attached ns_http.test2
> (see 1.2.1 and 1.2.2). More intensive testing of the changes can be done
> with http_charsets.test (note the commented-out pairsTest
> constraint).
> I also replaced 'nstest::http-0.9 -encoding xxx' with 'ns_http
> run' in the existing encoding.test (see encoding_ns_http.test). All data
> transformations are performed successfully without explicit decoding.
>
> Automatic data decoding is convenient, but it changes the
> behavior of ns_http on 8-bit input. These changes could break existing
> code if someone uses ns_http to interact with 8-bit sites (with a risk of
> data corruption). To use the patched version of ns_http, either remove
> the 'encoding convertfrom' or add the -binary switch.
>
> It should be noted that the -binary switch followed by 'encoding
> convertfrom' will also be useful for 8-bit sites with a missing or
> incorrect Content-Type.
>
> Regards,
> Oleg Oleinick.
>
> PS.
> Attached files:
>
> ns_http.test1 - tests for the current version, showing the corruption of
> 8-bit text;
>
> ns_http.test2 - tests for the patched version, showing the correct
> receipt of 8-bit text;
>
> tclhttp.c.binary-externaltoutf.patch - patch for the ns_http
> command, adding the -binary switch and automatic decoding of text data;
>
> http_charsets.test - tests for ns_http, suitable for both the current
> and the patched version;
>
> encoding_ns_http.test - like the existing encoding.test, with
> 'nstest::http-0.9 -encoding xxx' replaced by the new 'ns_http run';
>
>
> _______________________________________________
> naviserver-devel mailing list
> nav...@li...
> https://lists.sourceforge.net/lists/listinfo/naviserver-devel