From: Gustaf N. <ne...@wu...> - 2022-05-19 19:19:54
Hi David,

we now have a global and a per-server parameter called
"formfallbackcharset"; the flag for "ns_getform" and "ns_parsequery"
is now called "fallbackcharset". In many cases, using e.g. the
per-server parameter should be sufficient to handle incorrect
queries...

still missing: "multipart/form-data" handling, documentation updates,
and an error code

all the best
-gn

On 18.05.22 22:00, Gustaf Neumann wrote:
>
> Dear David,
>
> I've committed the option "-fallbackencodings" for the commands
> "ns_getform" and "ns_parsequery". The implementation covers
> "ns_getform", where the data is provided as
> "application/x-www-form-urlencoded" either when parsing from memory
> or from the spool file. The "multipart/form-data" implementation (also
> separate for memory and spoolfile) is not yet covered.
>
> We can also consider a global parameter for the configuration file
> (like e.g. FormFallbackEncodings). Probably, we should use the term
> "charset" instead of "encoding", since "charset" is the MIME term,
> also used for e.g. "URLCharset", while "encoding" is the Tcl name.
>
> Although the names might still change, you might test whether this
> works for your test cases.
>
> -gn
>
> On 16.05.22 16:16, David Osborne wrote:
>> Hi Gustaf,
>>
>> I spotted that *ns_getform* takes a charset argument from looking at
>> the source code.
>> The options for overriding charsets at the moment seem to be:
>>
>>     ns_getform iso8859-1
>>
>>     ns_urlcharset iso8859-1
>>     ns_getform
>>
>>     ns_conn urlencoding iso8859-1
>>     ns_getform
>>
>> We experimented with some code which tried to trap errors from
>> *ns_getform*, and where the error was due to "invalid UTF-8", try a
>> fallback charset.
>> All 3 of the above techniques worked OK when the Content-Type header
>> leaves the charset /unspecified/.
>>
>> The main issues we had were:
>>
>> 1. When a *charset=utf-8* is present in the *Content-Type* header,
>> this overrides ([1]) any encoding we pass using the 3 techniques
>> above.
>> In those cases we have to manipulate the headers' ns_set to remove or
>> change the charset.
>> eg.
>>     Content-Type: application/x-www-form-urlencoded; charset=utf-8
>> transform to ->
>>     Content-Type: application/x-www-form-urlencoded
>> or
>>     Content-Type: application/x-www-form-urlencoded; charset=windows-1252
>>
>> 2. Trapping the specific "invalid UTF-8" error - this method seems
>> fragile - would be nice if there was an *errorCode* we could trap.
>>
>>     ::try {
>>         ns_getform
>>     } on error {msg options} {
>>         if { [string match "*contains invalid UTF-8" $msg] } {
>>             # change Content-Type charset (if present)
>>             # try fallback charset
>>         } else {
>>             # rethrow error
>>         }
>>     }
>>
>> But I think this presents us with a way forward in cases where client
>> apps are not getting the encoding correct.
>>
>> [1]
>> https://bitbucket.org/naviserver/naviserver/annotate/master/nsd/form.c?at=master#form.c-170
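
For readers of the archive, a minimal sketch of how the parameter and the flag named at the top of this message might be used. Only the names "formfallbackcharset" and "fallbackcharset" come from the thread; the section path, the server name "default", the charset value and the sample query string are assumptions for illustration, and the names may still have changed since this message.

```tcl
# --- Configuration file fragment -------------------------------------
# Assumed section layout; "default" is a hypothetical server name.
set server default

ns_section "ns/server/$server"
ns_param   formfallbackcharset iso8859-1   ;# hypothetical fallback choice

# --- Request script (runs inside a connection) ------------------------
# Per-call fallback via the flag, instead of relying on the
# configured parameter.
set form [ns_getform -fallbackcharset iso8859-1]

# The same flag for a raw query string. "%e4" is "ä" in ISO 8859-1 and
# is not a valid UTF-8 sequence on its own.
set query [ns_parsequery -fallbackcharset iso8859-1 "name=M%e4rz"]
```

With the per-server parameter set, the error-trapping workaround from the quoted message should in many cases no longer be needed, since the fallback is applied without any change to the calling code.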