From: Gustaf N. <ne...@wu...> - 2022-07-06 12:08:06
|
Dear all, The presentations are now on the conference website, including video and slides (on the program page). There are also photo impressions from the conference (on the starting page). https://openacs.org/conf2022/info/ all the best -g |
From: Gustaf N. <ne...@wu...> - 2022-06-29 22:27:04
|
Dear all

The program of the EuroTcl and OpenACS conference starting tomorrow is here:
https://openacs.org/conf2022/info/schedule

Live-streams:
Talks on June 30: https://learn.wu.ac.at/eurotcl2022/calendar/cal-item-view?cal_item_id=613703025
Talks on July 1: https://learn.wu.ac.at/eurotcl2022/calendar/cal-item-view?cal_item_id=613704587

Note that the times are Vienna local time (CEST).

all the best
-g

--
Univ.Prof. Dr. Gustaf Neumann
Head of the Institute of Information Systems and New Media
of Vienna University of Economics and Business
Program Director of MSc "Information Systems"
|
From: Gustaf N. <ne...@wu...> - 2022-06-14 12:35:16
|
Dear all,

The release of NaviServer 4.99.24 [1] is available at sourceforge.
See below for the list of changes. The code was tested with
Ubuntu 20.04, Rocky Linux 8.5, FreeBSD 13.1, macOS 11.6.6 (Intel).

Many thanks to the contributors of this release:

    Antonio Pisano
    David Osborne
    Gustaf Neumann
    Zoran Vasiljevic

All the best!
-gustaf neumann

[1] https://sourceforge.net/projects/naviserver/files/naviserver/4.99.24/

=======================================
NaviServer 4.99.24, released 2022-06-14
=======================================

 77 files changed, 3242 insertions(+), 1100 deletions(-)

New Features:

 - Improved security

   * Added protection against certain attacks in ns_dbquotevalue

     Due to the corrected conversion to external UTF-8 in db-output,
     new potential attack vectors appeared that were protected
     earlier via the Tcl-internal 'modified UTF-8'. E.g., the binary
     null character is stored as an overlong (two-byte) encoding of
     null (0xc0 0x80), so that an actual (embedded) null byte (0x00)
     never appears in the string. Due to the conversion, the internal
     representation is translated back to the binary null character.
     Embedded null bytes can lead to non-terminated string literals
     via ns_dbquotevalue.

     In the updated version of NaviServer, ns_dbquotevalue raises an
     exception when this occurs. Therefore, the function can also be
     used as an input checker (together with "try").

   * Raise an exception when trying to use "ns_urldecode" to produce
     invalid UTF-8

     Background: several (external) functions expect valid UTF-8 to
     be passed in and crash if this is not the case. One such example
     is tDOM. These nasty byte sequences are used more intensively by
     vulnerability scanners.

     Therefore, ns_urldecode now raises an exception when it tries to
     convert to invalid UTF-8. It is still possible to use
     ns_urldecode to convert to other charsets.

        ns_urldecode -charset iso8859-1 -part path "/mot%C3or"

     When urldecode() is called internally and would produce invalid
     UTF-8, it truncates the string (and writes a warning to the
     system log). Note that the new fallback charset

 - Fallback charsets

   In case a conversion to UTF-8 fails due to invalid byte sequences,
   one can now provide a fallback charset for a second attempt at
   decoding the data. This feature is useful for websites that have
   to deal with requests containing invalid (form) data, typically
   from legacy applications.

   The fallback charset can be provided as optional parameter
   "-fallbackcharset" for the commands "ns_getform", "ns_parsequery"
   and "ns_urldecode":

      ns_getform ?-fallbackcharset fallbackcharset? ?charset?
      ns_parsequery ?-charset charset? ?-fallbackcharset fallbackcharset? ?--? querystring
      ns_urldecode ?-charset charset? ?-fallbackcharset fallbackcharset? ?-part part? ?--? string

   In case the parameter is not specified, it can also be provided
   to the form-processing commands "ns_getform" and "ns_parsequery"
   via configuration variables:

   * per-server configuration parameter "formFallbackCharset"
     (in the section "ns/server/$server"), or as

   * global server configuration parameter "formFallbackCharset"
     (in the section "ns/parameters").

   The optional parameter has the highest precedence, followed by the
   per-server configuration parameter and the global configuration
   parameter.

 - Provide a hint when cache-entry is too large for caching

   Background: the size of the entry is typically determined after
   the execution of a potentially expensive query. During the eval of
   the command, the cache entry is locked and forces a serialization.
   However, this means that in these cases the situation is worse
   than without a cache, where some queries can be executed in
   parallel.

   We faced the situation of an expected slowdown of the server with
   many "create entry collision", where due to application matters,
   an entry was becoming too large. This situation is not easy to
   debug, especially in stress situations. The log message would have
   helped a lot to identify the cause.

 - Added support for multibyte numeric entities

   This change supports conversion of numeric entities representing
   multibyte characters into HTML in "ns_striphtml" and
   "ns_unquotehtml". Technically, the numeric entities represent
   Unicode code points, which are transformed into UTF-8
   serialization. Every entity represents a single code point; the
   values can be provided in decimal or hexadecimal notation. Before
   this change, only single-byte numeric entities were supported.
   ASCII control characters (decimal 0-31) are ignored as before.

 - New and extended commands:

   * ns_unquotehtml /text/

     This command is the inverse operation of "ns_quotehtml". It
     replaces the named and numeric entities in the provided string
     with the native values. The command is similar to
     "ns_striphtml", but "ns_striphtml" also removes other HTML
     markup, which might not be desired in all cases.

     This change also fixes a bug with numeric entities (the old code
     assumed that these start directly with a number after the
     ampersand) and adds support for numeric entities with
     hexadecimal values (so far with the same value range as for
     decimal numeric entities).

   * ns_subnetmatch /subnet/ /ipaddr/

     Determine whether a provided IP address (IPv4 or IPv6) is
     included in a subnet specification, which is provided in CIDR
     notation. The command makes internal NaviServer functionality
     available at the Tcl level. The regression test was extended to
     cover the functionality.

     The command ns_subnetmatch validates the provided subnet
     specification (IPv4 or IPv6 address followed by slash and number
     of significant bits) and the provided IP address and tests
     whether the IP address is in the implied range. The command
     returns a boolean value as the result. When comparing an IPv4
     and IPv6 address/CIDR specification or vice versa, the result is
     always false.

     The function can be used, e.g., when restricting access to
     certain functionality to some subnets. It can also be used to
     check whether an IP address is an IPv4 or IPv6 address.

     Examples:

        % ns_subnetmatch 137.208.0.0/16 137.208.116.31
        1
        % ns_subnetmatch 137.208.0.0/16 112.207.16.33
        0
        % ns_subnetmatch 2001:628:404:74::31/64 [ns_conn peeraddr]
        ...
        # Is IP address a valid IPv6 address?
        % ns_subnetmatch ::/0 $ip
        # Is IP address a valid IPv4 address?
        % ns_subnetmatch 0.0.0.0/0 $ip

   * ns_connchan: Added new subcommand "ns_connchan connect"

     "ns_connchan connect" is similar to "ns_connchan open", except
     that it does not send an HTTP request (HTTP method, URL, and
     header fields) but just opens the connection. It can be used
     for some non-HTTP communication over TCP and TLS over the
     ns_connchan infrastructure.

   * ns_parseheader, Ns_ParseHeader(): return the field number
     (index) of the parsed entry

     Previously, there was no explicit feedback about which field of
     an "ns_set" had been parsed by "ns_parseheader". Now, in success
     cases, the function returns the index of the new/modified entry.
     This made it possible to generalize and simplify the Tcl-level
     parsing of "multipart/form-data" significantly.

     Additionally, a new optional argument "-prefix" was added. When
     specified, it adds the specified prefix to the key.

   * ns_setcookie, ns_deletecookie

     Mozilla and Chrome changed the default value for SameSite of
     cookies from "none" to "lax" in February and Aug 2020. Cookies
     that explicitly set SameSite=None must also set the "Secure"
     attribute.

     In order to mirror this change of policy in NaviServer and to
     reduce necessary code changes, the default behavior for setting
     or deleting a cookie is now samesite "lax" (when "-samesite" is
     not explicitly specified).

     When trying to set a cookie with "-samesite none" without the
     "-secure" flag, a warning is generated and "-samesite lax" is
     assumed, since major browsers announced that they will reject
     such cookies soon.
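The fallback-charset behaviour described in the New Features section can be sketched outside NaviServer. The following Python snippet is only an illustration under stated assumptions, not NaviServer's implementation: the helper names `pick_fallback` and `urldecode` are hypothetical, and Python's codecs stand in for the Tcl encodings. It shows a strict UTF-8 attempt, one second attempt with a fallback charset, and the precedence of option, per-server setting and global setting.

```python
from urllib.parse import unquote_to_bytes


def pick_fallback(option=None, per_server=None, global_default=None):
    """Return the first configured fallback charset, or None.

    Follows the precedence stated in the release notes: explicit option,
    then per-server setting, then global setting. (Hypothetical helper.)
    """
    for candidate in (option, per_server, global_default):
        if candidate:
            return candidate
    return None


def urldecode(text, charset="utf-8", fallback=None):
    """Percent-decode `text`, then decode the bytes strictly as `charset`.

    If that fails and a fallback charset is given, make one second attempt
    with the fallback; otherwise the decoding error propagates.
    (Hypothetical helper, not the ns_urldecode implementation.)
    """
    raw = unquote_to_bytes(text)
    try:
        return raw.decode(charset, errors="strict")
    except UnicodeDecodeError:
        if fallback is None:
            raise
        return raw.decode(fallback, errors="strict")


if __name__ == "__main__":
    # %C3 followed by "o" is not a valid UTF-8 sequence, so this raises.
    try:
        urldecode("/mot%C3or")
    except UnicodeDecodeError as err:
        print("invalid UTF-8:", err.reason)

    # No explicit option given, so the per-server value wins over the global one.
    charset = pick_fallback(option=None, per_server="iso8859-1",
                            global_default="cp1252")
    print(urldecode("/mot%C3or", fallback=charset))
```

Raising on the first failure when no fallback is configured keeps invalid byte sequences from silently reaching components that expect valid UTF-8, which is the motivation given in the notes.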
API changes:

 - ns_getform, ns_parsequery, and ns_urldecode

   New optional parameter "-fallbackcharset". See above for details.

 - ns_parsequery: added option "-charset" and raise exception on
   failure

   The new option "-charset" can be used to specify a charset for
   the result encoding of the passed-in HTTP query. In case the
   charset is UTF-8 (default on most platforms) and the content is
   invalid UTF-8, an exception is raised (similar to ns_urldecode).
   This can be addressed by the parameter "-fallbackcharset" (see
   above for details).

 - ns_deletecookie: added support for the "-samesite" flag.

   Since "ns_deletecookie" internally sets a cookie, some browsers
   might in the future ignore certain cookie requests (e.g. when
   "-samesite" is not used or set to "none" on non-secure
   connections).

 - ns_trim enhancements:

   The new option "-prefix ..." can be used to strip a string (such
   as ">> ") from every line starting with it.

 - Potential incompatibilities

   * "ns_urldecode" and "ns_getform" will raise an exception when
     invalid UTF-8 data is to be interpreted as UTF-8 and no fallback
     charset is provided. Invalid UTF-8 data causes trouble with
     external components such as tDOM or databases and opens
     vulnerability vectors.

Performance Improvements:

 - Improve "cachingmode none"

   "ns_cache_eval" works as follows:

     1) create a temporary cache entry for the key
     2) lock the cache-key (to avoid multiple parallel executions)
     3) execute the query
     4) store the result for the entry on success
     5) unlock the cache-key

   Previously, "cachingmode none" simply avoided storing the cached
   values (step 4), but serialized calls for a cache key as in the
   default caching modes. This easily led to cache entry collisions.
   Now, "cachingmode none" avoids all steps 1..5 (therefore no
   serialization and no cache collisions).

   See also: https://openacs.org/forums/message-view?message_id=5665480

Bug Fixes:

 - Improved robustness of "ns_parseurl" for handling query parameters
   and fragments for partial URLs

   * fix over-eager collecting of URL components in tail
   * extended regression test

 - Fixed Ns_ResetFileVec NOT to invalidate residual Ns_FileVec buffer
   (caused problems under Windows).

 - ns_striphtml: Fixed probably very old bug for markup immediately
   after an entity

   This bug fix handles cases where, e.g., two entities are right
   next to each other in a text, as in the string
   "hello&lt;&gt;world". The old code correctly decoded the first
   entity, but output the second one literally.

 - Fixed compilation for C++ (the problem was introduced in 4.99.23
   by the change to avoid usage of reserved C identifiers).
   Many thanks to Brendan Graves for reporting the problem.

 - Added missing named entities "apos" and "quote".
   These have been missing for ages.

 - Provide an error message when the configured locale is not
   installed on the host.

   This change causes NaviServer to abort when the configured locale
   is not installed on the host. Typically, this locale is used,
   e.g., by "ns_strcoll" for determining the default collating order.
   The configuration file for the regression testing sets the
   environment variable LANG to "en_US.UTF-8". This means that for
   running the stock regression test, this locale must be installed
   on the system. Before this change, NaviServer could crash at
   runtime when trying to access the default locale (as e.g. in
   "ns_strcoll").

 - Added support for "_charset_" field for default charset in
   multipart/form-data (RFC 7578, section 4.6)

   RFC 7578 (July 2015) defines an optional "_charset_" entry in the
   form (typically provided as hidden form field) to specify the
   charset of text entries. This is now supported as well by
   NaviServer. This is apparently a seldom used feature.
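The entity handling discussed for "ns_unquotehtml" and "ns_striphtml" (adjacent entities, decimal and hexadecimal numeric entities mapping to single code points) can be illustrated with Python's standard `html.unescape`. This is only an analogue under stated assumptions: the wrapper name `unquote_html` is hypothetical, and Python's treatment of edge cases such as control characters is not claimed to match NaviServer's.

```python
import html


def unquote_html(text):
    """Replace named and numeric character references with their characters.

    Thin wrapper around html.unescape; the name is hypothetical and only
    mirrors the role that ns_unquotehtml plays in the release notes.
    """
    return html.unescape(text)


if __name__ == "__main__":
    # Two named entities directly next to each other are both decoded.
    print(unquote_html("hello&lt;&gt;world"))

    # Decimal and hexadecimal references to the same code point (U+20AC).
    print(unquote_html("&#8364; &#x20AC;"))

    # One code point, serialized as three bytes in UTF-8.
    print(unquote_html("&#8364;").encode("utf-8"))

    # The named entity "apos".
    print(unquote_html("it&apos;s"))
```

Each numeric reference yields exactly one code point; the multibyte aspect only appears once that code point is serialized as UTF-8.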
Documentation improvements:
---------------------------

 - Improved the following man pages:

      doc/src/manual/admin-install.man
      doc/src/naviserver/ns_conn.man
      doc/src/naviserver/ns_connchan.man
      doc/src/naviserver/ns_cookie.man
      doc/src/naviserver/ns_crypto.man
      doc/src/naviserver/ns_getform.man
      doc/src/naviserver/ns_http.man
      doc/src/naviserver/ns_httptime.man
      doc/src/naviserver/ns_log.man
      doc/src/naviserver/ns_parseheader.man
      doc/src/naviserver/ns_parsequery.man
      doc/src/naviserver/ns_parseurl.man
      doc/src/naviserver/ns_rlimit.man
      doc/src/naviserver/ns_urldecode.man
      doc/src/naviserver/ns_urlencode.man
      doc/src/naviserver/ns_valid_utf8.man
      doc/src/naviserver/textutil-cmds.man
      nsdb/doc/mann/ns_db.man

Configuration Changes:
----------------------

 - Updated OpenACS sample configuration file

   * reflect recent Oracle (tested with Oracle 19c)
   * added documentation for "StaticCSP", "CookieNamespace",
     "NsShutdownWithNonZeroExitCode", "LogIncludeUserId"

Code Changes:
-------------

 - Set Tcl error code "NS_INVALID_UTF8" for errors due to invalid
   UTF-8

 - Changed Tcl error code "NSCACHE" to "NS_CACHE". Now all
   NaviServer-specific error codes start with the prefix "NS_".

 - Extended regression test

 - Improve Tcl version compatibility

   * Removed -DTCL_NO_DEPRECATED from default CFLAGS to cope with
     recent deprecation in Tcl 8.7a5

 - Code Cleanup

   . Do not declare reserved C identifiers
   . Improved type cleanness
   . Refactored file-based multipart form parser to make logic
     explicit (Tcl code)

 - Improved comments, fixed typos

 - Marked "ns_set_precision" as deprecated, since there is no reason
   not to set the Tcl variable ::tcl_precision directly.

 - Don't hard-wire port for HTTPS testing to 8443

   The setup code now looks for a free port for HTTPS connections
   starting with 8443, and remembers the free port in the
   configuration values "tls_listenport" and "tls_listenurl". This is
   now fully analogous to the setup of the plain HTTP testing
   (setting "listenport" and "listenurl").

 - Silence warning with recent versions of gcc when certain values of
   _FORTIFY_SOURCE/-Wstringop-overflow are set.

Changes in NaviServer Modules:
==============================

 22 files changed, 8447 insertions(+), 1429 deletions(-)

 - general: Updated obsolete ChangeLog files and replaced these by
   automatically generated ones.

nsdbpg:
-------

 - Raise exception when a value for a bind variable contains a NUL
   character. This value is explicitly forbidden in text strings
   passed to PostgreSQL.

 - Let "ns_pg" report available subcommands even when handle is not
   specified. This makes the command compatible with the "icanuse"
   feature in OpenACS.

nsoracle:
---------

 - Added support for output columns of type SQLT_TIMESTAMP or
   SQLT_TIMESTAMP_TZ

   This change fixes a bug, where SQL queries of the form

      SELECT TO_TIMESTAMP(sysdate) FROM dual

   lead to errors of the form:

      Database operation "getrow" failed (exception 1406,
      "nsoracle.c:3659:Ns_OracleGetRow: error in `OCIStmtFetch ()':
      ORA-01406: fetched column value was truncated

   The driver needs special rules for several output types; the
   timestamp cases were not supported so far.

   It is also recommended to set the corresponding environment
   variables specifying the output format in the configuration file
   of NaviServer, such as e.g.

      set ::env(NLS_TIMESTAMP_FORMAT) "YYYY-MM-DD HH24:MI:SS.FF6"
      set ::env(NLS_TIMESTAMP_TZ_FORMAT) "YYYY-MM-DD HH24:MI:SS.FF6 TZH:TZM"

   For testing in your local Oracle installation, you might test the
   output formats (and the required sizes) with the following snippet
   for sqlplus:

      COLUMN localtimestamp format a40
      COLUMN systimestamp format a40
      COLUMN ts_bytes format a80

      alter session set nls_timestamp_format='YYYY-MM-DD HH24:MI:SS.FF6';
      select localtimestamp, length(localtimestamp), dump(localtimestamp) ts_bytes from dual;

      alter session set nls_timestamp_tz_format='YYYY-MM-DD HH24:MI:SS.FF6 TZH:TZM';
      select systimestamp, length(systimestamp), dump(systimestamp) ts_bytes from dual;

      alter session set nls_timestamp_tz_format='YYYY-MM-DD HH24:MI:SS.FF6 TZR';
      select systimestamp, length(systimestamp), dump(systimestamp) ts_bytes from dual;

 - Code cleanup, ensure silent compilation with standard compiler
   settings

 - Improved spelling ECDSA

letsencrypt:
------------

 - Added option to produce certificates with ECDSA

   Prior to this change, all certificates were using RSA keys. For
   some time now, keys based on elliptic curves have been the
   preference of letsencrypt.

 - Improved spelling
|
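The NUL-character checks mentioned in these release notes for ns_dbquotevalue and for nsdbpg bind variables can be illustrated with a small Python sketch. It is an assumption-laden analogue, not the NaviServer code: the names `check_no_nul` and `quote_value` are hypothetical, and quoting by doubling single quotes is a simplification chosen for the example.

```python
def check_no_nul(value):
    """Raise ValueError if the string contains an embedded NUL character."""
    if "\x00" in value:
        raise ValueError("value contains a NUL character")
    return value


def quote_value(value):
    """Return `value` as a single-quoted SQL literal (simplified sketch).

    The NUL check runs first, so a value that could end the literal
    prematurely in a C-string context is rejected instead of quoted.
    """
    check_no_nul(value)
    return "'" + value.replace("'", "''") + "'"


if __name__ == "__main__":
    print(quote_value("O'Reilly"))

    # In standard UTF-8, U+0000 is the single byte 0x00.
    print("\x00".encode("utf-8"))

    # The overlong two-byte form 0xC0 0x80 (used by 'modified UTF-8')
    # is rejected by a strict UTF-8 decoder.
    try:
        b"\xc0\x80".decode("utf-8")
    except UnicodeDecodeError as err:
        print("strict decoder rejects overlong NUL:", err.reason)

    # Used as an input checker: the exception signals unacceptable input.
    try:
        quote_value("abc\x00def")
    except ValueError as err:
        print("rejected:", err)
```

Rejecting the value rather than stripping the NUL mirrors the approach described in the notes, where the quoting command doubles as an input checker.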
From: David O. <da...@qc...> - 2022-06-14 10:17:55
|
Thanks again Gustaf.
We've been happily running rc2 for a few days now without problem.

On Tue, 7 Jun 2022 at 11:13, Gustaf Neumann <ne...@wu...> wrote:

> Dear all,
>
> i've uploaded RC2 to sourceforge.
>
> In short, the changes relative to RC1 are primarily the fallback charsets
> (required to ease the situation for sites that have to deal with invalid
> UTF-8)
> and a change in cookie management to improve future-proofness
> (some browsers announced that these will ignore same-site=none
> without the secure flag, requiring secure connections).
>
> Please test if possible. The release should be in the near future.
>
> all the best
> -gn
>
> https://sourceforge.net/projects/naviserver/files/naviserver/4.99.24/
|
From: Gustaf N. <ne...@wu...> - 2022-06-09 17:48:37
|
The second sentence should read

   .... We need some time ....

Sorry for the typo, age weakens eyes...

all the best
-gn

On 09.06.22 19:21, Gustaf Neumann wrote:

> Dear NaviServer Community,
>
> The registration for the joint OpenACS and EuroTcl conference ends in
> one week. No need some time ahead of the event to reserve sufficient
> capacities for the social events.
>
> all the best
>
> -gn
>
> https://openacs.org/conf2022/info/

--
Univ.Prof. Dr. Gustaf Neumann
Head of the Institute of Information Systems and New Media
of Vienna University of Economics and Business
Program Director of MSc "Information Systems"
|
From: Gustaf N. <ne...@wu...> - 2022-06-09 17:21:49
|
Dear NaviServer Community, The registration for the joint OpenACS and EuroTcl conference ends in one week. No need some time ahead of the event to reserve sufficient capacities for the social events. all the best -gn https://openacs.org/conf2022/info/ |
From: Gustaf N. <ne...@wu...> - 2022-06-07 10:12:48
|
Dear all, i've uploaded RC2 to sourceforge. In short, the changes relative to RC1 are primarily the fallback charsets (required to ease the situation for sites that have to deal with invalid UTF-8) and a change in cookie management to improve future-proofness (some browsers announced that these will ignore same-site=none without the secure flag, requiring secure connections). Please test if possible. The release should be in the near future. all the best -gn https://sourceforge.net/projects/naviserver/files/naviserver/4.99.24/ ======================================= NaviServer 4.99.24, released 2022-06-XX ======================================= 75 files changed, 3234 insertions(+), 1096 deletions(-) New Features: - Improved security * Added protection against certain attacks in ns_dbquotevalue Due to the corrected conversion to external UTF-8 in db-output, new potential attack vectors appeared that were protected earlier via the Tcl-internal 'modified UTF-8'. E.g., the binary null character is stored as an overlong (two-byte) encoding of null (0xc0 0x80), so that an actual (embedded) null byte (0x00) never appears in the string. Due to the conversion, the internal representation is translated back to the binary null character. Embedded null byte characters can lead to non-terminated string literals via ns_dbquotevalue. In the updated version of NaviServer, ns_dbquotevalue raises an exception when this occurs. Therefore, the function can be used as well as an input checker (together with "try"). * Raise an exception when trying to use "ns_urldecode" to produce invalid UTF-8 Background: several (external) functions expect valid UTF-8 to be passed in and crash if this is not the case. One such example is tDOM. These nasty byte sequences are used more intensively by vulnerability scanners. Therefore, ns_urldecode raises now an exception, when it tries to convert to invalid UTF-8. It is still possible to use ns_urldecode to convert to other charsets. 
ns_urldecode -charset iso8859-1 -part path "/mot%C3or" When urldecode() is called internally and would produce invalid UTF-8, it truncates the string (and writes a warning to the system log). Note that the enw fallback charset - Fallback charsets In case, a conversion to UTF-8 fails due to invalid byte sequences, one can now provide a fallback charset for a second attempt of decoding this data. This feature is useful for websites that have to deal with requests containing invalid (form) data, typically from legacy applications. The fallback charset can be provided as optional parameter "-fallbackcharset" for the command "ns_getform", "ns_parsequery" and "ns_urldecode": ns_getform ?-fallbackcharset fallbackcharset? ?charset? ns_parsequery ?-charset charset? ?-fallbackcharset fallbackcharset? ?--? querystring ns_urldecode ?-charset charset? ?-fallbackcharset fallbackcharset? ?-part part? ?--? string In case, the parameter is not specified, it can be also be provided to the form-processing commands "ns_getform" and "ns_parsequery" via configuration variables: * per-server configuration parameter "formFallbackCharset" (in the section "ns/server/$server"), or as * global server configuration parameter "formFallbackCharset" (in the section "ns/parameters"). The highest precedence has the optional parameter, followed by the per-server configuration parameter and the global configuration parameter. - Provide a hint when cache-entry is too large for caching Background: the size of the entry is typically determined after the execution of a potentially expensive query. During the eval of the command, the cache entry is locked and forces a serialization. However, this means that in these cases the situation is worse than without a cache, where some queries can be executed in parallel. We faced the situation of an expected slowdown of the server with many "create entry collision", where due to application matters, an entry was becoming too large. 
This situation is not easy to debug, especially in stress situations. The log message would have helped a log to identify the cause. - Added support for multibyte numeric entities This change supports conversion of numeric entities representing multibyte characters into HTML in "ns_striphtml" and "ns_unquotehtml". Technically, the numeric entities represent Unicode code points, which are transformed into UTF-8 serialization. Every entity represents a single code point; The values can be provided in decimal or hexadecimal notation. Before this change, only single byte numeric entities were supported. ASCII control characters (decimal 0-31) are ignored as before. - New and extended commands: * ns_unquotehtml /text/ This command is the inverse operation of "ns_quotehtml". It replaces the named and numeric entities in the provided string with the native values. The command is similar to "ns_striphtml", but "ns_striphtml" removes as well other HTML markup which might not be desired in all cases. This change fixes as well a bug with numeric entities (the old code assumed, these are starting directly with a number after the ampersand) and it adds support for numeric entities with hexadecimal values (so far with the same value range as for decimal numeric entities). * ns_subnetmatch /subnet/ /ipaddr/ Determine, if a provided IP address (IPv4 or IPv6) is included in a subnet specification, which is provided in CIDR notation. The command makes internal NaviServer functionality available at the Tcl level. The regression test was extended to cover the functionality. The command ns_subnetmatch validates the provided subnet specification (IPv4 or IPv6 address followed by slash and number of significant bits) and the provided IP address and tests whether the IP address is in the implied range. The command returns a boolean value as the result. When comparing an IPv4 and IPv6 address/CIDR specification or vice versa, the result is always false. The function can be use when e.g. 
restricting access to certain functionality to some subnets. The function can be used as well to check, whether an IP address is an IPv4 or IPv6 address. Examples: % ns_subnetmatch 137.208.0.0/16 137.208.116.31 1 % ns_subnetmatch 137.208.0.0/16 112.207.16.33 0 % ns_subnetmatch 2001:628:404:74::31/64 [ns_conn peeraddr] ... # Is IP address a valid IPv6 address? % ns_subnetmatch ::/0 $ip # Is IP address a valid IPv4 address? % ns_subnetmatch 0.0.0.0/0 $ip * ns_connchan: Added new subcommand "ns_connchan connect" "ns_connchan connect" is similar to "ns_connchan open", except that it does not send an HTTP request (HTTP method, URL, and header fields) but just opens the connection. It can be used for some non-HTTP communication over TCP and TLS over the ns_connchan infrastructure. * ns_parseheader, Ns_ParseHeader(): return the field number (index) of the parsed entry Previously, there was no explicit feedback, what field of an "ns_set" has been parsed by "ns_parseheader". Now, in success cases, the function returns the index of the new/modified entry. This function made it possible to generalize and simplify the Tcl-level parsing of "multipart/form-data" significantly. Additionally, a new optional argument "-prefix" was added. When specified, it adds the specified prefix to the key. * ns_setcookie, ns_deletecookie Mozilla and Chrome changed the default value for SameSite of cookies from "none" to "lax" in February and Aug 2020. Cookies that explicitly set SameSite=None must also set the "Secure" attribute. In order to mirror this change of policy in NaviServer and to reduce necessary code changes, the default behavior for setting or deleting cookie is now samesite "lax" (when "-samesite" is not explicitly specified). When trying to set a cookie with "-samesite none" without the "-secure" flag, a warning is generated, and the "-samesite lax" is assumed, since major browsers announced that these will reject these cookies soon. 
API changes: - ns_getform, ns_parsequery, and ns_urldecode New optional parameter "-fallbackcharset". See above for details. - ns_parsequery: added option "-charset" and raise exception on failure The new option "-charset" can be used to add a charset for the result encoding of the passed-in HTTP query. In case the charset is UTF8 (default on most platforms), and the content is invalid UTF-8, an exception is raised (similar to ns_urldecode). This can be addressed by parameter "-fallbackcharset" (See above for details). - ns_deletecookie: added support "-samesite" flag for ns_deletecookie. Since "ns_deletecookie" sets internally a cookie, some browsers might ignore in the future certain cookie requests (e.g. when "-samesite" is not used or set to "none" on non-secure connections). - ns_trim enhancements: The new option "-prefix ..." can be used to strip a string (such as ">> ") from every line starting with it. - Potential incompatibilties * "ns_urldecode" and "ns_getform" will raise an exception when invalid UTF-8 data is tried to be interpreted as UTF-8. Such data cause trouble with external components such as Tdom or databases and opens vulnerability vectors. Performance Improvements: - Improve "cachingmode none" "ns_cache_eval" works as follows: 1) create a temporary cache entry for the key 2) lock the cache-key (to avoid multiple parallel executions) 3) execute the query 4) store the result for the entry on success 5) unlock the cache-key Previously, "cachingmode none" was simply avoiding to store the cached values (step 4), but was serializing calls for a cache key as in default caching modes. This was leading easily to cache entry collisions. Now, "cachingmode none" is avoiding all steps 1..5 (therefore no serialization and no cache collisions). 
See also:https://openacs.org/forums/message-view?message_id=5665480 Bug Fixes: - Improved robustness of "ns_parseurl" for handling query parameters and fragments for partial URLs * fix over-eager collecting of URL components in tail * extended regression test - Fixed Ns_ResetFileVec NOT to invalidate residual Ns_FileVec buffer.q (caused problems under Windows). - ns_striphtml: Fixed probably very old bug for markup immediately after an entity This bug fix handles cases, where e.g. two entities are in a text right next to each other, like e.g. in the string "hello<>world". The old code was correctly decoding the first entity, but output the second one literally. - Fixed compilation for C++, which was introduced in 4.99.23 to avoid usage of reserved C identifiers Many thanks to Brendan Graves for reporting the problem. - Added missing named entities "apos" and "quote". These have been missing since ages. - Provide an error message when the configured locale is not installed on the host. This change causes NaviServer to abort, when the configured locale is not installed on the host. Typically, this locale is e.g. used by "ns_strcoll" for determining the default collating order. The configuration file for the regression testing sets the environment variable LANG to "en_US.UTF-8". This means that for running the stock regression test, this locale must be installed on the system. Before this change, NaviServer could crash at runtime when trying to access the default locale (as e.g. in "ns_strcoll") - Added support for "_charset_" field for default charset in multipart/form-data (RFC 7578, section 4.6) RFC 7578 (July 2015) defines an optional "_charset_" entry in the form (provided typically as hidden form field) to specify the charset of text entries. This is now supported as well by NaviServer. This is apparently a seldom used feature. 
Documentation improvements: --------------------------- - Improved the following man pages: doc/src/manual/admin-install.man doc/src/naviserver/ns_conn.man doc/src/naviserver/ns_connchan.man doc/src/naviserver/ns_crypto.man doc/src/naviserver/ns_http.man doc/src/naviserver/ns_httptime.man doc/src/naviserver/ns_log.man doc/src/naviserver/ns_parseheader.man doc/src/naviserver/ns_parsequery.man doc/src/naviserver/ns_parseurl.man doc/src/naviserver/ns_rlimit.man doc/src/naviserver/ns_subnetmatch.man doc/src/naviserver/ns_urldecode.man doc/src/naviserver/ns_urlencode.man doc/src/naviserver/textutil-cmds.man nsdb/doc/mann/ns_db.man Configuration Changes: ---------------------- - Updated OpenACS sample configuration file * reflect recent Oracle (tested with Oracle 19c) * added documentation for "StaticCSP", "CookieNamespace", "NsShutdownWithNonZeroExitCode", "LogIncludeUserId" Code Changes: ------------- - Set Tcl error code "NS_INVALID_UTF8" for errors due to invalid UTF-8 - Changed Tcl error code "NSCACHE" to "NS_CACHE". Now all NaviServer-specific error codes start with the prefix "NS_". - Extended regression test - Improve Tcl version compatibility * Removed -DTCL_NO_DEPRECATED from default CFLAGS to cope with recent deprecation in Tcl 8.7a5 - Code Cleanup . Do not declare reserved C identifiers . Improved type cleanness . Refactored file-based multipart form parser to make logic explicit (Tcl code) - Improved comments, fixed typos - Marked "ns_set_precision" as deprecated, since there is no reason why not setting the Tcl variable ::tcl_precision directly. - Don't hard-wire port for https testing to 8443 The setup code looks now for a free port for HTTPS connections starting with 8443, and remembers the free port in the configuration value "tls_listenport" and "tls_listenurl". 
This is now fully analogous to the setup of the plain HTTP testing (setting "listenport" and "listenurl") - Silence warning with recent versions of gcc when certain values of _FORTIFY_SOURCE/-Wstringop-overflow are set. Changes in NaviServer Modules: ============================== ... On 03.06.22 20:03, Gustaf Neumann wrote: > > Dear David, > > the automated shortening for the invalid strings is now committed. > > https://bitbucket.org/naviserver/naviserver/commits/51f101928be6d27efe5ab78d7d9a9693026791c1 > > I'll try to make rc2 soon. > > all the best > > -gn > |
From: Gustaf N. <ne...@wu...> - 2022-06-03 18:03:50
|
Dear David, the automated shortening for the invalid strings is now committed. https://bitbucket.org/naviserver/naviserver/commits/51f101928be6d27efe5ab78d7d9a9693026791c1 I'll try to make rc2 soon. all the best -gn On 30.05.22 21:01, Gustaf Neumann wrote: > > i agree, this might be a lot of logging, especially when the decoded > strings are long. > > Not sure, what the best approach of this is > - an configuration parameter LogInvalidUTF8warnings to deactivate > these messages (default true) > - an configuration parameter NrInvalidUTF8warnings to limit these > messages (default 99999999) > - truncate the invalid UTF8 sequence in the log entry to shorten the > messages (when longer than e.g. 20 bytes) > - try to identify the first invalid UTF8 sequence in the string and > report just this (even shorter log message). > > The last option would be probably the best, but requires an C.API change. > > Other options? > > -gn > > > > On 30.05.22 10:41, David Osborne wrote: >> Thanks Gustaf - the errorCode will be very handy to trap these >> encoding errors. >> >> We're using the per-server "formfallbackcharset" without issue at the >> moment. Working well. >> >> One problem we encountered is the Warnings logged when invalid UTF-8 >> is encountered. >> Some of the POSTs causing encoding issues were very large. This was >> filling logs quite quickly (I think it's possible to have "maxpost" >> bytes of data written to the log). Didn't want to disable all >> Warnings, so we patched NaviServer to log just a Warning of invalid >> UTF-8 but not include the data itself. But there may be a better way >> of dealing with this. >> >> Regards, >> David >> >> >> On Sat, 28 May 2022 at 18:45, Gustaf Neumann <ne...@wu...> wrote: >> >> Dear all, >> >> The latest commits >> - added the "multipart/form-data" handlingfor fallback charsets, >> - provides an error code for invalid UTF-8 and >> - adds support for the "_charset_" field for default charsets >> (see RFC >> 7578, section 4.6). 
>> >> Also, the regression test got several more tests. >> >> I have still a bug report for ns_connchan (which i could not >> reproduce >> so far), >> if i find something to fix the next days, this will go as well >> into the next >> release, otherwise we are ready for rc2. >> >> all the best >> >> >> >> _______________________________________________ >> naviserver-devel mailing list >> nav...@li... >> https://lists.sourceforge.net/lists/listinfo/naviserver-devel > -- > Univ.Prof. Dr. Gustaf Neumann > Head of the Institute of Information Systems and New Media > of Vienna University of Economics and Business > Program Director of MSc "Information Systems" > > > _______________________________________________ > naviserver-devel mailing list > nav...@li... > https://lists.sourceforge.net/lists/listinfo/naviserver-devel -- Univ.Prof. Dr. Gustaf Neumann Head of the Institute of Information Systems and New Media of Vienna University of Economics and Business Program Director of MSc "Information Systems" |
From: Gustaf N. <ne...@wu...> - 2022-05-30 19:01:47
|
i agree, this might be a lot of logging, especially when the decoded strings are long.

Not sure what the best approach for this is:
- a configuration parameter LogInvalidUTF8warnings to deactivate these messages (default true)
- a configuration parameter NrInvalidUTF8warnings to limit these messages (default 99999999)
- truncate the invalid UTF8 sequence in the log entry to shorten the messages (when longer than e.g. 20 bytes)
- try to identify the first invalid UTF8 sequence in the string and report just this (even shorter log message).

The last option would probably be the best, but requires a C API change.

Other options?

-gn

On 30.05.22 10:41, David Osborne wrote: > Thanks Gustaf - the errorCode will be very handy to trap these > encoding errors. > > We're using the per-server "formfallbackcharset" without issue at the > moment. Working well. > > One problem we encountered is the Warnings logged when invalid UTF-8 > is encountered. > Some of the POSTs causing encoding issues were very large. This was > filling logs quite quickly (I think it's possible to have "maxpost" > bytes of data written to the log). Didn't want to disable all > Warnings, so we patched NaviServer to log just a Warning of invalid > UTF-8 but not include the data itself. But there may be a better way > of dealing with this. > > Regards, > David > > > On Sat, 28 May 2022 at 18:45, Gustaf Neumann <ne...@wu...> wrote: > > Dear all, > > The latest commits > - added the "multipart/form-data" handlingfor fallback charsets, > - provides an error code for invalid UTF-8 and > - adds support for the "_charset_" field for default charsets > (see RFC > 7578, section 4.6). > > Also, the regression test got several more tests. > > I have still a bug report for ns_connchan (which i could not > reproduce > so far), > if i find something to fix the next days, this will go as well > into the next > release, otherwise we are ready for rc2. 
> > all the best > > > > _______________________________________________ > naviserver-devel mailing list > nav...@li... > https://lists.sourceforge.net/lists/listinfo/naviserver-devel -- Univ.Prof. Dr. Gustaf Neumann Head of the Institute of Information Systems and New Media of Vienna University of Economics and Business Program Director of MSc "Information Systems" |
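The last two options in the message above (truncating the logged data, or reporting only the first invalid sequence) can be illustrated with a short Python sketch. This is only a model of the idea, not the change that was later committed to NaviServer; the function name describe_invalid_utf8 and the context parameter are hypothetical.

```python
def describe_invalid_utf8(data: bytes, context: int = 8):
    """Return a short log message for the first invalid UTF-8 sequence.

    Returns None when the data is valid UTF-8. Only a few bytes around
    the offending position are included, so the message stays short even
    for very large request bodies.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        lo = max(exc.start - context, 0)
        hi = min(exc.end + context, len(data))
        excerpt = data[lo:hi].hex(" ")   # hex dump, safe to write to a log
        return f"invalid UTF-8 at byte {exc.start}: {excerpt}"
    return None
```

Because the excerpt is bounded by the context size, a POST of "maxpost" bytes produces a log line of a few dozen characters rather than a copy of the whole body.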
From: David O. <da...@qc...> - 2022-05-30 08:42:12
|
Thanks Gustaf - the errorCode will be very handy to trap these encoding errors. We're using the per-server "formfallbackcharset" without issue at the moment. Working well. One problem we encountered is the Warnings logged when invalid UTF-8 is encountered. Some of the POSTs causing encoding issues were very large. This was filling logs quite quickly (I think it's possible to have "maxpost" bytes of data written to the log). Didn't want to disable all Warnings, so we patched NaviServer to log just a Warning of invalid UTF-8 but not include the data itself. But there may be a better way of dealing with this. Regards, David On Sat, 28 May 2022 at 18:45, Gustaf Neumann <ne...@wu...> wrote: > Dear all, > > The latest commits > - added the "multipart/form-data" handlingfor fallback charsets, > - provides an error code for invalid UTF-8 and > - adds support for the "_charset_" field for default charsets (see RFC > 7578, section 4.6). > > Also, the regression test got several more tests. > > I have still a bug report for ns_connchan (which i could not reproduce > so far), > if i find something to fix the next days, this will go as well into the > next > release, otherwise we are ready for rc2. > > all the best > > |
From: Gustaf N. <ne...@wu...> - 2022-05-28 17:45:04
|
Dear all,

The latest commits
- add the "multipart/form-data" handling for fallback charsets,
- provide an error code for invalid UTF-8, and
- add support for the "_charset_" field for default charsets (see RFC 7578, section 4.6).

Also, the regression test got several more tests.

I still have a bug report for ns_connchan (which i could not reproduce so far); if i find something to fix in the next days, this will go into the next release as well, otherwise we are ready for rc2.

all the best
-gn

On 19.05.22 21:19, Gustaf Neumann wrote:
>
> Hi David,
>
> we have now a global and per-server parameter called "formfallbackcharset",
> the flag for "ns_getform" and "ns_parsequery" is now called "fallbackcharset".
>
> In many cases, using e.g. the per-server parameter should be sufficient to handle
> incorrect queries...
>
> still missing: "multipart/form-data" handling and documentation updates, error code
>
> all the best
> |
From: Gustaf N. <ne...@wu...> - 2022-05-23 22:26:59
|
Dear all,

The deadline for submitting abstracts for presentation at the forthcoming joint OpenACS / EuroTcl conference is approaching; less than one week is left. Presentations about NaviServer applications are also very welcome.

Important dates:
  May 27th, 2022: Deadline for submissions of abstracts (max. 2 pages, min. 250 words)
  June 3rd, 2022: Notification of acceptance
  June 15th, 2022: Registration ends
  June 29th, 2022: Meet & greet
  June 30th - July 1st, 2022: Conference

For details, see: https://openacs.org/conf2022/

Gustaf Neumann and Paul Obermeier

--
Univ.Prof. Dr. Gustaf Neumann
Head of the Institute of Information Systems and New Media
of Vienna University of Economics and Business
Program Director of MSc "Information Systems" |
From: Gustaf N. <ne...@wu...> - 2022-05-20 19:42:24
|
Thanks as well, change is welcome! ... i've added the documentation for the configuration variables. -g On 20.05.22 16:28, David Osborne wrote: > Thanks Gustaf - I've run some quick tests against the per-server and > global fallback and it seems to work well in the cases we're looking > at - thanks for the fast work! > I've yet to try the ns_getform option but will do so shortly. I've > submitted a pull req for the ns_getform doc which I'm hoping might be > useful to you. > Thanks again. |
From: David O. <da...@qc...> - 2022-05-20 14:28:58
|
Thanks Gustaf - I've run some quick tests against the per-server and global fallback and it seems to work well in the cases we're looking at - thanks for the fast work! I've yet to try the ns_getform option but will do so shortly. I've submitted a pull req for the ns_getform doc which I'm hoping might be useful to you. Thanks again. On Thu, 19 May 2022 at 20:20, Gustaf Neumann <ne...@wu...> wrote: > Hi David, > > we have not a global and per-server parameter called "formfallbackcharset", > the flag for "ns_getform" and "ns_parsequery" is now called > "fallbackcharset". > > In many cases, using e.g. the per-server parameter should be sufficient to > handle > incorrect queries... > > still missing: "multipart/form-data" handling and documentation updates, > error code > > all the best > > -gn > On 18. > |
From: Gustaf N. <ne...@wu...> - 2022-05-19 19:19:54
|
Hi David,

we have now a global and per-server parameter called "formfallbackcharset", the flag for "ns_getform" and "ns_parsequery" is now called "fallbackcharset".

In many cases, using e.g. the per-server parameter should be sufficient to handle incorrect queries...

still missing: "multipart/form-data" handling and documentation updates, error code

all the best
-gn

On 18.05.22 22:00, Gustaf Neumann wrote: > > Dear David, > > i've committed the option "-fallbackencodings" for the commands > "ns_getform" and "ns_parsequery". The implementation covers > "ns_getform", where the data is provided as > "application/x-www-form-urlencoded" either when parsing from memory > or from the spool file. The "multipart/form-data" implementation (also > separate for memory and spoolfile) is not yet covered. > > We can also consider a global parameter for the configuration file > (like e.g. FormFallbackEncodings). Probably, we should use the term > "charset" instead of "encoding", since "charset" is the MIME term, > also used for e.g. "URLCharset", while "encoding" is the Tcl name. > > Although the names might still change, you might test whether this > works for your test cases. > > -gn > > On 16.05.22 16:16, David Osborne wrote: >> Hi Gustaf, >> >> I spotted that *ns_getform *takes a charset argument from looking at >> the source code. >> The options for overriding charsets at the moment seem to be: >> >> *ns_getform iso8859-1 >> * >> * >> * >> *ns_urlcharset iso8859-1* >> *ns_getform >> * >> * >> * >> *ns_conn urlencoding iso8859-1 >> * >> *ns_getform * >> >> We experimented with some code which tried to trap errors from >> *ns_getform*, and where the error was due to "invalid UTF-8", try a >> fallback charset. >> All 3 of the above techniques worked OK when the Content-Type header >> leaves the charset /unspecified/. >> >> The main issues we had were: >> >> 1. 
When a *charset=utf-8* is present in the *Content-Type* header, >> this overrides ([1]) any encoding we pass with using the 3 techniques >> above. >> In those cases we have to manipulate the headers' ns_set to remove or >> change the charset. >> eg. >> *Content-Type: application/x-www-form-urlencoded; charset=utf-8* >> transform to -> >> *Content-Type: application/x-www-form-urlencoded* >> or >> *Content-Type: application/x-www-form-urlencoded; charset=windows-1252* >> >> 2. Trapping the specific "invalid UTF-8" error - this method seems >> fragile - would be nice if there was an *errorCode *we would trap. >> *::try { >> * >> * ns_getform* >> *} on error {msg options} {* >> * if { [string match "*contains invalid UTF-8" $msg] } {* >> * # change Content_type charset (if present)* >> * # try fallback charset* >> * } else {* >> * # rethrow error* >> * }* >> *}* >> >> But I think this presents us with a way forward in cases where client >> apps are not getting the encoding correct. >> >> [1] >> https://bitbucket.org/naviserver/naviserver/annotate/master/nsd/form.c?at=master#form.c-170 >> >> |
From: Gustaf N. <ne...@wu...> - 2022-05-18 20:01:13
|
Dear David, i've committed the option "-fallbackencodings" for the commands "ns_getform" and "ns_parsequery". The implementation covers "ns_getform", where the data is provided as "application/x-www-form-urlencoded" either when parsing from memory or from the spool file. The "multipart/form-data" implementation (also separate for memory and spoolfile) is not yet covered. We can also consider a global parameter for the configuration file (like e.g. FormFallbackEncodings). Probably, we should use the term "charset" instead of "encoding", since "charset" is the MIME term, also used for e.g. "URLCharset", while "encoding" is the Tcl name. Although the names might still change, you might test whether this works for your test cases. -gn On 16.05.22 16:16, David Osborne wrote: > Hi Gustaf, > > I spotted that *ns_getform *takes a charset argument from looking at > the source code. > The options for overriding charsets at the moment seem to be: > > *ns_getform iso8859-1 > * > * > * > *ns_urlcharset iso8859-1* > *ns_getform > * > * > * > *ns_conn urlencoding iso8859-1 > * > *ns_getform * > > We experimented with some code which tried to trap errors from > *ns_getform*, and where the error was due to "invalid UTF-8", try a > fallback charset. > All 3 of the above techniques worked OK when the Content-Type header > leaves the charset /unspecified/. > > The main issues we had were: > > 1. When a *charset=utf-8* is present in the *Content-Type* header, > this overrides ([1]) any encoding we pass with using the 3 techniques > above. > In those cases we have to manipulate the headers' ns_set to remove or > change the charset. > eg. > *Content-Type: application/x-www-form-urlencoded; charset=utf-8* > transform to -> > *Content-Type: application/x-www-form-urlencoded* > or > *Content-Type: application/x-www-form-urlencoded; charset=windows-1252* > > 2. Trapping the specific "invalid UTF-8" error - this method seems > fragile - would be nice if there was an *errorCode *we would trap. 
> *::try { > * > * ns_getform* > *} on error {msg options} {* > * if { [string match "*contains invalid UTF-8" $msg] } {* > * # change Content_type charset (if present)* > * # try fallback charset* > * } else {* > * # rethrow error* > * }* > *}* > > But I think this presents us with a way forward in cases where client > apps are not getting the encoding correct. > > [1] > https://bitbucket.org/naviserver/naviserver/annotate/master/nsd/form.c?at=master#form.c-170 > > > _______________________________________________ > naviserver-devel mailing list > nav...@li... > https://lists.sourceforge.net/lists/listinfo/naviserver-devel -- Univ.Prof. Dr. Gustaf Neumann Head of the Institute of Information Systems and New Media of Vienna University of Economics and Business Program Director of MSc "Information Systems" |
From: David O. <da...@qc...> - 2022-05-16 14:16:53
|
Hi Gustaf,

I spotted that ns_getform takes a charset argument from looking at the source code.
The options for overriding charsets at the moment seem to be:

    ns_getform iso8859-1

    ns_urlcharset iso8859-1
    ns_getform

    ns_conn urlencoding iso8859-1
    ns_getform

We experimented with some code which tried to trap errors from ns_getform and, where the error was due to "invalid UTF-8", try a fallback charset.
All 3 of the above techniques worked OK when the Content-Type header leaves the charset unspecified.

The main issues we had were:

1. When a charset=utf-8 is present in the Content-Type header, this overrides ([1]) any encoding we pass using the 3 techniques above.
In those cases we have to manipulate the headers' ns_set to remove or change the charset.
e.g.
    Content-Type: application/x-www-form-urlencoded; charset=utf-8
transform to ->
    Content-Type: application/x-www-form-urlencoded
or
    Content-Type: application/x-www-form-urlencoded; charset=windows-1252

2. Trapping the specific "invalid UTF-8" error - this method seems fragile - would be nice if there was an errorCode we could trap.

    ::try {
        ns_getform
    } on error {msg options} {
        if { [string match "*contains invalid UTF-8" $msg] } {
            # change Content-Type charset (if present)
            # try fallback charset
        } else {
            # rethrow error
        }
    }

But I think this presents us with a way forward in cases where client apps are not getting the encoding correct.

[1] https://bitbucket.org/naviserver/naviserver/annotate/master/nsd/form.c?at=master#form.c-170 |
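The "try UTF-8 first, then retry with a fallback charset" approach described in this message can be modelled outside NaviServer with Python's standard library. This is only a sketch of the technique; decode_form and fallback_charset are names invented here, and the sketch says nothing about how ns_getform works internally.

```python
from urllib.parse import parse_qs


def decode_form(body: str, fallback_charset: str = "iso-8859-1") -> dict:
    """Decode x-www-form-urlencoded data, retrying with a fallback charset.

    The first attempt treats the percent-decoded bytes as strict UTF-8.
    If that fails, the whole body is decoded again with the fallback.
    """
    try:
        return parse_qs(body, keep_blank_values=True,
                        encoding="utf-8", errors="strict")
    except UnicodeDecodeError:
        # Invalid UTF-8 somewhere in the body: second attempt.
        return parse_qs(body, keep_blank_values=True,
                        encoding=fallback_charset, errors="strict")


print(decode_form("p2=a%E6b"))   # %E6 is not valid UTF-8, so the fallback is used
```

Note that in this sketch the fallback applies to the whole body: if one field is invalid UTF-8, fields that were valid UTF-8 are also reinterpreted in the fallback charset.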
From: Gustaf N. <ne...@wu...> - 2022-05-14 07:59:29
|
Hi Dave,

Maybe i find time slots before the release for easing this process, e.g. by providing a flag to supply a charset for "ns_getform" in case it fails ("ns_urldecode" already has a "-charset" flag), but i have not checked in detail how complex this is.

all the best
-g

On 13.05.22 10:32, David Osborne wrote: > Thanks Gustaf, > > I didn't pick up that your latest commit makes it possible to catch > and handle an encoding error now. > Thanks - we'll try to address the issue that way. > Regards, > Dave > > On Thu, 12 May 2022 at 12:27, Gustaf Neumann <ne...@wu...> wrote: > > Dear David, > > NaviServer is less strict than the W3C-document, since it does not > send automatically an error back. > Such invalid characters can show up during decode operations of > ns_urldecode and ns_getform. > So, a custom application can catch exceptions and try alternative > encodings if necessary. > > Since there is currently a large refactoring concerning Unicode > handling going on for > the Tcl community (with potentially different handling in Tcl 8.6, > 8.7 and 9.0, ... hopefully > there will be full support for Unicode already in Tcl 8.7, the > voting is happening right now) > it is not a good idea to come up with a special handling by > NaviServer. These byte sequences > have to be processed sooner or later by Tcl in various versions... > > I do not think it is a good idea to swallow incorrect incoming > data by transforming this > on the fly, this will cause sooner or later user concerns (e.g. > "why is this funny character > in the user name", ...) When the legacy application sends e.g. > iso8859 encoded data, then it > should set the appropriate charset, and it will be properly > converted by NaviServer. > > If for whatever reason this is not feasible to get a proper > charset, then the NaviServer > approach allows to make a second attempt of decoding the data with > a different charset. 
> > all the best > > -gn > > > > _______________________________________________ > naviserver-devel mailing list > nav...@li... > https://lists.sourceforge.net/lists/listinfo/naviserver-devel -- Univ.Prof. Dr. Gustaf Neumann Head of the Institute of Information Systems and New Media of Vienna University of Economics and Business Program Director of MSc "Information Systems" |
From: David O. <da...@qc...> - 2022-05-13 08:33:08
|
Thanks Gustaf, I didn't pick up that your latest commit makes it possible to catch and handle an encoding error now. Thanks - we'll try to address the issue that way. Regards, Dave On Thu, 12 May 2022 at 12:27, Gustaf Neumann <ne...@wu...> wrote: > Dear David, > > NaviServer is less strict than the W3C-document, since it does not send > automatically an error back. > Such invalid characters can show up during decode operations of > ns_urldecode and ns_getform. > So, a custom application can catch exceptions and try alternative > encodings if necessary. > > Since there is currently a large refactoring concerning Unicode handling > going on for > the Tcl community (with potentially different handling in Tcl 8.6, 8.7 and > 9.0, ... hopefully > there will be full support for Unicode already in Tcl 8.7, the voting is > happening right now) > it is not a good idea to come up with a special handling by NaviServer. > These byte sequences > have to be processed sooner or later by Tcl in various versions... > > I do not think it is a good idea to swallow incorrect incoming data by > transforming this > on the fly, this will cause sooner or later user concerns (e.g. "why is > this funny character > in the user name", ...) When the legacy application sends e.g. iso8859 > encoded data, then it > should set the appropriate charset, and it will be properly converted by > NaviServer. > > If for whatever reason this is not feasible to get a proper charset, then > the NaviServer > approach allows to make a second attempt of decoding the data with a > different charset. > > all the best > > -gn > |
From: Gustaf N. <ne...@wu...> - 2022-05-12 11:27:03
|
Dear David, NaviServer is less strict than the W3C-document, since it does not send automatically an error back. Such invalid characters can show up during decode operations of ns_urldecode and ns_getform. So, a custom application can catch exceptions and try alternative encodings if necessary. Since there is currently a large refactoring concerning Unicode handling going on for the Tcl community (with potentially different handling in Tcl 8.6, 8.7 and 9.0, ... hopefully there will be full support for Unicode already in Tcl 8.7, the voting is happening right now) it is not a good idea to come up with a special handling by NaviServer. These byte sequences have to be processed sooner or later by Tcl in various versions... I do not think it is a good idea to swallow incorrect incoming data by transforming this on the fly, this will cause sooner or later user concerns (e.g. "why is this funny character in the user name", ...) When the legacy application sends e.g. iso8859 encoded data, then it should set the appropriate charset, and it will be properly converted by NaviServer. If for whatever reason this is not feasible to get a proper charset, then the NaviServer approach allows to make a second attempt of decoding the data with a different charset. all the best -gn On 12.05.22 11:05, David Osborne wrote: > > Thanks again Gustaf, > > I can see the W3C spec you reference seems quite unequivocal in saying > an error message should be sent back when decoding invalid UTF-8 form > data. > > But I was curious why other implementations appear to use the UTF-8 > replacement character (U+FFFD) instead, and found a bit of discussion > in the unicode standard itself [1] & [2]. > > [1] specifically refers to the WHATWG(W3C) spec for encoding/decoding > [3] which defines an "error" condition when decoding UTF-8 as being > one of two possible error modes: > Namely: > > * fatal - "return the error" > * replacement - "Push U+FFFD (�) to output." 
> > This aligns with the behaviour of, say, Python's bytes.decode() where > the default is to raise an error for encoding errors ("strict" error > handling), but optionally, you can specify "replace" error handling > which will utilise the U+FFFD character instead. I can see this > working in cases where we're told the data should be UTF-8, or where > we're assuming by default it's UTF-8. > > But I'm not sure how much work this would be to implement and whether > it is seen as worthwhile to others? > > As it stands, we have legacy applications which POSTs data to us which > regularly (and, by now, expectedly) sends invalid characters despite > best efforts to fix it. > I guess we would redirect the POSTs to another non-naviserver system, > sanitise the data there, then send it on to NaviServer, but it would > be nice to be able to deal with it within NaviServer itself. > > [1] https://www.unicode.org/versions/Unicode14.0.0/ch03.pdf (Section > 3.9 "U+FFFD Substitution of Maximal Subparts") > [2] https://www.unicode.org/versions/Unicode14.0.0/ch05.pdf (Section > 5.22 "U+FFFD Substitution in Conversion") > [3] https://encoding.spec.whatwg.org/#decoder > [4] https://docs.python.org/3/library/stdtypes.html#bytes.decode > > > On Mon, 2 May 2022 at 13:30, Gustaf Neumann <ne...@wu...> wrote: > > Dear David and all, > > I looked into this issue, and I do not like the current situation > either. > In the current snapshot, a GET request with invalid coded > query variables is rejected, while the POST request leads just > to the warning, and the invalid entry is omitted. > > W3C [1] says in the reference for Multilingual form encoding: > > If non-UTF-8 data is received, an error message should be sent back. > > This means, that the only defensible logic is to reject in both cases > the request as invalid. One can certainly send single-byte funny > character > data in URLs, which is invalid UTF8 (e.g. 
"%9C" or "%E6" etc.), > but for these requests, the charset has to be specified, either > via content type, or via the default URL encoding in the NaviServer > configuration... see example (2) below. > > As mentioned earlier, there are increasingly many attacks with invalid > UTF-8 data (also by vulnerability scanners), so we to be strict here. > > I will try to address the outstanding issues ASAP and provide then > another RC. > > All the best > > -gn > > [1] https://www.w3.org/International/questions/qa-forms-utf-8 > > > # POST request with already encoded form data (x-www-form-urlencoded) > $ curl -X POST -d "p1=a%C5%93Cb&p2=a%E6b" localhost:8100/upload.tcl > > # POST request with already encoded form data, but proper encoding > $ curl -X POST -H "Content-Type: application/x-www-form-urlencoded; charset=iso-8859-1" -d "p2=a%E6b" localhost:8100/upload.tcl > > # POST + x-www-form-urlencoded, but let curl do the encoding > $ curl -X POST -d "p1=aœb" -d $(echo -e 'p2=a\xE6b') localhost:8100/upload.tcl > > # POST + multipart/form-data, let curl do the encoding > $ curl -X POST -F "p1=aœb" -F $(echo -e 'p2=a\xE6b') localhost:8100/upload.tcl > > POST request with already encoded form data (x-www-form-urlencoded) > $ curl -X GET "localhost:8100/upload.tcl?p1=a%C5%93Cb&p2=a%E6b" > > > > > _______________________________________________ > naviserver-devel mailing list > nav...@li... > https://lists.sourceforge.net/lists/listinfo/naviserver-devel -- Univ.Prof. Dr. Gustaf Neumann Head of the Institute of Information Systems and New Media of Vienna University of Economics and Business Program Director of MSc "Information Systems" |
From: David O. <da...@qc...> - 2022-05-12 09:05:22
|
Thanks again Gustaf, I can see the W3C spec you reference seems quite unequivocal in saying an error message should be sent back when decoding invalid UTF-8 form data. But I was curious why other implementations appear to use the UTF-8 replacement character (U+FFFD) instead, and found a bit of discussion in the unicode standard itself [1] & [2]. [1] specifically refers to the WHATWG(W3C) spec for encoding/decoding [3] which defines an "error" condition when decoding UTF-8 as being one of two possible error modes: Namely: - fatal - "return the error" - replacement - "Push U+FFFD (�) to output." This aligns with the behaviour of, say, Python's bytes.decode() where the default is to raise an error for encoding errors ("strict" error handling), but optionally, you can specify "replace" error handling which will utilise the U+FFFD character instead. I can see this working in cases where we're told the data should be UTF-8, or where we're assuming by default it's UTF-8. But I'm not sure how much work this would be to implement and whether it is seen as worthwhile to others? As it stands, we have legacy applications which POSTs data to us which regularly (and, by now, expectedly) sends invalid characters despite best efforts to fix it. I guess we would redirect the POSTs to another non-naviserver system, sanitise the data there, then send it on to NaviServer, but it would be nice to be able to deal with it within NaviServer itself. [1] https://www.unicode.org/versions/Unicode14.0.0/ch03.pdf (Section 3.9 "U+FFFD Substitution of Maximal Subparts") [2] https://www.unicode.org/versions/Unicode14.0.0/ch05.pdf (Section 5.22 "U+FFFD Substitution in Conversion") [3] https://encoding.spec.whatwg.org/#decoder [4] https://docs.python.org/3/library/stdtypes.html#bytes.decode On Mon, 2 May 2022 at 13:30, Gustaf Neumann <ne...@wu...> wrote: > Dear David and all, > > I looked into this issue, and I do not like the current situation either. 
> In the current snapshot, a GET request with invalid coded > query variables is rejected, while the POST request leads just > to the warning, and the invalid entry is omitted. > > W3C [1] says in the reference for Multilingual form encoding: > > If non-UTF-8 data is received, an error message should be sent back. > > This means, that the only defensible logic is to reject in both cases > the request as invalid. One can certainly send single-byte funny character > data in URLs, which is invalid UTF8 (e.g. "%9C" or "%E6" etc.), > but for these requests, the charset has to be specified, either > via content type, or via the default URL encoding in the NaviServer > configuration... see example (2) below. > > As mentioned earlier, there are increasingly many attacks with invalid > UTF-8 data (also by vulnerability scanners), so we to be strict here. > > I will try to address the outstanding issues ASAP and provide then > another RC. > > All the best > > -gn > > [1] https://www.w3.org/International/questions/qa-forms-utf-8 > > > # POST request with already encoded form data (x-www-form-urlencoded) > $ curl -X POST -d "p1=a%C5%93Cb&p2=a%E6b" localhost:8100/upload.tcl > > # POST request with already encoded form data, but proper encoding > $ curl -X POST -H "Content-Type: application/x-www-form-urlencoded; charset=iso-8859-1" -d "p2=a%E6b" localhost:8100/upload.tcl > > # POST + x-www-form-urlencoded, but let curl do the encoding > $ curl -X POST -d "p1=aœb" -d $(echo -e 'p2=a\xE6b') localhost:8100/upload.tcl > > # POST + multipart/form-data, let curl do the encoding > $ curl -X POST -F "p1=aœb" -F $(echo -e 'p2=a\xE6b') localhost:8100/upload.tcl > > POST request with already encoded form data (x-www-form-urlencoded) > $ curl -X GET "localhost:8100/upload.tcl?p1=a%C5%93Cb&p2=a%E6b" > > > |
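The two decoder error modes mentioned in the message above ("fatal" and "replacement") correspond to the "strict" and "replace" error handlers of Python's bytes.decode(), which the message cites. A minimal sketch, with the helper names decode_fatal and decode_replacement chosen only for this example:

```python
def decode_fatal(data: bytes) -> str:
    """'fatal' mode: invalid UTF-8 raises UnicodeDecodeError."""
    return data.decode("utf-8", errors="strict")


def decode_replacement(data: bytes) -> str:
    """'replacement' mode: invalid sequences become U+FFFD."""
    return data.decode("utf-8", errors="replace")


raw = b"a\xe6b"                          # 0xE6 is "æ" in ISO-8859-1, invalid as UTF-8 here
print(repr(decode_replacement(raw)))     # the invalid byte is replaced by U+FFFD
try:
    decode_fatal(raw)
except UnicodeDecodeError as exc:
    print("fatal mode:", exc.reason)
```

The replacement mode never fails, but the original byte is lost, which is the concern raised elsewhere in this thread about silently transforming incoming data.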
From: Gustaf N. <ne...@wu...> - 2022-05-03 12:39:17
Dear all,

I have committed a change to achieve more consistent and compliant behavior.

Since all form and query processing in NaviServer happens via the API (ns_urldecode, ns_getform), the current architecture does not allow direct error messages. The NaviServer philosophy is that the (Tcl) developer should have the option to handle such cases in an application-specific way. We recently made changes to address this (mostly driven by vulnerability scanners), e.g. by letting "ns_urldecode" raise an exception when this happens. This change completes that work by also raising an exception in "ns_getform" under such conditions.

Note that raising an exception is a potential incompatibility for invalid data (which was "swallowed" before). The regression test was extended to cover such cases.

There is one more thing (in ns_connchan, which I have so far not been able to reproduce) that I would like to look at before making the next release candidate available.

All the best
-gn

--
Univ.Prof. Dr. Gustaf Neumann
Head of the Institute of Information Systems and New Media
of Vienna University of Economics and Business
Program Director of MSc "Information Systems"
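The pattern described above, where the form-parsing call raises an exception and the application decides how to respond, can be sketched in Python. This is only an analogue under stated assumptions: handle_post is a hypothetical name, and urllib.parse.parse_qs stands in for a form-parsing call such as ns_getform; it is not the NaviServer API.

```python
from urllib.parse import parse_qs


def handle_post(body):
    """Hypothetical handler: return (HTTP status, parsed form fields)."""
    try:
        # errors="strict" makes invalid UTF-8 raise instead of being replaced
        # (parse_qs defaults to errors="replace").
        form = parse_qs(body, encoding="utf-8", errors="strict")
    except UnicodeDecodeError:
        # Application-specific choice: reject the whole request.
        return 400, {}
    return 200, form


print(handle_post("p1=a%C5%93b"))
print(handle_post("p1=a%C5%93b&p2=a%E6b"))
```

An application that prefers substitution over rejection could instead catch the exception and re-parse with errors="replace"; the point is only that the decision sits with the caller.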
From: Gustaf N. <ne...@wu...> - 2022-05-02 12:29:29
Dear David and all,

I looked into this issue, and I do not like the current situation either. In the current snapshot, a GET request with invalidly encoded query variables is rejected, while a POST request leads just to a warning, and the invalid entry is omitted.

W3C [1] says in the reference for multilingual form encoding:

> If non-UTF-8 data is received, an error message should be sent back.

This means that the only defensible logic is to reject the request as invalid in both cases. One can certainly send single-byte funny character data in URLs, which is invalid UTF-8 (e.g. "%9C" or "%E6" etc.), but for these requests, the charset has to be specified, either via the content type or via the default URL encoding in the NaviServer configuration... see example (2) below.

As mentioned earlier, there are increasingly many attacks with invalid UTF-8 data (also by vulnerability scanners), so we have to be strict here.

I will try to address the outstanding issues ASAP and then provide another RC.

All the best
-gn

[1] https://www.w3.org/International/questions/qa-forms-utf-8

# (1) POST request with already encoded form data (x-www-form-urlencoded)
$ curl -X POST -d "p1=a%C5%93Cb&p2=a%E6b" localhost:8100/upload.tcl

# (2) POST request with already encoded form data, but proper encoding
$ curl -X POST -H "Content-Type: application/x-www-form-urlencoded; charset=iso-8859-1" -d "p2=a%E6b" localhost:8100/upload.tcl

# (3) POST + x-www-form-urlencoded, but let curl do the encoding
$ curl -X POST -d "p1=aœb" -d $(echo -e 'p2=a\xE6b') localhost:8100/upload.tcl

# (4) POST + multipart/form-data, let curl do the encoding
$ curl -X POST -F "p1=aœb" -F $(echo -e 'p2=a\xE6b') localhost:8100/upload.tcl

# (5) GET request with already encoded query variables
$ curl -X GET "localhost:8100/upload.tcl?p1=a%C5%93Cb&p2=a%E6b"

--
Univ.Prof. Dr. Gustaf Neumann
Head of the Institute of Information Systems and New Media
of Vienna University of Economics and Business
Program Director of MSc "Information Systems"
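Why the declared charset matters for a value such as "a%E6b" can be shown with a short Python sketch. It only demonstrates the decoding step with the standard library; it makes no claim about how NaviServer implements it.

```python
from urllib.parse import unquote_to_bytes

# Percent-decode to raw bytes first; no charset is involved at this step.
raw = unquote_to_bytes("a%E6b")

# Interpreted as ISO-8859-1, byte 0xE6 is the character U+00E6.
latin1 = raw.decode("iso-8859-1")

# Interpreted as UTF-8, 0xE6 starts a three-byte sequence that "b" cannot
# continue, so strict decoding fails.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(raw, latin1, utf8_ok)
```

The same bytes are therefore acceptable when the request declares charset=iso-8859-1 and invalid when UTF-8 is assumed.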
From: David O. <da...@qc...> - 2022-04-28 15:46:13
Hi Gustaf,

We've been testing *4.99.24 rc1* and it seems pretty solid so far. Thanks for all the work that went into it.

One change of behaviour that is causing us issues is the handling of invalid UTF-8 characters.

We have a system which regularly POSTs data to NaviServer - sometimes (for reasons we're looking into) the POSTed data received by NaviServer can contain urlencoded byte sequences which are not valid UTF-8 (for example *%9C* instead of *%C5%93*).

In previous versions of NaviServer, this caused an invalid character to be embedded in the data when we saved it.

Now, in version 4.99.24, we rightly get the warning "*Warning: decoded string is invalid UTF-8:*". But the additional behaviour is that the entire form variable seems to be dropped.

I just wanted to query whether that is the intended behaviour?

I've seen some servers convert such invalid characters to *\ufffd* (\ufffd being the "replacement character" - "used to replace an incoming character whose value is unknown or unrepresentable in Unicode") - but I'm not sure which is the correct behaviour.

Regards,
Dave
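One possible source of a lone %9C, offered purely as an assumption since the message does not identify the sender's encoding: in Windows-1252 the character œ is the single byte 0x9C, whereas in UTF-8 it is the two bytes 0xC5 0x93. A minimal Python sketch of the two encodings:

```python
from urllib.parse import quote

# Percent-encode the same character under two different charsets.
utf8_form = quote("œ", encoding="utf-8")      # UTF-8 bytes of U+0153
cp1252_form = quote("œ", encoding="cp1252")   # Windows-1252 byte of U+0153

print(utf8_form, cp1252_form)
```

If the sending system really does encode with Windows-1252, declaring that charset on the request would be one way to make the data decodable; whether that applies here is not known from the thread.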
From: David O. <da...@qc...> - 2022-04-26 11:12:18
Thanks Gustaf - we've successfully built 4.99.24 rc1 and are in the process of testing it.

Much appreciated!

On Sun, 10 Apr 2022 at 13:04, Gustaf Neumann <ne...@wu...> wrote:

> There are now two changes committed to bitbucket:
>
> a) Provide an error message when the configured locale is not installed
> on the host (misconfiguration)
>
> This change causes NaviServer to abort when the configured locale is
> not installed on the host. Typically, this locale is used e.g. by
> ns_strcoll for determining the default collating order. The
> configuration file for the regression testing sets the environment
> variable LANG to "en_US.UTF-8". This means that for running the stock
> regression test, this locale must be installed at the OS level.
>
> b) Silence warning with recent versions of gcc when certain values of
> _FORTIFY_SOURCE/-Wstringop-overflow are set
>
> Newer versions of gcc support warnings about dangerous operations (such
> as strncat) when these depend on sources that are not easily traceable.
> In the fixed case, the warning was:
>
> warning: ‘__builtin_strncat’ specified bound depends on
> the length of the source argument
>
> With _FORTIFY_SOURCE, GCC tries whenever possible to use buffer-length
> aware replacements for functions, which was not possible in the case in
> question. The documentation says that with _FORTIFY_SOURCE set to
> 2, some more checking is added, but some conforming programs might fail.
>
> The case for (b) was a false positive, but it is still better to silence
> such warnings than to ignore them.
>
> all the best
>
> -g