From: Gustaf N. <ne...@wu...> - 2017-03-24 17:04:12
Dear all,

I've committed some code concerning the discussed urlencode reform.
The code now conforms to RFC 3986 (2005) and to the recommendation of
HTML 4.01 (for www-url-encoded). It emits warnings when a value placed
in the "Location" header field lacks required encodings (in the obvious
cases). Furthermore, it contains a new flag "-uppercase" for OAuth, and
it is able to decode upper-case hex codes as well.

The new code is already running on OpenACS.org to see whether I missed
something obvious. I have not yet updated the documentation and the
regression test. And, not to forget, I am planning to raise exceptions
when an attempt is made to decode invalid hex codes.

-gn

PS: Below are the more detailed commit messages:

- The new code conforms to RFC 3986 (2005), which has a more precise
  and differently structured definition (e.g. no 'unwise' characters)
  of the characters encoded in URLs. The coding of the query part is
  actually defined in the HTML 4.01 definitions.

- Coding of space (" ") in the path and query part is still different,
  as in older versions of NaviServer (spaces are coded as "+" in the
  query part and as "%20" in the path segments). This distinction is
  not necessary according to the RFCs.

- The coding tables are documented in detail, including design
  considerations.

- A new flag "-uppercase" was added to support encoding for OAuth
  (RFC 5849); note that the "path" segment encoding has to be used to
  avoid coding space as "+".

- A warning is now produced when a URL that is not properly encoded is
  passed to the location field (e.g. via ns_returnredirect). Only
  characters that always have to be coded are flagged.

- One can obtain the previous encoding behavior by compiling with
  "RFC1738" defined.

On 23.03.17 at 09:00, Michael Aram wrote:
> Dear all,
>
> I agree with Wolfgang that this should generally be taken care of by
> the app developer.
> However, as there is probably much legacy code out
> there that passes the URL unencoded to ns_returnredirect (at least in
> our OpenACS installations there are many such places), I would opt for
> a solution where this can be parameterized via a boolean parameter of
> the function (b). Ideally, the default of this parameter could be
> changed via the NaviServer config file (setting the default behavior
> to e.g. "encode" instead of the current "leave untouched"). If this is
> considered too much effort, at least a warning would be very useful.
>
> As a side note: if the ns_urlencode code is touched, I would suggest
> adding a parameter that allows for choosing the case of the encoded
> characters. The background for this is that OAuth 1 strictly requires
> the percent-encoded characters to be uppercase only, as these become
> part of the signature base string (without this restriction, two
> signatures could differ only because of different encoding
> implementations at the endpoints). See
> https://tools.ietf.org/html/rfc5849 chapter 3.6.
>
> All the best,
> Michael
>
> On Thu, Mar 23, 2017 at 7:49 AM <nav...@li...> wrote:
>
> Today's Topics:
>
>    1. url-encoding and ns_returnredirect, RFC updates (Gustaf Neumann)
>    2.
>       Re: url-encoding and ns_returnredirect, RFC updates
>       (Wolfgang Winkler)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 22 Mar 2017 11:50:49 +0100
> From: Gustaf Neumann <ne...@wu...>
> Subject: [naviserver-devel] url-encoding and ns_returnredirect, RFC
>     updates
> To: Navidevel <nav...@li...>
> Message-ID: <888...@wu...>
> Content-Type: text/plain; charset="utf-8"
>
> Dear all,
>
> as it looks, Edge is more picky about the encoding of URLs in the
> Location: header field (see e.g. the recent entry in the OpenACS issue
> tracker [1]). RFC 7231 states [2] that
>
>     Location = URI-reference
>
> but as well:
>
>     Note: Some recipients attempt to recover from Location fields that
>     are not valid URI references. This specification does not mandate
>     or define such processing, but does allow it for the sake of
>     robustness.
>
> The BNF in [3] makes clear that it has to be encoded (see the snippet
> for path segments):
>
>     URI-reference = URI / relative-ref
>     relative-ref  = relative-part [ "?" query ] [ "#" fragment ]
>     relative-part = "//" authority path-abempty
>                   / path-absolute
>                   / path-noscheme
>                   / path-empty
>
>     path-abempty  = *( "/" segment )
>     path-absolute = "/" [ segment-nz *( "/" segment ) ]
>     path-noscheme = segment-nz-nc *( "/" segment )
>     path-rootless = segment-nz *( "/" segment )
>     path-empty    = 0<pchar>
>
>     segment       = *pchar
>     segment-nz    = 1*pchar
>     segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" )
>                   ; non-zero-length segment without any colon ":"
>     pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
>
> NaviServer passes the URL as is from e.g. ns_returnredirect to the
> "Location:" field.
>
> So the question is, should ns
>
> a) take care of this encoding,
> b) take care of this encoding via an optional flag,
> c) do nothing and leave the responsibility to the application
>    programmer (current situation), or
> d) provide a warning when an "obviously" unencoded URL is passed to
>    ns_returnredirect?
>
> I think (a) is not useful, since ns can't decide from the string
> whether a "/" in the path is e.g. a delimiter or part of the segment.
> Furthermore, it would break existing programs that already encode the
> URLs correctly. (b) might be useful in simple cases.
>
> I am inclined towards (d), although an exact check for every char
> that should have been escaped might be too costly for some characters
> (checking whether "%" was used just as an escape indicator, etc.);
> however, an application developer can get hints via (d) about where
> the URL encoding was probably lacking.
>
> While looking at nsd/urlencode.c I saw that the encoding is more
> conservative than commented (... "All ASCII control characters (00-1f
> and 7f) and the URI 'delim' and 'unwise' characters are encoded" ...),
> since it encodes the characters from 0x80 to 0xff as well. Do I
> interpret this correctly that this refers to the
> differences/confusions between RFC1738 (1994) and RFC1808 (1995) vs.
> RFC2396 (1998), see [5]? The code says it conforms with RFC1738, so an
> update to at least RFC2396 seems appropriate.
>
> Comments?
>
> -g
>
> [1] http://openacs.org/bugtracker/openacs/bug?bug_number=3312
> [2] https://tools.ietf.org/html/rfc7231#page-68
> [3] https://tools.ietf.org/html/rfc3986#appendix-A
> [4] https://tools.ietf.org/html/rfc2396
> [5] https://tools.ietf.org/html/rfc2396#appendix-G.2
>
> ------------------------------
>
> Message: 2
> Date: Thu, 23 Mar 2017 07:22:54 +0100
> From: Wolfgang Winkler <wol...@di...
>     <mailto:wol...@di...>>
> Subject: Re: [naviserver-devel] url-encoding and ns_returnredirect,
>     RFC updates
> To: nav...@li...
> Message-ID: <129...@di...>
> Content-Type: text/plain; charset="windows-1252"
>
> Hi!
>
> I opt for version (c) (programmer's responsibility), as URL encoding
> can be tricky stuff if you don't know the context of the passed-in
> URL. A flag or maybe an extra proc for checking a URL for problems
> could be useful, but that's something that can be done easily with
> ns_urlencode and ns_urldecode.
>
> Updating to RFC2396 would be most welcome. As stated in the RFC:
>
>     This document defines the generic syntax of URI, including both
>     absolute and relative forms, and guidelines for their use; it
>     revises and replaces the generic definitions in RFC 1738 and
>     RFC 1808.
>
> and
>
>     This document updates and merges "Uniform Resource Locators"
>     [RFC1738] and "Relative Uniform Resource Locators" [RFC1808] in
>     order to define a single, generic syntax for all URI.
>
> To me it seems that RFC1738 has been, at least in parts, deprecated.
>
> regards,
>
> Wolfgang
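
For readers who want to experiment with the behavior discussed in this thread, here is a minimal sketch in Python. It is not the NaviServer implementation (that lives in C in nsd/urlencode.c), and all function names are hypothetical. It assumes that the "path" style encodes everything outside the RFC 3986 unreserved set with space as "%20", that the "query" style follows the HTML form convention with space as "+", and that a warning check along the lines of option (d) only looks for characters that can never appear literally in a URI, plus a "%" that does not start a valid escape.

```python
# Illustrative sketch only; NaviServer implements this in C (nsd/urlencode.c).
# All function names below are hypothetical.
import re
import string
from urllib.parse import quote, quote_plus

UNRESERVED_EXTRA = "-._~"  # with ALPHA / DIGIT, the RFC 3986 "unreserved" set

# Characters that may appear literally somewhere in a URI (RFC 3986):
# unreserved, gen-delims, sub-delims, and "%" as the escape indicator.
URI_CHARS = set(string.ascii_letters + string.digits + UNRESERVED_EXTRA
                + ":/?#[]@" + "!$&'()*+,;=" + "%")


def _case(encoded: str, uppercase: bool) -> str:
    """quote() emits upper-case hex; lower-case only the escapes on request."""
    if uppercase:
        return encoded
    return re.sub(r"%[0-9A-F]{2}", lambda m: m.group(0).lower(), encoded)


def encode_path_segment(text: str, uppercase: bool = True) -> str:
    """Encode one path segment: space becomes %20 and "/" becomes %2F."""
    return _case(quote(text, safe=UNRESERVED_EXTRA), uppercase)


def encode_query_value(text: str, uppercase: bool = True) -> str:
    """Encode one query value in the HTML form style: space becomes "+"."""
    return _case(quote_plus(text, safe=UNRESERVED_EXTRA), uppercase)


def location_problems(url: str) -> list:
    """Return hints for characters that always need encoding in a URL."""
    problems = [f"unencoded character {ch!r}"
                for ch in sorted(set(url) - URI_CHARS)]
    if re.search(r"%(?![0-9A-Fa-f]{2})", url):
        problems.append("'%' not followed by two hex digits")
    return problems


print(encode_path_segment("a b/c"))        # a%20b%2Fc
print(encode_query_value("a b&c"))         # a+b%26c
print(encode_path_segment("ü", False))     # %c3%bc
print(location_problems("/my file?x=1"))   # ["unencoded character ' '"]
```

For an OAuth 1 signature base string (RFC 5849, section 3.6), the path-style function with upper-case hex is the one to use, since the query style would turn a space into "+". The check deliberately accepts any reserved character, because, as noted in the thread, a "/" cannot be classified as delimiter or data from the string alone.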