Tcl

Tracker: Bugs

URL checking too strict when using multiple question marks - ID: 2891171
Last Update: Comment added ( dgp )

Sending the following request through IE works fine:

http://165.114.214.221/getParam.cgi?AIValue_00=?
reply: AIValue_00=28192

Doing the same through Tcl:
% package require http
2.7
% set url http://165.114.214.221
http://165.114.214.221
% set reply [::http::geturl ${url}/getParam.cgi?AIValue=?]
Unsupported URL: http://165.114.214.221/getParam.cgi?AIValue=?
%

Analysis: the following line in the URLmatcher variable:
( / [^\#?]* (?: \? [^\#?]* )?)? # <path> (including query)
checks that the <path> part of the URL contains at most
one question mark. According to RFC 3986 this
is how a URL should look, but apparently there
are URLs that violate this. There is no reason to
restrict URLs this way, so I think the regular
expression should be relaxed.

Suggestion: replace the mentioned line with:
( / [^\#]*)? # <path> (including query)
should do the trick.
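The effect of the two sub-patterns can be checked in isolation. The following is a minimal sketch, not the real URLmatcher from http.tcl: it takes only the <path> fragments quoted above, drops the expanded-syntax spaces and comment, and anchors them with ^ and $. The helper name pathMatches is hypothetical.

```tcl
# Current <path> sub-pattern: at most one "?" may appear.
set oldPath {^(/[^#?]*(?:\?[^#?]*)?)?$}
# Proposed relaxation: anything up to a "#" fragment marker.
set newPath {^(/[^#]*)?$}

# Hypothetical helper: 1 if the whole path matches the pattern, else 0.
proc pathMatches {pattern path} {
    return [regexp -- $pattern $path]
}

foreach path {/getParam.cgi?AIValue_00=28192 /getParam.cgi?AIValue_00=?} {
    puts [format "%-32s old=%d new=%d" $path \
        [pathMatches $oldPath $path] [pathMatches $newPath $path]]
}
```

With these patterns the first path should be accepted by both, while the second (with its trailing "?") should be accepted only by the relaxed pattern, mirroring the "Unsupported URL" error in the session above.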

If no-one objects, I will check this change in
within a few days.


Jan Nijtmans ( nijtmans ) - 2009-11-03 10:11

Priority: 5
Status: Closed
Resolution: Fixed
Assigned to: Jan Nijtmans
Category: 29. http Package
Group: development: 8.6b1.1
Visibility: Public


Comments ( 13 )

Date: 2009-11-11 16:15
Sender: dgp (Project Admin)


backport to http 2.7.5 (distributed with Tcl 8.5.8)
completed.


Date: 2009-11-11 16:00
Sender: dgp (Project Admin)


Sorry about the bad advice. I had forgotten
that the http packages on the 8-5 and 8-6
branches had already diverged.



Date: 2009-11-11 12:55
Sender: nijtmans (Project Admin)

Yes, I am planning to backport this, but I cannot do it until tonight. Don,
feel free to do it if you are waiting for it in order to complete the
8.5.8 release.


Date: 2009-11-11 12:47
Sender: dkf (Project Admin)

I see that you committed a fix for this. Are you going to backport?


Date: 2009-11-10 20:53
Sender: matzek

I hereby withdraw my objection :-) ... I read through the RFCs and it seems
I indeed had the wrong impression that there is a strict definition of the
last part of the URL notation.

-- Matthias Kraft


Date: 2009-11-10 16:03
Sender: patthoyts

When the strict URI checking code went in (during 8.5 development, iirc) it
got a -strict option added to disable it. So http::geturl $uri -strict 0
would probably have let you work with this as it stands.
However, I would support changing this to match the URI RFC as you've
quoted below. We are evidently being too strict.
No objection.


Date: 2009-11-10 15:58
Sender: dgp (Project Admin)


nijtmans, please commit your patch
to go in http 2.7.4 distributed with
Tcl 8.5.8 and 8.6b2. Thanks.


Date: 2009-11-03 12:14
Sender: nijtmans (Project Admin)

So, I am glad I brought this up. So many reactions. Thanks all!

Answers: yes, Firefox and other browsers behave the same
way. No browser does automatic URL-encoding. They all
pass through the exact URL given.

No, I cannot get the server fixed. RFC 1738 is
indeed rather misleading in the interpretation
of what is allowed in the <query> part of a URL.

Luckily, RFC 3986 has a much clearer
description, which convinced me that multiple
question marks are allowed in a query. Citing
this RFC:

===================================
3.4. Query

The query component contains non-hierarchical data that, along with
data in the path component (Section 3.3), serves to identify a
resource within the scope of the URI's scheme and naming authority
(if any). The query component is indicated by the first question
mark ("?") character and terminated by a number sign ("#") character
or by the end of the URI.

query = *( pchar / "/" / "?" )

The characters slash ("/") and question mark ("?") may represent data
within the query component. Beware that some older, erroneous
implementations may not handle such data correctly when it is used as
the base URI for relative references (Section 5.1), apparently
because they fail to distinguish query data from path data when
looking for hierarchical separators. However, as query components
are often used to carry identifying information in the form of
"key=value" pairs and one frequently used value is a reference to
another URI, it is sometimes better for usability to avoid percent-
encoding those characters.
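The rule quoted above (the query begins at the first "?" and ends at "#" or at the end of the URI) is what the generic parsing expression in RFC 3986, Appendix B encodes. Below is a minimal Tcl sketch applying that expression to the URL from this report; the proc name splitUri is hypothetical, and the code does not validate the individual components.

```tcl
# Split a URI reference into five components using the regular
# expression from RFC 3986, Appendix B. The query is everything after
# the first "?" up to "#", so any further "?" characters stay in it.
proc splitUri {uri} {
    set re {^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?}
    # "-" is a throwaway variable for the outer groups we do not need.
    if {![regexp -- $re $uri -> - scheme - authority path - query - fragment]} {
        error "no match: $uri"
    }
    return [dict create scheme $scheme authority $authority \
        path $path query $query fragment $fragment]
}

set parts [splitUri http://165.114.214.221/getParam.cgi?AIValue_00=?]
dict for {key value} $parts {
    puts "$key: $value"
}
```

Because the query runs from the first "?" up to "#", the trailing "?" should simply remain inside the query (AIValue_00=?) rather than being treated as a syntax error.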



Date: 2009-11-03 12:01
Sender: matzek

I am also against such a move. If this is really needed and going to be
implemented, please ensure it either has to be explicitly switched on, or
can at least be switched off...

-- Matthias Kraft


Date: 2009-11-03 11:58
Sender: coldstore

Can't you get the server fixed?


Date: 2009-11-03 11:47
Sender: dkf (Project Admin)

Does it work in any browser other than IE? If not, it's just a broken site
and we shouldn't introduce breakage into the http package just to allow for
this one stupidity.

As evidence, I cite RFC 1738 which states that "?" is one of the reserved
characters in the <searchpart> (the part of the query after the first "?"
character).


Date: 2009-11-03 11:02
Sender: nijtmans (Project Admin)

>A '?' in the request parameter should be encoded as '%3f'.

Yes, that was my first thought too. But in this case the question
mark is not used as a value, it is a separator, so encoding
it as %3f does not work with this service.


Date: 2009-11-03 10:47
Sender: dkf (Project Admin)

A '?' in the request parameter should be encoded as '%3f'.
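When "?" really is part of a value, http::formatQuery from the http package percent-encodes it. The sketch below contrasts that encoded URL with the raw URL this device expects; as nijtmans replied, the two are not equivalent for this server, because it treats the second "?" as a separator rather than as data. The hex case of the escape (%3f or %3F) may differ between http package versions.

```tcl
package require http

set base http://165.114.214.221/getParam.cgi

# Encoded form: "?" becomes %3f (or %3F), so a standards-following
# server sees it as part of the value of AIValue_00.
set encoded $base?[http::formatQuery AIValue_00 ?]

# Raw form: what this particular device expects, "?" used as a separator.
set raw $base?AIValue_00=?

puts $encoded
puts $raw
```

Nothing here touches the network; only the strings are built, so the difference between the two request forms can be inspected before calling http::geturl.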


Attached File

No Files Currently Attached

Changes ( 6 )

Field           Old Value  Date              By
status_id       Open       2009-11-11 16:15  dgp
allow_comments  1          2009-11-11 16:15  dgp
close_date      -          2009-11-11 16:15  dgp
resolution_id   None       2009-11-11 12:47  dkf
priority        9          2009-11-10 16:03  patthoyts
priority        5          2009-11-10 15:58  dgp