#4433 http 2.7.2 fails to correctly parse input

obsolete: 8.6b1
open
5
2009-10-08
2009-10-08
C Jolly
No

The http event handler does not cope with web servers that send blank lines ahead of the HTTP response status line. This is not standard behaviour, but I believe the fix I am suggesting does no harm with web servers that do return the status on the first line. I am connecting to a web server that sends back the following in response to a GET request:
---------------------------
\n
\n
\n
HTTP/1.1 200 File Follows
\n
.....
-------
Without the suggested change, the current 2.7.2 version of http.tm fails to parse the reply: the status is blank and the metadata holds the contents of the body. This breaks retrieval of elements through the http API. With the following change, the handler does not move on to header parsing until it has read a non-blank line. Only the relevant portions are shown; all other code has been removed for the sake of brevity.

The http::Event proc needs to be modified from:

# current version 2.7.2
proc http::Event {sock token} {
    ....
    if {$state(state) eq "connecting"} {
        # The state advances to "header" before the line is even read
        set state(state) "header"
        if {[catch {gets $sock state(http)} n]} {
            return [Finish $token $n]
        }
    }
    ...
}

to:

# suggested update
proc http::Event {sock token} {
    ....
    if {$state(state) eq "connecting"} {
        if {[catch {gets $sock state(http)} n]} {
            return [Finish $token $n]
        } elseif {[string length $state(http)] > 0} {
            # Only advance once a non-blank status line has been read
            set state(state) "header"
        }
    }
    ...
}
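
The effect of the change can be tried without a socket. The following is only a minimal sketch, not code from the http package: handleLine is a hypothetical stand-in for the per-line work done by http::Event, and the state, http, meta and body elements are assumed to behave roughly as they do in the real state array. It feeds the reply shown above through a small state machine that stays in "connecting" until a non-blank line arrives.

```tcl
# Hypothetical stand-in for the per-line work of http::Event (an assumption
# for illustration only). Called once per line; updates the array named by
# stateVar in the caller's scope.
proc handleLine {stateVar line} {
    upvar 1 $stateVar state
    # Accept both "\r\n" and "\n" line endings
    set line [string trimright $line \r]
    switch -- $state(state) {
        connecting {
            # Stay in "connecting" until a non-blank status line is seen
            if {[string length $line] > 0} {
                set state(http) $line
                set state(state) header
            }
        }
        header {
            # A blank line ends the headers; anything else is metadata
            if {[string length $line] == 0} {
                set state(state) body
            } else {
                lappend state(meta) $line
            }
        }
        body {
            append state(body) $line \n
        }
    }
}

array set state {state connecting http {} meta {} body {}}
set reply "\n\n\nHTTP/1.1 200 File Follows\nContent-Type: text/plain\n\nhello"
foreach line [split $reply \n] {
    handleLine state $line
}
puts $state(http)   ;# HTTP/1.1 200 File Follows
```

With the unpatched logic (moving to "header" on the very first line, blank or not), state(http) would end up empty and the status line would be treated as metadata, which matches the symptom described above.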

Discussion

  • Donal K. Fellows

    But that's just a plain-old broken server. At a guess the response should be interpreted as being a very old version of HTTP (I believe there's one that doesn't respond with headers) and if that's a problem, it's the server that ought to be taken out in the back yard and shot. It's for the best.

     
  • C Jolly

    C Jolly - 2009-10-09

    The server is on a Tivo Series 2, so there are millions of instances of it. I should add that the TclCurl extension was also used to access this server and did not suffer from this problem. The maxim here is "send strictly, receive forgivingly". If the change allows the http package to access more web servers and has no ill effects, then why not make it?

     
  • Colin McCormack

    Colin McCormack - 2009-10-09

    The observed behaviour may not be due to the server prepending lines to the response header; it may be due to improper encoding or decoding of preceding content, or to incorrect calculation of content length. This is to be suspected because the received text is reported to be \n and not \r\n (which is the normal EOL form for headers).

    If the error is in content, not header, then 'fixing' the client by making it accept the extra chars as line noise will merely mask a different problem, which will likely recur in a harder-to-find form.

    One discriminator is: is the erroneous GET response the first request in a pipeline, or a subsequent request?

     
  • C Jolly

    C Jolly - 2009-10-11

    When I wrote this bug my intention was to show that there were blank lines in the input before the response header. I'm a unix guy so I naturally used \n, but in reality it could have been \r\n. My only intention was to show the blank lines; a series of 4 blank lines before the header might be misinterpreted by anyone who looks at this bug as formatting to set the output off from the rest of the text. As the content-length value shows up after the response header line, it has no effect on how the http::geturl function gets the response header. What I do know is that if the response header is not read in correctly, many of the state array items are incorrect or missing. Whether it happens the first time or the fifth time, it is a bad thing. Without knowing for sure, I cannot believe that the encoding has anything to do with the response header, as it arrives before the metadata that holds the encoding parameter. The encoding would only be applicable to the actual content. The error is not in the content, but the content is not correct if the response header is blank and the content (i.e. state(body)) shows up as a metadata item. If no one can think of a reason not to read until you get to a non-blank response header line, then as a matter of robustness this code should be corrected as I've indicated.

     
