#59 Port set in Host header

Gene Wood
Kai Schätzl

I was trying to fetch the URL "http://www.heise-security.co.uk/news/news-atom.xml" and found that I was always getting a 404, not only for this URL but the whole site. After some investigation it turns out that Snoopy sends a slightly wrong Host header:

Host: www.heise-security.co.uk:80

I didn't check whether the RFC allows this, but I would never even try it. You connect to the port and then ask for the hostname, not for hostname:port. Even if it is allowed (and some servers seem to ignore it, or I would have run into the problem earlier), it's redundant: you cannot connect to port 1000 and then ask for host domain:1111, for instance.

So, I think the correct bugfix is to completely remove

$headers .= ":".$this->port;
(around line 800, maybe it's in the other functions as well, I didn't check them yet)

While troubleshooting this, I wished there were a method to display the request headers. I tried rawheaders, but it always seems to be an empty array. I'm not sure whether it should contain them anyway; I only rushed through the class because I wanted to find the problem with this site quickly.
Would be nice to add this.

And you forgot to update the Version: information in the copyright section of the class. I actually fetched 1.2.3 twice because I thought I still had 1.01 ;-)

Apart from that, many thanks for this nice and easy-to-use HTTP client! I came across it when building an application that uses Magpie to process RSS feeds.


  • hakre

    Logged In: YES
    Originator: NO

    I encountered a similar problem with an RSS feed. I checked the RFC, and the port is allowed in the Host header for HTTP/1.1. In HTTP/1.0 there is no Host header at all (but don't remove it, many httpds need it!). My suggestion is to send the port only if it's not the default port (80 for http). You can find more info in the bug report here: http://trac.wordpress.org/ticket/3993
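
The suggestion above (send the port only when it is not the default) could be sketched roughly as follows. This is a minimal illustration, not Snoopy's actual code and not the attached patch: the function name `build_host_header` and the scheme-to-port table are assumptions made for the example, and it is written for PHP 7 or later.

```php
<?php
// Hypothetical helper, NOT part of Snoopy: builds the Host header line and
// appends the port only when it differs from the scheme's default port.
function build_host_header(string $host, int $port, string $scheme = 'http'): string
{
    // Assumption: only http and https are considered in this sketch.
    $defaults = ['http' => 80, 'https' => 443];
    $default  = $defaults[strtolower($scheme)] ?? null;

    $value = $host;
    if ($port > 0 && $port !== $default) {
        // Non-default port: the server needs it to identify the right host.
        $value .= ':' . $port;
    }
    return 'Host: ' . $value;
}

echo build_host_header('www.heise-security.co.uk', 80), "\n";   // Host: www.heise-security.co.uk
echo build_host_header('www.heise-security.co.uk', 8080), "\n"; // Host: www.heise-security.co.uk:8080
```

Compared with removing the port outright, this keeps requests to non-standard ports working while avoiding the ":80" suffix that some servers treat as a different host.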

  • rob1n

    Logged In: YES
    Originator: NO

    I attached a fix to the WP Trac ticket (http://trac.wordpress.org/attachment/ticket/3993/3993.diff) that *should* apply to Snoopy, but you'll likely have to change filenames around.

    I would make a Snoopy-specific patch and upload it, but I can't for the life of me figure out how to add attachments to an SF ticket.

  • NukeHavoc

    Logged In: YES
    Originator: NO

    I've also encountered this problem via RSS: Snoopy fetches an RSS feed via Magpie and adds a :80 to the host. In most cases this may not be a problem, but in the world of RSS redirects, it appears that a standard URI and one that specifies port 80 are not equivalent.

    Case in point -- The Wall Street Journal's RSS feeds (as of this writing) redirect to another RSS feed when you try to subscribe to them. So this feed for "What's News"


    ends up redirecting to this feed:


    However, if you add a :80 to the original URL (resulting in http://online.wsj.com:80/xml/rss/3_7011.xml) you end up getting redirected to the WSJ home page...


    ...which causes Magpie to return a 302 redirection error because Snoopy is not able to follow the redirect to its true destination. This ultimately causes web apps that rely on Magpie to parse RSS (like Moodle, http://www.moodle.org, which uses Magpie and Snoopy for its RSS block) to fail on what appears to be a perfectly OK RSS feed. Feed2JS (http://www.feed2js.org) would have the same problem, except it's using a slightly older version of Snoopy that doesn't include this port 80 logic.

    The fix referenced by hakre did the trick for me, but it would be nice to see this work its way into Snoopy proper.

  • Gene Wood

    • status: open --> closed-fixed
    • assigned_to: Gene Wood
    • Group: -->