
#704 Can't retrieve info for androidforums.com

Status: closed
Owner: James McCoy
Priority: 5
Updated: 2013-08-24
Created: 2011-01-31
Creator: Stephen Nichols
Private: No

My friendly supybot in my local IRC channel seems unable to retrieve information for androidforums.com or any page on it. I've fed it URLs from androidforums.com as well as www.androidforums.com, and tried appending index.html, all to no avail. I then configured Firefox to spoof the bot's User-Agent (and tested on my own blog to verify the spoof worked), but when I go to androidforums.com everything looks normal. When I type any androidforums.com URL in the channel, supybot remains silent. This is the User-Agent I found when I tested it: "Mozilla/5.0 (Compatible; Supybot 0.83.4.1)"

Discussion

  • nanotube
    2011-01-31

    Hi
    are you talking about the page titles?
    i just tried on my bot and got the page title for androidforums.com without any problems.
    can you try to see if it works with other pages, etc? or maybe you're talking about other information than page titles?

     
  • Page titles are what I am after. Other pages seem fine. See our channel log for details: http://logs.ubuntu-eu.org/freenode/2011/01/31/%23ubuntu-us-pa.html

    About a page down, I paste a link to androidforums.com and then start running tests to see why our bot is being moody.

     
  • nanotube
    2011-01-31

    try getting title manually with command 'web title androidforums.com'

    also, try checking your supybot.plugins.Web.nonSnarfingRegexp config, maybe someone stuck something in there.

     
  • James McCoy
    2013-08-24

    • status: open --> closed
    • assigned_to: James McCoy
     
  • James McCoy
    2013-08-24

    It's because their pages have a ton of extra crap in the <head> section before <title>. Supybot doesn't download the whole page, only as many bytes as are configured in supybot.protocols.http.peekSize, and that's not enough to reach the <title> tag on their pages.
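
    To illustrate the effect, here is a minimal Python sketch of a title lookup that only inspects the first few bytes of a page. It is not Supybot's actual code: the function name `find_title`, the regular expression, the filler page, and the 4096-byte window are all assumptions made for the example.

```python
import re

# Hypothetical pattern for illustration; the bot's real title parsing may differ.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def find_title(page: bytes, peek_size: int):
    """Return the page title if it lies entirely within the first
    peek_size bytes of the page, otherwise None."""
    head = page[:peek_size].decode("utf-8", errors="replace")
    match = TITLE_RE.search(head)
    if match is None:
        return None
    # Collapse runs of whitespace inside the title.
    return " ".join(match.group(1).split())


# Made-up page whose <head> carries roughly 11 KB of tags before <title>.
filler = '<meta name="x" content="y">\n' * 400
page = (
    "<html><head>" + filler + "<title>Android Forums</title></head>"
    "<body></body></html>"
).encode("utf-8")

small_window = find_title(page, 4096)    # assumed small peek size
large_window = find_title(page, 65536)   # window large enough to reach <title>
print(small_window)
print(large_window)
```

    With the small window the lookup finds nothing, which matches the bot staying silent; with a larger window the title is found. If that is what is happening, raising supybot.protocols.http.peekSize should let the bot see the title.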