
Host unreachable (Connection timed out) Problem

Help forum · started by Anonymous, 2013-06-01 · last reply 2013-06-04
  • Anonymous - 2013-06-01

    I have successfully used PHPCrawl with multiple URLs. However, when I try to use it with http://pastebin.com, I get a "Host unreachable" error from $DocInfo->error_string:
    "Error connecting to http://pastebin.com: Host unreachable (Connection timed out)"

    Please let me know how I can get this to work, and include code snippets if anything unusual is needed.

    Thank you!

     

    Last edit: Anonymous 2013-11-18
  • Uwe Hunfeld - 2013-06-02

    Hi!

    Did you try increasing the timeouts, like this?

    $crawler->setStreamTimeout(5);      // defaults to 2 seconds
    $crawler->setConnectionTimeout(10); // defaults to 5 seconds

    Also see the first Q&A in the FAQ section:
    http://cuab.de/faq.html

    Hope this will help.

     
  • Anonymous - 2013-06-02

    Yes, I increased the timeouts to 30 seconds.

     
  • Anonymous - 2013-06-02

    I even tried it at 300 seconds each. It still doesn't work.

     
  • Uwe Hunfeld - 2013-06-02

    Hmm, do you use a proxy?
    Also, are you able to retrieve pages from pastebin.com with wget (or something else) from the server running your script?

    Maybe they blocked your IP, or they blocked the User-Agent "phpcrawl" (try changing it with setUserAgentString()).

    I'll give it a try tomorrow and see what happens from here.

     
  • Uwe Hunfeld - 2013-06-03

    Hi again,

    I did some tests with phpcrawl and http://pastebin.com, and you are right: it doesn't work.

    It's a little confusing, but here is what happens:
    As soon as you try to access pastebin.com with phpcrawl, the server stops answering and your IP gets blocked for about 10 minutes. During those 10 minutes the server won't accept any connections from your IP, no matter which client you use (browser, wget, ping, etc.); you always get the "Connection timed out" error you mentioned.

    So there's something in the request header phpcrawl sends that the server doesn't like.
    I don't know yet what it is, but it's not the User-Agent string.

    I'll let you know if I find a fix.
    I'll open a bug report for this.

     
  • Uwe Hunfeld - 2013-06-04

    The problems should now be fixed.

    See bug report https://sourceforge.net/p/phpcrawl/bugs/49/ and the
    fixed PHPCrawlerHTTPRequest class attached to it.

     

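Neither the thread nor the bug report text here says which request header triggered the block, so the following is only a diagnostic sketch in plain PHP (PHPCrawl is not involved, and this is not the fix from bug #49). It assumes you want to run the wget-style check Uwe suggests from the server that runs the crawler. It keeps the two timeouts from the thread apart: a connection timeout (the fifth argument of fsockopen()) and a per-read stream timeout (stream_set_timeout()), which is roughly what setConnectionTimeout() and setStreamTimeout() are described as controlling. It also lets you pick the User-Agent. The function names probe_http(), build_get_request() and parse_status_code() are hypothetical.

```php
<?php
// Hypothetical diagnostic helpers, NOT part of PHPCrawl.

// Builds a minimal HTTP/1.0 GET request; this header set is an assumption,
// not the set of headers PHPCrawl sends.
function build_get_request(string $host, string $path, string $userAgent): string
{
    return "GET $path HTTP/1.0\r\n"
         . "Host: $host\r\n"
         . "User-Agent: $userAgent\r\n"
         . "Connection: close\r\n"
         . "\r\n";
}

// Extracts the numeric status from a line like "HTTP/1.1 200 OK", or null.
function parse_status_code(string $response): ?int
{
    if (preg_match('#^HTTP/\d\.\d (\d{3})#', $response, $m) === 1) {
        return (int) $m[1];
    }
    return null;
}

// Fetches one page and reports either a status code or an error string.
function probe_http(string $host, int $port, string $path, string $userAgent,
                    float $connectTimeout = 5.0, int $streamTimeout = 2): array
{
    // Connection timeout: how long to wait for the TCP connection itself.
    // A blocked IP, as described in this thread, fails at this step.
    $fp = @fsockopen($host, $port, $errno, $errstr, $connectTimeout);
    if ($fp === false) {
        return ['status' => null, 'error' => "Error connecting to $host: $errstr"];
    }

    // Stream timeout: how long a single read may wait for data.
    stream_set_timeout($fp, $streamTimeout);
    fwrite($fp, build_get_request($host, $path, $userAgent));

    $response = '';
    while (!feof($fp)) {
        $chunk = fread($fp, 8192);
        // A read that waited longer than $streamTimeout sets 'timed_out'.
        if (stream_get_meta_data($fp)['timed_out']) {
            fclose($fp);
            return ['status' => null, 'error' => "Stream timeout reading from $host"];
        }
        if ($chunk === false) {
            break;
        }
        $response .= $chunk;
    }
    fclose($fp);
    return ['status' => parse_status_code($response), 'error' => ''];
}
```

For example, probe_http('pastebin.com', 80, '/', 'Mozilla/5.0', 10.0, 5) returns either a status code or an error string. If the IP is blocked in the way described above, you would expect an "Error connecting to ..." result rather than a status code, whichever User-Agent you pass.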