
how to set proxy in phpcrawl?

Help
2009-07-11
2013-04-09
  • kamal farvandi

    kamal farvandi - 2009-07-11

    How do I set a proxy in phpcrawl?
    I want to connect through a proxy server that requires authentication.

     
    • Uwe Hunfeld

      Uwe Hunfeld - 2009-07-19

      Hi,

      sorry, but proxy-support is currently not implemented in phpcrawl.

       
      • Radatz

        Radatz - 2009-07-28

        it works for me - perhaps also for you

        change line 150 in phpcrawlerpagerequest.class.php to:

        $s = @fsockopen ("PROXYURL", "PROXYPORT", $e, $t, $this->socket_mean_timeout);

        with your own proxy host and port in place of the capitalized placeholders
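
If you would rather not hard-code the values into the fsockopen() call, a rough sketch like the one below keeps the proxy address in one string and checks it before connecting. parseProxyAddress() and openProxySocket() are made-up helper names for illustration and are not part of phpcrawl.

```php
<?php
// Hypothetical helper, not part of phpcrawl: split "host:port" into its parts.
function parseProxyAddress($address)
{
  $pos = strrpos($address, ":");
  if ($pos === false)
  {
    return false; // no port given
  }

  $host = substr($address, 0, $pos);
  $port = (int) substr($address, $pos + 1);

  if ($host == "" || $port < 1 || $port > 65535)
  {
    return false; // empty host or port out of range
  }

  return array("host" => $host, "port" => $port);
}

// Hypothetical helper: open the socket to the proxy instead of the target host.
// Returns the socket resource, or false if the address is invalid or the
// connection fails.
function openProxySocket($address, $timeout)
{
  $proxy = parseProxyAddress($address);
  if ($proxy === false)
  {
    return false;
  }

  return @fsockopen($proxy["host"], $proxy["port"], $e, $t, $timeout);
}
```

Inside the class, the changed line would then read something like $s = openProxySocket("proxy.example.com:8080", $this->socket_mean_timeout); with your own address in place of the example one.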

         
  • Nobody/Anonymous

    Adding proxy support with authentication requires the following modifications to phpcrawlerpagerequest.class.php (version 0.71):

    OLD CODE (starting at line 148):

    ===
        // Open socket-connection
        if ($url_disallowed == false)
        {
          $s = @fsockopen ($host_str, $port, $e, $t, $this->socket_mean_timeout);
        }
        else
        {
          return false; // Return false if the URL was completely ignored
        }
       
        if ($s==false) // Connection-error
        {
          $error_string = $t;
          $error_code = $e;
         
          if ($t=="" && $e=="")
          {
            $error_code = 0;
            $error_string = "Couldn't connect to server";
          }
        }
        else
        {
          $header_found = false; // will get true if the header of the page was extracted
         
          // Build header to send
          $headerlines_to_send = "GET ".$path.$file.$query." HTTP/1.0\r\n";
          $headerlines_to_send .= "HOST: ".$host."\r\n";

          // Referer
          if ($referer_url!="")
          {
            $headerlines_to_send .= "Referer: $referer_url\r\n";
          }

          // Cookies
          if ($this->handle_cookies == true)
          {
            $cookie_string = PHPCrawlerUtils::buildHeaderCookieString ($this->cookies, $host);
          }

          if (isset($cookie_string))
          {
            $headerlines_to_send .= "Cookie: ".$cookie_string."\r\n";
          }

          // Authentication
          if (count($authentication) > 0)
          {
            $auth_string = base64_encode($authentication.":".$authentication);
            $headerlines_to_send .= "Authorization: Basic ".$auth_string."\r\n";
          }

          // Rest of header
          $headerlines_to_send .= "User-Agent: ".str_replace("\n", "", $this->user_agent_string)."\r\n";
          $headerlines_to_send .= "Connection: close\r\n";
          $headerlines_to_send .= "\r\n";
    ===

    NEW CODE:

    ===
        // Open socket-connection
        if ($url_disallowed == false)
        {
          $s = @fsockopen ($PROXY_URL_OR_IP, $PROXY_PORT, $e, $t, $this->socket_mean_timeout);
        }
        else
        {
          return false; // Return false if the URL was completely ignored
        }
       
        if ($s==false) // Connection-error
        {
          $error_string = $t;
          $error_code = $e;
         
          if ($t=="" && $e=="")
          {
            $error_code = 0;
            $error_string = "Couldn't connect to server";
          }
        }
        else
        {
          $header_found = false; // will get true if the header of the page was extracted
         
          // Build header to send
          #$headerlines_to_send = "GET ".$path.$file.$query." HTTP/1.0\r\n";
          $headerlines_to_send = "GET $url_to_crawl HTTP/1.0\r\n";
          $headerlines_to_send .= "HOST: ".$host."\r\n";

          // Referer
          if ($referer_url!="")
          {
            $headerlines_to_send .= "Referer: $referer_url\r\n";
          }

          // Cookies
          if ($this->handle_cookies == true)
          {
            $cookie_string = PHPCrawlerUtils::buildHeaderCookieString ($this->cookies, $host);
          }

          if (isset($cookie_string))
          {
            $headerlines_to_send .= "Cookie: ".$cookie_string."\r\n";
          }

          // Authentication
          if (count($authentication) > 0)
          {
            $auth_string = base64_encode($authentication.":".$authentication);
            $headerlines_to_send .= "Authorization: Basic ".$auth_string."\r\n";
          }

          // Rest of header
          $headerlines_to_send .= "User-Agent: ".str_replace("\n", "", $this->user_agent_string)."\r\n";
          $headerlines_to_send .= "Proxy-Authorization: Basic ".base64_encode("username:password") ."\r\n";
          $headerlines_to_send .= "Connection: close\r\n";
          $headerlines_to_send .= "\r\n";
    ===

    So you're changing three things:

    1. You're connecting to the proxy instead of directly to the site.
    2. You're changing the GET request line to use the full $url_to_crawl instead of just the path and query.
    3. You're passing in the encoded username/password for proxy authorization.
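
To see points 2 and 3 in isolation, here is a minimal standalone sketch of the request text that ends up being sent to the proxy. buildProxyRequest() and its parameters are hypothetical names used only for this example; they are not part of phpcrawl, and the referer, cookie and site-authentication headers are left out for brevity.

```php
<?php
// Sketch only: builds the request text a crawler would send to an HTTP proxy.
// buildProxyRequest() is a hypothetical function, not phpcrawl API.
function buildProxyRequest($url_to_crawl, $host, $proxy_user, $proxy_password, $user_agent)
{
  // Point 2: the request line carries the full URL, not just path and query
  $headerlines_to_send  = "GET ".$url_to_crawl." HTTP/1.0\r\n";
  $headerlines_to_send .= "Host: ".$host."\r\n";

  // Strip line breaks so the user agent cannot split the header
  $headerlines_to_send .= "User-Agent: ".str_replace(array("\r", "\n"), "", $user_agent)."\r\n";

  // Point 3: base64-encoded "user:password" for the proxy itself
  $headerlines_to_send .= "Proxy-Authorization: Basic ".base64_encode($proxy_user.":".$proxy_password)."\r\n";

  $headerlines_to_send .= "Connection: close\r\n";
  $headerlines_to_send .= "\r\n"; // empty line ends the header

  return $headerlines_to_send;
}

// Example values only; replace with your own URL and proxy credentials.
$request = buildProxyRequest("http://www.example.com/index.html?a=1", "www.example.com", "username", "password", "phpcrawl");
```

Point 1 is not shown here: the resulting string would be written to a socket opened to the proxy host and port rather than to www.example.com. Note that Basic credentials are only base64-encoded, not encrypted, so anyone who can read the traffic between you and the proxy can recover them.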
