#24 https/ssl-requests not working over proxy

Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2017-05-17
Created: 2012-06-22
Creator: Anonymous
Private: No

When trying to crawl an https/SSL URL through a configured proxy (setProxy), the proxy server always responds with "501 Not Implemented".
This is because "CONNECT" requests (instead of "GET" requests) are not implemented yet.
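
For illustration, here is a minimal sketch of the two kinds of request a client sends to a proxy. The function names are hypothetical and not part of PHPCrawl, and example.com is only a placeholder. For plain HTTP the proxy receives a GET with the absolute URL; for HTTPS it first has to receive a CONNECT naming the target host and port, after which the TLS handshake runs through the tunnel.

```php
<?php
// Hypothetical helpers for illustration only; not part of PHPCrawl.

// Plain HTTP through a proxy: the request line carries the absolute URL.
function buildProxiedGetRequest(string $url, string $host): string
{
    return "GET {$url} HTTP/1.1\r\n"
         . "Host: {$host}\r\n"
         . "\r\n";
}

// HTTPS through a proxy: ask the proxy to open a tunnel to host:port first.
function buildConnectRequest(string $host, int $port = 443): string
{
    return "CONNECT {$host}:{$port} HTTP/1.1\r\n"
         . "Host: {$host}:{$port}\r\n"
         . "\r\n";
}

echo buildProxiedGetRequest("http://example.com/index.html", "example.com");
echo buildConnectRequest("example.com");
```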

Discussion

  • Anonymous

    Anonymous - 2015-05-05

    I wrote some code in the file libs/PHPCrawlerHTTPRequest.class.php that worked for me.
    I created a function that builds the HTTP header for the proxy CONNECT request (sent before the SSL handshake):

    protected function buildProxySSLRequestHeader() {
        // Create header
        $headerlines = array();

        // HTTP protocol
        if ($this->http_protocol_version == PHPCrawlerHTTPProtocols::HTTP_1_1) $http_protocol_verison = "1.1";
        else $http_protocol_verison = "1.0";

        // Ask the proxy to open a tunnel to the target host (port 443 is hard-coded)
        $headerlines[] = "CONNECT {$this->url_parts["host"]}:443 HTTP/{$http_protocol_verison}\r\n";
        // The Host header names the tunnel target, not the proxy
        $headerlines[] = "Host: {$this->url_parts["host"]}:443\r\n";
        $headerlines[] = "User-Agent: ".str_replace("\n", "", $this->userAgentString)."\r\n";
        $headerlines[] = "Proxy-Connection: Keep-Alive\r\n";
        $headerlines[] = "Connection: Keep-Alive\r\n";
        $headerlines[] = "\r\n";

        return $headerlines;
    }
    

    Then, in the function sendRequest(), after these lines:

    // If error occured
    if ($PageInfo->error_code != null)
    {
      // If proxy-error -> throw exception
      if ($PageInfo->error_code == PHPCrawlerRequestErrors::ERROR_PROXY_UNREACHABLE)
      {
        throw new Exception("Unable to connect to proxy '".$this->proxy["proxy_host"]."' on port '".$this->proxy["proxy_port"]."'");
      }
    
      $PageInfo->error_occured = true;
      return $PageInfo; 
    }
    

    paste this code:

    // SSL handshake on proxy
    if ($PageInfo->protocol == "https://" && $this->proxy != null) {
        // Send the CONNECT request so the proxy opens a tunnel to the target
        $this->sendRequestHeader($this->buildProxySSLRequestHeader());

        // Read the proxy's answer to the CONNECT request
        $sslResponseHeader = new PHPCrawlerResponseHeader(
            $this->readResponseHeader($PageInfo->error_code, $PageInfo->error_string),
            $this->UrlDescriptor->url_rebuild
        );

        if ($sslResponseHeader->http_status_code != 200) {
            @fclose($this->socket);
            throw new Exception("Proxy CONNECT request did not return a 200 response");
        }

        // Crypto methods to try, in this order
        $modes = array(
            STREAM_CRYPTO_METHOD_TLS_CLIENT,
            STREAM_CRYPTO_METHOD_SSLv3_CLIENT,
            STREAM_CRYPTO_METHOD_SSLv23_CLIENT,
            STREAM_CRYPTO_METHOD_SSLv2_CLIENT
        );

        // Switch the tunnelled socket to SSL/TLS
        $success = false;
        foreach ($modes as $mode) {
            $success = stream_socket_enable_crypto($this->socket, true, $mode);
            if ($success) break;
        }
        if (!$success) {
            @fclose($this->socket);
            throw new Exception("Cannot secure connection");
        }
    }
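
The status check above relies on PHPCrawlerResponseHeader to extract the status code. As a standalone illustration of what that check amounts to, here is a minimal sketch with a hypothetical helper (not part of PHPCrawl) that pulls the status code out of a proxy's raw reply to a CONNECT request. The reason phrase after the code varies between proxies, so only the three-digit code is used.

```php
<?php
// Hypothetical helper for illustration only; not part of PHPCrawl.
function parseConnectStatus(string $responseHeader): ?int
{
    // A status line looks like "HTTP/1.1 200 Connection established".
    if (preg_match('#^HTTP/\d\.\d\s+(\d{3})#', $responseHeader, $matches)) {
        return (int) $matches[1];
    }
    // Not an HTTP status line at all.
    return null;
}

// The tunnel is usable only if the proxy answered with 200.
$status = parseConnectStatus("HTTP/1.1 200 Connection established\r\n\r\n");
echo ($status === 200) ? "tunnel open\n" : "tunnel refused\n";
```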
    
     
  • Anonymous

    Anonymous - 2015-05-05

    I forgot: you also need to change one line in the function buildRequestHeader(), under the comment "A Proxy needs the full qualified URL in the GET or POST headerline.", so that only the path is sent:

        if ($this->proxy != null)
        {
          // A Proxy needs the full qualified URL in the GET or POST headerline.
          $headerlines[] = $request_type." ".$this->url_parts["path"].$this->url_parts["file"].$this->url_parts["query"]." HTTP/{$http_protocol_verison}\r\n";
        }
        else
        {
          $query = $this->prepareHTTPRequestQuery($this->url_parts["path"].$this->url_parts["file"].$this->url_parts["query"]);
          $headerlines[] = $request_type." ".$query." HTTP/".$http_protocol_verison."\r\n";
        }
    
     
  • Anonymous

    Anonymous - 2015-05-06

    Hi!

    What can I say: THANKS a lot for providing your code, great!

    It will be implemented in the next release.

    Thanks and cheers!

     
  • Vinay

    Vinay - 2016-05-22

    I had issues crawling SSL-enabled websites through a proxy. I implemented the code above and it works very well for most SSL-enabled websites. However, for some websites I get the following error:

    PHP Warning: stream_socket_enable_crypto(): SSL/TLS already set-up for this stream in /home/vinay/phpcrawl/libs/PHPCrawlerHTTPRequest.class.php on line 410
    PHP Warning: stream_socket_enable_crypto(): SSL/TLS already set-up for this stream in /home/vinay/phpcrawl/libs/PHPCrawlerHTTPRequest.class.php on line 410
    PHP Warning: stream_socket_enable_crypto(): SSL/TLS already set-up for this stream in /home/vinay/phpcrawl/libs/PHPCrawlerHTTPRequest.class.php on line 410
    Cannot secure connection

    The error can be reproduced with https://moz.com/

    Any ideas on what I can do to fix it?

     
  • Anonymous

    Anonymous - 2017-05-17

    Same problem as Vinay. Does anyone have a solution?
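
One possible explanation, offered as an assumption rather than a confirmed diagnosis: once the first stream_socket_enable_crypto() call fails, the stream keeps its partly initialised TLS state, so the remaining attempts in the loop can only produce the "already set-up" warnings. The first failure itself may come from servers that require SNI, because the socket was opened against the proxy and not against the target host. A minimal sketch under those assumptions, using a hypothetical helper that is not part of PHPCrawl, sets the target host name on the stream and makes a single handshake attempt:

```php
<?php
// Hypothetical helper for illustration only; not part of PHPCrawl.
// $socket is assumed to be a blocking stream that is connected to the proxy
// and on which a CONNECT request has already been answered with 200.
function enableTlsOverTunnel($socket, string $targetHost): bool
{
    // The stream was opened against the proxy, so name the real target
    // for SNI and for certificate verification.
    stream_context_set_option($socket, 'ssl', 'SNI_enabled', true);
    stream_context_set_option($socket, 'ssl', 'peer_name', $targetHost);

    // One attempt only: retrying on the same stream after a failure
    // is assumed not to help (see the warnings quoted above).
    $result = stream_socket_enable_crypto(
        $socket,
        true,
        STREAM_CRYPTO_METHOD_TLS_CLIENT
    );

    return $result === true;
}
```

In sendRequest() this would stand in for the foreach loop over $modes, called as enableTlsOverTunnel($this->socket, $this->url_parts["host"]), with the existing "Cannot secure connection" exception thrown when it returns false. The peer_name option requires PHP 5.6 or later.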

     

