#22 HTTPS not working since PHP supports SNI in TLS


PHP has supported SNI since version 5.3.2. With SNI, the client sends the host name to the server during the TLS handshake, before any HTTP header reaches the server (see http://en.wikipedia.org/wiki/Server_Name_Indication ).
If the host name sent via SNI does not match the Host header of the HTTP request, Apache answers with "400: Bad request" and logs "[error] Hostname xxx.xxx.xxx.xxx provided via SNI and hostname www.example.com provided via HTTP are different".
This is caused by the DNS cache used in PHPCrawl: we pass the cached IP address instead of the host name to fsockopen, so the IP address ends up as the SNI name.
IMHO there are two ways to fix this:
1. Disable DNS caching and use the host name in every request (slow). I tested this, it works.
2. Keep connecting to the cached IP address, but set the host name explicitly in the stream context:
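// Announce the real host name via SNI, even though we connect to the cached IP.
$context = stream_context_create(array(
    'ssl' => array('SNI_server_name' => 'www.example.com'),
));
// The ssl:// transport is needed so that the 'ssl' context options are applied.
$fp = stream_socket_client("ssl://xxx.xxx.xxx.xxx:443", $errno, $errstr, 30, STREAM_CLIENT_CONNECT, $context);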
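
To make solution 2 a bit more concrete, here is a minimal sketch of how the cached IP address and the original host name could be combined. The function name buildSniConnection and the address 192.0.2.10 are made up for illustration and are not part of PHPCrawl. The sketch also sets peer_name, because the PHP manual marks SNI_server_name as deprecated in favour of peer_name as of PHP 5.6.0; setting both is an assumption meant to cover old and new PHP versions.

```php
<?php
// Hypothetical helper (not part of PHPCrawl): returns the connection target
// and the stream context for an HTTPS request to an already resolved IP.
function buildSniConnection($hostname, $cachedIp, $port = 443)
{
    $context = stream_context_create(array(
        'ssl' => array(
            // Name announced in the TLS handshake; it has to match the
            // Host header that is sent later in the HTTP request.
            'SNI_server_name' => $hostname,
            // Assumption: also set peer_name, which the PHP manual names as
            // the replacement for SNI_server_name since PHP 5.6.0.
            'peer_name' => $hostname,
        ),
    ));

    // Connect to the cached IP, so no second DNS lookup is needed.
    return array("ssl://{$cachedIp}:{$port}", $context);
}

// 192.0.2.10 is a documentation-only address used as a stand-in for the DNS cache entry.
list($address, $context) = buildSniConnection('www.example.com', '192.0.2.10');

// The socket would then be opened like this (left commented out, since the
// address above does not point to a real server):
// $fp = stream_socket_client($address, $errno, $errstr, 30, STREAM_CLIENT_CONNECT, $context);
```

This way the DNS cache keeps its speed advantage, and only the TLS handshake learns about the real host name.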

And keep up the good work!



  • Uwe Hunfeld

    Hi ilexius,

    Thanks a lot for the report.
    Do you have an example URL where this SNI error occurs (for a quick check)?


  • Hi, I can't post the address here. Can I send it via email? You can contact me at info AT ilexius DOT de.

  • Hi ilexius,

    Thanks a lot. This helped me a lot. I had been racking my brain over this issue for almost a month and was already thinking of writing a crawler of my own. You saved me a lot of time.


    To the developer:

    Thanks a lot for this great crawler.

    This bug is not caused by your code, but by a bug in PHP itself. Ref: https://bugs.php.net/bug.php?id=54511

    I fixed it by applying the change recommended by ilexius in the function openSocket of PHPCrawlerHTTPRequest. It works.

    By the way, I still have to test it behind a proxy.

    With Regards,

  • Uwe Hunfeld

    Thanks a lot to both of you!

    The bug is confirmed and the "stream_context_create" fix posted by ilexius seems to work perfectly! I will implement it in the next version.

    @madharasan: I don't think it's really necessary to use "stream_socket_client" instead of "fsockopen" for proxy requests, since the proxy server, not the crawler, has to send the SNI request to the target web server. So you can probably leave that section untouched.

    And again, thanks a lot for reporting this bug and for providing the solution on top!

    Best regards!


