#22 HTTPS not working since PHP supports SNI in TLS

closed
nobody
None
5
2012-10-14
2012-06-09
ilexius
No

SNI has been supported since PHP 5.3.2. With SNI, the client sends the host name to the server during the TLS handshake, before the HTTP headers reach the server (see http://en.wikipedia.org/wiki/Server_Name_Indication ).
If the host name in SNI does not match the one in the HTTP Host header, Apache answers with "400: Bad request" and logs "[error] Hostname xxx.xxx.xxx.xxx provided via SNI and hostname www.example.com provided via HTTP are different".
This is caused by the DNS cache used in PHPCrawl: we pass the cached IP address to fsockopen instead of the host name, so the IP address ends up as the SNI name.
IMHO there are two ways to fix this:
1. Disable DNS caching and use the host name in every request (slow). I tested this, and it works.
2. Set the host name in the request context:
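$context = stream_context_create(array(
    // Name announced via SNI; it has to match the HTTP Host header
    'ssl' => array('SNI_server_name' => 'www.example.com'),
));
// ssl:// instead of tcp://, so that the TLS handshake (and with it SNI) actually takes place
$fp = stream_socket_client("ssl://xxx.xxx.xxx.xxx:443", $errno, $errstr, 30, STREAM_CLIENT_CONNECT, $context);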
(UNTESTED)
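
A slightly more general sketch of solution 2, in case it helps. The helper name buildSniConnection and the address 192.0.2.10 are made up for illustration and are not part of PHPCrawl. It assumes the crawler already has the resolved IP from its DNS cache, and it picks 'peer_name' on PHP 5.6 and later, where 'SNI_server_name' is deprecated. The actual connect call is commented out, so the snippet runs without network access.

```php
<?php
// Hypothetical helper (not part of PHPCrawl): prepares the arguments for
// stream_socket_client() so that the connection goes to the cached IP
// while the TLS handshake still announces the real host name via SNI.
function buildSniConnection($hostName, $cachedIp, $port = 443)
{
    // PHP 5.6+ takes the SNI name from 'peer_name';
    // PHP 5.3.2 to 5.5 use 'SNI_server_name'.
    $sslOptions = (PHP_VERSION_ID >= 50600)
        ? array('peer_name' => $hostName)
        : array('SNI_server_name' => $hostName);

    return array(
        'remote'  => 'ssl://' . $cachedIp . ':' . $port,
        'context' => stream_context_create(array('ssl' => $sslOptions)),
    );
}

$conn = buildSniConnection('www.example.com', '192.0.2.10');

// Opening the socket would then look like this:
// $fp = stream_socket_client($conn['remote'], $errno, $errstr, 30,
//                            STREAM_CLIENT_CONNECT, $conn['context']);
```

The Host header of the HTTP request still has to carry the same host name, otherwise Apache reports the same mismatch.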

And keep up the good work!

Cheers,
ilexius

Discussion

  • Uwe Hunfeld
    2012-06-11

    Hi ilexius,

    thanks a lot for the report.
    Do you have an example URL where this SNI error occurs (for a quick check)?

    Thanks!

     
  • Hi, I can't post the address here. Can I send it via email? You can contact me at info AT ilexius DOT de.

     
  • Hi ilexius,

    Thanks a lot. This helped me a lot. I was breaking my head for almost a month with this issue. I was thinking of writing a crawler on my own. You saved me a lot of time.

    ------------

    To the developer:

    Thanks a lot for this great crawler.

    This bug is not caused by your code, but by a bug in PHP itself. Ref: https://bugs.php.net/bug.php?id=54511

    I fixed it by applying the fix recommended by ilexius in PHPCrawlerHTTPRequest, function openSocket. It works.

    By the way, I have yet to test it behind a proxy.

    With Regards,
    Madharasan
    madharasan.com

     
  • Uwe Hunfeld
    2012-06-22

    Thanks a lot to both of you!

    The bug is confirmed and the "stream_context_create" fix posted by ilexius seems to work perfectly! I will implement it in the next version.

    @madharasan: I don't think it's really necessary to use "stream_socket_client" instead of "fsockopen" for proxy requests, since it is the proxy server that has to send the SNI request to the target web server, not the crawler. So you can probably leave that section untouched.

    And again, thanks a lot for reporting this bug and for the solution on top!

    Best regards!

     

