I've encountered a problem with the HTTP request fetcher when using SSL sockets.
The site that led me to this problem is:
https://www.alexianbrothershealth.org/
The problem is that the site accepts the connection but then never responds. The timeout does not seem to work for SSL sockets under these conditions, at least not in PHP 5.4.4.
Because I had another issue in the past (#56) with weird quirks of PHP's socket implementation (probably a related issue), I decided to port the PHPCrawlerHTTPRequest module to use the PHP cURL extension instead.
I'm not sure if you or anyone else is interested, but it is a drop-in replacement for the existing PHPCrawl/libs/PHPCrawlerHTTPRequest.class.php.
It does add one additional error code in PHPCrawl/libs/Enums/PHPCrawlerRequestErrors.class.php:
/**
 * Error-Code: CURL not supported (probably curl extension not installed)
 */
const ERROR_CURL_NOT_SUPPORTED = 7;
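For anyone wondering how such an error code might be triggered, here is a minimal sketch of an availability check. The function name check_curl_support() is hypothetical and not part of PHPCrawl, and the top-level constant is only a stand-in for the class constant above.

```php
<?php
// Stand-in for PHPCrawlerRequestErrors::ERROR_CURL_NOT_SUPPORTED (assumption:
// a plain constant is used here only to keep the sketch self-contained).
const ERROR_CURL_NOT_SUPPORTED = 7;

/**
 * Hypothetical helper: returns null when the cURL extension is usable,
 * otherwise the "cURL not supported" error code.
 */
function check_curl_support()
{
    if (!extension_loaded('curl') || !function_exists('curl_init')) {
        return ERROR_CURL_NOT_SUPPORTED;
    }
    return null;
}
```

A request class could call something like this once before the first request and report the error code instead of failing with an undefined-function error.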
In the attachment you'll find my version, based on the PHPCrawlerHTTPRequest in PHPCrawl 0.82.
Don't get me wrong, I'm very impressed by the PHPCrawl software and the implementation of the HTTP protocol using sockets in PHP, but I needed something that works fast and reliably without having to hack the PHP source code to work around yet another quirk.
In my initial tests the implementation works fine, but there are bound to be some bugs left after such a big change in the code, so if I run into anything in the next couple of days I'll post an update here if you're interested. Feel free to do whatever you want with the code I upload.
Anonymous
Hi MadEgg!
Thank you very much (again) for your report together with a solution!
The idea of optionally swapping the underlying "module" for HTTP requests (for example to cURL) is very nice; maybe there are other modules/ways that could come in handy for some people.
Meanwhile I'll try to find a fix for the non-working timeout on (some) SSL requests, but I have to run some tests first (PHP version, OS and so on).
Again, thanks! Your reports are probably the best because you always come with a working solution, great!
Best regards and a nice weekend,
Uwe.
Just a note:
Confirmed, the read/stream timeout (socket_set_timeout) doesn't have any effect on SSL connections.
This seems to be a bug that re-appeared in PHP 5.4/5.5 after it was fixed in version 4.4.
As far as I can see there is no possibility to create a workaround without changing to cURL (or similar).
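To illustrate what changing to cURL means for the timeout handling: cURL enforces its own connect and total-transfer timeouts, so it should not depend on PHP's stream timeout. The following is only a rough sketch, not the attached adapter; the function names and default timeout values are made up for illustration.

```php
<?php
/**
 * True if the given cURL error number is libcurl's
 * CURLE_OPERATION_TIMEDOUT (numeric value 28).
 */
function is_timeout_error($errno)
{
    return $errno === 28;
}

/**
 * Hypothetical helper: fetches a URL with a connect timeout and an overall
 * transfer timeout (both in seconds) enforced by cURL itself.
 */
function fetch_with_timeouts($url, $connectTimeout = 5, $totalTimeout = 10)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,            // return the body instead of printing it
        CURLOPT_CONNECTTIMEOUT => $connectTimeout, // limit for establishing the connection
        CURLOPT_TIMEOUT        => $totalTimeout,   // limit for the whole request
    ));

    $body  = curl_exec($ch);   // false on failure, including timeouts
    $errno = curl_errno($ch);

    return array(
        'body'      => $body,
        'errno'     => $errno,
        'timed_out' => is_timeout_error($errno),
        'error'     => curl_error($ch),
    );
}
```

With a server that accepts the connection and then stays silent, a call like fetch_with_timeouts('https://www.alexianbrothershealth.org/') would be expected to return after roughly the total timeout with 'timed_out' set, instead of hanging.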
I didn't realize that this is in fact a re-introduced bug. I hadn't noticed it for a while, so it may very well have reappeared after my recent upgrade to PHP 5.4.
Additional info about my cURL adapter: it tries to provide the same information as well as possible (headers, timing info, data transfer size) and tries to mimic the socket version's behavior as closely as possible with regard to the data transfer limit and the timeouts, but the results may differ slightly from what the socket version would produce. Also, it no longer uses the internal IP address resolver, as this would cause problems when using hostnames; cURL internally maintains an IP cache, so the resolver seems somewhat superfluous when using cURL anyway.
I've been running the cURL adapter at high volume for the past 2 weeks and haven't encountered any problems with it so far, so while there are probably still bugs left, it should be reasonably safe to use.
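As a rough idea of how a data transfer limit, header capture and timing info can be handled with cURL, here is a simplified sketch. It is not the code from the attachment; make_limited_writer() and fetch_limited() are hypothetical names, and the returned array layout is an assumption.

```php
<?php
/**
 * Returns a CURLOPT_WRITEFUNCTION callback that appends each chunk to
 * $buffer and aborts the transfer once more than $limit bytes arrived.
 */
function make_limited_writer(&$buffer, $limit)
{
    $buffer = '';
    return function ($ch, $chunk) use (&$buffer, $limit) {
        $buffer .= $chunk;
        if (strlen($buffer) > $limit) {
            // Any return value other than strlen($chunk) makes cURL abort.
            return -1;
        }
        return strlen($chunk);
    };
}

/**
 * Hypothetical helper: fetches $url, collects header lines, stops reading
 * after $limit bytes and reports timing/size info from curl_getinfo().
 */
function fetch_limited($url, $limit, $timeout = 10)
{
    $body    = '';
    $headers = array();

    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_TIMEOUT        => $timeout,
        CURLOPT_WRITEFUNCTION  => make_limited_writer($body, $limit),
        CURLOPT_HEADERFUNCTION => function ($ch, $line) use (&$headers) {
            $headers[] = rtrim($line);  // one raw header line per call
            return strlen($line);
        },
    ));
    curl_exec($ch);

    return array(
        'headers'        => $headers,
        'body'           => $body,
        'http_code'      => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'total_time'     => curl_getinfo($ch, CURLINFO_TOTAL_TIME),
        'bytes_received' => curl_getinfo($ch, CURLINFO_SIZE_DOWNLOAD),
    );
}
```

Note that in this sketch the buffer may end up slightly larger than the limit, because the chunk that crosses the limit is still appended before the transfer is aborted. That is one example of where results can differ a bit from a socket-based implementation.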
Made some fixes to the class that are needed when there are HTTP redirects (some class variables needed to be re-initialized).