From: Nathanial H. <eq...@ya...> - 2002-03-02 05:46:20

First, are you and Tim on the apb-development mailing list (http://lists.sourceforge.net/lists/listinfo/apb-development)? I don't want to be bombarding you with doubles of each email I send, but I'd like this kind of stuff to be on the list.

I set up the new scheme and the patches you posted at http://labs.retards.org/apb2 but it doesn't work yet, since I don't have PHP compiled with curl support. I did take the first step and install the curl libraries, though.

I'd like to know why you used curl and not the normal PHP functions. I'm not crazy about having dependencies, and I'm less crazy about dependencies that require a recompile of PHP. I also read that curl doesn't like PHP earlier than 4.1.0, which is another thing that makes me uneasy about using it. It seems like we can provide the functionality you've come up with without using something that will make apb harder to install. Maybe I'm wrong; that's why I'm asking. I'm not sure why you chose curl.

Anyway, a quick reply about that would be really useful.

Thanks,
Nathan
http://retards.org/

From: Darron F. <da...@fr...> - 2002-03-02 17:58:07

Nathan,

I'm pretty sure that Tim chose curl for a couple of reasons:

1. We've been using it for a while to do other handy things with PHP and haven't had many problems with it. We've been using it without problems since about PHP 4.0.6. With 4.0.4pl1, curl works OK but there are some small bugs.
2. I'm pretty sure that he tried using the fopen wrappers and couldn't get them to work properly for some reason.

Regardless, it should be pretty easy to use the fopen wrappers if curl isn't available - all that needs to be changed is the getFile function.

There are people on the php-dev list who are talking about bundling curl with PHP, so in the future this may not be an issue:

http://www.zend.com/lists/php-dev/200201/msg03022.html

And because of PHP's large security bugs, everyone's *strongly* encouraged to upgrade to 4.1.2 ASAP anyway - which doesn't give them a working curl automatically, but it's a start.

Your call - we could leave the current code in and, if curl isn't available, just use the regular fopen wrappers to grab the files. That shouldn't be too hard.

I've signed up to the development list. Tim may take a little while longer to sign up - last I heard from him, his wife was going into labor. ;-)

From: Darron F. <da...@fr...> - 2002-03-02 19:42:10

> Your call - we could leave the current code in and if curl isn't available,
> it just uses the regular fopen wrappers to grab the files - that shouldn't
> be too hard.

I just did a quick test and if we do this, we can have our cake and eat it too:

    function getFile($url, $headers = 0) {
        $result = false;
        if (function_exists('curl_init')) {
            // If cURL exists - use these.
            $ch = curl_init();
            curl_setopt($ch, CURLOPT_URL, $url);
            curl_setopt($ch, CURLOPT_HEADER, $headers);
            curl_setopt($ch, CURLOPT_NOBODY, $headers);
            curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
            curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 5.12; Mac_PowerPC)');
            curl_setopt($ch, CURLOPT_REFERER, '');
            $result = curl_exec($ch);
            curl_close($ch);
        } else {
            // If cURL doesn't exist, use the fopen wrappers.
        }
        return $result;
    }

That way we can use cURL, which is:

1. More robust.
2. Faster.
3. Just plain better.

And yet still provide for older installs without cURL.

Thoughts?

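For the empty else branch, a minimal sketch of what the fopen-wrapper fallback might look like. The function name getFileWithFopen is hypothetical and not part of apb, and the sketch assumes allow_url_fopen is enabled so that fopen() accepts http:// URLs. It only fetches the body; it does not cover the headers-only mode that the cURL branch supports.

```php
<?php
// Hypothetical helper, not part of apb: fetch a URL (or local path)
// through PHP's fopen wrappers instead of cURL.
// Assumes allow_url_fopen is enabled for http:// URLs.
function getFileWithFopen($url) {
    // Suppress the warning and report failure through the return value.
    $fp = @fopen($url, 'rb');
    if (!$fp) {
        return false;
    }
    $result = '';
    // Read in chunks until the stream is exhausted.
    while (!feof($fp)) {
        $chunk = fread($fp, 8192);
        if ($chunk === false) {
            break;
        }
        $result .= $chunk;
    }
    fclose($fp);
    return $result;
}
```

In getFile, the else branch could then set $result = getFileWithFopen($url); so callers see the same return value either way.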
From: Nathanial H. <eq...@ya...> - 2002-03-02 23:44:14

--- Darron Froese <da...@fr...> wrote:
> I just did a quick test and if we do this - we can have our cake and eat it
> too:
...
>
> That way we can use cURL which is:
>
> 1. More robust.
> 2. Faster.
> 3. Just plain better.
>
> And yet still provide for older installs without cURL.

Good, that's the direction I was thinking last night after I sent my email.

I don't mind supporting curl, if you think it's better. It doesn't look like it takes much work to support it, and if it's better, that's great. It seems so easy that I wish I didn't have to go write a bunch of fsockopen code now to support non-curl people.

BTW, we're going to want to provide link checking that sends HEAD requests. I assume curl can do HEAD requests.

When you re-check a cached page, do you check to see if the original sources have been modified? Or do you just grab them no matter what? Which is the right way to do it?

I think your database changes are good; in fact, I was very happy with them.

Nathan
http://retards.org/

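On the HEAD-request question, a minimal sketch of how a link check might look with cURL, assuming the cURL extension is loaded. CURLOPT_NOBODY asks for the headers only. Both function names, checkLink and statusFromHeaders, are hypothetical and not part of apb.

```php
<?php
// Hypothetical helper: extract the numeric HTTP status from raw headers.
// When redirects are followed, several header blocks can come back,
// so the last status line is the one that counts.
function statusFromHeaders($rawHeaders) {
    if (preg_match_all('/^HTTP\/\d+(?:\.\d+)?\s+(\d{3})/m', $rawHeaders, $m)) {
        return (int) end($m[1]);
    }
    return 0; // No status line found.
}

// Hypothetical link check: sends a HEAD request and returns the status
// code, or 0 if the request failed. Assumes cURL is available.
function checkLink($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_NOBODY, 1);         // headers only, no body
    curl_setopt($ch, CURLOPT_HEADER, 1);         // include headers in the result
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $headers = curl_exec($ch);
    curl_close($ch);
    return $headers === false ? 0 : statusFromHeaders($headers);
}
```

A caller could treat 200 as a live link and 404 or 0 as a dead one; how to handle other codes is left open here.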
From: Darron F. <da...@fr...> - 2002-03-03 02:44:47

> I don't mind supporting curl, if you think it's better. It doesn't look like it takes much work
> to support it, and if it's better, that's great. It seems so easy that I wish I didn't have to go
> write a bunch of fsockopen code now to support non-curl people.

It's been really easy to work with so far and seems to just work, while the fopen stuff has been somewhat problematic for me and Tim in the past.

> BTW, we're going to want to provide link checking that sends HEAD requests, I assume curl can do
> HEAD requests.

If you're talking about just getting the headers and not receiving any body, then yes. You can set the option:

    CURLOPT_NOBODY

We're doing this in a couple of places to get some header information about things. Take a look in apb_common.php in the apb_cache_page function.

All the curl options are detailed here:

http://www.php.net/manual/en/function.curl-setopt.php

> When you re-check a cached page, do you check to see if the original sources have
> been modified? Or do you just grab them no matter what? Which is the right way to do it?

Right now the recaching code just:

1. Removes the old cached files.
2. Grabs the current page and recaches.

We thought about doing an md5 on the text returned and storing that as a checksum - then we could check against that to see if the page had changed - but decided against it. I can't remember why.

Many pages would change with the changing of ads, and a single character change would make the md5sums not match. It would be almost impossible to detect real changes without almost always saying: "It's a changed page."

We could probably add a quick check at the top of the recache.php page that:

1. Computes an md5 hash of the current text of the page.
2. Checks it against the md5 hash of the page as it was cached.
3. If it's different, it goes on with the recaching process.
4. If it's not different, it doesn't recache and just ends there.

We could use:

http://www.php.net/manual/en/function.md5.php

I tried this just now and seem to be getting inconsistent results - maybe it's my code.

Here's the code I added to recache.php:

    print "<p>Please wait a moment...</p>\n";

    // New stuff starts here.
    // Check to see if it's different.
    $query = "SELECT * from apb_cache WHERE cache_id = '$cache_id'";
    $result = mysql_db_query($APB_SETTINGS['apb_database'], $query);
    $row = mysql_fetch_assoc($result);
    $cache_md5 = md5($row['cache_code']);
    $current_md5 = md5($current_page = getFile($row['cache_url']));

    print "<b>Cache MD5:</b> $cache_md5<br>\n";
    print "<p><b>Cached Page</b></p><pre>" . htmlentities($row['cache_code']) . "</pre>";
    print "<b>Current MD5:</b> $current_md5<br>\n";
    print "<p><b>Current Page</b></p><pre>" . htmlentities($current_page) . "</pre>";

    if ($cache_md5 != $current_md5) {
        // Do all of the recaching here.
        print "<p>Looks like they're different - recaching.</p>";
    } else {
        die("The pages are the same - no need to recache.");
    }
    // New stuff ends here.

    removeCacheData($cache_id);

The md5s are different even on very simple pages that I know haven't changed between cachings, because I wrote the pages myself and because I did the recaching about 20 seconds after the first one. And the pages don't appear to be different at a glance.

So, that idea seems to be out - maybe we can do something with strcmp or something.

Got any other ideas? Is it even worth it?

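Before giving up on the checksum idea, it may help to see exactly where the cached text and the freshly fetched text diverge, since pages that look the same at a glance can still differ in invisible bytes. The sketch below is a debugging aid only. Both function names are hypothetical and not part of apb, and the suspicion that line endings or escaping applied when the page is stored are involved is a guess, not something confirmed here.

```php
<?php
// Hypothetical debugging helper, not part of apb.
// Returns the byte offset where $a and $b first differ, or -1 if identical.
function firstDifference($a, $b) {
    $len = min(strlen($a), strlen($b));
    for ($i = 0; $i < $len; $i++) {
        if ($a[$i] !== $b[$i]) {
            return $i;
        }
    }
    // Same common prefix: identical only if the lengths match too.
    return strlen($a) === strlen($b) ? -1 : $len;
}

// Describes the first difference, with control characters such as
// \r and \n escaped so they show up in the output.
function describeDifference($a, $b, $context = 10) {
    $pos = firstDifference($a, $b);
    if ($pos === -1) {
        return 'identical';
    }
    $start = max(0, $pos - $context);
    return sprintf(
        'offset %d: cached=%s current=%s',
        $pos,
        addcslashes(substr($a, $start, 2 * $context), "\0..\37"),
        addcslashes(substr($b, $start, 2 * $context), "\0..\37")
    );
}
```

In recache.php this could be called as describeDifference($row['cache_code'], $current_page) right after the two md5 values are printed.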