From: Darron F. <da...@fr...> - 2002-03-03 02:44:47
> I don't mind supporting curl, if you think it's better. It doesn't look like it takes much work
> to support it, and if it's better, that's great. It seems so easy that I wish I didn't have to go
> write a bunch of fsockopen code now to support non-curl people.

It's been really easy to work with so far, and it seems to just work, while the fopen stuff has been somewhat problematic for me and Tim in the past.

> BTW, we're going to want to provide link checking that sends HEAD requests, I assume curl can do
> HEAD requests.

If you're talking about just getting the headers and not receiving any body, then yes. You can set the option:

CURLOPT_NOBODY

We're doing this in a couple of places to get some header information about things. Take a look at the apb_cache_page function in apb_common.php. All the curl options are detailed here:

http://www.php.net/manual/en/function.curl-setopt.php

> When you re-check a cached page, do you check to see if the original sources have
> been modified? Or do you just grab them no matter what? Which is the right way to do it?

Right now the recaching code just:

1. Removes the old cached files.
2. Grabs the current page and recaches it.

We thought about computing an md5 of the returned text and storing it as a checksum, so we could check against it to see whether the page had changed, but we decided against it. I can't remember why. Many pages change whenever their ads change, and a single-character change makes the md5sums not match. It would be almost impossible to detect real changes without almost always saying: "It's a changed page."

We could probably add a quick check at the top of the recache.php page that:

1. Computes an md5 hash of the current text of the page.
2. Checks it against the hash of the page as it was when it was cached.
3. If it's different, goes on with the recaching process.
4. If it's not different, doesn't recache and just ends there.
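A minimal sketch of those four steps, in PHP to match the rest of the code. The function name needsRecache() is hypothetical (it is not in apb_common.php or recache.php), and it takes both texts as plain strings instead of fetching them, so it could be called wherever the cached copy and the freshly fetched copy are both available.

```php
<?php
// Hypothetical helper sketching the proposed pre-check for recache.php.
// Step 1: hash the current text; step 2: compare with a hash of the cached
// text; steps 3-4: the caller recaches only when this returns true.
function needsRecache(string $cachedText, string $currentText): bool
{
    $cachedMd5  = md5($cachedText);   // 32-char hex digest of the cached copy
    $currentMd5 = md5($currentText);  // 32-char hex digest of the live page
    // Any single-byte difference changes the digest, so this also returns
    // true for trivial changes such as a rotated ad or a different newline.
    return $cachedMd5 !== $currentMd5;
}

$cached  = "<html><body>Hello</body></html>";
$current = "<html><body>Hello</body></html>\r\n";

if (needsRecache($cached, $current)) {
    print "<p>Looks like they're different - recaching.</p>\n";
} else {
    print "<p>The pages are the same - no need to recache.</p>\n";
}
```

Because md5 covers every byte, this only answers "identical or not". It cannot tell a rotated ad from a real content change, which is the same limitation described above.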
We could use:

http://www.php.net/manual/en/function.md5.php

I tried this just now and seem to be getting inconsistent results; maybe it's my code. Here's the code I added to recache.php:

    print "<p>Please wait a moment...</p>\n";

    // New stuff starts here.
    // Check to see if it's different.
    $query = "SELECT * from apb_cache WHERE cache_id = '$cache_id'";
    $result = mysql_db_query($APB_SETTINGS['apb_database'], $query);
    $row = mysql_fetch_assoc($result);

    $cache_md5 = md5($row['cache_code']);
    $current_md5 = md5($current_page = getFile($row['cache_url']));

    print "<b>Cache MD5:</b> $cache_md5<br>\n";
    print "<p><b>Cached Page</b></p><pre>" . htmlentities($row['cache_code']) . "</pre>";
    print "<b>Current MD5:</b> $current_md5<br>\n";
    print "<p><b>Current Page</b></p><pre>" . htmlentities($current_page) . "</pre>";

    if ($cache_md5 != $current_md5) {
        // Do all of the recaching here.
        print "<p>Looks like they're different - recaching.</p>";
    } else {
        die("The pages are the same - no need to recache.");
    }
    // New stuff ends here.

    removeCacheData($cache_id);

The md5s are different even on very simple pages that I know haven't changed between cachings: I wrote the pages myself, and I did the recaching about 20 seconds after the first caching. The pages don't appear to be different at a glance, either.

So that idea seems to be out. Maybe we can do something with strcmp or something. Got any other ideas? Is it even worth it?
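One way to find out why the md5s differ is to locate the first byte where the two texts diverge. strcmp() only says whether the strings differ and which one sorts first, not where. The helper below, firstDifference(), is hypothetical and not part of the existing code. The guess that the culprit is something invisible (for example "\r\n" versus "\n", trailing whitespace, or the cache storing a modified copy of the page in cache_code) is an assumption that this kind of check would confirm or rule out.

```php
<?php
// Hypothetical debugging helper: return the byte offset where two strings
// first differ, or -1 if they are identical.
function firstDifference(string $a, string $b): int
{
    $len = min(strlen($a), strlen($b));
    for ($i = 0; $i < $len; $i++) {
        if ($a[$i] !== $b[$i]) {
            return $i;  // first mismatching byte
        }
    }
    // One string is a prefix of the other, or they are identical.
    return strlen($a) === strlen($b) ? -1 : $len;
}

$cached  = "<p>Hello</p>\n";
$current = "<p>Hello</p>\r\n";

$pos = firstDifference($cached, $current);
if ($pos === -1) {
    print "Identical\n";
} else {
    // Show up to 20 bytes around the mismatch, escaping control characters
    // so that "\r", "\n" and similar bytes become visible.
    $show = fn(string $s) => addcslashes(substr($s, max(0, $pos - 10), 20), "\0..\37");
    print "First difference at byte $pos\n";
    print "cached:  " . $show($cached) . "\n";
    print "current: " . $show($current) . "\n";
}
```

Running something like this on the cached and current text from recache.php would show whether the mismatch is a real content change or just formatting.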