From: Manuel L. <ml...@ac...> - 2004-11-04 02:56:29
Hello,

I tried the general list, but it seems nobody there could help. Let's see if anybody can help here.

I have been using htdig for years to crawl a site that now has over 10,000 pages. Since many of the pages may change, I have been reindexing the whole site once a day. However, this lazy approach is consuming too many resources. I am therefore looking into a better approach: keeping a list of only the pages that have changed and reindexing just those pages on a much shorter cycle than the current one.

My question is: how can I reindex just a few pages at once and merge the newly crawled pages into the database of a previously indexed site? That is, I want to index only the few pages that I list, and follow links only to site pages that have not yet been indexed.

-- 
Regards,
Manuel Lemos

PHP Classes - Free ready to use OOP components written in PHP
http://www.phpclasses.org/

PHP Reviews - Reviews of PHP books and other products
http://www.phpclasses.org/reviews/

Metastorage - Data object relational mapping layer generator
http://www.meta-language.net/metastorage.html