It would be great to store crawled links in a database, so threads can check whether a URL has already been visited in the current crawl. It would also make it easier to resume halted crawls.
Hi!
I don't understand this request, could you please explain it in a little more detail?
phpcrawl ALWAYS checks whether a URL was already visited, regardless of the cache type and the number of threads/processes used.
And there already is the possibility to use a SQLite database as the link cache, see here: http://phpcrawl.cuab.de/classreferences/PHPCrawler/method_detail_tpl_method_setUrlCacheType.htm
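For example, a minimal sketch using setUrlCacheType() as documented in the class reference linked above (the include path and start URL are just assumptions about your setup):

<?php
// Assumed location of the phpcrawl library in your project.
include("libs/PHPCrawler.class.php");

class MyCrawler extends PHPCrawler
{
    // Called once for every document the crawler processes.
    function handleDocumentInfo($DocInfo)
    {
        echo $DocInfo->url . "\n";
    }
}

$crawler = new MyCrawler();
$crawler->setURL("http://www.example.com/"); // hypothetical start URL

// Keep the URL cache in a SQLite database instead of memory, so
// already-visited URLs are tracked on disk during the crawl.
$crawler->setUrlCacheType(PHPCrawlerUrlCacheTypes::URLCACHE_SQLITE);

$crawler->go();
?>

With URLCACHE_SQLITE the list of found/visited URLs lives in a SQLite file rather than in RAM, which is also the recommended cache type for large crawls.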