It would be great to store crawled links in a database, so threads can check whether a URL has already been visited in the current crawl. It would also make it easier to resume halted crawls.
Hi!
I don't understand this request, could you please explain it in a little more detail?
phpcrawl ALWAYS checks whether a URL was already visited, regardless of the cache type and the number of threads/processes used.
And there already is the possibility to use a SQLite database as the link cache, see here: http://phpcrawl.cuab.de/classreferences/PHPCrawler/method_detail_tpl_method_setUrlCacheType.htm
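For example, a minimal sketch using setUrlCacheType() as documented in the class reference linked above (the include path and start URL are just assumptions about your setup):

<?php
// Assumed location of the phpcrawl library in your project.
include("libs/PHPCrawler.class.php");

class MyCrawler extends PHPCrawler
{
    // Called once for every document the crawler processes.
    function handleDocumentInfo($DocInfo)
    {
        echo $DocInfo->url . "\n";
    }
}

$crawler = new MyCrawler();
$crawler->setURL("http://www.example.com/"); // hypothetical start URL

// Keep the URL cache in a SQLite database instead of memory, so
// already-visited URLs are tracked on disk during the crawl.
$crawler->setUrlCacheType(PHPCrawlerUrlCacheTypes::URLCACHE_SQLITE);

$crawler->go();
?>

With URLCACHE_SQLITE the list of found/visited URLs lives in a SQLite file rather than in RAM, which is also the recommended cache type for large crawls.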