Hi all,
I want to build a domain collection, and I want to use this "perfect" script to collect URLs from the web. I want to filter the URLs (with a regexp) before they are written to the SQLite database.
For example, I want to crawl only one country domain (or several predefined ones). Another example: I want to crawl only URLs that do not contain a particular word or character string. (I know I will get inaccurate results, but that is good enough for me.)
Could somebody please tell me where I should put my few lines of regexp code to filter which URLs get crawled?
Thank you
Roland
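The kind of filter described above could look roughly like the sketch below. It is not taken from the script itself, whose internals are not shown in this thread: the names `ALLOWED_TLDS`, `BLOCKED`, `url_allowed`, and `store_urls`, the `urls` table, and the example TLDs and blocked words are all assumptions made for illustration. The idea is simply to call one small check right before the point where a URL is written to SQLite (and, if wanted, before it is queued for crawling).

```python
import re
import sqlite3
from urllib.parse import urlsplit

# Hypothetical settings: country TLDs to keep and substrings to reject.
ALLOWED_TLDS = re.compile(r"\.(hu|de|at)$", re.IGNORECASE)
BLOCKED = re.compile(r"(forum|sessionid=)", re.IGNORECASE)


def url_allowed(url):
    """Return True if the host ends in an allowed TLD and the URL has no blocked text."""
    # hostname drops any port; it is None for URLs without a scheme.
    host = urlsplit(url).hostname or ""
    if not ALLOWED_TLDS.search(host):
        return False
    return BLOCKED.search(url) is None


def store_urls(conn, urls):
    """Write only the URLs that pass the filter; return the ones that were kept."""
    # Assumed table layout, for the sketch only.
    conn.execute("CREATE TABLE IF NOT EXISTS urls (url TEXT PRIMARY KEY)")
    kept = [u for u in urls if url_allowed(u)]
    conn.executemany(
        "INSERT OR IGNORE INTO urls (url) VALUES (?)", [(u,) for u in kept]
    )
    conn.commit()
    return kept
```

Matching the TLD against the parsed host name rather than the whole URL avoids false hits such as `.hu` appearing in a path. URLs without a scheme (for example `example.hu/page`) have no parsed host and are rejected by this sketch, so they would need to be normalized first.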
Huhu