I'm trying to filter out URLs with GET parameters (?) so that the same page is not crawled more than once. For example, after crawling
http://www.hrblock.com/tax-answers/services/jsp/article.jsp?article_id=66943
I don't want to crawl any other URL that starts with
http://www.hrblock.com/tax-answers/services/jsp/article.jsp?
Right now I'm adding a filter rule while handling the pages, like so:
The rules are being added, but the crawler still crawls URLs that match the same pattern. I suspect these are pages that were already collected (queued) before I added the filter.
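To illustrate what I think is happening, here is a minimal Python sketch. It is not the crawler's actual code or API: the `Frontier` class, the `recheck_on_pop` flag, and the second article id (66944) are all made up for illustration. It assumes the filter is only checked when a URL is queued, so anything queued before the rule existed still gets crawled unless the filter is checked again when the URL is taken off the queue.

```python
from collections import deque


class Frontier:
    """Toy URL queue. Hypothetical, not the real crawler's API."""

    def __init__(self, recheck_on_pop=False):
        self.queue = deque()
        self.blocked_prefixes = []
        self.recheck_on_pop = recheck_on_pop

    def is_blocked(self, url):
        # A URL is blocked if it starts with any registered prefix.
        return any(url.startswith(p) for p in self.blocked_prefixes)

    def add(self, url):
        # The filter is applied at queue time only.
        if not self.is_blocked(url):
            self.queue.append(url)

    def pop(self):
        while self.queue:
            url = self.queue.popleft()
            # Optional second check for rules added after queuing.
            if self.recheck_on_pop and self.is_blocked(url):
                continue
            return url
        return None


def crawl_order(recheck_on_pop):
    base = "http://www.hrblock.com/tax-answers/services/jsp/article.jsp?"
    frontier = Frontier(recheck_on_pop)
    # Both URLs are queued before any filter rule exists.
    frontier.add(base + "article_id=66943")
    frontier.add(base + "article_id=66944")  # made-up second id
    visited = []
    while (url := frontier.pop()) is not None:
        visited.append(url)
        # The rule is added while handling the first page.
        if base not in frontier.blocked_prefixes:
            frontier.blocked_prefixes.append(base)
    return visited


print(len(crawl_order(recheck_on_pop=False)))  # both queued URLs are visited
print(len(crawl_order(recheck_on_pop=True)))   # only the first one is visited
```

In this toy version, checking the filter again at dequeue time is enough to skip the already-queued URLs. I don't know whether the real crawler offers a hook for that.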
Is there any workaround for this?
Thanks!