Anonymous - 2015-12-06

I'm trying to filter out URLs with GET parameters ("?") so that the same base URL isn't crawled more than once - for example, after
http://www.hrblock.com/tax-answers/services/jsp/article.jsp?article_id=66943
I don't want to crawl any other URL that starts with
http://www.hrblock.com/tax-answers/services/jsp/article.jsp?
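
If I knew the offending pattern up front, I could just register a filter rule before the crawl starts - something like the sketch below ($crawler being my PHPCrawler instance). But a static rule like this would block even the first article.jsp URL, which isn't what I want:

// Sketch only: a static filter rule registered before calling go().
// This blocks EVERY article.jsp URL that carries a query string.
$crawler->addURLFilterRule('#^http://www\.hrblock\.com/tax-answers/services/jsp/article\.jsp\?# i');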

So right now I'm adding a filter rule dynamically while handling each page, like so:

function handleDocumentInfo(PHPCrawlerDocumentInfo $DocInfo) {
    $lb = "\n"; // line break (use "<br />" instead when viewing in a browser)

    // Only act on URLs that actually carry a query string
    if (strpos($DocInfo->url, "?") !== false) {
        // Take everything before the "?" and escape it for a #-delimited regex
        $noParams = preg_quote(substr($DocInfo->url, 0, strpos($DocInfo->url, "?")), "#");
        echo $noParams . $lb;

        // Block any further URLs with the same base plus a query string
        if (!$this->addURLFilterRule("#^" . $noParams . "\?.*# i")) {
            echo "not added" . $lb;
        }
    }
}

And while the rules are indeed being added (the echo output confirms it), the crawler still crawls URLs matching the same pattern - I suspect those are pages that were collected into the queue before I added the filter.
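
I guess I could deduplicate at processing time instead - something like the sketch below, where $seenBases is a property I'd add to my crawler subclass. But that only skips my own handling; the crawler would still fetch every one of those pages:

// Sketch only: skip the handler for URLs whose base (the part before "?")
// has already been processed. The pages still get downloaded, though.
private $seenBases = array();

function handleDocumentInfo(PHPCrawlerDocumentInfo $DocInfo) {
    $qpos = strpos($DocInfo->url, "?");
    if ($qpos !== false) {
        $base = substr($DocInfo->url, 0, $qpos);
        if (isset($this->seenBases[$base])) return; // already saw this base, skip
        $this->seenBases[$base] = true;
    }
    // ... normal page handling here ...
}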
Is there a proper workaround for this - some way to keep those URLs from being fetched at all?
Thanks!