From: Neal R. <ne...@ri...> - 2003-02-14 22:51:47
On Fri, 14 Feb 2003, Geoff Hutchison wrote:

> On Fri, 14 Feb 2003, Neal Richter wrote:
>
> > What if we had a feature that stripped the querystrs from a URL
> > contained in "bad_querystr" rather than rejecting them?
>
> url_rewrite_rules ?

Ouch. That was a big frying pan that went all the way through my dunce cap.

I guess I need to break down and read all 199 config vars. ;-)

I'm currently working on an extension to libhtdig that will allow validation/testing of a given URL against these config vars:

  limit_urls_to
  limit_normalized
  exclude_urls
  max_hop_count
  restrict

I'll add code to send the test URL through the rewriter.

It basically duplicates Retriever::IsURLValid without having to instantiate a Retriever object, and it also tests the 'restrict' var.

It will allow building a CGI or PHP page for light configuration of the spidering component of HtDig and for testing URLs interactively. This was inspired by a commercial search engine filter page.

http://ai.rightnow.com/htdig/testurl_snapshot.png

If I am missing any config vars, tell me!

Thanks.

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
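To make the idea of a standalone URL tester concrete, here is a rough sketch in Python. It is not the libhtdig code and does not reproduce Retriever::IsURLValid. The function names (rewrite_url, check_url), the dictionary-style config, and the plain substring matching are all assumptions made for illustration; limit_normalized and restrict are left out to keep it short. It runs the test URL through the rewrite rules first, then checks it against the remaining vars and reports why a URL was rejected.

```python
import re
from urllib.parse import urlsplit


def rewrite_url(url, rewrite_rules):
    """Apply (regex, replacement) pairs in order.

    Assumption: every rule is applied in sequence. The real
    url_rewrite_rules handling in ht://Dig may differ.
    """
    for pattern, replacement in rewrite_rules:
        url = re.sub(pattern, replacement, url)
    return url


def check_url(url, config, hop_count=0):
    """Return (accepted, reason) for a test URL.

    Hypothetical helper: `config` is a plain dict keyed by the
    config var names, and all patterns are treated as substrings.
    """
    url = rewrite_url(url, config.get("url_rewrite_rules", []))

    # limit_urls_to: the URL must contain at least one of these strings.
    limits = config.get("limit_urls_to", [])
    if limits and not any(s in url for s in limits):
        return False, "not covered by limit_urls_to"

    # exclude_urls: reject if the URL contains any of these strings.
    for s in config.get("exclude_urls", []):
        if s in url:
            return False, "matches exclude_urls: " + s

    # bad_querystr: reject if the query string contains any of these.
    query = urlsplit(url).query
    for s in config.get("bad_querystr", []):
        if s in query:
            return False, "matches bad_querystr: " + s

    # max_hop_count: reject URLs found too many links from the start URL.
    if hop_count > config.get("max_hop_count", float("inf")):
        return False, "exceeds max_hop_count"

    return True, "ok"


if __name__ == "__main__":
    # Example values only; they are not taken from any real configuration.
    example_config = {
        "url_rewrite_rules": [(r"[?&]sid=[^&]*", "")],
        "limit_urls_to": ["http://example.com/"],
        "exclude_urls": ["/cgi-bin/"],
        "bad_querystr": ["print=1"],
        "max_hop_count": 3,
    }
    for test in ("http://example.com/docs/a.html?sid=123",
                 "http://example.com/cgi-bin/search"):
        print(test, check_url(test, example_config))
```

A CGI or PHP front end would only need to collect the test URL, call something like check_url, and display the returned reason.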