From: Joe R. J. <jj...@cl...> - 2004-04-21 23:16:42
On Wed, 21 Apr 2004, Lachlan Andrew wrote:

> Date: Wed, 21 Apr 2004 23:13:27 +1000
> From: Lachlan Andrew <lh...@us...>
> To: Gilles Detillieux <gr...@sc...>, Christopher Murtagh <chr...@mc...>
> Cc: htd...@li...
> Subject: [htdig-dev] Re: Performance issue with exclude_urls
>
> Greetings Gilles + all,
>
> Yes, I agree that we need a more "polished" patch for the
> distribution. I still like my intermediate path: If *any* server
> blocks or URL blocks are used, then the user takes the performance
> hit and re-parses each time. If *no* server/URL blocks are used, we
> use Chris's patch. This should be just as fast as Chris's patch (in
> the "3.1-compatibly mode" without server/URL blocks), and just as
> flexible as the current status (if blocks are used). If that can get
> ht://Dig fast enough to get into sarge, then I suggest we implement
> it first, and then work on Gilles's more complete solution at more
> leisure.

I applied Chris' patch and ran htdig on the same site as before for
profile; htdig ran ~40% faster than last time;) Here is the profile:

    ftp://ftp.ccsf.org/htdig-patches/3.2.0b5/htdig.gmon.exclude_perform.gz

> A first hack at this (not even compile-tested) is attached, patched
> relative to Chris's patched version, so you can see what I mean. If
> people are in favour, I'll try to work on it over the weekend.

The "slightly-better.0" patch applies, but it does not compile:

    Retriever.cc: In method `int Retriever::IsValidURL(const String &)':
    Retriever.cc:998: `config_server_URL_blocks' undeclared (first use this function)
    Retriever.cc:998: (Each undeclared identifier is reported only once
    Retriever.cc:998: for each function it appears in.)
    gmake[1]: *** [Retriever.o] Error 1

Regards,

Joe
--
_/ _/_/_/ _/ ____________ __o
_/ _/ _/ _/ ______________ _-\<,_
_/ _/ _/_/_/ _/ _/ ......(_)/ (_)
_/_/ oe _/ _/. _/_/ ah jj...@cl...

> One issue with caching input strings is that we would have to have
> some sort of cache-flushing, or just let the storage grow as HtRegEx
> is called repeatedly.
>
> Cheers,
> Lachlan
>
> On Wed, 21 Apr 2004 07:45 am, Gilles Detillieux wrote:
> > Hi, Chris and other developers. The problem with this fix is that
> > exclude_urls and bad_querystr can no longer be used in server
> > blocks or URL blocks, as they'll only be parsed once regardless of
> > how they're used.
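
For readers following the thread, the "intermediate path" Lachlan describes could be sketched roughly as below. This is only an illustration, not the attached patch and not ht://Dig code: the class `ExcludeMatcher`, the flag `blocks_in_use`, and the use of `std::regex` in place of ht://Dig's own pattern handling are all assumptions made for the example. The flag stands in for whatever test tells the retriever that server or URL blocks appear in the configuration.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical stand-in for the exclude_urls check (not ht://Dig's real API).
class ExcludeMatcher {
public:
    // blocks_in_use: assumed to be true when the configuration contains
    // any server blocks or URL blocks.
    explicit ExcludeMatcher(bool blocks_in_use)
        : blocks_in_use_(blocks_in_use) {}

    // pattern is the exclude pattern that applies to this particular URL.
    bool IsExcluded(const std::string& url, const std::string& pattern) {
        if (blocks_in_use_) {
            // Flexible path: the pattern may differ per server or URL,
            // so take the performance hit and re-parse on every call.
            ++compile_count_;
            return std::regex_search(url, std::regex(pattern));
        }
        // Fast path: no blocks, so the pattern is global; parse it once.
        if (!cached_valid_) {
            cached_ = std::regex(pattern);
            cached_valid_ = true;
            ++compile_count_;
        }
        return std::regex_search(url, cached_);
    }

    // Number of times a pattern has been compiled, for comparison only.
    std::size_t compile_count() const { return compile_count_; }

private:
    bool blocks_in_use_;
    bool cached_valid_ = false;
    std::regex cached_;
    std::size_t compile_count_ = 0;
};
```

With `ExcludeMatcher m(false)`, repeated calls to `m.IsExcluded(url, pattern)` compile the pattern a single time. With `ExcludeMatcher m(true)` it is recompiled on every call, which is the trade-off described in the quoted proposal. Because the fast path keeps only one compiled pattern, it also sidesteps the cache-growth concern raised about caching input strings.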