From: Gilles D. <gr...@sc...> - 2001-12-05 22:23:01
According to Joe R. Jah:
> On Fri, 30 Nov 2001, Gilles Detillieux wrote:
> > I don't think the difference between 99 and 104 seconds is significant.
> > This confirms my suspicion that the HAVE_BROKEN_REGEX doesn't do a
> > whole lot. To be sure, though, I think we'd need timings for 112501 +
> > parsedate.0 + ssl.6, remove reference to regex.o in htlib/Makefile, #undef
> > AND #define HAVE_BROKEN_REGEX (i.e. two tests) in include/htconfig.h
> > (but don't remove htlib/regex.h). I suspect the timings for both will
> > be like the 2nd test above, around 143 sec.
>
> ___________________ 112501 + parsedate.0 + ssl.6 ___________________
> remove reference to regex.o in htlib/Makefile
> #define HAVE_BROKEN_REGEX in include/htconfig.h
>
> htdig: Start digging: Sat Dec 1 00:10:58 PST 2001
> htmerge: Start merging: Sat Dec 1 00:12:44 PST 2001   106
...
> ___________________ 112501 + parsedate.0 + ssl.6 ___________________
> remove reference to regex.o in htlib/Makefile
> #undef HAVE_BROKEN_REGEX in include/htconfig.h
>
> htdig: Start digging: Sat Dec 1 00:18:55 PST 2001
> htmerge: Start merging: Sat Dec 1 00:20:38 PST 2001   103
...

OK, these are all around 100 sec, so I guess the main thing is to make
sure the bundled htlib/regex.c isn't compiled and the resulting regex.o
put into htlib/htlib.a. Removing the reference to regex.o in the Makefile
seems to be the key.

> > I suspect the difference between the 143 and the 99-104 sec is due
> > to the inclusion of the bundled regex.h even though you're using
> > the C library regex.o code. It's a wonder this works at all, but
> > there does seem to be some impact on performance.
>
> I am not sure how that 143 came about last time; I can't reproduce it any
> more;-/

Probably some other system activity, or fewer pages in the disk cache when
you ran that test. Are you getting times closer to 100 sec now? This
would stand to reason.
However, to be on the safe side, I think the code should make sure it
doesn't use the bundled regex.h if it doesn't use the bundled regex.c.
If you mix and match them, there may be problems in some cases we haven't
discovered yet. Geoff said he'd look into what other packages do for
regex support.

> > > ____________________ 092301 + Armstrong + ssl.4 ____________________
> > > htdig: Start digging: Fri Nov 30 00:18:06 PST 2001
> > > htmerge: Start merging: Fri Nov 30 00:18:44 PST 2001   38 seconds
> > ...
> >
> > This is the part I find a bit troubling, but I don't know what we
> > can do about it. I don't know why Armstrong's patch, which uses rx
> > instead of regex, causes htdig to run 2-3 times faster, unless there
> > are other changes between 092301 and 112501 that account for much of
> > this, but it could well be just implementation efficiencies in one
> > library and not in the other.
>
> I reported the difference in indexing time to the list the very first time
> url_rewrite_rules was integrated in the code. I don't believe at that
> time anything else had changed in the code.

Right you are. The Sep 23 snapshot was just before I committed Geoff's
changes for url_rewrite_rules using regex. Since then, very little has
changed that should affect htdig performance. I was thinking back to
when your Armstrong patch benchmarks were on a snapshot from early or
mid-August, and before I had committed a number of parser changes.

> > In your tests above, do you make use of url_rewrite_rules? If so,
> > how do the timings change if you don't use it?
>
> ___________________ 112501 + parsedate.0 + ssl.6 ___________________
> remove reference to regex.o in htlib/Makefile
> #define HAVE_BROKEN_REGEX in include/htconfig.h
> no url_rewrite_rules
>
> htdig: Start digging: Sat Dec 1 00:40:09 PST 2001
> htmerge: Start merging: Sat Dec 1 00:40:34 PST 2001   25 seconds
...
> ___________________ 112501 + parsedate.0 + ssl.6 ___________________
> remove reference to regex.o in htlib/Makefile
> #undef HAVE_BROKEN_REGEX in include/htconfig.h
> no url_rewrite_rules
>
> htdig: Start digging: Sat Dec 1 00:28:50 PST 2001
> htmerge: Start merging: Sat Dec 1 00:29:10 PST 2001   20 seconds
...

OK, I don't think that 5 second difference can be treated as significant
given the variations in timings we've seen for other tests. The only
way to get more significant results would be to run each test several
times and take the mean run time.

It is good to know that the latest code doesn't bog down when you're not
using url_rewrite_rules. That suggests we're not seeing the sort of
weirdness we were seeing in your profiling of 3.2 several months ago,
with the millions of unexplained calls to regcomp.

-- 
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
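
For anyone wanting to follow up on the suggestion above (run each test
several times and take the mean run time), here is a rough sketch of the
bookkeeping. It is only an illustration: the function names are made up
for this example and are not part of htdig, and it assumes each run
prints the same "Start digging" / "Start merging" lines shown in the
timings above. The elapsed time is taken from the HH:MM:SS fields only,
so it assumes a run lasts less than 24 hours.

```python
import re
import statistics

# Matches the HH:MM:SS field in lines such as
# "htdig: Start digging: Sat Dec 1 00:10:58 PST 2001".
TIME_RE = re.compile(r"(\d{2}):(\d{2}):(\d{2})")


def seconds_of_day(line):
    """Return seconds since midnight for the first HH:MM:SS found in line."""
    match = TIME_RE.search(line)
    if match is None:
        raise ValueError(f"no HH:MM:SS timestamp in {line!r}")
    hours, minutes, seconds = (int(part) for part in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def dig_seconds(dig_line, merge_line):
    """Elapsed seconds between the htdig and htmerge start lines.

    Assumption: the run took less than 24 hours. The modulo keeps the
    result correct for a run that crosses midnight.
    """
    return (seconds_of_day(merge_line) - seconds_of_day(dig_line)) % 86400


def summarize(run_times):
    """Return (mean, sample standard deviation) for a list of run times."""
    spread = statistics.stdev(run_times) if len(run_times) > 1 else 0.0
    return statistics.mean(run_times), spread


if __name__ == "__main__":
    # The two 112501 runs with url_rewrite_rules, taken from this thread.
    with_define = dig_seconds(
        "htdig: Start digging: Sat Dec 1 00:10:58 PST 2001",
        "htmerge: Start merging: Sat Dec 1 00:12:44 PST 2001",
    )
    with_undef = dig_seconds(
        "htdig: Start digging: Sat Dec 1 00:18:55 PST 2001",
        "htmerge: Start merging: Sat Dec 1 00:20:38 PST 2001",
    )
    # 99 and 104 are the earlier timings mentioned at the top of the thread.
    mean, spread = summarize([99, 104, with_define, with_undef])
    print(f"mean {mean} sec, sample std dev {spread:.1f} sec")
```

With several runs per configuration, a difference between two means that
is small compared to the spread (like the 5 seconds above) would not be
worth chasing.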