From: Gilles D. <gr...@sc...> - 2001-11-21 19:31:01
According to Joe R. Jah:
> Sorry it took such a long time to respond, but I have been very busy
> lately. It is not easy to prove a negative; however, I have tried a few
> times to make 3.1.6 miss indexing files in stable snapshots of my site
> without success;)
>
> Here is a comparison of the latest 3.1.6 snapshot on a snapshot of my site
> -- 163 HTML-only documents -- with 3.1.6-072901:
>
> _______3.1.6-072901 + Armstrong patch + ssl.4_______
> htdig: Start digging: Sun Nov 11 18:15:43 PST 2001
> htmerge: Start merging: Sun Nov 11 18:16:16 PST 2001 33 seconds
> htmerge: Total word count: 13171
> htmerge: Total documents: 163
> htmerge: Total doc db size (in K): 1888
> -------------------------8<-------------------------
> __________3.1.6-111101 + ssl.5 + FAQ#5.14___________
> htdig: Start digging: Sun Nov 11 18:19:19 PST 2001
> htmerge: Start merging: Sun Nov 11 18:20:58 PST 2001 99 seconds
> htmerge: Total word count: 13171
> htmerge: Total documents: 163
> htmerge: Total doc db size (in K): 1888
> -------------------------8<-------------------------
> CPU: 350 MHz Pentium
> RAM: 384 Megs
> OS: BSDi-4.2
>
> They both index the exact number of documents; this is as conclusive a
> result as I can produce. The only difference is the time they take.
>
> Incidentally, ssl.4 fails to apply to the latest snapshot because of the
> recent changes to Connection.cc. I have modified the patch to apply
> cleanly to the latest snapshot of 3.1.6:
>
> ftp://ftp.ccsf.org/htdig-patches/3.1.6/ssl.5

Thanks for all your efforts, Joe. I can't exactly boast about my response
times lately either. You're right that it's not easy to prove a negative,
but I'm satisfied that what we were seeing before is most likely due to
uncontrolled variables, rather than parser bugs. It's very strange that
the latest 3.1.6 snapshot is 3 times slower, but that could be entirely
due to regex being much less efficient than rx on BSD.
Funny thing is, the rx code that came with older versions of htdig was
much, much slower than the GNU regex code. Maybe BSD's regex is based on
the old rx code we had been using, while their rx code uses algorithms
similar to GNU regex. It would be nice if the library developers got
their act together and came up with one solid, standard, efficient
implementation of both, so that it wouldn't matter which API your code
used.

-- 
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
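
For anyone who wants to double-check the "3 times slower" figure, here is a
minimal sketch that recomputes the elapsed times from the timestamps in
Joe's log. The names RUNS and dig_seconds are made up for illustration and
are not part of htdig. It assumes each run starts and ends on the same day,
which is true of the log above, and it only covers the span from "Start
digging" to "Start merging", not the merge itself.

```python
from datetime import datetime

# Wall-clock times copied from the quoted htdig/htmerge log lines.
# Assumption: start and end of each run fall on the same day, so only
# the time of day is needed.
RUNS = {
    "3.1.6-072901": ("18:15:43", "18:16:16"),
    "3.1.6-111101": ("18:19:19", "18:20:58"),
}

def dig_seconds(start: str, end: str) -> int:
    """Seconds between 'Start digging' and 'Start merging'."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())

elapsed = {name: dig_seconds(*stamps) for name, stamps in RUNS.items()}
slowdown = elapsed["3.1.6-111101"] / elapsed["3.1.6-072901"]

for name, secs in elapsed.items():
    print(f"{name}: {secs} s")
print(f"slowdown: {slowdown:.1f}x")
```

The results agree with the 33 and 99 second figures in the log, so the
ratio is 3.0. This says nothing about where the extra time goes; the
regex-versus-rx explanation above remains a guess until the two builds are
profiled on the same machine.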