RE: [Htmlparser-user] RE: [Htmlparser-developer] Final Statistics from Trek Run
From: Claude D. <CD...@ar...> - 2002-07-11 16:17:46
We're not quite done yet... ;-)

Here are some numbers that reflect the differences with the larger files. This set is 57,952 files (6,256,488,243 bytes), many of which are multi-megabyte log file dumps to HTML (average file size for this set is 107,959 bytes). These are especially problematic for the Swing parser:

Time for Swing (in minutes): 10,305 (0.0937279 docs/sec)
Time for HTMLParser 1.1 (in minutes): 294 (3.2943877 docs/sec)
Time for HTMLParser 1.2 (in minutes): 311 (3.1665058 docs/sec)

Note that this run was done on a single box with no other runs in parallel. Also, the number of files varied by about 1,000 between runs, and that variance is reflected in the speed numbers. The paragraph above gives only the average, so you will not get exact results by recalculating from those numbers. Still, everything needs to be looked at in perspective.

Notable here is that the 1.2 version seems to be a tiny bit slower on big files. This is almost certainly due to string reallocation as contiguous content gets larger, which can happen in any application that works heavily with string objects. It might be worth looking at whether this is addressable. Overall, though, HTMLParser 1.2 is clearly an improvement over the most commonly used Java HTML parser today (i.e., Swing) ;-).

-----Original Message-----
From: Somik Raha [mailto:so...@ya...]
Sent: Wednesday, July 10, 2002 5:19 PM
To: htm...@li...; htm...@li...
Subject: Re: [Htmlparser-user] RE: [Htmlparser-developer] Final Statistics from Trek Run

Hi Claude,

Thanks a ton for all these tests. Do you think you could write an article on this that we could put up?

Regards
Somik
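
As a rough cross-check of the docs/sec figures quoted in the message, here is a minimal sketch of the arithmetic, assuming throughput is simply documents divided by elapsed seconds. The class and method names (ThroughputCheck, docsPerSecond, impliedDocs) are made up for illustration and are not part of HTMLParser or of the original test harness.

```java
public class ThroughputCheck {

    // Assumed definition: docs/sec = documents / (minutes * 60).
    static double docsPerSecond(long docs, double minutes) {
        return docs / (minutes * 60.0);
    }

    // Inverse of the above: the document count implied by a reported
    // rate and run time, rounded to the nearest whole document.
    static long impliedDocs(double docsPerSec, double minutes) {
        return Math.round(docsPerSec * minutes * 60.0);
    }

    public static void main(String[] args) {
        // File count and Swing run time as reported in the message.
        System.out.printf("Swing rate: %.7f docs/sec%n",
                docsPerSecond(57_952, 10_305));

        // Back out the file counts implied by the two HTMLParser runs.
        System.out.println("1.1 implied docs: " + impliedDocs(3.2943877, 294));
        System.out.println("1.2 implied docs: " + impliedDocs(3.1665058, 311));
    }
}
```

Under that assumption, the reported file count and Swing run time reproduce the Swing figure, while the HTMLParser rates imply file counts a few hundred to about a thousand higher, which is in line with the run-to-run variance mentioned in the message.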