[Htmlparser-developer] Final Statistics from Trek Run
From: Claude D. <CD...@ar...> - 2002-07-10 15:58:21
The latest version of the HTMLParser (20020707) appears to perform well compared with both the Swing parser and previous HTMLParser versions. These tests were done in context, using our application, which converts HTML documents (among other formats) into a normalized form and transmits the result as XML to a server over TCP/IP. We have subtracted the transmission time from these numbers, but some small imprecision is likely, given the preprocessing and file I/O done up front. Given the size of the test (more than half a million documents), these effects should be negligible. Note that this set includes a large number of small documents. We know from earlier tests that the Swing parser slows down dramatically as documents get larger, while the HTMLParser does not.

Total documents processed: 642,077
Average document size: 4,043

Average number of documents per second:

Swing Parser (Java 1.3.1): 2.797185195
HTMLParser 1.1 Production Version: 2.558727723
HTMLParser 1.2 Early integration build: 2.585632061
HTMLParser 1.2 (build 20020707): 3.224910367

Conclusions: HTMLParser 1.2 is now about 15% faster than the Swing parser on Swing's home turf (Swing does best with smaller HTML files). With larger files, we have seen speedups as high as 35 times the speed of the Swing parser.
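The "about 15%" figure can be checked directly against the documents-per-second numbers above. The Java sketch below computes each build's throughput relative to the Swing baseline. The class and method names are hypothetical, chosen for illustration only. They are not part of HTMLParser or of the test harness used for these runs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: compares reported throughput figures against the Swing baseline.
public class ParserSpeedup {

    // Percentage by which 'candidate' throughput exceeds 'baseline'
    // (negative when the candidate is slower).
    static double percentFaster(double candidate, double baseline) {
        return (candidate / baseline - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // Documents per second, copied from the results in this post.
        double swing = 2.797185195;
        Map<String, Double> results = new LinkedHashMap<>();
        results.put("HTMLParser 1.1 Production Version", 2.558727723);
        results.put("HTMLParser 1.2 Early integration build", 2.585632061);
        results.put("HTMLParser 1.2 (build 20020707)", 3.224910367);

        // Print each build's speed relative to Swing, in insertion order.
        for (Map.Entry<String, Double> e : results.entrySet()) {
            System.out.printf("%-40s %+6.1f%%%n",
                    e.getKey(), percentFaster(e.getValue(), swing));
        }
    }
}
```

Running it shows both earlier HTMLParser builds somewhat slower than Swing on this document set, with build 20020707 coming out at roughly +15%.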