htmlparser-developer Mailing List for HTML Parser (Page 30)


The latest version of the HTMLParser (20020707) appears to outperform
both the Swing parser and previous HTMLParser versions.
These tests were done in context (using our application, which converts
HTML documents, among others, into a normalized form and transmits the
result as XML to a server over TCP/IP). We have subtracted the
transmission time from these numbers, but a small amount of imprecision
is probable given preprocessing and file I/O that gets done up front.
Given the size of the tests (more than a half million documents), these
elements should be negligible. Note that this set includes a large number
of small documents and we know from earlier tests that the Swing parser
slows down dramatically as documents get larger, while the HTMLParser
does not.

Total Documents processed: 642,077
Average Document Size: 4,043

Average Number of Documents Per Second for:

Swing Parser (Java 1.3.1): 2.797185195
HTMLParser 1.1 Production Version: 2.558727723
HTMLParser 1.2 Early integration build: 2.585632061
HTMLParser 1.2 (build 20020707): 3.224910367

Conclusions: The HTMLParser 1.2 is now about 15% faster than the Swing
parser on Swing's home turf (Swing does best with smaller HTML files).
With larger files, we have seen improvements as high as 35 times the
speed of the Swing parser.
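For anyone who wants to check the arithmetic behind the "about 15%" figure, here is a minimal sketch that recomputes each parser's throughput relative to the Swing parser from the documents-per-second numbers reported above. The class name ThroughputComparison and the method percentFaster are hypothetical names chosen for illustration; they are not part of HTMLParser or of the test application.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of HTMLParser.
class ThroughputComparison {

    // Baseline: Swing parser (Java 1.3.1), documents per second as reported.
    static final double SWING_DOCS_PER_SEC = 2.797185195;

    // Percentage by which 'candidate' throughput exceeds 'baseline'.
    // A negative result means the candidate is slower than the baseline.
    static double percentFaster(double candidate, double baseline) {
        return (candidate / baseline - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // Figures copied from the results in this message.
        Map<String, Double> results = new LinkedHashMap<>();
        results.put("HTMLParser 1.1 Production Version", 2.558727723);
        results.put("HTMLParser 1.2 Early integration build", 2.585632061);
        results.put("HTMLParser 1.2 (build 20020707)", 3.224910367);

        for (Map.Entry<String, Double> e : results.entrySet()) {
            double pct = percentFaster(e.getValue(), SWING_DOCS_PER_SEC);
            System.out.printf("%-40s %+6.1f%% vs. Swing%n", e.getKey(), pct);
        }
    }
}
```

By this calculation the 20020707 build comes out roughly 15.3% ahead of Swing, while the two earlier HTMLParser builds come out behind it, which matches the conclusion stated above.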



htmlparser-developer: The developer mailing list of the htmlparser project