Thread: [Htmlparser-user] Testing/feedback, question

I've just started using HTMLParser and hope that this open source
library will give us better throughput and reliability than the Swing
HTML parser. I also hope to offer bug fixes and enhancements back to
the community. My company has successfully processed about 11 million
HTML documents with the Swing parser, and we will rerun some of them
through HTMLParser in the next few weeks.

To date, we have run only a few simple tests with the HTMLParser code,
but it already appears that the library writes to standard error. I
would expect all errors to result in parser-specific exceptions that
the calling application is free to handle as it sees fit.
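
Until the library reports errors through exceptions, a caller could at
least detect this output itself. The following is only a sketch using
standard JDK calls. StderrGuard, runStrict and StderrOutputException
are hypothetical names made up for illustration; none of them is part
of HTMLParser, and the Runnable stands in for whatever parser call the
application makes.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

/** Hypothetical helper, not part of HTMLParser. */
class StderrGuard {

    /** Hypothetical stand-in for a parser-specific exception. */
    static class StderrOutputException extends RuntimeException {
        StderrOutputException(String captured) {
            super("library wrote to standard error: " + captured.trim());
        }
    }

    /** Runs the task with System.err redirected and returns whatever was written. */
    static String capture(Runnable task) {
        PrintStream original = System.err;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream replacement = new PrintStream(buffer, true);
        System.setErr(replacement);
        try {
            task.run();
        } finally {
            // Always restore the real stream, even if the task throws.
            System.setErr(original);
            replacement.flush();
        }
        return buffer.toString();
    }

    /** Runs the task and turns any standard error output into an exception. */
    static void runStrict(Runnable task) {
        String captured = capture(task);
        if (!captured.isEmpty()) {
            throw new StderrOutputException(captured);
        }
    }

    public static void main(String[] args) {
        // The lambda simulates a library call that prints a warning.
        String captured = capture(() -> System.err.println("simulated library warning"));
        System.out.println("captured: " + captured.trim());
    }
}
```

Note that System.setErr is process-wide, so this approach only works
reliably when one thread at a time is parsing. It is a stopgap, not a
substitute for proper exceptions.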

Some of the data we are processing is not publicly available. The errors
we have seen involve very large HTML files that were generated from log
files. These are surprisingly common but pose a special challenge to
HTML parsers in that they tend to contain long runs of log file
information between <pre></pre> tags.
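
Since the real files cannot be shared, a synthetic file of the same
shape may help anyone who wants to reproduce the problem. The sketch
below uses only standard JDK calls. LargePreFixture is a hypothetical
name, and the log line format is invented purely as filler; it is not
taken from our data.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical test-data generator, not part of HTMLParser. */
class LargePreFixture {

    /** Writes an HTML file whose body is one <pre> block holding the given number of lines. */
    static void write(Path target, int lines) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            out.write("<html><body><pre>\n");
            for (int i = 0; i < lines; i++) {
                // Invented filler that only imitates the look of a log entry.
                out.write("entry " + i + " INFO synthetic log line for parser testing\n");
            }
            out.write("</pre></body></html>\n");
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("large-pre", ".html");
        write(target, 1_000_000);
        System.out.println(target + " (" + Files.size(target) + " bytes)");
    }
}
```

Raising the line count gives a file of whatever size is needed to
trigger the behaviour.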

We'll probably be running 1 to 2 million files through the parser this
week. I will try to report problems as they come up, and I plan to get
set up to build the library so that I can offer more specific feedback
and fixes, down to the class and line.

Thanks.
