Re: [Htmlparser-user] Testing/feedback, question
From: Somik R. <so...@ya...> - 2002-06-25 01:32:08
Dear Claude,

>We have (my company) processed about 11 million HTML documents
>successfully (with the Swing parser), some of which we'll see tested
>again with the HTMLParser code in the next few weeks.

Great - this will be a great service to this project and its community. Thank you very much.

>To date, we have only run a few simple tests with the HTMLParser code
>but it appears now that the library is writing to standard err. I would
>expect all errors to result in parser-specific exceptions that the
>calling application would be free to handle as it may see fit.

Hmm.. although I agree with this, I have a question - what do you see being written to standard err? My understanding is that, when the parser crashes, it usually throws an exception all the way up - so if you wrap your parsing block (the for loop) in a try-catch and catch a plain Exception, you would be able to handle it.

>Some of the data we are processing is not publicly available. The errors
>we have seen are issues with very large HTML files that were generated
>from log files. These are surprisingly common but offer a special
>challenge to HTML parsers in that they tend to contain large strings of
>log file information between <pre></pre> tags.

Sounds interesting. Even if we can't get the data that you tested with, we could simulate an equivalent test case...

>We'll probably be running about 1 or 2 million files through the parser
>this week. I will try to report problems and get set up to build the
>library so that I can offer more specific class/line-based
>feedback/fixes.

Cool. Looking forward to it.

Cheers,
Somik
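
P.S. For reference, a minimal sketch of the try-catch idea mentioned above, which also records whatever gets written to standard err while the loop runs. It does not use the HTMLParser API: NodeLoop, Outcome and runGuarded are hypothetical names made up for illustration, and the "parser" in main is a simulated stand-in. Only System.setErr and the stream classes are standard Java.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.List;

public class ParseLoopSketch {

    /** Hypothetical stand-in for the application's parsing block (the for loop). */
    interface NodeLoop {
        void run(List<String> sink) throws Exception;
    }

    /** What one guarded run produced. */
    static final class Outcome {
        final List<String> nodes = new ArrayList<>();
        String stderrText = "";
        Exception failure; // stays null when the loop finishes normally
    }

    /** Runs the loop inside try-catch while temporarily capturing System.err. */
    static Outcome runGuarded(NodeLoop loop) {
        Outcome outcome = new Outcome();
        PrintStream originalErr = System.err;
        ByteArrayOutputStream captured = new ByteArrayOutputStream();
        System.setErr(new PrintStream(captured, true));
        try {
            loop.run(outcome.nodes);
        } catch (Exception e) {
            // The calling application decides what to do with the failure.
            outcome.failure = e;
        } finally {
            // Always restore the real stream, even if the loop threw.
            System.setErr(originalErr);
            outcome.stderrText = captured.toString();
        }
        return outcome;
    }

    public static void main(String[] args) {
        // Simulated parser: emits one node, complains on stderr, then crashes.
        Outcome outcome = runGuarded(sink -> {
            sink.add("<pre>");
            System.err.println("warning: very long text node");
            throw new IllegalStateException("simulated parser crash");
        });
        System.out.println("nodes seen: " + outcome.nodes);
        System.out.println("stderr: " + outcome.stderrText.trim());
        System.out.println("caught: " + outcome.failure.getMessage());
    }
}
```

Capturing the stream this way is only a diagnostic aid for finding out what is being printed; it swaps a process-wide stream, so it is probably not something to leave enabled in a multi-threaded run over millions of files.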