I use HtmlParser quite intensively (several thousand URLs per day).
I try to filter out the URLs that I know are not HTML documents (exe, zip, etc.), but sometimes a non-HTML one slips through the cracks and gets sent to HtmlParser.
When such a URL causes HtmlParser to crash (NPE or OutOfMemoryError), I submit a bug report.
I submitted a BR yesterday about a crash in the parsing of a text document, and got this comment from Derrick:
"Next they'll ask to parse .exe files! Sheeesh, I should have rejected this one."
I'm very grateful for the effort the HtmlParser team puts into the product, so the last thing I want is to waste the developers' time.
My reasoning was that reporting these problems can help make the product more stable and robust (sometimes you just don't know what's coming down the pipe).
If you think I should not bother, please let me know and I'll stick to bugs in HTML documents.
I've been noticing your bug reports and I think we are all grateful for them. It's just that we've been seeing too much dirty HTML out there.
When you are a developer on this project, the pressure can be very high. I'm sure Derrick was only joking. Please keep your bug reports coming.
My life has been crazy for a while now, so I haven't been able to get deeply involved. However, we would love to have some help: if you could go one step further, try to find out why the parser is crashing, and point us to the nature of the problem, it would make our task that much easier.
Keep 'em coming.
Fair enough. I spent the weekend further investigating some of the problems that I reported last week and that Derrick could not reproduce.
From now on, I'll do more work up front to make your life easier.