[Htmlparser-user] scanning / parsing bug?
From: Subramanya S. <sa...@cs...> - 2007-12-11 22:02:42
For this URL, http://www.washingtonpost.com/wp-dyn/content/article/2007/12/10/AR2007121001600.html (and possibly other Washington Post URLs), I wonder if HTML Parser is running into a bug. The HTML source for this page has the following block of HTML in the middle:

    <!---------------- End New Comments Box ------------------>
    <div class="sidebarhack"><b></b></div>
    ....
    ....
    </div>
    <!-- sphereit end -->
    <br clear="all">

The parser ignores all content from the start of the 'End New Comments Box' line up to 'sphereit end'. I suspect this is because there is no space before the '-->' that closes the comment on the first line, so that '-->' is the tail of a longer run of dashes. I tested this by manually adding a space at that point, and sure enough, the block of HTML in the middle was then recognized correctly.

Is there a workaround for this? I am also willing to download the source code and incorporate a fix, if necessary.

Thanks,
Subbu.
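To make the suspected cause concrete, here is a minimal Java sketch that models two possible comment-termination rules on a shortened version of the snippet in the message. The lax, browser-like rule ends a comment at the first `-->`. The "strict" rule is an assumption inferred only from the symptoms described: a `-->` that immediately follows another dash (as in `------------------>`) does not close the comment. This is not HTML Parser's actual lexer code, and the class `RemarkSketch` and its methods are hypothetical names. The sketch also shows the workaround of inserting a space before such a `-->` before the page is parsed.

```java
import java.util.regex.Pattern;

// Illustrative sketch only: a simplified model of two comment-termination
// rules. This is NOT HTML Parser's lexer; the "strict" rule is an assumption.
public class RemarkSketch {

    // Lax, browser-like rule: a comment opened by "<!--" at index start ends at
    // the first "-->". Returns the index just past the '>', or -1 if unterminated.
    public static int laxEnd(String html, int start) {
        int i = html.indexOf("-->", start + 4);
        return i < 0 ? -1 : i + 3;
    }

    // Assumed strict rule: a "-->" preceded by a further '-' inside the comment
    // body (e.g. the tail of "------>") does not close the comment.
    public static int strictEnd(String html, int start) {
        int bodyStart = start + 4;
        int from = bodyStart;
        while (true) {
            int i = html.indexOf("-->", from);
            if (i < 0) {
                return -1; // unterminated comment
            }
            if (i == bodyStart || html.charAt(i - 1) != '-') {
                return i + 3;
            }
            from = i + 1; // skip this "-->" and keep scanning
        }
    }

    // Hypothetical workaround mirroring the manual fix: put a space before a
    // "-->" that ends a longer run of dashes, so "------>" becomes "---- -->".
    // The lookbehind rejects runs starting right after "<!" or mid-run, so
    // openers such as "<!---->" are left alone.
    private static final Pattern LONG_CLOSE = Pattern.compile("(?<![-!])(-+)(-->)");

    public static String addSpaceBeforeClose(String html) {
        return LONG_CLOSE.matcher(html).replaceAll("$1 $2");
    }

    public static void main(String[] args) {
        // Shortened version of the Washington Post snippet.
        String html = "<!---------------- End New Comments Box ------------------>\n"
                + "<div class=\"sidebarhack\"><b></b></div>\n"
                + "<!-- sphereit end -->\n"
                + "<br clear=\"all\">\n";

        // Lax rule: the comment is only the first line.
        System.out.println("lax:\n" + html.substring(0, laxEnd(html, 0)));
        // Assumed strict rule: the comment swallows the <div> up to "sphereit end -->".
        System.out.println("strict:\n" + html.substring(0, strictEnd(html, 0)));

        // After the workaround, the strict rule also stops at the first line.
        String fixed = addSpaceBeforeClose(html);
        System.out.println("strict, after workaround:\n" + fixed.substring(0, strictEnd(fixed, 0)));
    }
}
```

The cleaned string could then be handed to the parser in place of the raw page source. Whether the library itself offers a switch between strict and lax comment handling is worth checking in its Lexer documentation before relying on a regex rewrite like this one.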