Re: [Htmlparser-user] Parsing malformed HTML whilst still leaving it intact


This has been a requested task for two years now:
http://sourceforge.net/pm/task.php?group_project_id=21601&group_id=24399&func=browse

The virtual tags that the parser adds have a start position equal to their
end position, so a smarter toHtml() could recognize them that way and
avoid outputting them.
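
A rough sketch of that idea, for illustration only. It does not call the HTMLParser API: SketchTag, its fields, and toHtmlSkippingVirtual() are hypothetical stand-ins, and the offsets are filled in by hand rather than by a parser. The only thing it shows is the check itself: an end tag whose start and end offsets are equal is treated as virtual and left out of the output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a parsed tag. NOT an HTMLParser class.
class SketchTag {
    final String name;       // tag name, e.g. "b"
    final String openText;   // opening tag as it appeared in the source
    final int endTagStart;   // source offset where the end tag starts
    final int endTagEnd;     // source offset where the end tag ends
    final List<Object> children = new ArrayList<>(); // String text or SketchTag

    SketchTag(String name, String openText, int endTagStart, int endTagEnd) {
        this.name = name;
        this.openText = openText;
        this.endTagStart = endTagStart;
        this.endTagEnd = endTagEnd;
    }

    // Assumption from the message above: a virtual (parser-inserted) end tag
    // occupies no characters in the source, so start equals end.
    boolean hasVirtualEndTag() {
        return endTagStart == endTagEnd;
    }

    // Serializes the subtree, writing an end tag only if it was really present.
    String toHtmlSkippingVirtual() {
        StringBuilder sb = new StringBuilder(openText);
        for (Object child : children) {
            if (child instanceof SketchTag) {
                sb.append(((SketchTag) child).toHtmlSkippingVirtual());
            } else {
                sb.append(child);
            }
        }
        if (!hasVirtualEndTag()) {
            sb.append("</").append(name).append(">");
        }
        return sb.toString();
    }
}

class VirtualTagSketch {
    public static void main(String[] args) {
        // Snippet "<p>Hello <b>world" (17 characters): both end tags are missing,
        // so both are modelled as zero-length tags at offset 17.
        SketchTag b = new SketchTag("b", "<b>", 17, 17);
        b.children.add("world");
        SketchTag p = new SketchTag("p", "<p>", 17, 17);
        p.children.add("Hello ");
        p.children.add(b);
        System.out.println(p.toHtmlSkippingVirtual());
    }
}
```

With the real library the same test would compare the positions reported for each tag's end tag, assuming they behave as described above.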

Marc Candle wrote:

>Hi,
> 
>I'm parsing snippets of HTML pages at a time, making some changes and then
>outputting back to HTML. The problem with HTML snippets is that they will be
>malformed since some closing tags, for example, will be missing. 
> 
>The Parser seems to automatically correct the malformed HTML by adding
>closing tags. Is it possible to prevent it from doing so? Or could it at
>least notify me when it does so, so that I can simply delete the added tags
>before reconstructing the modified HTML output?
> 
>An alternative would be to use the Lexer, but then I lose all the
>hierarchical features of the Parser, which is not an option.
> 
>This is similar to the general problem brought up in
>http://sourceforge.net/mailarchive/message.php?msg_id=12635550 .
> 
>Kind Regards
> 
>Mark