Re: [Htmlparser-user] Parsing malformed HTML whilst still leaving it intact
From: Derrick O. <Der...@Ro...> - 2006-01-23 12:37:23
This has been a requested task for two years now:
http://sourceforge.net/pm/task.php?group_project_id=21601&group_id=24399&func=browse

The virtual tags that the parser adds have a start position equal to their end position (they span no characters of the input), so a smarter toHtml() could recognize them that way and avoid outputting them.

Marc Candle wrote:
>Hi,
>
>I'm parsing snippets of HTML pages one at a time, making some changes and then
>outputting them back to HTML. The problem with HTML snippets is that they will be
>malformed, since some closing tags, for example, will be missing.
>
>The Parser seems to automatically correct the malformed HTML by adding
>closing tags. Is it possible to prevent it from doing so? Or could it at least
>notify me when it does so, so that I can simply delete those tags before
>reconstructing the modified HTML output?
>
>An alternative would be to use the Lexer, but then I lose all the
>hierarchical features of the Parser, which is not an option.
>
>This is similar to the general problem brought up in
>http://sourceforge.net/mailarchive/message.php?msg_id=12635550 .
>
>Kind Regards
>
>Mark
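A minimal sketch of the "smarter toHtml()" idea, assuming only what the reply states: a parser-added tag has its start position equal to its end position. The classes below (`SmartToHtml`, `SketchNode`, `buildExample`) are hypothetical stand-ins, not HTMLParser's own node classes. They model a tiny tree for the snippet `<p>Hello <b>world`, where the parser would have supplied the missing `</b>` and `</p>`, and serialize it while skipping any zero-length node.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these classes are a stand-in, NOT HTMLParser's own
// node classes. They only model the idea from the message: a tag the
// parser inserts itself has startPosition == endPosition.
public class SmartToHtml {

    /** A piece of markup plus the character span it occupied in the input. */
    static class SketchNode {
        final String text;        // markup or text as it would be emitted
        final int startPosition;  // offset of the first character in the input
        final int endPosition;    // offset just past the last character
        final List<SketchNode> children = new ArrayList<>();
        SketchNode endTag;        // matching close tag, possibly virtual

        SketchNode(String text, int startPosition, int endPosition) {
            this.text = text;
            this.startPosition = startPosition;
            this.endPosition = endPosition;
        }

        /** A virtual node covers no characters of the original input. */
        boolean isVirtual() {
            return startPosition == endPosition;
        }
    }

    /** Serialize a node tree, skipping every virtual (parser-added) node. */
    static String render(SketchNode root) {
        StringBuilder out = new StringBuilder();
        render(root, out);
        return out.toString();
    }

    private static void render(SketchNode node, StringBuilder out) {
        if (!node.isVirtual()) {
            out.append(node.text);
        }
        for (SketchNode child : node.children) {
            render(child, out);
        }
        if (node.endTag != null && !node.endTag.isVirtual()) {
            out.append(node.endTag.text);
        }
    }

    /** Tree for the snippet "<p>Hello <b>world", where </b> and </p> are missing. */
    static SketchNode buildExample() {
        SketchNode p = new SketchNode("<p>", 0, 3);
        SketchNode b = new SketchNode("<b>", 9, 12);
        b.children.add(new SketchNode("world", 12, 17));
        b.endTag = new SketchNode("</b>", 17, 17);  // virtual: zero-length span
        p.children.add(new SketchNode("Hello ", 3, 9));
        p.children.add(b);
        p.endTag = new SketchNode("</p>", 17, 17);  // virtual: zero-length span
        return p;
    }

    public static void main(String[] args) {
        System.out.println(render(buildExample()));  // prints <p>Hello <b>world
    }
}
```

A plain serializer over the same tree would emit `<p>Hello <b>world</b></p>`; skipping the zero-length end tags gives back exactly the original snippet while the tree still keeps its hierarchy for editing. Only the position check comes from the message; the recursive walk is just one straightforward way to apply it.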