Re: [Htmlparser-user] why some font tags are not treated as parent tag of the related text node?
From: Derrick O. <Der...@Ro...> - 2006-04-22 11:15:40
Brandy,

As your example illustrates, <B> is often not closed by a </B>, which causes some grief for the parser. For this heuristic reason, not all possible tags are registered as CompositeTag nodes, and that registration is what gives the 'parent/child' nesting relationship. The heading tags were added only recently, in version 1.6, which alters the heuristic a bit but seems to be acceptable to most people.

Derrick

Brandy Ye wrote:
> Hello, all
>
> I'm a newbie of Htmlparser. I have a question when I wrote my
> first sample using Htmlparser, something to show the html structures.
>
> When I use "getParent()" to get the parent of a text node, some tags
> such as "<b>" and "<i>" are not treated as its parent node.
>
> The html to be parsed:
>
> <html>
> <title>test.html</title>
> <body>
> <b>
> <h1>
> content
> </h1>
> </font>
> </body>
> </html>
>
> and the parent nodes of "content" are: h1, body, html (but NO b).
>
> Is it the expected behaviour? I found headingTag (h1,h2...) was not
> treated as parent node too in Htmlparser1.5.
>
> Thanks in advance!
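
The effect described in the reply can be illustrated with a small, self-contained toy model. This is a sketch only, not Htmlparser's actual code or API: the class CompositeSketch, the method parentsOf, and the regular expression are all hypothetical names made up for the illustration. The assumption is simply that a tag name registered as "composite" is allowed to hold children, while any other tag is treated as a flat node and so never shows up as a parent.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model (hypothetical, not Htmlparser itself): only tag names in the
// "composites" set may become parents of the nodes that follow them.
class CompositeSketch {
    // Group 1: "/" for a close tag; group 2: tag name; group 3: a run of text.
    private static final Pattern TOKEN =
            Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>|([^<]+)");

    /** Ancestors of the first text node equal to {@code text}, innermost first. */
    static List<String> parentsOf(String html, String text, Set<String> composites) {
        Deque<String> open = new ArrayDeque<>(); // stack of open composite tags
        Matcher m = TOKEN.matcher(html);
        while (m.find()) {
            if (m.group(3) != null) {            // text node
                if (m.group(3).trim().equals(text)) {
                    return new ArrayList<>(open); // head of the deque is innermost
                }
                continue;
            }
            String name = m.group(2).toLowerCase(Locale.ROOT);
            if (!composites.contains(name)) {
                continue;                        // flat tag: never a parent
            }
            if (m.group(1).isEmpty()) {
                open.push(name);                 // open tag starts a new nesting level
            } else if (open.contains(name)) {
                // Close tag: pop up to and including the matching open tag.
                while (!open.pop().equals(name)) { }
            }                                    // stray close tags are ignored
        }
        return List.of();
    }

    public static void main(String[] args) {
        String html = "<html><title>test.html</title><body><b><h1>content</h1>"
                + "</font></body></html>";
        // <b> not registered as composite: prints [h1, body, html]
        System.out.println(parentsOf(html, "content",
                Set.of("html", "title", "body", "h1")));
        // <b> registered as composite: prints [h1, b, body, html]
        System.out.println(parentsOf(html, "content",
                Set.of("html", "title", "body", "h1", "b")));
    }
}
```

With "b" left out of the set, the sketch reports the same parents as in the question (h1, body, html). Adding "b" makes it a parent, but an unclosed <b> would then stay open until an enclosing tag closes, which is the kind of trade-off the heuristic is weighing.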