Brandy,
As your example illustrates, <B> is often not closed by a matching </B>,
which causes some grief for the parser. For that reason, as a heuristic,
not all possible tags are registered as CompositeTag nodes, and it is
CompositeTag that gives the 'parent/child' nesting relationship. Tags
that are not registered this way, such as <B> and <I>, never become the
parent of anything. The heading tags were only added as composite tags
in version 1.6, which alters the heuristic a bit but seems to be
acceptable to most people.
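
To make the heuristic concrete, here is a toy model in plain Java. It is
only a sketch and not HTMLParser's actual code: the class NestingSketch,
the method ancestorsOf, and the regex tokenizer are all made up for
illustration. The only idea taken from the library is that a tag becomes
a parent only if its name is in a "composite" set, and everything else
is treated as a leaf.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical toy model of the heuristic; not part of HTMLParser.
final class NestingSketch {
    // A token is either a tag (group 1 is "/" for an end tag, group 2 is
    // the tag name) or a run of text (group 3).
    private static final Pattern TOKEN =
        Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>|([^<]+)");

    // Returns the ancestors of the first text node equal to `text`,
    // innermost first. Only tag names in `composite` can become parents.
    static List<String> ancestorsOf(String html, String text, Set<String> composite) {
        Deque<String> open = new ArrayDeque<>(); // stack of open composite tags
        Matcher m = TOKEN.matcher(html);
        while (m.find()) {
            if (m.group(3) != null) {
                if (m.group(3).trim().equals(text)) {
                    // An ArrayDeque used as a stack iterates from the top down.
                    return new ArrayList<>(open);
                }
                continue;
            }
            String name = m.group(2).toLowerCase(Locale.ROOT);
            if (!composite.contains(name)) {
                continue; // leaf tag: never a parent, so an unclosed <b> is harmless
            }
            if (m.group(1).isEmpty()) {
                open.push(name);
            } else if (open.contains(name)) {
                // Close everything up to and including the matching start tag.
                while (!open.pop().equals(name)) {
                    // keep popping
                }
            }
            // An end tag with no matching open composite tag is ignored.
        }
        return List.of();
    }

    public static void main(String[] args) {
        String html = "<html><title>test.html</title><body><b><h1>content"
                    + "</h1></font></body></html>";
        // <b> not registered as composite, as in the question:
        System.out.println(ancestorsOf(html, "content",
            Set.of("html", "title", "body", "h1")));      // [h1, body, html]
        // <b> registered as composite:
        System.out.println(ancestorsOf(html, "content",
            Set.of("html", "title", "body", "h1", "b"))); // [h1, b, body, html]
    }
}
```

With "b" left out of the set, the result matches what you saw: h1, body,
html. Adding "b" makes it a parent, but then an unclosed <b> stays open
until some enclosing tag closes, which is the kind of grief the
heuristic is trying to avoid.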
Derrick
Brandy Ye wrote:
> Hello, all
>
> I'm new to Htmlparser. I have a question that came up while writing my
> first sample with Htmlparser, something to show the html structure.
>
> When I use "getParent()" to get the parent of a text node, some tags
> such as "<b>" and "<i>" are not treated as its parent node.
>
> The html to be parsed:
>
> <html>
> <title>test.html</title>
> <body>
> <b>
> <h1>
> content
> </h1>
> </font>
> </body>
> </html>
>
> and the parent nodes of "content" are: h1, body, html (but NO b).
>
> Is this the expected behaviour? I found that headingTag (h1,h2...) was
> not treated as a parent node either in Htmlparser1.5.
>
> Thanks in advance!