#507 htmlparse::2tree gets the HTML Doc structure wrong

open
htmlparse (25)
5
2004-08-13
2004-08-13
Anonymous
No

The following shows how to reproduce the bug.

$ cat wrong.tcl
package require htmlparse
set indent 0
set t [struct::tree]
set html {<html><head></head><body><h1>heading</h1><p>ayaken!</p></body></html>}
proc painter {tree act node} {
    global indent
    if {$act == "enter"} then {
        incr indent
        puts ">[string repeat - [expr {$indent-1}]][$tree get $node type]"
    } else {
        puts "<[string repeat - [expr {$indent-1}]][$tree get $node type]"
        incr indent -1
    }
}

htmlparse::2tree $html $t
$t walk root -order both -type dfs {act node} {painter $t $act $node}

$ tclsh wrong.tcl
>root
>-hmstart
>--html
>---head
<---head
>---body
>----h1
>-----PCDATA
<-----PCDATA
>-----p
>------PCDATA
<------PCDATA
<-----p
<----h1
<---body
<--html
<-hmstart
<root

As the output shows, the p element is misplaced in the
hierarchy: it ends up as a child of the h1 element rather
than as its sibling, as it should be.
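
The misplacement can also be checked programmatically instead of by reading the indented dump. The following is only a sketch: parentTypesOf is a hypothetical helper (not part of tcllib), and it assumes that every node in the tree produced by htmlparse::2tree carries a "type" attribute, as the dump above suggests.

```tcl
package require htmlparse
package require struct::tree

# Hypothetical helper, not part of tcllib: collect the "type" attribute of
# the parent of every node whose own "type" equals $wanted.
# Assumes the parent of each matching node also has a "type" attribute.
proc parentTypesOf {tree wanted} {
    set result {}
    $tree walk root -order pre -type dfs {act node} {
        if {[$tree keyexists $node type] && [$tree get $node type] eq $wanted} {
            lappend result [$tree get [$tree parent $node] type]
        }
    }
    return $result
}

set html {<html><head></head><body><h1>heading</h1><p>ayaken!</p></body></html>}
set t [struct::tree]
htmlparse::2tree $html $t

# Print the type of the node that the p element was attached to.
puts [parentTypesOf $t p]
```

Going by the dump above, this would print h1 on the affected version, whereas body is the parent one would expect for a correctly built tree.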

I assume the fault lies in the htmlparse internals rather
than in 2tree alone, but that is only a guess.

If there's need to contact me, I'm reachable at
Ephaeton (at) gmx (dot) net.

Discussion