
#288 unwanted PCDATA space trim

open
htmlparse (25)
5
2003-04-04
2003-04-04
No

### test code ###

package require cmdline 1.1
package require htmlparse 0.3
# struct::tree is used directly below, so load it explicitly
package require struct

set html {<html><head><title>Title</title>
<body><b>BB</b> RR <b>BB</b> <i>II</i></body></html>}

catch {t destroy}
struct::tree t
htmlparse::2tree $html t
t walk node1 -order both -command {puts [list %n %a [t getall %n]]}

### result ###
...
node7 enter {type b data {}}
node8 enter {type PCDATA data BB}
node8 leave {type PCDATA data BB}
node7 leave {type b data {}}
node9 enter {type PCDATA data RR}
node9 leave {type PCDATA data RR}
node10 enter {type b data {}}
node11 enter {type PCDATA data BB}
node11 leave {type PCDATA data BB}
node10 leave {type b data {}}
node12 enter {type i data {}}
node13 enter {type PCDATA data II}
node13 leave {type PCDATA data II}
node12 leave {type i data {}}
...

### expected result ###

...
node9 enter {type PCDATA data { RR }}
...
node12 enter {type PCDATA data { }}
...

### discussion ###

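Because the text following each tag is passed through
"string trim $textBehindTheTag", significant PCDATA spaces
are destroyed: " RR " is reduced to "RR", and the single
space between </b> and <i> disappears entirely, so no
PCDATA node is created for it. The code should distinguish
contexts where PCDATA is expected (e.g. inside body, b or i)
from those where it is not (e.g. between html and head, where
whitespace is only source formatting). This requires some
form of simplified DTD.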
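A minimal sketch of the "simplified DTD" idea, in pure Tcl and without calling htmlparse: whitespace is kept verbatim when the enclosing element allows mixed text and inline markup, and trimmed everywhere else. The names keepText, collectText and mixedContent are hypothetical, the element list is an assumption rather than a real DTD, and the event list is written by hand to roughly mirror the tag, slash and textBehindTheTag arguments that htmlparse passes to its callback (param omitted). It assumes that text after a tag belongs to the innermost element still open once that tag is processed.

```tcl
# Hypothetical simplified DTD: elements whose content may mix text and
# inline markup, so whitespace inside them is significant.
# This list is an illustrative assumption, not a complete HTML DTD.
set mixedContent {title body p b i em strong a span li td}

# Return the PCDATA to keep for text whose enclosing element is $parent:
# verbatim where PCDATA is expected, trimmed everywhere else.
proc keepText {parent text} {
    global mixedContent
    if {[lsearch -exact $mixedContent [string tolower $parent]] >= 0} {
        return $text
    }
    return [string trim $text]
}

# Walk (tag, slash, text) triples, tracking open elements on a stack,
# and collect the non-empty PCDATA that the policy above keeps.
proc collectText {events} {
    set stack {}
    set result {}
    foreach {tag slash text} $events {
        if {$slash eq "/"} {
            # Pop up to and including the most recent matching open tag;
            # unmatched close tags are ignored.
            set idx [lsearch -exact -all $stack $tag]
            if {[llength $idx]} {
                set stack [lrange $stack 0 [expr {[lindex $idx end] - 1}]]
            }
        } else {
            lappend stack $tag
        }
        set kept [keepText [lindex $stack end] $text]
        if {$kept ne ""} {
            lappend result $kept
        }
    }
    return $result
}

# Hand-written events for the test document above (note: no </head>).
set events [list \
    html  {} {} \
    head  {} {} \
    title {} Title \
    title /  "\n" \
    body  {} {} \
    b     {} BB \
    b     /  " RR " \
    b     {} BB \
    b     /  " " \
    i     {} II \
    i     /  {} \
    body  /  {} \
    html  /  {}]

puts [collectText $events]   ;# prints: Title BB { RR } BB { } II
```

The newline after </title> is still dropped because its enclosing element is head, so source formatting does not produce spurious PCDATA nodes, while " RR " and the lone space inside body survive.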
