From: Ted N. S. A. GA <te...@ag...> - 2003-02-04 19:03:58
|
Hello folks,
You may recall that back in December I found a bug when using the
pure Tcl version of tclxml to parse XML streams with the "-final 0" option.
I think someone suggested diking out the Rose patch for bug #596959, but
others said that would break something else, and the issue was never
really resolved.
I recently had a rare spare day to look at it again. I certainly can't
claim to understand the code, which uses regular expressions far more
heavily than anything else I have seen in Tcl.
However, after hours of playing around with "puts", I was drawn back to
the Rose patch. It seems intended to insert a null XML element into the
stream for some reason, but I question the way it does this.
The tokenized XML seems to be stored as a flat list that is read in
4-tuples, as in the loop at line 332 of sgmlparser.tcl:
    foreach {tag close param text} $sgml
The Rose patch at line 175 of sgmlparser.tcl inserts -5- empty elements
at the front of the tokenized XML:
    set sgml [linsert $sgml 0 {} {} {} {} {}]
This shifts every following token by one slot, so what was a "tag" is
read as a "close".
Shouldn't this be -4- empty elements, i.e. exactly one empty tuple?
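To see why the count matters, here is a small stand-alone Tcl sketch. It is not taken from tclxml: the token values and the tags_of helper are made up for illustration, and only mimic the flat {tag close param text} layout that the foreach loop reads.

```tcl
# Hypothetical token list laid out as two 4-tuples {tag close param text}.
# The real tokenizer in sgmlparser.tcl produces its own values.
set sgml {hello {} {} world hello / {} {}}

# Return the "tag" field of every 4-tuple, the way the parser's loop sees it.
proc tags_of {tokens} {
    set result {}
    foreach {tag close param text} $tokens {
        lappend result $tag
    }
    return $result
}

# Five empty elements (as in the Rose patch): every later token shifts by one.
set padded5 [linsert $sgml 0 {} {} {} {} {}]
# Four empty elements: exactly one empty tuple, so alignment is preserved.
set padded4 [linsert $sgml 0 {} {} {} {}]

puts "original:  [tags_of $sgml]"
puts "5 empties: [tags_of $padded5]"
puts "4 empties: [tags_of $padded4]"
```

With five empty elements, "hello" never shows up as a tag (it lands in the "close" slot) and the character data "world" is read as a tag instead; with four, the original tuples line up again after one empty tuple.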
If I apply the following patch:
-----CUT HERE-----
*** sgmlparser.tcl.bak Tue Feb 4 13:34:40 2003
--- sgmlparser.tcl Tue Feb 4 13:40:16 2003
***************
*** 172,178 ****
# Patch from bug report #596959, Marshall Rose
if {[string compare [lindex $sgml 4] ""]} {
! set sgml [linsert $sgml 0 {} {} {} {} {}]
}
} else {
--- 172,178 ----
# Patch from bug report #596959, Marshall Rose
if {[string compare [lindex $sgml 4] ""]} {
! set sgml [linsert $sgml 0 {} {} {} {} ]
}
} else {
-----CUT HERE---
Then my test program seems to run OK.
Comments?
Ted
PS:
Here's my test program. Run it with no args to do a piecemeal parse
with -final 0, with 1 arg to do an all-at-once parse with -final 0,
and with 2 args to do an all-at-once parse with -final 1.
Before the "patch", case 1 errors out, case 2 produces no output,
and case 3 works OK. After the patch, all 3 cases produce the same
output.
----CUT HERE----
#!/usr/local/bin/tclsh8.4
package require xml

# Callbacks: print each event the parser reports
proc xml_el_start {name attrs args} {
    puts "Start name ($name) attrs ($attrs) args ($args)"
}
proc xml_el_end {name args} {
    puts "End name ($name) args ($args)"
}
proc xml_char_data {data} {
    # skip empty character data
    if { [string length $data] } {
        puts "Cdata data ($data)"
    }
}

set parser [ ::xml::parser \
    -elementstartcommand xml_el_start \
    -elementendcommand xml_el_end \
    -characterdatacommand xml_char_data \
    -defaultcommand xml_default \
    -final 0 \
]

if { [llength $argv] == 0 } {
    # Case 1: feed the document piecemeal with -final 0
    $parser parse "<fooby>"
    $parser parse "<hello>"
    $parser parse "world"
    $parser parse "</hello>"
    $parser parse "</fooby>"
} else {
    # Case 3 (two args): switch to -final 1 before parsing
    if { [llength $argv] == 2 } {
        $parser configure -final 1
    }
    # Cases 2 and 3: parse the whole document in one call
    $parser parse {<fooby><hello>world</hello></fooby>}
}
----CUT HERE---