From: Ted N. S. A. GA <te...@ag...> - 2003-02-04 19:03:58
Hello folks,

You may recall that back in December I found a bug in using the pure-Tcl version of tclxml to parse XML streams with the "-final 0" option. I think someone suggested diking out the Rose patch for bug #596959, but others said that would break something else, and there was no real resolution to the issue.

I recently had a rare spare day to look at it again. I certainly can't claim to understand the code, which uses regular expressions in a much more heavy-duty fashion than anything else I have seen in Tcl. However, after hours of playing around with "puts", I was drawn back to the Rose patch. It seems to be intended to insert a null XML element into the stream for some reason, but I question the way it does this.

The tokenized XML seems to be put into a list in 4-tuples, as in the loop at line 332 of sgmlparser.tcl:

    foreach {tag close param text} $sgml

The Rose patch on line 175 of sgmlparser.tcl inserts -5- empty elements into the tokenized XML:

    set sgml [linsert $sgml 0 {} {} {} {} {}]

This pushes what was a "tag" into being a "close". Should this not be -4- empty elements, i.e. one complete empty tuple?

If I apply the following patch:

-----CUT HERE-----
*** sgmlparser.tcl.bak Tue Feb 4 13:34:40 2003
--- sgmlparser.tcl Tue Feb 4 13:40:16 2003
***************
*** 172,178 ****
  # Patch from bug report #596959, Marshall Rose
  if {[string compare [lindex $sgml 4] ""]} {
! set sgml [linsert $sgml 0 {} {} {} {} {}]
  }
  } else {
--- 172,178 ----
  # Patch from bug report #596959, Marshall Rose
  if {[string compare [lindex $sgml 4] ""]} {
! set sgml [linsert $sgml 0 {} {} {} {} ]
  }
  } else {
-----CUT HERE---

Then my test program seems to run OK.

Comments?

Ted

PS: Here's my test program. Run it with no args to do a piecemeal parse with -final 0. Run it with 1 arg to do an all-at-once parse with -final 0, and run it with 2 args to do an all-at-once parse with -final 1. Before the "patch", case 1 will error out, case 2 will produce no output and case 3 will work OK.
After the patch, all 3 cases produce the same output.

----CUT HERE----
#!/usr/local/bin/tclsh8.4

package require xml

proc xml_el_start {name attrs args} {
    puts "Start name ($name) attrs ($attrs) args ($args)"
}

proc xml_el_end {name args} {
    puts "End name ($name) args ($args)"
}

proc xml_char_data {data} {
    if { [string length $data] } {
        puts "Cdata data ($data)"
    }
}

set parser [ ::xml::parser \
    -elementstartcommand xml_el_start \
    -elementendcommand xml_el_end \
    -characterdatacommand xml_char_data \
    -defaultcommand xml_default \
    -final 0 \
]

if { [llength $argv] == 0 } {
    $parser parse "<fooby>"
    $parser parse "<hello>"
    $parser parse "world"
    $parser parse "</hello>"
    $parser parse "</fooby>"
} else {
    if { [llength $argv] == 2} {
        $parser configure -final 1
    }
    $parser parse {<fooby><hello>world</hello></fooby>}
}
----CUT HERE---
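
For anyone who wants to see the alignment problem in isolation, here is a minimal sketch that needs only core Tcl, not tclxml. The token list and the helper proc "columns" are made up for illustration and are not what the real tokenizer produces. The only assumption carried over from sgmlparser.tcl is that the list is consumed four elements at a time as {tag close param text}.

```tcl
# Hypothetical token list: two 4-tuples of {tag close param text}.
# Illustrative values only, not real tclxml tokenizer output.
set sgml [list fooby {} {} {} hello {} {} world]

# Walk the list four elements at a time, as the parser loop does,
# and return the "tag" column and the "close" column as two lists.
proc columns {tokens} {
    set tags {}
    set closes {}
    foreach {tag close param text} $tokens {
        lappend tags $tag
        lappend closes $close
    }
    return [list $tags $closes]
}

# Five empty elements shift every later element by one slot;
# four empty elements prepend exactly one empty tuple.
set five [linsert $sgml 0 {} {} {} {} {}]
set four [linsert $sgml 0 {} {} {} {}]

puts "5 empties: tags=([lindex [columns $five] 0]) closes=([lindex [columns $five] 1])"
puts "4 empties: tags=([lindex [columns $four] 0]) closes=([lindex [columns $four] 1])"
```

With five empties, "fooby" and "hello" show up in the close column and "world" lands in the tag column. With four empties the tag column is an empty entry followed by "fooby" and "hello", which is what I would expect a prepended null element to look like.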