From: Ted N. S. A. GA <te...@ag...> - 2002-12-09 17:58:51
|
> >There is already an open bug on "-final" in the SF bug >tracker (#413341). Please feel free to append your comments >to that bug. The bug is currently assigned to Mats, and may be >worthwhile reminding him (gently) to fix it ;-) > >Of course, if you can provide a patch I would be only too >happy to incorporate it myself and close the bug report if >it is fixed. > Steve, Thanks. It seems pretty clear to me after looking at it for a bit that the following patch is required, but not suficient: solabel10% diff -c sgmlparser.tcl $a *** sgmlparser.tcl Thu Sep 12 17:13:46 2002 --- /home/solabel10/ted/wdc/src/uc/lib/tclxml2.4/sgmlparser.tcl Mon Dec 9 12:46:41 2002 *************** *** 151,157 **** if {[info exists options(-statevariable)]} { # Mats: Several rewrites here to handle -final 0 option. # If any cached unparsed xml (state(leftover)), prepend it. ! upvar #0 $opts(-statevariable) state if {[string length $state(leftover)]} { regsub -all $elemExpr $state(leftover)$sgml $elemSub sgml set state(leftover) {} --- 151,157 ---- if {[info exists options(-statevariable)]} { # Mats: Several rewrites here to handle -final 0 option. # If any cached unparsed xml (state(leftover)), prepend it. ! upvar #0 $options(-statevariable) state if {[string length $state(leftover)]} { regsub -all $elemExpr $state(leftover)$sgml $elemSub sgml set state(leftover) {} solabel10% This fixes the "can't read "opts(-statevariable)": no such variable while executing" problem, but something else is wrong as well. Ted |
From: Mats B. <ma...@pr...> - 2002-12-10 07:36:15
|
> > This seems like a pretty basic thing. Any suggestions? > > There is already an open bug on "-final" in the SF bug > tracker (#413341). Please feel free to append your comments > to that bug. The bug is currently assigned to Mats, and may be > worthwhile reminding him (gently) to fix it ;-) > > Of course, if you can provide a patch I would be only too > happy to incorporate it myself and close the bug report if > it is fixed. I have just managed to get cvs to work on my Mac OS X box (well it worked right away, but I had to get used to it), and I was actually thinking of fixing this, eventually. Must be mindreading or something. I'll post to this list when fixed. /Mats |
From: Mats B. <ma...@pr...> - 2002-12-10 17:18:03
|
Hi all, I checked out a fresh copy of tclxml (version 2.5 released today), and found that most (all?) of my changes already were applied (by Steve Ball presumably). I've been running my patched tclxml daily for a year, with a lot of chopped off xml, and that works fine. However, the cvs version does not work as previously noted. There are some more changes to the cvs version compared to my version. The main difference seems to be in sgmlparser, and after some search I found the code: # Patch from bug report #596959, Marshall Rose to be the reason. Just do: if {0} { # Patch from bug report #596959, Marshall Rose if {[string compare [lindex $sgml 4] ""]} { set sgml [linsert $sgml 0 {} {} {} {} {}] } } I don't know the reason for this code. This must be sorted out by the author. This would never have happened if there were a test case with -final 0 and chopped off xml. So, please someone, add this so we wont see bugs like this again. There remains a question of how to actually use the -final option. I use it always when there is a risc of incomplete xml. Is there any reason for using -final 1? What's the advantage? In case there are remaining problems you can always extract my patched TclXML form my whiteboard application. See "http://hem.fyristorg.com/matben/" Best Wishes, Mats Test case on mac (I've a special pkgIndex file that sets version to 3.0 in order to not interfere with other installations): % package require xml 3.0 (Build) 2 % (Build) 3 % proc cdata {data args} { > puts $data > } (Build) 4 % (Build) 5 % set parser [::xml::parser parseit \ > -characterdatacommand cdata -final 0] parseit (Build) 6 % (Build) 7 % $parser parse "<the>" (Build) 8 % $parser parse "world" world (Build) 9 % $parser parse "</the>" |
From: Steve B. <Ste...@zv...> - 2002-12-10 20:00:22
|
Mats Bengtsson wrote: > I checked out a fresh copy of tclxml (version 2.5 released today), > and found that most (all?) of my changes already were applied > (by Steve Ball presumably). I've been running my patched tclxml > daily for a year, with a lot of chopped off xml, and that works fine. > However, the cvs version does not work as previously noted. > There are some more changes to the cvs version compared to my version. > The main difference seems to be in sgmlparser, and after some > search I found the code: > # Patch from bug report #596959, Marshall Rose > to be the reason. > Just do: > if {0} { > # Patch from bug report #596959, Marshall Rose > if {[string compare [lindex $sgml 4] ""]} { > set sgml [linsert $sgml 0 {} {} {} {} {}] > } > } > > I don't know the reason for this code. This must be sorted out > by the author. This would never have happened if there were a test > case with -final 0 and chopped off xml. So, please someone, > add this so we wont see bugs like this again. It is quite clear that making this change will break regression tests. At this stage we need three things: 1. A test case for bug #596959 2. A test case for bug #413341 3. Make sure that *both* tests pass According to Marshall's bug report, the Jabber client's use of TclXML tickled the bug so applying your change will break that program. > There remains a question of how to actually use the -final option. > I use it always when there is a risc of incomplete xml. > Is there any reason for using -final 1? What's the advantage? The simpler case when you know that the document is complete. See also below. > Test case on mac (I've a special pkgIndex file that sets version to 3.0 in order > to not interfere with other installations): Until we release v3.0... > % package require xml > 3.0 > (Build) 2 % > (Build) 3 % proc cdata {data args} { > >> puts $data >>} > > (Build) 4 % > (Build) 5 % set parser [::xml::parser parseit \ > >> -characterdatacommand cdata -final 0] > > parseit > (Build) 6 % > (Build) 7 % $parser parse "<the>" > > (Build) 8 % $parser parse "world" > world > (Build) 9 % $parser parse "</the>" Don't you set '-final 1' at all? Setting that configuration option to 1 from 0 does a final check that the document is well-formed. Consider: $parser parse "<the>" $parser parse "world" "<the>world" is not well-formed XML, but you will never know that because the parsing ends and no error is raised. Cheers, Steve Ball -- Steve Ball | XSLT Standard Library | Training & Seminars Zveno Pty Ltd | Web Tcl Complete | XML XSL Schemas http://www.zveno.com/ | TclXML TclDOM | Tcl, Web Development Ste...@zv... +---------------------------+--------------------- Ph. +61 2 6242 4099 | Mobile (0413) 594 462 | Fax +61 2 6242 4099 |
From: Mats B. <ma...@pr...> - 2002-12-12 10:34:07
|
Hi all, Just one clarification. I have not written tclxml. I just found the problem with -final 0 for my Jabber client and fixed it to the best of my knowledge. Fixing bugs in others code is not the funniest, and tclxml is one of the most complex piece of tcl code I've ever seen. End of statement. Steve Ball wrote: > > Mats Bengtsson wrote: > > Just do: > > if {0} { > > # Patch from bug report #596959, Marshall Rose > > if {[string compare [lindex $sgml 4] ""]} { > > set sgml [linsert $sgml 0 {} {} {} {} {}] > > } > > } > > > > I don't know the reason for this code. This must be sorted out > > by the author. This would never have happened if there were a test > > case with -final 0 and chopped off xml. So, please someone, > > add this so we wont see bugs like this again. > > It is quite clear that making this change will break > regression tests. At this stage we need three things: The bug #596959 is a fix for chopped of xml as I understand it from the desciption at SF bugtracker. But this is fixed in a much more elegant way by my earlier patch (see sgml::tokenise) which enables you to feed the parser with xml that is chopped off anywhere. The core part of the parser is the regsub -all $elemExpr $sgml $elemSub sgml in sgml::tokenise, with variable tokExpr <(/?)( too long.... )> variable substExpr "\}\n{\\2} {\\1} {\\3} \{" with elemExpr=tokExpr, and elemSub=substExpr. I've traced this code to Brent Welch's book 2ed, and he refers this trick to Stephen Uhler. The trick here is that each tag is matched by each regsub'stitution, and all inbetween subsequent matches, which is cdata, comes after the open brace to the right, and before the close brace to the left. The \n was used in Welch's book to produce tcl code with a command on each line which is used to render html. But I don't see why it is used here! Why is not the sgml handled as a list? My suspicion is that the extra spurios \n come from this regexp. Rewriting this is a major task, and right now I don't have time or energy to do this. > > 1. A test case for bug #596959 > 2. A test case for bug #413341 > 3. Make sure that *both* tests pass I think both bugs refer to the same problem. I add some xml as a postscript to this mail. > > According to Marshall's bug report, the Jabber client's > use of TclXML tickled the bug so applying your change > will break that program. I'm not compleletly sure of this, see above. > Don't you set '-final 1' at all? Setting that configuration > option to 1 from 0 does a final check that the document > is well-formed. Consider: > > $parser parse "<the>" > $parser parse "world" > > "<the>world" is not well-formed XML, but you will never > know that because the parsing ends and no error is raised. > With -final 0 you usually never know when the document ends. You usually know that after you've got the last xml, and then its too late. Kind of catch 22. In the Jabber case, you read xml from a network socket, so there you only know that it was the last chunk when the connection closed down, or if you got the final </stream>. It could perhaps be useful to have a new command that checks if document so far was final (isducumentfinal). This could check [llength $state(stack)] which should be 0 when finished, since state(stack) is a list with all unmatched open tags in hierarchy. /Mats set parser [::xml::parser parseit \ -characterdatacommand cdata -final 0] # ex 1 $parser parse "<the>" $parser parse "world" $parser parse "</the>" # ex 2 $parser parse <stream><ju $parser parse "nk attr='lo" $parser parse "se>Hello Wo" $parser parse "rld</junk" $parser parse "></stream>" or something like this, with appropriate callbacks. |
From: Mats B. <ma...@pr...> - 2002-12-12 13:53:57
|
Update: I just tested this once again with a slightly modified callback, which shows that there are no extra newlines; it's just a callback with a completely empty (nothing) cdata. It's the newline of the puts we have seen. See script below and verbatim output from the same script. /Mats package require xml proc cdata {data args} { puts "(data)==>'$data'" } set parser [::xml::parser parseit \ -characterdatacommand cdata -final 0] $parser parse "<the>" $parser parse "world" $parser parse "</the>" From cvs Dec 10, but without patch #596959 (I still used a hacked up pkgIndex.tcl file, therefore 3.0): % package require xml 3.0 (Build) 2 % (Build) 3 % proc cdata {data args} { > puts "(data)==>'$data'" > } (Build) 4 % set parser [::xml::parser parseit \ > -characterdatacommand cdata -final 0] parseit (Build) 5 % (Build) 6 % $parser parse "<the>" (data)==>'' (Build) 7 % $parser parse "world" (data)==>'world' (Build) 8 % $parser parse "</the>" (data)==>'' |
From: Steve B. <Ste...@zv...> - 2002-12-09 20:40:43
|
Ted Nolan SRI Augusta GA wrote: > Thanks. It seems pretty clear to me after looking at it for > a bit that the following patch is required, but not suficient: [...snip...] > This fixes the > > "can't read "opts(-statevariable)": no such variable while executing" > > problem, but something else is wrong as well. I'll apply that patch. Can you also supply a test case(s), in the form of a patch to the test suite, that demonstrates the bug? Cheers, Steve Ball -- Steve Ball | XSLT Standard Library | Training & Seminars Zveno Pty Ltd | Web Tcl Complete | XML XSL Schemas http://www.zveno.com/ | TclXML TclDOM | Tcl, Web Development Ste...@zv... +---------------------------+--------------------- Ph. +61 2 6242 4099 | Mobile (0413) 594 462 | Fax +61 2 6242 4099 |