From: Mats B. <ma...@pr...> - 2002-12-12 10:34:07
Hi all,

Just one clarification: I did not write tclxml. I just found the problem
with -final 0 for my Jabber client and fixed it to the best of my
knowledge. Fixing bugs in other people's code is not much fun, and tclxml
is one of the most complex pieces of Tcl code I've ever seen.
End of statement.

Steve Ball wrote:
>
> Mats Bengtsson wrote:
> > Just do:
> > if {0} {
> >     # Patch from bug report #596959, Marshall Rose
> >     if {[string compare [lindex $sgml 4] ""]} {
> >         set sgml [linsert $sgml 0 {} {} {} {} {}]
> >     }
> > }
> >
> > I don't know the reason for this code. This must be sorted out
> > by the author. This would never have happened if there were a test
> > case with -final 0 and chopped off xml. So, please someone,
> > add this so we wont see bugs like this again.
>
> It is quite clear that making this change will break
> regression tests. At this stage we need three things:

Bug #596959 is a fix for chopped-off xml, as I understand it from the
description in the SF bug tracker. But this is handled in a much more
elegant way by my earlier patch (see sgml::tokenise), which lets you
feed the parser with xml that is chopped off anywhere.

The core part of the parser is the

    regsub -all $elemExpr $sgml $elemSub sgml

in sgml::tokenise, with

    variable tokExpr <(/?)( too long.... )>
    variable substExpr "\}\n{\\2} {\\1} {\\3} \{"

where elemExpr=tokExpr and elemSub=substExpr. I've traced this code to
Brent Welch's book, 2nd ed., and he credits the trick to Stephen Uhler.
The trick is that each tag is matched by one regsub substitution, and
everything between two successive matches, which is the cdata, ends up
after the open brace at the right end of one substitution and before the
close brace at the left end of the next. The \n was used in Welch's book
to produce Tcl code with one command per line, which is then used to
render html. But I don't see why it is used here! Why isn't the sgml
handled as a list? My suspicion is that the extra spurious \n come from
this regexp.
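To make the regsub trick concrete, here is a minimal standalone sketch. It is not the tclxml source: the proc name tokenise, the simplified tag expression (the real tokExpr is elided above and is not reproduced here) and the four-field grouping are assumptions made only for illustration. It also uses a plain space instead of \n as separator, which is enough in this sketch because the result is only ever read as a Tcl list.

```tcl
# Simplified illustration of the regsub tokenising trick; hypothetical
# names, not the tclxml implementation.
proc tokenise {xml} {
    # Hide literal braces so the result stays a well-formed Tcl list.
    set xml [string map [list "\{" "&ob;" "\}" "&cb;"] $xml]
    set w " \t\r\n"
    # Group 1: optional "/", group 2: tag name, group 3: rest of the tag.
    set tokExpr "<(/?)(\[^$w>]+)\[$w]*(\[^>]*)>"
    # Close the pending text element, emit the three tag fields,
    # then open the next text element.
    set substExpr "\} {\\2} {\\1} {\\3} \{"
    regsub -all $tokExpr $xml $substExpr xml
    # Wrap with an empty leading tag so every group has four fields:
    # tag, close flag, attributes, following text.
    return "{} {} {} \{$xml\}"
}

# Walk the flat list four fields at a time.
foreach {tag close attrs text} [tokenise "<the>world</the>"] {
    puts [list $tag $close $attrs $text]
}

# A chunk chopped inside a tag: the partial tag is not matched and
# simply ends up in the text field of the last group.
puts [tokenise "<stream><ju"]
```

The second call shows why chopped-off input needs care: the unfinished "<ju" is indistinguishable from character data unless the caller holds it back until the next chunk arrives.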
Rewriting this is a major task, and right now I don't have the time or
energy to do it.

>
> 1. A test case for bug #596959
> 2. A test case for bug #413341
> 3. Make sure that *both* tests pass

I think both bugs refer to the same problem. I add some xml as a
postscript to this mail.

>
> According to Marshall's bug report, the Jabber client's
> use of TclXML tickled the bug so applying your change
> will break that program.

I'm not completely sure of this, see above.

> Don't you set '-final 1' at all? Setting that configuration
> option to 1 from 0 does a final check that the document
> is well-formed. Consider:
>
> $parser parse "<the>"
> $parser parse "world"
>
> "<the>world" is not well-formed XML, but you will never
> know that because the parsing ends and no error is raised.
>

With -final 0 you usually never know when the document ends. You usually
only find out after you've received the last xml, and then it's too
late. Kind of a catch-22. In the Jabber case you read xml from a network
socket, so you only know that a chunk was the last one when the
connection closes down, or when you get the final </stream>.

It could perhaps be useful to have a new command that checks whether the
document so far is complete (isdocumentfinal). This could check
[llength $state(stack)], which should be 0 when finished, since
state(stack) is a list of all unmatched open tags in the hierarchy.

/Mats

    set parser [::xml::parser parseit \
        -characterdatacommand cdata -final 0]

    # ex 1
    $parser parse "<the>"
    $parser parse "world"
    $parser parse "</the>"

    # ex 2
    $parser parse "<stream><ju"
    $parser parse "nk attr='lo"
    $parser parse "se'>Hello Wo"
    $parser parse "rld</junk"
    $parser parse "></stream>"

or something like this, with appropriate callbacks.
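The proposed completeness check on the stack of open tags could be sketched roughly as below. This is only an illustration under stated assumptions: the demo namespace and its procs are hypothetical and do not exist in tclxml, and the callbacks are invoked by hand here rather than by a real parser. One detail the sketch adds is a seenRoot flag, since an empty stack alone cannot distinguish "nothing parsed yet" from "document finished".

```tcl
# Hypothetical sketch; the demo namespace and all proc names are
# illustrative and not part of tclxml.
namespace eval demo {
    variable stack {}
    variable seenRoot 0
}

# Called for every open tag: push the element name.
proc demo::elementstart {name args} {
    variable stack
    variable seenRoot
    lappend stack $name
    set seenRoot 1
}

# Called for every close tag: pop the innermost open element.
proc demo::elementend {name args} {
    variable stack
    set stack [lrange $stack 0 end-1]
}

# True once a root element has been opened and all open tags are closed.
proc demo::isdocumentfinal {} {
    variable stack
    variable seenRoot
    expr {$seenRoot && [llength $stack] == 0}
}

# Simulate the callbacks for "<the>world</the>" arriving in chunks.
demo::elementstart the {}
puts [demo::isdocumentfinal]
demo::elementend the
puts [demo::isdocumentfinal]
```

In a real client, procs like these would presumably be attached to the parser's element start and element end callbacks, and the check would be called when the socket closes to decide whether the stream ended cleanly.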