Re: [tcljava-user] jacl 1.3.1 regexp
From: Mo D. <mo...@mo...> - 2007-04-09 21:31:38
Johannes Kleinlercher wrote:
> Hi all,
>
> I look for a way to parse an XML-File in IBM's wsadmin, which uses jacl
> 1.3.1 as a scripting language.
>
> I found a project called sa4was [1] on IBM's developerworks which does
> parsing XML with regexp. sa4was works with WebSphere 5.x (which used jacl
> 1.2.x) however in WebSphere 6.0 (which uses jacl 1.3.1) it doesn't work
> anymore.
>
> Some code in sa4was does the following:
>
> ============================================
> while {[regexp {([^=]+)="([^"]*?)"(.*)} $restOfTag dontCare attributeName attributeValue restOfTag]} {
>     if {$attributeName == "id"} {
>         set idValue $attributeValue
>         break
>     }
> }
> ============================================
>
> and there I get the error
> "couldn't compile regular expression pattern: nested *?+"
>
>
> I found out that regexp changed in jacl 1.3.1 and non-greedy regexp
> (with this "*?" expression) doesn't work anymore. Is that right?
>
> Questions:
> a) So are there workarounds for something like that?
> b) Or are there some better ways to parse XML in jacl?
>

Currently, there is no workaround except porting the regexp code back
to use the older Tcl 8.1 style regexp syntax. Very old versions of Jacl
made use of the Oro regexp package, but it was non-free and had to be
removed from Jacl.

But, let's back up a minute. I would not even suggest that you use
regexp commands to parse your XML code. Using a series of regexp calls
like that is going to be SLOW! There is just no way around it. You
would be much better off using an XML parsing engine written in Java.
There are lots and lots of them available. You could create your XML
parser in Java, build up a DOM in memory, and then pass a handle to
the DOM back to your Tcl scripts. It is actually quite easy to examine
and decode a DOM tree with Tcl code once you have your own utility
procs that examine subtrees and extract data from nodes.

I hope that helps
Mo DeJong
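
As a rough illustration of the Java DOM approach described above, here is
a minimal sketch. It is not code from sa4was or from Jacl. The class name
XmlIdLookup and both of its methods are hypothetical, and it assumes the
JDK's built-in javax.xml.parsers API is available in the wsadmin JVM. It
parses an XML string into a DOM and returns the "id" attribute of the
first element that has one, which is roughly what the quoted regexp loop
was trying to do.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; the name and methods are illustrative only.
class XmlIdLookup {

    // Parse an XML string into an in-memory DOM with the JDK's default parser.
    // Checked parser exceptions are wrapped so callers get one exception type.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML: " + e);
        }
    }

    // Return the "id" attribute of the first element (in document order)
    // that carries one, or null if no element has an "id" attribute.
    static String firstIdValue(Document doc) {
        NodeList all = doc.getElementsByTagName("*"); // "*" matches every element
        for (int i = 0; i < all.getLength(); i++) {
            Element element = (Element) all.item(i);
            if (element.hasAttribute("id")) {
                return element.getAttribute("id");
            }
        }
        return null;
    }
}
```

From a Jacl script, a class like this could presumably be reached with
"package require java" and java::call, provided it is declared public and
is on the classpath. The Document returned by parse would then serve as
the DOM handle passed back to Tcl.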