From: André K. <ak...@la...> - 2010-11-21 16:23:24
Hi all,

I've put an "enhanced" xml select version in the 'enhanced_select'
branch.

The most useful switches to me currently are --var, --key and
--choose/--when/--otherwise. But --import might be useful also.
The --function/--param/--call-template/--with-param needs some
thinking since it's too verbose to my taste.

I use variables mostly for clarity, and since they can hold nodesets,
it makes it easier to collect data from several places in an xml
document (or even several times from the same place).

A basic example (multiplication tables from 2 up):

$ cat a.xml
<x>
  <n>0</n>
  <n>1</n>
  <n>2</n>
  <n>3</n>
  <n>4</n>
  <n>5</n>
  <n>6</n>
  <n>7</n>
  <n>8</n>
  <n>9</n>
</x>

$ cat a.xml \
  | xml sel \
    --text \
    --var numbers='//n' \
    -t \
      -m '$numbers[. >= 2]' \
        --var i='.' \
        -o '== Multiplication table of ' -v '$i' -o ' ==' -n \
        -m '$numbers' \
          --var j='.' \
          -v '$i' -o ' * ' -v '$j' -o ' = ' -v '$i * $j' -n \
        -b \
      -b \
    -b
== Multiplication table of 2 ==
2 * 0 = 0
2 * 1 = 2
2 * 2 = 4
2 * 3 = 6
2 * 4 = 8
...
== Multiplication table of 9 ==
...
9 * 6 = 54
9 * 7 = 63
9 * 8 = 72
9 * 9 = 81

Here's a little script which dumps the structure of an XML document
and demonstrates the use of --var and --choose/--when/--otherwise
switches.
$ cat struct.sh
#!/bin/sh

struct() {
  "${xml:-xml}" sel \
    "$@" \
    --text \
    --var empty="''" \
    --var dot="'.'" \
    --var plus="'+'" \
    -t \
      --var nl -n -b \
      --var PI -o 'PI ' -b \
      --var comm -o 'Comm' -b \
      --var text -o 'Text' -b \
      --var NS -o 'NS ' -b \
      --var attr -o 'Attr' -b \
      --var elem -o 'Elem' -b \
      --var root -o 'Root' -b \
      --var NA -o 'N/A ' -b \
      -m '/|//node()|//@*' \
        --var path \
          -o '/' \
          -m 'ancestor-or-self::*' \
            -i 'position() > 1' -o '/' -b \
            -v 'name()' \
          -b \
        -b \
        --var indent \
          -m 'ancestor-or-self::*' -v '$plus' -b \
          -m 'ancestor-or-self::*' -v 'concat($dot, $dot, $dot)' -b \
        -b \
        --var type \
          --var ns='../namespace::*' \
          --choose \
            --when './self::processing-instruction()' -v '$PI' -b \
            --when 'count(.|$ns) = count($ns)' -v '$NS' -b \
            --when './self::comment()' -v '$comm' -b \
            --when './self::text()' -v '$text' -b \
            --when 'count(.|../@*) = count(../@*)' -v '$attr' -b \
            --when './self::*' -v '$elem' -b \
            --when 'not(./parent::*)' -v '$root' -b \
            --otherwise -v '$NA' -b \
          -b \
        -b \
        --choose \
          --when '$type = $PI' \
            -v '$type' -o ': ' -v '$indent' \
            -o '[' -v 'name()' -o '][' -v '.' \
            -o ']' -n \
          -b \
          --when '$type = $comm' \
            -v '$type' -o ': ' -v '$indent' \
            -o '[' -v '.' -o ']' -n \
          -b \
          --when '$type = $text' \
            -v '$type' -o ': ' -v '$indent' \
            -o '[' -v '$path' -o '][' -v 'str:replace(., $nl, "\n")' -o ']' -n \
          -b \
          --when '$type = $attr' \
            -v '$type' -o ': ' -v '$indent' \
            -o '[' -v '$path' -o '][@' -v 'name()' -o '][' -v '.' -o ']' -n \
          -b \
          --when '$type = $root' \
            -v '$type' -o ': ' -v '$indent' \
            -o '[' -v '$path' -o ']' -n \
          -b \
          --when '$type = $elem' \
            -v '$type' -o ': ' -v '$indent' \
            -o '[' -v '$path' -o ']' \
            -n \
            --var nslist \
              -m 'namespace::*[name() != "xml"]' \
                -s 'A:T:U' 'name()' \
                -i 'position() > 1' -o ', ' -b \
                --choose \
                  --when 'name() = $empty' -o '<def>' -b \
                  --otherwise -v 'name()' -b \
                -b \
                -o '=' -v '.' \
              -b \
            -b \
            -o 'NS ' -o ': ' -v '$indent' \
            -o '[' -v '$path' -o '][' -v '$nslist' -o ']' -n \
          -b \
          --otherwise \
            -v '$type' -o ': ' -v '$indent' \
            -o '[' -c '.' -o ']' -n \
          -b \
        -b \
      -b
}

struct "$@"

$ cat eg.xml
<?xml version="1.0"?>
<?pi value="3.141592"?>
<x xmlns="http://a.org">
 <!-- Sample -->
 <t id="1">Text 1 <b>Text 2</b> Text 3</t>
 <t id="2">
Multi
 Line
 Text
</t>
 <ns xmlns:b="http://b.net" xmlns:c="http://c.com">
  <b:s>1</b:s>
  <c:s>2</c:s>
 </ns>
</x>

$ cat eg.xml | struct.sh
Root: [/]
PI : [pi][value="3.141592"]
Elem: +...[/x]
NS : +...[/x][<def>=http://a.org]
Text: +...[/x][\n ]
Comm: +...[ Sample ]
Text: +...[/x][\n ]
Elem: ++......[/x/t]
NS : ++......[/x/t][<def>=http://a.org]
Attr: ++......[/x/t][@id][1]
Text: ++......[/x/t][Text 1 ]
Text: +...[/x][\n ]
Elem: +++.........[/x/t/b]
NS : +++.........[/x/t/b][<def>=http://a.org]
Text: +++.........[/x/t/b][Text 2]
Text: ++......[/x/t][ Text 3]
Elem: ++......[/x/t]
NS : ++......[/x/t][<def>=http://a.org]
Attr: ++......[/x/t][@id][2]
Text: ++......[/x/t][\nMulti\n Line\n Text\n]
Text: +...[/x][\n ]
Elem: ++......[/x/ns]
NS : ++......[/x/ns][<def>=http://a.org, b=http://b.net, c=http://c.com]
Text: ++......[/x/ns][\n ]
Text: +++.........[/x/ns/b:s][1]
Text: ++......[/x/ns][\n ]
Elem: +++.........[/x/ns/c:s]
NS : +++.........[/x/ns/c:s][<def>=http://a.org, b=http://b.net, c=http://c.com]
Text: +++.........[/x/ns/c:s][2]
Text: ++......[/x/ns][\n ]
Text: +...[/x][\n]
Elem: +++.........[/x/ns/b:s]
NS : +++.........[/x/ns/b:s][<def>=http://a.org, b=http://b.net, c=http://c.com]

Note there seems to be a misplaced 'Text: +...[/x][\n ]' right after
'Text: ++......[/x/t][Text 1 ]'. It should come right after
'Text: ++......[/x/t][ Text 3]', but probably the order of nodes in a
nodeset is not guaranteed. With other XSLT processors, the result may be
different.

I can provide some examples of how to use the key and function switches.

Regards,
André
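The script calls `"${xml:-xml}"` rather than plain `xml`. That is standard POSIX parameter expansion. It uses the value of `$xml` if that variable is set and non-empty, and falls back to the word `xml` otherwise, so the binary name can be overridden from the environment. This is handy where the package installs the binary as `xmlstarlet`. Below is a minimal sketch of just that expansion. The `which_xml` helper is hypothetical and stands in for the real `sel` call:

```sh
#!/bin/sh
# "${xml:-xml}" expands to $xml when it is set and non-empty,
# and to the literal word "xml" otherwise.
which_xml() {
  printf '%s\n' "${xml:-xml}"
}

unset xml
which_xml          # xml

xml=xmlstarlet
which_xml          # xmlstarlet
```

With the script above, something like `xml=xmlstarlet sh struct.sh < eg.xml` should then pick up the other binary name without editing the script.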
From: Noam P. <npo...@us...> - 2010-11-26 23:27:58
André Kaplan <ak...@la...> writes:

> Hi all,
>
> I've put an "enhanced" xml select version in the 'enhanced_select'
> branch.
>
> The most useful switches to me currently are --var, --key and
> --choose/--when/--otherwise. But --import might be useful also.
> The --function/--param/--call-template/--with-param needs some
> thinking since it's too verbose to my taste.
>
> I use variables mostly for clarity, and since they can hold nodesets,
> it makes it easier to collect data from several places in an xml
> document (or even several times from the same place).

Is it worth adding all these options? Your example shell commands look
like XSL with shorter tags (<tagname></tagname> becomes --tagname or
-t).

>
> Here's a little script which dumps the structure of an XML document
> and demonstrates the use of --var and --choose/--when/--otherwise
> switches.

I couldn't get this one to work. I got a whole bunch of warnings like:

warning: failed to load external entity "b"
warning: failed to load external entity "--when"

Noam
From: André K. <ak...@la...> - 2010-11-29 13:21:22
Hi Noam,

>> I've put an "enhanced" xml select version in the 'enhanced_select'
>> branch.
>>
>> The most useful switches to me currently are --var, --key and
>> --choose/--when/--otherwise. But --import might be useful also.
>> The --function/--param/--call-template/--with-param needs some
>> thinking since it's too verbose to my taste.
>>
>> I use variables mostly for clarity, and since they can hold nodesets,
>> it makes it easier to collect data from several places in an xml
>> document (or even several times from the same place).
>
> Is it worth adding all these options? Your example shell commands look
> like XSL with shorter tags (<tagname></tagname> becomes --tagname or
> -t).

Well, that's what I wanted to discuss!
I think some switches are definitely worth adding: var, key, choose/
when/otherwise, import, include.
Some are less worth it and as you say could be added with extra
switches or extended --elem and --attr (say --xelem and --xattr) which
would write xslt elements.

Some switches could be shortcuts to longer xslt constructs (like
currently --template, --match): function and/or function call switches.

In the end the command line will look like an XSLT stylesheet but less
verbose and hopefully human-readable.
But I also see xmlstarlet as a command-line xslt stylesheet designer.

>>
>> Here's a little script which dumps the structure of an XML document
>> and demonstrates the use of --var and --choose/--when/--otherwise
>> switches.
>
> I couldn't get this one to work. I got a whole bunch of warnings like:
>
> warning: failed to load external entity "b"
> warning: failed to load external entity "--when"

I think that my mail client breaks lines longer than 70 characters.
I added the contrib/xml_struct.sh script in the enhanced_select branch.
Hopefully you'll be able to use it.

Regards,
André
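This small sketch shows one way a wrapped script can break like this. Suppose a wrap turns `-b` into `- b` and the pieces are then joined back onto one command line. The shell now hands the program two separate words, `-` and `b`, instead of the single `-b` break option. The `show_args` helper below is hypothetical and only prints what it receives. The warnings would fit if xml sel then treats the stray words as input file names, but that step is a guess and is not confirmed in this thread:

```sh
#!/bin/sh
# Print each argument the shell passes in, one per line, in brackets.
show_args() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

# Intended: -b is one word.
show_args -v '$PI' -b     # [-v] [$PI] [-b], one per line
# After wrapping and rejoining: "-" and "b" arrive as two words.
show_args -v '$PI' - b    # [-v] [$PI] [-] [b], one per line
```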
From: Noam P. <npo...@us...> - 2010-11-29 19:09:49
André Kaplan <ak...@la...> writes:

>>
>> Is it worth adding all these options? Your example shell commands look
>> like XSL with shorter tags (<tagname></tagname> becomes --tagname or
>> -t).
>
> Well, that's what I wanted to discuss!
> I think some switches are definitely worth adding: var, key, choose/
> when/otherwise, import, include.
> Some are less worth it and as you say could be added with extra
> switches or extended --elem and --attr (say --xelem and --xattr) which
> would write xslt elements.
>
> Some switches could be shortcuts to longer xslt constructs (like
> currently --template, --match): function and/or function call switches.
>
> In the end the command line will look like an XSLT stylesheet but less
> verbose and hopefully human-readable.
> But I also see xmlstarlet as a command-line xslt stylesheet designer.
>

Hmm, well I see it more as a little tool for when you have just a small
XML manipulation task that isn't worth the bother of writing a
stylesheet: little one-liners. It would be interesting to hear other
people's opinions on this.

If we do go in the "less verbose stylesheet direction", I think there
should be some kind of syntax defined instead of using options for
everything. I don't find the dashes very pretty, and single letter
options aren't all that human-readable.

> I think that my mail client breaks lines longer than 70 characters.
> I added the contrib/xml_struct.sh script in the enhanced_select branch.
> Hopefully you'll be able to use it.

Yup, that fixed it.

Noam
From: André K. <ak...@la...> - 2010-11-30 00:19:20
>>>
>>> Is it worth adding all these options? Your example shell commands look
>>> like XSL with shorter tags (<tagname></tagname> becomes --tagname or
>>> -t).
>>
>> Well, that's what I wanted to discuss!
>> I think some switches are definitely worth adding: var, key, choose/
>> when/otherwise, import, include.
>> Some are less worth it and as you say could be added with extra
>> switches or extended --elem and --attr (say --xelem and --xattr) which
>> would write xslt elements.
>>
>> Some switches could be shortcuts to longer xslt constructs (like
>> currently --template, --match): function and/or function call switches.
>>
>> In the end the command line will look like an XSLT stylesheet but less
>> verbose and hopefully human-readable.
>> But I also see xmlstarlet as a command-line xslt stylesheet designer.
>>
>
> Hmm, well I see it more as a little tool for when you have just a small
> XML manipulation task that isn't worth the bother of writing a
> stylesheet: little one-liners. It would be interesting to hear other
> people's opinions on this.

I felt a bit let down when I was trying to go beyond the one-liner since
things are rarely as simple as they look.
This is why I added options to support other xsl elements.
First off, I didn't much like the idea of writing:

xml sel -t -i 'condition' -o 'yes' -b -i 'not(condition)' -o 'no' -b -b

for an if/then/else, so I added support for choose/when/otherwise:

xml sel -t --choose --when 'condition' -o 'yes' -b --otherwise -o 'no' -b -b -b

Sure it's longer to write, but it also opens new horizons. Then, one
thing leading to another, I added support for var and key.

Anyway, adding other switches won't prevent you from using xmlstarlet
for one-liners if that's how you use it. I also use it for one-liners.
I just don't want to be stopped at the first hurdle.

I'd be interested in other people's opinion as well.
> If we do go in the "less verbose stylesheet direction", I think there
> should be some kind of syntax defined instead of using options for
> everything. I don't find the dashes very pretty, and single letter
> options aren't all that human-readable.

Well sure that's a matter of taste. I don't mind too much the dashes,
but I rarely write long xml commands directly in the terminal, I
directly integrate them in longer shell scripts.

As for the syntax, I don't quite understand.
If that's a full language, then XQuery is already there.
If that's some shortcuts or syntactic sugar to avoid typing dashes
then OK.
For instance I'd prefer to write -a name=value to set an attribute
rather than the current -a name -o value -b. But the second syntax
remains useful for more complex logic.

There's a learning curve, as for any command-line or any tool, and if
I don't want to have to remember what a short option means, I use the
long one. More typing but less head scratching three months later.

Regards,
André
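The branch already uses a `--var name=value` form, and a `-a name=value` shorthand is proposed above. Both need the argument split at the first `=` only, because XPath values such as `$numbers[. >= 2]` can contain `=` themselves. Here is a minimal POSIX sh sketch of that split. `split_assignment` is a hypothetical helper for illustration, not xmlstarlet's actual option parsing:

```sh
#!/bin/sh
# Hypothetical helper: split a "name=value" argument at the FIRST "=".
#   ${1%%=*} removes the longest suffix starting with "=", leaving the name.
#   ${1#*=}  removes the shortest prefix ending with "=", leaving the value,
#            so any later "=" (common in XPath) stays in the value.
split_assignment() {
  case $1 in
    *=*) ;;
    *) printf 'no "=" in: %s\n' "$1" >&2; return 1 ;;
  esac
  name=${1%%=*}
  value=${1#*=}
  printf 'name=[%s] value=[%s]\n' "$name" "$value"
}

split_assignment 'i=.'                    # name=[i] value=[.]
split_assignment 'big=$numbers[. >= 2]'   # name=[big] value=[$numbers[. >= 2]]
```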
From: Noam P. <npo...@us...> - 2010-11-30 02:45:16
André Kaplan <ak...@la...> writes:

> I felt a bit let down when I was trying to go beyond the one-liner since
> things are rarely as simple as they look.
> This is why I added options to support other xsl elements.
> First off, I didn't much like the idea of writing:
> xml sel -t -i 'condition' -o 'yes' -b -i 'not(condition)' -o 'no' -b -b
> for an if/then/else, so I added support for choose/when/otherwise:
> xml sel -t --choose --when 'condition' -o 'yes' -b --otherwise -o 'no' -b -b -b

I see your point, though in this specific instance we don't need to be
bound to xslt's zany conditionals:

xml sel -t --if 'condition' -o 'yes' --else 'no' -b

>
> Anyway, adding other switches won't prevent you from using xmlstarlet
> for one-liners if that's how you use it. I also use it for one-liners.
> I just don't want to be stopped at the first hurdle.

The more code there is, the greater the chance of screwing it up.

>> If we do go in the "less verbose stylesheet direction", I think there
>> should be some kind of syntax defined instead of using options for
>> everything. I don't find the dashes very pretty, and single letter
>> options aren't all that human-readable.
>
> Well sure that's a matter of taste. I don't mind too much the dashes,
> but I rarely write long xml commands directly in the terminal, I
> directly integrate them in longer shell scripts.

I forgot to mention the quoting, that's probably the worst thing. Shell
evaluation rules are sufficiently icky that I really try to avoid longer
shell scripts.

> As for the syntax, I don't quite understand.
> If that's a full language, then XQuery is already there.
> If that's some shortcuts or syntactic sugar to avoid typing dashes
> then OK.

Hmm, maybe shell is good enough.

Noam
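Here is one concrete example of the quoting trouble. XPath 1.0 has no escape sequence inside string literals, so a value containing an apostrophe has to go in a double-quoted XPath literal. A single-quoted shell word, however, cannot contain a single quote at all. The usual POSIX workaround is to close the quote, add an escaped `\'`, and reopen it. This is a minimal sketch; the `//person[@name = ...]` path and the value are made up for illustration:

```sh
#!/bin/sh
# The XPath literal needs double quotes because the value has an apostrophe.
# In the shell word, '\'' ends the single-quoted part, adds a literal ',
# and starts a new single-quoted part.
xpath='//person[@name = "O'\''Brien"]'
printf '%s\n' "$xpath"   # //person[@name = "O'Brien"]
```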
From: André K. <ak...@la...> - 2010-11-30 15:59:50
>> Anyway, adding other switches won't prevent you from using xmlstarlet
>> for one-liners if that's how you use it. I also use it for one-liners.
>> I just don't want to be stopped at the first hurdle.
>
> The more code there is, the greater the chance of screwing it up.

What's your point here, I don't get it?
It's about as true as saying: no code, no bug.

>>> If we do go in the "less verbose stylesheet direction", I think there
>>> should be some kind of syntax defined instead of using options for
>>> everything. I don't find the dashes very pretty, and single letter
>>> options aren't all that human-readable.
>>
>> Well sure that's a matter of taste. I don't mind too much the dashes,
>> but I rarely write long xml commands directly in the terminal, I
>> directly integrate them in longer shell scripts.
>
> I forgot to mention the quoting, that's probably the worst thing. Shell
> evaluation rules are sufficiently icky that I really try to avoid longer
> shell scripts.

xmlstarlet isn't responsible for the quoting in scripts.
But it surely has a bug, since it doesn't properly xml-escape its
arguments. Currently

xml sel -t -m '//x[@class = "A"]' -c '.' -b -b

isn't working, when it should. You have to write:

xml sel -t -m '//x[@class = &quot;A&quot;]' -c '.' -b -b

Are you referring to that particular issue (I think it's in the bug
tracker)?

Regards,
André
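This is a minimal sketch of the kind of escaping meant here. Before an argument is written into an attribute of the generated stylesheet, `&`, `<`, `>` and `"` would become entity references. Escaping `>` is not strictly required in an attribute, but it does no harm. `&` is handled first so the entities added by the later replacements are not escaped again. The sketch uses sed and a hypothetical `xml_escape_attr` helper; it is not how xmlstarlet itself does, or would, handle this:

```sh
#!/bin/sh
# Hypothetical helper: escape a string for a double-quoted XML attribute.
# "&" must be replaced first; "\&" in a sed replacement is a literal "&".
xml_escape_attr() {
  printf '%s\n' "$1" | sed \
    -e 's/&/\&amp;/g' \
    -e 's/</\&lt;/g' \
    -e 's/>/\&gt;/g' \
    -e 's/"/\&quot;/g'
}

xml_escape_attr '//x[@class = "A"]'   # //x[@class = &quot;A&quot;]
xml_escape_attr '$numbers[. >= 2]'    # $numbers[. &gt;= 2]
```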
From: Noam P. <npo...@us...> - 2010-11-30 20:26:21
André Kaplan <ak...@la...> writes:

>> The more code there is, the greater the chance of screwing it up.
>
> What's your point here, I don't get it?
> It's about as true as saying: no code, no bug.

Just that adding features does have some cost. More features --> more
code --> (potentially) more bugs.

> xmlstarlet isn't responsible for the quoting in scripts.

No, but that's a reason to want to use it outside of a shell script.

Noam