Re: [Xml-coreutils-discuss] Where to start?
From: <la...@lb...> - 2015-02-12 13:48:42
Hi Douglas,

Thanks for your questions. I'll answer them each separately. The sample xml document you provided was very helpful. First, the following one:

> Each /response/listing element has a //listing_id and some, but not all
> listings have a floor_plan element.
>
> Goal:
> I would like to extract only listings with floor plans, and only the
> selected elements I am interested in, into a new document as:
> <listings>
>   <listing>
>     <listing_id>1</listing_id>
>     <floor_plan>http://example.com/123456</floor_plan>
>   </listing>
> <listings>

You can extract the listing nodes as temporary xml files by using xml-find, eg

    xml-find page1.xml :/response/* -exec echo {-} ';'

The {-} is the name of a temporary file which exists only as long as the echo command runs. If you had a script cmd.sh instead of echo, the script could open the file name {-} and process it.

Alternatively, you can run bash executing a string of commands, like so:

    xml-find page1.xml :/response/* -exec bash -c 'if grep -q floor_plan {-} ; then xml-printf "id %s\nagent %s\n" {-} ://listing_id ://agent_name; fi' ';'

(note the single quotes around the semicolon). The above will print plain text if the listing_id and agent_name exist, otherwise you'll get some error messages on stderr and no output.

You could replace the plain grep with

    if xml-grep '.*' {-} ://floor_plan >/dev/null ; then .....

Also, if you prefer to have xml output, perhaps try

    xml-find page1.xml ://response/* -exec xml-grep '.*' {-} ://floor_plan ://listing_id ://agent_name ';' | xml-cat

Here xml-grep outputs an xml fragment for each temporary file, and xml-cat reassembles the fragments into a single xml file.

These ideas are probably the closest to the workflow you suggest below.
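To make the cmd.sh idea mentioned above a bit more concrete, here is a minimal sketch of what such a script might look like. The function name process_listing and the sed-based extraction are illustrative assumptions and not part of xml-coreutils. The sed lines are a crude text match that only works if each element sits on a single line and carries no attributes; the xml-printf call shown earlier could take their place.

```shell
#!/bin/sh
# cmd.sh (hypothetical): handle one listing fragment handed over by xml-find.

process_listing() {
    file=$1

    # Skip listings that have no floor_plan element.
    grep -q '<floor_plan>' "$file" || return 0

    # Crude text extraction (assumption: one element per line, no attributes).
    id=$(sed -n 's|.*<listing_id>\(.*\)</listing_id>.*|\1|p' "$file")
    plan=$(sed -n 's|.*<floor_plan>\(.*\)</floor_plan>.*|\1|p' "$file")

    printf 'id %s\nfloor_plan %s\n' "$id" "$plan"
}

# Process the file only when a file name was given on the command line.
if [ "$#" -gt 0 ]; then
    process_listing "$1"
fi
```

Assuming -exec substitutes {-} here the same way it does for echo, the script could be run as

    xml-find page1.xml :/response/* -exec sh cmd.sh {-} ';'

and would print two lines of plain text for each listing that has a floor plan, and nothing for the others.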
> The workflow I would expect, based on the coreutils workflows I normally
> use, would be:
> cat source.xml | while read element; do
>   if echo $element | grep -q floor_plan ; then
>     echo $element >> target.xml
>   fi
> done
>
> It would be nice if I could use a complement of the above technologies for
> processing the XML. The below are imaginary pseudo-commands:
>
> xml-cat source.xml :/response/listing | while xml-read listing; do
>   # $listing is now an xml fragment of one listing element's full content
>   if echo $listing | xml-grep -q ://floor_plan; then
>     cat $element | xml-egrep "://listing|://listing/listing_id|://listing/floor_plan" | xml-insert target.xml :/listings/
>   fi
> done

Cheers,
Laird Breyer