Re: [Xml-coreutils-discuss] Where to start?
From: Douglas H. <do...@do...> - 2015-02-11 20:20:32
For posterity: both of the == comparisons in my program below are erroneous
and should be replaced with .equals(), because == on Strings compares object
references rather than contents.
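
To illustrate the difference, here is a minimal sketch. The class name
NodeNameCompare and the helper isNamed are made up for this example and are
not part of the program below. Whether == happens to work on real node names
depends on how the parser allocates its strings, which is not something to
rely on.

```java
final class NodeNameCompare {

    // Compares a node name by content, so it does not matter whether the
    // two String objects are the same instance. Also safe if nodeName is null.
    static boolean isNamed(String nodeName, String expected) {
        return expected.equals(nodeName);
    }

    public static void main(String[] args) {
        // A String created at runtime is a distinct object from the literal.
        String runtimeName = new String("floor_plan");

        // Reference comparison: prints "false"
        System.out.println(runtimeName == "floor_plan");

        // Content comparison: prints "true"
        System.out.println(isNamed(runtimeName, "floor_plan"));
    }
}
```

In the program below, the corrected checks would read
"listing_id".equals( ...getNodeName() ) and
"floor_plan".equals( ...getNodeName() ).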
On Wed, Feb 11, 2015 at 8:18 PM, Douglas Held <do...@do...> wrote:
> Maybe I am just criminally insane, but I could solve my problem with the
> following Java program. I would prefer however to learn to quickly use the
> xml-coreutils command line utilities...
>
> import java.io.File;
> import java.io.IOException;
> import java.io.OutputStream;
> import java.io.OutputStreamWriter;
>
> import javax.xml.parsers.DocumentBuilderFactory;
> import javax.xml.transform.OutputKeys;
> import javax.xml.transform.Transformer;
> import javax.xml.transform.TransformerException;
> import javax.xml.transform.TransformerFactory;
> import javax.xml.transform.dom.DOMSource;
> import javax.xml.transform.stream.StreamResult;
>
> import org.w3c.dom.Document;
> import org.w3c.dom.Element;
> import org.w3c.dom.Node;
> import org.w3c.dom.NodeList;
>
>
> public class Search {
>
> public static void main(String[] args) throws Exception {
> Document output =
> DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
> output.appendChild( output.createElement( "listings" ) );
> System.out.println( output.getDocumentElement().toString() );
>
> Document input =
> DocumentBuilderFactory.newInstance().newDocumentBuilder().parse( new File(
> "/Users/douglasheld/zoopla/listings.xml" ) );
> NodeList listings = input.getElementsByTagName( "listing" );
> for ( int i=0; i<listings.getLength(); i++ ){
> Node listing = listings.item( i );
> boolean insert = false;
> Element newListing = output.createElement( "listing" );
> for ( int j=0; j < listing.getChildNodes().getLength(); j++){
> if ( listing.getChildNodes().item(j).getNodeName() ==
> "listing_id" ){
> newListing.setAttribute("id",
> listing.getChildNodes().item(j).getFirstChild().getNodeValue() );
> }
> if ( listing.getChildNodes().item(j).getNodeName() ==
> "floor_plan" ){
> insert = true;
> Element newFloorPlan = output.createElement(
> "floor_plan" );
> newFloorPlan.setTextContent(
> listing.getChildNodes().item(j).getFirstChild().getNodeValue() );
> newListing.appendChild( newFloorPlan );
> }
> }
> if ( insert ){
> output.getDocumentElement().appendChild( newListing );
> System.err.print( '.' );
> }
> }
> printDocument( output, System.out );
> }
>
> /* copy/paste from
> http://stackoverflow.com/questions/2325388/java-shortest-way-to-pretty-print-to-stdout-a-org-w3c-dom-document
> */
> public static void printDocument(Document doc, OutputStream out)
> throws IOException, TransformerException {
> TransformerFactory tf = TransformerFactory.newInstance();
> Transformer transformer = tf.newTransformer();
> transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION,
> "no");
> transformer.setOutputProperty(OutputKeys.METHOD, "xml");
> transformer.setOutputProperty(OutputKeys.INDENT, "yes");
> transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
> transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
>
> transformer.transform(new DOMSource(doc),
> new StreamResult(new OutputStreamWriter(out, "UTF-8")));
> }
> }
>
>
> On Wed, Feb 11, 2015 at 6:59 PM, Douglas Held <do...@do...>
> wrote:
>
>> I need some help getting my thinking in the xml-coreutils world.
>>
>> I have some prepared data in the format:
>> <response>
>> <listing>
>> <foo>foo</foo>
>> <bar>bar</bar>
>> <floor_plan>http://example.com/123456</floor_plan>
>> <listing_id>1</listing_id>
>> </listing>
>> <listing>
>> <listing_id>2</listing_id>
>> <foo>foo</foo>
>> <bar>bar</bar>
>> </listing>
>> </response>
>>
>> Each /response/listing element has a listing_id child, and some, but not
>> all, listings have a floor_plan element.
>>
>> Goal:
>> I would like to extract only listings with floor plans, and only the
>> selected elements I am interested in, into a new document as:
>> <listings>
>> <listing>
>> <listing_id>1</listing_id>
>> <floor_plan>http://example.com/123456</floor_plan>
>> </listing>
>> </listings>
>>
>> I have tried commands such as:
>> # Create the target file
>> xml-echo -e "[listings@updated=20150210]" >listings.xml
>> # Copy selected elements into target
>> xml-cp page1.xml \
>> :/response/listing/listing_id[/response/listing/floor_plan != null] \
>> listings.xml :/listings/
>>
>> I have read all the man pages and experimented with many of the xml-*
>> commands. They very seldom work the way I am hoping.
>>
>> The workflow I would expect, based on the coreutils workflows I normally
>> use, would be:
>> cat source.xml | while read element; do
>> if echo $element | grep -q floor_plan ; then
>> echo $element >> target.xml
>> fi
>> done
>>
>> It would be nice if I could use a complement of the above technologies
>> for processing the XML. The below are imaginary pseudo-commands:
>>
>> xml-cat source.xml :/response/listing | while xml-read listing; do
>> # $listing is now an xml fragment of one listing element's full content
>> if echo "$listing" | xml-grep -q ://floor_plan; then
>> echo "$listing" \
>> | xml-egrep "://listing|://listing/listing_id|://listing/floor_plan" \
>> | xml-insert target.xml :/listings/
>> fi
>> done
>>
>> Perhaps by following my pseudo-logic, you can explain how I can carry out
>> these operations with xml-coreutils.
>>
>> I have attached one of my source files.
>>
>> Regards,
>> Doug
>> --
>> Douglas Held
>> do...@do...
>> +447775733093
>>
>
>
>
> --
> Douglas Held
> do...@do...
> +447775733093
>
--
Douglas Held
do...@do...
+447775733093
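
For reference, the filtering goal from the original question (keep only
listings that have a floor_plan, and only their listing_id and floor_plan)
can also be sketched with the JDK's built-in XPath support instead of nested
loops. This is only a sketch under assumptions: FloorPlanFilter and filter
are hypothetical names, the input is passed as a string rather than read from
a file, and the output uses child elements as in the stated goal rather than
the id attribute the Java program above produces.

```java
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class FloorPlanFilter {

    // Builds a <listings> document with one <listing> per input listing that
    // has a floor_plan child, keeping only listing_id and floor_plan.
    static String filter(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document input = builder.parse(new InputSource(new StringReader(xml)));
        Document output = builder.newDocument();
        Element root = output.createElement("listings");
        output.appendChild(root);

        XPath xpath = XPathFactory.newInstance().newXPath();
        // The predicate [floor_plan] keeps only listings with that child.
        NodeList matches = (NodeList) xpath.evaluate(
                "/response/listing[floor_plan]", input, XPathConstants.NODESET);

        for (int i = 0; i < matches.getLength(); i++) {
            Element source = (Element) matches.item(i);
            Element listing = output.createElement("listing");
            for (String name : new String[] {"listing_id", "floor_plan"}) {
                Element copy = output.createElement(name);
                // String value of the first child with this name
                // (empty if the listing has no such child).
                copy.setTextContent(xpath.evaluate(name, source));
                listing.appendChild(copy);
            }
            root.appendChild(listing);
        }

        // Serialize the result without an XML declaration.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(output), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Same shape as the sample data in the original question.
        String sample = "<response>"
                + "<listing><foo>foo</foo><bar>bar</bar>"
                + "<floor_plan>http://example.com/123456</floor_plan>"
                + "<listing_id>1</listing_id></listing>"
                + "<listing><listing_id>2</listing_id><foo>foo</foo></listing>"
                + "</response>";
        System.out.println(filter(sample));
    }
}
```

The selection happens entirely in the XPath predicate, which is roughly what
the xml-cp attempt in the original question was trying to express.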