Re: [Xml-coreutils-discuss] Where to start?
From: Douglas H. <do...@do...> - 2015-02-11 20:19:22
Maybe I am just criminally insane, but I could solve my problem with the
following Java program. However, I would prefer to learn how to use the
xml-coreutils command-line utilities quickly...
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
public class Search {
    public static void main(String[] args) throws Exception {
        // Build an empty output document with a <listings> root.
        Document output =
            DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        output.appendChild( output.createElement( "listings" ) );
        // Debug output goes to stderr so that stdout holds only the XML result.
        System.err.println( output.getDocumentElement().toString() );
        Document input =
            DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(
                new File( "/Users/douglasheld/zoopla/listings.xml" ) );
        NodeList listings = input.getElementsByTagName( "listing" );
        for ( int i = 0; i < listings.getLength(); i++ ) {
            NodeList children = listings.item( i ).getChildNodes();
            boolean insert = false;
            Element newListing = output.createElement( "listing" );
            for ( int j = 0; j < children.getLength(); j++ ) {
                Node child = children.item( j );
                // Compare strings with equals(); == only compares references.
                if ( "listing_id".equals( child.getNodeName() ) ) {
                    // getTextContent() also copes with an empty element.
                    newListing.setAttribute( "id", child.getTextContent() );
                }
                if ( "floor_plan".equals( child.getNodeName() ) ) {
                    insert = true;
                    Element newFloorPlan = output.createElement( "floor_plan" );
                    newFloorPlan.setTextContent( child.getTextContent() );
                    newListing.appendChild( newFloorPlan );
                }
            }
            // Keep only the listings that have a floor plan.
            if ( insert ) {
                output.getDocumentElement().appendChild( newListing );
                System.err.print( '.' );
            }
        }
        printDocument( output, System.out );
    }

    /* copy/paste from
    http://stackoverflow.com/questions/2325388/java-shortest-way-to-pretty-print-to-stdout-a-org-w3c-dom-document
    */
    public static void printDocument(Document doc, OutputStream out)
            throws IOException, TransformerException {
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer transformer = tf.newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
        transformer.setOutputProperty(OutputKeys.METHOD, "xml");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        transformer.setOutputProperty(
            "{http://xml.apache.org/xslt}indent-amount", "4");
        transformer.transform(new DOMSource(doc),
            new StreamResult(new OutputStreamWriter(out, "UTF-8")));
    }
}
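
For comparison, the same filter can probably be written more compactly with the JDK's built-in javax.xml.xpath API, selecting the listings with the XPath predicate /response/listing[floor_plan]. This is only a sketch under assumptions: the class name XPathSearch, the helper filter(), and the inline sample input are made up for illustration, and it writes listing_id as a child element (as in the stated goal) rather than as an id attribute.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathSearch {

    // Hypothetical helper: returns a <listings> document (as a string) that
    // holds only the listings which have a floor_plan child element.
    public static String filter(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document input = builder.parse(new InputSource(new StringReader(xml)));
        Document output = builder.newDocument();
        Element root = output.createElement("listings");
        output.appendChild(root);

        // The predicate [floor_plan] keeps listings with at least one such child.
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList matches = (NodeList) xpath.evaluate(
            "/response/listing[floor_plan]", input, XPathConstants.NODESET);

        for (int i = 0; i < matches.getLength(); i++) {
            Element source = (Element) matches.item(i);
            Element listing = output.createElement("listing");
            // Copy only the two elements of interest, in the desired order.
            for (String name : new String[] {"listing_id", "floor_plan"}) {
                Element copy = output.createElement(name);
                copy.setTextContent(xpath.evaluate(name, source));
                listing.appendChild(copy);
            }
            root.appendChild(listing);
        }

        // Serialize the result without an XML declaration.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(output), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Sample input modelled on the data format described in the quoted message.
        String sample = "<response>"
            + "<listing><foo>foo</foo><bar>bar</bar>"
            + "<floor_plan>http://example.com/123456</floor_plan>"
            + "<listing_id>1</listing_id></listing>"
            + "<listing><listing_id>2</listing_id><foo>foo</foo><bar>bar</bar></listing>"
            + "</response>";
        System.out.println(filter(sample));
    }
}
```

With the sample input, only listing 1 should survive, since listing 2 has no floor_plan element. Reading from a file instead would just mean passing a File to builder.parse(), as in the program above.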
On Wed, Feb 11, 2015 at 6:59 PM, Douglas Held <do...@do...> wrote:
> I need some help getting my thinking in the xml-coreutils world.
>
> I have some prepared data in the format:
> <response>
> <listing>
> <foo>foo</foo>
> <bar>bar</bar>
> <floor_plan>http://example.com/123456</floor_plan>
> <listing_id>1</listing_id>
> </listing>
> <listing>
> <listing_id>2</listing_id>
> <foo>foo</foo>
> <bar>bar</bar>
> </listing>
> </response>
>
> Each /response/listing element has a listing_id child, and some, but not
> all, listings have a floor_plan element.
>
> Goal:
> I would like to extract only listings with floor plans, and only the
> selected elements I am interested in, into a new document as:
> <listings>
> <listing>
> <listing_id>1</listing_id>
> <floor_plan>http://example.com/123456</floor_plan>
> </listing>
> </listings>
>
> I have tried commands such as:
> # Create the target file
> xml-echo -e "[listings@updated=20150210]" >listings.xml
> # Copy selected elements into target
> xml-cp page1.xml
> :/response/listing/listing_id[/response/listing/floor_plan != null]
> listings.xml :/listings/
>
> I have read all the man pages and experimented with many of the
> xml-* commands. They very seldom work the way I hope.
>
> The workflow I would expect, based on the coreutils workflows I normally
> use, would be:
> cat source.xml | while read element; do
> if echo $element | grep -q floor_plan ; then
> echo $element >> target.xml
> fi
> done
>
> It would be nice if I could use a complement of the above technologies for
> processing the XML. The below are imaginary pseudo-commands:
>
> xml-cat source.xml :/response/listing | while xml-read listing; do
> # $listing is now an xml fragment of one listing element's full content
> if echo $listing | xml-grep -q ://floor_plan; then
> echo $listing | xml-egrep
> "://listing|://listing/listing_id|://listing/floor_plan" | xml-insert
> target.xml :/listings/
> fi
> done
>
> Perhaps by following my pseudo-logic, you can explain how I can carry out
> these operations with xml-coreutils.
>
> I have attached one of my source files.
>
> Regards,
> Doug
> --
> Douglas Held
> do...@do...
> +447775733093
>
--
Douglas Held
do...@do...
+447775733093