From: Misha K. <mis...@gm...> - 2010-06-11 17:36:43
Thank you. Will try.

It looks like it does not work with XmlSlurper though :(
http://www.maclovin.de/2010/02/robust-html-parsing-the-groovy-way/

Thank you!
Misha

On Fri, 2010-06-11 at 11:45 -0500, Jacob Kjome wrote:
> Did you try using a org.cyberneko.html.parsers.DOMFragmentParser yet?
>
> http://nekohtml.sourceforge.net/usage.html
>
> Jake
>
>
> On Fri, 11 Jun 2010 11:32:40 -0500
> Misha Koshelev <mis...@gm...> wrote:
> > Thank you so much for a great product!
> >
> > I am trying to parse the following HTML fragment, and I would like to
> > get the same fragment as output (without HTML and BODY tags). Is this
> > possible? If so, how?
> >
> > Thank you
> > Misha
> >
> > p.s. I am reading here:
> > http://nekohtml.sourceforge.net/faq.html#fragments
> > and I believe I have added the correct options below. However, the
> > output is still incorrect :(
> >
> > Thank you
> > Misha
> >
> > #!/usr/bin/env groovy
> >
> > def text="""
> > <div><h2>Test</h2>
> > <div>Hi</div>
> > </div>
> > """
> >
> > // Parse
> > def config=new org.cyberneko.html.HTMLConfiguration()
> >
> > config.setFeature("http://cyberneko.org/html/features/balance-tags/document-fragment",true)
> > def html=new XmlSlurper(new
> > org.cyberneko.html.parsers.SAXParser()).parseText(text)
> >
> > // Output
> > import groovy.xml.MarkupBuilder
> > import groovy.xml.StreamingMarkupBuilder
> > import groovy.util.XmlNodePrinter
> > import groovy.util.slurpersupport.NodeChild
> >
> > def printNode(NodeChild node) {
> >     def writer = new StringWriter()
> >     writer << new StreamingMarkupBuilder().bind {
> >         mkp.declareNamespace('':node[0].namespaceURI())
> >         mkp.yield node
> >     }
> >     new XmlNodePrinter().print(new
> > XmlParser().parseText(writer.toString()))
> > }
> > printNode(html)
> >
> > Output:
> >
> > <HTML>
> >   <tag0:HEAD xmlns:tag0="http://www.w3.org/1999/xhtml"/>
> >   <BODY>
> >     <DIV>
> >       <H2>
> >         Test
> >       </H2>
> >       <DIV>
> >         Hi
> >       </DIV>
> >     </DIV>
> >   </BODY>
> > </HTML>
> > _______________________________________________
> > nekohtml-user mailing list
> > nek...@li...
> > https://lists.sourceforge.net/lists/listinfo/nekohtml-user
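
For readers of the archive, here is a rough sketch of the DOMFragmentParser route that Jake suggests, following the pattern on the NekoHTML usage page. Note that in the quoted script the `config` object is created but never handed to the `SAXParser`, so the document-fragment feature is probably never applied there. The sketch assumes the NekoHTML and Xerces jars are on the classpath (`HTMLDocumentImpl` comes from Xerces); the serialization through the JDK `Transformer` is my own choice for illustration and not something from the thread.

```groovy
import org.cyberneko.html.parsers.DOMFragmentParser
import org.apache.html.dom.HTMLDocumentImpl
import org.xml.sax.InputSource
import org.w3c.dom.Node
import javax.xml.transform.OutputKeys
import javax.xml.transform.TransformerFactory
import javax.xml.transform.dom.DOMSource
import javax.xml.transform.stream.StreamResult

def text = """
<div><h2>Test</h2>
<div>Hi</div>
</div>
"""

// Parse into a DocumentFragment rather than a full document,
// so that no HTML/HEAD/BODY wrapper elements are synthesized.
def parser = new DOMFragmentParser()
def document = new HTMLDocumentImpl()
def fragment = document.createDocumentFragment()
parser.parse(new InputSource(new StringReader(text)), fragment)

// Assumption: a plain JDK identity Transformer is good enough for output.
def transformer = TransformerFactory.newInstance().newTransformer()
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes")
transformer.setOutputProperty(OutputKeys.INDENT, "yes")

// Serialize each top-level element of the fragment; whitespace-only
// text nodes between elements are skipped.
def writer = new StringWriter()
for (Node child = fragment.firstChild; child != null; child = child.nextSibling) {
    if (child.nodeType == Node.ELEMENT_NODE) {
        transformer.transform(new DOMSource(child), new StreamResult(writer))
    }
}
println writer
```

Element names may still come back upper-cased, as in the output quoted above. This goes through the DOM API rather than XmlSlurper, so it sidesteps the XmlSlurper problem mentioned in the reply instead of solving it.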