Re: [Htmlparser-user] Can't extract any div's, redux.
From: Derrick O. <der...@gm...> - 2011-07-31 13:51:21
Using the FilterBuilder tool <http://htmlparser.sourceforge.net/samples.html> is a good way to play with filters. After a minute with it I got this code, which fetches your storybook text:

    import org.htmlparser.*;
    import org.htmlparser.filters.*;
    import org.htmlparser.beans.*;
    import org.htmlparser.util.*;

    public class StorytextFilter
    {
        public static void main (String args[])
        {
            // Match tags named DIV...
            TagNameFilter filter0 = new TagNameFilter ();
            filter0.setName ("DIV");
            // ...that carry the attribute id="storytext".
            HasAttributeFilter filter1 = new HasAttributeFilter ();
            filter1.setAttributeName ("id");
            filter1.setAttributeValue ("storytext");
            // Both conditions must hold for a node to be accepted.
            NodeFilter[] array0 = new NodeFilter[2];
            array0[0] = filter0;
            array0[1] = filter1;
            AndFilter filter2 = new AndFilter ();
            filter2.setPredicates (array0);
            NodeFilter[] array1 = new NodeFilter[1];
            array1[0] = filter2;
            FilterBean bean = new FilterBean ();
            bean.setFilters (array1);
            if (0 != args.length)
            {
                // Fetch the page and print the matching nodes as HTML.
                bean.setURL (args[0]);
                System.out.println (bean.getNodes ().toHtml ());
            }
            else
                System.out.println ("Usage: java -classpath .;htmlparser.jar;htmllexer.jar StorytextFilter <url>");
        }
    }

Then you can apply the StringBean to the NodeList using the visitor pattern.

2011/7/30 Jan Sokołowski <net...@gm...>
> Thanks for answering! However, I'm afraid it didn't help me much :(
>
> So, all I've changed in the code is the nodeFilter object (now
> constructed as new AndFilter(new TagNameFilter("div"), new
> HasAttributeFilter("storytext")); )
> Then, I do the
> for(NodeIterator e = parser.elements(); e.hasMoreNodes();){
>     e.nextNode().collectInto(nodeList, nodeFilter);
> }
>
> And according to nodeList.toNodeArray().length, there are no matching
> nodes.
>
> Therefore, I don't have anything to pass to anything you've said, not
> to mention I don't know, for example, what a StringBean is (that
> means, I've read the javadoc on your page, but I don't have the
> foggiest idea how to use it there) (And why couldn't I use the
> toPlainTextString() method?
> I'd like to get the inner HTML of the div
> without removing any tags there, which StringBean removes, as I've
> noticed, unless I've misunderstood it) :(
> I'd be very thankful if you could elaborate more on what I should do
> there to make it work, please.
>
> By the way, how do I respond to the posts on that mailing list? I
> can't find the response option anywhere.
>
> _______________________________________________
> Htmlparser-user mailing list
> Htm...@li...
> https://lists.sourceforge.net/lists/listinfo/htmlparser-user
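
A note on why the filter in the quoted message probably matched nothing, while the one in the reply does. The reply sets both an attribute name ("id") and an attribute value ("storytext"), whereas the quoted code passes only "storytext", which is presumably read as an attribute name. The sketch below models that difference with plain java.util.function.Predicate objects. It does not use the HTML Parser classes: Tag, tagNamed, hasAttribute and select are hypothetical stand-ins invented for illustration, and the three-tag "page" is made-up sample data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Predicate;

public class AttributeFilterSketch {

    /** Hypothetical stand-in for a parsed tag: a name plus its attributes. */
    static final class Tag {
        final String name;
        final Map<String, String> attributes = new HashMap<>();

        /** Attributes are given as alternating key, value strings. */
        Tag(String name, String... keyValuePairs) {
            this.name = name;
            for (int i = 0; i + 1 < keyValuePairs.length; i += 2) {
                attributes.put(keyValuePairs[i].toLowerCase(Locale.ROOT), keyValuePairs[i + 1]);
            }
        }
    }

    /** Accepts tags with the given name, ignoring case. */
    static Predicate<Tag> tagNamed(String name) {
        return tag -> tag.name.equalsIgnoreCase(name);
    }

    /** Accepts tags that carry an attribute with this NAME, whatever its value. */
    static Predicate<Tag> hasAttribute(String attributeName) {
        return tag -> tag.attributes.containsKey(attributeName.toLowerCase(Locale.ROOT));
    }

    /** Accepts tags whose named attribute has exactly this value. */
    static Predicate<Tag> hasAttribute(String attributeName, String value) {
        return tag -> value.equals(tag.attributes.get(attributeName.toLowerCase(Locale.ROOT)));
    }

    /** Collects every tag the filter accepts, in document order. */
    static List<Tag> select(List<Tag> tags, Predicate<Tag> filter) {
        List<Tag> matches = new ArrayList<>();
        for (Tag tag : tags) {
            if (filter.test(tag)) {
                matches.add(tag);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for a parsed page.
        List<Tag> page = Arrays.asList(
                new Tag("div", "id", "header"),
                new Tag("div", "id", "storytext"),
                new Tag("span", "id", "storytext"));

        // Shape of the quoted code: "storytext" is treated as an attribute name.
        Predicate<Tag> nameOnly = tagNamed("div").and(hasAttribute("storytext"));
        // Shape of the reply: attribute name "id" with value "storytext".
        Predicate<Tag> nameAndValue = tagNamed("DIV").and(hasAttribute("id", "storytext"));

        System.out.println(select(page, nameOnly).size());     // prints 0
        System.out.println(select(page, nameAndValue).size()); // prints 1
    }
}
```

If the library behaves the way this model assumes, giving the attribute filter both the name "id" and the value "storytext", as the reply's setAttributeName/setAttributeValue calls do, should be enough to make the original loop collect the div.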