Re: [Htmlparser-user] Problem with HTMLParser - I can't extract any div's.
From: Derrick O. <der...@gm...> - 2011-07-30 06:14:22
You could filter with

    new AndFilter (new TagNameFilter ("div"), new HasAttributeFilter ("id", "storytext"))

and then pass the resulting (single) node to the StringBean to extract the text:

    nodelist.visitAllNodesWith (stringbean)

After that, the contents of the string bean should be the text you're looking for.

2011/7/29 Jan Sokołowski <net...@gm...>

> I've got a small problem, and I'd like to ask for your help, please.
> I'm trying to use HTMLParser in my project, and this is where I'm stuck.
> Example page that I'm trying to process:
> http://www.fanfiction.net/s/7229512/1/A_Horse_With_No_Name
>
> Looking at the source code, there's a div with id and class
> 'storytext' within a div with id and class 'storytextp', and there are
> a lot of <p> tags within the 'storytext' div. I want to extract the
> contents of that 'storytext' div to a plain-text string.
> This is what I'm trying to do:
>
> NodeList nodeList = new NodeList();
> NodeFilter nodeFilter = new AndFilter(new TagNameFilter("div"),
>         new HasChildFilter(new TagNameFilter("p")));
>
> for (NodeIterator e = parser.elements(); e.hasMoreNodes();) {
>     e.nextNode().collectInto(nodeList, nodeFilter);
> }
>
> System.out.println(nodeList.toNodeArray().length);
>
> for (Node node : nodeList.toNodeArray()) {
>     System.out.println(node.toPlainTextString());
> }
>
> The result? The length of nodeList.toNodeArray() is zero, which
> means I'm getting something wrong. I also tried
> RegexFilter("storytext"), but that didn't work either.
> The question is, how should I do it?
> Please help, I've been trying to get this working for the past week :p
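For reference, the filter-plus-StringBean approach suggested in the reply could be put together roughly as follows. This is only a sketch, assuming the HTML Parser library (org.htmlparser, 1.6 or 2.x) is on the classpath and that the target div really carries id="storytext" as described in the question. The class name StoryTextExtractor, the method extractStoryText, and the sample HTML are made up for illustration and are not part of the library or of the original page.

```java
import org.htmlparser.NodeFilter;
import org.htmlparser.Parser;
import org.htmlparser.beans.StringBean;
import org.htmlparser.filters.AndFilter;
import org.htmlparser.filters.HasAttributeFilter;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

// Hypothetical helper class; the name is an assumption for this sketch.
final class StoryTextExtractor {

    // Returns the plain text found inside <div id="storytext">.
    // The result may be null or empty when no matching div is found.
    static String extractStoryText(String html) throws ParserException {
        // Parse from an in-memory string so the sketch needs no network access.
        Parser parser = Parser.createParser(html, "UTF-8");

        // Match div tags whose id attribute has the value "storytext".
        NodeFilter filter = new AndFilter(
                new TagNameFilter("div"),
                new HasAttributeFilter("id", "storytext"));

        // Walks the whole tree and collects every node accepted by the filter.
        NodeList matches = parser.extractAllNodesThatMatch(filter);

        // StringBean is a NodeVisitor that accumulates text as it visits nodes.
        StringBean bean = new StringBean();
        matches.visitAllNodesWith(bean);
        return bean.getStrings();
    }

    public static void main(String[] args) throws ParserException {
        // Made-up sample input mirroring the structure described in the question.
        String html = "<html><body><div id=\"storytextp\">"
                + "<div id=\"storytext\" class=\"storytext\">"
                + "<p>First paragraph.</p><p>Second paragraph.</p>"
                + "</div></div></body></html>";
        System.out.println(extractStoryText(html));
    }
}
```

To process the live page instead, the parser could presumably be constructed with new Parser(url) in place of Parser.createParser; everything after that stays the same.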