Re: [Htmlparser-user] Problem with HTMLParser - I can't extract any div's.
From: Derrick O. <der...@gm...> - 2011-07-30 06:14:22
You should maybe filter with
new AndFilter (new TagNameFilter ("div"), new HasAttributeFilter ("id", "storytext"))
(the one-argument HasAttributeFilter matches on the attribute *name*, so the
id value needs the two-argument form) and then pass the resulting (single)
node to a StringBean to extract the text:
nodelist.visitAllNodesWith (stringbean);
After that, stringbean.getStrings () should return the text you're looking
for.
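
For reference, the suggestion above could be put together roughly as follows. This is only a sketch, not tested against the actual page: the class name StoryTextExtractor, the method extractStoryText and the inline sample HTML are made up for illustration, and it assumes the div is matched on its id value with the two-argument HasAttributeFilter. It parses a string so that it runs without network access.

```java
import org.htmlparser.NodeFilter;
import org.htmlparser.Parser;
import org.htmlparser.beans.StringBean;
import org.htmlparser.filters.AndFilter;
import org.htmlparser.filters.HasAttributeFilter;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

// Hypothetical helper class, named for this example only.
public class StoryTextExtractor {

    // Returns the plain text of <div id="storytext">, or "" if no such div is found.
    public static String extractStoryText(String html) throws ParserException {
        Parser parser = Parser.createParser(html, "UTF-8");

        // Match only <div> tags whose id attribute equals "storytext".
        NodeFilter filter = new AndFilter(
                new TagNameFilter("div"),
                new HasAttributeFilter("id", "storytext"));
        NodeList matches = parser.extractAllNodesThatMatch(filter);
        if (matches.size() == 0) {
            return "";
        }

        // Use the StringBean as a visitor over the matched node(s);
        // it accumulates the text of everything it visits.
        StringBean bean = new StringBean();
        matches.visitAllNodesWith(bean);
        return bean.getStrings();
    }

    public static void main(String[] args) throws ParserException {
        // Made-up sample markup mirroring the structure described in the question.
        String html =
                "<html><body>"
              + "<div id=\"storytextp\" class=\"storytextp\">"
              + "<div id=\"storytext\" class=\"storytext\">"
              + "<p>First paragraph.</p><p>Second paragraph.</p>"
              + "</div></div>"
              + "<div id=\"footer\"><p>Footer text</p></div>"
              + "</body></html>";
        System.out.println(extractStoryText(html));
    }
}
```

For the real page, the parser would be created from the URL instead (new Parser (url)); the filter and the StringBean part stay the same.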
2011/7/29 Jan Sokołowski <net...@gm...>
> I've got a small problem, and I'd like to ask for your help, please.
> I'm trying to use HTMLParser in my project, and here's the problem.
> Example page that I'm trying to process:
> http://www.fanfiction.net/s/7229512/1/A_Horse_With_No_Name
>
> Looking at the source code, there's a div with id and class
> 'storytext' within a div with id and class 'storytextp', and there's a
> lot of <p> tags within the 'storytext' div. I want to extract the
> contents of that 'storytext' div to a plain-text string.
> This is what I'm trying to do:
>     NodeList nodeList = new NodeList();
>     NodeFilter nodeFilter = new AndFilter(
>         new TagNameFilter("div"), new HasChildFilter(new TagNameFilter("p")));
>
>     for (NodeIterator e = parser.elements(); e.hasMoreNodes();) {
>         e.nextNode().collectInto(nodeList, nodeFilter);
>     }
>
>     System.out.println(nodeList.toNodeArray().length);
>
>     for (Node node : nodeList.toNodeArray()) {
>         System.out.println(node.toPlainTextString());
>     }
>
> The result? The length of nodeList.toNodeArray() is zero.
> So I must be doing something wrong there. I also tried
> using RegexFilter("storytext"), but that didn't work either.
> The question is, how should I do it?
> Please help, I've been trying to get this running for the past week :p
>
> _______________________________________________
> Htmlparser-user mailing list
> Htm...@li...
> https://lists.sourceforge.net/lists/listinfo/htmlparser-user
>