Re: [Htmlparser-user] Can't extract any div's, redux.


The FilterBuilder tool <http://htmlparser.sourceforge.net/samples.html>
is a good way to play with filters.
After a minute with it I got this code, which fetches your storybook text:

import org.htmlparser.*;
import org.htmlparser.filters.*;
import org.htmlparser.beans.*;
import org.htmlparser.util.*;

public class StorytextFilter
{
    public static void main (String args[])
    {
        TagNameFilter filter0 = new TagNameFilter ();
        filter0.setName ("DIV");
        HasAttributeFilter filter1 = new HasAttributeFilter ();
        filter1.setAttributeName ("id");
        filter1.setAttributeValue ("storytext");
        NodeFilter[] array0 = new NodeFilter[2];
        array0[0] = filter0;
        array0[1] = filter1;
        AndFilter filter2 = new AndFilter ();
        filter2.setPredicates (array0);
        NodeFilter[] array1 = new NodeFilter[1];
        array1[0] = filter2;
        FilterBean bean = new FilterBean ();
        bean.setFilters (array1);
        if (0 != args.length)
        {
            bean.setURL (args[0]);
            System.out.println (bean.getNodes ().toHtml ());
        }
        else
            System.out.println ("Usage: java -classpath "
                + ".;htmlparser.jar;htmllexer.jar StorytextFilter <url>");
    }
}

Then you can apply the StringBean to the resulting NodeList using the
visitor pattern (StringBean is a NodeVisitor, so it can be passed to
NodeList.visitAllNodesWith).
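
Since you asked for the inner HTML of the div with its tags left in, here
is a rough sketch of a shorter route that skips the visitor entirely. It
is only an illustration, not the one true way to do it: the class name
StorytextInnerHtml and the helper innerHtml are made up for this example,
and it parses an HTML string rather than a URL so it can be tried without
a network connection. Note the two-argument HasAttributeFilter ("id",
"storytext"); as far as I can tell, the one-argument form looks for an
attribute *named* storytext, which would explain an empty node list.

```java
import org.htmlparser.NodeFilter;
import org.htmlparser.Parser;
import org.htmlparser.filters.AndFilter;
import org.htmlparser.filters.HasAttributeFilter;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

public class StorytextInnerHtml
{
    // Hypothetical helper (not part of htmlparser): returns the inner HTML
    // of every <div> whose id attribute equals the given value.
    public static String innerHtml (String html, String id)
    {
        // Same filter as the FilterBuilder output, built with constructors:
        // tag name DIV AND attribute id == <id>.
        NodeFilter filter = new AndFilter (
            new TagNameFilter ("div"),
            new HasAttributeFilter ("id", id));
        try
        {
            Parser parser = Parser.createParser (html, "UTF-8");
            // Walks the whole tree, so nested divs are found too.
            NodeList matches = parser.extractAllNodesThatMatch (filter);
            StringBuilder out = new StringBuilder ();
            for (int i = 0; i < matches.size (); i++)
            {
                // The children of the div are its inner content;
                // toHtml () keeps the markup, unlike toPlainTextString ().
                NodeList children = matches.elementAt (i).getChildren ();
                if (null != children)
                    out.append (children.toHtml ());
            }
            return out.toString ();
        }
        catch (ParserException e)
        {
            // Rethrown unchecked only to keep the sketch short.
            throw new IllegalStateException (e);
        }
    }

    public static void main (String[] args)
    {
        // Made-up sample page, standing in for the real story page.
        String page = "<html><body>"
            + "<div id=\"storytext\"><p>Once <b>upon</b> a time.</p></div>"
            + "<div id=\"other\">skip me</div>"
            + "</body></html>";
        System.out.println (innerHtml (page, "storytext"));
    }
}
```

To run it against a live page instead, the same filter can be handed to a
Parser constructed from the URL; everything after the
extractAllNodesThatMatch call stays the same.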

2011/7/30 Jan Sokołowski <net...@gm...>

> Thanks for answering! However, I'm afraid it didn't help me much :(
>
> So, all I've changed in the code is the nodeFilter object ( now
> constructed as new AndFilter(new TagNameFilter("div"),new
> HasAttributeFilter("storytext")); )
> Then, I do the
> for(NodeIterator e = parser.elements(); e.hasMoreNodes();){
>                e.nextNode().collectInto(nodeList, nodeFilter);
>            }
>
> And according to nodeList.toNodeArray().length, there are no matching
> nodes.
>
> Therefore, I don't have anything to pass to anything you've said, not
> to mention I don't know, for example, what a StringBean is (that
> means, I've read the javadoc on your page, but I don't have the
> foggiest idea how to use it there) (And why couldn't I use the
> toPlainTextString() method? I'd like to get the inner HTML of div
> without removing any tags there, which StringBean removes, as I've
> noticed, unless I've misunderstood it) :(
> I'd be very thankful if you could elaborate on what I should do
> there to make it work, please.
>
> By the way, how do I respond to the posts on that mailing list? I
> can't find the response option anywhere?