Re: [Htmlparser-user] Could you help me?

Sorry, replied without thinking.
You can apply the StringBean directly to a node list:

Parser parser = new Parser ("http://yadda.yadda");
NodeList list = parser.parse (my_spiffo_DIV_finding_filter);
Div div = (Div)list.elementAt (0); // elementAt() returns a Node, so a cast is needed
StringBean bean = new StringBean ();
div.getChildren ().visitAllNodesWith (bean);
System.out.println (bean.getStrings ());
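
For readers who want to see why a single visitor pass is enough, here is a minimal JDK-only sketch of the idea. It does not use the HTML Parser library: Node, Visitor, walk, extract and sample are hypothetical stand-ins invented for illustration, with Visitor playing the role StringBean plays above. It assumes the page is already parsed into a tree, and it also picks up the value attribute of the <input> tag, which is not text content and so has to be read as an attribute.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class OnePassExtract {

    // Hypothetical stand-in for a parsed node; NOT an HTML Parser class.
    static class Node {
        final String tag;   // tag name, or null for a text node
        final String text;  // text content, or null for a tag node
        final Map<String, String> attrs = new HashMap<>();
        final List<Node> children = new ArrayList<>();

        private Node(String tag, String text) {
            this.tag = tag;
            this.text = text;
        }

        static Node text(String s) {
            return new Node(null, s);
        }

        static Node tag(String name, Node... kids) {
            Node n = new Node(name, null);
            n.children.addAll(Arrays.asList(kids));
            return n;
        }

        Node attr(String key, String value) {
            attrs.put(key, value);
            return this;
        }
    }

    // Hypothetical visitor interface, in the role StringBean plays above.
    interface Visitor {
        void visit(Node n);
    }

    // Depth-first, document-order walk: the node first, then its children.
    static void walk(Node n, Visitor v) {
        v.visit(n);
        for (Node child : n.children) {
            walk(child, v);
        }
    }

    // One traversal collects non-blank text and every <input> value.
    static List<String> extract(Node root) {
        List<String> out = new ArrayList<>();
        walk(root, n -> {
            if (n.text != null && !n.text.trim().isEmpty()) {
                out.add(n.text.trim());
            } else if ("input".equals(n.tag) && n.attrs.containsKey("value")) {
                out.add(n.attrs.get("value"));
            }
        });
        return out;
    }

    // A cut-down version of the markup from the original question.
    static Node sample() {
        return Node.tag("body",
            Node.tag("div",
                Node.text("add by:"),
                Node.tag("span", Node.text("2006.07.27 - 01:22")),
                Node.tag("a", Node.tag("u", Node.text("test_a"))),
                Node.tag("a", Node.tag("u", Node.text("test_b")))
            ).attr("id", "video_infobox_con"),
            Node.tag("input").attr("value", "value_test"));
    }

    public static void main(String[] args) {
        System.out.println(extract(sample()));
    }
}
```

Passing only the div subtree to extract() instead of the whole tree would restrict the result to that tag and its children, which is the effect of visiting div.getChildren () in the snippet above.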

Derrick

Derrick Oswald wrote:

>Jesse,
>
>The job breaks down into two tasks:
>  1) get the outermost tag (your <div id="video_infobox_con"> tag) using 
>a filter you construct.
>  2) use a StringBean as a visitor on that node and its children to 
>extract the text, like so:
>
>Parser parser = new Parser ("http://yadda.yadda");
>NodeList list = parser.parse (my_spiffo_DIV_finding_filter);
>Div div = (Div)list.elementAt (0);
>// now re-create the HTML and pass it into another Parser
>// Note: for older versions you need to use setInputHtml()
>Parser parser2 = new Parser (div.toHtml ());
>StringBean bean = new StringBean ();
>parser2.visitAllNodesWith (bean);
>System.out.println (bean.getStrings ());
>
>Derrick
>
>h pq wrote:
>
>  
>
>>Hi all, I have a question about parsing HTML content. The content
>>contains many tags. If I want the text of a single kind of tag, such
>>as LinkTag or TableTag, it's easy to use LinkRegexFilter or
>>TagNameFilter. But if I want the content of more than one kind of
>>tag, is there a filter chain? Maybe the following example will
>>explain what I mean more directly:
>> 
>> <div id="video_infobox_con">
>>    ·add by:<span class="fcolor_03">2006.07.27 - 01:22</span><br />
>>    ·Label: 
>>                 <a href="search.do?q=%B0%CD%B6%FB%C4%E1%D1%C7%C4%E1" 
>>class="lnk_04" target=_self><u>test_a</u></a>              
>>              
>>                 <a href="search.do?q=%D7%B4%D4%AA%D0%E3" 
>>class="lnk_04" target=_self><u>test_b</u></a>              
>>              
>>                 <a href=" search.do?q=%C0%BA%C7%F2" class="lnk_04" 
>>target=_self><u>test_c</u></a>              
>>              
>>                 <a href="search.do?q=%CC%E5%D3%FD" class="lnk_04" 
>>target=_self><u>test_d</u></a>              
>>              
>> </div>
>><input type="text" id="htmlurl" name="htmlurl" value='value_test'  />
>> 
>>There are four kinds of tags (div, span, a, input), and I need the
>>content of all of them: 2006.07.27 - 01:22, test_a, test_b, test_c,
>>test_d and value_test.
>>What should I do? I could parse the HTML four times to get the
>>content of the four tags, but I think that would hurt performance.
>>Could you help me? Thank you very much.
>> 
>>Best Regards
>>Jesse
>> 
>>_______________________________________________
>>Htmlparser-user mailing list
>>Htm...@li...
>>https://lists.sourceforge.net/lists/listinfo/htmlparser-user