Re: [Htmlparser-user] Could you help me?
From: Derrick O. <Der...@Ro...> - 2006-07-31 04:47:16
Jesse,

The job breaks down into two tasks:

1) Get the outermost tag (your <div id="video_infobox_con"> tag) using a filter you construct.
2) Use a StringBean as a visitor on that node and its children to extract the text, like so:

    Parser parser = new Parser ("http://yadda.yadda");
    NodeList list = parser.parse (my_spiffo_DIV_finding_filter);
    // elementAt() returns a Node, so a cast is needed
    Div div = (Div)list.elementAt (0);
    // now re-create the HTML and pass it into another Parser
    // Note: for older versions you need to use setInputHtml()
    Parser divParser = new Parser (div.toHtml ());
    StringBean bean = new StringBean ();
    divParser.visitAllNodesWith (bean);
    System.out.println (bean.getStrings ());

Derrick

h pq wrote:
> Hi all, I have a question about parsing HTML content. In the
> HTML content there are many tags. If I want to get the text of one kind
> of tag, like LinkTag or TableTag, it's very easy to use the LinkRegexFilter
> or TagNameFilter, but if I want to get more than one tag's content, is
> there a filter chain? Maybe the following example will explain what
> I mean more directly:
>
> <div id="video_infobox_con">
> ·add by:<span class="fcolor_03">2006.07.27 - 01:22</span><br />
> ·Label:
> <a href="search.do?q=%B0%CD%B6%FB%C4%E1%D1%C7%C4%E1"
> class="lnk_04" target=_self><u>test_a</u></a>
>
> <a href="search.do?q=%D7%B4%D4%AA%D0%E3"
> class="lnk_04" target=_self><u>test_b</u></a>
>
> <a href=" search.do?q=%C0%BA%C7%F2" class="lnk_04"
> target=_self><u>test_c</u></a>
>
> <a href="search.do?q=%CC%E5%D3%FD" class="lnk_04"
> target=_self><u>test_d</u></a>
>
> </div>
> <input type="text" id="htmlurl" name="htmlurl" value='value_test' />
>
> There are four kinds of tags (div, span, a, input), and I need all the
> content in them: 2006.07.27 - 01:22, test_a, test_b, test_c, test_d
> and value_test.
> What should I do? Maybe I can parse the HTML 4 times to get the
> four tags' content, but I think that will hurt performance. Could
> you help me? Thank you very much.
>
> Best Regards
> Jesse
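
On the performance worry in the question (parsing the page four times), here is a minimal sketch of the single-pass idea. It deliberately does not use HTMLParser: so that it runs with nothing but a JDK, it swaps in the JDK's own javax.swing.text.html parser and its ParserCallback. With HTMLParser, the analogous hook would presumably be a NodeVisitor passed to visitAllNodesWith(), as in the reply above. The class name SinglePassExtract, the extract() helper and the trimmed SAMPLE markup are hypothetical, chosen only for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: one pass over the markup collects the text inside the
// target div and the value attribute of every input tag.
class SinglePassExtract {

    // Trimmed version of the markup from the question (assumption: two links are enough).
    static final String SAMPLE =
        "<div id=\"video_infobox_con\">\n"
        + "add by:<span class=\"fcolor_03\">2006.07.27 - 01:22</span><br />\n"
        + "Label:\n"
        + "<a href=\"search.do?q=%D7%B4%D4%AA%D0%E3\" class=\"lnk_04\"><u>test_b</u></a>\n"
        + "<a href=\"search.do?q=%CC%E5%D3%FD\" class=\"lnk_04\"><u>test_d</u></a>\n"
        + "</div>\n"
        + "<input type=\"text\" id=\"htmlurl\" name=\"htmlurl\" value='value_test' />\n";

    static List<String> extract(String html, final String divId) {
        final List<String> found = new ArrayList<>();
        HTMLEditorKit.ParserCallback collector = new HTMLEditorKit.ParserCallback() {
            private int depth = 0; // greater than 0 while inside the target div

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.DIV) {
                    if (depth > 0) {
                        depth++; // nested div inside the target
                    } else if (divId.equals(attrs.getAttribute(HTML.Attribute.ID))) {
                        depth = 1; // entered the target div
                    }
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.DIV && depth > 0) {
                    depth--;
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                String text = new String(data).trim();
                if (depth > 0 && !text.isEmpty()) {
                    found.add(text); // text node somewhere below the target div
                }
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.INPUT) {
                    Object value = attrs.getAttribute(HTML.Attribute.VALUE);
                    if (value != null) {
                        found.add(value.toString());
                    }
                }
            }
        };
        try {
            new ParserDelegator().parse(new StringReader(html), collector, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        for (String s : extract(SAMPLE, "video_infobox_con")) {
            System.out.println(s);
        }
    }
}
```

The depth counter is what keeps this to one pass: text is recorded only while the parser is inside the target div, and the input value is picked up as its tag goes by, so the document never has to be re-parsed per tag type.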