Re: [Htmlparser-user] Could you help me?
From: Derrick O. <Der...@Ro...> - 2006-07-31 04:52:07
Sorry, replied without thinking. You can apply the StringBean directly to a node list:

Parser parser = new Parser ("http://yadda.yadda");
NodeList list = parser.parse (my_spiffo_DIV_finding_filter);
Div div = (Div)list.elementAt (0);
StringBean bean = new StringBean ();
div.getChildren ().visitAllNodesWith (bean);
System.out.println (bean.getStrings ());

Derrick

Derrick Oswald wrote:
>Jesse,
>
>The job breaks down into two tasks:
> 1) get the outermost tag (your <div id="video_infobox_con"> tag) using
>a filter you construct.
> 2) use a StringBean as a visitor on that node and its children to
>extract the text, like so:
>
>Parser parser = new Parser ("http://yadda.yadda");
>NodeList list = parser.parse (my_spiffo_DIV_finding_filter);
>Div div = (Div)list.elementAt (0);
>// now re-create the HTML and pass it into another Parser
>parser = new Parser (div.toHtml ()); // Note: for older versions
>// you need to use setInputHTML()
>StringBean bean = new StringBean ();
>parser.visitAllNodesWith (bean);
>System.out.println (bean.getStrings ());
>
>Derrick
>
>h pq wrote:
>
>>Hi all, I have a question about parsing HTML content. There are many
>>tags in the HTML. If I want to get the text of one tag type, like
>>LinkTag or TableTag, it's very easy to use the LinkRegexFilter or
>>TagNameFilter, but if I want to get the content of more than one tag, is
>>there a filter chain?
>>Maybe the following example will explain more directly what
>>I mean:
>>
>> <div id="video_infobox_con">
>> ·add by:<span class="fcolor_03">2006.07.27 - 01:22</span><br />
>> ·Label:
>> <a href="search.do?q=%B0%CD%B6%FB%C4%E1%D1%C7%C4%E1"
>>class="lnk_04" target=_self><u>test_a</u></a>
>>
>> <a href="search.do?q=%D7%B4%D4%AA%D0%E3"
>>class="lnk_04" target=_self><u>test_b</u></a>
>>
>> <a href=" search.do?q=%C0%BA%C7%F2" class="lnk_04"
>>target=_self><u>test_c</u></a>
>>
>> <a href="search.do?q=%CC%E5%D3%FD" class="lnk_04"
>>target=_self><u>test_d</u></a>
>>
>> </div>
>><input type="text" id="htmlurl" name="htmlurl" value='value_test' />
>>
>>There are four kinds of tags here (div, span, a and input), and what I
>>need is the content in them: 2006.07.27 - 01:22, test_a, test_b,
>>test_c, test_d and value_test.
>>How should I do this? Maybe I could parse the HTML four times to get the
>>four tags' content, but I think that would hurt performance. Could
>>you help me? Thank you very much.
>>
>>Best Regards
>>Jesse
>>
>>------------------------------------------------------------------------
>>Take Surveys. Earn Cash. Influence the Future of IT
>>Join SourceForge.net's Techsay panel and you'll get the chance to share your
>>opinions on IT & business topics through brief surveys -- and earn cash
>>http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
>>------------------------------------------------------------------------
>>_______________________________________________
>>Htmlparser-user mailing list
>>Htm...@li...
>>https://lists.sourceforge.net/lists/listinfo/htmlparser-user
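On the filter-chain question itself: htmlparser's org.htmlparser.filters package includes an OrFilter, so an OrFilter over TagNameFilters for SPAN, A and INPUT should let one parse pick up all of those tags at once. Note that value_test is an attribute of an input that sits outside the div, so a StringBean run over the div alone would not reach it. The sketch below only illustrates the one-pass idea and is not htmlparser code: to stay runnable on the plain JDK it uses a hypothetical hand-built Element tree (no actual HTML parsing) and java.util.function.Predicate.or in place of OrFilter. The names FilterChainSketch, Element, tag, WANTED and extract are all made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class FilterChainSketch {

    // Hypothetical, minimal element model standing in for a parsed document.
    // It is NOT the htmlparser API; it only lets the sketch run on the plain JDK.
    static final class Element {
        final String name;                // tag name, lower case
        final Map<String, String> attrs;  // attribute name -> value
        final String text;                // text directly inside this element
        final List<Element> children;

        Element(String name, Map<String, String> attrs, String text, Element... children) {
            this.name = name;
            this.attrs = attrs;
            this.text = text;
            this.children = Arrays.asList(children);
        }

        // Own text followed by all descendant text (a "plain text" view).
        String allText() {
            StringBuilder sb = new StringBuilder(text);
            for (Element child : children) {
                sb.append(child.allText());
            }
            return sb.toString();
        }
    }

    // One predicate per tag name, playing the role of a TagNameFilter.
    static Predicate<Element> tag(String name) {
        return e -> e.name.equals(name);
    }

    // The "filter chain": one predicate that accepts any of the wanted tags.
    static final Predicate<Element> WANTED = tag("span").or(tag("a")).or(tag("input"));

    // Walks the whole tree exactly once, collecting one value per matching element.
    static List<String> extract(Element root) {
        List<String> out = new ArrayList<>();
        walk(root, out);
        return out;
    }

    private static void walk(Element e, List<String> out) {
        if (WANTED.test(e)) {
            // <input> carries its data in the value attribute; the others in their text.
            out.add(e.name.equals("input") ? e.attrs.get("value") : e.allText().trim());
        }
        for (Element child : e.children) {
            walk(child, out);
        }
    }

    // Hand-built tree mirroring the relevant parts of Jesse's sample HTML (no real parsing).
    static Element sampleDocument() {
        Map<String, String> none = Collections.emptyMap();
        Element div = new Element("div", Collections.singletonMap("id", "video_infobox_con"), "",
                new Element("span", Collections.singletonMap("class", "fcolor_03"), "2006.07.27 - 01:22"),
                new Element("a", none, "", new Element("u", none, "test_a")),
                new Element("a", none, "", new Element("u", none, "test_b")),
                new Element("a", none, "", new Element("u", none, "test_c")),
                new Element("a", none, "", new Element("u", none, "test_d")));
        Element input = new Element("input", Collections.singletonMap("value", "value_test"), "");
        return new Element("body", none, "", div, input);
    }

    public static void main(String[] args) {
        // Prints one extracted value per line, in document order.
        for (String value : extract(sampleDocument())) {
            System.out.println(value);
        }
    }
}
```

Running main prints the six values Jesse listed, one per line, from a single walk over the tree. With htmlparser proper, the equivalent would be to hand the combined filter to one parse and then read either the text or the value attribute of each matched node; those library calls are worth checking against the htmlparser filters javadoc before relying on them.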