[Htmlparser-user] Could you help me?
From: h p. <hp...@gm...> - 2006-07-31 03:35:57
Hi all,

I have a question about parsing HTML content. The content contains many tags. If I want the text of a single tag type, such as LinkTag or TableTag, it is very easy to use LinkRegexFilter or TagNameFilter. But if I want the content of more than one tag type, is there a filter chain?

Maybe the following example will explain what I mean more directly:

    <div id="video_infobox_con">
    ·add by:<span class="fcolor_03">2006.07.27 - 01:22</span><br />
    ·Label:
    <a href="search.do?q=%B0%CD%B6%FB%C4%E1%D1%C7%C4%E1" class="lnk_04" target=_self><u>test_a</u></a>
    <a href="search.do?q=%D7%B4%D4%AA%D0%E3" class="lnk_04" target=_self><u>test_b</u></a>
    <a href="search.do?q=%C0%BA%C7%F2" class="lnk_04" target=_self><u>test_c</u></a>
    <a href="search.do?q=%CC%E5%D3%FD" class="lnk_04" target=_self><u>test_d</u></a>
    </div>
    <input type="text" id="htmlurl" name="htmlurl" value='value_test' />

There are four tags here (div, span, a, input), and I need the content of all of them: 2006.07.27 - 01:22, test_a, test_b, test_c, test_d and value_test.

How should I do this? Maybe I could parse the HTML four times, once per tag, but I think that would hurt performance.

Could you help me? Thank you very much.

Best Regards
Jesse
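
The "filter chain" asked about above can be pictured as several single-tag filters OR-ed into one, so that a single pass over the document accepts any of the wanted tags. HTMLParser's own org.htmlparser.filters package has an OrFilter for combining filters this way. The sketch below does not use HTMLParser at all: it is a plain-JDK illustration of the idea, with a crude regex tokenizer standing in for a real parser. The class name MultiTagSketch and the helpers tagName and extract are hypothetical, and the sketch assumes that the link text sits in the `<u>` element, as in the sample, which is why it filters on `u` rather than `a`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: a regex stand-in for a real HTML parser.
class MultiTagSketch {
    // An opening tag, its raw attribute string, and the text directly after it.
    private static final Pattern OPEN_TAG =
        Pattern.compile("<([a-zA-Z][a-zA-Z0-9]*)\\b([^>]*)>([^<]*)");
    // A quoted value="..." or value='...' attribute.
    private static final Pattern VALUE_ATTR =
        Pattern.compile("\\bvalue\\s*=\\s*['\"]([^'\"]*)['\"]");

    // Hypothetical helper: a filter that accepts exactly one tag name.
    public static Predicate<String> tagName(String name) {
        return name::equalsIgnoreCase;
    }

    // Scans the input once and collects content for every tag the filter accepts.
    public static List<String> extract(String html, Predicate<String> filter) {
        List<String> out = new ArrayList<>();
        Matcher m = OPEN_TAG.matcher(html);
        while (m.find()) {
            String tag = m.group(1);
            if (!filter.test(tag)) {
                continue;
            }
            if (tag.equalsIgnoreCase("input")) {
                // For <input>, the wanted content is the value attribute.
                Matcher v = VALUE_ATTR.matcher(m.group(2));
                if (v.find()) {
                    out.add(v.group(1));
                }
            } else {
                // For other tags, take the text that directly follows the tag.
                String text = m.group(3).trim();
                if (!text.isEmpty()) {
                    out.add(text);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String html =
            "<div id=\"video_infobox_con\">add by:"
            + "<span class=\"fcolor_03\">2006.07.27 - 01:22</span><br />"
            + "<a href=\"search.do?q=a\" class=\"lnk_04\" target=_self><u>test_a</u></a>"
            + "<a href=\"search.do?q=b\" class=\"lnk_04\" target=_self><u>test_b</u></a>"
            + "</div>"
            + "<input type=\"text\" id=\"htmlurl\" name=\"htmlurl\" value='value_test' />";

        // The "chain": three single-tag filters combined with OR.
        Predicate<String> wanted =
            tagName("span").or(tagName("u")).or(tagName("input"));

        // prints [2006.07.27 - 01:22, test_a, test_b, value_test]
        System.out.println(extract(html, wanted));
    }
}
```

The point of the design is that the combined predicate is evaluated per tag during one scan, so adding another tag type costs one more `.or(...)` rather than another pass over the document. Regex tokenizing is fragile on real-world HTML, so this is only meant to show the shape of the approach.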