[Htmlparser-user] Could you help me?
From: h p. <hp...@gm...> - 2006-07-31 03:35:57
Hi all, I have a question about parsing HTML content. The content
contains many tags. Getting the text of one kind of tag, such as LinkTag or
TableTag, is easy with LinkRegexFilter or TagNameFilter, but if I want the
content of more than one kind of tag, is there a filter chain?
The following example should show what I mean:
<div id="video_infobox_con">
·add by:<span class="fcolor_03">2006.07.27 - 01:22</span><br />
·Label:
<a href="search.do?q=%B0%CD%B6%FB%C4%E1%D1%C7%C4%E1"
class="lnk_04" target=_self><u>test_a</u></a>
<a href="search.do?q=%D7%B4%D4%AA%D0%E3" class="lnk_04"
target=_self><u>test_b</u></a>
<a href="search.do?q=%C0%BA%C7%F2" class="lnk_04"
target=_self><u>test_c</u></a>
<a href="search.do?q=%CC%E5%D3%FD" class="lnk_04"
target=_self><u>test_d</u></a>
</div>
<input type="text" id="htmlurl" name="htmlurl" value='value_test' />
There are four kinds of tags here (div, span, a, and input), and I need the
content of all of them: 2006.07.27 - 01:22, test_a, test_b, test_c,
test_d, and value_test.
What should I do? I could parse the HTML four times, once per kind of
tag, but I think that would hurt performance. Could you help me?
Thank you very much.
Best Regards
Jesse
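
A note on the single-pass idea raised above: HTMLParser ships an OrFilter class in org.htmlparser.filters that is meant for combining several filters, so it is probably the place to start (check its Javadoc for the exact constructors). The sketch below does not use the library. It only models the same idea with plain java.util.function.Predicate, so it runs with the JDK alone. The Tag and SinglePassFilter classes and the anyOf and collect helpers are hypothetical names made up for illustration, and the tag list is typed in by hand rather than parsed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for a parsed tag; this is NOT an HTMLParser class.
final class Tag {
    final String name; // tag name, e.g. "span"
    final String text; // inner text, or the value attribute for <input>

    Tag(String name, String text) {
        this.name = name;
        this.text = text;
    }
}

final class SinglePassFilter {
    // Builds one predicate that accepts a tag whose name matches ANY of the
    // given names (the "or" combination the question asks about).
    static Predicate<Tag> anyOf(String... names) {
        Predicate<Tag> combined = t -> false;
        for (String n : names) {
            combined = combined.or(t -> t.name.equalsIgnoreCase(n));
        }
        return combined;
    }

    // Walks the tags once and collects the text of every accepted tag.
    static List<String> collect(List<Tag> tags, Predicate<Tag> filter) {
        List<String> out = new ArrayList<>();
        for (Tag t : tags) {
            if (filter.test(t)) {
                out.add(t.text);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hand-written model of the sample markup from the message above.
        List<Tag> tags = Arrays.asList(
            new Tag("span", "2006.07.27 - 01:22"),
            new Tag("br", ""),
            new Tag("a", "test_a"),
            new Tag("a", "test_b"),
            new Tag("a", "test_c"),
            new Tag("a", "test_d"),
            new Tag("input", "value_test"));

        // One pass, three tag names.
        System.out.println(collect(tags, anyOf("span", "a", "input")));
        // prints [2006.07.27 - 01:22, test_a, test_b, test_c, test_d, value_test]
    }
}
```

The point of the design is that the document is traversed once and every node is tested against one combined predicate, instead of re-parsing once per tag name.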