Re: [Htmlparser-user] Could you help me?
From: Derrick O. <Der...@Ro...> - 2006-07-31 04:47:16
Jesse,
The job breaks down into two tasks:
1) get the outermost tag (your <div id="video_infobox_con"> tag) using
a filter you construct.
2) use a StringBean as a visitor on that node and its children to
extract the text, like so:
Parser parser = new Parser ("http://yadda.yadda");
NodeList list = parser.parse (my_spiffo_DIV_finding_filter);
// elementAt() returns a Node, so cast it to the Div tag class
Div div = (Div) list.elementAt (0);
// now re-create the HTML and pass it into another Parser
// Note: for older versions you need to use setInputHTML()
Parser divParser = new Parser (div.toHtml ());
StringBean bean = new StringBean ();
divParser.visitAllNodesWith (bean);
System.out.println (bean.getStrings ());
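For the filter in step 1, one possible construction is sketched here. It assumes the htmlparser jar is on the classpath and combines TagNameFilter and HasAttributeFilter with an AndFilter. The class name DivTextSketch and the method extractText are made up for illustration, and Parser.createParser() is used so the HTML can be passed in as a string instead of a URL.

```java
import org.htmlparser.Node;
import org.htmlparser.NodeFilter;
import org.htmlparser.Parser;
import org.htmlparser.beans.StringBean;
import org.htmlparser.filters.AndFilter;
import org.htmlparser.filters.HasAttributeFilter;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

// Hypothetical helper class, not part of htmlparser itself.
class DivTextSketch {

    // Returns the text inside the first <div> whose id attribute equals divId,
    // or an empty string if there is no such div or it contains no text.
    static String extractText(String html, String divId) throws ParserException {
        // Step 1: match <div> tags that carry the wanted id attribute.
        NodeFilter divFilter = new AndFilter(
                new TagNameFilter("div"),
                new HasAttributeFilter("id", divId));

        Parser parser = Parser.createParser(html, "UTF-8");
        NodeList list = parser.parse(divFilter);
        if (list.size() == 0) {
            return "";
        }
        Node div = list.elementAt(0);

        // Step 2: re-parse just that div and collect its text with a StringBean.
        Parser divParser = Parser.createParser(div.toHtml(), "UTF-8");
        StringBean bean = new StringBean();
        divParser.visitAllNodesWith(bean);
        String text = bean.getStrings();
        return text == null ? "" : text;
    }
}
```

Note that the value of the <input> tag is an attribute, not text, so a StringBean will not return it. It would need a separate lookup, for example a TagNameFilter ("input") followed by getAttribute ("value") on the matching tag.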
Derrick
h pq wrote:
> Hi all, I have a question about parsing HTML content. The content
> contains many tags. If I want the text of a single kind of tag, such as
> LinkTag or TableTag, it's very easy to use LinkRegexFilter or
> TagNameFilter. But if I want the content of more than one kind of tag,
> is there a filter chain? Maybe the following example will explain what
> I mean:
>
> <div id="video_infobox_con">
> ·add by:<span class="fcolor_03">2006.07.27 - 01:22</span><br />
> ·Label:
> <a href="search.do?q=%B0%CD%B6%FB%C4%E1%D1%C7%C4%E1"
> class="lnk_04" target=_self><u>test_a</u></a>
>
> <a href="search.do?q=%D7%B4%D4%AA%D0%E3"
> class="lnk_04" target=_self><u>test_b</u></a>
>
> <a href=" search.do?q=%C0%BA%C7%F2" class="lnk_04"
> target=_self><u>test_c</u></a>
>
> <a href="search.do?q=%CC%E5%D3%FD" class="lnk_04"
> target=_self><u>test_d</u></a>
>
> </div>
> <input type="text" id="htmlurl" name="htmlurl" value='value_test' />
>
> There are four tags here (div, span, a, and input), and I need all of
> the content in them: 2006.07.27 - 01:22, test_a, test_b,
> test_c, test_d, and value_test.
> What should I do? I could parse the HTML four times to get the
> four tags' content, but I think that would hurt performance. Could
> you help me? Thank you very much.
>
> Best Regards
> Jesse