Re: [Htmlparser-user] StringBean: Removing unwanted links
From: Derrick O. <Der...@Ro...> - 2006-05-08 00:03:48
Riaz,

You will probably need to use a filter to pick out the content you want. Run the FilterBuilder tool (bin/filterbuilder) and create a filter that selects that content; the tool includes some help and a tutorial to get you going. Then pass the filter code generated by the tool to a FilterBean, which has a convenience method, called getText() I think, that applies a StringBean to the results of the filter.

Derrick

Riaz uddin wrote:
> Hi,
>
> I have this code snippet from htmlparser.sourceforge.net for StringBean:
>
> StringBean sb = new StringBean ();
> sb.setLinks (false);
> sb.setReplaceNonBreakingSpaces (true);
> sb.setCollapse (true);
> sb.setURL ("http://news.yahoo.com/s/ap/20060507/ap_on_re_mi_ea/iraq;_ylt=AoeY5mkiWMfGQ8KbE6W5xxas0NUE;_ylu=X3oDMTA2Z2szazkxBHNlYwN0bQ--"); // the HTTP is performed here
> String s = sb.getStrings ();
>
> How can I get rid of the other text and get only the news content from
> this URL?
> The unwanted text (links), such as 'Home', 'U.S.', etc., appears in
> the output.
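
For illustration, here is a minimal sketch of the FilterBean approach suggested in the reply. The filter is hand-written rather than generated by FilterBuilder, and it rests on an assumption: that the article body sits inside a `<div id="storybody">` element. That id is hypothetical; inspect the page (or use FilterBuilder) to find the element that really wraps the story. The class name `NewsText` is also made up for this example, and the code needs the HTMLParser jar on the classpath.

```java
import org.htmlparser.NodeFilter;
import org.htmlparser.beans.FilterBean;
import org.htmlparser.filters.AndFilter;
import org.htmlparser.filters.HasAttributeFilter;
import org.htmlparser.filters.TagNameFilter;

public class NewsText {
    public static void main(String[] args) {
        // Hypothetical filter: match <div id="storybody">. The id value is an
        // assumption for illustration, not taken from the actual page.
        NodeFilter storyFilter = new AndFilter(
                new TagNameFilter("div"),
                new HasAttributeFilter("id", "storybody"));

        FilterBean fb = new FilterBean();
        // Set the filters before the URL so the page is fetched only once.
        fb.setFilters(new NodeFilter[] { storyFilter });
        // The URL to extract text from, passed as the first argument.
        fb.setURL(args[0]);

        // getText() applies a StringBean to the nodes that passed the filter,
        // so navigation links outside the matched element are left out.
        System.out.println(fb.getText());
    }
}
```

If FilterBuilder produces a more elaborate filter for the page, its generated filter array can be passed to `setFilters` in place of `storyFilter`.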