Re: [Htmlparser-user] finding meta data
From: Derrick O. <Der...@Ro...> - 2006-07-29 11:14:28
Murat,

I'm not sure what you mean by 'pure' text. The stringextractor program uses the StringBean under the hood. It only collects text which would be presented in a browser - or at least it's supposed to.

The stringextractor program has an option (-links) to output the links within angle brackets. Make sure this is not used.

If you want to remove the text within <a></a> pairs, you will need to override the default LinkTag so that it no longer passes its enclosed text on, and register your version with the PrototypicalNodeFactory.

Derrick

kavorka wrote:
> Hi Oswald,
> I have another question. In HTMLPARSER, is it possible to extract only
> the text in the webpage? The stringextractor program also extracts
> the link text in the page; I want to extract "pure" text. Can I do it?
> Thanks
> Murat
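
The LinkTag override described in the reply could look roughly like the sketch below. It is only an illustration, not code from the HTMLParser distribution: the names PureTextDemo, SilentLinkTag and extract are hypothetical, and the sketch assumes that StringBean is used as a NodeVisitor on your own Parser and that an empty accept() is enough to keep a link's child text away from the visitor. Check both assumptions against the Javadoc of the HTMLParser version you have installed.

```java
import org.htmlparser.Parser;
import org.htmlparser.PrototypicalNodeFactory;
import org.htmlparser.beans.StringBean;
import org.htmlparser.tags.LinkTag;
import org.htmlparser.util.ParserException;
import org.htmlparser.visitors.NodeVisitor;

public class PureTextDemo {

    /**
     * Hypothetical LinkTag subclass (not part of HTMLParser).
     * Assumption: visitors reach a link's text only through accept(),
     * so an empty accept() hides the <a> tag and everything inside it.
     */
    public static class SilentLinkTag extends LinkTag {
        @Override
        public void accept(NodeVisitor visitor) {
            // Intentionally empty: neither this tag nor its children are visited.
        }
    }

    /** Returns the browser-visible text of the given HTML, minus link text. */
    public static String extract(String html) throws ParserException {
        // The default factory knows all stock tags; registering our subclass
        // replaces the prototype it uses for <a> elements.
        PrototypicalNodeFactory factory = new PrototypicalNodeFactory();
        factory.registerTag(new SilentLinkTag());

        Parser parser = Parser.createParser(html, "UTF-8");
        parser.setNodeFactory(factory);

        // StringBean is itself a NodeVisitor; its links property is left at
        // its default, which corresponds to not passing -links.
        StringBean bean = new StringBean();
        parser.visitAllNodesWith(bean);
        return bean.getStrings();
    }

    public static void main(String[] args) throws ParserException {
        String html = "<html><body><p>Intro <a href=\"x.html\">link text</a> outro</p></body></html>";
        System.out.println(extract(html));
    }
}
```

Doing the suppression in the tag rather than in the visitor means any other NodeVisitor run through the same factory will skip link text as well, which may or may not be what you want.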