Re: [Htmlparser-user] finding meta data


Murat,

I'm not sure what you mean by 'pure' text.
The stringextractor program uses the StringBean under the hood.
It only collects text which would be presented in a browser - or at 
least it's supposed to.
The stringextractor program has an option (-links) to output the links 
within angle brackets. Make sure this is not used.
If you want to remove the text within <a></a> pairs, you will need to 
subclass the default LinkTag so that its enclosed text is not collected, 
and register that subclass with the PrototypicalNodeFactory.
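
To make the intended result concrete, here is a minimal, library-free sketch of what "text without link text" could look like. It does not use HTMLParser, StringBean, LinkTag or PrototypicalNodeFactory at all; the class name PureText and the method extract are hypothetical, and the scanner is deliberately naive (no handling of comments, scripts, entities, or '>' inside attribute values).

```java
import java.util.Locale;

// Hypothetical stand-in, not part of HTMLParser: illustrates the desired
// output only, i.e. visible text with everything inside <a>...</a> dropped.
public class PureText {

    public static String extract(String html) {
        StringBuilder out = new StringBuilder();
        int linkDepth = 0; // greater than 0 while inside an <a> element
        int i = 0;
        while (i < html.length()) {
            char c = html.charAt(i);
            if (c == '<') {
                int close = html.indexOf('>', i);
                if (close < 0) {
                    break; // unterminated tag: stop scanning
                }
                String tag = html.substring(i + 1, close).trim().toLowerCase(Locale.ROOT);
                boolean opensLink = tag.equals("a")
                        || (tag.startsWith("a") && Character.isWhitespace(tag.charAt(1)));
                if (opensLink) {
                    linkDepth++;
                } else if (tag.equals("/a") && linkDepth > 0) {
                    linkDepth--;
                }
                i = close + 1; // skip the tag itself
            } else {
                if (linkDepth == 0) {
                    out.append(c); // keep text only when outside a link
                }
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<p>See <a href=\"x.html\">this page</a> for details.</p>";
        System.out.println(extract(html));
    }
}
```

With the parser itself, the same effect would come from the custom LinkTag described above rather than from scanning the markup by hand.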

Derrick

kavorka wrote:

> Hi Oswald,
> I have another question. In HTMLParser, is it possible to extract only 
> the text of a web page? The stringextractor program also extracts the 
> link text in the page; I want to extract "pure" text. Can I do it?
> thanks
> Murat
>