Thread: [Htmlparser-user] Presentation tags


Hi there,

I use Aperture to extract text, and Aperture runs Htmlparser when processing HTML.
My question relates to the handling of presentation tags such as <u>, <b>, <i> when embedded within words - for example:

   <html><body><u>north</u>ern</body></html>

I would expect to receive the single word "northern", but instead I get two tokens: "north" and "ern", which is clearly wrong in this context.
It seems that Htmlparser is replacing tags with whitespace - why is this? 
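
To make the expected behaviour concrete, here is a minimal, self-contained sketch of what I have in mind. It does not use Htmlparser or Aperture at all: the class name, the regex and the list of inline tags are my own assumptions for illustration only. Presentation tags are dropped without inserting anything, while all other tags are replaced by a single space.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InlineTagStripper {
    // Assumed (hypothetical) set of presentation tags that must not split a word.
    private static final Set<String> INLINE =
            Set.of("u", "b", "i", "em", "strong", "span", "font");

    // Naive tag matcher, good enough for a sketch; not a real HTML parser.
    private static final Pattern TAG =
            Pattern.compile("<\\s*/?\\s*([a-zA-Z][a-zA-Z0-9]*)[^>]*>");

    static String extractText(String html) {
        Matcher m = TAG.matcher(html);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the text between the previous tag and this one.
            out.append(html, last, m.start());
            // Only non-inline tags act as a word boundary.
            if (!INLINE.contains(m.group(1).toLowerCase(Locale.ROOT))) {
                out.append(' ');
            }
            last = m.end();
        }
        out.append(html.substring(last));
        // Collapse runs of whitespace introduced by block-level tags.
        return out.toString().trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        // prints "northern"
        System.out.println(extractText("<html><body><u>north</u>ern</body></html>"));
    }
}
```

With this approach "<p>one</p><p>two</p>" still yields two words, but "<u>north</u>ern" stays a single token, which is the result I was hoping to get from the extractor.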

Thanks for any help.

- Chris
