[Htmlparser-user] Finding a whole word
From: Jay K. <jy...@eq...> - 2006-05-28 19:10:50
Hi,

I'm trying to get a word count using htmlparser, but it doesn't seem to handle the following example. Let's say the source HTML looks like this:

<HTML>
<head>
<title>Test HTML</title>
</head>
<body>
<p>AAAAA BBBBB AAA<font color='red'>AA</font> BBBBB AAAAA</p>
</body>
</HTML>

If you load it in a browser, you'll see the word 'AAAAA' three times. But if you parse this HTML, it returns the following text nodes:

AAAAA BBBBB AAA AA BBBBB AAAAA

So it splits the second 'AAAAA' into two pieces because of the font tag in the middle, and the count of 'AAAAA' in the parsed text comes out as 2 instead of 3. Is there any way to get the same text/string/words that I see in the browser?

Thanks,

Jay
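
To make the expected behaviour concrete, here is a minimal sketch in plain Java of the counting described above. It does not use the htmlparser API at all. The class and method names (VisibleWordCount, visibleText, countWord) are hypothetical, and the regex-based tag handling and the short list of inline tags are assumptions chosen for this example only. The idea is that inline tags such as font are removed without leaving a gap, so the text on either side is joined back together, while every other tag is treated as a word break.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of htmlparser.
class VisibleWordCount {

    // Assumed (incomplete) list of inline tags that do not break a word.
    private static final Pattern INLINE_TAG =
        Pattern.compile("(?i)</?(font|b|i|u|em|strong|span|a)\\b[^>]*>");

    // Any remaining tag is treated as a word separator.
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]*>");

    // Approximates the text a browser would show. Simplification: text
    // inside <head> (such as the title) is kept, and entities are not decoded.
    static String visibleText(String html) {
        String joined = INLINE_TAG.matcher(html).replaceAll("");
        return ANY_TAG.matcher(joined).replaceAll(" ").trim();
    }

    // Counts whitespace-delimited words that equal the given word exactly.
    static int countWord(String html, String word) {
        String text = visibleText(html);
        if (text.isEmpty()) {
            return 0;
        }
        int count = 0;
        for (String w : text.split("\\s+")) {
            if (w.equals(word)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String html = "<HTML> <head> <title>Test HTML</title> </head> <body> "
            + "<p>AAAAA BBBBB AAA<font color='red'>AA</font> BBBBB AAAAA</p> "
            + "</body> </HTML>";
        System.out.println(countWord(html, "AAAAA")); // prints 3
    }
}
```

With htmlparser itself, the same effect would presumably come from concatenating the text of adjacent text nodes before splitting into words, inserting a space only where a block-level tag intervenes.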