Hi Cedric
I couldn't figure out your bug report. When I parsed the page, the
output seemed (prima facie) to be correct.
Can you give the specific input we should try, what the output should
be, and what you are actually getting? Alternatively, tell me which
lines in the page are not being parsed correctly.
Thanks.
Regards,
Somik
----- Original Message -----
From: Cédric Rosa
To: htm...@li...
Sent: Monday, July 15, 2002 11:41 PM
Subject: [Htmlparser-user] Microsoft's ugly web page generation and parsing
Hello,
Simply try to parse this ugly document for example:
www.cevipof.msh-paris.fr\moment\ref6.htm (2.6 MB!)
The text that results from the parse contains lines like:
"" <![endif]--><!--[if supportFields]>"
"v\:* {behavior:url(#default#VML);}"
"mso-font-pitch:variable;"
I think there is a problem in text detection when several tags are
nested.
One possible solution would be to skip all text after "<!--" until "-->".
I don't have time to patch the code. If someone can fix this problem,
it would be fantastic.
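
The "skip everything from "<!--" until "-->"" idea above can be sketched roughly as follows. This is only a minimal illustration in Python, not the parser's actual code: the function name strip_comments is hypothetical, and dropping the remainder of the input when a comment is never closed is an assumption, not something the report specifies.

```python
def strip_comments(html: str) -> str:
    """Return html with every <!-- ... --> span removed (hypothetical helper)."""
    out = []
    pos = 0
    while True:
        start = html.find("<!--", pos)
        if start == -1:
            # No more comment openers: keep the rest of the input.
            out.append(html[pos:])
            break
        # Keep the text that precedes the comment opener.
        out.append(html[pos:start])
        end = html.find("-->", start + 4)
        if end == -1:
            # Assumption: an unterminated comment swallows the rest of the input.
            break
        # Resume scanning just past the comment closer.
        pos = end + 3
    return "".join(out)
```

Because the scan looks only for the literal "-->" closer, a conditional block such as "<!--[if supportFields]>...<![endif]-->" is dropped as a single comment, and style rules wrapped in a comment inside a style element disappear with it. Rules that are not wrapped in a comment would still need separate handling.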
Thanks in advance,
Cedric Rosa.
_______________________________________________
Htmlparser-user mailing list
Htm...@li...
https://lists.sourceforge.net/lists/listinfo/htmlparser-user