Re: [Htmlparser-user] Microsoft's ugly web page generation and parsing
From: Somik R. <so...@ya...> - 2002-07-16 00:19:06
Hi Cedric,

I couldn't figure out your bug report. On parsing the pages, the output seemed (prima facie) to be correct.

Can you give the specific input we should try, what the output should be, and what you are actually getting? Alternatively, tell me which lines in the page are not being parsed correctly.

Thanks.

Regards,
Somik

----- Original Message -----
From: Cédric Rosa
To: htm...@li...
Sent: Monday, July 15, 2002 11:41 PM
Subject: [Htmlparser-user] Microsoft's ugly web page generation and parsing

Hello,

Simply try to parse this ugly document, for example:
www.cevipof.msh-paris.fr\moment\ref6.htm (2.6 MB!!!!!)

The text resulting from the parse contains lines like:
"<![endif]--><!--[if supportFields]>"
"v\:* {behavior:url(#default#VML);}"
"mso-font-pitch:variable;"

I think there is a problem in text detection when several tags are nested. Maybe the solution is to skip all text after "<!--" until "-->".

I don't have time to patch the code. If someone can fix this problem, it would be fantastic.

Thanks in advance,
Cedric Rosa.
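The workaround Cedric suggests (skip everything from "<!--" to "-->") can be sketched as a plain-string pre-pass run before the page is handed to the parser. This is a minimal sketch, not part of HTMLParser's API: the class `CommentStripper` and method `stripComments` are hypothetical names, and it assumes the whole page fits in a `String`. It removes ordinary comments as well as Microsoft downlevel-hidden conditional comments such as `<!--[if supportFields]>...<![endif]-->`, but it does not try to detect "<!--" appearing inside script strings or attribute values.

```java
public class CommentStripper {

    /**
     * Hypothetical helper: returns the input with every "<!-- ... -->" span
     * removed, including conditional comments like
     * "<!--[if supportFields]>...<![endif]-->". Each comment ends at the
     * first "-->" after its "<!--". An unterminated comment swallows the
     * rest of the input.
     */
    static String stripComments(String html) {
        StringBuilder out = new StringBuilder(html.length());
        int pos = 0;
        while (pos < html.length()) {
            int start = html.indexOf("<!--", pos);
            if (start < 0) {
                // No more comments: copy the remainder unchanged.
                out.append(html, pos, html.length());
                break;
            }
            // Keep the text before the comment.
            out.append(html, pos, start);
            int end = html.indexOf("-->", start + 4);
            if (end < 0) {
                break; // unterminated comment: drop everything after "<!--"
            }
            pos = end + 3; // resume just past "-->"
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String page = "<p>Before</p><!--[if supportFields]><span>field</span><![endif]-->"
                + "<style><!-- v\\:* {behavior:url(#default#VML);} --></style><p>After</p>";
        System.out.println(stripComments(page));
    }
}
```

Running `main` prints `<p>Before</p><style></style><p>After</p>`: both the conditional comment and the commented-out style rule are gone. A string pre-pass like this is crude compared with teaching the parser's own scanner to treat comments as a single node, but it is an easy way to check whether comment handling is really the source of the stray text.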