[Htmlparser-user] Microsoft's ugly web page generation and parsing
From: R. <ced...@fr...> - 2002-07-15 14:40:45
Hello,

Try parsing this ugly document, for example: www.cevipof.msh-paris.fr\moment\ref6.htm (2.6 MB!)

The text that results from the parse contains lines like:

" <![endif]--><!--[if supportFields]>"
"v\:* {behavior:url(#default#VML);}"
"mso-font-pitch:variable;"

I think there is a problem in text detection when several tags are nested. One possible solution would be to skip all text after "<!--" until "-->".

I don't have time to patch the code. If someone can fix this problem, that would be fantastic.

Thanks in advance,
Cedric Rosa
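
For illustration only, the idea of skipping everything from "<!--" up to "-->" could be sketched roughly as below. This is not htmlparser code and does not use its API; it is a standalone Python sketch, and the function name strip_comments is hypothetical. It also assumes that an unterminated comment should swallow the rest of the input, which may not be the behaviour a real parser wants.

```python
def strip_comments(html: str) -> str:
    """Return html with every <!-- ... --> span removed (hypothetical helper)."""
    out = []
    pos = 0
    while True:
        start = html.find("<!--", pos)
        if start == -1:
            # No more comments: keep the remaining text and stop.
            out.append(html[pos:])
            break
        # Keep the text that precedes the comment opener.
        out.append(html[pos:start])
        end = html.find("-->", start + 4)
        if end == -1:
            # Assumption: an unterminated comment hides the rest of the input.
            break
        # Resume scanning just past the comment closer.
        pos = end + 3
    return "".join(out)


sample = (
    "<p>Hello</p>"
    "<!--[if supportFields]> mso-font-pitch:variable; <![endif]-->"
    "<p>World</p>"
)
print(strip_comments(sample))
```

Running such a pass before text extraction would keep the Microsoft-specific conditional blocks and commented-out style rules from being reported as page text, as long as they are wrapped in ordinary comment delimiters.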