From: Geoff H. <ghu...@ws...> - 2002-03-09 19:36:02
On Friday, March 8, 2002, at 05:20 PM, Jim Cole wrote:

> It does look like there is a problem with the parser. If a '<'
> occurs in a script element, it appears that the parser becomes
> somewhat confused with regard to the remaining document content.
> For example

Yes, this sounds like a bug to me. Actually, the <script> sections and
probably other sections as well should be simply skipped by the parser.
Right now the code does this:

> case 29: // "script"
>     noindex |= TAGscript;
>     nofollow |= TAGscript;
>     break;

In short, the parser doesn't *index* the bits inside <script></script>
tags, but it does *look* at them. So it hit that "<" character and
figured it was a new tag.

I would think that we want to treat <script> and probably <style>
sections like comments--find the ending tag and completely ignore
everything inside.

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
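
For illustration only, here is a minimal sketch of the "find the ending tag and ignore everything inside" idea. It is not the actual parser code: the helper names findNoCase and skipRawText are hypothetical, and it assumes the document is already in a std::string and that the caller knows the position just past the opening tag.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not from the real parser): case-insensitive search
// for `needle` in `hay`, starting at offset `from`.
static std::size_t findNoCase(const std::string &hay,
                              const std::string &needle,
                              std::size_t from)
{
    if (from > hay.size())
        return std::string::npos;
    auto same = [](unsigned char a, unsigned char b) {
        return std::tolower(a) == std::tolower(b);
    };
    auto it = std::search(hay.begin() + from, hay.end(),
                          needle.begin(), needle.end(), same);
    if (it == hay.end())
        return std::string::npos;
    return static_cast<std::size_t>(it - hay.begin());
}

// Hypothetical helper: `pos` is the offset just past an opening
// <script ...> or <style ...> tag, and `tag` is "script" or "style".
// Returns the offset just past the closing tag, so that everything in
// between (including any stray '<') is never examined as markup.
// An unterminated section swallows the rest of the document.
// Simplification: only the "</tag" prefix is matched, so a longer name
// such as "</scripted>" would also end the section.
std::size_t skipRawText(const std::string &text,
                        std::size_t pos,
                        const std::string &tag)
{
    const std::string closer = "</" + tag;
    std::size_t end = findNoCase(text, closer, pos);
    if (end == std::string::npos)
        return text.size();
    std::size_t gt = text.find('>', end + closer.size());
    return gt == std::string::npos ? text.size() : gt + 1;
}
```

In the "script" case quoted above, a call along these lines could advance the read position past the whole section instead of (or in addition to) setting the noindex and nofollow flags. How that fits the real parser loop is an assumption on my part.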