From: Gilles D. <gr...@sc...> - 2002-03-11 23:38:34
According to Geoff Hutchison:
> On Friday, March 8, 2002, at 05:20 PM, Jim Cole wrote:
> > It does look like there is a problem with the parser. If a '<'
> > occurs in a script element, it appears that the parser becomes
> > somewhat confused with regard to the remaining document content.
> > For example
>
> Yes, this sounds like a bug to me. Actually, the <script> sections and
> probably other sections as well should be simply skipped by the parser.
> Right now the code does this:
>
> > case 29: // "script"
> >     noindex |= TAGscript;
> >     nofollow |= TAGscript;
> >     break;
>
> In short, the parser doesn't *index* the bits inside <script></script>
> tags, but it does *look* at them. So it hit that "<" character and
> figured it was a new tag.
>
> I would think that we want to treat <script> and probably <style>
> sections like comments--find the ending tag and completely ignore
> everything inside.

I think your assessment of the problem, and proposed solution, are both
bang-on. The stuff between the <script> and </script> tag should be
stripped out entirely and not parsed for HTML tags.

Of course, you can avoid this problem in your HTML if you properly put
inline JavaScript code inside an HTML comment. E.g.:

<script>
<!--
JavaScript code here
// -->
</script>

I'm amazed at how frequently people/programs fail to do this. It's what
you're supposed to do to avoid problems with non-JavaScript-aware web
clients.

-- 
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
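
For illustration, the "find the ending tag and ignore everything inside"
approach discussed above could look roughly like the sketch below. It is
a standalone example, not the parser's actual code: the names
skipRawTextElements, findOpenTag and findNoCase are hypothetical, and it
assumes the whole document is available as a std::string rather than in
the parser's own buffer.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Case-insensitive search for `needle` in `hay`, starting at `from`.
// Returns std::string::npos when there is no match.
static std::size_t findNoCase(const std::string &hay,
                              const std::string &needle, std::size_t from)
{
    if (from > hay.size())
        return std::string::npos;
    auto it = std::search(hay.begin() + from, hay.end(),
                          needle.begin(), needle.end(),
                          [](unsigned char a, unsigned char b) {
                              return std::tolower(a) == std::tolower(b);
                          });
    return it == hay.end() ? std::string::npos
                           : static_cast<std::size_t>(it - hay.begin());
}

// Find the next "<name" that is really an opening tag for `name`
// (followed by '>', '/', whitespace, or the end of the input), so that
// something like "<scripted>" is not mistaken for "<script>".
static std::size_t findOpenTag(const std::string &html,
                               const std::string &name, std::size_t from)
{
    const std::string open = "<" + name;
    for (std::size_t p = findNoCase(html, open, from);
         p != std::string::npos; p = findNoCase(html, open, p + 1)) {
        std::size_t after = p + open.size();
        if (after >= html.size())
            return p; // truncated input: treat as an opening tag
        unsigned char c = static_cast<unsigned char>(html[after]);
        if (c == '>' || c == '/' || std::isspace(c))
            return p;
    }
    return std::string::npos;
}

// Hypothetical helper: return a copy of `html` with every <script> and
// <style> element removed, from the opening tag through the closing tag.
// The text in between is never scanned for '<', only for the closing tag.
std::string skipRawTextElements(const std::string &html)
{
    std::string out;
    std::size_t pos = 0;
    for (;;) {
        std::size_t s = findOpenTag(html, "script", pos);
        std::size_t t = findOpenTag(html, "style", pos);
        std::size_t start = std::min(s, t); // npos is the largest value
        if (start == std::string::npos)
            break;
        const std::string close = (start == s) ? "</script" : "</style";
        out.append(html, pos, start - pos); // keep text before the element
        std::size_t end = findNoCase(html, close, start);
        if (end == std::string::npos)
            return out; // unterminated element: drop the rest
        std::size_t gt = html.find('>', end);
        if (gt == std::string::npos)
            return out; // closing tag never finished: drop the rest
        pos = gt + 1;
    }
    out.append(html, pos, std::string::npos); // remainder after last element
    return out;
}
```

With this sketch, a stray '<' inside the script body (as in "a < b") is
never seen by whatever tag parsing runs afterwards. An unterminated
<script> swallows the rest of the document here, which is one possible
policy; a real parser might prefer something more forgiving.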