Brought to you by: derrickoswald

#12 parse ecmascript

Milestone: Next Version

Status: open

Owner: nobody

Labels: Standards Compliance (8)

Priority: 5

Updated: 2004-05-22

Created: 2004-01-29

Creator: JozefHovan

Private: No

Hi,

If I create an HTML file with a JavaScript comment containing
a single apostrophe, the HTML is not parsed correctly. In the
example, the BODY tag isn't parsed.

The same thing happens when I use \' in JavaScript code.

I am sending a test case in the attachment.

Jozef

Discussion

JozefHovan - 2004-01-29

Test Case

AposTestCase.java

Derrick Oswald - 2004-01-30

assigned_to: nobody --> derrickoswald

labels: --> 446199

Derrick Oswald - 2004-01-30

Thanks for the test case.
I'm not sure what the correct behaviour is though.
Script parsing is 'quote-smart', meaning it balances quotes.
In the case of your example:
<HTML><SCRIPT LANGUAGE="Javascript">//'</SCRIPT><BODY>
</BODY></HTML>
there is no closing quote.
We have test cases like:
<SCRIPT>document.write(\"</script>\");</SCRIPT>
that need to ignore the first </script>, and they do this by
balancing the quotes around the tag.
Can you think of a rule that would allow correct parsing of
both cases?
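
To make the conflict concrete, here is a minimal sketch of a quote-balancing scan. It is an illustration only and not the actual HTMLParser code; the class name QuoteSmartScan and the method findScriptEnd are hypothetical. It treats every quote character as opening or closing a string and only accepts a closing script tag found outside quotes.

```java
// Hypothetical illustration of a 'quote-smart' scan; not the real HTMLParser implementation.
final class QuoteSmartScan {

    /**
     * Returns the index of the first "</script" that lies outside any quoted
     * string, starting the search at 'from', or -1 if none is found.
     */
    static int findScriptEnd(String html, int from) {
        char quote = 0; // 0 means "not inside a quoted string"
        for (int i = from; i < html.length(); i++) {
            char c = html.charAt(i);
            if (quote != 0) {
                // Inside quotes: only the matching quote character ends the string.
                if (c == quote) {
                    quote = 0;
                }
            } else if (c == '"' || c == '\'') {
                quote = c; // opening quote
            } else if (html.regionMatches(true, i, "</script", 0, 8)) {
                return i; // closing tag outside quotes
            }
        }
        return -1;
    }
}
```

With this sketch, the document.write case resolves to the final closing tag because the quotes balance, while the //' case returns -1: the apostrophe in the comment opens a "string" that never closes, so the closing tag and everything after it is swallowed.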

JozefHovan - 2004-01-30

I think the parser needs to recognize comments inside the SCRIPT
tag, because I had this JavaScript comment, which your parser
doesn't handle:

<SCRIPT>
// It's problem
</SCRIPT>

This comment works in a browser but not in HTMLParser.

I also found a problem with JavaScript code like this:
<SCRIPT>
var x='text with one apostrophe \' '
</SCRIPT>

I think you will need to ignore the \' sequence inside the SCRIPT tag.

If you have other questions, I will be glad to answer them.

Jozef

P.S.: Maybe it is better not to parse the content of the SCRIPT
tag in quote-smart mode.

JozefHovan - 2004-01-30

Hmmm, the bug report system uses HTML in messages, so I am also
sending my comment as an attachment.

JozefHovan - 2004-01-30

htmlparser.txt

Derrick Oswald - 2004-01-31

assigned_to: derrickoswald --> nobody

summary: One apostrophe in Javascript comment --> parse ecmascript

labels: 446199 -->

milestone: 301581 -->

Derrick Oswald - 2004-01-31

was: Bug: One apostrophe in Javascript comment

The problem of handling quotes and tags embedded in
<SCRIPT> tags has come up over and over again. This
needs to be resolved satisfactorily, once and for all.

I propose adding code in the ScriptScanner to drop down into
an ECMAScript
(http://www.ecma-international.org/publications/files/ecma-st/Ecma-262.pdf)
parser and actually read the script to determine where the
lexer/parser should resume.

For htmllexer.jar this can be handled by a simple parser
that understands multi-line and single-line ECMAScript comments,
plus backslash escapes on single and double quotes.
Consideration should be given to adding another type of
node, a 'CodeNode', so programs can differentiate between
StringNodes containing text the user would see in a browser
and script.
A full parser using Antlr (http://antlr.org/) or
JavaCC (https://javacc.dev.java.net/) can be
integrated into htmlparser.jar, to provide full script control.
A 'free' ECMAScript grammar for JavaCC is FESI
(http://www.lugrin.ch/fesi/index.html).
An apparently aborted attempt to create a SableCC grammar
for ECMAScript is Scriptonite
(http://sourceforge.net/projects/scriptonite/).
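
The "simple parser" option for htmllexer.jar could look roughly like the sketch below. This is only one possible reading of the proposal, under stated assumptions: the class ScriptEndScanner and its methods are hypothetical names, not part of HTMLParser; a closing tag inside a comment is assumed to end the script (which is what makes the //' case work), while a closing tag inside a string literal is skipped (which keeps the document.write case working). Regular expression literals are not handled.

```java
// Hypothetical sketch of a comment- and escape-aware scan; names are not from HTMLParser.
final class ScriptEndScanner {

    /** Returns the index of the "</script" that ends the script, or -1 if none is found. */
    static int findScriptEnd(String html, int from) {
        final int n = html.length();
        int i = from;
        while (i < n) {
            char c = html.charAt(i);
            if (c == '/' && i + 1 < n && html.charAt(i + 1) == '/') {
                // Single-line comment: quotes are ignored up to the end of the line.
                i += 2;
                while (i < n && html.charAt(i) != '\n' && html.charAt(i) != '\r') {
                    if (isCloseTag(html, i)) {
                        return i; // assumption: a close tag in a comment ends the script
                    }
                    i++;
                }
            } else if (c == '/' && i + 1 < n && html.charAt(i + 1) == '*') {
                // Multi-line comment: quotes are ignored up to the closing "*/".
                i += 2;
                while (i < n && !html.startsWith("*/", i)) {
                    if (isCloseTag(html, i)) {
                        return i;
                    }
                    i++;
                }
                i = Math.min(n, i + 2); // step past "*/" if present
            } else if (c == '"' || c == '\'') {
                // String literal: honour backslash escapes, ignore tags inside it.
                i++;
                while (i < n && html.charAt(i) != c) {
                    char d = html.charAt(i);
                    if (d == '\n' || d == '\r') {
                        break; // unterminated string: give up at the end of the line
                    }
                    i += (d == '\\' && i + 1 < n) ? 2 : 1; // skip the escaped character
                }
                i++; // step past the closing quote (or the line break)
            } else if (isCloseTag(html, i)) {
                return i;
            } else {
                i++;
            }
        }
        return -1;
    }

    private static boolean isCloseTag(String s, int i) {
        return s.regionMatches(true, i, "</script", 0, 8);
    }
}
```

Called with from set to the position just after the opening SCRIPT tag, this sketch finds the closing tag in all three reported cases (the //' comment, the "It's problem" comment, and the \' escape) and still skips the quoted tag in the document.write test case. A full grammar-based parser would be needed to cover regular expression literals and other constructs this scan does not model.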

Derrick Oswald - 2004-05-22

milestone: --> Next Version

labels: --> Standards Compliance

Trejkaz - 2006-03-08

Skipping over everything inside script comments would
probably solve it.
