
#2012 Javascript in CDATA block causes EvaluatorException: illegally formed XML syntax

Milestone: 2.33
Status: closed
Owner: RBRi
Labels: None
Priority: 1
Updated: 2019-02-24
Created: 2019-02-11
Creator: Ron HD
Private: No

Javascript code wrapped in an XML CDATA block within a <script> tag on a web page causes HtmlUnit to throw the following exception:

net.sourceforge.htmlunit.corejs.javascript.EvaluatorException: illegally formed XML syntax

This can be replicated with the following Java code:

(new net.sourceforge.htmlunit.corejs.javascript.Parser()).parse("<![CDATA[obj1.obj2.func1();]]>", "", 1);

This occurs on web pages that use Oracle ADF (I believe this is the same problem reported in Bug #1991). These web pages are accepted by Firefox and other major browsers.

Tracing through the code, I can see that the TokenStream.getNextXMLToken() method successfully scans over the CDATA block, but then rather than processing the contents of the block, it just does:

parser.addError("msg.XML.bad.form");

I'm not sure why it does this, or the right way to fix it. With a little guidance, I may be able to provide a fix. Or if someone can give me one, it would be great. Otherwise, I'm being forced to use Selenium with Firefox, which is introducing other problems.

Discussion

  • Ron HD

    Ron HD - 2019-02-11

    BTW, I don't know why SourceForge trashed the formatting. I didn't enter the text all globbed together like that. But there doesn't seem to be a way to edit it.

     
  • RBRi

    RBRi - 2019-02-24
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -1,10 +1,10 @@
    -Javascript code  wrapped in an XML CDATA block within a <script> tag on a web page causes HtmlUnit to throw the following exception:
    +Javascript code  wrapped in an XML CDATA block within a &lt;script> tag on a web page causes HtmlUnit to throw the following exception:
    
     > net.sourceforge.htmlunit.corejs.javascript.EvaluatorException: illegally formed XML syntax
    
     This can be replicated with the following Java code:
    
    -`(new net.sourceforge.htmlunit.corejs.javascript.Parser()).parse("<![CDATA[obj1.obj2.func1();]]>", "", 1);`
    +`(new net.sourceforge.htmlunit.corejs.javascript.Parser()).parse("&lt;![CDATA[obj1.obj2.func1();]]>", "", 1);`
    
     This occurs on web pages that use Oracle ADF (I believe this is the same problem reported in Bug #1991). These web pages are accepted by Firefox and other major browsers.
    
     
  • RBRi

    RBRi - 2019-02-24

    As always, it took some time to get to this. From my point of view, this has to be handled by the parser. I have made a fix because the neko parser can already handle this.
    Please have a look at Twitter (https://twitter.com/HtmlUnit). I will announce a new snapshot there when one is available.

     
  • RBRi

    RBRi - 2019-02-24

    Same problem with style declarations.

     
  • RBRi

    RBRi - 2019-02-24
    • status: open --> closed
    • assigned_to: RBRi
     
  • RBRi

    RBRi - 2019-02-24

    Should be fixed now. I will announce via Twitter (https://twitter.com/HtmlUnit) when a new snapshot is available.
    Please check whether your problem is gone.

    Your case
    (new net.sourceforge.htmlunit.corejs.javascript.Parser()).parse("<![CDATA[obj1.obj2.func1();]]>", "", 1);
    will still fail, because the CDATA processing is done by the (X)HTML parser.
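
For anyone who calls the JavaScript parser directly and hits the same exception, a rough sketch of the idea is to unwrap the CDATA section before the script text ever reaches the parser. This is not the fix that went into HtmlUnit, only an illustration under stated assumptions: the class name `CdataScriptText` and the method `unwrap` are hypothetical, and it handles only a single CDATA section that encloses the whole script text.

```java
// Hypothetical helper, not part of HtmlUnit: unwraps a single CDATA section
// that encloses the entire script text, so that only plain JavaScript is
// handed to a JavaScript parser.
public final class CdataScriptText {
    private static final String CDATA_START = "<![CDATA[";
    private static final String CDATA_END = "]]>";

    private CdataScriptText() {
        // static helper only
    }

    /**
     * Returns the text inside an enclosing CDATA section, or the input
     * unchanged when it is not wrapped in one.
     */
    public static String unwrap(String scriptText) {
        String trimmed = scriptText.trim();
        boolean wrapped = trimmed.startsWith(CDATA_START)
                && trimmed.endsWith(CDATA_END)
                && trimmed.length() >= CDATA_START.length() + CDATA_END.length();
        if (!wrapped) {
            return scriptText;
        }
        return trimmed.substring(CDATA_START.length(),
                trimmed.length() - CDATA_END.length());
    }

    public static void main(String[] args) {
        // The string from the report above; prints: obj1.obj2.func1();
        System.out.println(unwrap("<![CDATA[obj1.obj2.func1();]]>"));
    }
}
```

The unwrapped string could then be passed to `Parser.parse(...)` in place of the original. Scripts that hide the CDATA markers behind JavaScript comments (`//<![CDATA[`) are already valid JavaScript and are deliberately left untouched by this sketch.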

    Reopen this if it is still not working.

     
