#15 Exception handling large text/CDATA sections

Milestone: v1.03
Status: closed-fixed
Owner: Yuval Oren
Labels: JFlex (1)
Priority: 5
Updated: 2004-07-11
Created: 2003-02-25
Creator: Tom Roehl
Private: No

If your document has a very large text/CDATA section, you get the following exception:

java.lang.ArrayIndexOutOfBoundsException: 16384
at com.bluecast.xml.PiccoloLexer.yynextChar(Unknown Source)
at com.bluecast.xml.PiccoloLexer.parseCdataSection(Unknown Source)
at com.bluecast.xml.PiccoloLexer.yylex(Unknown Source)
at com.bluecast.xml.Piccolo.yylex(Unknown Source)
at com.bluecast.xml.Piccolo.yyparse(Unknown Source)
at com.bluecast.xml.Piccolo.parse(Unknown Source)
at javax.xml.parsers.SAXParser.parse(SAXParser.java:345)

It appears that the parser uses a fixed 16k buffer in its yacc/lex-generated lexer and fails if the text section is larger than this.
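
A minimal way to exercise this case is sketched below. It builds a document whose CDATA section is several times larger than 16384 characters and parses it through the standard JAXP `SAXParser` API, the same entry point that appears in the stack trace. The class name `LargeCdataRepro` and its helper methods are hypothetical, and the sketch uses whichever parser `SAXParserFactory.newInstance()` returns, so Piccolo would have to be selected through the usual JAXP configuration to reproduce the failure itself.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical harness, not part of Piccolo.
class LargeCdataRepro {

    /** Builds <doc><![CDATA[xxx...]]></doc> containing cdataLength 'x' characters. */
    static String buildDocument(int cdataLength) {
        StringBuilder sb = new StringBuilder(cdataLength + 32);
        sb.append("<doc><![CDATA[");
        for (int i = 0; i < cdataLength; i++) {
            sb.append('x');
        }
        sb.append("]]></doc>");
        return sb.toString();
    }

    /** Parses the document and returns the total number of characters reported. */
    static int countCharacters(String xml) {
        final int[] total = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // A parser may deliver the text in several chunks; sum them all.
                total[0] += length;
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            parser.parse(new ByteArrayInputStream(bytes), handler);
        } catch (Exception e) {
            // Wrap checked parser exceptions so callers see a single failure type.
            throw new IllegalStateException("parse failed", e);
        }
        return total[0];
    }

    public static void main(String[] args) {
        int size = 3 * 16384; // well past the 16k limit described above
        int seen = countCharacters(buildDocument(size));
        System.out.println(seen == size ? "all characters delivered" : "character count mismatch");
    }
}
```

A parser that handles the section correctly reports exactly as many characters as were written into the CDATA block; a parser with the buffer problem would instead fail inside `parse`.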

Discussion



  • 2003-03-06

    Logged In: YES
    user_id=332435

    I am experiencing the same problem on our system with
    large CDATA sections. The problem has not been fixed in
    1.04b. Unfortunately, I haven't built Piccolo myself, so I
    cannot fix it or create a patch.
    We have to resort to Xerces/Crimson until it is fixed.

     


  • 2003-03-07

    Logged In: YES
    user_id=332435

    I have tried parsing a large non-CDATA section, which worked
    OK.

     
  • Steve Magoun
    2004-01-30

    Logged In: YES
    user_id=137219

    The parser is supposed to increase the buffer size, but it never
    happens. The bug seems to be in PiccoloLexer.java, in the
    yy_refill() method. There's an if() that compares yy_currentPos to
    yy_buffer.length, but that's not correct: for very long CDATA
    sections it needs to compare yy_markedPos to yy_buffer.length. I'm
    not sure whether the yy_currentPos comparison is valid in other
    cases, so we left it in. We also changed the size of the new buffer
    to be yy_buffer.length*2, not yy_currentPos*2 (since yy_currentPos
    can be 0).

    Now it turns out that yy_refill() isn't Piccolo code per se; it's put
    there by JFlex, so the real bug is in JFlex. The following patch will
    fix things, though, if all you want to do is recompile Piccolo (this is
    against Piccolo 1.4b):

    --- PiccoloLexer.java Sun Jul 7 14:21:18 2002
    +++ PiccoloLexer copy.java Fri Jan 30 15:07:44 2004
    @@ -3291,9 +3291,10 @@
         }

         /* is the buffer big enough? */
    -    if (yy_currentPos >= yy_buffer.length) {
    +    if (yy_currentPos >= yy_buffer.length
    +        || yy_markedPos >= yy_buffer.length) {
           /* if not: blow it up */
    -      char newBuffer[] = new char[yy_currentPos*2];
    +      char newBuffer[] = new char[yy_buffer.length*2];
           System.arraycopy(yy_buffer, 0, newBuffer, 0, yy_buffer.length);
           yy_buffer = newBuffer;
         }
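
The effect of the patched condition can be illustrated with a small standalone sketch. This is not the JFlex-generated code: the class `GrowableScanBuffer`, its field names, and the `countChars` helper are hypothetical, and the scanner is reduced to a single long token whose characters all stay in the buffer. The current position is deliberately left at 0 to mimic the situation described above, so only the marked-position check can trigger growth, and the new size is derived from the buffer length rather than from the current position.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical simplification of a scanner buffer; not the generated lexer.
class GrowableScanBuffer {
    char[] buffer;
    int endRead = 0;    // end of valid data in the buffer
    int currentPos = 0; // stays 0 here, as it can during a very long match
    int markedPos = 0;  // end of the text consumed so far

    private final Reader reader;

    GrowableScanBuffer(Reader reader, int initialSize) {
        if (initialSize <= 0) {
            throw new IllegalArgumentException("initialSize must be positive");
        }
        this.reader = reader;
        this.buffer = new char[initialSize];
    }

    /** Grows the buffer if it is full, then reads more input. Returns false at end of input. */
    boolean refill() throws IOException {
        // Check both positions, and double the buffer length (never a position that may be 0).
        if (currentPos >= buffer.length || markedPos >= buffer.length) {
            char[] newBuffer = new char[buffer.length * 2];
            System.arraycopy(buffer, 0, newBuffer, 0, buffer.length);
            buffer = newBuffer;
        }
        int numRead = reader.read(buffer, endRead, buffer.length - endRead);
        if (numRead < 0) {
            return false;
        }
        endRead += numRead;
        return true;
    }

    /** Returns the next character, or -1 at end of input, keeping all text buffered. */
    int nextChar() throws IOException {
        while (markedPos >= endRead) {
            if (!refill()) {
                return -1;
            }
        }
        return buffer[markedPos++];
    }

    /** Convenience helper: consumes the whole input and returns the character count. */
    static int countChars(String input, int initialSize) {
        GrowableScanBuffer b = new GrowableScanBuffer(new StringReader(input), initialSize);
        try {
            int n = 0;
            while (b.nextChar() != -1) {
                n++;
            }
            return n;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Doubling `buffer.length` guarantees that every growth step adds room, whereas a size computed from a position that is still 0 would not.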

     
  • Steve Magoun
    2004-01-30

    Logged In: YES
    user_id=137219

    The patch should really be applied to jflex-skeleton.piccolo and
    jflex-skeleton2 if you're doing a full build of the parser (not just
    a compile-only build like ours).

     
  • Yuval Oren
    2004-07-11

    • assigned_to: nobody --> yuvalo
    • status: open --> closed-fixed
     
  • Yuval Oren
    2004-07-11

    Logged In: YES
    user_id=479054

    This bug has been fixed in the latest release. If you find it's
    still a problem, please re-open or submit a new bug.