Matt Furman - 2011-09-08

I also ran into this issue and "fixed" it locally...

It appears to be a flaw in addByte in Lexer.java. The function assumes that the buffer is only ever examined one byte at a time; however, in the CDATA function, the call to TidyUtils.getString passes in a length that is greater than 1. I overloaded the appropriate functions so that the caller can pass in the number of bytes the buffer needs to grow by.

public void addByte(int c)
{
    addByte(c, 1);
}

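/**
 * Adds a byte to lexer buffer, growing the buffer first if needed.
 * @param c byte to add
 * @param size number of bytes that must fit after the current end of the buffer
 */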
public void addByte(int c, int size)
{
    if (this.lexsize + size >= this.lexlength)
    {
        while (this.lexsize + size >= this.lexlength)
        {
            if (this.lexlength == 0)
            {
                this.lexlength = 8192;
            }
            else
            {
                this.lexlength = this.lexlength * 2;
            }
        }

        byte[] temp = this.lexbuf;
        this.lexbuf = new byte[this.lexlength];
        if (temp != null)
        {
            System.arraycopy(temp, 0, this.lexbuf, 0, temp.length);
            updateNodeTextArrays(temp, this.lexbuf);
        }
    }

    this.lexbuf[this.lexsize++] = (byte) c;
    this.lexbuf[this.lexsize] = (byte) '\0'; // debug
}

Once I changed the necessary associated functions, it seemed to do the trick.
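
For anyone who wants to try the growth logic outside of JTidy, here is a minimal standalone sketch of the same idea. The class name LexBufferSketch is made up for illustration and is not part of JTidy; the field names mirror the ones in the patch above, and the updateNodeTextArrays call is left out because it depends on the rest of the Lexer.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the lexer buffer; not JTidy's actual Lexer class. */
class LexBufferSketch
{
    byte[] lexbuf;   // backing storage, allocated on first use
    int lexlength;   // current capacity of lexbuf
    int lexsize;     // number of bytes written so far

    /** Appends one byte, reserving room for just that byte. */
    void addByte(int c)
    {
        addByte(c, 1);
    }

    /** Appends one byte after making sure size more bytes plus a terminator fit. */
    void addByte(int c, int size)
    {
        if (lexsize + size >= lexlength)
        {
            // Same policy as the patch: start at 8192, then double until it fits.
            int newLength = lexlength;
            while (lexsize + size >= newLength)
            {
                newLength = (newLength == 0) ? 8192 : newLength * 2;
            }
            lexbuf = (lexbuf == null) ? new byte[newLength] : Arrays.copyOf(lexbuf, newLength);
            lexlength = newLength;
        }

        lexbuf[lexsize++] = (byte) c;
        lexbuf[lexsize] = (byte) '\0'; // keep the trailing terminator, as in the original
    }

    public static void main(String[] args)
    {
        LexBufferSketch buf = new LexBufferSketch();

        // Fill the first 8192-byte allocation right up to its last free slot.
        for (int i = 0; i < 8191; i++)
        {
            buf.addByte('a');
        }
        System.out.println(buf.lexlength); // prints 8192

        // Ask for room for 16 bytes: the buffer doubles before the write.
        buf.addByte('x', 16);
        System.out.println(buf.lexlength); // prints 16384
    }
}
```

The point of the size argument is only to reserve space up front; the method still writes a single byte. Whether that is enough for the CDATA path depends on how the callers are changed, which the sketch does not attempt to reproduce.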