
#22 Example code causes StringIndexOutOfBounds Exception

Status: closed
Owner: nobody
Labels: None
Priority: 5
Updated: 2006-12-13
Created: 2005-02-23
Creator: Anonymous
Private: No

Using the following code:

FileReader fr = new FileReader("/home/gavares/amazon.html");
String test = "<html><body>asd;lfkasdjl;fj</html>";
TolerantSaxDocumentBuilder tolerantSaxDocumentBuilder =
    new TolerantSaxDocumentBuilder(XMLUnit.getTestParser());
HTMLDocumentBuilder htmlDocumentBuilder =
    new HTMLDocumentBuilder(tolerantSaxDocumentBuilder);
Document wellFormedDocument = htmlDocumentBuilder.parse(fr);

I pulled this code from your site. Running it against
amazon.com's main page throws an exception at line
123 of TolerantSaxDocumentBuilder.java:

String characters = new String(data, start, end);

This happens when data = {'>', ' '}. When
new String(data, start, end) is called, the
out-of-bounds index is 3. I think the string is
constructed like this:

new String(char[] c, start, end)
{
    end -= 1;
    while (start <= end)
    {
        // new string code here
    }
    ...
}

This will result in an index out of bounds. I modified
TolerantSaxDocumentBuilder.characters(char[] data,
int start, int end) as follows:

public void characters(char[] data, int start, int end) {
    String characterData = "" + data[start++];

    while (start < end)
    {
        characterData += "" + data[start++];
    }

    //String characterData = new String(data, start, end);
    trace("characters:" + characterData);
    if (currentElement == null) {
        warn("Can't append text node to null currentElement");
    } else {
        Text textNode = currentDocument.createTextNode(characterData);
        currentElement.appendChild(textNode);
    }
}

This seems to have fixed the problem. I'm sure there is
a better solution, but this seemed the safest fix for
me at the moment.
gavares@amazon.com
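
For reference, the JDK constructor String(char[] value, int offset, int count) takes a character count as its third argument, not an end index. The following is a minimal sketch of that behavior, assuming the failing call had start = 1 and received the array length (2) as its third argument, which is consistent with the reported index of 3 but is not confirmed by the report. The class and method names are illustrative only.

```java
class StringCountDemo {
    // Returns the text from 'offset' to the end of the array.
    static String tail(char[] data, int offset) {
        // The third argument is a count of chars, so subtract the offset.
        return new String(data, offset, data.length - offset);
    }

    // Reports whether passing the array length as the count throws.
    static boolean lengthAsCountThrows(char[] data, int offset) {
        try {
            new String(data, offset, data.length);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            // offset + count runs past the end of the array
            return true;
        }
    }

    public static void main(String[] args) {
        char[] data = {'>', ' '};   // the array from the report
        System.out.println(lengthAsCountThrows(data, 1)); // assumed start = 1
        System.out.println("[" + tail(data, 1) + "]");
    }
}
```

With an offset of 0 the same call succeeds, which may explain why the problem only shows up on some inputs.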

Discussion

  • Nobody/Anonymous

    This file will break the parser.

     
  • opnworks

    opnworks - 2005-09-21

    Logged In: YES
    user_id=1349440

    The StringIndexOutOfBoundsException happens because the
    third parameter of ContentHandler.characters() is the
    number of characters to read, not the length of the
    array. The bug surfaces in String(char[] value, int
    offset, int count), whose third argument is likewise a
    count.

    org.custommonkey.xmlunit.HTMLDocumentBuilder.handleText()

    Bug

    if (startPos < data.length) {
        saxContentHandler.characters(data, startPos, data.length);
    }

    Fix

    if (startPos < data.length) {
        saxContentHandler.characters(data, startPos, data.length - startPos);
    }

    http://java.sun.com/j2se/1.4.2/docs/api/org/xml/sax/ContentHandler.html#characters(char[],%20int,%20int)

    http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html#String(char[],%20int,%20int)

     
  • Stefan Bodewig

    Stefan Bodewig - 2006-12-13

    Logged In: YES
    user_id=113148
    Originator: NO

    fixed in CVS, thanks

     
  • Stefan Bodewig

    Stefan Bodewig - 2006-12-13
    • status: open --> closed
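
The count-based contract that opnworks points to in the discussion can be sketched with a small SAX handler. This is an illustration only, not the XMLUnit source: TextCollector and emitFrom are hypothetical names, and emitFrom merely mirrors the shape of the corrected call in handleText().

```java
import org.xml.sax.helpers.DefaultHandler;

class TextCollector extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX contract: read 'length' chars beginning at ch[start].
        text.append(ch, start, length);
    }

    String text() {
        return text.toString();
    }

    // Hypothetical stand-in for the corrected call site: pass the number
    // of remaining chars, not the length of the whole array.
    static String emitFrom(char[] data, int startPos) {
        TextCollector handler = new TextCollector();
        if (startPos < data.length) {
            handler.characters(data, startPos, data.length - startPos);
        }
        return handler.text();
    }
}
```

A handler written this way never needs to know where the array ends, because the caller is responsible for passing a count that fits.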
     
