From: SourceForge.net <no...@so...> - 2011-09-08 18:12:39
Bugs item #3349161, was opened at 2011-07-01 15:56
Message generated for change (Comment added) made by furman82
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=3349161&group_id=13153

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.

Category: Tidy functionality
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Aaron Herstein (aarongh2012)
Assigned to: Nobody/Anonymous (nobody)
Summary: problem parsing CDATA

Initial Comment:
When parsing this page:
http://www.nytimes.com/2011/04/14/world/asia/14quake.html?_r=2,
a StringIndexOutOfBoundsException is being thrown with this stack trace:

java.lang.StringIndexOutOfBoundsException: String index out of range: 16385
    at java.lang.String.checkBounds(Unknown Source)
    at java.lang.String.<init>(Unknown Source)
    at org.w3c.tidy.TidyUtils.getString(TidyUtils.java:658)
    at org.w3c.tidy.Lexer.getCDATA(Lexer.java:1835)
    at org.w3c.tidy.ParserImpl$ParseScript.parse(ParserImpl.java:667)
    at org.w3c.tidy.ParserImpl.parseTag(ParserImpl.java:203)
    at org.w3c.tidy.ParserImpl$ParseBlock.parse(ParserImpl.java:2464)
    at org.w3c.tidy.ParserImpl.parseTag(ParserImpl.java:203)
    at org.w3c.tidy.ParserImpl$ParseBlock.parse(ParserImpl.java:2464)
    at org.w3c.tidy.ParserImpl.parseTag(ParserImpl.java:203)
    at org.w3c.tidy.ParserImpl$ParseBlock.parse(ParserImpl.java:2464)
    at org.w3c.tidy.ParserImpl.parseTag(ParserImpl.java:203)
    at org.w3c.tidy.ParserImpl$ParseBlock.parse(ParserImpl.java:2464)
    at org.w3c.tidy.ParserImpl.parseTag(ParserImpl.java:203)
    at org.w3c.tidy.ParserImpl$ParseBlock.parse(ParserImpl.java:2464)
    at org.w3c.tidy.ParserImpl.parseTag(ParserImpl.java:203)
    at org.w3c.tidy.ParserImpl$ParseBlock.parse(ParserImpl.java:2464)
    at org.w3c.tidy.ParserImpl.parseTag(ParserImpl.java:203)
    at org.w3c.tidy.ParserImpl$ParseBlock.parse(ParserImpl.java:2464)
    at org.w3c.tidy.ParserImpl.parseTag(ParserImpl.java:203)
    at org.w3c.tidy.ParserImpl$ParseBlock.parse(ParserImpl.java:2464)
    at org.w3c.tidy.ParserImpl.parseTag(ParserImpl.java:203)
    at org.w3c.tidy.ParserImpl$ParseBlock.parse(ParserImpl.java:2464)
    at org.w3c.tidy.ParserImpl.parseTag(ParserImpl.java:203)
    at org.w3c.tidy.ParserImpl$ParseBody.parse(ParserImpl.java:971)
    at org.w3c.tidy.ParserImpl.parseTag(ParserImpl.java:203)
    at org.w3c.tidy.ParserImpl$ParseHTML.parse(ParserImpl.java:483)
    at org.w3c.tidy.ParserImpl.parseDocument(ParserImpl.java:3401)
    at org.w3c.tidy.Tidy.parse(Tidy.java:435)
    at org.w3c.tidy.Tidy.parse(Tidy.java:658)

----------------------------------------------------------------------

Comment By: Matt Furman (furman82)
Date: 2011-09-08 14:12

Message:
I also ran into this issue and "fixed" it locally...

It appears to be a flaw with addByte within Lexer.java. The function assumes
that the buffer only gets examined one byte at a time; however, in the CDATA
function, the call to TidyUtils.getString passes in a length that is greater
than 1. I overloaded the appropriate functions to allow passing in the size
the buffer needs to grow by.

    public void addByte(int c)
    {
        addByte(c, 1);
    }

    /**
     * Adds a byte to lexer buffer.
     * @param c byte to add
     * @param size number of bytes the buffer needs to grow by
     */
    public void addByte(int c, int size)
    {
        if (this.lexsize + size >= this.lexlength)
        {
            while (this.lexsize + size >= this.lexlength)
            {
                if (this.lexlength == 0)
                {
                    this.lexlength = 8192;
                }
                else
                {
                    this.lexlength = this.lexlength * 2;
                }
            }

            byte[] temp = this.lexbuf;
            this.lexbuf = new byte[this.lexlength];
            if (temp != null)
            {
                System.arraycopy(temp, 0, this.lexbuf, 0, temp.length);
                updateNodeTextArrays(temp, this.lexbuf);
            }
        }

        this.lexbuf[this.lexsize++] = (byte) c;
        this.lexbuf[this.lexsize] = (byte) '\0'; // debug
    }

Once I changed the necessary associated functions, it seemed to do the trick.

----------------------------------------------------------------------
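For readers following the thread, the idea in the comment above (grow the buffer by the number of bytes about to be used, not by one) can be illustrated in isolation. This is only a sketch under assumptions: LexBufferSketch, ensureRoom, addBytes, capacity and text are hypothetical names made up for illustration, not part of JTidy, and the class does not reproduce Lexer's node bookkeeping (updateNodeTextArrays). The 8192 starting size and the doubling follow the patch quoted above.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a lexer byte buffer; NOT JTidy's Lexer. */
final class LexBufferSketch {
    private byte[] lexbuf = new byte[0];
    private int lexsize = 0;

    /** Grows the array until `extra` more bytes plus a trailing NUL fit. */
    private void ensureRoom(int extra) {
        int needed = lexsize + extra + 1; // +1 for the '\0' terminator
        if (needed <= lexbuf.length) {
            return;
        }
        // Start at 8192 and double, as in the quoted patch.
        int newLength = lexbuf.length == 0 ? 8192 : lexbuf.length;
        while (newLength < needed) {
            newLength *= 2;
        }
        byte[] grown = new byte[newLength];
        System.arraycopy(lexbuf, 0, grown, 0, lexsize);
        lexbuf = grown;
    }

    /** Appends a single byte. */
    void addByte(int c) {
        ensureRoom(1);
        lexbuf[lexsize++] = (byte) c;
        lexbuf[lexsize] = 0;
    }

    /** Appends several bytes at once, growing by the full length first. */
    void addBytes(byte[] src) {
        ensureRoom(src.length);
        System.arraycopy(src, 0, lexbuf, lexsize, src.length);
        lexsize += src.length;
        lexbuf[lexsize] = 0;
    }

    int size() {
        return lexsize;
    }

    int capacity() {
        return lexbuf.length;
    }

    /** Reads `length` bytes starting at `start` back as a string. */
    String text(int start, int length) {
        return new String(lexbuf, start, length, StandardCharsets.UTF_8);
    }
}
```

With this shape, a multi-byte append that straddles the 8192-byte boundary grows the array before the bytes are copied in, so a later read of that range stays within bounds.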