I am trying to parse an HTML file that contains JavaScript. I found that parsing fails if a JavaScript single-line comment ends with a word in single quotes. In the following trace it failed at a comment ending with the word 'page.' (notice that the line ends with a '). Is there something I am doing wrong, or is it a bug? Has anyone else had the same problem? I would really appreciate your help.
org.htmlparser.util.HTMLParserException: Unexpected Exception occurred while reading file://localhost/C:/DOCUME~1/Owner/LOCALS~1/Temp/tmp46206htm.html, in nextHTMLNode
at Line 98 : //New dummy form required if dest == 'service' to insure that http_referer exists in Internet Explorer upon return to the service 'page.'
Previous Line 97 : };
org.htmlparser.util.HTMLParserException: HTMLReader.readElement() : Error occurred while trying to read the next element,
at Line 98 : //New dummy form required if dest == 'service' to insure that http_referer exists in Internet Explorer upon return to the service 'page.'
Previous Line 97 : };
org.htmlparser.util.HTMLParserException: HTMLReader.readElement() : Error occurred while trying to decipher the tag using scanners
at Line 98 : //New dummy form required if dest == 'service' to insure that http_referer exists in Internet Explorer upon return to the service 'page.'
Previous Line 97 : };
org.htmlparser.util.HTMLParserException: HTMLTag.scan() : Error while scanning tag, tag contents = script LANGUAGE="Javascript", tagLine = <script LANGUAGE="Javascript">;
org.htmlparser.util.HTMLParserException: HTMLScriptScanner.scan() : Error while scanning a script tag, currentLine = <script LANGUAGE="Javascript">;
org.htmlparser.util.HTMLParserException: HTMLReader.readElement() : Error occurred while trying to read the next element,
at Line 98 : //New dummy form required if dest == 'service' to insure that http_referer exists in Internet Explorer upon return to the service 'page.'
Previous Line 97 : };
java.lang.StringIndexOutOfBoundsException: String index out of range: 139
at java.lang.String.charAt(String.java:455)
at org.htmlparser.HTMLStringNode.find(HTMLStringNode.java:102)
at org.htmlparser.HTMLReader.readElement(HTMLReader.java:181)
at org.htmlparser.scanners.HTMLScriptScanner.scan(HTMLScriptScanner.java:127)
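The innermost frames show String.charAt being called from HTMLStringNode.find and running past the end of the line. A minimal sketch of how that can happen, assuming a scan loop that peeks at the character after a closing quote without a bounds check (the class and method names below are hypothetical and this is not the actual htmlparser source):

```java
public class LookaheadBugDemo {
    // Hypothetical, simplified quote-tracking loop. It peeks at the character
    // after a closing quote WITHOUT checking that such a character exists.
    static boolean quoteFollowedByTag(String input) {
        int inputLen = input.length();
        boolean inQuote = false;
        for (int i = 0; i < inputLen; i++) {
            char ch = input.charAt(i);
            if (ch == '\'') {
                if (!inQuote) {
                    inQuote = true;           // opening quote
                } else {
                    inQuote = false;          // closing quote
                    // BUG: if the closing quote is the last character,
                    // i + 1 == inputLen and charAt throws.
                    if (input.charAt(i + 1) == '<') {
                        return true;
                    }
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Closing quote followed by a tag: works.
        System.out.println(quoteFollowedByTag("x = 'a'<b>")); // prints true
        // Line ending in a quote, like the comment in the trace: throws.
        try {
            quoteFollowedByTag("// return to the service 'page.'");
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("Caught StringIndexOutOfBoundsException");
        }
    }
}
```

Any line whose last character closes a quote triggers the out-of-range read, which matches the comment at line 98.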
I think I found the problem. It is in HTMLStringNode.
Here's the fixed code snippet:
if (ch == '\'') {
    if (state == PARSE_IGNORE_STATE) {
        state = PARSE_HAS_BEGUN_STATE;
    } else {
        // Added this to remove the bug (comment ending with a ').
        // Only look at the next character if there is one.
        if ((i + 1) < inputLen && input.charAt(i + 1) == '<') {
            state = PARSE_IGNORE_STATE;
        }
    }
}
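The point of the patch is to test (i+1) < inputLen before calling input.charAt(i+1). A self-contained sketch of that guarded lookahead, assuming a simplified loop; the class name, the scan method and the numeric values of the state constants are hypothetical stand-ins, not the real HTMLStringNode code:

```java
public class QuoteStateDemo {
    // Hypothetical values; only the names mirror the snippet above.
    static final int PARSE_HAS_BEGUN_STATE = 1;
    static final int PARSE_IGNORE_STATE = 3;

    // Simplified, hypothetical scan loop that returns the final state
    // after applying the bounds-checked quote handling to every character.
    static int scan(String input, int initialState) {
        int inputLen = input.length();
        int state = initialState;
        for (int i = 0; i < inputLen; i++) {
            char ch = input.charAt(i);
            if (ch == '\'') {
                if (state == PARSE_IGNORE_STATE) {
                    state = PARSE_HAS_BEGUN_STATE;
                } else if ((i + 1) < inputLen && input.charAt(i + 1) == '<') {
                    // && short-circuits, so charAt(i + 1) is never
                    // evaluated when the quote is the last character.
                    state = PARSE_IGNORE_STATE;
                }
            }
        }
        return state;
    }

    public static void main(String[] args) {
        // A comment ending in a quote no longer throws.
        System.out.println(scan("// return to the service 'page.'", PARSE_HAS_BEGUN_STATE)); // prints 1
        // A quote followed by '<' still switches to the ignore state.
        System.out.println(scan("x = 'a'<b>", PARSE_HAS_BEGUN_STATE)); // prints 3
    }
}
```

Folding the bounds test and the lookahead into one condition keeps the check next to the read it protects, so it cannot be separated by an accidental line join the way it was in the original snippet.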
I think you are using an old version of the parser. Please get the latest version and try again.
Regards
Somik
Indeed I was :) Thanks.
Kedar