I also get a similar error message when I test some URLs:
caught exception parsing
java.lang.ArrayIndexOutOfBoundsException null
java.lang.ArrayIndexOutOfBoundsException
at hotsax.html.sax.HtmlLexer.yylex(HtmlLexer.java:612)
at hotsax.html.sax.HtmlLexer._yylex(HtmlLexer.java:227)
at hotsax.html.sax.HtmlParser.yylex(HtmlParser.java:377)
at hotsax.html.sax.HtmlParser.yyparse(HtmlParser.java:602)
at hotsax.html.sax.SaxParser.parse(SaxParser.java:219)
at hotsax.html.sax.SaxParser.parse(SaxParser.java:169)
at HotSAXParser.main(HotSAXParser.java:26)
Anonymous - 2008-03-16
Logged In: YES
user_id=2037509
Originator: NO
In my case, the problem was the encoding of some characters in the HTML file I parsed.
It seems that HotSAX doesn't handle 2-byte UTF-8 characters well.
So I ran recode -d UTF-8..HTML on the file before passing it to HotSAX, and then everything worked.
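For anyone without recode, the same idea (make the input pure ASCII before HotSAX sees it) can be sketched in Java. This is a hypothetical helper, not part of HotSAX: the class name AsciiEntityEncoder and its method are made up for illustration. It also swaps in a slightly different technique than recode: instead of named entities like &eacute;, it writes every non-ASCII character as a decimal numeric character reference (&#233;). It assumes HotSAX passes numeric character references through as ordinary text, which I have not verified.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/**
 * Hypothetical pre-processing helper (not part of HotSAX): rewrites every
 * non-ASCII character as a numeric character reference (&#NNNN;) so the
 * lexer only ever sees single-byte ASCII input.
 */
public class AsciiEntityEncoder {

    /** Replaces each code point above U+007F with &#<decimal>; and leaves ASCII (including markup) untouched. */
    public static String toAsciiEntities(String html) {
        StringBuilder out = new StringBuilder(html.length());
        // codePoints() yields whole code points, so characters outside the
        // BMP (surrogate pairs in Java strings) become a single reference.
        html.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.append((char) cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            // Demo mode when no input/output files are given.
            System.out.println(toAsciiEntities("caf\u00e9 \u20ac"));
            return;
        }
        // Assumes the input file is UTF-8 encoded.
        String in = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        // The result is pure ASCII, so writing it as US-ASCII loses nothing.
        Files.write(Paths.get(args[1]), toAsciiEntities(in).getBytes(StandardCharsets.US_ASCII));
    }
}
```

Usage would be something like java AsciiEntityEncoder page.html page-ascii.html, then parsing page-ascii.html with HotSAX. Run with no arguments, it prints caf&#233; &#8364;. Numeric references were chosen over named ones because every Unicode code point has one, so no lookup table is needed.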
Logged In: NO
This is a duplicate issue. The fix, along with some corrected code, is discussed at
https://sourceforge.net/tracker/?func=detail&aid=1913288&group_id=29085&atid=395047