HotSAX

Code

Programming Languages: Java

License: GNU Library or Lesser General Public License (LGPL)

Repositories

browse code, statistics, last commit on 2002-05-25

cvs -d:pserver:anonymous@hotsax.cvs.sourceforge.net:/cvsroot/hotsax login

cvs -z3 -d:pserver:anonymous@hotsax.cvs.sourceforge.net:/cvsroot/hotsax co -P modulename

What's happening?

  • API Doc

    Hi, is this project still ongoing? Is there any API documentation?

    2008-05-29 14:43:01 UTC by ashwin_ittoo

  • Comment: ArrayIndexOutOfBoundsException in HtmlLexer.yylex()

    me too!

    2008-04-03 10:30:20 UTC by nobody

  • Comment: ArrayOutOfBounds exception w/ some HTML

    In my case, it was an issue with the encoding of some characters in the HTML file I parsed. It seems that HotSAX doesn't handle UTF-8 characters that are 2 bytes wide well. So I ran recode -d UTF-8..HTML on the file before using HotSAX, and it was all OK then.

    2008-03-16 01:23:22 UTC by nikobonnieure

  • Comment: ArrayIndexOutOfBoundsException

    In my case, it was an issue with the encoding of some characters in the HTML file I parsed. It seems that HotSAX doesn't handle UTF-8 characters that are 2 bytes wide well. So I ran recode -d UTF-8..HTML on the file before using HotSAX, and it was all OK then.

    2008-03-16 01:22:01 UTC by nikobonnieure

  • Comment: ArrayIndexOutOfBoundsException in HtmlLexer.yylex()

    In my case, it was an issue with the encoding of some characters in the HTML file I parsed. It seems that HotSAX doesn't handle UTF-8 characters that are 2 bytes wide well. So I ran recode -d UTF-8..HTML on the file before using HotSAX, and it was all OK then.

    2008-03-16 01:21:02 UTC by nikobonnieure

  • ArrayIndexOutOfBoundsException in HtmlLexer.yylex()

    When parsing the page http://www.cs.dartmouth.edu/~fabio/ I had the following error:

    Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 8364
        at hotsax.html.sax.HtmlLexer.yylex(HtmlLexer.java:549)
        at hotsax.html.sax.HtmlLexer._yylex(HtmlLexer.java:218)
        at hotsax.html.sax.HtmlParser.yylex(HtmlParser.java:285)
        at...

    2008-03-13 04:26:57 UTC by nobody

  • Problem with nowrap

    Hi, I'm using HotSAX2 and it seems to be very fast and accurate. Good work! While trying things out, I found that it does not parse elements inside <td align="right" valign="bottom" nowrap>: the "nowrap" attribute causes the parser to skip nodes. Is there a setting to parse nowrap attributes too? Thanks in advance, Gaurav.

    2007-08-08 17:36:37 UTC by hencre

  • It seems that HotSAX doesn't support Chinese HTML pages!

    When I try to debug xhtmlMaker.java using an HTML page that contains Chinese words, I get an ArraysOutOfBounds error! It's quite frustrating! I'm hoping there is a solution. If you know how to fix it, please email xiao7cn@126.com. Thanks.

    2007-01-26 06:38:01 UTC by nobody

  • NullPointerException in startDTD

    The parser throws a NullPointerException when I use the following HTML as input for xhtmlMaker.java: I use the files from the distribution directory chapt06, version HotSAX-0.1.2c.

    2006-11-14 08:03:29 UTC by brnrd

  • Comment: InputSource with InputStream NPE

    I have the exact same bug (HotSAX-0.1.2c.tar.gz) when passing an InputSource created from File.toURI().toString(). The file exists. Pretty frustrating...

    2006-10-14 21:51:05 UTC by nobody
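
The encoding workaround reported in the comments above (running recode -d UTF-8..HTML before parsing) can be approximated in Java without an external tool. The following is only a minimal sketch, not part of HotSAX: the class name EntityEscaper and the method escapeNonAscii are hypothetical, and it emits decimal numeric character references where recode may emit named entities. It assumes the HTML has already been decoded into a String with the correct charset. The index 8364 in the reported stack trace happens to equal the code point of the euro sign (U+20AC), which is consistent with the encoding explanation but does not confirm it.

```java
public final class EntityEscaper {
    private EntityEscaper() {}

    /**
     * Replaces every code point above U+007F with a decimal numeric
     * character reference, leaving ASCII markup untouched.
     * Hypothetical helper for illustration; not a HotSAX API.
     */
    public static String escapeNonAscii(String html) {
        StringBuilder out = new StringBuilder(html.length());
        for (int i = 0; i < html.length(); ) {
            int cp = html.codePointAt(i);
            if (cp > 0x7F) {
                // Non-ASCII: write it as &#NNNN; so the lexer sees only ASCII.
                out.append("&#").append(cp).append(';');
            } else {
                out.append((char) cp);
            }
            // Advance by 1 or 2 chars, depending on surrogate pairs.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints "Price: 5 &#8364;"
        System.out.println(escapeNonAscii("Price: 5 \u20AC"));
    }
}
```

The escaped string contains only ASCII characters, so it could then be handed to the parser in place of the original input. Whether this avoids every reported ArrayIndexOutOfBoundsException is untested here.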
