
#9 InputSource with InputStream NPE

Milestone: v0.1.3
Status: open
Labels: Parser (8)
Priority: 5
Updated: 2005-01-09
Created: 2005-01-09
Creator: Anonymous
Private: No

If I create an InputSource using an InputStream and
then pass that to the parser I get an NPE:

java.lang.NullPointerException
at hotsax.html.sax.HtmlLexer.yy_advance(HtmlLexer.java:434)
at hotsax.html.sax.HtmlLexer.yylex(HtmlLexer.java:608)
at hotsax.html.sax.HtmlLexer._yylex(HtmlLexer.java:227)
at hotsax.html.sax.HtmlParser.yylex(HtmlParser.java:377)
at hotsax.html.sax.HtmlParser.yyparse(HtmlParser.java:602)
at hotsax.html.sax.SaxParser.parse(SaxParser.java:219)
[remainder of trace omitted]

If I use a Reader instead then the parser works fine.

Discussion

  • Nobody/Anonymous

    Logged In: NO

    I have the exact same bug (HotSAX-0.1.2c.tar.gz) when passing
    an InputSource created from File.toURI().toString(). The
    file exists.

    Pretty frustrating...

     
  • Simon Massey

    Simon Massey - 2012-01-05

    You need to wrap whatever raw input you have in a Reader.

    So for a file you use FileReader:

    InputSource input = new InputSource(
        new BufferedReader(
            new FileReader("src/test/resources/smithsonianmag.com.santas-trusty-robot-reindeer.html")));

    For a string you use a StringReader:

    String html = "<x>\n" +
        "Text\n" +
        "<!-- comment -->\n" +
        "</x>";
    InputSource input = new InputSource(new StringReader(html));

    For a URL you use an InputStreamReader:

    File file = new File("src/test/resources/smithsonianmag.com.santas-trusty-robot-reindeer.html");

    @SuppressWarnings("deprecation")
    URL fileUrl = file.toURL();

    TextExtractionContextHandler ch = new TextExtractionContextHandler("p");
    parser.setContentHandler(ch);
    InputSource input = new InputSource(
        new BufferedReader(
            new InputStreamReader(
                fileUrl.openStream())));
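
The three cases above can be folded into one helper. The following is only a sketch using the standard org.xml.sax.InputSource API: the class name ReaderBackedSource and the method withReader are hypothetical, and the UTF-8 fallback is an assumption (pick whatever encoding your documents actually use). It has not been run against HotSAX itself; it only guarantees that the returned InputSource carries a character stream, which is what the workaround in this thread relies on.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

import org.xml.sax.InputSource;

// Hypothetical helper, not part of HotSAX.
public class ReaderBackedSource {

    /** Returns an InputSource that always has a character stream (Reader) set. */
    public static InputSource withReader(InputSource source) throws IOException {
        if (source.getCharacterStream() != null) {
            return source; // already Reader-backed, nothing to do
        }

        InputStream bytes = source.getByteStream();
        if (bytes == null) {
            // Only a system ID was given, e.g. from File.toURI().toString().
            if (source.getSystemId() == null) {
                throw new IOException("InputSource has no reader, stream, or system ID");
            }
            bytes = URI.create(source.getSystemId()).toURL().openStream();
        }

        // Assumption: fall back to UTF-8 when the source declares no encoding.
        Charset charset = source.getEncoding() != null
                ? Charset.forName(source.getEncoding())
                : StandardCharsets.UTF_8;

        InputSource wrapped = new InputSource(
                new BufferedReader(new InputStreamReader(bytes, charset)));
        wrapped.setSystemId(source.getSystemId());
        wrapped.setPublicId(source.getPublicId());
        return wrapped;
    }

    public static void main(String[] args) throws IOException {
        byte[] html = "<x>Text</x>".getBytes(StandardCharsets.UTF_8);
        InputSource raw = new InputSource(new ByteArrayInputStream(html));

        InputSource fixed = withReader(raw);
        BufferedReader reader = new BufferedReader(fixed.getCharacterStream());
        System.out.println(reader.readLine()); // prints "<x>Text</x>"
    }
}
```

Passing an explicit Charset also avoids depending on the platform default encoding, which FileReader and the one-argument InputStreamReader constructor typically use.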

     
