InputSource with InputStream NPE
Status: Pre-Alpha
Brought to you by:
ulysees2001
If I create an InputSource using an InputStream and
then pass that to the parser I get an NPE:
java.lang.NullPointerException
at hotsax.html.sax.HtmlLexer.yy_advance(HtmlLexer.java:434)
at hotsax.html.sax.HtmlLexer.yylex(HtmlLexer.java:608)
at hotsax.html.sax.HtmlLexer._yylex(HtmlLexer.java:227)
at hotsax.html.sax.HtmlParser.yylex(HtmlParser.java:377)
at hotsax.html.sax.HtmlParser.yyparse(HtmlParser.java:602)
at hotsax.html.sax.SaxParser.parse(SaxParser.java:219)
[remainder of trace omitted]
If I use a Reader instead then the parser works fine.
Logged In: NO
I hit the exact same bug (HotSAX-0.1.2c.tar.gz) when passing
an InputSource created from File.toURI().toString(). The
file exists.
Pretty frustrating...
You need to wrap whatever raw input you have with a reader.
So for a file you use FileReader:
InputSource input = new InputSource(
    new BufferedReader(
        new FileReader("src/test/resources/smithsonianmag.com.santas-trusty-robot-reindeer.html")));
For a string you use a StringReader:
String html = "<x>\n" +
"Text\n" +
"<!-- comment -->\n" +
"</x>";
InputSource input = new InputSource(new StringReader(html));
For a URL you use an InputStreamReader:
File file = new File("src/test/resources/smithsonianmag.com.santas-trusty-robot-reindeer.html");
// File.toURL() is deprecated; going through toURI() escapes the path correctly.
URL fileUrl = file.toURI().toURL();
TextExtractionContextHandler ch = new TextExtractionContextHandler("p");
parser.setContentHandler(ch);
InputSource input = new InputSource(
    new BufferedReader(
        new InputStreamReader(
            fileUrl.openStream())));
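
If you would rather not repeat the wrapping at every call site, a small helper can turn any InputSource into one that carries a Reader before it reaches the parser. This is only a sketch and is not part of HotSAX: the class name ReaderBackedSource, the method withReader, and the UTF-8 fallback when no encoding is set are all assumptions made for illustration. It uses only the standard org.xml.sax.InputSource accessors.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;

import org.xml.sax.InputSource;

// Hypothetical helper (not a HotSAX class): guarantees the InputSource has a Reader.
final class ReaderBackedSource {

    static InputSource withReader(InputSource source) throws IOException {
        // Already has a character stream: nothing to do.
        if (source.getCharacterStream() != null) {
            return source;
        }

        // Prefer the byte stream; otherwise try to open the system ID as a URL.
        InputStream bytes = source.getByteStream();
        if (bytes == null && source.getSystemId() != null) {
            bytes = URI.create(source.getSystemId()).toURL().openStream();
        }
        if (bytes == null) {
            throw new IllegalArgumentException("InputSource has no stream and no system ID");
        }

        // Assumption: fall back to UTF-8 when the caller did not declare an encoding.
        String encoding = source.getEncoding() != null ? source.getEncoding() : "UTF-8";

        InputSource wrapped = new InputSource(
            new BufferedReader(new InputStreamReader(bytes, encoding)));
        wrapped.setSystemId(source.getSystemId());
        wrapped.setPublicId(source.getPublicId());
        return wrapped;
    }

    public static void main(String[] args) throws IOException {
        byte[] html = "<x>\nText\n</x>".getBytes(StandardCharsets.UTF_8);
        InputSource raw = new InputSource(new ByteArrayInputStream(html));

        InputSource safe = withReader(raw);
        // safe can now be handed to parser.parse(safe).
        System.out.println(safe.getCharacterStream() != null); // prints true
    }
}
```

Calling parser.parse(ReaderBackedSource.withReader(input)) should then cover the InputStream and system ID cases reported above, assuming the NPE really is caused by the missing character stream.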