[Htmlparser-user] Malformed Input Exception
From: Bob L. <bob...@ya...> - 2003-02-24 14:49:22
Hi,

I am trying to use htmlparser 1.3 to parse the HTML at
http://www.flytango.com/en/taschedule.html and
http://www.flytango.com/en/index.html. When I attempt to parse these
pages, I get:

com.sun.io.MalformedInputException: sun.io.MalformedInputException
    at sun.io.ByteToCharUTF8.convert(ByteToCharUTF8.java:105)
    at java.io.InputStreamReader.convertInto(InputStreamReader.java:132)
    at java.io.InputStreamReader.fill(InputStreamReader.java:181)
    at java.io.InputStreamReader.read(InputStreamReader.java:244)
    at java.io.BufferedReader.fill(BufferedReader.java:134)
    at java.io.BufferedReader.readLine(BufferedReader.java:294)
    at java.io.BufferedReader.readLine(BufferedReader.java:357)
    at org.htmlparser.HTMLReader.getNextLine(HTMLReader.java:139)
    at org.htmlparser.HTMLReader.readElement(HTMLReader.java:176)
    at org.htmlparser.util.HTMLEnumerationImpl.peek(HTMLEnumerationImpl.java:60)
    at org.htmlparser.util.HTMLEnumerationImpl.hasMoreNodes(HTMLEnumerationImpl.java:91)

Now, if I copy the source of these pages from a browser into a file and
put them on my own webserver, I can parse them without any errors. My
guess is that some strange control character in the source is causing
the exception, but I'm not entirely sure. Any suggestions?

If it is a bad character, would it be possible to add code to
HTMLReader that strips offending characters from the input stream?

Here is the code I am using to parse:

    DefaultHTMLParserFeedback feedback =
        new DefaultHTMLParserFeedback(DefaultHTMLParserFeedback.DEBUG);
    HTMLReader reader = null;
    HTMLParser parser = null;

    InputStreamReader isr = new InputStreamReader(urlConn.getInputStream());
    reader = new HTMLReader(isr, 8192);
    parser = new HTMLParser(reader, feedback);

    boolean inForm = false;
    parser.addScanner(new HTMLInputTagScanner());
    HTMLEnumeration tags = parser.elements();
    RequestParameters params = new RequestParameters();

    while (tags.hasMoreNodes()) {
        ...
    }

Thanks,
Bob Lewis
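
A possible reading of the trace above, offered as an assumption rather than a confirmed diagnosis: the exception is raised in sun.io.ByteToCharUTF8, so the InputStreamReader (created without an explicit charset) is decoding as UTF-8, and the server may be sending bytes in another encoding, ISO-8859-1 for instance, that are not valid UTF-8. A copy saved from a browser may have been re-encoded along the way, which would be consistent with the local copy parsing cleanly. The sketch below uses only standard java.io and java.nio.charset classes (Java 7 or later for StandardCharsets), not any htmlparser API. The class LenientReaderSketch and its methods lenientReader and readAll are hypothetical names introduced for illustration. It shows one way to build a Reader that substitutes U+FFFD for undecodable bytes instead of throwing.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; not part of htmlparser.
final class LenientReaderSketch {

    // Wraps a byte stream in a Reader that replaces malformed or
    // unmappable byte sequences with U+FFFD rather than failing.
    static Reader lenientReader(InputStream in, Charset charset) {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        return new InputStreamReader(in, decoder);
    }

    // Drains a Reader into a String (only used for the demonstration).
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // "café!" encoded as ISO-8859-1: the lone 0xE9 byte is not valid UTF-8.
        byte[] latin1 = {'c', 'a', 'f', (byte) 0xE9, '!'};

        // Decoding with the wrong charset: the bad byte becomes U+FFFD.
        String asUtf8 = readAll(
                lenientReader(new ByteArrayInputStream(latin1), StandardCharsets.UTF_8));
        System.out.println(asUtf8.equals("caf\uFFFD!")); // true

        // Decoding with the charset the bytes were actually written in.
        String asLatin1 = readAll(
                lenientReader(new ByteArrayInputStream(latin1), StandardCharsets.ISO_8859_1));
        System.out.println(asLatin1.equals("caf\u00E9!")); // true
    }
}
```

In the parsing code above, the Reader returned by lenientReader(urlConn.getInputStream(), ...) could presumably be passed to HTMLReader in place of the plain InputStreamReader. If the response's Content-Type header names a charset, using that charset is likely a better choice than guessing one.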