Re: [Htmlparser-user] Malformed Input Exception
From: Somik R. <so...@ya...> - 2003-02-24 18:29:52
Hi Bob,

Sounds like a bug. Can you file a bug report at
http://htmlparser.sourceforge.net?

Regards,
Somik

--- Bob Lewis <bob...@ya...> wrote:
> Hi,
>
> I am trying to use htmlparser 1.3 to parse the HTML at
> http://www.flytango.com/en/taschedule.html and
> http://www.flytango.com/en/index.html. When I attempt
> to parse these pages, I get a
> sun.io.MalformedInputException:
>
> sun.io.MalformedInputException
>     at sun.io.ByteToCharUTF8.convert(ByteToCharUTF8.java:105)
>     at java.io.InputStreamReader.convertInto(InputStreamReader.java:132)
>     at java.io.InputStreamReader.fill(InputStreamReader.java:181)
>     at java.io.InputStreamReader.read(InputStreamReader.java:244)
>     at java.io.BufferedReader.fill(BufferedReader.java:134)
>     at java.io.BufferedReader.readLine(BufferedReader.java:294)
>     at java.io.BufferedReader.readLine(BufferedReader.java:357)
>     at org.htmlparser.HTMLReader.getNextLine(HTMLReader.java:139)
>     at org.htmlparser.HTMLReader.readElement(HTMLReader.java:176)
>     at org.htmlparser.util.HTMLEnumerationImpl.peek(HTMLEnumerationImpl.java:60)
>     at org.htmlparser.util.HTMLEnumerationImpl.hasMoreNodes(HTMLEnumerationImpl.java:91)
>
> Now, if I copy the source of these pages from a
> browser into a file and put them on my own webserver,
> I can parse them without any errors.
>
> It's my guess that there is some strange control
> character in the source that is causing the exception,
> but I'm not entirely sure. Any suggestions? If it is
> a bad character, would it be possible to add code to
> HTMLReader that strips offending characters from the
> input stream?
>
> Here is the code I am using to parse:
>
>     DefaultHTMLParserFeedback feedback
>         = new DefaultHTMLParserFeedback(DefaultHTMLParserFeedback.DEBUG);
>
>     HTMLReader reader = null;
>     HTMLParser parser = null;
>     InputStreamReader isr
>         = new InputStreamReader(urlConn.getInputStream());
>     reader = new HTMLReader(isr, 8192);
>     parser = new HTMLParser(reader, feedback);
>     boolean inForm = false;
>
>     parser.addScanner(new HTMLInputTagScanner());
>
>     HTMLEnumeration tags = parser.elements();
>
>     RequestParameters params = new RequestParameters();
>
>     while (tags.hasMoreNodes())
>     {
>         ...
>     }
>
> Thanks,
>
> Bob Lewis
>
> _______________________________________________
> Htmlparser-user mailing list
> Htm...@li...
> https://lists.sourceforge.net/lists/listinfo/htmlparser-user
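
On Bob's question about stripping offending characters: the trace fails inside sun.io.ByteToCharUTF8, so the bytes are being decoded as UTF-8. Since new InputStreamReader(InputStream) uses the platform default encoding, one possible explanation (an assumption, not confirmed in this thread) is that the pages are served in a single-byte encoding such as ISO-8859-1 and contain bytes that are not valid UTF-8. Below is a minimal sketch of a lenient reader that substitutes U+FFFD for undecodable bytes instead of throwing. It uses java.nio.charset (Java 1.4 or later); the class and method names (LenientReaderSketch, lenientReader, readAll) are hypothetical and not part of htmlparser.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper, not part of htmlparser.
class LenientReaderSketch {

    // Wraps a byte stream in a reader that replaces bytes which are invalid
    // in the given charset with U+FFFD rather than throwing an exception.
    public static InputStreamReader lenientReader(InputStream in, String charsetName) {
        CharsetDecoder decoder = Charset.forName(charsetName).newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        return new InputStreamReader(in, decoder);
    }

    // Reads everything from the reader into a String (for demonstration only).
    public static String readAll(Reader reader) throws IOException {
        BufferedReader br = new BufferedReader(reader);
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = br.read()) != -1) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // 0xE9 is "e acute" in ISO-8859-1 but is not valid UTF-8 on its own.
        byte[] page = { 'a', (byte) 0xE9, 'b' };
        String text = readAll(lenientReader(new ByteArrayInputStream(page), "UTF-8"));
        System.out.println("Decoded " + text.length() + " chars without an exception");
    }
}
```

In Bob's code the result of lenientReader(urlConn.getInputStream(), "UTF-8") could take the place of isr. If the page's real encoding is known, passing it explicitly, for example new InputStreamReader(in, "ISO-8859-1"), is probably the better fix, since replacement discards the original characters.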