Re: [Htmlparser-user] Malformed Input Exception
From: Bob L. <bob...@ya...> - 2003-02-25 20:07:38
I tried using the parser directly, as you suggested, and it seems to work. However, I need to be able to work with the URLConnection to set headers and cookies and to send POST data. Typically, this is what I'm doing:

    // create and initialize the URL connection
    HttpURLConnection urlConn = null;
    URL url = new URL("http://somedomain/somepath");
    urlConn = (HttpURLConnection)url.openConnection();
    urlConn.setDoInput(true);
    urlConn.setDoOutput(true);
    urlConn.setUseCaches(false);
    urlConn.setAllowUserInteraction(false);
    urlConn.setRequestMethod("POST");

    // ... usually many HTTP headers and cookie values set
    urlConn.setRequestProperty("someHeader", "someValue");
    urlConn.setRequestProperty("anotherHeader", "anotherValue");

    StringBuffer postData = new StringBuffer();
    // ... generate post data in buffer

    // send the post data
    PrintWriter printWriter = new PrintWriter(urlConn.getOutputStream());
    printWriter.println(postData.toString());
    printWriter.close();

    // parse the response
    HTMLEnumeration tags = parser.elements();
    while (tags.hasMoreNodes()) {
        // ... Do Something
    }

This works fine on most URLs. I am normally able to execute the server-side web application, then obtain and parse the HTML response. However, in the case of these two URLs, I get the MalformedInputException. Is there something I'm missing?

Thanks,

Bob Lewis

--- Somik Raha <so...@ya...> wrote:
>Date: 2003-02-24 21:33
>Sender: somik
>Logged In: YES
>user_id=187944
>
>I ran the parser on these pages and it worked fine. Try
>runParser.bat http://www.flytango.com/en/index.html.
>
>It could be that you have initialized your urlconnection
>incorrectly. Have you tried using the parser directly, like:
>
>HTMLParser parser = new HTMLParser
>("http://www.flytango.com/en/index.html");
>for (NodeIterator i=parser.elements();i.hasMoreNodes();) {
>    System.out.println(i.nextNode().toHtml());
>}

--- Somik Raha <so...@ya...> wrote:
> Hi Bob,
> Sounds like a bug.
> Can you file a bug report at
> http://htmlparser.sourceforge.net?
>
> Regards,
> Somik
>
> --- Bob Lewis <bob...@ya...> wrote:
> > Hi,
> >
> > I am trying to use htmlparser 1.3 to parse the HTML at
> > http://www.flytango.com/en/taschedule.html and
> > http://www.flytango.com/en/index.html. When I attempt
> > to parse these pages, I get a
> > sun.io.MalformedInputException:
> >
> > sun.io.MalformedInputException
> >   at sun.io.ByteToCharUTF8.convert(ByteToCharUTF8.java:105)
> >   at java.io.InputStreamReader.convertInto(InputStreamReader.java:132)
> >   at java.io.InputStreamReader.fill(InputStreamReader.java:181)
> >   at java.io.InputStreamReader.read(InputStreamReader.java:244)
> >   at java.io.BufferedReader.fill(BufferedReader.java:134)
> >   at java.io.BufferedReader.readLine(BufferedReader.java:294)
> >   at java.io.BufferedReader.readLine(BufferedReader.java:357)
> >   at org.htmlparser.HTMLReader.getNextLine(HTMLReader.java:139)
> >   at org.htmlparser.HTMLReader.readElement(HTMLReader.java:176)
> >   at org.htmlparser.util.HTMLEnumerationImpl.peek(HTMLEnumerationImpl.java:60)
> >   at org.htmlparser.util.HTMLEnumerationImpl.hasMoreNodes(HTMLEnumerationImpl.java:91)
> >
> > Now, if I copy the source of these pages from a
> > browser into a file and put them on my own webserver,
> > I can parse them without any errors.
> >
> > It's my guess that there is some strange control
> > character in the source that is causing the exception,
> > but I'm not entirely sure. Any suggestions? If it is
> > a bad character, would it be possible to add code to
> > HTMLReader that strips offending characters from the
> > input stream?
> >
> > Here is the code I am using to parse:
> >
> > DefaultHTMLParserFeedback feedback
> >     = new DefaultHTMLParserFeedback(DefaultHTMLParserFeedback.DEBUG);
> >
> > HTMLReader reader = null;
> > HTMLParser parser = null;
> > InputStreamReader isr
> >     = new InputStreamReader(urlConn.getInputStream());
> > reader = new HTMLReader(isr, 8192);
> > parser = new HTMLParser(reader, feedback);
> > boolean inForm = false;
> >
> > parser.addScanner(new HTMLInputTagScanner());
> >
> > HTMLEnumeration tags = parser.elements();
> >
> > RequestParameters params = new RequestParameters();
> >
> > while (tags.hasMoreNodes())
> > {
> >     ...
> > }
> >
> > Thanks,
> >
> > Bob Lewis
> >
> > _______________________________________________
> > Htmlparser-user mailing list
> > Htm...@li...
> > https://lists.sourceforge.net/lists/listinfo/htmlparser-user
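
A note on the stack trace quoted above: the exception is raised inside sun.io.ByteToCharUTF8, and the posted code builds its InputStreamReader without naming a charset, so the response bytes are decoded with the platform default. One possible explanation, which nothing in this thread confirms, is that the two pages are served in a different encoding (ISO-8859-1, for example) and contain bytes that are not valid UTF-8. The sketch below shows the general idea of choosing the reader's charset from the Content-Type header instead of relying on the default. The class name CharsetSketch and the helpers charsetFromContentType and readAll are hypothetical, and the code uses java.nio.charset.StandardCharsets and try-with-resources, which need Java 7 or later and were not available in the JDKs of 2003.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetSketch {

    // Hypothetical helper: extract the charset parameter from a Content-Type
    // value such as "text/html; charset=ISO-8859-1". Returns the fallback when
    // the header is missing, has no charset parameter, or names an unknown charset.
    static Charset charsetFromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Case-insensitive match on the parameter name.
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name: use the fallback.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    // Hypothetical helper: read a whole stream through a reader that uses an
    // explicitly chosen charset rather than the platform default.
    static String readAll(InputStream in, Charset cs) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(in, cs)) {
            char[] buf = new char[1024];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "caf" followed by 0xE9, which is e-acute in ISO-8859-1 but is not a
        // valid byte sequence in UTF-8.
        byte[] body = {'c', 'a', 'f', (byte) 0xE9};
        Charset cs = charsetFromContentType(
                "text/html; charset=ISO-8859-1", StandardCharsets.ISO_8859_1);
        System.out.println(readAll(new ByteArrayInputStream(body), cs));
    }
}
```

With a live connection, the header value would come from urlConn.getContentType(), and the resulting charset would be passed as the second argument of the InputStreamReader that wraps urlConn.getInputStream(). Whether that removes the exception for these particular pages is untested here.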