Re: [Htmlparser-user] Malformed Input Exception
From: Somik R. <so...@ya...> - 2003-02-26 06:44:02
Hi Bob,

Can you try this: get the data from the URL in question into a file
(using a POST request), then try to parse the file. If it breaks, we
will know why.

Regards,
Somik

----- Original Message -----
From: "Bob Lewis" <bob...@ya...>
To: <htm...@li...>
Sent: Tuesday, February 25, 2003 12:07 PM
Subject: Re: [Htmlparser-user] Malformed Input Exception

> I tried using the parser directly, as you suggested,
> and it seems to work. However, I need to be able to work
> with the URLConnection to set headers, cookies and
> send POST data.
>
> Typically, this is what I'm doing:
>
> //create and initialize the URL Connection
> HttpURLConnection urlConn = null;
> URL url = new URL("http://somedomain/somepath");
> urlConn = (HttpURLConnection)url.openConnection();
> urlConn.setDoInput(true);
> urlConn.setDoOutput(true);
> urlConn.setUseCaches(false);
> urlConn.setAllowUserInteraction(false);
> urlConn.setRequestMethod("POST");
>
> // ... usually many HTTP Headers and cookie values set
> urlConn.setRequestProperty("someHeader", "someValue");
> urlConn.setRequestProperty("anotherHeader", "anotherValue");
>
> StringBuffer postData = new StringBuffer();
> // ... generate post data in buffer
>
> //Send the post data
> PrintWriter printWriter = new PrintWriter(urlConn.getOutputStream());
> printWriter.println(postData.toString());
> printWriter.close();
>
> //parse the response
> HTMLEnumeration tags = parser.elements();
>
> while (tags.hasMoreNodes())
> {
>     // ... Do Something
> }
>
> This works fine on most URLs. I am normally able to
> execute the server-side web application, obtain and
> parse the HTML response. However, in the case of
> these two URLs, I get the MalformedInputException.
>
> Is there something I'm missing?
>
> Thanks,
>
> Bob Lewis
>
> --- Somik Raha <so...@ya...> wrote:
>
> > Date: 2003-02-24 21:33
> > Sender: somik
> > Logged In: YES
> > user_id=187944
> >
> > I ran the parser on these pages and it worked fine. Try
> >
> > runParser.bat http://www.flytango.com/en/index.html
> >
> > It could be that you have initialized your URLConnection
> > incorrectly. Have you tried using the parser directly, like:
> >
> > HTMLParser parser = new HTMLParser("http://www.flytango.com/en/index.html");
> > for (NodeIterator i = parser.elements(); i.hasMoreNodes();) {
> >     System.out.println(i.nextNode().toHtml());
> > }
>
> --- Somik Raha <so...@ya...> wrote:
>
> > Hi Bob,
> > Sounds like a bug.
> > Can you file a bug report at
> > http://htmlparser.sourceforge.net?
> >
> > Regards,
> > Somik
> >
> > --- Bob Lewis <bob...@ya...> wrote:
> >
> > > Hi,
> > >
> > > I am trying to use htmlparser 1.3 to parse the HTML at
> > > http://www.flytango.com/en/taschedule.html and
> > > http://www.flytango.com/en/index.html. When I attempt
> > > to parse these pages, I get sun.io.MalformedInputException:
> > >
> > > sun.io.MalformedInputException
> > >     at sun.io.ByteToCharUTF8.convert(ByteToCharUTF8.java:105)
> > >     at java.io.InputStreamReader.convertInto(InputStreamReader.java:132)
> > >     at java.io.InputStreamReader.fill(InputStreamReader.java:181)
> > >     at java.io.InputStreamReader.read(InputStreamReader.java:244)
> > >     at java.io.BufferedReader.fill(BufferedReader.java:134)
> > >     at java.io.BufferedReader.readLine(BufferedReader.java:294)
> > >     at java.io.BufferedReader.readLine(BufferedReader.java:357)
> > >     at org.htmlparser.HTMLReader.getNextLine(HTMLReader.java:139)
> > >     at org.htmlparser.HTMLReader.readElement(HTMLReader.java:176)
> > >     at org.htmlparser.util.HTMLEnumerationImpl.peek(HTMLEnumerationImpl.java:60)
> > >     at org.htmlparser.util.HTMLEnumerationImpl.hasMoreNodes(HTMLEnumerationImpl.java:91)
> > >
> > > Now, if I copy the source of these pages from a
> > > browser into a file and put them on my own webserver,
> > > I can parse them without any errors.
> > >
> > > It's my guess that there is some strange control
> > > character in the source that is causing the exception,
> > > but I'm not entirely sure. Any suggestions? If it is
> > > a bad character, would it be possible to add code to
> > > HTMLReader that strips offending characters from the
> > > input stream?
> > >
> > > Here is the code I am using to parse:
> > >
> > > DefaultHTMLParserFeedback feedback
> > >     = new DefaultHTMLParserFeedback(DefaultHTMLParserFeedback.DEBUG);
> > >
> > > HTMLReader reader = null;
> > > HTMLParser parser = null;
> > > InputStreamReader isr
> > >     = new InputStreamReader(urlConn.getInputStream());
> > > reader = new HTMLReader(isr, 8192);
> > > parser = new HTMLParser(reader, feedback);
> > > boolean inForm = false;
> > >
> > > parser.addScanner(new HTMLInputTagScanner());
> > >
> > > HTMLEnumeration tags = parser.elements();
> > >
> > > RequestParameters params = new RequestParameters();
> > >
> > > while (tags.hasMoreNodes())
> > > {
> > >     ...
> > > }
> > >
> > > Thanks,
> > >
> > > Bob Lewis
> > >
> > > _______________________________________________
> > > Htmlparser-user mailing list
> > > Htm...@li...
> > > https://lists.sourceforge.net/lists/listinfo/htmlparser-user
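
A minimal sketch of Somik's suggestion in this thread (capture the raw response in a file, then parse the file), plus one thing that may be worth checking. The stack trace shows the failure inside ByteToCharUTF8, so the bytes are being decoded as UTF-8, and `new InputStreamReader(stream)` with no charset argument uses the platform default encoding. Whether an encoding mismatch is really the cause for these two pages is an assumption, not something the thread confirms. The class and method names below (`ResponseCapture`, `charsetFromContentType`, `saveRawBytes`, `openReader`) are hypothetical, the ISO-8859-1 fallback is a guess, and the code assumes a modern JDK (Java 7 or later) rather than the JDK in use in 2003.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class ResponseCapture {

    // Hypothetical helper: pull the charset parameter out of a Content-Type
    // value such as "text/html; charset=ISO-8859-1". Returns the fallback
    // when the header is missing, has no charset, or names an unknown charset.
    public static Charset charsetFromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name: use the fallback.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    // Copy the response bytes to a file without decoding them, so the file
    // holds exactly what the server sent and can be parsed or inspected later.
    public static long saveRawBytes(InputStream in, Path file) throws IOException {
        return Files.copy(in, file, StandardCopyOption.REPLACE_EXISTING);
    }

    // Decode with an explicit charset instead of the platform default.
    // The ISO-8859-1 fallback is an assumption made for this sketch.
    public static Reader openReader(InputStream in, String contentType) {
        Charset cs = charsetFromContentType(contentType, StandardCharsets.ISO_8859_1);
        return new InputStreamReader(in, cs);
    }

    public static void main(String[] args) throws IOException {
        // 0xE9 is a complete character in ISO-8859-1, but on its own it is
        // only the first byte of an unfinished multi-byte UTF-8 sequence.
        byte[] body = {'c', 'a', 'f', (byte) 0xE9};

        Path file = Files.createTempFile("response", ".html");
        saveRawBytes(new ByteArrayInputStream(body), file);

        try (Reader r = openReader(Files.newInputStream(file),
                                   "text/html; charset=ISO-8859-1")) {
            StringBuilder sb = new StringBuilder();
            for (int c; (c = r.read()) != -1; ) {
                sb.append((char) c);
            }
            System.out.println("decoded " + sb.length() + " characters");
        }
    }
}
```

With the connection from Bob's code, the inputs would be `urlConn.getInputStream()` and `urlConn.getContentType()`, and the resulting Reader could be handed to `new HTMLReader(reader, 8192)` in place of the default-encoded `InputStreamReader`. If the saved file parses cleanly when read with an explicit charset but fails with the default one, that would point at encoding rather than at a stray control character.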