Re: [Htmlparser-user] Malformed Input Exception


That sounds like a good feature request. Derrick, what do you think?

Regards,
Somik

----- Original Message -----
From: "Bob Lewis" <bob...@ya...>
To: <htm...@li...>
Sent: Tuesday, February 25, 2003 12:20 PM
Subject: Re: [Htmlparser-user] Malformed Input Exception

> Sorry, there was a typo in my last message:
>
> >     while (parser.hasMoreNodes())
> >     {
> >         // ... Do Something
> >     }
>
> should be
>
>      while (tags.hasMoreNodes())
>      {
>          // ... Do Something
>      }
>
> Also, on another note, if I try to initialize the
> parser directly, I am unable to work with the
> URLConnection.  For example:
>
>     HttpURLConnection urlConn = null;
>     HTMLParser parser = new HTMLParser("http://somedomain/somepath");
>     urlConn = (HttpURLConnection)parser.getConnection();
>     urlConn.setDoInput(true);
>     // ...
>
> This code throws an exception because the HTTP request
> has already been made.
>
> Exception in thread "main" java.lang.IllegalAccessError: Already connected
>         at java.net.URLConnection.setDoInput(URLConnection.java:677)
>
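Inline note on the snippet above: the JDK only accepts settings such as setDoInput before a connection is made, which is consistent with the "Already connected" error shown. The following is a minimal Java sketch of that ordering rule, not htmlparser code. To stay runnable offline it assumes a temporary local file in place of the remote page; the class name ConfigureBeforeConnectSketch and the helpers canStillConfigure and demo are made up for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLConnection;

class ConfigureBeforeConnectSketch {

    // Returns true if the connection still accepted a setting, false if it
    // rejected the call because it was already connected.
    static boolean canStillConfigure(URLConnection conn) {
        try {
            conn.setDoInput(true);
            return true;
        } catch (IllegalStateException | IllegalAccessError alreadyConnected) {
            // Recent JDKs throw IllegalStateException; the trace in this
            // thread shows IllegalAccessError from an older JDK.
            return false;
        }
    }

    // Tries to configure one fresh and one already-connected connection.
    // Assumption: a temporary local file stands in for the remote page.
    static boolean[] demo() {
        try {
            File page = File.createTempFile("page", ".html");
            page.deleteOnExit();
            URL url = page.toURI().toURL();

            URLConnection fresh = url.openConnection(); // not connected yet
            boolean before = canStillConfigure(fresh);

            URLConnection used = url.openConnection();
            used.connect();                             // connection is now made
            boolean after = canStillConfigure(used);
            used.getInputStream().close();              // release the file handle

            return new boolean[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("configurable before connect: " + result[0]);
        System.out.println("configurable after connect:  " + result[1]);
    }
}
```

The practical upshot matches the working code quoted further down: open the connection with url.openConnection(), set headers and POST data, and only then read the response. Whether the parser could accept such a preconfigured connection is the open feature request, not something this sketch answers.
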
> --- Bob Lewis <bob...@ya...> wrote:
> >
> > I tried using the parser directly, as you suggested,
> > and it seems to work.  However, I need to be able to
> > work with the URLConnection to set headers, cookies and
> > send POST data.
> >
> > Typically, this is what I'm doing:
> >
> >     //create and initialize the URL Connection
> >     HttpURLConnection urlConn = null;
> >     URL url = new URL("http://somedomain/somepath");
> >     urlConn = (HttpURLConnection)url.openConnection();
> >     urlConn.setDoInput(true);
> >     urlConn.setDoOutput(true);
> >     urlConn.setUseCaches(false);
> >     urlConn.setAllowUserInteraction(false);
> >     urlConn.setRequestMethod("POST");
> >
> >     // ... usually many HTTP Headers and cookie values set
> >     urlConn.setRequestProperty("someHeader", "someValue");
> >     urlConn.setRequestProperty("anotherHeader", "anotherValue");
> >
> >     StringBuffer postData = new StringBuffer();
> >     // ... generate post data in buffer
> >
> >     //Send the post data
> >     PrintWriter printWriter = new PrintWriter(urlConn.getOutputStream());
> >     printWriter.println(postData.toString());
> >     printWriter.close();
> >
> >     //parse the response
> >     HTMLEnumeration tags = parser.elements();
> >
> >     while (parser.hasMoreNodes())
> >     {
> >         // ... Do Something
> >     }
> >
> > This works fine on most URLs.  I am normally able to
> > execute the server-side web application, obtain and
> > parse the HTML response.   However, in the case of
> > these two URLs, I get the MalformedInputException.
> >
> > Is there something I'm missing?
> >
> > Thanks,
> >
> > Bob Lewis
> >
> > --- Somik Raha <so...@ya...> wrote:
> >
> > >Date: 2003-02-24 21:33
> > >Sender: somik
> > >Logged In: YES
> > >user_id=187944
> > >
> > > >I ran the parser on these pages and it worked fine. Try
> > > >runParser.bat http://www.flytango.com/en/index.html.
> > > >
> > > >It could be that you have initialized your urlconnection
> > > >incorrectly. Have you tried using the parser directly, like:
> > > >
> > > >HTMLParser parser = new HTMLParser("http://www.flytango.com/en/index.html");
> > > >for (NodeIterator i = parser.elements(); i.hasMoreNodes();) {
> > > >   System.out.println(i.nextNode().toHtml());
> > > >}
> >
> > --- Somik Raha <so...@ya...> wrote:
> > > Hi Bob,
> > >   Sounds like a bug.
> > >   Can you file a bug report at
> > > http://htmlparser.sourceforge.net?
> > >
> > > Regards,
> > > Somik
> > > --- Bob Lewis <bob...@ya...> wrote:
> > > > Hi,
> > > >
> > > > > I am trying to use htmlparser 1.3 to parse the HTML at
> > > > > http://www.flytango.com/en/taschedule.html and
> > > > > http://www.flytango.com/en/index.html. When I attempt
> > > > > to parse these pages, I get
> > > > > sun.io.MalformedInputException:
> > > >
> > > > > sun.io.MalformedInputException
> > > > >         at sun.io.ByteToCharUTF8.convert(ByteToCharUTF8.java:105)
> > > > >         at java.io.InputStreamReader.convertInto(InputStreamReader.java:132)
> > > > >         at java.io.InputStreamReader.fill(InputStreamReader.java:181)
> > > > >         at java.io.InputStreamReader.read(InputStreamReader.java:244)
> > > > >         at java.io.BufferedReader.fill(BufferedReader.java:134)
> > > > >         at java.io.BufferedReader.readLine(BufferedReader.java:294)
> > > > >         at java.io.BufferedReader.readLine(BufferedReader.java:357)
> > > > >         at org.htmlparser.HTMLReader.getNextLine(HTMLReader.java:139)
> > > > >         at org.htmlparser.HTMLReader.readElement(HTMLReader.java:176)
> > > > >         at org.htmlparser.util.HTMLEnumerationImpl.peek(HTMLEnumerationImpl.java:60)
> > > > >         at org.htmlparser.util.HTMLEnumerationImpl.hasMoreNodes(HTMLEnumerationImpl.java:91)
> > > >
> > > > > Now, if I copy the source of these pages from a
> > > > > browser into a file and put them on my own webserver,
> > > > > I can parse them without any errors.
> > > > >
> > > > > It's my guess that there is some strange control
> > > > > character in the source that is causing the exception,
> > > > > but I'm not entirely sure.  Any suggestions?  If it is
> > > > > a bad character, would it be possible to add code to
> > > > > HTMLReader that strips offending characters from the
> > > > > input stream?
> > > >
> > > > Here is the code I am using to parse:
> > > >
> > > > >         DefaultHTMLParserFeedback feedback
> > > > >             = new DefaultHTMLParserFeedback(DefaultHTMLParserFeedback.DEBUG);
> > > > >
> > > > >         HTMLReader reader = null;
> > > > >         HTMLParser parser = null;
> > > > >         InputStreamReader isr
> > > > >             = new InputStreamReader(urlConn.getInputStream());
> > > > >         reader = new HTMLReader(isr, 8192);
> > > > >         parser = new HTMLParser(reader, feedback);
> > > > >         boolean inForm = false;
> > > > >
> > > > >         parser.addScanner(new HTMLInputTagScanner());
> > > > >
> > > > >         HTMLEnumeration tags = parser.elements();
> > > > >
> > > > >         RequestParameters params = new RequestParameters();
> > > > >
> > > > >         while (tags.hasMoreNodes())
> > > > >         {
> > > > > ...
> > > > >         }
> > > >
> > > >
> > > > Thanks,
> > > >
> > > > Bob Lewis
> > > >
> >
> === message truncated ===