Re: [Htmlparser-user] Malformed Input Exception


Sorry, there was a typo in my last message:

>     while (parser.hasMoreNodes())
>     {
>         // ... Do Something
>     }

should be

     while (tags.hasMoreNodes())
     {
         // ... Do Something
     }

Also, on another note, if I try to initialize the
parser directly, I am unable to work with the
URLConnection.  For example:

    HttpURLConnection urlConn = null;
    HTMLParser parser = new HTMLParser("http://somedomain/somepath");
    urlConn = (HttpURLConnection)parser.getConnection();
    urlConn.setDoInput(true);
    // ...

This code throws an exception because the HTTP request
has already been made.

Exception in thread "main" java.lang.IllegalAccessError: Already connected
        at java.net.URLConnection.setDoInput(URLConnection.java:677)
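The "Already connected" error comes from java.net.URLConnection itself, not from the parser: once a connection has been made, its request settings are frozen and the setters throw (IllegalAccessError on the JDK in this trace, IllegalStateException from Java 5 on). The minimal sketch below, using a hypothetical class `ConnectionOrder`, shows the rule offline against a temporary `file:` URL rather than an HTTP server.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.net.URLConnection;

public class ConnectionOrder {

    // Hypothetical helper: true while the connection still accepts
    // request settings, false once it has connected.
    static boolean isStillConfigurable(URLConnection conn) {
        try {
            // Re-applying the current value changes nothing if it is allowed.
            conn.setUseCaches(conn.getUseCaches());
            return true;
        } catch (IllegalStateException e) {  // Java 5 and later
            return false;
        } catch (IllegalAccessError e) {     // older JDKs, as in the trace above
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // A local file keeps the demo offline; HttpURLConnection follows the same rule.
        File page = File.createTempFile("page", ".html");
        page.deleteOnExit();
        FileWriter w = new FileWriter(page);
        w.write("<html><body>hello</body></html>");
        w.close();

        URLConnection conn = page.toURI().toURL().openConnection();
        System.out.println(isStillConfigurable(conn));  // true: not yet connected
        conn.connect();  // getInputStream()/getOutputStream() also connect implicitly
        System.out.println(isStillConfigurable(conn));  // false: settings are frozen
        conn.getInputStream().close();
    }
}
```

Run as-is, it should print `true` and then `false`. Applied to the code in this thread, it suggests configuring the HttpURLConnection fully before anything calls getInputStream(), and only then handing its stream to the parser.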

--- Bob Lewis <bob...@ya...> wrote:
> 
> I tried using the parser directly, as you suggested,
> and it seems to work.  However, I need to be able to
> work with the URLConnection to set headers, cookies
> and send POST data.
> 
> Typically, this is what I'm doing:
> 
>     //create and initialize the URL Connection
>     HttpURLConnection urlConn = null;
>     URL url = new URL("http://somedomain/somepath");
>     urlConn = (HttpURLConnection)url.openConnection();
>     urlConn.setDoInput(true);
>     urlConn.setDoOutput(true);
>     urlConn.setUseCaches(false);
>     urlConn.setAllowUserInteraction(false);
>     urlConn.setRequestMethod("POST");
> 
>     // ... usually many HTTP headers and cookie values set
>     urlConn.setRequestProperty("someHeader", "someValue");
>     urlConn.setRequestProperty("anotherHeader", "anotherValue");
> 
>     StringBuffer postData = new StringBuffer();
>      // ... generate post data in buffer
> 
>     //Send the post data
>     PrintWriter printWriter = new PrintWriter(urlConn.getOutputStream());
>     printWriter.println(postData.toString());
>     printWriter.close();
> 
>     //parse the response
>     HTMLEnumeration tags = parser.elements();
> 
>     while (parser.hasMoreNodes())
>     {
>         // ... Do Something
>     }
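Separately from the parsing error, the POST step above can be tightened: PrintWriter.println appends a line terminator and encodes with the platform default charset, and the body's Content-Type is never declared. The sketch below assumes the server expects an application/x-www-form-urlencoded form (how `postData` is built is elided in the original); `PostSketch`, `encodeForm` and `sendPost` are hypothetical names. All setters run before getOutputStream(), which is what connects.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URLEncoder;
import java.util.Map;

public class PostSketch {

    // Builds "name=value&name2=value2", URL-encoding each part as UTF-8.
    // Assumes the server expects application/x-www-form-urlencoded data.
    static String encodeForm(Map<String, String> fields)
            throws UnsupportedEncodingException {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                .append('=')
                .append(URLEncoder.encode(e.getValue(), "UTF-8"));
        }
        return body.toString();
    }

    // Applies every request setting first, then writes the body.
    // getOutputStream() connects, so nothing can be configured after it.
    static void sendPost(HttpURLConnection conn, Map<String, String> headers,
                         String body) throws IOException {
        byte[] bytes = body.getBytes("US-ASCII");  // URL-encoded text is pure ASCII
        conn.setDoInput(true);
        conn.setDoOutput(true);
        conn.setUseCaches(false);
        conn.setAllowUserInteraction(false);
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type",
                "application/x-www-form-urlencoded");
        for (Map.Entry<String, String> h : headers.entrySet()) {
            conn.setRequestProperty(h.getKey(), h.getValue());  // headers, cookies
        }
        OutputStream out = conn.getOutputStream();
        try {
            out.write(bytes);  // no trailing line terminator, unlike println()
        } finally {
            out.close();
        }
    }
}
```

After sendPost returns, the response is read from conn.getInputStream() as before.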
> 
> This works fine on most URLs.  I am normally able to
> execute the server-side web application, obtain and
> parse the HTML response.   However, in the case of
> these two URLs, I get the MalformedInputException.
> 
> Is there something I'm missing?
> 
> Thanks,
> 
> Bob Lewis
> 
> --- Somik Raha <so...@ya...> wrote:
> 
> >Date: 2003-02-24 21:33
> >Sender: somik
> >Logged In: YES 
> >user_id=187944
> >
> >I ran the parser on these pages and it worked fine. Try
> >runParser.bat http://www.flytango.com/en/index.html.
> >
> >It could be that you have initialized your URLConnection
> >incorrectly. Have you tried using the parser directly, like:
> >
> >HTMLParser parser = new HTMLParser("http://www.flytango.com/en/index.html");
> >for (NodeIterator i = parser.elements(); i.hasMoreNodes();) {
> >   System.out.println(i.nextNode().toHtml());
> >}
> 
> --- Somik Raha <so...@ya...> wrote:
> > Hi Bob,
> >   Sounds like a bug.
> >   Can you file a bug report at
> > http://htmlparser.sourceforge.net?
> > 
> > Regards,
> > Somik
> > --- Bob Lewis <bob...@ya...> wrote:
> > > Hi,
> > > 
> > > I am trying to use htmlparser 1.3 to parse the HTML
> > > at http://www.flytango.com/en/taschedule.html and
> > > http://www.flytango.com/en/index.html. When I attempt
> > > to parse these pages, I get
> > > sun.io.MalformedInputException:
> > > 
> > > sun.io.MalformedInputException
> > >         at sun.io.ByteToCharUTF8.convert(ByteToCharUTF8.java:105)
> > >         at java.io.InputStreamReader.convertInto(InputStreamReader.java:132)
> > >         at java.io.InputStreamReader.fill(InputStreamReader.java:181)
> > >         at java.io.InputStreamReader.read(InputStreamReader.java:244)
> > >         at java.io.BufferedReader.fill(BufferedReader.java:134)
> > >         at java.io.BufferedReader.readLine(BufferedReader.java:294)
> > >         at java.io.BufferedReader.readLine(BufferedReader.java:357)
> > >         at org.htmlparser.HTMLReader.getNextLine(HTMLReader.java:139)
> > >         at org.htmlparser.HTMLReader.readElement(HTMLReader.java:176)
> > >         at org.htmlparser.util.HTMLEnumerationImpl.peek(HTMLEnumerationImpl.java:60)
> > >         at org.htmlparser.util.HTMLEnumerationImpl.hasMoreNodes(HTMLEnumerationImpl.java:91)
> > > 
> > > Now, if I copy the source of these pages from a
> > > browser into a file and put them on my own webserver,
> > > I can parse them without any errors.
> > >
> > > It's my guess that there is some strange control
> > > character in the source that is causing the
> > > exception, but I'm not entirely sure.  Any
> > > suggestions?  If it is a bad character, would it be
> > > possible to add code to HTMLReader that strips
> > > offending characters from the input stream?
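The trace points to a different culprit than a control character: sun.io.ByteToCharUTF8 shows the InputStreamReader decoding the response as UTF-8 (the JVM's default charset here), so any byte sequence that is not valid UTF-8, such as a Latin-1 accented letter, makes it throw. That would also explain why a copy saved from a browser parses, since the browser may have written it out in a different encoding. If lossy input is acceptable, the replacement can happen at the decoder rather than in HTMLReader; this is a minimal sketch using java.nio.charset (JDK 1.4 and later), with `LenientReader` as a hypothetical class name.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

public class LenientReader {

    // Wraps a byte stream in a Reader that substitutes U+FFFD for any
    // malformed or unmappable input instead of throwing an exception.
    static Reader open(InputStream in, String charsetName) {
        CharsetDecoder decoder = Charset.forName(charsetName).newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        return new InputStreamReader(in, decoder);
    }

    // Reads everything from a Reader into a String.
    static String readAll(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = r.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // "café" encoded as ISO-8859-1: the 0xE9 byte is not valid UTF-8.
        byte[] latin1 = { 'c', 'a', 'f', (byte) 0xE9 };
        Reader r = open(new ByteArrayInputStream(latin1), "UTF-8");
        System.out.println(readAll(r).equals("caf\uFFFD"));  // prints true
    }
}
```

Replaced characters are lost, so this is a fallback; reading with the charset the server actually uses is the better fix.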
> > > 
> > > Here is the code I am using to parse:
> > > 
> > >         DefaultHTMLParserFeedback feedback
> > >             = new DefaultHTMLParserFeedback(DefaultHTMLParserFeedback.DEBUG);
> > >
> > >         HTMLReader reader = null;
> > >         HTMLParser parser = null;
> > >         InputStreamReader isr
> > >             = new InputStreamReader(urlConn.getInputStream());
> > >         reader = new HTMLReader(isr, 8192);
> > >         parser = new HTMLParser(reader, feedback);
> > >         boolean inForm = false;
> > >
> > >         parser.addScanner(new HTMLInputTagScanner());
> > >
> > >         HTMLEnumeration tags = parser.elements();
> > >
> > >         RequestParameters params = new RequestParameters();
> > >
> > >         while (tags.hasMoreNodes())
> > >         {
> > > ...
> > >         }
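In the code above, `new InputStreamReader(urlConn.getInputStream())` decodes whatever the server sends with the JVM's default charset. A sketch of taking the charset from the response's Content-Type header instead, falling back to ISO-8859-1 (the HTTP/1.1 default for text types in RFC 2616 when no charset is given); `CharsetAwareReader`, `charsetOf` and `openReader` are hypothetical names, and a charset declared in a `<meta http-equiv>` tag inside the page is not examined. The resulting Reader would take the place of `isr` in `new HTMLReader(isr, 8192)`.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.util.Locale;

public class CharsetAwareReader {

    static final String FALLBACK = "ISO-8859-1";

    // Extracts the charset parameter from a Content-Type value such as
    // "text/html; charset=UTF-8". Returns the fallback if it is missing
    // or names a charset this JVM does not support.
    static String charsetOf(String contentType, String fallback) {
        if (contentType != null) {
            String[] params = contentType.split(";");
            for (int i = 1; i < params.length; i++) {
                String p = params[i].trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String cs = p.substring("charset=".length()).trim();
                    // Strip optional quotes, as in charset="utf-8"
                    if (cs.length() >= 2 && cs.startsWith("\"") && cs.endsWith("\"")) {
                        cs = cs.substring(1, cs.length() - 1);
                    }
                    try {
                        if (Charset.isSupported(cs)) {
                            return cs;
                        }
                    } catch (IllegalArgumentException e) {
                        // Illegal charset name: use the fallback below.
                    }
                }
            }
        }
        return fallback;
    }

    // Opens the response body as a Reader using the declared charset.
    // getContentType() connects, so call this only after all set* calls.
    static Reader openReader(URLConnection conn) throws IOException {
        String cs = charsetOf(conn.getContentType(), FALLBACK);
        return new InputStreamReader(conn.getInputStream(), cs);
    }
}
```

Because getContentType() itself connects, the ordering rule behind the "Already connected" error applies here too: headers, cookies and POST data must all be sent before openReader is called.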
> > > 
> > > 
> > > Thanks,
> > > 
> > > Bob Lewis
> > > 
> 
=== message truncated ===
