Re: [Htmlparser-user] Malformed Input Exception


I tried using the parser directly, as you suggested,
and it seems to work.  However, I need to be able to work
with the URLConnection to set headers, cookies and
send POST data.

Typically, this is what I'm doing:

    //create and initialize the URL Connection
    HttpURLConnection urlConn = null;
    URL url = new URL("http://somedomain/somepath");
    urlConn = (HttpURLConnection)url.openConnection();
    urlConn.setDoInput(true);
    urlConn.setDoOutput(true);
    urlConn.setUseCaches(false);
    urlConn.setAllowUserInteraction(false);
    urlConn.setRequestMethod("POST");

    // ... usually many HTTP headers and cookie values set
    urlConn.setRequestProperty("someHeader", "someValue");
    urlConn.setRequestProperty("anotherHeader", "anotherValue");

    StringBuffer postData = new StringBuffer();
     // ... generate post data in buffer

    //Send the post data
    PrintWriter printWriter = new PrintWriter(urlConn.getOutputStream());
    printWriter.println(postData.toString());
    printWriter.close();

    //parse the response
    // (parser is built from urlConn.getInputStream(), as in my first message)
    HTMLEnumeration tags = parser.elements();

    while (tags.hasMoreNodes())
    {
        // ... Do Something
    }

This works fine on most URLs.  I am normally able to
execute the server-side web application, obtain and
parse the HTML response.   However, in the case of
these two URLs, I get the MalformedInputException.
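
A guess at the cause, not a confirmed diagnosis: the stack trace quoted below goes through ByteToCharUTF8, so the response bytes are being decoded as UTF-8. If these two pages are actually served in another encoding (ISO-8859-1, say), a byte such as 0xE9 is not valid UTF-8 and would trigger exactly this kind of exception. The sketch below shows one way to pick the charset from the Content-Type header and to replace undecodable bytes instead of failing. It assumes a current JDK with java.nio.charset; the class and method names (ResponseDecoding, charsetFromContentType, lenientReader, readAll) are made up for illustration and are not part of htmlparser.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.util.Locale;

// Hypothetical helper, not part of htmlparser.
class ResponseDecoding {

    // Returns the charset named in a Content-Type value such as
    // "text/html; charset=ISO-8859-1", or the fallback if none is usable.
    static Charset charsetFromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name: use the fallback.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    // Wraps the stream in a reader that substitutes U+FFFD for malformed
    // or unmappable bytes rather than throwing.
    static Reader lenientReader(InputStream in, Charset charset) {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        return new InputStreamReader(in, decoder);
    }

    // Reads the whole reader into a String (handy for small tests).
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }
}
```

With something like this, the reader handed to HTMLReader could presumably be built as lenientReader(urlConn.getInputStream(), charsetFromContentType(urlConn.getContentType(), someDefault)), where someDefault is whatever encoding you expect when the server does not declare one.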

Is there something I'm missing?

Thanks,

Bob Lewis

--- Somik Raha <so...@ya...> wrote:

>Date: 2003-02-24 21:33
>Sender: somik
>Logged In: YES 
>user_id=187944
>
>I ran the parser on these pages and it worked fine. Try
>runParser.bat http://www.flytango.com/en/index.html.
>
>It could be that you have initialized your urlconnection
>incorrectly. Have you tried using the parser directly, like:
>
>HTMLParser parser = new HTMLParser
>("http://www.flytango.com/en/index.html");
>for (NodeIterator i=parser.elements();i.hasMoreNodes();) {
>   System.out.println(i.nextNode().toHtml());
>}

--- Somik Raha <so...@ya...> wrote:
> Hi Bob,
>   Sounds like a bug.
>   Can you file a bug report at
> http://htmlparser.sourceforge.net?
> 
> Regards,
> Somik
> --- Bob Lewis <bob...@ya...> wrote:
> > Hi,
> > 
> > I am trying to use htmlparser 1.3 to parse the HTML at
> > http://www.flytango.com/en/taschedule.html and
> > http://www.flytango.com/en/index.html. When I attempt
> > to parse these pages, I get
> > sun.io.MalformedInputException:
> > 
> > sun.io.MalformedInputException
> >         at sun.io.ByteToCharUTF8.convert(ByteToCharUTF8.java:105)
> >         at java.io.InputStreamReader.convertInto(InputStreamReader.java:132)
> >         at java.io.InputStreamReader.fill(InputStreamReader.java:181)
> >         at java.io.InputStreamReader.read(InputStreamReader.java:244)
> >         at java.io.BufferedReader.fill(BufferedReader.java:134)
> >         at java.io.BufferedReader.readLine(BufferedReader.java:294)
> >         at java.io.BufferedReader.readLine(BufferedReader.java:357)
> >         at org.htmlparser.HTMLReader.getNextLine(HTMLReader.java:139)
> >         at org.htmlparser.HTMLReader.readElement(HTMLReader.java:176)
> >         at org.htmlparser.util.HTMLEnumerationImpl.peek(HTMLEnumerationImpl.java:60)
> >         at org.htmlparser.util.HTMLEnumerationImpl.hasMoreNodes(HTMLEnumerationImpl.java:91)
> > 
> > Now, if I copy the source of these pages from a
> > browser into a file and put them on my own webserver,
> > I can parse them without any errors.
> >
> > It's my guess that there is some strange control
> > character in the source that is causing the exception,
> > but I'm not entirely sure.  Any suggestions?  If it is
> > a bad character, would it be possible to add code to
> > HTMLReader that strips offending characters from the
> > input stream?
> > 
> > Here is the code I am using to parse:
> > 
> >         DefaultHTMLParserFeedback feedback
> >             = new DefaultHTMLParserFeedback(DefaultHTMLParserFeedback.DEBUG);
> >
> >         HTMLReader reader = null;
> >         HTMLParser parser = null;
> >         InputStreamReader isr
> >             = new InputStreamReader(urlConn.getInputStream());
> >         reader = new HTMLReader(isr, 8192);
> >         parser = new HTMLParser(reader, feedback);
> >         boolean inForm = false;
> >
> >         parser.addScanner(new HTMLInputTagScanner());
> >
> >         HTMLEnumeration tags = parser.elements();
> >
> >         RequestParameters params = new RequestParameters();
> >
> >         while (tags.hasMoreNodes())
> >         {
> > ...
> >         }
> > 
> > 
> > Thanks,
> > 
> > Bob Lewis
> > 
> > 
> > __________________________________________________
> > Do you Yahoo!?
> > Yahoo! Tax Center - forms, calculators, tips, more
> > http://taxes.yahoo.com/
> > 
> > 
> > -------------------------------------------------------
> > This sf.net email is sponsored by:ThinkGeek
> > Welcome to geek heaven.
> > http://thinkgeek.com/sf
> > _______________________________________________
> > Htmlparser-user mailing list
> > Htm...@li...
> > https://lists.sourceforge.net/lists/listinfo/htmlparser-user
