htmlparser-user Mailing List for HTML Parser (Page 83)

Hi Bob,
    Can you try this: save the response from the URL in question to a file
(using a POST request), then try to parse that file. If the parse breaks, we
will be able to see why.

Regards,
Somik
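
A minimal sketch of the suggestion above: send the POST and write the raw, undecoded response bytes to a file, so the exact input that breaks the parser can be inspected and re-parsed. The class name `ResponseDump`, its method names, and the ISO-8859-1 encoding of the POST body are assumptions made for illustration. `Files.copy` is a java.nio API that postdates the JDK used in this thread.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of htmlparser.
final class ResponseDump {

    /** Copies the stream to target byte for byte (no character decoding). */
    static long save(InputStream in, Path target) throws IOException {
        return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }

    /** Sends postData to url and stores the raw response body in target. */
    static long postAndSave(URL url, String postData, Path target) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setUseCaches(false);
        try (OutputStream out = conn.getOutputStream()) {
            // Assumption: the server accepts an ISO-8859-1 encoded body.
            out.write(postData.getBytes(StandardCharsets.ISO_8859_1));
        }
        try (InputStream in = conn.getInputStream()) {
            return save(in, target);
        } finally {
            conn.disconnect();
        }
    }
}
```

Because the bytes are never decoded, the saved file preserves whatever byte sequence triggers the exception, which a copy pasted from a browser window may not.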
----- Original Message -----
From: "Bob Lewis" <bob...@ya...>
To: <htm...@li...>
Sent: Tuesday, February 25, 2003 12:07 PM
Subject: Re: [Htmlparser-user] Malformed Input Exception


>
> I tried using the parser directly, as you suggested,
> and it seems to work.  However, I need to be able to work
> with the URLConnection to set headers, cookies and
> send POST data.
>
> Typically, this is what I'm doing:
>
>     // Create and initialize the URL connection
>     HttpURLConnection urlConn = null;
>     URL url = new URL("http://somedomain/somepath");
>     urlConn = (HttpURLConnection) url.openConnection();
>     urlConn.setDoInput(true);
>     urlConn.setDoOutput(true);
>     urlConn.setUseCaches(false);
>     urlConn.setAllowUserInteraction(false);
>     urlConn.setRequestMethod("POST");
>
>     // ... usually many HTTP headers and cookie values set
>     urlConn.setRequestProperty("someHeader", "someValue");
>     urlConn.setRequestProperty("anotherHeader", "anotherValue");
>
>     StringBuffer postData = new StringBuffer();
>     // ... generate post data in buffer
>
>     // Send the post data
>     PrintWriter printWriter = new PrintWriter(urlConn.getOutputStream());
>     printWriter.println(postData.toString());
>     printWriter.close();
>
>     // Parse the response
>     HTMLEnumeration tags = parser.elements();
>
>     while (tags.hasMoreNodes())
>     {
>         // ... Do Something
>     }
>
> This works fine on most URLs.  I am normally able to
> execute the server-side web application, obtain and
> parse the HTML response.   However, in the case of
> these two URLs, I get the MalformedInputException.
>
> Is there something I'm missing?
>
> Thanks,
>
> Bob Lewis
>
> --- Somik Raha <so...@ya...> wrote:
>
> >Date: 2003-02-24 21:33
> >Sender: somik
> >Logged In: YES
> >user_id=187944
> >
> >I ran the parser on these pages and it worked fine. Try
> >runParser.bat http://www.flytango.com/en/index.html.
> >
> >It could be that you have initialized your URLConnection
> >incorrectly. Have you tried using the parser directly, like:
> >
> >HTMLParser parser = new HTMLParser
> >("http://www.flytango.com/en/index.html");
> >for (NodeIterator i = parser.elements(); i.hasMoreNodes();) {
> >   System.out.println(i.nextNode().toHtml());
> >}
>
> --- Somik Raha <so...@ya...> wrote:
> > Hi Bob,
> >   Sounds like a bug.
> >   Can you file a bug report at
> > http://htmlparser.sourceforge.net?
> >
> > Regards,
> > Somik
> > --- Bob Lewis <bob...@ya...> wrote:
> > > Hi,
> > >
> > > I am trying to use htmlparser 1.3 to parse the HTML at
> > > http://www.flytango.com/en/taschedule.html and
> > > http://www.flytango.com/en/index.html. When I attempt
> > > to parse these pages, I get a
> > > sun.io.MalformedInputException:
> > >
> > > sun.io.MalformedInputException
> > >         at sun.io.ByteToCharUTF8.convert(ByteToCharUTF8.java:105)
> > >         at java.io.InputStreamReader.convertInto(InputStreamReader.java:132)
> > >         at java.io.InputStreamReader.fill(InputStreamReader.java:181)
> > >         at java.io.InputStreamReader.read(InputStreamReader.java:244)
> > >         at java.io.BufferedReader.fill(BufferedReader.java:134)
> > >         at java.io.BufferedReader.readLine(BufferedReader.java:294)
> > >         at java.io.BufferedReader.readLine(BufferedReader.java:357)
> > >         at org.htmlparser.HTMLReader.getNextLine(HTMLReader.java:139)
> > >         at org.htmlparser.HTMLReader.readElement(HTMLReader.java:176)
> > >         at org.htmlparser.util.HTMLEnumerationImpl.peek(HTMLEnumerationImpl.java:60)
> > >         at org.htmlparser.util.HTMLEnumerationImpl.hasMoreNodes(HTMLEnumerationImpl.java:91)
> > >
> > > Now, if I copy the source of these pages from a
> > > browser into a file and put them on my own web server,
> > > I can parse them without any errors.
> > >
> > > It's my guess that there is some strange control
> > > character in the source that is causing the exception,
> > > but I'm not entirely sure.  Any suggestions?  If it is
> > > a bad character, would it be possible to add code to
> > > HTMLReader that strips offending characters from the
> > > input stream?
> > >
> > > Here is the code I am using to parse:
> > >
> > >         DefaultHTMLParserFeedback feedback
> > >             = new DefaultHTMLParserFeedback(DefaultHTMLParserFeedback.DEBUG);
> > >
> > >         HTMLReader reader = null;
> > >         HTMLParser parser = null;
> > >         InputStreamReader isr
> > >             = new InputStreamReader(urlConn.getInputStream());
> > >         reader = new HTMLReader(isr, 8192);
> > >         parser = new HTMLParser(reader, feedback);
> > >         boolean inForm = false;
> > >
> > >         parser.addScanner(new HTMLInputTagScanner());
> > >
> > >         HTMLEnumeration tags = parser.elements();
> > >
> > >         RequestParameters params = new RequestParameters();
> > >
> > >         while (tags.hasMoreNodes())
> > >         {
> > > ...
> > >         }
> > >
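
The trace above fails inside `ByteToCharUTF8`, so the response bytes were being decoded as UTF-8. `new InputStreamReader(stream)` with no charset argument uses the platform default encoding. One plausible cause, which is an assumption and not confirmed in this thread, is that the server sends the page in a different encoding such as ISO-8859-1. The sketch below picks the charset from the `Content-Type` header instead and includes a strict check that shows which encodings a byte sequence is valid in. The class `CharsetAwareReader`, its method names, and the ISO-8859-1 fallback are hypothetical choices for illustration.

```java
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of htmlparser.
final class CharsetAwareReader {

    /** Returns the charset named in a Content-Type value, or fallback if none is usable. */
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = p.substring(8).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        return fallback; // illegal or unsupported charset name
                    }
                }
            }
        }
        return fallback;
    }

    /** Opens a reader with an explicit charset instead of the platform default. */
    static InputStreamReader open(InputStream in, String contentType) {
        // Assumption: ISO-8859-1 is an acceptable fallback for this site.
        return new InputStreamReader(in, charsetOf(contentType, StandardCharsets.ISO_8859_1));
    }

    /** Strictly decodes bytes and reports whether they are well-formed in cs. */
    static boolean decodesCleanly(byte[] bytes, Charset cs) {
        try {
            cs.newDecoder()
              .onMalformedInput(CodingErrorAction.REPORT)
              .onUnmappableCharacter(CodingErrorAction.REPORT)
              .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

With a connection like the one in this thread, the reader passed to `HTMLReader` could be created as `CharsetAwareReader.open(urlConn.getInputStream(), urlConn.getContentType())`. Every byte sequence is valid ISO-8859-1, so that fallback never raises a malformed-input error, although it can yield wrong characters if the page is really in another encoding.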
> > >
> > > Thanks,
> > >
> > > Bob Lewis
> > > _______________________________________________
> > > Htmlparser-user mailing list
> > > Htm...@li...
> > > https://lists.sourceforge.net/lists/listinfo/htmlparser-user

htmlparser-user — The user mailing list for users of the htmlparser library