Re: [Htmlparser-user] Malformed Input Exception
From: Somik R. <so...@ya...> - 2003-02-26 06:44:52
That sounds like a good feature request. Derrick, what do you think?

Regards,
Somik

----- Original Message -----
From: "Bob Lewis" <bob...@ya...>
To: <htm...@li...>
Sent: Tuesday, February 25, 2003 12:20 PM
Subject: Re: [Htmlparser-user] Malformed Input Exception

> Sorry, there was a typo in my last message:
>
> > while (parser.hasMoreNodes())
> > {
> >     // ... Do Something
> > }
>
> should be
>
> while (tags.hasMoreNodes())
> {
>     // ... Do Something
> }
>
> Also, on another note, if I try to initialize the
> parser directly, I am unable to work with the
> URLConnection. For example:
>
> HttpURLConnection urlConn = null;
> HTMLParser parser = new HTMLParser("http://somedomain/somepath");
> urlConn = (HttpURLConnection)parser.getConnection();
> urlConn.setDoInput(true);
> // ...
>
> This code throws an exception because the HTTP request
> has already been made.
>
> Exception in thread "main" java.lang.IllegalAccessError: Already connected
>     at java.net.URLConnection.setDoInput(URLConnection.java:677)
>
> --- Bob Lewis <bob...@ya...> wrote:
> >
> > I tried using the parser directly, as you suggested,
> > and it seems to work. However, I need to be able to
> > work with the URLConnection to set headers, cookies and
> > send POST data.
> >
> > Typically, this is what I'm doing:
> >
> > // create and initialize the URL Connection
> > HttpURLConnection urlConn = null;
> > URL url = new URL("http://somedomain/somepath");
> > urlConn = (HttpURLConnection)url.openConnection();
> > urlConn.setDoInput(true);
> > urlConn.setDoOutput(true);
> > urlConn.setUseCaches(false);
> > urlConn.setAllowUserInteraction(false);
> > urlConn.setRequestMethod("POST");
> >
> > // ... usually many HTTP headers and cookie values set
> > urlConn.setRequestProperty("someHeader", "someValue");
> > urlConn.setRequestProperty("anotherHeader", "anotherValue");
> >
> > StringBuffer postData = new StringBuffer();
> > // ... generate post data in buffer
> >
> > // send the post data
> > PrintWriter printWriter = new PrintWriter(urlConn.getOutputStream());
> > printWriter.println(postData.toString());
> > printWriter.close();
> >
> > // parse the response
> > HTMLEnumeration tags = parser.elements();
> >
> > while (parser.hasMoreNodes())
> > {
> >     // ... Do Something
> > }
> >
> > This works fine on most URLs. I am normally able to
> > execute the server-side web application, obtain and
> > parse the HTML response. However, in the case of
> > these two URLs, I get the MalformedInputException.
> >
> > Is there something I'm missing?
> >
> > Thanks,
> >
> > Bob Lewis
> >
> > --- Somik Raha <so...@ya...> wrote:
> >
> > >Date: 2003-02-24 21:33
> > >Sender: somik
> > >Logged In: YES
> > >user_id=187944
> > >
> > >I ran the parser on these pages and it worked fine. Try
> > >runParser.bat http://www.flytango.com/en/index.html.
> > >
> > >It could be that you have initialized your urlconnection
> > >incorrectly. Have you tried using the parser directly, like:
> > >
> > >HTMLParser parser = new HTMLParser("http://www.flytango.com/en/index.html");
> > >for (NodeIterator i=parser.elements();i.hasMoreNodes();) {
> > >    System.out.println(i.nextNode().toHtml());
> > >}
> >
> > --- Somik Raha <so...@ya...> wrote:
> > > Hi Bob,
> > > Sounds like a bug.
> > > Can you file a bug report at
> > > http://htmlparser.sourceforge.net?
> > >
> > > Regards,
> > > Somik
> > > --- Bob Lewis <bob...@ya...> wrote:
> > > > Hi,
> > > >
> > > > I am trying to use htmlparser 1.3 to parse the HTML at
> > > > http://www.flytango.com/en/taschedule.html and
> > > > http://www.flytango.com/en/index.html. When I attempt
> > > > to parse these pages, I get
> > > > com.sun.io.MalformedInputException:
> > > >
> > > > sun.io.MalformedInputException
> > > >     at sun.io.ByteToCharUTF8.convert(ByteToCharUTF8.java:105)
> > > >     at java.io.InputStreamReader.convertInto(InputStreamReader.java:132)
> > > >     at java.io.InputStreamReader.fill(InputStreamReader.java:181)
> > > >     at java.io.InputStreamReader.read(InputStreamReader.java:244)
> > > >     at java.io.BufferedReader.fill(BufferedReader.java:134)
> > > >     at java.io.BufferedReader.readLine(BufferedReader.java:294)
> > > >     at java.io.BufferedReader.readLine(BufferedReader.java:357)
> > > >     at org.htmlparser.HTMLReader.getNextLine(HTMLReader.java:139)
> > > >     at org.htmlparser.HTMLReader.readElement(HTMLReader.java:176)
> > > >     at org.htmlparser.util.HTMLEnumerationImpl.peek(HTMLEnumerationImpl.java:60)
> > > >     at org.htmlparser.util.HTMLEnumerationImpl.hasMoreNodes(HTMLEnumerationImpl.java:91)
> > > >
> > > > Now, if I copy the source of these pages from a
> > > > browser into a file and put them on my own webserver,
> > > > I can parse them without any errors.
> > > >
> > > > It's my guess that there is some strange control
> > > > character in the source that is causing the exception,
> > > > but I'm not entirely sure. Any suggestions? If it is
> > > > a bad character, would it be possible to add code to
> > > > HTMLReader that strips offending characters from the
> > > > input stream?
> > > >
> > > > Here is the code I am using to parse:
> > > >
> > > > DefaultHTMLParserFeedback feedback =
> > > >     new DefaultHTMLParserFeedback(DefaultHTMLParserFeedback.DEBUG);
> > > >
> > > > HTMLReader reader = null;
> > > > HTMLParser parser = null;
> > > > InputStreamReader isr =
> > > >     new InputStreamReader(urlConn.getInputStream());
> > > > reader = new HTMLReader(isr, 8192);
> > > > parser = new HTMLParser(reader, feedback);
> > > > boolean inForm = false;
> > > >
> > > > parser.addScanner(new HTMLInputTagScanner());
> > > >
> > > > HTMLEnumeration tags = parser.elements();
> > > >
> > > > RequestParameters params = new RequestParameters();
> > > >
> > > > while (tags.hasMoreNodes())
> > > > {
> > > >     ...
> > > > }
> > > >
> > > > Thanks,
> > > >
> > > > Bob Lewis
> > >
> > > === message truncated ===
>
> _______________________________________________
> Htmlparser-user mailing list
> Htm...@li...
> https://lists.sourceforge.net/lists/listinfo/htmlparser-user
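A complementary fix is to stop relying on the platform default encoding. Bob's reader is built with `new InputStreamReader(urlConn.getInputStream())`, which decodes every page with the JVM default regardless of what the server sends. A minimal sketch, assuming the server declares its encoding in the `charset` parameter of the `Content-Type` response header (not verified for the flytango.com pages; a charset given only in a `<meta>` tag is not handled here), reads that header and falls back to ISO-8859-1, the default RFC 2616 assigns to `text/*` types. `ResponseCharset`, `charsetFromContentType` and `responseReader` are hypothetical names. Both `getContentType()` and `getInputStream()` open the connection, so this must run only after every `setRequestProperty` call and after the POST body is written; any setter called afterwards fails with the "Already connected" error from Bob's second message.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ResponseCharset {

    // Returns the charset named in a Content-Type value such as
    // "text/html; charset=ISO-8859-1". Falls back to ISO-8859-1 (the RFC 2616
    // default for text/* types) when the parameter is missing, malformed,
    // or names a charset this JVM does not support.
    static Charset charsetFromContentType(String contentType) {
        if (contentType == null) {
            return StandardCharsets.ISO_8859_1;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim();
                // Accept the quoted form as well: charset="utf-8"
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // IllegalCharsetNameException and UnsupportedCharsetException
                    // both extend IllegalArgumentException; use the fallback.
                }
            }
        }
        return StandardCharsets.ISO_8859_1;
    }

    // Opens the response body of a fully configured connection with the
    // server-declared charset instead of the platform default. Calling this
    // connects, so all request setters must already have been called.
    static InputStreamReader responseReader(HttpURLConnection conn) throws IOException {
        Charset cs = charsetFromContentType(conn.getContentType());
        return new InputStreamReader(conn.getInputStream(), cs);
    }

    public static void main(String[] args) {
        System.out.println(charsetFromContentType("text/html; charset=ISO-8859-1")); // prints: ISO-8859-1
        System.out.println(charsetFromContentType("text/html;charset=\"utf-8\""));    // prints: UTF-8
        System.out.println(charsetFromContentType("text/html"));                     // prints: ISO-8859-1
    }
}
```

In Bob's parsing code this would be a one-line change, assuming `HTMLReader` keeps accepting an `InputStreamReader` as it does there: `reader = new HTMLReader(ResponseCharset.responseReader(urlConn), 8192);`.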