Thread: Re: [Htmlparser-user] use with Google App Engine
From: Andy W. <an...@aw...> - 2009-12-15 12:09:12
|
Thanks for the reply.
You wrote that I can turn off cookie processing with:

    parser.getConnectionManager ().setCookieProcessingEnabled(false)

However, I am using the StringBean class, which gives no access to the Parser it uses.

My code is like:

    StringBean sb = new StringBean ();
    sb.setURL ("http://blah-blah");
    String result = sb.getStrings ();

Any idea how I can get to the parser?

Thanks
From: Andy W. <an...@aw...> - 2009-12-16 15:05:19
|
Thanks again.
Doesn't seem to work for me though...
The code now is:

    Parser.getConnectionManager().setCookieProcessingEnabled(false);
    Parser parser = new Parser ("http://TEST_URL");
    StringBean sb = new StringBean ();
    parser.visitAllNodesWith (sb);
    String result = sb.getStrings ();

and the stack trace is:

    at com.google.apphosting.utils.security.urlfetch.URLFetchServiceStreamHandler$Connection.getHeaderFields(URLFetchServiceStreamHandler.java:211)
    at com.google.apphosting.utils.security.urlfetch.URLFetchServiceStreamHandler$Connection.getHeaderField(URLFetchServiceStreamHandler.java:196)
    at org.htmlparser.http.ConnectionManager.parseCookies(ConnectionManager.java:1097)
    at org.htmlparser.http.ConnectionManager.openConnection(ConnectionManager.java:669)
    at org.htmlparser.http.ConnectionManager.openConnection(ConnectionManager.java:848)
    at org.htmlparser.Parser.<init>(Parser.java:301)
    at org.htmlparser.Parser.<init>(Parser.java:313)

May be a red herring though, as I get the same result testing locally if I switch off the HTTP server that serves the TEST_URL.
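One way to separate the two possibilities (cookie handling versus a server that simply cannot be reached) might be to probe the URL with plain java.net before handing it to the parser. The following is only a sketch under assumptions: the address is an http:// or https:// URL, and the class name UrlProbe, the method isReachable and the timeout parameter are made up for illustration. None of them belongs to HTMLParser or the App Engine SDK.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper, not part of HTMLParser or the App Engine SDK.
class UrlProbe {

    // Returns true if a response came back from the server, false if the
    // address is malformed, the connection failed, or the request timed out.
    // Assumes an http:// or https:// address (other schemes would fail the cast).
    static boolean isReachable(String address, int timeoutMillis) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) new URL(address).openConnection();
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            conn.setRequestMethod("HEAD");   // headers only, no body needed
            conn.getResponseCode();          // forces the request to be sent
            return true;
        } catch (IOException e) {
            return false;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }
}
```

If the probe returns false for TEST_URL, the trace above is more likely a symptom of the unreachable server than of cookie processing.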
From: Derrick O. <der...@gm...> - 2009-12-16 19:52:22
|
There's nothing we can do about the exception thrown in a third party package (com.google.apphosting.utils.security.urlfetch). You might check what you added to parseCookies() to get it to behave this way... or maybe just catch the exception.
From: Andy W. <an...@aw...> - 2009-12-17 10:44:14
|
For anyone having this problem in future, here is a workaround (exception handling omitted):

    // get the html 'the Google way'
    URL url = new URL(TEST_URL);
    BufferedReader reader = new BufferedReader(new InputStreamReader(url.openStream()));
    String line = null;
    StringBuilder buf = new StringBuilder();
    while ((line = reader.readLine()) != null) {
        buf.append(line);
        buf.append(System.getProperty ("line.separator"));
    }
    reader.close();

    // use the createParser method that expects an html string
    Parser parser = Parser.createParser(buf.toString(), null);
    StringBean sb = new StringBean ();
    parser.visitAllNodesWith (sb);
    return (sb.getStrings ());
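The workaround leaves exception handling out. A possible way to fill in the fetch step is sketched below, using only java.io and java.net. The names PageFetcher, readAll and fetch are hypothetical, the UTF-8 charset is an assumption (the snippet above relies on the platform default), and try-with-resources needs Java 7 or later.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the names are illustrative only.
class PageFetcher {

    // Reads every line from the source, appends the platform line separator
    // after each one, and closes the reader even if reading fails.
    static String readAll(Reader source) throws IOException {
        StringBuilder buf = new StringBuilder();
        String separator = System.getProperty("line.separator");
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                buf.append(line).append(separator);
            }
        }
        return buf.toString();
    }

    // Fetches the page text, or returns null on a malformed address or any
    // I/O failure. UTF-8 is an assumption, not something the thread specifies.
    static String fetch(String address) {
        try {
            URL url = new URL(address);
            return readAll(new InputStreamReader(url.openStream(), StandardCharsets.UTF_8));
        } catch (IOException e) {
            return null;
        }
    }
}
```

A non-null result from fetch can then be passed to Parser.createParser(html, null) exactly as in the workaround; a null result means the page could not be read and parsing should be skipped.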
From: Derrick O. <der...@gm...> - 2009-12-15 16:39:22
|
As it says in the StringBean header, you can use the StringBean as a visitor:

    StringBean sb = new StringBean ();
    Parser parser = new Parser ("http://cbc.ca");
    parser.visitAllNodesWith (sb);
    String s = sb.getStrings ();
    sb.setLinks (true);
    parser.reset ();
    parser.visitAllNodesWith (sb);
    String sl = sb.getStrings ();