Thread: [Htmlparser-user] Not able to catch ParserException
From: <oys...@me...> - 2007-01-10 15:22:02
Hi list! :-]

I seem to have a problem catching the ParserException when using StringExtractor.

My console says:

    org.htmlparser.util.ParserException: Exception getting input stream from (and so on...)

...but my logger (log4j) does not print and the string "test" is not written to the console.

    String content = "";
    try {
        StringExtractor se = new StringExtractor(url);
        content = se.extractStrings(false);
        System.out.println(content);
    }
    catch(ParserException e){
        logger.error("Could not parse url",e);
        System.out.println("test");
    }

Could the exception be handled some place else that I'm not aware of? The string "content" is sometimes empty due to a 401 error.

Thanks in advance for any reply!

Best regards,
Øystein
From: Martin S. <mst...@gm...> - 2007-01-10 16:31:33
Hi,

Are you sure you are running your own program?

StringExtractor is only a sample application for the StringBean, so it would be better to use the StringBean directly instead of StringExtractor. I have not experienced your problem (I use StringBean rather intensively), and it would mean that the ParserException is caught somewhere in the HTMLParser codebase and printed directly to standard out. I think this is not very likely.

However, the main() method of StringExtractor does catch ParserException and print it to standard out. Maybe you can try to use StringBean directly instead of StringExtractor:

    String content = "";
    try {
        StringBean sb;

        sb = new StringBean ();
        sb.setLinks (false);
        sb.setURL (url);
        content = sb.getStrings();
        System.out.println(content);
    } catch (ParserException e) {
        logger.error("Could not parse url", e);
        System.out.println("test");
    }

--
Martin Sturm
From: <oys...@me...> - 2007-01-11 09:19:40
Martin Sturm wrote:
> Hi,

Good morning, thanks for your reply!

> Are you sure you are running your own program?

Not quite sure what you mean there...

> StringExtractor is only a sample application for the StringBean, so it
> would be better to use the StringBean directly instead of StringExtractor.

I tried that too and the same thing happened, see below.

> I did not experience your problem (I use StringBean rather intensively)
> and it would mean that the ParserException is caught somewhere in the
> HTMLParser codebase and printed directly to standard out. This is not
> very likely I think.

That is exactly what I'm suspecting. Though unlikely - is it possible?

> Maybe you can try to use StringBean directly instead of StringExtractor:
>
>     String content = "";
>     try {
>         StringBean sb;
>
>         sb = new StringBean ();
>         sb.setLinks (false);
>         sb.setURL (url);
>         content = sb.getStrings();
>         System.out.println(content);
>     } catch (ParserException e) {
>         logger.error("Could not parse url", e);
>         System.out.println("test");
>     }

I modified my code according to your example but removed the try/catch, and the exception was still written to the console the same way as before. When I'm not catching the exception, I guess it gets caught some place else?

The exception occurs when the web server responds with HTTP error code 401 Unauthorized.

I'm developing a web application using Tomcat and the Spring framework, if that's relevant.

-Øystein
From: Martin S. <mst...@gm...> - 2007-01-11 12:35:43
Hi,

While I'm not a developer of HTMLParser, I think it is very unlikely that a ParserException is printed to standard out somewhere in the code. I use StringBean rather intensively in my application, but have not had this problem.

The only thing I can think of is that your logger is printing the exception to standard out or something (you are passing the exception to the error method). Does the problem occur when you comment out the logger.error("Could not parse url", e); line?

--
Martin
From: <oys...@me...> - 2007-01-11 14:03:53
Hi,

> The only thing I can think of is that your logger is printing the
> exception to standard out or something (you are passing the exception to
> the error method). Does the problem occur when you comment out the
> logger.error ("Could not parse url", e); ?

The exception is written to the console even though I'm not using any try/catch.

    StringBean sb;
    sb = new StringBean ();
    sb.setLinks(false);
    sb.setURL(url);
    content = sb.getStrings();
    return content;

Weird... thanks for your time though!

-Øystein
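When text shows up on the console and none of your own code prints it, one way to find the culprit is to wrap the console stream and record the call stack whenever a marker string such as "ParserException" is written. The sketch below is a hypothetical debugging aid built only on java.io. ConsoleTracer and its wrap method are names invented here; they are not part of HTMLParser, log4j, or Tomcat. It assumes the marker reaches the stream in a single write.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;

/** Hypothetical debugging aid (not part of HTMLParser): shows who writes a marker to a stream. */
final class ConsoleTracer extends FilterOutputStream {
    private final String marker;
    private final PrintStream report;

    private ConsoleTracer(OutputStream target, String marker, PrintStream report) {
        super(target);
        this.marker = marker;
        this.report = report;
    }

    /**
     * Wraps target so that every write containing marker also dumps the
     * writer's call stack to report. The report stream must not be the
     * wrapped stream itself, otherwise the dump would trigger itself.
     */
    static PrintStream wrap(OutputStream target, String marker, PrintStream report) {
        return new PrintStream(new ConsoleTracer(target, marker, report), true);
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        // A marker split across two writes is missed; acceptable for a quick diagnosis.
        if (new String(buf, off, len).contains(marker)) {
            new Throwable("Console write containing \"" + marker + "\"").printStackTrace(report);
        }
        out.write(buf, off, len); // always pass the original bytes through unchanged
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
    }
}
```

Installed early in the application, for example with `PrintStream err = System.err; System.setErr(ConsoleTracer.wrap(err, "ParserException", err));` (and likewise for System.out), the reported stack trace should point at the class that does the printing. A servlet container may already have redirected these streams, so treat the result as a hint rather than proof.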
From: Martin S. <mst...@gm...> - 2007-01-11 14:26:55
Hi,

It is indeed strange, and I think it should be filed as a bug if you are sure it isn't a problem with your code. While reading the project's FAQ this morning, I noticed the following question:

http://htmlparser.sourceforge.net/faq.html#quiet

It seems somewhat related to your problem.

What version of HTMLParser are you using? (I forgot to ask that question earlier.)

--
Martin
From: <oys...@me...> - 2007-01-11 17:20:54
Martin Sturm wrote:
> Hi,
>
> It is indeed strange and I think it should be filed as a bug if you
> are sure it isn't a problem with your code.

The problem (for me) is that when I call sb.setURL(url), the method itself catches the ParserException and appends the exception message to the string I receive in the end. I don't get to handle the real exception myself ...I think.

If I use the Parser class and parser.setURL(url), I can handle the exception as I want. So that problem is now solved!

But I also need the parser to figure out the encoding of the page by itself, because the encoding sometimes varies. So this parser might not be what I'm looking for. Btw, it seems to stick to ISO-8859-1 no matter what encoding I set with parser.setEncoding("UTF-8"). (I have not tested this very much.)

> What version of the HTMLParser are you using? (I forgot to ask that
> question earlier).

I'm using HTMLParser Version 1.6 (Release Build Jun 10, 2006).

-Øystein
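The behaviour described in this last message, a convenience setter that catches the exception and folds its message into the returned text, would explain why the caller's catch block never runs. The following is a minimal, self-contained sketch of that pattern. FetchException, ThrowingFetcher, and SwallowingBean are hypothetical stand-ins written for illustration; they are not HTMLParser source code and only mimic the behaviour reported above.

```java
/** Hypothetical stand-in for a checked library exception such as ParserException. */
class FetchException extends Exception {
    FetchException(String message) {
        super(message);
    }
}

/** Hypothetical low-level component: reports failures by throwing. */
class ThrowingFetcher {
    String fetch(String url) throws FetchException {
        if (url.contains("secret")) { // simulates a 401 response
            throw new FetchException("Exception getting input stream from " + url);
        }
        return "page text of " + url;
    }
}

/** Hypothetical convenience wrapper: swallows the failure and returns it as text. */
class SwallowingBean {
    private final ThrowingFetcher fetcher = new ThrowingFetcher();
    private String strings = "";

    void setURL(String url) {
        try {
            strings = fetcher.fetch(url);
        } catch (FetchException e) {
            strings = e.toString(); // the caller never sees the exception itself
        }
    }

    String getStrings() {
        return strings;
    }
}

final class SwallowDemo {
    public static void main(String[] args) {
        String url = "http://example.com/secret";

        // Through the wrapper: no exception arrives, only its text in the result.
        SwallowingBean bean = new SwallowingBean();
        bean.setURL(url);
        System.out.println("bean result: " + bean.getStrings());

        // Through the low-level component: the caller's catch block runs.
        try {
            System.out.println(new ThrowingFetcher().fetch(url));
        } catch (FetchException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

With a wrapper like this, the only ways to react to the failure are to inspect the returned string or, as Øystein did, to call the lower-level class that lets the exception propagate.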