Hi,
I am using the parser to extract links from HTML pages as part of my application. The problem is that some URLs, like 'http://news.google.com', cannot be fetched: the parser throws an error and exits, which stops my application. Can anyone suggest a way to catch and ignore this error rather than stopping the application?
Thank you.
That URL returns a 403 (Forbidden) in response to the simple GET request the parser uses. Catch the IOException it throws:
import java.io.IOException;

...
try
{
    // ... do the parse ...
}
catch (IOException ioe)
{
    // Log the failure and skip this URL so the application keeps running.
    ioe.printStackTrace();
}
A sophisticated application would use parser.setConnection() with a valid connection (whatever it takes for the URL not to issue a 403). See http://htmlparser.sourceforge.net/wiki/index.php/PostOperation for an example using POST, but the same technique applies to exotic GET requests.
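For example, a minimal sketch of that approach might look like the following. It assumes the 403 is triggered by the default Java user agent; the browser-like User-Agent value is illustrative only, so substitute whatever headers (cookies, referer, etc.) the site actually requires.

import java.net.URL;
import java.net.URLConnection;
import org.htmlparser.Parser;

public class LinkFetch
{
    public static void main (String[] args) throws Exception
    {
        URL url = new URL ("http://news.google.com");
        URLConnection connection = url.openConnection ();
        // Assumption: the server rejects the default Java user agent;
        // a browser-like value may (or may not) get past the 403.
        connection.setRequestProperty ("User-Agent", "Mozilla/5.0");
        Parser parser = new Parser ();
        parser.setConnection (connection);
        // ... extract the links as before, e.g. via parser.elements () ...
    }
}

Any request properties you set on the connection before the parse begins are carried along with the GET, which is what lets the same technique cover the POST example on the wiki page as well.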