Hey there,
Every other day all our indexes stop crawling because of this error: Error (org.apache.http.conn.HttpHostConnectException: Connect to domain.com:80 [domain.com/83.65.246.198] failed: Connection refused)
I guess this happens when the crawler hits our pages while the DB backup is running (the site is not reachable for a few minutes). But what can I do here? It's not feasible to restart all indexes every other day ... :-(
Thank you
Andreas Schnederle-Wagner
Hi,
Yes, the crawler can sometimes stop if it encounters an error. What you could do is create a job in the Scheduler to automatically and regularly restart the crawler. Use the task "Web crawler - start" with the parameter "Run once: false". Schedule this job to run every night, or more often if you want. If the crawler is already started, the job will do nothing.
Regards,
Alexandre
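If you prefer to drive the restart from an external cron job instead of the built-in Scheduler, here is a minimal sketch. It assumes OpenSearchServer exposes a REST endpoint to start the web crawler; the host, index name, and endpoint path below are placeholders, so check the REST API documentation for your version before relying on it.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch of a "restart the crawler from cron" fallback.
// The endpoint path and parameters are assumptions, not the documented API;
// the Scheduler job described above remains the supported approach.
public class RestartWebCrawler {
    public static void main(String[] args) throws Exception {
        String host = "http://localhost:9090";   // assumed OpenSearchServer host/port
        String index = "my_index";               // hypothetical index name
        // Hypothetical endpoint: start the web crawler in "run forever" mode
        String url = host + "/services/rest/index/" + index
                + "/crawler/web/run?action=start&once=false";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(url))
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // Like the Scheduler task described above, starting an already running
        // crawler should simply be a no-op.
        System.out.println(response.statusCode() + " " + response.body());
    }
}

Run it from crontab once a night (or more often) so a stopped crawler never stays down for long.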
alright - will try that ... thx :)
Maybe in some future release it would make sense to silently ignore connection errors and not stop the whole crawl process ... a cron job with a restart is a bit "hacky" ;-)
Andreas
A crawl session may fail if an unexpected error occurs, such as running out of memory or a disk failure. In that case the solution suggested by Alexandre is fine.
However, this particular error (HttpHostConnectException) should be handled by updating the URL fetch status without stopping the crawl session, roughly as sketched below.
I just created the issue.
https://github.com/jaeksoft/opensearchserver/issues/1494
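For illustration, the handling described above could look roughly like this. The class, enum, and method names (UrlItem, FetchStatus, fetch) are hypothetical and not OpenSearchServer internals; only the exception type is real.

import org.apache.http.conn.HttpHostConnectException;

// Illustrative sketch of the pattern: a connection refusal marks the single
// URL as failed instead of aborting the whole crawl session.
public class CrawlLoopSketch {

    enum FetchStatus { FETCHED, CONNECTION_ERROR }

    static class UrlItem {
        final String url;
        FetchStatus status;
        UrlItem(String url) { this.url = url; }
    }

    void crawl(Iterable<UrlItem> urls) {
        for (UrlItem item : urls) {
            try {
                fetch(item);                        // download and parse the page
                item.status = FetchStatus.FETCHED;
            } catch (HttpHostConnectException e) {
                // Site temporarily unreachable (e.g. during the DB backup):
                // record the failure on this URL and move on to the next one.
                item.status = FetchStatus.CONNECTION_ERROR;
            }
        }
    }

    void fetch(UrlItem item) throws HttpHostConnectException {
        // actual HTTP fetch omitted in this sketch
    }
}

Recording the failure on the URL keeps it visible in the crawl statistics and lets a later session retry it, while the rest of the session keeps running.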