Hey there,
Every other day all our indexes stop crawling because of this error: Error (org.apache.http.conn.HttpHostConnectException: Connect to domain.com:80 [domain.com/83.65.246.198] failed: Connection refused)
I guess this happens when the crawler hits our pages while the DB backup is running (the site is not reachable for a few minutes). But what can I do here? It's not feasible to restart all indexes every other day ... :-(
Thank you
Andreas Schnederle-Wagner
Hi,
Yes, the crawler can sometimes stop if it encounters an error. What you could do is create a job in the Scheduler to automatically and regularly restart the crawler. Use the task "Web crawler - start" with the parameter "Run once: false". Schedule this job to run every night, or more often if you want. If the crawler is already started, the job will do nothing.
Regards,
Alexandre
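If you prefer to drive the restart from an external cron job instead of the built-in Scheduler, here is a minimal sketch. It assumes OpenSearchServer exposes a REST endpoint to start the web crawler; the host, index name, and endpoint path below are placeholders, so check the REST API documentation for your version before relying on it.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch of a "restart the crawler from cron" fallback.
// The endpoint path and parameters are assumptions, not the documented API;
// the Scheduler job described above remains the supported approach.
public class RestartWebCrawler {
    public static void main(String[] args) throws Exception {
        String host = "http://localhost:9090";   // assumed OpenSearchServer host/port
        String index = "my_index";               // hypothetical index name
        // Hypothetical endpoint: start the web crawler in "run forever" mode
        String url = host + "/services/rest/index/" + index
                + "/crawler/web/run?action=start&once=false";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(url))
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // Like the Scheduler task described above, starting an already running
        // crawler should simply be a no-op.
        System.out.println(response.statusCode() + " " + response.body());
    }
}

Run it from crontab once a night (or more often) so a stopped crawler never stays down for long.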
alright - will try that ... thx :)
Maybe in some future release it would make sense to silently ignore connection errors and not stop the whole crawl process ... a cron job with a restart is a bit "hacky" ;-)
Andreas
A crawl session may fail if an unexpected error occurs, such as running out of memory or a disk failure. In that case the solution suggested by Alexandre is fine.
However, this particular error (HttpHostConnectException) should be handled by updating the URL fetch status without stopping the crawl session, roughly as sketched below.
I just created the issue.
https://github.com/jaeksoft/opensearchserver/issues/1494
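For illustration, the handling described above could look roughly like this. The class, enum, and method names (UrlItem, FetchStatus, fetch) are hypothetical and not OpenSearchServer internals; only the exception type is real.

import org.apache.http.conn.HttpHostConnectException;

// Illustrative sketch of the pattern: a connection refusal marks the single
// URL as failed instead of aborting the whole crawl session.
public class CrawlLoopSketch {

    enum FetchStatus { FETCHED, CONNECTION_ERROR }

    static class UrlItem {
        final String url;
        FetchStatus status;
        UrlItem(String url) { this.url = url; }
    }

    void crawl(Iterable<UrlItem> urls) {
        for (UrlItem item : urls) {
            try {
                fetch(item);                        // download and parse the page
                item.status = FetchStatus.FETCHED;
            } catch (HttpHostConnectException e) {
                // Site temporarily unreachable (e.g. during the DB backup):
                // record the failure on this URL and move on to the next one.
                item.status = FetchStatus.CONNECTION_ERROR;
            }
        }
    }

    void fetch(UrlItem item) throws HttpHostConnectException {
        // actual HTTP fetch omitted in this sketch
    }
}

Recording the failure on the URL keeps it visible in the crawl statistics and lets a later session retry it, while the rest of the session keeps running.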