
Crawling Problem

  • randome

    randome - 2015-04-03

    Hey there,

    Every other day, all our indexes stop crawling because of this error: Error (org.apache.http.conn.HttpHostConnectException: Connect to domain.com:80 [domain.com/83.65.246.198] failed: Connection refused)

    I guess this happens when the crawler hits our pages while the DB backup is running (the site is unreachable for a few minutes). But what can I do about it? It's not practical to restart all indexes every other day ... :-(

    Thank you
    Andreas Schnederle-Wagner

  • Alexandre Toyer

    Alexandre Toyer - 2015-04-03

    Hi,

    Yes, the crawler can sometimes stop if it encounters an error. What you could do is create a job in the Scheduler that restarts the crawler automatically and regularly: use the task "Web crawler - start" with the parameter "Run once: false", and schedule the job to run every night, or more often if you want. If the crawler is already running, the job will simply do nothing.
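
    If you would rather drive the restart from outside OpenSearchServer, a small client could do the same thing through the REST API. The snippet below is only a minimal sketch in Java with Apache HttpClient: the endpoint path, port and index name are assumptions, so please check the API documentation for your version first.

        import java.io.IOException;

        import org.apache.http.client.methods.CloseableHttpResponse;
        import org.apache.http.client.methods.HttpPut;
        import org.apache.http.impl.client.CloseableHttpClient;
        import org.apache.http.impl.client.HttpClients;

        public class CrawlerRestart {
            public static void main(String[] args) throws IOException {
                // Hypothetical endpoint meaning "start the web crawler on my_index".
                // Verify the exact path and authentication for your installation.
                String endpoint = "http://localhost:9090/services/rest/index/"
                        + "my_index/crawler/web/run";
                try (CloseableHttpClient client = HttpClients.createDefault();
                     CloseableHttpResponse resp = client.execute(new HttpPut(endpoint))) {
                    // As with the Scheduler task, starting an already running
                    // crawler should simply do nothing.
                    System.out.println("Restart request: " + resp.getStatusLine());
                }
            }
        }

    You could then run this from a nightly cron job as an alternative to the Scheduler.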

    Regards,
    Alexandre

  • randome

    randome - 2015-04-03

    Alright, I'll try that ... thx :)

    Maybe in some future release it would make sense to silently ignore connection errors instead of stopping the whole crawl process ... a cron job with a restart is a bit "hacky" ;-)

    Andreas

  • Emmanuel Keller

    Emmanuel Keller - 2015-04-03

    A crawl session may fail if an unexpected error occurs, such as running out of memory or a disk failure. In that case, the solution suggested by Alexandre is fine.

    However, this particular error (HttpHostConnectException) should be handled by updating the URL's fetch status without stopping the crawl session (see the sketch below).

    I just created the issue.
    https://github.com/jaeksoft/opensearchserver/issues/1494
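
    For illustration, the per-URL handling could look roughly like this. It is just a minimal sketch with Apache HttpClient (the same library the error comes from), not the actual crawler code, and the URL list and status label are made up:

        import java.io.IOException;

        import org.apache.http.client.methods.CloseableHttpResponse;
        import org.apache.http.client.methods.HttpGet;
        import org.apache.http.conn.HttpHostConnectException;
        import org.apache.http.impl.client.CloseableHttpClient;
        import org.apache.http.impl.client.HttpClients;

        public class FetchLoop {
            public static void main(String[] args) throws IOException {
                String[] urls = { "http://domain.com/page1", "http://domain.com/page2" };
                try (CloseableHttpClient client = HttpClients.createDefault()) {
                    for (String url : urls) {
                        try (CloseableHttpResponse resp = client.execute(new HttpGet(url))) {
                            System.out.println(url + " -> " + resp.getStatusLine());
                        } catch (HttpHostConnectException e) {
                            // The host refused the connection (for example during
                            // the DB backup window): record a fetch error for this
                            // URL and keep crawling instead of aborting the session.
                            System.out.println(url + " -> fetch status: CONNECTION_REFUSED");
                        }
                    }
                }
            }
        }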

