Number of simultaneous crawls

  • Ralf Krenft

    Ralf Krenft - 2010-09-22

    First of all I would like to thank you and your team for this great open
    source search server.

    Currently our OSS instance crawls an old webserver with more than 100,000

    To obtain a current search index, we have made the following settings in the web-

    Fetch interval between re-fetches (days): 1

    Number of simultaneous threads: 20

    Number of URLs to crawl: 100

    Maximum number of URLs per host: 1000000

    Delay between each successive access, in seconds: 1

    Unfortunately, with this config only one or two threads are used to crawl.

    Is there a way to use more threads to crawl?

    Regards Ralf

  • Emmanuel Keller

    Emmanuel Keller - 2010-09-22

    Hi Ralf,

    Thank you for your support!

    Currently, to avoid uncontrolled spam, OSS uses one thread per hostname. To use
    all 20 threads, you should crawl at least 20 distinct hostnames.

    To speed up indexing, you can also remove the delay by entering 0.
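
    To illustrate why the thread count stays low, here is a rough Python sketch
    of the one-thread-per-hostname rule. It is not the OSS implementation; the
    function names and the example.com URLs are made up for illustration, and
    the time estimate assumes the delay is the only cost of each access.

```python
from math import ceil
from urllib.parse import urlsplit


def effective_threads(urls, configured_threads):
    """Upper bound on busy crawl threads if each hostname gets one thread."""
    hostnames = {urlsplit(url).hostname for url in urls}
    hostnames.discard(None)  # ignore entries without a parsable hostname
    return min(configured_threads, len(hostnames))


def min_delay_seconds(url_count, busy_threads, delay_seconds):
    """Rough lower bound on time spent waiting between accesses.

    Assumes (hypothetically) that URLs are spread evenly over the busy
    threads and that every access costs exactly one delay.
    """
    if busy_threads <= 0:
        return 0
    return ceil(url_count / busy_threads) * delay_seconds


# Hypothetical crawl list: every URL lives on the same host.
single_host = [f"http://www.example.com/page{i}" for i in range(100)]
busy = effective_threads(single_host, configured_threads=20)
print(busy)
print(min_delay_seconds(len(single_host), busy, delay_seconds=1))
print(min_delay_seconds(len(single_host), busy, delay_seconds=0))
```

    With a single hostname, raising the thread setting cannot help under this
    rule; only more hostnames or a shorter delay change the result.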



  • Ralf Krenft

    Ralf Krenft - 2010-09-22

    Hi Emmanuel,

    Thanks for your quick reply.

    The value of 0 for the delay works great.

    Is there a way to run the crawling and the optimization of the index in
    different threads?

    The optimization of the index requires a lot of time.



