Basic understanding: Crawl-process parameters

Help
  • reinhard

    reinhard - 2013-11-14

    Sorry, I lack understanding of the Crawler > Crawl process tab.
    Specifically, I am confused by "Number of URLs to crawl", "Maximum number of URLs per host",
    and "RunOnce" versus "RunForever".

    My site has some 8,000+ URLs (mostly webshop items from a database).

    I want OSS to:
    1. check for new pages
    2. delete pages that no longer exist
    3. update page information

    I want this to run once a day, around midnight.

    What would the optimal settings for the above parameters be?

    RunOnce or RunForever?
    Number of URLs to crawl: e.g. 10,000?
    Maximum number of URLs per host: also 10,000, as I have only one host?

    Fetch interval: once per day, but at which time? (See the sketch below.)

    Thanks!
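    For reference, one way to get "once a day, around midnight" is to use "RunOnce" mode and start each session externally from a scheduler such as cron, rather than relying on "RunForever". Below is a minimal Python sketch of such a trigger script; the REST path, index name, and credentials are assumptions for illustration, not the documented OSS API, so check the REST API documentation for your OSS version:

        import urllib.parse
        import urllib.request

        # All values below are assumptions for illustration; adjust them to
        # your own OpenSearchServer instance and its documented REST API.
        BASE_URL = "http://localhost:9090"   # assumed OSS host and port
        INDEX = "webshop"                    # assumed index name
        LOGIN = "admin"                      # assumed account name
        API_KEY = "secret"                   # assumed API key

        def trigger_run_once():
            # Ask the server to start one "run once" web-crawl session
            # via a (hypothetical) REST endpoint.
            query = urllib.parse.urlencode({"login": LOGIN, "key": API_KEY})
            url = f"{BASE_URL}/services/rest/index/{INDEX}/crawler/web/run/once?{query}"
            request = urllib.request.Request(url, method="PUT")
            with urllib.request.urlopen(request) as response:
                print(response.status, response.read().decode())

        if __name__ == "__main__":
            trigger_run_once()

    Scheduled with a crontab entry such as "0 0 * * * python3 trigger_crawl.py", this starts a fresh session every night at 00:00. As far as I can tell, "RunForever", by contrast, keeps the crawler looping on its own and re-fetches pages according to the fetch interval, so it cannot be pinned to a specific time of day.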

     
    • reinhard

      reinhard - 2013-11-18

      Can anyone clarify this?

      How would I crawl some 1,000 pages?
      Thank you!
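      For what it's worth, in most crawlers these two numbers act as caps on a single session. Here is a schematic Python sketch of that idea (not OSS's actual code; fetch and extract_links are hypothetical placeholder callables):

          from collections import deque
          from urllib.parse import urlparse

          def crawl(seeds, fetch, extract_links, max_urls=1000, max_per_host=1000):
              # Schematic sketch: "Number of URLs to crawl" caps the whole
              # session; "Maximum number of URLs per host" caps each host.
              seen = set(seeds)
              per_host = {}
              queue = deque(seeds)
              fetched = 0
              while queue and fetched < max_urls:            # global session cap
                  url = queue.popleft()
                  host = urlparse(url).netloc
                  if per_host.get(host, 0) >= max_per_host:  # per-host cap
                      continue
                  page = fetch(url)                          # download the page
                  per_host[host] = per_host.get(host, 0) + 1
                  fetched += 1
                  for link in extract_links(page):           # queue newly found links
                      if link not in seen:
                          seen.add(link)
                          queue.append(link)
              return fetched

      Read this way, crawling "some 1,000 pages" would just mean setting the session cap to 1,000: the run stops after 1,000 fetches even if more links are still queued. With a single host, the per-host limit only matters if it is set lower than the global one.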

       
