CPU usage 99 percent - How to best crawl BIG site?

reinhard
2013-11-15
  • reinhard
    2013-11-15

    I am still experimenting with the crawler parameters (see my previous question).

    I re-read the help on the crawler parameters and set the maximum number of URLs to crawl to 10,000,
    as my site has about 8,000 pages.
    To shorten the crawl time, I set the delay between accesses to 1 second.
    I then started a RunOnce crawl.
    After a few minutes (URLs processed: 890), my Tomcat process is using 99% CPU and 1.7 GB of memory (out of 4 GB total).
    The server becomes nearly unresponsive and I have to kill the process.
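    For scale, here is a rough lower bound on the crawl duration for N of about 8,000 pages and a delay d of 1 second. It assumes, as a simplification, that the crawler fetches one URL per delay interval from a single host with no parallelism:

```latex
T \approx N \cdot d = 8000 \times 1\,\mathrm{s} = 8000\,\mathrm{s} \approx 2.2\,\mathrm{h}
```

    Even under this simplification, a full pass over the site takes hours rather than minutes.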

    On the other hand, if I leave the maximum number at 100 and use RunForever, the same thing happens after a while.

    How would you crawl such a large number of URLs?

    Thank you,
    Reinhard

     
  • Naveen A.N
    2013-11-15

    Hello Reinhard,

    How much memory (in GB) has been allocated to OpenSearchServer?

    Naveen.A.N

     
  • reinhard
    2013-11-15

    I'm not sure where to find this, but under http://<root>/manager -> Server Status -> JVM it says:
    Free memory: 106.03 MB Total memory: 433.81 MB Max memory: 989.87 MB

    Under OSS-Url/ -> System -> General it shows: Max memory 989.9 MB

     
  • Naveen A.N
    2013-11-15

    Hello Reinhard,

    The memory allocated to the JVM seems very low (the max memory shown is only about 1 GB).

    If you are using an OpenSearchServer version below 1.5, refer to the documentation below to allocate more memory:
    http://www.open-search-server.com/confluence/display/EN/Out+of+memory+issue

    If you are using version 1.5 or above, open the start.sh (or start.bat) file and
    remove the "#" from the line JAVA_OPTS="-Xms1G -Xmx1G". This gives the server 1 GB of RAM. You can allocate more by changing the -Xms (initial heap size) and -Xmx (maximum heap size) values.

    For example:
    JAVA_OPTS="-Xms3G -Xmx3G"

    Use 2-3 GB of RAM so that you won't run into memory issues.
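    The start.sh edit described above can be scripted. What follows is a minimal sketch as a POSIX shell function. It assumes the commented line has the form #JAVA_OPTS="-Xms1G -Xmx1G", although it also matches a line that is already uncommented or uses other sizes. The function name set_oss_heap is hypothetical. sed keeps the original file as start.sh.bak. Because heap options are read only when the JVM starts, OpenSearchServer has to be restarted afterwards.

```shell
#!/bin/sh
# Hypothetical helper: uncomment/resize the JAVA_OPTS heap line in an
# OpenSearchServer start script. Assumes the line has the form
#   #JAVA_OPTS="-Xms1G -Xmx1G"
# and sets both -Xms (initial heap) and -Xmx (maximum heap) to SIZE.
set_oss_heap() {
  script="$1"  # path to start.sh
  size="$2"    # heap size, e.g. 3G
  # -i.bak edits in place and keeps the original as "$script.bak"
  # (this attached-suffix form works with both GNU and BSD sed).
  sed -i.bak \
    "s/^[[:space:]]*#*[[:space:]]*JAVA_OPTS=\"-Xms[0-9]*[GgMm] -Xmx[0-9]*[GgMm]\"/JAVA_OPTS=\"-Xms${size} -Xmx${size}\"/" \
    "$script"
}

# Example: give the server 3 GB of heap, then restart it.
# set_oss_heap ./start.sh 3G
```

    Setting -Xms equal to -Xmx, as in the example above, makes the JVM reserve the full heap at startup. On a 4 GB machine, 3 GB leaves roughly 1 GB for the operating system and the JVM's non-heap memory.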

    Naveen.A.N

     
    Last edit: Naveen A.N 2013-11-15