Excessive Crawl Rate?

Help
Thomas52
2013-09-22
2013-09-24
  • Thomas52
    2013-09-22

    I received a message as follows:

    # Hits % Files % KBytes % Visits % Hostname

    ===============================================
    1 84480 12.99% 74609 13.26% 480768 7.13% 25 0.44% blexn18.webmeup.com
    2 72667 11.17% 53134 9.44% 220347 3.27% 155 2.70% host-173-246-211-196.morrisbb.com
    3 47156 7.25% 42080 7.48% 790450 11.72% 279 4.86% crawl-66-249-73-196.googlebot.com
    4 9839 1.51% 8729 1.55% 67175 1.00% 233 4.06% crawl-66-249-73-59.googlebot.com
    5 4944 0.76% 53 0.01% 2533 0.04% 2 0.03% server.chicken8.com
    6 4698 0.72% 2611 0.46% 27176 0.40% 52 0.91% msnbot-157-56-93-52.search.msn.com
    7 4658 0.72% 2488 0.44% 21187 0.31% 14 0.24% ip68-98-77-62.ph.ph.cox.net
    8 3954 0.61% 3954 0.70% 23141 0.34% 10 0.17% google-proxy-66-249-82-122.google.com
    9 2909 0.45% 1511 0.27% 20504 0.30% 31 0.54% msnbot-157-55-34-176.search.msn.com
    10 2516 0.39% 2516 0.45% 14208 0.21% 11 0.19% google-proxy-66-249-85-196.google.com
    11 2390 0.37% 573 0.10% 9469 0.14% 28 0.49% spider-199-21-99-115.yandex.com
    12 2348 0.36% 2348 0.42% 13672 0.20% 5 0.09% google-proxy-66-249-81-184.google.com
    13 2215 0.34% 2213 0.39% 8498 0.13% 1 0.02% host9773005051.direcway.com
    14 2155 0.33% 1257 0.22% 17332 0.26% 38 0.66% msnbot-157-56-229-138.search.msn.com
    15 2118 0.33% 1565 0.28% 8130 0.12% 5 0.09% 24-240-18-247.dhcp.mdvl.ga.charter.com
    16 1981 0.30% 1970 0.35% 4888 0.07% 1 0.02% ec2-54-226-42-60.compute-1.amazonaws.com
    17 1954 0.30% 932 0.17% 11007 0.16% 23 0.40% msnbot-157-56-229-188.search.msn.com
    18 1892 0.29% 921 0.16% 13429 0.20% 26 0.45% msnbot-157-55-32-76.search.msn.com
    19 1882 0.29% 902 0.16% 10877 0.16% 18 0.31% msnbot-157-55-32-186.search.msn.com
    20 1841 0.28% 934 0.17% 11632 0.17% 23 0.40% msnbot-65-55-52-115.search.msn.com
    21 1793 0.28% 1793 0.32% 12105 0.18% 1 0.02% ip72-219-39-165.br.br.cox.net
    22 1698 0.26% 1697 0.30% 9791 0.15% 0 0.00% ip68-4-133-43.oc.oc.cox.net
    23 1570 0.24% 788 0.14% 10739 0.16% 25 0.44% msnbot-157-55-34-177.search.msn.com
    24 1561 0.24% 1028 0.18% 15712 0.23% 27 0.47% 218.30.103.142
    25 1515 0.23% 902 0.16% 12935 0.19% 24 0.42% msnbot-157-56-92-160.search.msn.com

    I would appreciate any recommendations to minimize this problem. Thanks.
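One way to see who is responsible for most of the load is to total the hits per crawler operator rather than per host. The sketch below is only an illustration: it assumes each row has the ten whitespace-separated fields shown in the report (rank, hits, hits %, files, files %, KBytes, KBytes %, visits, visits %, hostname), and the helper names `parse_row`, `owner`, and `hits_by_owner` are made up for this example. Grouping by the last two DNS labels is a rough heuristic, not a reliable way to identify an operator.

```python
from collections import defaultdict

# A few rows copied from the report above; the remaining rows parse the same way.
SAMPLE = """\
1 84480 12.99% 74609 13.26% 480768 7.13% 25 0.44% blexn18.webmeup.com
3 47156 7.25% 42080 7.48% 790450 11.72% 279 4.86% crawl-66-249-73-196.googlebot.com
4 9839 1.51% 8729 1.55% 67175 1.00% 233 4.06% crawl-66-249-73-59.googlebot.com
24 1561 0.24% 1028 0.18% 15712 0.23% 27 0.47% 218.30.103.142
"""


def parse_row(line):
    """Return (hits, visits, hostname) from one report row."""
    fields = line.split()
    # Assumed layout: rank, hits, hits%, files, files%,
    # kbytes, kbytes%, visits, visits%, hostname
    if len(fields) != 10:
        raise ValueError(f"unexpected row layout: {line!r}")
    return int(fields[1]), int(fields[7]), fields[9]


def owner(hostname):
    """Group a hostname by its last two DNS labels; keep bare IPv4 addresses whole."""
    labels = hostname.split(".")
    if all(label.isdigit() for label in labels):
        return hostname
    return ".".join(labels[-2:])


def hits_by_owner(report):
    """Sum the hits column for every group of hostnames."""
    totals = defaultdict(int)
    for line in report.splitlines():
        if line.strip():
            hits, _visits, hostname = parse_row(line)
            totals[owner(hostname)] += hits
    return dict(totals)


if __name__ == "__main__":
    for name, hits in sorted(hits_by_owner(SAMPLE).items(), key=lambda kv: -kv[1]):
        print(f"{hits:>8}  {name}")
```

Run against the whole report, this kind of summary makes it easier to decide which sources are worth throttling or blocking first.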

  • Gerry Kroll
    2013-09-24

    There are a lot of badly behaved crawlers out there, and some are actually hackers pretending to be crawlers.
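A common way to tell a genuine search-engine crawler from an impostor is a reverse DNS lookup on the client IP followed by a forward lookup on the returned name. The sketch below is an illustration only, not something described in this thread: `is_genuine_crawler` is a made-up helper, and the domain suffixes passed to it are assumptions based on the hostnames in the report above. The two resolver arguments default to the standard `socket` functions and exist so the check can be tried without network access.

```python
import socket


def is_genuine_crawler(ip, allowed_suffixes,
                       reverse=socket.gethostbyaddr,
                       forward=socket.gethostbyname_ex):
    """Check that ip reverse-resolves to an allowed domain and that
    the resulting hostname resolves back to the same ip."""
    try:
        hostname = reverse(ip)[0]          # (hostname, aliases, addresses)
    except OSError:                        # covers socket.herror / socket.gaierror
        return False
    if not hostname.endswith(tuple(allowed_suffixes)):
        return False
    try:
        addresses = forward(hostname)[2]   # (hostname, aliases, addresses)
    except OSError:
        return False
    return ip in addresses


if __name__ == "__main__":
    # Assumed suffixes; the leading dot stops "evilgooglebot.com" from matching.
    suffixes = [".googlebot.com", ".search.msn.com"]
    print(is_genuine_crawler("66.249.73.196", suffixes))
```

A client that claims to be a crawler in its user-agent string but fails this check is a reasonable candidate for blocking.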

    You need to be more proactive, and block those folks by IP address.

    Send me an e-mail, and I'll give you the list of the IP addresses I block on the "manage servers" page of the Admin menu.
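For anyone who wants to experiment with an IP blocklist outside the admin page, here is a minimal sketch using Python's standard `ipaddress` module. It is not how the "manage servers" page works internally, and the entries are placeholders rather than the list mentioned above: `218.30.103.142` is simply the bare address from row 24 of the report, and `192.0.2.0/24` is a range reserved for documentation. The names `build_blocklist` and `is_blocked` are made up for this example.

```python
import ipaddress


def build_blocklist(entries):
    """Turn strings such as '218.30.103.142' or '192.0.2.0/24' into network objects."""
    # strict=False accepts entries whose host bits are set, e.g. '192.0.2.5/24'.
    return [ipaddress.ip_network(entry, strict=False) for entry in entries]


def is_blocked(ip, blocklist):
    """Return True if ip falls inside any blocked address or range."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in blocklist)


if __name__ == "__main__":
    # Placeholder entries, for illustration only.
    blocklist = build_blocklist(["218.30.103.142", "192.0.2.0/24"])
    for candidate in ("218.30.103.142", "192.0.2.77", "66.249.73.196"):
        state = "blocked" if is_blocked(candidate, blocklist) else "allowed"
        print(candidate, state)
```

Storing ranges as well as single addresses helps when a crawler rotates through several hosts on the same network.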