
Robots not being detected

2013-03-07
2013-03-08
  • alvin

    alvin - 2013-03-07

    Hello,

    I am running Awstats 7.0 on openSUSE 12.2. I installed it via the openSUSE repos.

    For some reason, bots such as Baidu, googlebot, msn, etc. are not being detected. In Robot
    visitors I see:

    Unknown robot (identified by hit on 'robots.txt')
    Unknown robot (identified by empty user agent string)

    However, in Hosts, there are a plethora of:
    googlebot.com
    crawl.baidu.com
    msnbot

    I am using perl 5.16.

    I have cleared my AWStats cached results and forced it to regenerate them from the original
    log files. An example of a Baidu crawl entry is as follows (sorry in advance for the line
    wrapping):

    180.76.6.212 - - [02/Mar/2013:12:19:12 -0500] "GET / HTTP/1.1" 200 3833 "-" "Mozilla/5.0
    (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"

    Any ideas of how I can fix this?

    Cheers,

    Alvin

     
    • Laurent Destailleur (Eldy)

      There is a bug in version 7.0 with recent Perl versions. Have you tried
      AWStats 7.1?

       
  • Albrecht Mueller

    I think I saw a similar symptom during my experiments to make awstats.pl work under perl 5.14. See my post "AWSTats and Perl 5.14" https://sourceforge.net/p/awstats/discussion/43428/thread/9afcd0f2/#991c

    The essential fix was to add a caret character to a regular expression. This became necessary because of changes introduced in Perl 5.14, which break the function "UnCompileRegexp". As a result, "OptimizeArray", which uses "UnCompileRegexp", does not work properly, causing a variety of symptoms.
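
    As a rough illustration of why one caret matters: since Perl 5.14, a compiled regex stringifies as "(?^:...)" rather than "(?-xism:...)", so a pattern that only accepts letters and "-" between "(?" and ":" no longer matches. The sketch below replays that in Python. The two patterns and the uncompile() helper are assumptions made for illustration; they are not copied from awstats.pl.

```python
import re

# Assumed shape of the pre-fix pattern: only word characters and '-' are
# allowed between "(?" and ":".
OLD = re.compile(r"\(\?[-\w]*:(.*)\)")
# Same pattern with a caret added to the character class.
NEW = re.compile(r"\(\?[-^\w]*:(.*)\)")


def uncompile(pattern, stringified):
    """Return the regex body inside a stringified Perl qr//, or None."""
    match = pattern.search(stringified)
    return match.group(1) if match else None


before_514 = "(?-xism:baiduspider)"  # how Perl older than 5.14 prints qr/baiduspider/
since_514 = "(?^:baiduspider)"       # how Perl 5.14 and later print it

print(uncompile(OLD, before_514))
print(uncompile(OLD, since_514))
print(uncompile(NEW, since_514))
```

    With the old pattern the body cannot be recovered from the newer form, which fits robot signatures silently failing to match.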

    Hope this helps

    Albrecht

     
  • alvin

    alvin - 2013-03-08

    Hello Albrecht,

    I found your post after I sent the email. I applied the fix to UnCompile() and that fixed the problem...mostly.

    Most of the bots are now detected; however, a few still end up in the Hosts listing. These are the ones that do not send a user agent string. I know they are bots because I have reverse DNS lookup enabled, and the resolved host name contains the bot's name. For example:

    I see "baiduspider-123-125-71-41.crawl.baidu.com" in the Hosts list. The log has "123.125.71.41". Running nslookup on that IP address returns the "baiduspider-...." entry above. Other similar entries have different IPs but ultimately resolve to a similar "baiduspider-...." name. The same thing happens with msnbot: "msnbot-65-55-213-60.search.msn.com".

    Is there any way to detect such entries and have them classified as robots?
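
    One possible approach, sketched here in Python as a standalone check rather than as an AWStats feature, is to match the resolved host name against known crawler domains. The pattern list and the function name is_probable_robot_host are hypothetical; the host names are the ones quoted in this thread.

```python
import re

# Hypothetical pattern list built from the host names seen in this thread.
ROBOT_HOST_PATTERNS = [
    re.compile(r"(^|\.)googlebot\.com$", re.IGNORECASE),
    re.compile(r"(^|\.)crawl\.baidu\.com$", re.IGNORECASE),
    re.compile(r"^msnbot-[\d-]+\.search\.msn\.com$", re.IGNORECASE),
]


def is_probable_robot_host(hostname):
    """Return True if a reverse-resolved host name looks like a known crawler."""
    host = hostname.strip().rstrip(".")  # tolerate a trailing root dot
    return any(p.search(host) for p in ROBOT_HOST_PATTERNS)


for name in (
    "baiduspider-123-125-71-41.crawl.baidu.com",
    "msnbot-65-55-213-60.search.msn.com",
    "123.125.71.41",
):
    print(name, is_probable_robot_host(name))
```

    Reverse DNS names are controlled by whoever owns the IP range, so a match is only a hint unless the name also resolves forward to the same address.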

    As for switching to 7.1, I would love to! I am just waiting for it to appear in the openSUSE repo: http://download.opensuse.org/repositories/network:/utilities/openSUSE_12.2/

    In case anyone else finds it useful, I have created a .patch file for Albrecht's fix to awstats.pl

     
