speed with many unique hosts

  • Marki

    I have one vhost that produces a 12 GB nginx logfile (about 30 million lines)
    per day. As it is an ad-serving vhost, it sees a huge number of unique hosts:
    running sort+uniq on a random day shows 3 million unique IPs. Piping the log
    through a sed script that replaces the last octet of each IP (so I only have
    xxx.yyy.zzz.1) brings that down to 700,000, but that is still too many for
    awstats. It processes about 3,000-7,000 lines per second, but stalls at "Flush
    history file on disk (unique hosts reach flush limit of 20000)" for many
    seconds or even minutes as the number of IPs rises. I am using GeoIP.
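
    For reference, the last-octet masking described above can be sketched in
    Python rather than sed. This is only a minimal sketch, not the actual sed
    script: it assumes the client IPv4 address is the first field of each log
    line, and the names mask_last_octet and mask_stream are made up for
    illustration.

```python
import re

# Assumption: the client IPv4 address is the first field on each log line.
# Captures the first three octets so the fourth can be replaced.
_IPV4_AT_START = re.compile(r"^(\d{1,3}\.\d{1,3}\.\d{1,3})\.\d{1,3}\b")


def mask_last_octet(line: str) -> str:
    """Replace the last octet of a leading IPv4 address with 1.

    Lines that do not start with an IPv4 address are returned unchanged.
    """
    return _IPV4_AT_START.sub(r"\1.1", line, count=1)


def mask_stream(lines):
    """Lazily mask every line of an iterable (e.g. an open log file)."""
    for line in lines:
        yield mask_last_octet(line)
```

    Feeding the log through mask_stream line by line keeps memory use flat, which
    matters for a 12 GB file.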

    Is there any other way to speed this up? I'm not interested in exact per-IP
    stats; per-country or per-provider stats are fine, but I don't know how to
    reduce the number of unique IPs any further. A plain reduction (like
    collapsing a whole /23 into one IP) might move the IP to a different provider
    or even a different country. And I don't know how to easily obtain provider
    data from whois (so that I could replace each IP with the first IP of the
    provider's subnet from whois).
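
    To make the idea concrete: if a list of provider prefixes were available, the
    replacement step itself could look roughly like the sketch below. The prefix
    list here is hypothetical (documentation ranges typed in by hand), the
    prefixes are assumed to be IPv4 and non-overlapping, and obtaining the real
    data from whois is exactly the part that is not solved here.

```python
import bisect
import ipaddress

# Hypothetical prefixes for illustration only; real provider prefixes would
# have to come from whois or routing data, which this sketch does not fetch.
PROVIDER_PREFIXES = ["192.0.2.0/24", "198.51.100.0/25", "203.0.113.0/24"]


def build_index(prefixes):
    """Sort the networks by start address so lookups can use binary search.

    Assumes IPv4 prefixes that do not overlap.
    """
    nets = sorted(
        (ipaddress.ip_network(p) for p in prefixes),
        key=lambda n: int(n.network_address),
    )
    starts = [int(n.network_address) for n in nets]
    return starts, nets


def collapse(ip, index):
    """Return the first address of the prefix containing ip, else ip itself."""
    starts, nets = index
    addr = ipaddress.ip_address(ip)
    # Find the last network whose start address is <= addr.
    i = bisect.bisect_right(starts, int(addr)) - 1
    if i >= 0 and addr in nets[i]:
        return str(nets[i].network_address)
    return ip


INDEX = build_index(PROVIDER_PREFIXES)
```

    With this, every address inside one provider prefix maps to a single host, so
    the number of unique hosts is bounded by the number of prefixes plus whatever
    addresses fall outside all of them.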

    Any ideas? Or maybe some awstats tuning? (I'm running awstats 6.7 from Debian