I have one vhost that produces a 12 GB nginx log file (about 30 million lines)
per day. As it is an ad-serving vhost, it sees a huge number of unique hosts.
Running sort+uniq on a random day shows 3 million unique IPs. After piping the
log through a sed script that strips the last octet of each IP (so every
address becomes xxx.yyy.zzz.1), the count goes down to 700,000, but that is
still too much for awstats. It processes about 3,000 to 7,000 lines per
second, but it stays at "Flush history file on disk (unique hosts reach flush
limit of 20000)" for many seconds or even minutes at a time as the number of
IPs rises. I am using GeoIP.
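The sort+uniq count and the sed step can be reproduced in a short Python sketch, which makes it easy to measure how much a given reduction helps before feeding anything to awstats. It assumes the client address is the first space-separated field of each line, as in nginx's default "combined" log format; collapse_line and count_unique are hypothetical helper names, not part of awstats or nginx.

```python
import ipaddress
from typing import Iterable, Tuple


def collapse_line(line: str) -> str:
    """Rewrite the leading IPv4 address to xxx.yyy.zzz.1, like the sed step.

    Assumes the client IP is the first space-separated field (nginx
    "combined" format). Lines whose first field is not IPv4 are unchanged.
    """
    field, sep, rest = line.partition(" ")
    try:
        ip = int(ipaddress.IPv4Address(field))
    except ipaddress.AddressValueError:
        return line
    # Keep the top 24 bits and force the last octet to 1.
    collapsed = ipaddress.IPv4Address((ip & 0xFFFFFF00) | 1)
    return f"{collapsed}{sep}{rest}"


def count_unique(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (unique IPs, unique /24 networks) seen in the log lines."""
    ips = set()
    nets = set()
    for line in lines:
        field = line.partition(" ")[0]
        try:
            ip = int(ipaddress.IPv4Address(field))
        except ipaddress.AddressValueError:
            continue  # skip IPv6 or malformed entries
        ips.add(ip)       # store plain ints rather than address objects
        nets.add(ip >> 8)  # same /24 <=> same top 24 bits
    return len(ips), len(nets)
```

Usage, with a hypothetical file name: `with open("access.log") as f: print(count_unique(f))`. The file is read line by line, so the 12 GB log is never loaded at once, although the set of unique addresses itself still has to fit in memory.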
Is there any other way to speed this up? I'm not interested in exact per-IP
stats; per-country and per-provider stats are fine, but I don't know how to
reduce the number of unique IPs any further. Plain reduction (like collapsing
a whole /23 into one IP) might move an IP to a different provider or even a
different country. And I don't know how to easily obtain provider data from
whois (so that I could replace each IP with the first IP of the provider's
subnet).
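The whois idea amounts to a longest-prefix match: find the most specific provider prefix that contains each IP and replace the IP with that prefix's first address. Unlike a fixed /23, this keeps addresses from different prefixes in the table apart, as far as the table is accurate. A minimal sketch follows, assuming you already have a list of provider prefixes in CIDR notation (for example from a routing-table dump); obtaining that list is not shown, and PrefixTable and normalize are hypothetical names. Addresses with no matching prefix fall back to their /24 network address.

```python
import ipaddress
from typing import Dict, Iterable, Optional

Network = ipaddress.IPv4Network


class PrefixTable:
    """Longest-prefix-match lookup from an IPv4 address to the most
    specific network containing it.

    Assumption: the prefixes come from some per-provider list supplied
    by the caller; this class does not fetch whois or routing data.
    """

    def __init__(self, prefixes: Iterable[str]) -> None:
        # One dict per prefix length: network address as int -> network.
        self._by_len: Dict[int, Dict[int, Network]] = {}
        for p in prefixes:
            net = ipaddress.IPv4Network(p, strict=False)
            self._by_len.setdefault(net.prefixlen, {})[int(net.network_address)] = net
        # Try the most specific (longest) prefixes first.
        self._lengths = sorted(self._by_len, reverse=True)

    def lookup(self, ip: str) -> Optional[Network]:
        addr = int(ipaddress.IPv4Address(ip))
        for length in self._lengths:
            mask = (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF
            net = self._by_len[length].get(addr & mask)
            if net is not None:
                return net
        return None


def normalize(ip: str, table: PrefixTable) -> str:
    """Replace ip with the first address of its matching prefix, or with
    its /24 network address when no prefix in the table matches."""
    net = table.lookup(ip)
    if net is None:
        net = ipaddress.IPv4Network(ip + "/24", strict=False)
    return str(net.network_address)
```

Each lookup costs at most one dictionary probe per distinct prefix length in the table (33 at most for IPv4), so it can be applied to every log line before the result is handed to awstats.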
Any ideas? Or maybe some awstats tuning? (I'm running awstats 6.7 from Debian