We are using AWStats for 30 domains, so we have 30 AWStats config files, all updated automatically. Processing one day of logs takes nearly 8 to 10 hours because our log files are very large. Unfortunately, each and every month we end up missing one or two days of stats in the AWStats report. For example, AWStats has already built the stats up to 30 June 2009, but 3 June 2009 is missing from the report. I want to update 3 June only.
I know that a major limitation of AWStats is that it does not support this. Removing the current month's AWStats database and updating from 1 June to 30 June again for all 30 domains would take too much time.
So I am trying to get the missing day in a different way. It processes the day but does not give the expected result (it reports too few visitors). What I am doing is modifying "Last Line" and "Last Time" in the AWStats database file. First I save the 30 June "Last Line" and "Last Time" values, then I manually get the "Last Line" and "Last Time" values from the 2 June log file, write them into the database file, and run the update for the missing day. The update runs, but it does not give the expected values. Any suggestions would be appreciated.
Thanks & Regards
While such a tool is definitely possible, there are some major issues to overcome. Mainly data structure related. I have been playing around with this myself, thinking about merging all our AWStats data for a specific month (one file per site) into one big "global view".
The main problem came when trying to merge 15 million unique(?) visitors...
In order to get anything capable of running somewhat fast, you would basically need to keep it all in memory. You need to find the latest visit, and total number of visits, for each unique IP/host. Then you also need to locate the most recent X ones, and put them at the top. What happened for me was that I basically ran out of memory.
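To make the memory problem concrete, here is a minimal Python sketch of the in-memory merge described above. The record layout (host, visit count, last-visit timestamp as a YYYYMMDDHHMMSS string) and the name `merge_visitors` are assumptions made for illustration, not the real AWStats data structures. The dictionary holds one entry per unique host, which is exactly what becomes too large with millions of visitors.

```python
import heapq
from collections import defaultdict


def merge_visitors(record_sets, top_n=3):
    """Combine hypothetical (host, visits, last_visit) records from several sources.

    Returns (totals, most_recent_hosts), where totals maps each host to
    [total visits, latest visit timestamp].
    """
    totals = defaultdict(lambda: [0, ""])  # one entry per unique host
    for records in record_sets:
        for host, visits, last_visit in records:
            entry = totals[host]
            entry[0] += visits
            # Assumption: timestamps are YYYYMMDDHHMMSS strings, so plain
            # string comparison matches chronological order.
            entry[1] = max(entry[1], last_visit)
    # Pick the top_n hosts with the most recent visit without sorting everything.
    most_recent = heapq.nlargest(top_n, totals.items(), key=lambda item: item[1][1])
    return dict(totals), [host for host, _ in most_recent]
```

Called with one record list per site, it returns the combined totals plus the hosts to show at the top of the list.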
To me, the solution to this problem, and I have thought of implementing it (today I just ignore it if a day is missing - takes me 12 hours to update one day of logs), is to store the data in one-file-per-day. I think AWStats already supports this(?), in which case I would just have to accept having to deal with thousands of files per month... :)
You wrote: "The main problem came when trying to merge 15 million unique(?) visitors...
In order to get anything capable of running somewhat fast, you would basically need to keep it all in memory. "
I think this problem can be solved using clever sorting algorithms that are fast even if the data do not fit into available memory. The basic idea: Sort as much as you can in memory and write the results to temporary files. Then merge the already sorted files.
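A rough Python sketch of that idea (an external merge sort), assuming the records are plain text lines without embedded newlines. The function name `external_sort` and the `chunk_size` parameter are made up for this example, and it is not tuned for real AWStats data.

```python
import heapq
import itertools
import os
import tempfile


def _read_lines(handle):
    """Yield lines from an open file without the trailing newline."""
    for line in handle:
        yield line.rstrip("\n")


def external_sort(lines, chunk_size=100_000):
    """Yield the input lines in sorted order, holding one chunk in memory at a time."""
    chunk_paths = []
    iterator = iter(lines)
    try:
        # Phase 1: sort as much as fits in memory and write each sorted run to disk.
        while True:
            chunk = sorted(itertools.islice(iterator, chunk_size))
            if not chunk:
                break
            fd, path = tempfile.mkstemp(suffix=".chunk", text=True)
            with os.fdopen(fd, "w") as handle:
                handle.writelines(line + "\n" for line in chunk)
            chunk_paths.append(path)

        # Phase 2: merge the already sorted runs; heapq.merge reads them lazily.
        files = [open(path) for path in chunk_paths]
        try:
            yield from heapq.merge(*(_read_lines(f) for f in files))
        finally:
            for handle in files:
                handle.close()
    finally:
        for path in chunk_paths:
            os.remove(path)
```

Once the visitor records are sorted by host, all entries for the same host sit next to each other, so they can be combined in a single streaming pass instead of being held in memory together.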
If you figure this out, please post your solution here.
I have a similar problem. Our IIS logs are 80-100 MB an hour per server in our cluster, so reprocessing a whole month's worth is quite time consuming.
I think it could be possible to write a utility that takes two AWStats database files and merges them into a third file in the same format. Using this utility, the following approach would be possible:
a) Move the incomplete database to a safe place.
b) Process the missing log files only. This gives a new AWStats database.
c) Use the proposed utility to merge the incomplete database with the new database.
d) Put the resulting database file in the place where AWStats expects it. AWStats should then show the complete statistics.
Would such a utility solve your problem?
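As a rough illustration of step c), here is a small Python sketch that merges one simplified section. It assumes rows of whitespace-separated fields where the first field is a key (for example a date) and the remaining fields are integer counters. The function name `merge_counter_rows` and the row layout are assumptions for the example. A real AWStats database has several sections with different merge rules (unique visitors, for instance, cannot simply be summed), and this sketch does not handle those.

```python
def merge_counter_rows(rows_a, rows_b):
    """Merge two lists of 'key n1 n2 ...' rows by summing the counters per key.

    Assumption: every row for a given key has the same number of counters.
    """
    merged = {}
    for row in list(rows_a) + list(rows_b):
        key, *counters = row.split()
        values = [int(c) for c in counters]
        if key in merged:
            # Same key in both databases: add the counters position by position.
            merged[key] = [x + y for x, y in zip(merged[key], values)]
        else:
            merged[key] = values
    # Emit rows sorted by key so the output order is stable.
    return [" ".join([key, *map(str, merged[key])]) for key in sorted(merged)]


# Hypothetical per-day rows: date, pages, hits, bandwidth, visits.
incomplete = ["20090602 120 400 51200 30", "20090604 90 310 40960 25"]
missing = ["20090603 100 350 46080 28"]
print("\n".join(merge_counter_rows(incomplete, missing)))
```

The missing day is slotted in between the existing ones, and keys present in both inputs have their counters added together.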