
Interpretation of SARG reporting

sarg
2008-11-14
2015-04-10
  • Andrew Weisz

    Andrew Weisz - 2008-11-14

    Hi,

    I have successfully installed the latest version of SARG (v2.2.5) on Red Hat Enterprise Linux 3 and can generate daily, weekly, and monthly reports on the squid access logfiles.

    The squid in production is currently using user-based authentication for access (authenticates users to an Active Directory Server).

    I have a question on the interpretation (or perhaps even the correctness) of the report though...

    When I drill down to some users to view their internet activity, why does the average...
    a) elapsed time exceed the total time spent?
    b) millisecond time exceed the total milliseconds?
    c) time percentage exceed the total percentage?

    Please advise (and apologies if this question has been asked before... I haven't found anyone posting a similar issue).

    Thanks.

     
  • asa

    asa - 2015-04-07

    Hi Frederic,

    On this same note, I am still not sure how to interpret the report. It looks like mumbo jumbo to me. Take this example: the "USERID" column shows "tcp_miss/200". Why is that?

     
  • Frederic Marchal

    The decoding failed.

    Can you send the access.log to me please (fmarchal at users.sourceforge.net). Make sure it is the one you used to generate the report and not a fresh one created by LogRotate or similar.

    The file may be corrupted or it may contain unexpected data. I'll see which one it is when I see the file.

    I also see that the report date is still in 1970. Wasn't that problem solved in this thread https://sourceforge.net/p/sarg/discussion/363374/thread/ae468602/?

     
  • asa

    asa - 2015-04-07

    Here we go...access.log

     
  • asa

    asa - 2015-04-07

    Yes Frederic, I have still not found a way to change the FILE/PERIOD

    My latest attempt is as follows:

    [root@hdc505lbwsov001 tmp]# ls -lrt /var/log/squid/access.log
    -rw-r----- 1 squid squid 24563139 Mar 24 17:06 /var/log/squid/access.log-20150324.gz
    -rw-r----- 1 squid squid 2720629 Mar 25 19:22 /var/log/squid/access.log-20150325.gz
    -rw-r----- 1 squid squid 1620489 Mar 26 03:07 /var/log/squid/access.log-20150326.gz
    -rw-r----- 1 squid squid 19540459 Mar 30 17:20 /var/log/squid/access.log.2.gz
    -rw-r----- 1 squid squid 1959846 Mar 30 18:29 /var/log/squid/access.log.1.gz
    -rw------- 1 squid squid 502119561 Apr 7 17:24 /var/log/squid/access.log
    [root@hdc505lbwsov001 tmp]# sarg -d 30/03/2015-06/04/2015 -l /var/log/squid/access.log -o /var/www/html/sarg-reports
    SARG: Records in file: 3133495, reading: 100.00%
    SARG: Decompressing log file "/var/log/squid/access.log.1.gz" with zcat
    SARG: Ignoring old log file /var/log/squid/access.log-20150324.gz
    SARG: Ignoring old log file /var/log/squid/access.log-20150325.gz
    SARG: Ignoring old log file /var/log/squid/access.log-20150326.gz
    SARG: Decompressing log file "/var/log/squid/access.log.2.gz" with zcat
    SARG: No records found
    SARG: End
    [root@hdc505lbwsov001 tmp]#

     
  • Frederic Marchal

    That last access.log file is perfectly fine and produces a valid report with the correct date if you apply the solution I described in https://sourceforge.net/p/sarg/discussion/363374/thread/ae468602/#8e97

    I simply ran sarg like this:

    sed -n -e 's/^20[0-9][0-9].* (squid): //p' ../log/access-asa3.log | sarg -x -z -
    

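    To see what that sed expression does, here is a minimal, hypothetical example (the timestamp, host, and record contents below are made up). The filter keeps only lines starting with a 20xx year, strips everything up to and including the " (squid): " prefix, and prints the bare squid record that sarg understands:

```shell
# Hypothetical syslog-style line wrapping a native squid record:
line='2015-04-07T17:24:01 myhost (squid): 1428420241.123 55 10.0.0.1 TCP_MISS/200 1024 GET http://example.com/ user1 DIRECT/93.184.216.34 text/html'

# Same filter as above: print only the part after " (squid): ".
echo "$line" | sed -n -e 's/^20[0-9][0-9].* (squid): //p'
```

    Lines that do not start with a year (and therefore cannot be wrapped squid records) are silently dropped because of the -n flag.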
    The output shows that nearly all the records are read and understood:

    SARG: SARG version: 2.3.10 Feb-15-2015
    SARG: Sarg compiled to report warnings if the output is inconsistent
    SARG: Reading access log file: from stdin
    SARG:    Records read: 3096410, written: 3096402, excluded: 0
    SARG: Squid log format
    SARG: (info) date=07/04/2015
    SARG: (info) period=30 Mar 2015-07 Apr 2015
    SARG: Period: 30 Mar 2015-07 Apr 2015
    

    I have yet to find out what the eight ignored records are, but the report looks as it should.

    In your case, as explained in the post linked above, you have to edit the cron job running sarg (or whatever you use to generate the reports) and invoke sarg like this to produce the daily report:

    sed -n -e 's/^20[0-9][0-9].* (squid): //p' /var/log/squid/access.log | /usr/bin/sarg -d day-1 -
    
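    Assuming the report is generated from cron, the entry might look like this (the schedule, the /etc/crontab-style user field, and the output path are assumptions, not part of the original advice):

```shell
# Hypothetical /etc/crontab entry: shortly after midnight, strip the syslog
# prefix from the live access.log and build yesterday's report with sarg.
5 0 * * * root sed -n -e 's/^20[0-9][0-9].* (squid): //p' /var/log/squid/access.log | /usr/bin/sarg -d day-1 -o /var/www/html/sarg-reports -
```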

    As sed can't uncompress access.log.1.gz and access.log.2.gz, they have to be left out. That is why the above command reads access.log alone.
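    If the rotated files must be covered by the same report, one possible workaround (a sketch, using the paths from the listing above) is to decompress them on the fly with zcat before the sed filter sees them:

```shell
# Hypothetical: concatenate the compressed rotated logs (oldest first) and
# the live log, strip the syslog prefix, then feed everything to sarg.
{ zcat /var/log/squid/access.log.2.gz /var/log/squid/access.log.1.gz
  cat /var/log/squid/access.log; } |
sed -n -e 's/^20[0-9][0-9].* (squid): //p' |
sarg -d 30/03/2015-06/04/2015 -o /var/www/html/sarg-reports -
```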

    If you need to process the rotated files, you have to configure LogRotate to rotate access.log without compressing it (remove the "compress" directive from the logrotate configuration that rotates access.log).
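    For example, the relevant stanza might end up looking like this (the file name /etc/logrotate.d/squid and the other directives are assumptions; only the compress/nocompress choice matters here):

```shell
# /etc/logrotate.d/squid -- hypothetical example
/var/log/squid/access.log {
    daily
    rotate 5
    nocompress     # was "compress"; sed can then read the rotated files directly
    missingok
    notifempty
}
```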

     
  • asa

    asa - 2015-04-09

    Frederic, you have been a great help/resource and pointed me in the right direction...

    Finally managed to get the reports in the correct format.

    Last thing: is there a full document I can use for my presentation to my management? I mean one explaining the fields in the report.

    I very much appreciate your help and patience...

    asa

     
  • Frederic Marchal

    Unfortunately, sarg documentation is severely lacking. Fortunately, most fields are straightforward.

    The only field that can be misunderstood is the elapsed time. It's not the time spent by the user on the web. That information is impossible to deduce from the logs.

    The elapsed time is the time the proxy spent serving the user's requests. The elapsed time tells how long the proxy worked (or waited) to retrieve the pages requested by the user.

    That time might add up to more than 24 hours per day because browsers usually send many simultaneous requests (to fetch the images, CSS, JavaScript,...) and several programs can access the web without any user action (such as an antivirus update, a Windows update, a mail client checking for new mail and so on). All of them add up individually to make the total elapsed time reported by sarg.
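    A tiny made-up example shows how this adds up. In the native squid log format, the second field is the elapsed time in milliseconds and the eighth is the user; summing the second field per user (as sarg effectively does) counts overlapping requests in full:

```shell
# Two hypothetical overlapping requests by the same user, each taking 60 s.
# Both complete within roughly 60.5 s of wall-clock time, yet the per-user
# elapsed total is 120 s, because concurrent requests are summed.
printf '%s\n' \
  '1428420000.000 60000 10.0.0.1 TCP_MISS/200 1024 GET http://a.example/ user1 DIRECT/1.2.3.4 text/html' \
  '1428420000.500 60000 10.0.0.1 TCP_MISS/200 2048 GET http://b.example/ user1 DIRECT/1.2.3.5 text/html' |
awk '{ ms[$8] += $2 } END { for (u in ms) printf "%s %d ms\n", u, ms[u] }'
```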

    From a manager's point of view, that definition of the elapsed time is useless, but it is impossible to know how much time an employee spent unproductively surfing the web. The reason is that the proxy and sarg don't know what the user did with the downloaded data. He/she may have done nothing at all (in the case of an automatic update running in the background), the user may have discarded the page after a few seconds or read it for 30 minutes. In the worst case, the user may have downloaded an HTML5 or Flash game and played with it for hours. We just don't know.

    Other fields are briefly described in this post: https://sourceforge.net/p/sarg/discussion/363374/thread/c764030c/#afa9.

    Feel free to ask more questions if something is unclear but please start a new thread to make the information easier to find by others.

     
