
#160 Useragent report in 2.3.4

v1.0_(example)
open
nobody
5
2015-06-13
2015-05-25
No

Hello!
I'm using sarg-2.3.4 on CentOS-6.6 x86-64. The logs are rotated every day. I'm trying to make a report with useragent information.

Case 1.
The useragent_log option is not set in the config file.
Command

sarg -x -o /var/www/html/sarg/daily -d day-1 -l /var/log/squid/access.log-$(date +%Y%m%d).gz -b /var/log/squid/useragent.log-$(date +%Y%m%d).gz

Output:

SARG: Deleting temporary directory "/tmp/sarg"
SARG: Parameters:
SARG:           Hostname or IP address (-a) =
SARG:                    Useragent log (-b) = /var/log/squid/useragent.log-20150525.gz
SARG:                     Exclude file (-c) =
SARG:                  Date from-until (-d) = 24/05/2015-24/05/2015
SARG:    Email address to send reports (-e) =
SARG:                      Config file (-f) = /etc/sarg/sarg.conf
SARG:                      Date format (-g) = USA (mm/dd/yyyy)
SARG:                        IP report (-i) = No
SARG:             Keep temporary files (-k) = No
SARG:                        Input log (-l) = /var/log/squid/access.log-20150525.gz
SARG:               Resolve IP Address (-n) = Yes
SARG:                       Output dir (-o) = /var/www/html/sarg/daily/
SARG: Use Ip Address instead of userid (-p) = No
SARG:                    Accessed site (-s) =
SARG:                             Time (-t) =
SARG:                             User (-u) =
SARG:                    Temporary dir (-w) = /tmp/sarg
SARG:                   Debug messages (-x) = Yes
SARG:                 Process messages (-z) = No
SARG:  Previous reports to keep (--lastlog) = 20
SARG:
SARG: sarg version: 2.3.4 Jan-05-2013
SARG: Decompressing log file "/var/log/squid/access.log-20150525.gz" with zcat
SARG: Reading access log file: /var/log/squid/access.log-20150525.gz
SARG:    Records read: 781456, written: 401915, excluded: 373415
SARG: Squid log format
SARG: Period covered by log files: 24/05/2015-24/05/2015
SARG: Period: 2015 Май 24
SARG: Sorting log /tmp/sarg/iivanov.user_unsort
SARG: Making file: /tmp/sarg/iivanov
SARG: Sorting log /tmp/sarg/apetrov.user_unsort
SARG: Making file: /tmp/sarg/apetrov
...............
SARG: Sorting file: /tmp/sarg/192_168_1_141.utmp
SARG: Making report: 192.168.1.141
SARG: Sorting file: /tmp/sarg/192_168_1_164.utmp
SARG: Making report: 192.168.1.164
SARG: Making index.html
SARG: Purging temporary file sarg-general
SARG: End

and there is no useragent information in yesterday's statistics.

Case 2.

The useragent_log option is set in the config file:
useragent_log /var/log/squid/useragent.log

Command

sarg -x -o /var/www/html/sarg/daily -d day-1 -l /var/log/squid/access.log-$(date +%Y%m%d).gz -b /var/log/squid/useragent.log-$(date +%Y%m%d).gz

Output

SARG: Parameters:
SARG:           Hostname or IP address (-a) =
SARG:                    Useragent log (-b) = **/var/log/squid/useragent.log-20150525.gz**
SARG:                     Exclude file (-c) =
SARG:                  Date from-until (-d) = 24/05/2015-24/05/2015
SARG:    Email address to send reports (-e) =
SARG:                      Config file (-f) = /etc/sarg/sarg.conf
SARG:                      Date format (-g) = USA (mm/dd/yyyy)
SARG:                        IP report (-i) = No
SARG:             Keep temporary files (-k) = No
SARG:                        Input log (-l) = /var/log/squid/access.log-20150525.gz
SARG:               Resolve IP Address (-n) = Yes
SARG:                       Output dir (-o) = /var/www/html/sarg/daily/
SARG: Use Ip Address instead of userid (-p) = No
SARG:                    Accessed site (-s) =
SARG:                             Time (-t) =
SARG:                             User (-u) =
SARG:                    Temporary dir (-w) = /tmp/sarg
SARG:                   Debug messages (-x) = Yes
SARG:                 Process messages (-z) = No
SARG:  Previous reports to keep (--lastlog) = 20
SARG:
SARG: sarg version: 2.3.4 Jan-05-2013
SARG: Decompressing log file "/var/log/squid/access.log-20150525.gz" with zcat
SARG: Reading access log file: /var/log/squid/access.log-20150525.gz
SARG:    Records read: 781456, written: 401915, excluded: 373415
SARG: Squid log format
SARG: Period covered by log files: 24/05/2015-24/05/2015
SARG: Period: 2015 Май 24
**SARG: Reading useragent log: /var/log/squid/useragent.log**
**SARG:    Records read: 455840**
**SARG: Sorting file: /tmp/sarg/squagent.int_log**
**SARG: Making Useragent report**
SARG: Sorting log /tmp/sarg/iivanov.user_unsort
SARG: Making file: /tmp/sarg/iivanov
SARG: Sorting log /tmp/sarg/apetrov.user_unsort
SARG: Making file: /tmp/sarg/apetrov
...............
SARG: Sorting file: /tmp/sarg/192_168_1_141.utmp
SARG: Making report: 192.168.1.141
SARG: Sorting file: /tmp/sarg/192_168_1_164.utmp
SARG: Making report: 192.168.1.164
SARG: Making index.html
SARG: Purging temporary file sarg-general
SARG: End

Sarg seems to ignore the -b option. I only get useragent information when the useragent_log option is set in the config file and points to one specific file (no '*' wildcard can be used). One way to get useragent information would be to turn off rotation of /var/log/squid/useragent.log, but that file can grow very large, which would slow down report generation.
Is it a bug?

Discussion

  • Frederic Marchal

    Thanks for reporting this bug. It has been around since the beginning of the commit log. I guess it isn't a much used feature :-)

    The patch is one line long [4b64c31617179fefa5452ecfef4deb418c83b03d] in case you want to backport it.

     

    Related

    Commit: [4b64c3]

  • Evgeniy Yakushev

    There is one more thing with this parameter.

    If I run

    sarg -x -l /var/log/squid/access.log-$(date +%Y%m%d).gz -b /var/log/squid/useragent.log-$(date +%Y%m%d).gz -o /var/www/html/sarg/daily -d day-1
    

    I get this output:

    ...
    SARG:                    Useragent log (-b) = /var/log/squid/useragent.log-20150525.gz
    ...
    SARG:                        Input log (-l) = /var/log/squid/access.log-20150525.gz
    ...
    

    Okay. Now if I run

    sarg -x -l /var/log/squid/access.log* -b /var/log/squid/useragent.log*
    

    the output looks like this:

    ...
    SARG:                    Useragent log (-b) = /var/log/squid/useragent.log-20150520.gz
    ...
    SARG:                        Input log (-l) = /var/log/squid/access.log
    SARG:                        Input log (-l) = /var/log/squid/access.log-20150501.gz
    SARG:                        Input log (-l) = /var/log/squid/access.log-20150520.gz
    SARG:                        Input log (-l) = /var/log/squid/access.log-20150521.gz
    SARG:                        Input log (-l) = /var/log/squid/access.log-20150522.gz
    SARG:                        Input log (-l) = /var/log/squid/access.log-20150523.gz
    SARG:                        Input log (-l) = /var/log/squid/access.log-20150524.gz
    SARG:                        Input log (-l) = /var/log/squid/access.log-20150525.gz
    SARG:                        Input log (-l) = /var/log/squid/useragent.log-20150521.gz
    SARG:                        Input log (-l) = /var/log/squid/useragent.log-20150522.gz
    SARG:                        Input log (-l) = /var/log/squid/useragent.log-20150523.gz
    SARG:                        Input log (-l) = /var/log/squid/useragent.log-20150524.gz
    ....
    

    It seems that only one file is taken as the useragent log (-b) and the others are treated as access logs (-l). The report then breaks with this error:

    SARG: getword_atoll loop detected after 0 bytes.
    SARG: Line="Microsoft-CryptoAPI/6.1""
    SARG: Record="Microsoft-CryptoAPI/6.1""
    SARG: searching for 'x2f'
    SARG: getword backtrace:
    SARG: 1:sarg() [0x406367]
    SARG: 2:sarg() [0x4067b9]
    SARG: 3:sarg() [0x40d15f]
    SARG: 4:/lib64/libc.so.6(__libc_start_main+0xfd) [0x7f2e71696d5d]
    SARG: 5:sarg() [0x402a39]
    SARG: Wrong date format in /var/log/squid/useragent.log-20150520.gz
    
     
  • Frederic Marchal

    It can't be avoided. The getopt library (responsible for parsing the command line options as required by the POSIX standard) reads only one file name after an option requiring a file name. Every additional file name is taken as a non-option.

    In this case, the shell where you type the command replaces "/var/log/squid/access.log*" and "/var/log/squid/useragent.log*" with the matching file names before sarg is even started. The result is that sarg actually sees the call like this (assuming only two access.log and two useragent.log files to shorten the example):

    sarg -x -l /var/log/squid/access.log /var/log/squid/access.log-20150501.gz -b /var/log/squid/useragent.log /var/log/squid/useragent.log-20150521.gz
    

    When parsed by getopt, the options are returned in this order:

    • -x
    • -l /var/log/squid/access.log
    • -b /var/log/squid/useragent.log
    • /var/log/squid/access.log-20150501.gz
    • /var/log/squid/useragent.log-20150521.gz

    Due to the option reordering required by POSIX, the non-option files are at the end and it is not possible to know that /var/log/squid/useragent.log-20150521.gz was linked to the -b option.
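The expansion and reordering described above can be reproduced without sarg. The following is only a sketch: it assumes the util-linux getopt(1) command is installed, and the option string "xl:b:" is a reduced, hypothetical stand-in for sarg's real option list. The file names are created in a throwaway directory.

```shell
#!/bin/sh
# Sketch: rebuild the argument list sarg would receive, then parse it
# with getopt(1). "xl:b:" is a hypothetical subset of sarg's options.
demo_dir=$(mktemp -d)
touch "$demo_dir/access.log" "$demo_dir/access.log-20150501.gz" \
      "$demo_dir/useragent.log" "$demo_dir/useragent.log-20150521.gz"
cd "$demo_dir" || exit 1

# The shell expands both globs here, before any program is started.
set -- -x -l access.log* -b useragent.log*
echo "argv: $*"

# GNU-style parsing: -l and -b each keep exactly one argument; every
# other file name is moved behind the "--" separator as a non-option.
parsed=$(getopt -o xl:b: -- "$@")
echo "parsed:$parsed"
```

In the parsed line, both rotated files end up after `--`, with nothing left to tell which option each one originally followed.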

    Sarg 2.2 and earlier would have ignored the two lone file names but, to make it easier to process rotated access.log files, sarg 2.3 and later accept a file name without an option as an alias for -l.

    You can therefore write a much simpler cron job that simply calls

    sarg -d day-1 /var/log/squid/access.log*
    

    It automatically takes every rotated access.log file into account.

     
  • Evgeniy Yakushev

    So I can specify only one file with the -b parameter.
    Can it be a .gz file? Recently I got this error:

    SARG: Reading useragent log: /var/log/squid/useragent.log-20150526.gz
    SARG: getword loop detected after 4 bytes.
    SARG: Line="▒▒▒-▒▒+▒▒▒▒+$▒[▒{6�▒r▒E▒▒cܑ,\ѧ;=▒▒▒▒▒9▒/H'ƗV▒▒▒m▒7k▒▒ʹ▒D▒▒▒v▒▒d▒▒▒?)4▒r~▒x▒▒󁛔p▒y▒▒▒r▒,▒▒Aڇ#▒&l▒▒9
    "
    SARG: Record="▒▒▒-▒▒+▒▒▒▒+$▒[▒{6�▒r▒E▒▒cܑ,\ѧ;=▒▒▒▒▒9▒/H'ƗV▒▒▒m▒7k▒▒ʹ▒D▒▒▒v▒▒d▒▒▒?)4▒r~▒x▒▒󁛔p▒y▒▒▒r▒,▒▒Aڇ#▒&l▒▒9
    "
    SARG: searching for 'x2f'
    SARG: getword backtrace:
    SARG: 1:sarg() [0x406367]
    SARG: 2:sarg() [0x406cca]
    SARG: 3:sarg() [0x41aeaf]
    SARG: 4:sarg() [0x40ff58]
    SARG: 5:sarg() [0x40e05c]
    SARG: Maybe you have a broken date in your /var/log/squid/useragent.log-20150526.gz file
    

    Should it be unpacked?

     

    Last edit: Evgeniy Yakushev 2015-05-26
  • Frederic Marchal

    Unfortunately, the answer is yes to all the questions.

    The useragent is a very old feature that was completely overlooked after its initial development. It never benefited from the improvements made to the access.log.

    There can only be one useragent log on the command line or in sarg.conf. The command line takes precedence over sarg.conf.

    The file cannot be compressed.

    I'll try to improve that with the next version. I'll leave this bug open as a reminder that the useragent log feature is lacking.
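Until compressed useragent logs are supported, one possible workaround is to unpack the rotated file to a plain-text copy first. This is only a sketch: the function name and the destination path are made up for illustration, and it assumes gzip is available.

```shell
#!/bin/sh
# Sketch of a workaround: write an uncompressed copy of one rotated
# useragent log. Function name and paths are illustrative assumptions.
unpack_useragent_log() {
    src=$1   # e.g. /var/log/squid/useragent.log-20150526.gz
    dst=$2   # plain-text copy that sarg can read
    gzip -dc "$src" > "$dst"
}

# Possible use (not executed here):
#   unpack_useragent_log /var/log/squid/useragent.log-20150526.gz /var/tmp/useragent.log
#   sarg -d day-1 -b /var/tmp/useragent.log /var/log/squid/access.log*
```

On an unpatched 2.3.4, where -b is ignored, the useragent_log option in sarg.conf would have to point at the unpacked copy instead.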

     
  • Frederic Marchal

    I uploaded patch [137eb6] to the master branch. With this change, sarg accepts several user agent log files.

    Both the command line option -b and the configuration option useragent_log can be repeated as many times as necessary (-b takes precedence over useragent_log).

    It is still not possible to use wildcards or shell globbing in the file name.

    Compressed files are not yet supported.

    Unfortunately, I have only one useragent.log file at my disposal. That's not enough to test this feature. If someone can test it for me, please report any success or failure!
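Since wildcards cannot be used with -b, a wrapper script can turn the glob into repeated -b options itself. This is a sketch that assumes a sarg build containing the change above; the function name is hypothetical, and the command is only printed, not run.

```shell
#!/bin/sh
# Sketch: expand a glob into one "-b FILE" pair per useragent log.
# print_sarg_cmd is a made-up helper; it echoes the command (dry run).
print_sarg_cmd() {
    log_dir=$1
    set --                       # reuse the positional parameters as a list
    for f in "$log_dir"/useragent.log*; do
        [ -f "$f" ] || continue  # skip an unmatched pattern
        case "$f" in
            *.gz|*.bz2) continue ;;  # compressed files are not supported yet
        esac
        set -- "$@" -b "$f"
    done
    echo sarg -d day-1 "$@" "$log_dir"/access.log*
}

print_sarg_cmd /var/log/squid
```

Removing the `echo` would run sarg directly. File names containing spaces would need more care than this dry-run output shows.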

     

    Related

    Commit: [137eb6]

  • Evgeniy Yakushev

    I've tested. Works fine, thank you!
    Is it possible to add support for compressed files?

     
  • Frederic Marchal

    I'm working on it during my spare time. I may have it ready by next week or, at least, before July 2015 :-)

     
  • Evgeniy Yakushev

    That's great! Waiting for a new release!
    Could you also add an option to strip the domain name from the names of users authenticated by Kerberos? Is that possible?

     
  • Frederic Marchal

    I just committed a set of changes to read gzipped and bzipped useragent files.

    I completely rewrote the decompression functions. I dropped the old and outdated Z "compress" algorithm.

    To benefit from the gz and bz2 code, the zlib and bzlib development packages must be installed on the system where sarg is built. If one of them is missing, the configure script will disable the corresponding decompression functions.

    The big advantage is that sarg doesn't need zcat and bzcat to be in the path. It will handle the compressed log file itself.
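Before building, it may help to check that the two headers are present. The sketch below only looks in two common include directories, which is an assumption; other systems may keep them elsewhere. On CentOS the headers usually come from the zlib-devel and bzip2-devel packages.

```shell
#!/bin/sh
# Sketch: look for the zlib and bzip2 headers in a few include
# directories. has_header is a made-up helper, not part of sarg.
has_header() {
    # $1 = header name, remaining arguments = directories to search
    name=$1
    shift
    for dir in "$@"; do
        [ -f "$dir/$name" ] && return 0
    done
    return 1
}

for h in zlib.h bzlib.h; do
    if has_header "$h" /usr/include /usr/local/include; then
        echo "$h: found"
    else
        echo "$h: not found"
    fi
done
```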

     
