#1 speed up sqlgrey-logstats.pl

David Landgren

I wondered why the stats program was so slow. At first I thought it was the way the report tables were processed, so I optimised that. This didn't make much difference (heh) so I profiled the code and discovered the real hot spots.

-- It turns out that the code spent most of the time parsing each line of the logfile for no purpose, since most lines have nothing to do with sqlgrey. So I added a cheap check with index() to decide whether it is worth calling parse_line(). This single change produces a dramatic reduction in the execution time.

-- A number of regular expressions were rewritten to avoid using character classes (which are slow).

-- The getopt definition had a bug in it which prevented the program from looking for an arbitrary service name in the syslog file.

-- A number of constructs were rewritten to eliminate the need for calling reverse().

-- The debug() function was rather costly regardless of whether debugging was enabled; this has been addressed.

-- A qr// compiled regexp was used in place of the //o construct in split_date_time. It is also pointless to return an array of undefs. Returning nothing works just as well.

-- print_distribution was rewritten to avoid a lot of pointless intermediate sorts. The top-n listings also had the operands of a subtraction reversed, which is now corrected.
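
For readers who have not opened the diff, here is a minimal sketch of two of the ideas above: the index() pre-check in front of parse_line(), and a qr// regexp with a bare return in split_date_time. It is not the code from the patch. The marker string, the sample log lines, the regexp and the function bodies are all assumptions made up for illustration.

```perl
use strict;
use warnings;

# Assumption: lines of interest contain "<service>: " somewhere in them.
my $service = 'sqlgrey';
my $marker  = "$service: ";

# Compiled once with qr//, rather than relying on the //o modifier.
# The pattern itself is illustrative, not the one used by sqlgrey-logstats.pl.
my $date_time_re = qr/^(\w{3}\s+\d+)\s+(\d\d:\d\d:\d\d)\s/;

sub split_date_time {
    my ($line) = @_;
    # A bare return yields an empty list (or undef in scalar context),
    # so there is no need to build a list of undefs on failure.
    return unless $line =~ $date_time_re;
    return ($1, $2);
}

# Hypothetical stand-in for the real (much more expensive) parse_line().
sub parse_line {
    my ($line) = @_;
    my @dt = split_date_time($line);
    return unless @dt;
    my $msg = substr($line, index($line, $marker) + length $marker);
    return { date => $dt[0], time => $dt[1], msg => $msg };
}

# Made-up syslog lines; only two of them mention the service.
my @log = (
    'Jan  5 10:00:01 mx postfix/smtpd[123]: connect from unknown',
    'Jan  5 10:00:02 mx sqlgrey: event one',
    'Jan  5 10:00:03 mx sqlgrey: event two',
);

my ($seen, $parsed) = (0, 0);
for my $line (@log) {
    $seen++;
    # Cheap substring test: skip lines that cannot be relevant
    # before paying for the full parse.
    next if index($line, $marker) < 0;
    my $event = parse_line($line) or next;
    $parsed++;
}

print "parsed $parsed of $seen lines\n";
```

The point of the pre-check is that index() is a plain substring search, so unrelated lines are rejected without running any regexp at all. How much this saves depends on what share of the logfile belongs to other services.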

If you include this patch, please feel free to credit me.


  • David Landgren

    unified diff for sqlgrey-logstats.pl version 1.7.4