I wondered why the stats program was so slow. At first I thought it was the way the report tables were processed, so I optimised that. This didn't make much difference (heh) so I profiled the code and discovered the real hot spots.
-- It turns out that the code spent most of its time parsing every line of the logfile for no purpose, since most lines have nothing to do with sqlgrey. So I added a cheap check with index() to decide whether it is worth calling parse_line(). This single change dramatically reduces the execution time.
-- A number of regular expressions were rewritten to avoid using character classes (which are slow).
-- The getopt definition had a bug in it which prevented the program from looking for an arbitrary service name in the syslog file.
-- A number of constructs were rewritten to eliminate the need for calling reverse().
-- The debug() function was rather costly regardless of whether debugging was enabled.
-- A qr// compiled regexp was used in place of the //o construct in split_date_time. It is also pointless to return an array of undefs. Returning nothing works just as well.
-- print_distribution was rewritten to avoid a lot of pointless intermediate sortings. Top n listings had the operands of a subtraction reversed.
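To illustrate the index() check from the first item, here is a minimal sketch. It is not the code from the patch: the parse_line() body is a simplified stand-in, the 'sqlgrey: ' marker string is an assumption, and the log lines are made up. It also shows a bare return giving back an empty list instead of a list of undefs.

```perl
use strict;
use warnings;

# Hypothetical stand-in for parse_line(): splits a syslog line into
# month, day, time, host and message. The real function does more work.
sub parse_line {
    my ($line) = @_;
    my ($mon, $day, $time, $host, $msg) = split ' ', $line, 5;
    # Bare return yields an empty list in list context, so the caller
    # can test the result directly instead of checking for undefs.
    return unless defined $msg && index($msg, 'sqlgrey: ') == 0;
    return ($mon, $day, $time, substr($msg, length('sqlgrey: ')));
}

# Made-up sample input: only two of the four lines come from sqlgrey.
my @log = (
    'Jan  1 00:00:01 mx postfix/smtpd[101]: connect from unknown[192.0.2.1]',
    'Jan  1 00:00:02 mx sqlgrey: grey: new: 192.0.2.1, a@example.org -> b@example.net',
    'Jan  1 00:00:03 mx sshd[202]: session opened for user root',
    'Jan  1 00:00:04 mx sqlgrey: grey: reconnect ok: 192.0.2.1',
);

my ($seen, $parsed) = (0, 0);
my @events;
for my $line (@log) {
    $seen++;
    # Cheap substring test first: skip lines that cannot be ours
    # before paying for the full parse.
    next if index($line, 'sqlgrey: ') < 0;
    my @fields = parse_line($line) or next;
    $parsed++;
    push @events, $fields[3];
}
print "seen=$seen parsed=$parsed\n";
```

The saving comes from index() being a plain substring search, so unrelated lines are rejected without ever reaching the parser.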
If you include this patch, please feel free to credit me.
unified diff for sqlgrey-logstats.pl version 1.7.4