#348 Finish the slashdErrnote feature

Slash 2.3/2.4
open
Daemons (15)
7
2004-02-08
2004-02-08
Jamie McCarthy
No

I've committed code for a new table, slashd_errnotes, and a slashd
function which tasks can call to drop data into it.

The idea is that sometimes our tasks have a real problem that
admins need to correct, and admins never know about it unless we
scan slashd.log obsessively, which we don't. These problems must
be brought to our attention. In those situations, a task should now
call slashdErrnote (probably in addition to slashdLog), which inserts
a row into the new table.
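
To make the shape of this concrete, here is a minimal sketch of such a
table and helper. It is written in Python with sqlite3 purely for
illustration, not in Slash's Perl and MySQL. Only taskname, errnote and
moreinfo are named in this ticket; the id and ts columns, the helper
name errnote(), and the sample error text are assumptions.

```python
import sqlite3

# Hypothetical schema. The ticket names taskname, errnote (255 chars)
# and moreinfo; the id and ts columns are assumptions for this sketch.
SCHEMA = """
CREATE TABLE slashd_errnotes (
    id       INTEGER PRIMARY KEY,
    ts       TEXT NOT NULL,
    taskname TEXT NOT NULL,
    errnote  TEXT NOT NULL,
    moreinfo TEXT
)
"""

def errnote(db, taskname, note, moreinfo=None, ts=None):
    """Insert one row; the short note is clipped to 255 characters."""
    db.execute(
        "INSERT INTO slashd_errnotes (ts, taskname, errnote, moreinfo) "
        "VALUES (COALESCE(?, datetime('now')), ?, ?, ?)",
        (ts, taskname, note[:255], moreinfo),
    )

db = sqlite3.connect(":memory:")
db.execute(SCHEMA)

# Made-up error text; long debug output goes in moreinfo.
errnote(db, "counthits.pl", "example error text", moreinfo="long debug dump")
```

A task would call the helper at the same point where it already logs
the failure, so the log line and the table row stay in step.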

We now need three more things:

1. A task which runs every hour or so, and which scans that
table for new errors, where "new" means "timestamp newer than
the contents of a var." Define the var; I don't care what you call it,
but I suggest its name start with "slashd_". On each run, select NOW(),
then pull rows >= DATE_SUB('$now', INTERVAL 1 HOUR), then
write $now into the var.

I suggest the same admins who want adminmail should also get an
email containing the problems logged in the table.

I would also suggest that this task pull out the data "GROUP BY
taskname," and for each taskname found, only send the info from
the most recent logged error in the email. If freshenup has an
error that occurs on every run, it will drop a LOT of entries in the
table and we don't need to see them all -- just the most recent,
and a line saying "there are 59 more of these in the last hour."

This task should also expire old rows, where a var defines "old,"
and I would suggest it defaults to 90 days or something.

2. Start using it! Look at every call of slashdLog() and return() in
our current tasks, and if they are for an error condition that
admins should know about, call slashdErrnote. I did the first one,
in counthits.pl.

The "moreinfo" field is for when the error condition can't be
reported in the 255 chars of "errnote" -- so things like
Data::Dumper debug output should go in there. I would suggest that
that field DOES go into email, but truncated to something reasonable
(10K?).

3. Last and least, an admin.pl interface, so errors can be looked at
in more detail. But this can wait.
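
The reporting task in item 1 could be sketched roughly as follows. This
is an illustration in Python with sqlite3, not Slash code. The table
layout (id, ts), the function names build_digest() and expire_old(), and
the sample rows are all assumptions. It treats "new" as "newer than the
stored var" rather than a fixed one-hour window, and it assumes the row
with the highest id per task is the most recent one.

```python
import sqlite3

MOREINFO_LIMIT = 10_000  # the "10K?" cap suggested for the email

def build_digest(db, last_run, now):
    """One summary per task that logged errors after last_run."""
    groups = db.execute(
        "SELECT taskname, COUNT(*), MAX(id) FROM slashd_errnotes "
        "WHERE ts > ? AND ts <= ? GROUP BY taskname ORDER BY taskname",
        (last_run, now),
    ).fetchall()
    digest = []
    for taskname, count, latest_id in groups:
        # Fetch only the most recent row for this task.
        ts, note, more = db.execute(
            "SELECT ts, errnote, moreinfo FROM slashd_errnotes WHERE id = ?",
            (latest_id,),
        ).fetchone()
        digest.append({
            "taskname": taskname,
            "ts": ts,
            "errnote": note,
            "moreinfo": (more or "")[:MOREINFO_LIMIT],
            "more_count": count - 1,  # "there are N more of these"
        })
    return digest

def expire_old(db, now, days=90):
    """Delete rows older than the cutoff; returns how many were removed."""
    cur = db.execute(
        "DELETE FROM slashd_errnotes WHERE ts < datetime(?, ?)",
        (now, f"-{days} days"),
    )
    return cur.rowcount

# Demo with made-up data: one task failing every minute for an hour,
# plus one stale row that should be expired.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE slashd_errnotes (id INTEGER PRIMARY KEY, ts TEXT, "
    "taskname TEXT, errnote TEXT, moreinfo TEXT)"
)
insert = "INSERT INTO slashd_errnotes (ts, taskname, errnote, moreinfo) VALUES (?, ?, ?, ?)"
for minute in range(60):
    db.execute(insert, (f"2004-02-08 11:{minute:02d}:00", "freshenup",
                        f"error {minute}", "x" * 20_000))
db.execute(insert, ("2003-01-01 00:00:00", "old_task", "stale", None))

now = "2004-02-08 12:00:00"
digest = build_digest(db, "2004-02-08 10:59:59", now)
expired = expire_old(db, now)
# After a successful run, the task would write `now` into the var.
```

Doing the grouping in the task rather than in the mail template keeps
the email short even when one task fails on every run.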

Discussion

  • Jonathan Pater
    2004-04-27

    Part 1 (and a little of part 2) is now done, but every task
    could use a thorough going-over.