Running "SHOW PROCESSLIST" on a slave DB tells us how many
seconds behind the master that DB is on replication: it's the
Time column of the Slave SQL thread's row. E.g., on one of our slaves:
| Id | User        | Host      | db   | Command | Time | State                            | Info |
| 8  | system user | localhost | NULL | Connect | 1    | Slave: waiting for binlog update | NULL |
This slave is 1 second behind the master.
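To make the "read the Time column" step concrete, here is a rough sketch of pulling the lag out of a processlist row like the one above. It's in Python only for illustration; the function names are made up, and it assumes the pipe-delimited text layout shown here with a single "system user" replication row and no "|" characters inside the Info column.

```python
# Column order as shown in the SHOW PROCESSLIST output above.
COLUMNS = ["Id", "User", "Host", "db", "Command", "Time", "State", "Info"]


def parse_processlist_row(line):
    """Split one pipe-delimited processlist row into a dict keyed by column.

    Hypothetical helper: assumes no cell contains a literal '|'.
    """
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    return dict(zip(COLUMNS, cells))


def slave_lag_seconds(rows):
    """Return Time (in seconds) of the first 'system user' row, or None.

    Assumption: the first 'system user' row is the slave SQL thread.
    """
    for row in rows:
        if row["User"] == "system user":
            return int(row["Time"])
    return None


example = ("| 8 | system user | localhost | NULL | Connect | 1 "
           "| Slave: waiting for binlog update | NULL |")
lag = slave_lag_seconds([parse_processlist_row(example)])
print(lag)
```

If a slave shows more than one "system user" row, the task would need to pick the SQL thread explicitly (for instance by its State) rather than taking the first match.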
I would suggest that a new task be written which runs
every minute and sees how far behind each slave DB is
(readers and log_slaves). Store this info in a new
daily table. Then adminmail.pl would find the median,
90th-, and 99th-percentile values for each slave, and
write that into stats_daily (and then purge the
previous day's values from the new daily table of course).
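For the summary step, something along these lines would do. This is just a sketch using the nearest-rank definition of a percentile (other definitions interpolate and give slightly different numbers); the function names and the shape of the result are my own, not anything adminmail.pl or stats_daily has today.

```python
def percentile(samples, pct):
    """Nearest-rank percentile of a list of lag samples (seconds).

    pct is an integer from 0 to 100. Integer math avoids float rounding.
    """
    if not samples:
        raise ValueError("no samples collected")
    ordered = sorted(samples)
    # Ceiling of pct * n / 100, clamped so that pct=0 returns the minimum.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def summarize_lag(samples):
    """Median, 90th and 99th percentile for one slave's daily samples."""
    return {
        "median": percentile(samples, 50),
        "p90": percentile(samples, 90),
        "p99": percentile(samples, 99),
    }


# One sample per minute would give about 1440 values per slave per day;
# a short made-up list is used here just to show the call.
print(summarize_lag([0, 1, 1, 2, 30]))
```

The point of keeping the 90th and 99th percentiles alongside the median is that a slave can look fine most of the day and still fall badly behind in short bursts, which the median alone would hide.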
If we get a baseline now on how well this is working,
then as we change to new DB machines, as load changes,
and as code changes, we can see the effects.