[Slashcode-general] db replication - permissions, is_alive, was_running and dbs_readerstatus

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Anyone have (or know of) a complete list of the required permissions  
for a the mysql-user's privileges (which each site's virtual user  
needs to use) when doing database replication w/ multiple slaves?

I ask because we ran into a bit of an issue. We added our 2nd slave.  
And the slave would never 'go live'. It's isalive='Yes' in the dbs  
table (as  'reader'). When we would query our slash-jabber-bot for  
database-status it'd say:

lag ? bog  0.0 STOPPED! - it_read2

I had a heck of a time figuring out how to get it to the 'unstopped'.

What I ended up figuring out was that it was a permissions problem.  
The site's VU's mysql user had relatively minimal permissions on the  
slave. We had it setup that way because, well, it's only ever doing  
select's on that server, the replication-user is doing everything else.

I started looking at permissions because of this:

mysql> select * from dbs_readerstatus order by ts DESC limit 4;
+---------------------+------+-----------+--------------- 
+-------------+----------------+----------------+----------- 
+------------+-------------------+
| ts                  | dbid | was_alive | was_reachable |  
was_running | slave_lag_secs | query_bog_secs | bog_rsqid |  
had_weight | had_weight_adjust |
+---------------------+------+-----------+--------------- 
+-------------+----------------+----------------+----------- 
+------------+-------------------+
| 2007-07-19 10:13:04 |    6 | yes       | yes           |  
no          |           NULL |              0 |      NULL |           
1 |               0.1 |
| 2007-07-19 10:13:04 |    2 | yes       | yes           |  
yes         |              0 |              0 |      NULL |           
1 |                 1 |
| 2007-07-19 10:12:59 |    6 | yes       | yes           |  
no          |           NULL |              0 |      NULL |           
1 |               0.1 |
| 2007-07-19 10:12:59 |    2 | yes       | yes           |  
yes         |              3 |              0 |      NULL |           
1 |                 1 |
+---------------------+------+-----------+--------------- 
+-------------+----------------+----------------+----------- 
+------------+-------------------+

(the id=6 is the new slave). It was never ending up with  
'was_running='yes''

which I knew was wrong, because I could ssh into the machine and  
attach to the processes and see that it was replicating just fine.

I ended up figuring out that the VU that is accessing the slave needs  
Process_priv.

ie

update user set Process_priv='Y' where User='foo';

Because it does a show full processlist, and it looks for the  
processes by the 'System user'. If the VU exec'ing that command  
doesn't have the priv, it won't see that, then the code will put an  
odd entry in your slashd logs;

	slashdLog("Process id $hr->{Id} on vu '$vu'"
						. " has unknown system user state"
						. " '$state'");

and you'll eventually get marked as stopped:

		# Determine the two big numbers we care about:  how far
		# behind its slave sql thread is, and what's its worst
		# repeated bogged-down query.

		# First check to be sure it has a slave sql thread running.
		my $slave_sql_id = $slave_sql_id{$vu};
		if (!$slave_sql_id) {
			# If not, its slave is stopped and we can't calculate
			# its lag.
			$reader_info->{$vu}{stopped} = 1;
		} else {
			# If so, pull the lag of that process in seconds.
			my $slave_sql_lag = $process{$vu}{$slave_sql_id}{Time};
			$reader_info->{$vu}{slave_lag_secs} = $slave_sql_lag;
		}

(I'm pulling all the code from the balance_readers.pl task in themes/ 
slashcode/tasks)

I did look through the docs before I attempted this. I couldn't find  
a section discussing the specifics of replication (it'd be helpful  
for the weighting system to be spelled out somewhere in the docs!)  
though it is talked about in slashguide.pod.

Thanks,
  Shane