Re: [Spock Proxy Devel] Feature Requests

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

On Jun 10, 2009, at 3:16 AM, Graham Finlayson wrote:

> First of all, thanks for the development that has gone into this, as  
> it looks to be just what I was looking for. I have a couple of  
> suggestions which I think would be useful additions
>
> 1. I want to create shards based on a character database key. E.g.  
> All users from 'a' to 'b' are stored in shard1. The current  
> implementation only allows the sharding key to be an integer value.  
> I can easily 'hack' this in by mapping the first character of the  
> key to an integer value, but it would be nicer to have this properly  
> supported. Are there any plans to do this?

This is less useful than you might think.  Consider a schema with a  
'users' table sharded as you describe, name starting with 'a' or 'b'  
in shard 1, 'c' or 'd' in shard 2 and so on.  This works well for the  
users table, but only that table.  If we have a 'payment' table which  
records each user payment that payment table must have the shard key  
(user_name) in it or the proxy cannot know where to insert a new  
payment row.  That is the same for all the other tables; you would do  
much better to shard on user_id and index users.name, users.id so that  
the query to lookup a users id from a name is quick.  Now it will be  
fast to either join this to other tables or to retrieve the users.id  
and query using this value.

There is another aspect about this where I would be very cautious - I  
once worked for a library who had a reservation system where they  
would store your requested material by last name. Turned out there  
that 'm' was almost 1/3 of their customers.  I don't have any kind of  
metrics of how typical this is - perhaps they had a large Scottish  
population in their customer base but my point is that it can be hard  
to balance your shards when your key is an attribute that you don't  
control.  An arbitrary key such as an id column is easier to manipulate.

> 2. I am looking to use SpockProxy in conjunction with the MySQL  
> Master Master Replication Monitor, where all the Shards are Master/ 
> Master Active/Passive MySQL servers (MMM). MMM can define a  
> writerRole, to write to the currently active MySQL server, and two  
> readerRoles to allow reads to be done on both MySQL servers.  
> SpockProxy supports this out of the box if you configure the  
> host_name to be the MMM writerRole. However, this means that the  
> Passive MySQL server sits fairly idle, which is a waste of resources  
> in my mind. Therefore, a useful extention, I think, to SpockProxy  
> would be to allow the readerRoles to be configured, and have  
> SpockProxy round robin any "read" requests between the Active and  
> Passive MySQL servers to balance the load a little when both servers  
> are active.

My thought is that Spockproxy should become a plugin as MySQL proxy  
starts to support plugins (shortly after version 1 according to Jan  
Kneschke).  Originally I thought this would be very complex but in  
writing this answer I'm thinking it might not be too horrible.  Let's  
say we have some number of shards (say 4) and the same number of read  
slaves for a total of 8 databases - and this is a simple case it gets  
more complex if you have several  sets of read slaves.

When a query comes in the Spockproxy / Replication Monitor will have  
to follow some logic like this:

Given the pool of all the data servers (except the universal DB which  
is only ever used for universal writes)

Query contains a shard key?
	YES -> Reduce the pool of available DB servers to only those serving  
the one needed shard
	NO -> do nothing
	Query is READ or WRITE?
		READ - choose the least busy server(s) from the pool one for each  
shard in the pool
		WRITE - Are you writing UNIVERSAL or SHARDED data?
			UNIVERSAL - choose the one universal db (this is never in the pool)
			SHARDED - choose the active master server(s) from the pool one for  
each shard in the pool

The issues is Spock proxy does not have any of the Replication Monitor  
features and as it is we've probably removed some vital components to  
implement this in our code base (remember we're a fork off of 0.6).   
This is why I mentioned the plug in as a goal; so folks could mix and  
match features in order to create a solution to meet their specific  
needs.

In the short term you might run both in series - for each shard pair  
run an MMM and present these to the Spockproxy as the database  
shards.  That ought to work (let us know if it does).

> I am willing to help with any implementation/testing of these  
> features, but would want to make sure that the design of these  
> features would be acceptable before I proceed.

We are not doing much development on Spockproxy right now; as I said  
my thought was to move into a simpler plugin but feel the freedom if  
you like or as MySQL proxy supports plugins help us then move to a  
plug in.

Frank