Thread: [Spock Proxy Devel] Different Shards on the same server
Status: Alpha
Brought to you by:
kaotao
From: steve <iam...@gm...> - 2008-10-23 04:19:07
|
Hi all, Spock Proxy looks very interesting! I have a couple of questions: My 'primary key' for partitioning is in two columns, not one. What to do? (Two is useful, by the way -- sometimes data is useful to a group of shards and you can put that in the first column, sort of a shard group id and then a shard id for that group and when combined, a primary key of sorts for a shard). How do you handle failure of a shard, or is that handled elsewhere? I'm looking at possibly moving to AWS/EC2, and was looking at scalr (http://code.google.com/p/scalr/) as a way to manage the spinning up of and auto-managing instances. It would be awesome if Spock Proxy were directly supported there. Looking over our queries I see that we do on occasion have three tables in a query. Often in the case of any join, one table is one in which to shard upon, and the other (two) are what you call 'universal' tables. By the look of the shard_database_directory, by having port_number in there, it looks like you support having multiple shards running on the same server. I am a fan of having a mysql master and mysql slaves on the same server. Background: In a basic setup of a single master and a single slave on there own servers, you might run into issues of slave lag since the mysql slave only has two threads (one each for io and for sql). If the master was sharded into four mysql instances, as was the slave, then the slave server would have eight threads running (4 IO and 4 SQL). A more ideal setup would be that the master and slave(s) would be mixed. In this case, two master shards on server A and two on server B and the same for the slaves. To avoid a certain amount of IO contention, I put each mysql server instance on a separate dirve/array on a particular machine. Cheaper than raid cards for the same/better performance. -s |
From: Frank F. <fr...@co...> - 2008-10-23 05:35:08
|
Steve, Great questions - see the answers below: On Oct 22, 2008, at 9:19 PM, steve wrote: > Hi all, > > Spock Proxy looks very interesting! I have a couple of questions: > > My 'primary key' for partitioning is in two columns, not one. What to > do? (Two is useful, by the way -- sometimes data is useful to a group > of shards and you can put that in the first column, sort of a shard > group id and then a shard id for that group and when combined, a > primary key of sorts for a shard). Currently we only shard by a single key (doesn't have to be primary or unique). I'm not sure if it is possible to work around this, I'll check and post further on it. > How do you handle failure of a shard, or is that handled elsewhere? It is handled elsewhere, and currently not quite a automatically as I'd like. Since we have two sets of shards (one master and one slave for each shard) we point both proxies at the single surviving DB. This works well but of course it will have less throughput. > I'm looking at possibly moving to AWS/EC2, and was looking at scalr > (http://code.google.com/p/scalr/) as a way to manage the spinning up > of and auto-managing instances. It would be awesome if Spock Proxy > were directly supported there. Indeed - we also use a lot of EC2 and have discussed things like this. But so far just talk. If we do anything I'll let you know. > Looking over our queries I see that we do on occasion have three > tables in a query. Often in the case of any join, one table is one in > which to shard upon, and the other (two) are what you call 'universal' > tables. There is no problem joining 2, 3, 4. 5, or more tables so long as you as joining so that all related data would be in the same shard (typically that would be on the shard key for sharded tables - non sharded tables, or 'universal' tables will always work). Even if the results come from more than one shard, which is typical, so long the data for any particular result row came from the same shard it will work. > By the look of the shard_database_directory, by having port_number in > there, it looks like you support having multiple shards running on the > same server. I am a fan of having a mysql master and mysql slaves on > the same server. Yes, it will work just as you expect. We do this in development but in production of course you'd loose the greater capacity that we did all this to get. > Background: In a basic setup of a single master and a single slave on > there own servers, you might run into issues of slave lag since the > mysql slave only has two threads (one each for io and for sql). If the > master was sharded into four mysql instances, as was the slave, then > the slave server would have eight threads running (4 IO and 4 SQL). A > more ideal setup would be that the master and slave(s) would be mixed. > In this case, two master shards on server A and two on server B and > the same for the slaves. To avoid a certain amount of IO contention, I > put each mysql server instance on a separate dirve/array on a > particular machine. Cheaper than raid cards for the same/better > performance. Very interesting, we have discussed doing something similar but simpler just one master and one slave on a single server. The issue is we use replication for the universal data and you don't want to get into a spot where you can't replicate (because MySQL will only support one master - at least for now). Also in out case we want the master DB to have all the machine resources (memory, disk IO, ...). Frank |
From: steve <iam...@gm...> - 2008-10-23 06:49:28
|
>> I'm looking at possibly moving to AWS/EC2, and was looking at scalr >> (http://code.google.com/p/scalr/) as a way to manage the spinning up >> of and auto-managing instances. It would be awesome if Spock Proxy >> were directly supported there. > > Indeed - we also use a lot of EC2 and have discussed things like this. But > so far just talk. If we do anything I'll let you know. Great. I just found the scalr project this week, and seems to be one of the only ones that is open source (I don't think RightScale is, for example). I would like to have shard migration from server to server, including having a shard run on a server already serving a shard. For either the mast or slave. I'm sure an algorithm could be created that did rebalancing, and could be plugged into scalr or similar to include creating (or terminating) new servers if the rebalancing required it. > Very interesting, we have discussed doing something similar but simpler > just one master and one slave on a single server. The issue is we use > replication for the universal data and you don't want to get into a spot > where you can't replicate (because MySQL will only support one master - at > least for now). Also in out case we want the master DB to have all the > machine resources (memory, disk IO, ...). I hear you on memory... but on IO I think that you find that you can ditch hardware raid for software mirrors and multiple mysql instances pointing to the different mirrors -- for orthogonal data. I don't know enough about EC2 to know how things will perform there though... But it is also helpful when mysql does not scale well to a lot of cores -- you can use CPU affinity for the master process (say 4 of 8 cores) and have four other mysql slaves running on the other 4 cores. This works well if you like many replicas of each shard. |