From: Lionel B. <lio...@bo...> - 2004-12-15 13:33:29
Farkas Levente wrote the following on 12/15/04 13:04:
>
> - my first assumption if one of the mx can't access to one sql server,
> then none can do it. otherwise it's a real strange thing in case we
> can accept no greylisting just dunno.

This is the root of the problem: this assumption is incorrect. There
are several cases where it can happen:
- temporary network link failure (cable unplugged, hardware failing
  then resynchronising): a temporary split of your network,
- SQLgrey automatically reconnects after an error, so if you take the
  RW database down for a short time, some SQLgrey instances will need
  to access the database during that time and some will not. The
  former won't be able to reconnect to the database they were using,
  so they will look for another one. The latter *will* be able to
  reconnect to the database.

> - try to use replication between sql servers.

You have to be more precise here: there are very different
implementations of replication between databases, from a simple dump
to file and reload up to an Oracle cluster. Each one comes with its own
advantages and limitations, and the one you use determines what the
applications using the database pool can and cannot do with it.

> - allow write to the slave to and when the master wake up then
> replicate back the data too.

This won't work: your slave could be used at any moment by a SQLgrey
instance which, for whatever reason, couldn't contact your master.
You'll corrupt your data.

> - in my case actualy there is no master and slave just there is two
> sql server with the same database (or almost the same and there are
> certain point when they are syncing) and there is always one which is
> rw by all greylist server (first).
>
> imho the greylist database is not so complicated. it's easy to
> recognize which records should have to replicate.
> only old/expired
> record have to delete and always the last updated one is the latest
> and all record has timestemp (because that's the main purpose the
> database) so it's easy to know which is the last updated.
>
>> Here are simple questions to make sure we speak of the same things.
>> Do you agree with the following statements ?
>> - one and only one sql server should accept writes from every SQLgrey
>> instances. Let's call it the RW server (read-write).
>
>
> no. both, but all greylist server rw one of them at the same time.
>

Won't work, as explained above. You can't be sure that one SQLgrey
instance won't fail to contact the database you chose as a master
while the others succeed.

There's no point discussing the rest until you understand this.
Reliable database failover is *hard*; please take the time to
understand these hard facts:
- there's no affordable database system on the market *yet* that
  allows multiple replicated read/write databases (only commercial
  databases in the hundreds of thousands of euros/dollars range allow
  this, and even they have limitations), so you can only bet on
  master/slave schemes,
- when using master/slave schemes you *can't* write directly to the
  slaves: you must use one and only one database in read/write mode
  for *every* SQL client accessing the database pool,
- you cannot prevent the case where one instance among a pool of SQL
  clients, and only that one, is unable to contact the "master"
  server.

Seriously, what do you find wrong with a take-over IP solution?

Reminder: slave replication is in place, the master fails, admin
scripts detect the failure, take the IP down on the master's interface
and set up the same IP on the slave, switching it to read-write mode
if needed (this depends on how the replication works: it may or may
not require the slave database to be in read-only mode).
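
To make the order of operations in the reminder above concrete, here is
a minimal Python sketch of what such an admin script would do. Every
name in it (`take_over_ip`, the four callables, the `checks` count) is
hypothetical and not part of SQLgrey: the callables stand in for
whatever commands your systems use to probe the master and move the
IP. Making the slave writable before it claims the IP is an assumption
about a sensible order, not something the replication setup dictates.

```python
from typing import Callable, List


def take_over_ip(
    master_alive: Callable[[], bool],
    release_ip_on_master: Callable[[], None],
    make_slave_writable: Callable[[], None],
    claim_ip_on_slave: Callable[[], None],
    checks: int = 3,
) -> bool:
    """Fail over only if the master fails `checks` probes in a row.

    Returns True if the take-over was performed, False otherwise.
    """
    for _ in range(checks):
        if master_alive():
            return False  # master answered: nothing to do
    # Order matters: the service IP leaves the master before the slave
    # claims it, so clients can never reach two databases at once.
    release_ip_on_master()
    make_slave_writable()  # may be a no-op, depending on the replication
    claim_ip_on_slave()
    return True


# Dry run with stand-in callables that only record what would happen.
log: List[str] = []
done = take_over_ip(
    master_alive=lambda: False,
    release_ip_on_master=lambda: log.append("ip down on master"),
    make_slave_writable=lambda: log.append("slave read-write"),
    claim_ip_on_slave=lambda: log.append("ip up on slave"),
)
print(done, log)
```

In a real script the stand-ins would be replaced by your own probe and
interface commands; the only point here is that the decision is made in
one place and the IP exists on one machine at a time.
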
This is easily workable, as it ensures you can't access 2 databases at
the same time, and SQLgrey makes the take-over IP process transparent
because it automatically reconnects to the server replacing the
failing one.

Best regards,

Lionel.