There should be a persistent unique identifier for an instance of ZooKeeper. Currently, if you bring a cluster down without stopping the clients and then reinitialize the servers, the servers will start logging client zxid errors, because the clients have seen a later transaction than the servers have. Instead, the clients should detect that they are now talking to a new instance of the database and close the session.
A similar problem occurs when one server fails in a cluster of three machines and the other two machines are then reinitialized and restarted. If the failed machine starts up again, there is a chance that it will be elected leader (since it will have the highest zxid) and overwrite the new data.
A unique random id should probably be generated when a new cluster comes up. (A new cluster is easy to detect, since its zxid will be zero.) Leader Election and the Leader should validate that all peers have the same database id. Clients should also validate that every server they talk to during a session has the same database id.
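A rough sketch of the proposal, in Java, might look like the following. Every name here (`DatabaseIdSketch`, `initDatabaseId`, `samePeerDatabase`, `ClientSession`, and the use of 0 as a "no id yet" marker) is hypothetical and chosen only for illustration; none of it is existing ZooKeeper API, and how the id would be persisted or carried in the wire protocol is left open.

```java
import java.security.SecureRandom;

// Hypothetical sketch of the database id idea; not ZooKeeper code.
final class DatabaseIdSketch {
    // Assumption: 0 is reserved to mean "no database id yet".
    public static final long NO_ID = 0L;
    private static final SecureRandom RANDOM = new SecureRandom();

    // Called at server startup. A brand-new database (zxid zero, no stored id)
    // gets a fresh random id; otherwise the persisted id is kept.
    public static long initDatabaseId(long lastZxid, long storedId) {
        if (storedId != NO_ID || lastZxid != 0L) {
            return storedId;
        }
        long id;
        do {
            id = RANDOM.nextLong();
        } while (id == NO_ID); // never hand out the reserved value
        return id;
    }

    // Check that Leader Election and the Leader could apply to each peer:
    // a peer with a different id belongs to another database instance.
    public static boolean samePeerDatabase(long localId, long peerId) {
        return localId != NO_ID && localId == peerId;
    }

    // Client-side view of a session: remembers the first id it saw.
    public static final class ClientSession {
        private long seenDatabaseId = NO_ID;

        // Returns false when the server belongs to a different database
        // instance, in which case the client should close the session.
        public boolean onConnect(long serverDatabaseId) {
            if (seenDatabaseId == NO_ID) {
                seenDatabaseId = serverDatabaseId;
                return true;
            }
            return seenDatabaseId == serverDatabaseId;
        }
    }
}
```

In the three-machine scenario above, the stale machine would carry the old id while the two reinitialized machines share a new one, so a check like `samePeerDatabase` would let them reject it before its higher zxid could win the election.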