Justin Cranford - 2015-04-30

I am wondering how to deal with two specific split-brain edge cases. The setup is two servers, each running Tomcat and MySQL. Each server is in a different data center, but the data centers are close (same region), so latency between them is LAN-like.

Situation 1)

Sometimes I see the HA-JDBC cluster go from 2 active nodes to 0. Each HA-JDBC node simultaneously receives a deactivation message from the other node.

  • Tomcat 1 receives deactivate MySQL 1 from Tomcat 2
  • Tomcat 2 receives deactivate MySQL 2 from Tomcat 1

I thought HA-JDBC prevents deactivation of the last node, even if it is temporarily unavailable. Is this a bug? Currently the only way to recover is to stop both Tomcats, clear the DB state, re-synchronize the databases manually, and then restart the Tomcat cluster.

Is it possible for me to implement a listener to intervene and prevent or recover from this situation at run-time? For example, could I intervene and have Tomcat 1 verify MySQL 1 is down before accepting the deactivate MySQL 1 message from Tomcat 2?
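
To make the idea concrete, here is a rough sketch of the check I have in mind. It uses only plain JDBC and does not touch any HA-JDBC API; the class name DeactivationGuard and both method names are made up for illustration, and how (or whether) such a check could be hooked into HA-JDBC is exactly what I am asking.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of HA-JDBC.
public class DeactivationGuard {

    // Returns true if the database at jdbcUrl accepts a connection and
    // answers a validity check within timeoutSeconds.
    static boolean isReachable(String jdbcUrl, String user, String password, int timeoutSeconds) {
        try (Connection c = DriverManager.getConnection(jdbcUrl, user, password)) {
            return c.isValid(timeoutSeconds);
        } catch (SQLException e) {
            // Connection refused, no driver, bad credentials, etc.
            return false;
        }
    }

    // Accept a remote "deactivate local database" request only if the
    // local probe confirms the database really is down.
    static boolean shouldAcceptDeactivation(BooleanSupplier localProbe) {
        return !localProbe.getAsBoolean();
    }
}
```

On Tomcat 1 the probe would be something like `() -> DeactivationGuard.isReachable(mysql1Url, user, password, 2)`, where mysql1Url, user, and password are placeholders for my real settings. If the probe succeeds, the deactivation request from Tomcat 2 would be rejected.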

Situation 2)

Sometimes the link between the nodes goes down, possibly for as long as 30 minutes. If HA-JDBC had only database 1 active when the link failed, then Tomcat 1 is OK to continue during the outage. However, Tomcat 2 cannot reach MySQL 1, and it will not attempt to connect to MySQL 2.

In my application (i.e. an SSH/RDP gateway), running Tomcat 2 with an out-of-date MySQL 2 is better than not running at all (i.e. business continuity). I am wondering if I can implement a workaround to temporarily activate database 2, deactivate database 1, and let Tomcat 2 keep running.

My concern is what happens when the link comes back up and the nodes need to merge after the split-brain. Will the cluster state from Tomcat 1 win because it is older than the cluster state on Tomcat 2? If not, can I customize the split-brain merge algorithm?
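
To illustrate what I mean by "older state wins", here is a minimal sketch of that policy. The ClusterView type and the merge method are invented for this example and are not HA-JDBC classes; I do not know what HA-JDBC actually does on merge, which is the question.

```java
import java.util.Set;

// Hypothetical illustration of an "oldest view wins" merge policy.
public class SplitBrainMerge {

    // One node's view of the cluster: which databases it considers
    // active, and when that view was established.
    static final class ClusterView {
        final Set<String> activeDatabases;
        final long establishedAtMillis;

        ClusterView(Set<String> activeDatabases, long establishedAtMillis) {
            this.activeDatabases = activeDatabases;
            this.establishedAtMillis = establishedAtMillis;
        }
    }

    // The view with the earlier timestamp wins; on a tie the local view is kept.
    static ClusterView merge(ClusterView local, ClusterView remote) {
        return remote.establishedAtMillis < local.establishedAtMillis ? remote : local;
    }
}
```

Under this policy, Tomcat 1's view (database 1 active, established before the outage) would replace Tomcat 2's temporary view (database 2 active), and MySQL 2 would then need to be re-synchronized before it could be activated again.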

Thank you

 

Last edit: Justin Cranford 2015-04-30