HA-JDBC / Discussion / Help: DatabaseClusterImpl.activate() hangs if called on non-primary application node

Justin Cranford - 2015-11-03

I ran into this problem on HA-JDBC 3.0.4-SNAPSHOT, running on Tomcat 7u62, Java 7u80 x32, and MySQL 5.1.58 on Linux. There are two servers, and each runs its own copy of Tomcat and MySQL.

Steps to reproduce:
1) Start MySQL on each server.
2) Start Tomcat on server 1. HA-JDBC starts with active MySQL on server 1 and server 2.
3) Start Tomcat on server 2. HA-JDBC queries state from Tomcat 1, both MySQL servers still active.
4) Trigger deactivation of MySQL on server 2 (e.g. /etc/init.d/mysql restart).
5) Invoke DatabaseClusterImpl.activate(String databaseId, String strategyId).

If I execute step 5) on Tomcat 1, HA-JDBC successfully reactivates MySQL using the truncate/insert re-synchronization strategy. However, if I repeat steps 1) through 4) and execute step 5) on Tomcat 2 instead, HA-JDBC hangs indefinitely.

I can provide FINE logging from HA-JDBC and JGroups.
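
A thread dump captured while the call is stuck would also show which thread is blocked and on what. The sketch below is one way to get that; it is not part of HA-JDBC. It runs a task on a worker thread with a timeout and prints a thread dump if the task does not return in time. The class name ActivateWithTimeout, the method runWithTimeout, and the placeholder task are hypothetical. In a real test, the task body would be the call from step 5) on an existing cluster instance.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not an HA-JDBC class.
class ActivateWithTimeout {

    /** Returns true if the task finished within the timeout, false if it appears hung. */
    static boolean runWithTimeout(Runnable task, long timeout, TimeUnit unit) {
        // Daemon worker so a permanently stuck task cannot keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(new ThreadFactory() {
            @Override
            public Thread newThread(Runnable r) {
                Thread t = new Thread(r, "activate-worker");
                t.setDaemon(true);
                return t;
            }
        });
        Future<?> future = executor.submit(task);
        try {
            future.get(timeout, unit);
            return true;
        } catch (TimeoutException e) {
            dumpThreads();       // capture state while the task is still stuck
            future.cancel(true); // best effort; a blocked call may ignore the interrupt
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }

    /** Prints every thread's state and held locks to stderr (ThreadInfo.toString() shortens deep stacks). */
    static void dumpThreads() {
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(true, true)) {
            System.err.print(info);
        }
    }

    public static void main(String[] args) {
        // Placeholder task. Assumption: in a real test this would be something like
        // cluster.activate(databaseId, strategyId) on an existing cluster instance.
        Runnable task = new Runnable() {
            @Override
            public void run() {
                // no-op stand-in for the activate call
            }
        };
        boolean completed = runWithTimeout(task, 30, TimeUnit.SECONDS);
        System.out.println("completed within timeout: " + completed);
    }
}
```

Running the same wrapper on Tomcat 1 and Tomcat 2 would give a dump only for the node where the call hangs, which can then be compared against the FINE logs.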

Last edit: Justin Cranford 2015-11-03

hajdbc-tomcat1-succeeded.log

hajdbc-tomcat2-hung.log

Justin Cranford - 2015-11-03

I noticed that the writeLock() messages differ depending on whether I initiate from Tomcat 1 (coordinator) or Tomcat 2 (non-coordinator), and in the failing case they are not balanced. I am wondering if there is a problem with my JGroups or HA-JDBC config.

a) If initiating reactivate() from Tomcat 1 (coordinator), the writeLock() only appears on Tomcat 2 (non-coordinator). There are no writeLock() messages on Tomcat 1. This was a successful reactivation.

Tomcat 1

<no writeLock() messages>

Tomcat 2

Nov 03, 2015 8:15:02 PM net.sf.hajdbc.logging.slf4j.SLF4JLogger log
FINE: Received AcquireLockCommand(writeLock()) from 10.0.0.214
Nov 03, 2015 8:15:10 PM net.sf.hajdbc.logging.slf4j.SLF4JLogger log
FINE: Received ReleaseLockCommand(writeLock()) from 10.0.0.214

b) If initiating reactivate() from Tomcat 2 (non-coordinator), writeLock() messages appear on both Tomcat 1 and Tomcat 2, and no ReleaseLockCommand is logged on either. This was an unsuccessful reactivation because HA-JDBC or JGroups is hung inside the call to reactivate().

Tomcat 1

Nov 03, 2015 8:28:49 PM net.sf.hajdbc.logging.slf4j.SLF4JLogger log
FINE: Received AcquireLockCommand(writeLock()) from 10.0.0.215
Nov 03, 2015 8:28:49 PM net.sf.hajdbc.logging.slf4j.SLF4JLogger log
FINE: Received AcquireLockCommand(writeLock()) from 10.0.0.215

Tomcat 2

Nov 03, 2015 8:28:49 PM net.sf.hajdbc.logging.slf4j.SLF4JLogger log
FINE: Received AcquireLockCommand(writeLock()) from 10.0.0.215
<hung>
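
The imbalance can be checked mechanically across a full log. The sketch below is only an illustration: it counts AcquireLockCommand and ReleaseLockCommand lines per lock and sender and reports what is left outstanding. The class name LockBalance and the method outstanding are hypothetical, and the regular expression assumes the message format is exactly what the excerpts above show.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not an HA-JDBC class.
class LockBalance {

    // Assumed format, taken from the excerpts:
    // "Received AcquireLockCommand(writeLock()) from 10.0.0.215"
    private static final Pattern LOCK_LINE = Pattern.compile(
            "Received (Acquire|Release)LockCommand\\((\\w+)\\(\\)\\) from (\\S+)");

    /** Returns acquires minus releases for each "lock() from sender" key, in first-seen order. */
    static Map<String, Integer> outstanding(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (String line : lines) {
            Matcher m = LOCK_LINE.matcher(line);
            if (!m.find()) {
                continue; // timestamp lines and unrelated messages
            }
            String key = m.group(2) + "() from " + m.group(3);
            int delta = "Acquire".equals(m.group(1)) ? 1 : -1;
            Integer old = counts.get(key);
            counts.put(key, (old == null ? 0 : old) + delta);
        }
        return counts;
    }

    public static void main(String[] args) {
        // The Tomcat 1 excerpt from case b).
        List<String> tomcat1 = Arrays.asList(
                "Nov 03, 2015 8:28:49 PM net.sf.hajdbc.logging.slf4j.SLF4JLogger log",
                "FINE: Received AcquireLockCommand(writeLock()) from 10.0.0.215",
                "Nov 03, 2015 8:28:49 PM net.sf.hajdbc.logging.slf4j.SLF4JLogger log",
                "FINE: Received AcquireLockCommand(writeLock()) from 10.0.0.215");
        System.out.println(outstanding(tomcat1)); // prints {writeLock() from 10.0.0.215=2}
    }
}
```

A value of 0 means every acquire was matched by a release, as in the Tomcat 2 excerpt from case a). Any positive value marks a lock request that was never released in the captured log.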

Last edit: Justin Cranford 2015-11-03

ha-jdbc-jgroups-tcp.xml

Justin Cranford - 2015-11-04

Hi Paul,

Ignore the previous attachments. I re-ran with FINER logging for net.sf.hajdbc and org.jgroups so there is much more detail.

Tomcat #1 - 10.0.0.214 (coordinator)
Tomcat #2 - 10.0.0.215 (non-coordinator)

I invoked reactivate on 10.0.0.215 and it hung. I am attaching the log files from both nodes. If you could look at the 10.0.0.215 log first, it may show whether this is an HA-JDBC bug or a JGroups bug or config issue. Thanks.

Last edit: Justin Cranford 2015-11-04

hajdbc-10.0.0.214-coordinator.log

hajdbc-10.0.0.215-noncoordinator-resync-from-here-hangs.log
