HA-JDBC / Discussion / Help: Upgraded JGroups from 3.4.4 to 3.4.6, but HA-JDBC 3.0.3 throws NullPointerException during activate

This scenario worked with HA-JDBC 3.0.3 and JGroups 3.4.4.

1) Start MySQL 1-3
2) Start Tomcat 1-3 in HA-JDBC/JGroups cluster (all DB members active)
3) Stop and start MySQL 1 (HA-JDBC deactivates db1)
4) Activate MySQL 1 from Tomcat 1 (all DB members active again)

With JGroups 3.4.6, step 4 fails: HA-JDBC throws a NullPointerException while trying to acquire a lock.

    Nov 08, 2014 9:39:25 PM com.company.DataSourceManager activateClusterDatabaseById
    SEVERE: DataSourceManager.activateClusterDatabaseById Failed to activate database db1 due to exception:
    java.lang.NullPointerException
            at net.sf.hajdbc.lock.distributed.DistributedLockManager$DistributedLock.lockMembers(DistributedLockManager.java:407)
            at net.sf.hajdbc.lock.distributed.DistributedLockManager$DistributedLock.lockInterruptibly(DistributedLockManager.java:308)
            at net.sf.hajdbc.sql.DatabaseClusterImpl.activate(DatabaseClusterImpl.java:864)
            at net.sf.hajdbc.sql.DatabaseClusterImpl.activate(DatabaseClusterImpl.java:156)
            at com.company.DataSourceManager.activateClusterDatabaseById(SourceFile:1435)

I was hoping the upgrade to JGroups 3.4.6 might resolve the HA-JDBC lock hang that occurs further down the line in my test, but I did not get that far:

5) Stop and start MySQL 1 again
6) Stop and start Tomcat 1 (dropped message errors)
7) Activate MySQL 1 from Tomcat 1 (HA-JDBC acquire lock fails)

Last edit: Justin Cranford 2014-11-08

I switched HA-JDBC from 3.0.3 to 3.0.4-SNAPSHOT while keeping JGroups 3.4.6. I get a NullPointerException again, but with a different stack trace this time. The stack trace appears twice, once for each remote Tomcat; each exception message refers to a different remote Tomcat address.

    Nov 08, 2014 10:23:05 PM net.sf.hajdbc.logging.slf4j.SLF4JLogger log
    WARNING: Failed to acquire writeLock() on 10.0.0.200
    java.util.concurrent.ExecutionException: java.lang.NullPointerException
            at net.sf.hajdbc.distributed.jgroups.JGroupsCommandDispatcher$RspCommandResponse.get(JGroupsCommandDispatcher.java:347)
            at net.sf.hajdbc.lock.distributed.DistributedLockManager$DistributedLock.readAcquireResponse(DistributedLockManager.java:524)
            at net.sf.hajdbc.lock.distributed.DistributedLockManager$DistributedLock.lockMembers(DistributedLockManager.java:481)
            at net.sf.hajdbc.lock.distributed.DistributedLockManager$DistributedLock.lockInterruptibly(DistributedLockManager.java:330)
            at net.sf.hajdbc.sql.DatabaseClusterImpl.activate(DatabaseClusterImpl.java:864)
            at net.sf.hajdbc.sql.DatabaseClusterImpl.activate(DatabaseClusterImpl.java:156)
            at com.company.DataSourceManager.activateClusterDatabaseById(SourceFile:1435)
    Caused by: java.lang.NullPointerException
            at net.sf.hajdbc.lock.distributed.AcquireLockCommand.execute(AcquireLockCommand.java:60)
            at net.sf.hajdbc.lock.distributed.AcquireLockCommand.execute(AcquireLockCommand.java:30)
            at net.sf.hajdbc.distributed.jgroups.JGroupsCommandDispatcher.handle(JGroupsCommandDispatcher.java:239)
            at org.jgroups.blocks.MessageDispatcher.handle(MessageDispatcher.java:479)
            at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:472)
            at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:377)
            at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:247)
            at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:667)
            at org.jgroups.JChannel.up(JChannel.java:708)
            at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1015)
            at org.jgroups.protocols.pbcast.STATE_TRANSFER.up(STATE_TRANSFER.java:178)
            at org.jgroups.protocols.FRAG2.up(FRAG2.java:165)
            at org.jgroups.protocols.FlowControl.up(FlowControl.java:370)
            at org.jgroups.protocols.pbcast.GMS.up(GMS.java:1010)
            at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:234)
            at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:391)
            at org.jgroups.protocols.pbcast.NAKACK2.handleMessage(NAKACK2.java:774)
            at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:570)
            at org.jgroups.protocols.BARRIER.up(BARRIER.java:107)
            at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:147)
            at org.jgroups.protocols.FD.up(FD.java:255)
            at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:301)
            at org.jgroups.protocols.MERGE2.up(MERGE2.java:209)
            at org.jgroups.protocols.Discovery.up(Discovery.java:379)
            at org.jgroups.protocols.TP$ProtocolAdapter.up(TP.java:2615)
            at org.jgroups.protocols.TP.passMessageUp(TP.java:1405)
            at org.jgroups.protocols.TP$MyHandler.run(TP.java:1591)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
            at java.lang.Thread.run(Unknown Source)

Justin Cranford - 2014-11-09

Sorry, I don't understand what is happening.

I retested with HA-JDBC 3.0.3 and 3.0.4-SNAPSHOT, using JGroups versions 3.4.3, 3.4.4, 3.4.5, and 3.4.6. I always get a NullPointerException: the first posted exception with 3.0.3, or the second posted exception with 3.0.4-SNAPSHOT.

I am not sure if anything has changed other than a JRE upgrade to 7u71.

Sorry to bother you, but any ideas? I can turn on debug logging if that helps.

Justin Cranford - 2014-11-09

I found the root cause. I tried backing down to Java 7u45, but that did not work. Next I started rolling back my changes to the programmatic JGroups configuration one by one. Reactivation worked as soon as I reverted the JGroupsCommandDispatcherFactory extension. The factory override was a workaround you recommended to inject my JGroups membership listener.

    public static class CustomJGroupsCommandDispatcherFactory extends JGroupsCommandDispatcherFactory {
        private static final long serialVersionUID = 7382393165820252762L;

        @Override
        public <C> CommandDispatcher<C> createCommandDispatcher(String id, C context, Stateful stateful, final MembershipListener listener) throws Exception {
            try {
                // ignore "listener" parameter, inject our own
                return super.createCommandDispatcher(id, context, stateful, MEMBERSHIP_LISTENER);
            } catch (Exception e) {
                LOGGER.log(Level.WARNING, "CustomJGroupsCommandDispatcherFactory.createCommandDispatcher Unexpected exception:", e);
            }
            return null;
        }
    }

If you recall, HA-JDBC did not expose a programmatic API to add a JGroups membership listener, so you suggested I try injecting it by overriding JGroupsCommandDispatcherFactory.

Using that workaround, I was able to track JGroups membership changes. However, the listener fired twice for each change: once for the lock cluster and once for the state cluster.

Adding a cluster identifier to the MembershipListener methods, so that I could filter on the state cluster, was still an outstanding issue. However, I guess the class extension broke something in HA-JDBC distributed locking.

I was meaning to get back to the MembershipListener change now that I might have some bandwidth to help you out, but this sets me back. I cannot use the JGroupsCommandDispatcherFactory extension because it causes unexpected side effects.
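
One untested guess at the side effect: the override above discards the `listener` argument, so whatever HA-JDBC passed in never hears about membership changes. If the distributed lock manager depends on those callbacks, that could explain the NullPointerException. A possible variation is to forward events to both listeners instead of replacing one with the other. The sketch below shows only the fan-out idea. Its `MembershipListener` is a simplified stand-in with an assumed `added`/`removed` shape and `String` members, not the real HA-JDBC interface, and `CompositeMembershipListener` and `RecordingListener` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-in for HA-JDBC's membership listener (assumed shape).
// Members are modelled as plain strings to keep the sketch self-contained.
interface MembershipListener {
    void added(String member);
    void removed(String member);
}

// Hypothetical helper: forwards every membership event to each delegate, in order.
final class CompositeMembershipListener implements MembershipListener {
    private final List<MembershipListener> delegates;

    CompositeMembershipListener(MembershipListener... delegates) {
        this.delegates = Arrays.asList(delegates);
    }

    @Override
    public void added(String member) {
        for (MembershipListener delegate : delegates) {
            if (delegate != null) { // tolerate a missing listener
                delegate.added(member);
            }
        }
    }

    @Override
    public void removed(String member) {
        for (MembershipListener delegate : delegates) {
            if (delegate != null) {
                delegate.removed(member);
            }
        }
    }
}

// Hypothetical test double: records events so the fan-out can be observed.
final class RecordingListener implements MembershipListener {
    final List<String> events = new ArrayList<String>();

    @Override
    public void added(String member) {
        events.add("added:" + member);
    }

    @Override
    public void removed(String member) {
        events.add("removed:" + member);
    }
}
```

In the factory override, the idea would be to pass a composite wrapping both `listener` and `MEMBERSHIP_LISTENER` to `super.createCommandDispatcher`, rather than `MEMBERSHIP_LISTENER` alone. The real interface's method signatures would need to be checked against the HA-JDBC source, and I have not verified that this avoids the exception.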

Any ideas?