Upgraded JGroups from 3.4.4 to 3.4.6, but HA-JDBC 3.0.3 throws NullPointerException during activate

2014-11-08
2014-11-09
  • Justin Cranford

    Justin Cranford - 2014-11-08

    This scenario worked with HA-JDBC 3.0.3 and JGroups 3.4.4.

    1) Start MySQL 1-3
    2) Start Tomcat 1-3 in HA-JDBC/JGroups cluster (all DB members active)
    3) Stop and start MySQL 1 (HA-JDBC deactivates db1)
    4) Activate MySQL 1 from Tomcat 1 (all DB members active again)
    

    I tried with JGroups 3.4.6 and it fails. I get a NullPointerException in HA-JDBC trying to get a lock.

    Nov 08, 2014 9:39:25 PM com.company.DataSourceManager activateClusterDatabaseById
    SEVERE: DataSourceManager.activateClusterDatabaseById Failed to activate database db1 due to exception:
    java.lang.NullPointerException
        at net.sf.hajdbc.lock.distributed.DistributedLockManager$DistributedLock.lockMembers(DistributedLockManager.java:407)
        at net.sf.hajdbc.lock.distributed.DistributedLockManager$DistributedLock.lockInterruptibly(DistributedLockManager.java:308)
        at net.sf.hajdbc.sql.DatabaseClusterImpl.activate(DatabaseClusterImpl.java:864)
        at net.sf.hajdbc.sql.DatabaseClusterImpl.activate(DatabaseClusterImpl.java:156)
        at com.company.DataSourceManager.activateClusterDatabaseById(SourceFile:1435)
    

    I was hoping that upgrading to JGroups 3.4.6 might resolve the HA-JDBC lock hang that occurs further along in my test, but I did not get that far:

    5) Stop and start MySQL 1 again
    6) Stop and start Tomcat 1 (dropped message errors)
    7) Activate MySQL 1 from Tomcat 1 (HA-JDBC acquire lock fails)
    
     

    Last edit: Justin Cranford 2014-11-08
  • Justin Cranford

    Justin Cranford - 2014-11-08

    I switched HA-JDBC from 3.0.3 to 3.0.4-SNAPSHOT while keeping JGroups 3.4.6. I get a NullPointerException again, but with a different stack trace this time. The stack trace appears twice, once for each remote Tomcat; each exception message names a different remote Tomcat address.

        Nov 08, 2014 10:23:05 PM net.sf.hajdbc.logging.slf4j.SLF4JLogger log
        WARNING: Failed to acquire writeLock() on 10.0.0.200
        java.util.concurrent.ExecutionException: java.lang.NullPointerException
                at net.sf.hajdbc.distributed.jgroups.JGroupsCommandDispatcher$RspCommandResponse.get(JGroupsCommandDispatcher.java:347)
                at net.sf.hajdbc.lock.distributed.DistributedLockManager$DistributedLock.readAcquireResponse(DistributedLockManager.java:524)
                at net.sf.hajdbc.lock.distributed.DistributedLockManager$DistributedLock.lockMembers(DistributedLockManager.java:481)
                at net.sf.hajdbc.lock.distributed.DistributedLockManager$DistributedLock.lockInterruptibly(DistributedLockManager.java:330)
                at net.sf.hajdbc.sql.DatabaseClusterImpl.activate(DatabaseClusterImpl.java:864)
                at net.sf.hajdbc.sql.DatabaseClusterImpl.activate(DatabaseClusterImpl.java:156)
                at com.company.DataSourceManager.activateClusterDatabaseById(SourceFile:1435)
        Caused by: java.lang.NullPointerException
                at net.sf.hajdbc.lock.distributed.AcquireLockCommand.execute(AcquireLockCommand.java:60)
                at net.sf.hajdbc.lock.distributed.AcquireLockCommand.execute(AcquireLockCommand.java:30)
                at net.sf.hajdbc.distributed.jgroups.JGroupsCommandDispatcher.handle(JGroupsCommandDispatcher.java:239)
                at org.jgroups.blocks.MessageDispatcher.handle(MessageDispatcher.java:479)
                at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:472)
                at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:377)
                at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:247)
                at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:667)
                at org.jgroups.JChannel.up(JChannel.java:708)
                at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1015)
                at org.jgroups.protocols.pbcast.STATE_TRANSFER.up(STATE_TRANSFER.java:178)
                at org.jgroups.protocols.FRAG2.up(FRAG2.java:165)
                at org.jgroups.protocols.FlowControl.up(FlowControl.java:370)
                at org.jgroups.protocols.pbcast.GMS.up(GMS.java:1010)
                at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:234)
                at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:391)
                at org.jgroups.protocols.pbcast.NAKACK2.handleMessage(NAKACK2.java:774)
                at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:570)
                at org.jgroups.protocols.BARRIER.up(BARRIER.java:107)
                at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:147)
                at org.jgroups.protocols.FD.up(FD.java:255)
                at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:301)
                at org.jgroups.protocols.MERGE2.up(MERGE2.java:209)
                at org.jgroups.protocols.Discovery.up(Discovery.java:379)
                at org.jgroups.protocols.TP$ProtocolAdapter.up(TP.java:2615)
                at org.jgroups.protocols.TP.passMessageUp(TP.java:1405)
                at org.jgroups.protocols.TP$MyHandler.run(TP.java:1591)
                at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
                at java.lang.Thread.run(Unknown Source)
    
     
  • Justin Cranford

    Justin Cranford - 2014-11-09

    Sorry, I don't understand what is happening.

    I retested with HA-JDBC 3.0.3 and 3.0.4-SNAPSHOT, using JGroups versions 3.4.3, 3.4.4, 3.4.5, and 3.4.6. I always get a NullPointerException: the first posted exception with 3.0.3, or the second posted exception with 3.0.4-SNAPSHOT.

    I am not sure if anything has changed other than a JRE upgrade to 7u71.

    Sorry to bother you, but any ideas? I can turn on debug logging if that helps.

     
  • Justin Cranford

    Justin Cranford - 2014-11-09

    I found the root cause. I tried backing down to Java 7u45, but that did not work. Next, I started rolling back my changes to the programmatic JGroups configuration one by one. Reactivation worked as soon as I reverted the JGroupsCommandDispatcherFactory extension. The factory override was a workaround you recommended to inject my JGroups membership listener.

    public static class CustomJGroupsCommandDispatcherFactory extends JGroupsCommandDispatcherFactory {
        private static final long serialVersionUID = 7382393165820252762L;
    
        @Override
        public <C> CommandDispatcher<C> createCommandDispatcher(String id, C context, Stateful stateful, final MembershipListener listener) throws Exception {
            try {
                return super.createCommandDispatcher(id, context, stateful, MEMBERSHIP_LISTENER);   // ignore "listener" parameter, inject our own
            } catch(Exception e) {
                LOGGER.log(Level.WARNING, "CustomJGroupsCommandDispatcherFactory.createCommandDispatcher Unexpected exception:", e);
            }
            return null;
        }
    }
    

    If you recall, HA-JDBC did not expose a programmatic API to add a JGroups membership listener, so you suggested I try injecting it by overriding JGroupsCommandDispatcherFactory.

    Using that workaround, I was able to track JGroups membership changes. However, the listener fired twice for each change: once for the lock cluster and once for the state cluster.

    Adding a cluster identifier to the MembershipListener methods, so that I could filter on the state cluster, was still an outstanding issue. However, I guess the class extension broke something in HA-JDBC distributed locking.

    I was meaning to get back to the MembershipListener change now that I might have some bandwidth to help you out, but this sets me back. I cannot use the JGroupsCommandDispatcherFactory extension because it causes unexpected side effects.
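
    One unconfirmed possibility is that HA-JDBC depends on the listener it passes in through the "listener" parameter, and that the override above breaks locking by discarding it. If so, a wrapper that forwards each event to the original listener before calling the custom one might avoid the side effect. The sketch below is only an illustration: MembershipListener here is a local stand-in whose added/removed methods are assumed to mirror HA-JDBC's interface, members are plain strings, and DelegatingMembershipListener and RecordingListener are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class DelegatingListenerSketch {

    // Local stand-in for HA-JDBC's MembershipListener (assumed shape, not the real type).
    public interface MembershipListener {
        void added(String member);
        void removed(String member);
    }

    // Hypothetical wrapper: forwards every event to the listener the framework
    // supplied first, then to the application's own listener.
    public static final class DelegatingMembershipListener implements MembershipListener {
        private final MembershipListener original;
        private final MembershipListener custom;

        public DelegatingMembershipListener(MembershipListener original, MembershipListener custom) {
            this.original = original;
            this.custom = custom;
        }

        @Override
        public void added(String member) {
            original.added(member);
            custom.added(member);
        }

        @Override
        public void removed(String member) {
            original.removed(member);
            custom.removed(member);
        }
    }

    // Test double that records the events it receives, tagged with its own name.
    public static final class RecordingListener implements MembershipListener {
        public final List<String> events = new ArrayList<String>();
        private final String name;

        public RecordingListener(String name) {
            this.name = name;
        }

        @Override
        public void added(String member) {
            events.add(name + " added " + member);
        }

        @Override
        public void removed(String member) {
            events.add(name + " removed " + member);
        }
    }

    public static void main(String[] args) {
        RecordingListener internal = new RecordingListener("internal");
        RecordingListener app = new RecordingListener("app");
        MembershipListener listener = new DelegatingMembershipListener(internal, app);

        listener.added("10.0.0.200");
        listener.removed("10.0.0.200");

        // Both listeners should have seen both events.
        System.out.println(internal.events);
        System.out.println(app.events);
    }
}
```

    In the factory override, the equivalent change would be to pass a wrapper around "listener" and MEMBERSHIP_LISTENER to super.createCommandDispatcher instead of MEMBERSHIP_LISTENER alone. Whether that resolves the NullPointerException has not been verified here.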

    Any ideas?

     
