
How to detect when a JGroups member rejoins the HA-JDBC application cluster?

Help
2014-08-08
2014-11-11
  • Justin Cranford

    Justin Cranford - 2014-08-08

    Hi Paul,

    I am looking for a way in HA-JDBC 3.0.3 to detect when a JGroups member rejoins HA-JDBC's JGroups application cluster. If there is a way to differentiate member restarts from a merge after split-brain, that would be even better. I would like to trigger some logging and business logic when a merge from split-brain occurs. Here are some use cases to clarify what I mean:

    1) Tomcat starts and it is the first member of the application cluster
    2) Tomcat starts and joins an existing application cluster
    3) Tomcat restarts, and rejoins the existing application cluster
    4) Tomcat becomes isolated, and rejoins the existing application cluster (with merging)

    I see HA-JDBC has a MembershipListener class, but it only offers bare minimum information. Specifically, it has added() and removed() methods with net.sf.hajdbc.distributed.Member parameters. The Member interface has no useful methods, and the only implementation I could find is net.sf.hajdbc.distributed.jgroups.AddressMember with a simple getAddress() method.

    Using MembershipListener.added() on its own is not enough to differentiate between these use cases. Ideally I would like to detect 4). However, I can make do with a solution that fires only for 3) and 4) and cannot differentiate any further.

    I did something similar with Hazelcast application clustering. I manually tracked member history by querying the list of all attached members immediately after startup, and then using its similar added() membership listener to detect if a member was seen before. If it was being added, and it was seen before, it falls into use case 3 or 4. It is not enough to differentiate between 3 and 4, but it was better than using the added() membership listener on its own.
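    The member-history approach described above can be sketched without any clustering library. This is only an illustration: MemberHistory, JoinKind, and the use of a String key (for example, the member's address) are hypothetical names and assumptions, not part of HA-JDBC, JGroups, or Hazelcast.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Remembers every member ever seen so that a later added() event can be classified. */
final class MemberHistory {
    enum JoinKind { FIRST_JOIN, REJOIN }

    // Keys of all members seen since this node started (hypothetical: address strings).
    private final Set<String> seen = new HashSet<>();

    /** Seed the history with the members already attached right after startup. */
    MemberHistory(Collection<String> initialMembers) {
        seen.addAll(initialMembers);
    }

    /** Call from the added() callback with whatever key identifies the member. */
    synchronized JoinKind onAdded(String memberKey) {
        // Set.add returns false when the key was already present, i.e. seen before.
        return seen.add(memberKey) ? JoinKind.FIRST_JOIN : JoinKind.REJOIN;
    }
}
```

    As noted, REJOIN here covers both use case 3 and use case 4; the history alone cannot tell a restart from a merge.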

    In short:

    Does HA-JDBC have a listener to notify when DB cluster state is being merged (due to split-brain)? Or if not, what is the HA-JDBC API to register the membership listener, and is there another HA-JDBC API to query the currently attached JGroups members?

    Thanks,
    Justin

     
  • Paul Ferraro

    Paul Ferraro - 2014-08-10

    In general, you would detect this via org.jgroups.MembershipListener. The viewAccepted(View) method is triggered when a node joins or leaves. If the argument is a MergeView, then this represents a healed network partition. We can certainly expand the existing CommandDispatcher API to expose these events.

     
  • Justin Cranford

    Justin Cranford - 2014-08-11

    Are you proposing to expose MergeView events via a new MembershipListener method or parameter? If so that would be great.

    So, how do I actually register a MembershipListener instance programmatically? I cannot find any suitable methods or parameters in the core programmatic API classes I am using:

    • JGroupsCommandDispatcherFactory
    • DataSourceDatabaseClusterConfiguration
    • SimpleDatabaseClusterConfigurationFactory
    • DatabaseClusterImpl

    I see references to MembershipListener inside some of this code, but no external API to inject my MembershipListener instance. Does this make sense?

    In addition to the above, is there an API to query which JGroups members are actively connected to the application cluster? The instance of DatabaseClusterImpl<DataSource, DataSourceDatabase> has a getActiveDatabases() API, but no getActiveMembers() API or anything along those lines. Does a programmatic API exist to query that list of AddressMember objects?

     
  • Paul Ferraro

    Paul Ferraro - 2014-08-14

    For now, the easiest way to add custom MembershipListener logic is to supply your own CommandDispatcherFactory implementation to HA-JDBC that uses a decorator. It would look something like:

    public class MyCommandDispatcherFactory extends JGroupsCommandDispatcherFactory
    {
        @Override
        public <C> CommandDispatcher<C> createCommandDispatcher(String id, C context, Stateful stateful, final MembershipListener listener) throws Exception
        {
            MembershipListener myListener = new MembershipListener()
            {
                @Override
                public void viewAccepted(View view)
                {
                    // Insert your logic here
                    listener.viewAccepted(view);
                }
    
                @Override
                public void block()
                {
                    listener.block();
                }
    
                @Override
                public void suspect(Address member)
                {
                    listener.suspect(member);
                }
    
                @Override
                public void unblock()
                {
                    listener.unblock();
                }
            };
            return super.createCommandDispatcher(id, context, stateful, myListener);
        }
    }
    

    You can use this listener to expose the cluster membership to your application as well.

     
  • Justin Cranford

    Justin Cranford - 2014-08-14

    I tried the example but it did not work. It is mixing two interface classes with the same MembershipListener name:

    • org.jgroups.MembershipListener: viewAccepted(), block(), suspect(), unblock()
    • net.sf.hajdbc.distributed.MembershipListener: added(), removed()

    Your example seems to implement org.jgroups.MembershipListener, but then it tries to pass into JGroupsCommandDispatcherFactory.createCommandDispatcher which needs a different net.sf.hajdbc.distributed.MembershipListener parameter.

    I can still use this example to inject my own net.sf.hajdbc.distributed.MembershipListener implementation, but that means no JGroups view to get initial membership at startup, or for catching merge view events at run-time.

    Is there something I can do in addition to tweaking this example, or does it require new HA-JDBC APIs?

     
  • Justin Cranford

    Justin Cranford - 2014-09-04

    I fixed my JGroups listener and started the cluster ok, but I ran into an unexpected issue. The JGroups membership listener fires twice for each node that joins the HA-JDBC/JGroups cluster.

    I suspect the root cause is HA-JDBC 3.0 using two separate JGroups clusters, one for lock and the other for state. If I could filter in the listener by cluster that would help, but the listener only exposes a net.sf.hajdbc.distributed.Member object which does not specify the cluster.

    Any ideas? I can think of a few, but I don't know what would be feasible in the scope of 3.0.4:

    • Multiplex locks and state on a single JGroups cluster again. I recall you mentioned this was the plan for a future release, though, and not for 3.0.x.
    • Expose the JGroups merge viewAccepted() events. Your example in this thread tried to do that, but it did not compile due to class incompatibility - the custom JGroupsCommandDispatcherFactory can only register a net.sf.hajdbc.distributed.MembershipListener object with added() and removed() methods, not the org.jgroups.MembershipListener object with the viewAccepted() method.
    • Extend net.sf.hajdbc.distributed.MembershipListener beyond added() and removed() to relay the four methods in org.jgroups.MembershipListener - viewAccepted(), block(), suspect(), unblock().
    • Expose the JGroups cluster instance in net.sf.hajdbc.distributed.MembershipListener methods added() and removed(), so I can filter based on the cluster name, lock or state, to avoid double processing.
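    As a stopgap for the double firing, one more idea is to filter duplicates inside the listener itself. The sketch below is library-free and purely illustrative: DedupingMembershipTracker, Event, and the String member key are hypothetical names. It assumes both JGroups channels report each join and each departure once, and that both removals for a departure arrive before the member is added again; other interleavings could be misclassified.

```java
import java.util.HashSet;
import java.util.Set;

/** Classifies added() events while ignoring the second report from the other channel. */
final class DedupingMembershipTracker {
    enum Event { FIRST_JOIN, REJOIN, DUPLICATE }

    // Members currently believed to be in the cluster.
    private final Set<String> present = new HashSet<>();
    // Members seen at any time since this node started.
    private final Set<String> everSeen = new HashSet<>();

    /** Call from added(); memberKey is whatever identifies the member. */
    synchronized Event onAdded(String memberKey) {
        if (!present.add(memberKey)) {
            // Already present: assume the other channel is reporting the same join.
            return Event.DUPLICATE;
        }
        return everSeen.add(memberKey) ? Event.FIRST_JOIN : Event.REJOIN;
    }

    /** Call from removed(); returns true only for the first report of a departure. */
    synchronized boolean onRemoved(String memberKey) {
        return present.remove(memberKey);
    }
}
```

    With this, the second added() for '10.0.0.89' would come back as DUPLICATE rather than being treated as a rejoin, but it still cannot tell a restart from a merge.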

    Here are the HA-JDBC/JGroups log messages from my Tomcat 1 log when my 3-node application cluster booted up with addresses 10.0.0.88,10.0.0.89,10.0.0.90:

    Tomcat 1 starts JGroups 3-node application cluster, but added() listener fires twice

    -------------------------------------------------------------------
    GMS: address=10.0.0.88, cluster=mycluster.lock, physical address=10.0.0.88:7900
    -------------------------------------------------------------------
    Sep 04, 2014 1:56:32 PM com.mycompany.server.dao.impl.DataSourceManager$CustomJGroupsMembershipListener added
    WARNING: CustomJGroupsMembershipListener.added Detected event for '10.0.0.88' after member first joined.
    
    -------------------------------------------------------------------
    GMS: address=10.0.0.88, cluster=mycluster.state, physical address=10.0.0.88:7900
    -------------------------------------------------------------------
    Sep 04, 2014 1:56:36 PM com.mycompany.server.dao.impl.DataSourceManager$CustomJGroupsMembershipListener added
    WARNING: CustomJGroupsMembershipListener.added Detected event for '10.0.0.88' after member rejoined (MERGE FROM SPLIT/BRAIN DETECTED).
    

    Tomcat 2 joins JGroups 3-node application cluster, but added() listener fires twice

    Sep 04, 2014 1:56:49 PM com.mycompany.server.dao.impl.DataSourceManager$CustomJGroupsMembershipListener added
    WARNING: CustomJGroupsMembershipListener.added Detected event for '10.0.0.89' after member first joined.
    Sep 04, 2014 1:56:49 PM com.mycompany.server.dao.impl.DataSourceManager f
    WARNING: DataSourceManager.showClusterMembers MEMBERS: 10.0.0.88,10.0.0.89
    
    Sep 04, 2014 1:56:50 PM com.mycompany.server.dao.impl.DataSourceManager$CustomJGroupsMembershipListener added
    WARNING: CustomJGroupsMembershipListener.added Detected event for '10.0.0.89' after member rejoined (MERGE FROM SPLIT/BRAIN DETECTED).
    Sep 04, 2014 1:56:50 PM com.mycompany.server.dao.impl.DataSourceManager f
    WARNING: DataSourceManager.showClusterMembers MEMBERS: 10.0.0.88,10.0.0.89
    

    Tomcat 2 joins Hazelcast 3-node application cluster, which triggers my listener to reset my application-specific global cache

    Sep 04, 2014 1:56:55 PM com.hazelcast.cluster.ClusterService
    INFO: [10.0.0.88]:5900 [ClusterManager] [3.2.4]
    
    Members [2] {
            Member [10.0.0.88]:5900 this
            Member [10.0.0.89]:5900
    }
    
    Sep 04, 2014 1:56:55 PM com.mycompany.server.security.b memberAdded
    WARNING: ApplicationClusterManager.memberAdded Detected for Tomcat '10.0.0.89:5900' in application cluster. Event: MembershipEvent {member=Member [10.0.0.89]:5900,type=added}
    Sep 04, 2014 1:56:55 PM com.mycompany.server.c.a$b e
    WARNING: T74: SequenceManager.reset Reset distributed cache
    

    Tomcat 3 joins JGroups 3-node application cluster, but added() listener fires twice

    Sep 04, 2014 1:57:08 PM com.mycompany.server.dao.impl.DataSourceManager$CustomJGroupsMembershipListener added
    WARNING: CustomJGroupsMembershipListener.added Detected event for '10.0.0.90' after member first joined.
    Sep 04, 2014 1:57:08 PM com.mycompany.server.dao.impl.DataSourceManager f
    WARNING: DataSourceManager.showClusterMembers MEMBERS: 10.0.0.88,10.0.0.89,10.0.0.90
    
    Sep 04, 2014 1:57:08 PM com.mycompany.server.dao.impl.DataSourceManager$CustomJGroupsMembershipListener added
    WARNING: CustomJGroupsMembershipListener.added Detected event for '10.0.0.90' after member rejoined (MERGE FROM SPLIT/BRAIN DETECTED).
    Sep 04, 2014 1:57:08 PM com.mycompany.server.dao.impl.DataSourceManager f
    WARNING: DataSourceManager.showClusterMembers MEMBERS: 10.0.0.88,10.0.0.89,10.0.0.90
    

    Tomcat 3 joins Hazelcast application cluster, which triggers my listener to reset my application-specific global cache

    Sep 04, 2014 1:57:13 PM com.hazelcast.cluster.ClusterService
    INFO: [10.0.0.88]:5900 [ClusterManager] [3.2.4]
    
    Members [3] {
            Member [10.0.0.88]:5900 this
            Member [10.0.0.89]:5900
            Member [10.0.0.90]:5900
    }
    
    Sep 04, 2014 1:57:13 PM com.mycompany.server.security.b memberAdded
    WARNING: ApplicationClusterManager.memberAdded Detected for Tomcat '10.0.0.90:5900' in application cluster. Event: MembershipEvent {member=Member [10.0.0.90]:5900,type=added}
    Sep 04, 2014 1:57:13 PM com.mycompany.server.c.a$b e
    WARNING: T75: SequenceManager.reset Reset distributed cache
    
     
  • Paul Ferraro

    Paul Ferraro - 2014-09-10

    Of your suggestions, it makes the most sense to provide the appropriate cluster context to the MembershipListener.

    The plan for the master branch is to leverage the FORK protocol to unify the two channels.

     
    • Justin Cranford

      Justin Cranford - 2014-10-04

      I pulled the 3.0 branch from GitHub and built 3.0.4-SNAPSHOT. Unfortunately, I cannot find the change you mentioned: adding cluster context to MembershipListener to differentiate events for the lock cluster vs. the state cluster. I am getting close to my release date using HA-JDBC 3.0.x, so I would like to wrap up that piece. Thanks.

       
      • Paul Ferraro

        Paul Ferraro - 2014-10-07

        I haven't gotten around to adding that yet. I'll get around to it this week. Feel free to submit a pull request if you have some bandwidth.

         
  • Justin Cranford

    Justin Cranford - 2014-09-10

    Sounds good for 3.0.4. I am just wondering about master. If you go back to one cluster context using FORK protocol, does it make sense to return it in the listener on master? This is more a curiosity because I think it will be ok either way.

     
  • Justin Cranford

    Justin Cranford - 2014-11-11

    Hi Paul,

    I would like to submit this patch to 3.0.4-SNAPSHOT. This is required for me to integrate a custom MembershipListener into HA-JDBC which fires in addition to DistributedLockManager and DistributedStateManager, instead of replacing them.

    The challenge I had with overriding JGroupsCommandDispatcherFactory.createCommandDispatcher() was that it replaced the instances of DistributedLockManager and DistributedStateManager passed to JGroupsCommandDispatcher. When JGroups triggered JGroupsCommandDispatcher.viewAccepted(), it called my MembershipListener twice and skipped the calls to DistributedLockManager and DistributedStateManager.

    The patch is the only way I could get JGroupsCommandDispatcher to call my MembershipListener alongside HA-JDBC's DistributedLockManager and DistributedStateManager listeners.

    Please review, and if you like it you can add it to 3.0.4-SNAPSHOT. The logic is backwards-compatible, and custom MembershipListener(s) are totally optional.

    BTW, here is the code for my custom JGroupsCommandDispatcherFactory class which I used to add my CustomJGroupsMembershipListener instance:

    public static class CustomJGroupsCommandDispatcherFactory extends JGroupsCommandDispatcherFactory {
        @Override
        public <C> CommandDispatcher<C> createCommandDispatcher(String id, C context, Stateful stateful, final MembershipListener membershipListener) throws Exception {
            // ASSUMPTION: membershipListener is either DistributedLockManager or DistributedStateManager
            JGroupsCommandDispatcher<C> jgroupsCommandDispatcher = (JGroupsCommandDispatcher<C>) super.createCommandDispatcher(id, context, stateful, membershipListener);
            if (membershipListener instanceof DistributedStateManager) {
                // Only add our MembershipListener to JGroupsCommandDispatcher for DistributedStateManager,
                // otherwise JGroupsCommandDispatcher.viewAccepted() calls our MembershipListener twice.
                jgroupsCommandDispatcher.addUserMembershipListener(new CustomJGroupsMembershipListener());
            }
            return jgroupsCommandDispatcher;
        }
    }
    

    Sincerely,
    Justin

     
