
Queue Messages During State Transfer

Developers
2015-05-26
  • Steve Schick

    Steve Schick - 2015-05-26

    I am working on a JGroups project where we transfer a pretty large state when a node joins the group. It's about 1.25 GB of memory and takes more than 2 minutes to stream. I'd like to configure JGroups so that it queues messages for the nodes involved in the state transfer but lets messages flow as normal to the other nodes.

    For example, let's say we have the following nodes in the view:
    A - Coordinator
    B - Running node
    C - Running node
    D - Running node
    E - Starting node

    So when E joins the view and starts state transfer, I'd like A and E to stop "accepting" messages and queue them for when state transfer completes. B, C, & D should continue "accepting" messages and process them as normal.

    Then when state transfer completes, A and E would process the queued messages that came in during state transfer and start "accepting" messages as normal.

    Can JGroups be configured to "hold" and queue messages for nodes during an event like state transfer?

    JGroups version: 3.6.3.Final
    My protocol stack is below. Thanks for your input!

    <jgroupsconfig>
    
        <channelName>${jgroupsconfig.channelName}</channelName>
        <config>
            <UDP mcast_addr="228.1.2.3" 
                 mcast_port="${jgroupsconfig.config.UDP.mcast_port}"
                 ip_ttl="32"
                 ip_mcast="true"
                 mcast_send_buf_size="640K" 
                 mcast_recv_buf_size="2M"
                 ucast_send_buf_size="640K" 
                 ucast_recv_buf_size="2M"
                 max_bundle_size="64K" 
                 max_bundle_timeout="30"/>
            <PING />
            <MERGE3 min_interval="10000" 
                    max_interval="30000"/>
            <FD_SOCK />
            <FD_ALL />
            <VERIFY_SUSPECT timeout="5000"/>
            <BARRIER />
            <pbcast.NAKACK2 xmit_interval="1000"
                            xmit_table_num_rows="100"
                            xmit_table_msgs_per_row="2000"
                            xmit_table_max_compaction_time="30000"
                            max_msg_batch_size="500"
                            use_mcast_xmit="false"
                            discard_delivered_msgs="true"/>
            <UNICAST3 xmit_interval="2000"
                     xmit_table_num_rows="100"
                     xmit_table_msgs_per_row="2000"
                     xmit_table_max_compaction_time="60000"
                     conn_expiry_timeout="60000"
                     max_msg_batch_size="500"/>
            <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000" 
                           max_bytes="4M"/>
            <pbcast.GMS join_timeout="5000" print_local_addr="true" 
                        view_bundling="true"/>
            <UFC max_credits="2M"
                 min_threshold="0.4" />
            <MFC max_credits="2M"
                 min_threshold="0.4" />
            <FRAG2 frag_size="60000"/>
            <pbcast.STATE_SOCK/>
        </config>
    </jgroupsconfig>
    
     

    Last edit: Steve Schick 2015-05-26
  • Bela Ban

    Bela Ban - 2015-05-26

    It seems my email wasn't received, so here it is again:
    By default, JGroups queues incoming messages on the state provider (A) during state transfer when you use a subclass of StreamingStateTransfer (STATE, STATE_SOCK) together with BARRIER. I'm not sure about the state requester (E); I'd have to check the code.

    This is done by closing BARRIER. When closing, BARRIER waits until all incoming threads have returned and lets no new threads through; their messages are queued instead.

    When the state has been transferred, BARRIER is opened again and sends all queued messages up the stack.
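
    The close/queue/open cycle described above can be modelled in plain Java. This is a minimal sketch under stated assumptions, not the actual BARRIER source: `MessageGate`, `receive()`, `close()`, `open()` and `queuedCount()` are hypothetical names, and the `deliver` callback stands in for passing a message up the stack.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

// Hypothetical, simplified model of BARRIER's close/queue/open behaviour.
// NOT the JGroups implementation; all names here are made up for illustration.
class MessageGate<M> {
    private final Consumer<M> deliver;          // stands in for "send the message up the stack"
    private final Queue<M> queued = new ArrayDeque<>();
    private boolean closed = false;
    private int inFlight = 0;                   // threads currently delivering a message

    MessageGate(Consumer<M> deliver) {
        this.deliver = deliver;
    }

    // Called by incoming (transport) threads.
    void receive(M msg) {
        synchronized (this) {
            if (closed) {                       // closed: queue the message and return at once
                queued.add(msg);
                return;
            }
            inFlight++;
        }
        try {
            deliver.accept(msg);                // deliver outside the lock so threads run concurrently
        } finally {
            synchronized (this) {
                if (--inFlight == 0) {
                    notifyAll();                // wake a close() waiting for in-flight threads
                }
            }
        }
    }

    // Admit no new threads, then wait up to timeoutMs for in-flight threads to return.
    // Returns false on timeout or interruption.
    synchronized boolean close(long timeoutMs) {
        closed = true;
        long deadline = System.currentTimeMillis() + timeoutMs;
        try {
            while (inFlight > 0) {
                long left = deadline - System.currentTimeMillis();
                if (left <= 0) {
                    return false;
                }
                wait(left);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return true;
    }

    // Deliver queued messages in arrival order, then reopen. The gate stays closed
    // while draining, so messages arriving meanwhile are queued behind older ones.
    void open() {
        while (true) {
            M next;
            synchronized (this) {
                next = queued.poll();
                if (next == null) {
                    closed = false;
                    return;
                }
            }
            deliver.accept(next);
        }
    }

    synchronized int queuedCount() {
        return queued.size();
    }
}
```

    Draining the queue before clearing the `closed` flag keeps the queued messages ahead of anything that arrives while `open()` is running, so per-sender order is preserved in this model.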

    Note that queueing messages on A means that A won't be able to do certain things, e.g. admitting new members (joins) and handling STABLE requests. If your state is large, perhaps pick someone else, not the coordinator, as state provider.
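
    The coordinator is the first member of a JGroups view, so a joiner can avoid it by choosing another member and passing that member's address as the target of `JChannel.getState(Address target, long timeout)`. The helper below is a hypothetical sketch of that choice; it is generic over the address type so it runs without JGroups, and in an application `members` would be `channel.getView().getMembers()` and `self` would be `channel.getAddress()`.

```java
import java.util.List;

// Hypothetical helper: pick a state provider other than the coordinator.
final class StateProviderChooser {
    // Returns the oldest member that is neither the coordinator (members.get(0))
    // nor the caller itself; falls back to the coordinator if nobody else exists.
    static <A> A choose(List<A> members, A self) {
        if (members.isEmpty()) {
            throw new IllegalArgumentException("empty view");
        }
        for (int i = 1; i < members.size(); i++) {
            A candidate = members.get(i);
            if (!candidate.equals(self)) {
                return candidate;
            }
        }
        return members.get(0);
    }
    // Intended use (assumption): channel.getState(choose(members, self), timeoutMs);
}
```

    Because view members are listed in join order, index 1 is the oldest non-coordinator, which makes it less likely to be another joiner that is still waiting for its own state.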

    You could also ship your state by means other than JChannel.getState(), e.g. at the application level: let the new member join, but keep it non-operational until the state has been transferred. The state could be sent as messages or RPCs, and once the transfer is complete, the member is made operational.
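
    The application-level approach could look roughly like this, assuming the provider splits its serialized state into numbered chunks and sends each one as a message or RPC (the transport itself is left out). `StateChunker` and `JoiningMember` are hypothetical classes: the joiner is already in the view, but it rejects client requests until every chunk has arrived.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical provider-side helper: split a serialized state into fixed-size
// chunks, each of which could be sent as a separate message or RPC.
final class StateChunker {
    static List<byte[]> split(byte[] state, int chunkSize) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be > 0");
        List<byte[]> chunks = new ArrayList<>();
        for (int off = 0; off < state.length; off += chunkSize) {
            int len = Math.min(chunkSize, state.length - off);
            byte[] chunk = new byte[len];
            System.arraycopy(state, off, chunk, 0, len);
            chunks.add(chunk);
        }
        return chunks;
    }
}

// Hypothetical joiner-side member: part of the view, but non-operational
// until the full state has been received and reassembled.
final class JoiningMember {
    private final byte[][] received;
    private int missing;
    private byte[] state;
    private volatile boolean operational;

    JoiningMember(int totalChunks) {
        received = new byte[totalChunks][];
        missing = totalChunks;
        if (totalChunks == 0) {           // nothing to transfer
            state = new byte[0];
            operational = true;
        }
    }

    // Called once per incoming chunk; duplicates (e.g. a resent chunk) are ignored.
    synchronized void onChunk(int index, byte[] data) {
        if (operational || received[index] != null) return;
        received[index] = data;
        if (--missing == 0) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            for (byte[] chunk : received) out.write(chunk, 0, chunk.length);
            state = out.toByteArray();
            operational = true;           // only now start serving clients
        }
    }

    boolean isOperational() { return operational; }

    synchronized byte[] state() { return state; }

    String handleClientRequest(String request) {
        if (!operational) throw new IllegalStateException("state transfer in progress");
        return "handled " + request;
    }
}
```

    Since the transfer no longer goes through JChannel.getState(), BARRIER is not closed on the provider; in exchange, the application has to take a consistent snapshot and track completeness itself.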

     
  • Steve Schick

    Steve Schick - 2015-05-26

    Thanks for the quick reply!
    So BARRIER only applies to one node (the state provider) when it is closed? What about the requester? Either way, it's not a "stop the world" barrier on all nodes, right?

    Agreed that queuing up messages on the coordinator for a long time is not desirable.

    Could you point me at a doc/example that shows how to join a member but leave it non-operational?

    Many thanks.

     
  • Bela Ban

    Bela Ban - 2015-05-26

    Yes, BARRIER is only closed on the state provider.
    No, this is not stop-the-world. If you want that, use FLUSH.
    A non-operational member is an application-level concept, e.g. a member that doesn't process client requests yet. It's not something JGroups does.

     
  • Steve Schick

    Steve Schick - 2015-05-26

    Sounds good. Thanks for the help!

     

