
TCPPING port_range > 3 doesn't work ...

Developers
2010-03-10
2012-09-06
  • Ryan Dollard

    Ryan Dollard - 2010-03-10

    Hi,

    We are having an issue using TCP_NIO and the TCPPING discovery protocol in
    JGroups 2.8 GA. We have two servers using the JGroups DistributedHashtable
    class to share cluster state. The DistributedHashtable.Notification method
    contentsSet() (state transfer) is called during server startup ONLY when each
    server's TCPPING "port_range" is set to 3 or less. This was not an issue with
    JGroups 2.6.1. Below is the expected call stack (seen only when
    port_range <= 3):

    SDCluster.contentsSet(Map) line: 1161 <<< implements DistributedHashtable.Notification
    SDDistributedHashtable(DistributedHashtable)._putAll(Map) line: 436
    SDDistributedHashtable(DistributedHashtable).setState(InputStream) line: 691
    MessageDispatcher$ProtocolAdapter.handleUpEvent(Event) line: 787
    MessageDispatcher$ProtocolAdapter.up(Event) line: 849
    JChannel.up(Event) line: 1413
    ProtocolStack.up(Event) line: 829
    STREAMING_STATE_TRANSFER.connectToStateProvider(STREAMING_STATE_TRANSFER$StateHeader) line: 526
    STREAMING_STATE_TRANSFER.handleStateRsp(STREAMING_STATE_TRANSFER$StateHeader) line: 465
    STREAMING_STATE_TRANSFER.up(Event) line: 230
    FRAG2.up(Event) line: 188
    FC.up(Event) line: 470
    VIEW_SYNC.up(Event) line: 173
    GMS.up(Event) line: 890
    AUTH.up(Event) line: 143
    STABLE.up(Event) line: 236
    UNICAST.handleDataReceived(Address, long, long, boolean, Message) line: 582
    UNICAST.up(Event) line: 275
    NAKACK.up(Event) line: 692
    VERIFY_SUSPECT.up(Event) line: 132
    FD.up(Event) line: 259
    FD_SOCK.up(Event) line: 269
    MERGE2(Protocol).up(Event) line: 340
    TCPPING(Discovery).up(Event) line: 277
    TCP_NIO(TP).passMessageUp(Message, boolean, boolean, boolean) line: 953
    TP.access$100(TP, Message, boolean, boolean, boolean) line: 53
    TP$IncomingPacket.handleMyMessage(Message, boolean) line: 1457
    TP$IncomingPacket.run() line: 1439
    ThreadPoolExecutor$Worker.runTask(Runnable) line: 650
    ThreadPoolExecutor$Worker.run() line: 675
    Thread.run() line: 595

    And here is our JGroups protocol configuration (used on each server):

    <TCP_NIO
        bind_port="7800"
        loopback="true"
        discard_incompatible_packets="true"
        max_bundle_size="64000"
        max_bundle_timeout="30"
        enable_bundling="true"
        oob_thread_pool.min_threads="20"
        oob_thread_pool.max_threads="30"
        reader_threads="3"
        writer_threads="3"
        processor_threads="5"
        processor_minThreads="5"
        processor_maxThreads="5"
        processor_queueSize="100"/>
    <TCPPING timeout="5000"
        initial_hosts="139.185.17.80,139.185.17.82"
        port_range="3"
        num_initial_members="2"/>
    <MERGE2 max_interval="100000"
        min_interval="20000"/>
    <FD_SOCK/>
    <FD timeout="20000" max_tries="5"/>
    <VERIFY_SUSPECT timeout="1500"/>
    <pbcast.NAKACK
        max_xmit_size="60000" use_mcast_xmit="false" gc_lag="0"
        retransmit_timeout="100,200,300,600,1200,2400,4800"
        discard_delivered_msgs="true"/>
    <UNICAST timeout="300,600,1200"/>
    <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
        max_bytes="400000"/>
    <pbcast.GMS print_local_addr="false" join_timeout="3000"
        view_bundling="true"/>
    <FC max_credits="2000000"
        min_threshold="0.10"/>
    <FRAG2 frag_size="60000"/>
    <pbcast.STREAMING_STATE_TRANSFER/>

    When using JGroups 2.8 GA, we also repeatedly get the following error
    message (one for each port in the TCPPING port range):

    ERROR failed sending message to 139.185.17.80:7803 (117 bytes):
    java.lang.Exception: connection to 139.185.17.80:7803 could not be established
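    The port in this error (7803) is consistent with TCPPING probing each initial
    host at its base port plus up to port_range additional ports, inclusive, so
    port_range="3" covers 7800 through 7803 on both hosts. The sketch below only
    enumerates those probe targets. probeTargets is a hypothetical helper, not
    part of the JGroups API, and the base port 7800 is an assumption taken from
    bind_port, since the port part of initial_hosts is not shown in the
    configuration above.

```java
import java.util.ArrayList;
import java.util.List;

public class ProbeTargets {

    // Hypothetical helper (not JGroups API): lists the host:port pairs a
    // TCPPING-style discovery would contact, assuming every host uses the same
    // base port and the range is inclusive (basePort .. basePort + portRange).
    static List<String> probeTargets(List<String> hosts, int basePort, int portRange) {
        List<String> targets = new ArrayList<>();
        for (String host : hosts) {
            for (int port = basePort; port <= basePort + portRange; port++) {
                targets.add(host + ":" + port);
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        // Hosts and port_range from the configuration above; 7800 is assumed
        // as the base port for both hosts.
        List<String> hosts = List.of("139.185.17.80", "139.185.17.82");
        List<String> targets = probeTargets(hosts, 7800, 3);
        targets.forEach(System.out::println);
        System.out.println(targets.size() + " probe targets");
    }
}
```

    Run as written, it prints eight addresses, from 139.185.17.80:7800 to
    139.185.17.82:7803, followed by "8 probe targets". With one member per host
    listening on 7800, the other ports presumably have nothing listening, which
    would account for one "could not be established" error per extra port.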

    I also see the following dropped-message warning on one of the servers when
    the second server comes online:

    2010-03-10 10:20:55,496 TRACE message is , headers are MsgDisp: , dest_mbrs=,
    NAKACK: , TCP_NIO:

    2010-03-10 10:20:55,511 TRACE barney-59011: received hollyrock-46328#1

    2010-03-10 10:20:55,511 WARN barney-59011: dropped message from
    hollyrock-46328 (not in xmit_table), keys are , view=

    Any help would be appreciated.

    Thanks,

    Ryan

     
  • Ryan Dollard

    Ryan Dollard - 2010-03-11

    Increasing the TCPPING timeout seems to have resolved the problem.

    TCPPING(timeout=5000;port_range=3;...) -- works

    TCPPING(timeout=10000;port_range=10;...) -- works

    TCPPING(timeout=30000;port_range=20;...) -- works
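    The three working settings above change port_range along with the timeout,
    so they also change how many addresses discovery has to contact. Assuming the
    same two initial hosts and an inclusive port range, that count is
    hosts * (port_range + 1). The sketch below tabulates it for each reported
    configuration; probeCount is a hypothetical helper, and the calculation says
    nothing about how long an unanswered connection attempt takes.

```java
public class DiscoveryLoad {

    // Hypothetical helper: number of host:port pairs probed, assuming an
    // inclusive port range (base .. base + portRange) on every initial host.
    static int probeCount(int hostCount, int portRange) {
        return hostCount * (portRange + 1);
    }

    public static void main(String[] args) {
        int hosts = 2; // the two initial_hosts from the original post
        // {timeout in ms, port_range} pairs reported as working above
        int[][] configs = { {5000, 3}, {10000, 10}, {30000, 20} };
        for (int[] c : configs) {
            System.out.printf("timeout=%d port_range=%d -> %d probe targets%n",
                    c[0], c[1], probeCount(hosts, c[1]));
        }
    }
}
```

    This gives 8, 22 and 42 probe targets respectively, so each longer timeout
    was paired with a larger set of addresses to contact.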

     
  • Ryan Dollard

    Ryan Dollard - 2010-03-11

    Increasing the TCPPING timeout works only if one server is allowed to start
    up completely before the second server is started. If both servers are
    started at almost the same time, the increased TCPPING timeout doesn't seem
    to help: neither server "sees" the other. TCPPING initial host discovery
    seems to have a timeout problem.

     
