[jgroups-dev] TIMED_WAITING threads on JGroup messages makes application wait indefinitely
From: Development i. <jav...@li...> - 2018-05-24 12:35:44
Hi,

Usually during discount/sale periods, the threads responsible for sending JGroups messages end up in TIMED_WAITING state, more precisely at:

    java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x00000005d2968450> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163)
        at org.jgroups.util.CreditMap.decrement(CreditMap.java:157)
        at org.jgroups.protocols.MFC.handleDownMessage(MFC.java:102)

It looks like the nodes cannot keep up with the volume of messages, so not enough credits are replenished between the nodes for new messages to be sent.

During peak periods we run roughly 17-20 AWS EC2 instances in our cluster. One EC2 instance is dedicated to batch processing, and on a few occasions we receive huge files that trigger a big load of messages. We have around 10 nodes serving user traffic and a few more nodes for administration purposes. All of these nodes communicate with each other via JGroups over TCP (at the time we migrated to AWS there was a constraint that only TCP could be used; we are now exploring ways to move to UDP), and we are using JGroups version 3.4.1.

Given the current infrastructure, what would be the recommended JGroups TCP configuration? We feel it is good practice to optimise our configuration.
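For anyone less familiar with credit-based flow control: MFC/UFC make a sender acquire "credits" (bytes) before sending, and receivers replenish credits after processing, so a slow receiver throttles the sender into a timed wait, which is exactly the TIMED_WAITING state in the dump above. A minimal sketch of that mechanism, using a plain `Semaphore` as a stand-in (this is illustrative only, not JGroups code; `CreditSketch` and its values are made up):

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Illustrative sketch of credit-based flow control (not JGroups code).
// Senders acquire credits before sending; receivers release credits back
// once they have processed the messages. When receivers fall behind, the
// sender parks in a timed wait, like CreditMap.decrement() in the dump.
public class CreditSketch {
    static final int MAX_CREDITS = 1000;              // analogous to max_credits
    static final Semaphore credits = new Semaphore(MAX_CREDITS);

    // Sender side: wait up to timeoutMs for enough credits; false = timed out.
    static boolean send(int msgBytes, long timeoutMs) throws InterruptedException {
        return credits.tryAcquire(msgBytes, timeoutMs, TimeUnit.MILLISECONDS);
    }

    // Receiver side: replenish credits after processing msgBytes.
    static void replenish(int msgBytes) {
        credits.release(msgBytes);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(send(800, 10));   // enough credits -> true
        System.out.println(send(800, 10));   // only 200 left -> times out, false
        replenish(800);                      // receiver catches up
        System.out.println(send(800, 10));   // credits restored -> true
    }
}
```

So the blocked threads are a symptom, not the bug: the cluster is telling you the receivers (or the network) cannot absorb the burst, and tuning has to either enlarge the credit pool or speed up the consumers.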
Our configuration is as follows:

<config xmlns="urn:org:jgroups"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/JGroups-3.0.xsd">
    <TCP loopback="true"
         recv_buf_size="${tcp.recv_buf_size:20M}"
         send_buf_size="${tcp.send_buf_size:640K}"
         discard_incompatible_packets="true"
         max_bundle_size="64K"
         max_bundle_timeout="30"
         enable_bundling="true"
         use_send_queues="true"
         sock_conn_timeout="300"
         timer_type="new"
         timer.min_threads="4"
         timer.max_threads="10"
         timer.keep_alive_time="3000"
         timer.queue_max_size="500"
         thread_pool.enabled="true"
         thread_pool.min_threads="10"
         thread_pool.max_threads="40"
         thread_pool.keep_alive_time="5000"
         thread_pool.queue_enabled="false"
         thread_pool.queue_max_size="10000"
         thread_pool.rejection_policy="discard"
         oob_thread_pool.enabled="true"
         oob_thread_pool.min_threads="5"
         oob_thread_pool.max_threads="20"
         oob_thread_pool.keep_alive_time="5000"
         oob_thread_pool.queue_enabled="false"
         oob_thread_pool.queue_max_size="10000"
         oob_thread_pool.rejection_policy="discard"
         bind_addr="${hybris.jgroups.bind_addr}"
         bind_port="${hybris.jgroups.bind_port}" />
    <JDBC_PING connection_driver="${hybris.database.driver}"
               connection_password="${hybris.database.password}"
               connection_username="${hybris.database.user}"
               connection_url="${hybris.database.url}"
               initialize_sql="${hybris.jgroups.schema}"
               datasource_jndi_name="${hybris.datasource.jndi.name}" />
    <MERGE2 min_interval="10000" max_interval="30000" />
    <FD_SOCK />
    <FD timeout="3000" max_tries="3" />
    <VERIFY_SUSPECT timeout="1500" />
    <BARRIER />
    <pbcast.NAKACK use_mcast_xmit="false" exponential_backoff="500" discard_delivered_msgs="true" />
    <UNICAST />
    <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000" max_bytes="4M" />
    <pbcast.GMS print_local_addr="true" join_timeout="3000" view_bundling="true" />
    <UFC max_credits="20M" min_threshold="0.6" />
    <MFC max_credits="20M" min_threshold="0.6" />
    <FRAG2 frag_size="60K" />
    <pbcast.STATE_TRANSFER />
</config>

Based on this configuration, do you have any recommendations for changes that would give us better throughput?

Thanks in advance,
Simeon
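Since the blocking happens in MFC's credit decrement, the settings most directly involved are the `MFC`/`UFC` lines already present in the config above. As a hedged sketch only (the numbers below are illustrative assumptions, not tested recommendations; they must be sized against heap usage and actual message rates), the usual direction of adjustment is to enlarge the credit pool so senders have more headroom before they park:

```xml
<!-- Illustrative values only: a larger max_credits pool lets senders
     burst further before blocking in CreditMap.decrement(); it does not
     fix a receiver that is persistently slower than the sender. -->
<UFC max_credits="40M" min_threshold="0.6" />
<MFC max_credits="40M" min_threshold="0.6" />
```

Note this only buys headroom for bursts (like the batch node's huge files); if consumers are chronically slower than producers, larger credits just delay the same stall.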