Re: [javagroups-users] "Golden" configuration for ISPN 5.1 (JGroups 3.x)
From: Bela B. <be...@ya...> - 2011-12-09 06:45:12
On 12/8/11 5:34 PM, Erik Salter wrote:
> Hi all,
>
> We are migrating from ISPN 5.0.x to ISPN 5.1.x, which effects a move from
> JGroups 2.x to 3.x. Since I think this would be beneficial to all users
> who need to migrate and have a "quick-start" configuration, I'm sending it
> to this list.

Hmm, maybe you should have posted this to the Infinispan list instead.
Anyway, I reply here and you can cross-post to ispn-dev if you want...
and maybe we should continue there.

> Currently, our configuration is a TCP-based stack that we're using for a
> 6-12 distributed node cluster initially, scaling linearly. We have two
> classifications of caches:
>
> - Non-transactional caches keyed by unique values. These record sizes
>   are typically larger. A write operation will write ~10K of data to
>   the grid for each owner.
>
> - Caches that have high lock contention. These will write ~7K of data
>   to the grid for each owner. These caches also use distributed
>   executor tasks that serialize the request (~20K) to be passed to the
>   key's data owner.
>
> The data grid nodes live behind a round-robin load balancer. We are trying
> to push the data grid as fast as it will go, which seems to be a total
> throughput of 120 writes/sec on a 6 node cluster.

- Are these writes/sec *per node*?
- Does the load balancer hit every node (I assume so)?
- What's the message size? 10K (non-transactional) or 20K/7K (transactional)?

120 writes/sec is bad, even if it is per node! Such a low number could
only occur if you hit the same key(s) on different nodes in a
*transactional* cache; TX collisions and subsequent rollbacks could
probably get you such a low write rate.

I ran some tests in our lab yesterday with 9 nodes, see my email below
for reference:

================================ forwarded email =========================

- To start the test, run "jt UnicastTestRpcDist -props /home/bela/fast.xml
  -name A (-I)" on 9 nodes. I call the members A-I.
- They should find each other. Note that 'jt' sets some JVM options and a
  max mem of 500m (which is not that much!). I could probably get some
  more perf out of this if I tuned the options better (e.g.
  ConcurrentMarkSweep for the old gen, or use of the G1 collector). I'm
  also using JDK 1.6 build 23, which doesn't use CCMS or compressed
  pointers (this is done automatically starting with build 29).
- fast.xml sets mcast_addr to 232.x.x.x, which makes JGroups use eth1.
- Once this is done, go to 1 node and press '1'.
- Do this a couple of times, to warm the cluster up.
- You can also change the read/write ratio, message size, number of
  messages to send, num-owners etc.
- I used JGroups 3.1.0.Alpha1 (master): /home/bela/JGroups

Here are some numbers (cluster size is 9 nodes: cluster01-09, number of
messages=20000 and anycast count (num-owners)=2; these are the defaults):

Message size:   Avg. message rate/sec/node:   Avg. throughput/sec/node:
 1000           19'000                         19.0MB
 2000           19'000                         38.0MB
 4000           18'400                         73.9MB
 8000           17'000                        135.0MB
16000           10'500                        168.0MB
32000            5'200                        167.0MB

Note that this is like Infinispan: a GET carries no payload but returns
the payload (e.g. 1000 bytes), and a PUT carries a payload and returns
nothing. Also, there is no L1 cache enabled (or anything similar).
Members are allowed to pick themselves for GETs and PUTs; that's why
we're getting throughputs of over 125MB/sec.

These numbers should be the baseline against which Infinispan can be
compared. As UnicastTestRpcDist doesn't do any real work (e.g. acquire
locks, place values in a hashmap, access cache loaders etc), it should
always be faster, say by 30%. But the difference in performance should
stay roughly constant, and not change for different cluster sizes,
message sizes etc.

============================== end of forwarded email =======================

As you can see, we get 10'500 messages/sec/node (168MB/sec/node) for a
payload of 16K. This is with a read/write ratio of 0.8; when I change
this to 0.20 (80% writes), I still get 3'400 messages/sec/node
(55MB/sec/node).

You could try to run UnicastTestRpcDist in your perf lab and see if you
get similar numbers. Take your existing config, then take my suggested
config, see which one gets you better results, and use that one.

> I've attached a sample configuration file for the 3.x version.

My preference is udp.xml or udp-largecluster.xml (both are shipped with
JGroups). A few comments regarding your 3.x config (a sketch applying
them follows the list):

- I recommend switching to UDP/PING
- The thread pools in the transport have min sizes which are too big.
  Also, the rejection policies are "run", which is something I don't
  recommend. I also recommend a queue for the default thread pool
- Use MERGE3 instead of MERGE2
- FD: increase the timeouts, or else you'll get false suspicions
  (consult the wiki for more details re FD versus FD_SOCK)
- NAKACK: use exponential_backoff, this saves memory
- UNICAST --> UNICAST2
- What's GMS doing *under* STABLE????
- GMS: the merge_timeout of 600s is too high
- STABLE should have max_bytes set
- FC --> MFC/UFC
- If you use STREAMING_STATE_TRANSFER, you should have BARRIER in your
  config! Note that SST doesn't exist anymore in 3.x; it's now called
  STATE or STATE_SOCK
- Remove FLUSH (I don't think you need it, based on our IRC conversation)
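To make this concrete, here is a rough sketch of a udp.xml-style stack
with the above applied. The attribute values are illustrative starting
points close to the stock 3.x defaults, not tuned recommendations for
this particular workload; start from the shipped udp.xml and diff
against it:

<config xmlns="urn:org:jgroups">
    <!-- Transport: smaller min pool sizes, a queue for the default pool,
         and "discard" instead of "run" as rejection policy -->
    <UDP mcast_port="${jgroups.udp.mcast_port:45588}"
         thread_pool.enabled="true"
         thread_pool.min_threads="2"
         thread_pool.max_threads="8"
         thread_pool.keep_alive_time="60000"
         thread_pool.queue_enabled="true"
         thread_pool.queue_max_size="10000"
         thread_pool.rejection_policy="discard"
         oob_thread_pool.enabled="true"
         oob_thread_pool.min_threads="2"
         oob_thread_pool.max_threads="8"
         oob_thread_pool.queue_enabled="false"
         oob_thread_pool.rejection_policy="discard"/>
    <PING/>
    <!-- MERGE3 instead of MERGE2 -->
    <MERGE3 min_interval="10000" max_interval="30000"/>
    <FD_SOCK/>
    <!-- FD with higher timeouts, to avoid false suspicions -->
    <FD timeout="6000" max_tries="5"/>
    <VERIFY_SUSPECT timeout="1500"/>
    <!-- BARRIER is needed for state transfer -->
    <BARRIER/>
    <!-- exponential_backoff saves memory -->
    <pbcast.NAKACK exponential_backoff="300" use_mcast_xmit="true"/>
    <!-- UNICAST2 instead of UNICAST -->
    <UNICAST2 stable_interval="5000" max_bytes="1M"/>
    <!-- STABLE with max_bytes set; GMS sits *above* it in the stack -->
    <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
                   max_bytes="4M"/>
    <pbcast.GMS print_local_addr="true" join_timeout="3000"
                merge_timeout="5000" view_bundling="true"/>
    <!-- MFC/UFC replace FC -->
    <UFC max_credits="2M" min_threshold="0.4"/>
    <MFC max_credits="2M" min_threshold="0.4"/>
    <FRAG2 frag_size="60000"/>
    <!-- STREAMING_STATE_TRANSFER is now STATE (or STATE_SOCK) -->
    <pbcast.STATE/>
</config>

Whether FD/FD_SOCK or FD_ALL, and STATE versus STATE_SOCK, is the better
fit depends on your environment; the point of the sketch is only to show
where each of the suggestions above lands in the stack.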
Again, I suggest copying one of udp.xml, udp-largecluster.xml or tcp.xml
(if you must!), and using it with minor changes.

--
Bela Ban
Lead JGroups (http://www.jgroups.org)
JBoss / Red Hat