Re: [javagroups-users] Difference between defaults in code vs conf files.
From: Robert N. <rob...@gm...> - 2007-11-27 17:06:38
On Nov 27, 2007 4:57 PM, Bela Ban <be...@ya...> wrote:
>
> Robert Newson wrote:
> > I'm wrestling with inconsistent state during concurrent startup,
> > taking the defaults seemed to help, but it's bad either way. Since
> > there have been a number of bugs around state transfer and concurrent
> > startup in 2.6, I guess I need 2.6.1 or to rollback to 2.5.1.
>
> Wait ! We have a number of unit tests which test concurrent startup &
> state transfer, and they've been passing for quite some time now...

Yes, that's the odd thing. It often fails in my lab on multiple nodes.
A unit test is interesting, but presumably you also have integration
tests to prove it works when there are multiple machines, NICs, etc.
in between?

> So do you use the new JOIN&STATE_Transfer API (connect()) of 2.6 ?

If you mean channel.connect(name, null, null, 0); then yes.

> > I'm now based on udp-sync from stacks.xml since I mostly care about
> > the synchronous calls (though it seems slower than the udp stack,
> > which is expected).
>
> You mean for your RpcDispatcher cluster method calls I assume ?
> Otherwise "udp-sync" is not slower than "udp"; it just doesn't have any
> flow control, but that's it.
>
> > btw I have used a custom Map implementation that unicasts to the
> > coordinator who then multicasts back to the group.
>
> To achieve total ordering ? Sounds like SEQUENCER...

Yes, to achieve total ordering. This was mostly an exercise to find out
whether JGroups had a bug or I had a bad config, etc. Since the map as
described worked every time, I concluded there was something wrong with
RpcDispatcher or the Multiplexer.

> > This performed
> > correctly whenever I tried it but I got pushback from my team as the
> > mutating events are no longer synchronous (a small price to pay for
> > correctness, you might argue...).
> >
> > Should I be attempting to use 2.6 GA in a production environment or 2.5.1?
>
> Well, again, if we don't know what the problems are, we can't fix them !
> So if you think you've found a bug in JGroups, let us know.

I believe I have found bugs in JGroups 2.6 and have been trying to
isolate them so that you (or Vlad) can reproduce them. As you can see,
I have reported several bugs already. :)

> Note that we made quite a number of changes (bug fixes and
> simplifications) to FLUSH in 2.6/2.6.1, so this is preferred over 2.5.x.

Since I'm having so many problems with 2.6 using the out-of-the-box
configuration, I can only conclude that either I'm doing something wrong
(maybe everyone knows that the out-of-the-box configuration is broken
and they always change it?) or concurrent startup for multiple maps over
the Multiplexer doesn't work reliably.

My tests show that it succeeds some of the time (everyone eventually
gets a joined-up view and gets all the state correct), but it fails
often, with lots of different errors (and lots of WARN and ERROR level
log output from JGroups). Often the views never get joined up, as if the
constant changes to the map (every node is updating the map every few
seconds) prevent or inhibit view installation (even though I have
pbcast.FLUSH timeout=0).

Frankly, I have no idea why it's not working for me, but there's no
doubt that it's not working.

> --
> Bela Ban
> Lead JGroups / Clustering Team
> JBoss - a division of Red Hat
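
For anyone following along, the coordinator-relay map mentioned above (members unicast an update to the coordinator, which then multicasts it back to the group) can be sketched roughly as below. This is only an in-memory illustration of the ordering idea and is not the actual map from this thread: it uses no JGroups classes at all, and `SequencedMapSketch`, `Node`, `Coordinator`, `submit` and `deliver` are hypothetical names made up for the example. The "unicast" and "multicast" are plain method calls here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** In-memory sketch of coordinator-sequenced map updates (no JGroups involved). */
public class SequencedMapSketch {

    /** One replica of the map; applies updates strictly in sequence-number order. */
    static final class Node {
        final String name;
        final Map<String, String> state = new LinkedHashMap<>();
        long lastApplied = 0;

        Node(String name) {
            this.name = name;
        }

        void deliver(long seq, String key, String value) {
            // Reject gaps or reordering so every replica sees the same history.
            if (seq != lastApplied + 1) {
                throw new IllegalStateException(
                        name + " expected seq " + (lastApplied + 1) + " but got " + seq);
            }
            state.put(key, value);
            lastApplied = seq;
        }
    }

    /** Stand-in for the coordinator: stamps each update and relays it to every member. */
    static final class Coordinator {
        private final List<Node> members = new ArrayList<>();
        private long nextSeq = 1;

        void join(Node n) {
            members.add(n);
        }

        // A member's "unicast" arrives here; the loop plays the role of the "multicast".
        synchronized void submit(String key, String value) {
            long seq = nextSeq++;
            for (Node n : members) {
                n.deliver(seq, key, value);
            }
        }
    }

    public static void main(String[] args) {
        Coordinator coord = new Coordinator();
        Node a = new Node("A");
        Node b = new Node("B");
        coord.join(a);
        coord.join(b);

        // Two conflicting writes to the same key; the coordinator decides the order.
        coord.submit("k", "from-A");
        coord.submit("k", "from-B");

        System.out.println("replicas agree: " + a.state.equals(b.state));
        System.out.println("k = " + a.state.get("k"));
    }
}
```

Because every update is stamped by a single node, all replicas apply conflicting writes in the same order. In a real cluster the sender would not block until every member has applied the update, which is the loss of synchronous behaviour that the pushback was about; the sketch hides that because everything runs in one thread.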