[javagroups-users] simultaneous startup of group members
Brought to you by:
belaban
From: Kurmann R. <rk...@el...> - 2002-09-11 16:33:15
|
Hi We want to simultaneously start up group members in our project. But the results are not as expected. A reproduction with Draw showed the same problems: Test with Draw: I start two Draw programs (say Draw A and Draw B) right after each other (delay of less than 1 sec). 1) These two programs don't find each other in the first view (see traces). Is this behaviour expected/correct? 2) After some time the MERGE2 protocol merges these two programs into one group, but the GMS protocol throws an error (see traces). I'm using Draw of Javagroups 2.0.2 with the default protocol stack. I ran the tests with Sun JDK 1.3.1 on a Windows 2000 PC. Do you have any ideas? Draw A, short trace: ~~~~~~~~~~~~~~~~~~~~ ------------------------------------------------------- GMS: address is PCRKU:3964 ------------------------------------------------------- ** View=[PCRKU:3964|0] [PCRKU:3964] ** View=MergeView::[PCRKU:3964|1] [PCRKU:3964, PCRKU:3966], subgroups=[[PCRKU:3 966|0] [PCRKU:3966],[PCRKU:3964|0] [PCRKU:3964]] [16:45:10[968]] [ERROR] GMS.installView(): received view <= current view; disca rding it ! (current vid: [PCRKU:3964|1], new vid: [PCRKU:3964|1]) Draw B, short trace: ~~~~~~~~~~~~~~~~~~~~ ------------------------------------------------------- GMS: address is PCRKU:3964 ------------------------------------------------------- ** View=[PCRKU:3964|0] [PCRKU:3964] ** View=MergeView::[PCRKU:3964|1] [PCRKU:3964, PCRKU:3966], subgroups=[[PCRKU:3 966|0] [PCRKU:3966],[PCRKU:3964|0] [PCRKU:3964]] [16:45:10[968]] [ERROR] GMS.installView(): received view <= current view; disca rding it ! (current vid: [PCRKU:3964|1], new vid: [PCRKU:3964|1]) Draw A, debug trace: ~~~~~~~~~~~~~~~~~~~~ [17:27:43[031]] [INFO] GMS.setImpl(): changed role to org.javagroups.protocols.p bcast.ClientGmsImpl [17:27:43[046]] [INFO] GMS.becomeClient: null became client [17:27:43[078]] [INFO] UDP.startWork(): creating sockets and starting threads [17:27:43[218]] [INFO] UDP.createSockets(): all sockets will use interface 10.40 .20.115 [17:27:43[281]] [INFO] UDP.createSockets(): socket information: local_addr=PCRKU:4034, mcast_addr=228.8.8.8:45566, ttl=32 send socket: bound to 10.40.20.115:4033, send buffer size=32000 receive socket: bound to 10.40.20.115:4034, receive buffer size=64000 multicast socket: bound to 10.40.20.115:45566, send buffer size=150000, receive buffer size=80000 [17:27:43[281]] [INFO] UDP.startThreads(): created unicast receiver thread ------------------------------------------------------- GMS: address is PCRKU:4034 ------------------------------------------------------- [17:27:43[359]] [INFO] PING.down(): FIND_INITIAL_MBRS [17:27:43[359]] [INFO] FD_SOCK.run(): determinePingDest()=null, pingable_mbrs=[] [17:27:43[359]] [INFO] FD_SOCK.run(): pinger thread terminated [17:27:45[375]] [INFO] PING.down(): initial mbrs are [] [17:27:45[390]] [DEBUG] ClientGmsImpl.join(): initial_mbrs are [] [17:27:45[390]] [INFO] ClientGmsImpl.join(): no initial members discovered: crea ting group as first member [17:27:45[406]] [INFO] GMS.installView(): view is [PCRKU:4034|0] [PCRKU:4034] [17:27:45[406]] [INFO] GMS.setImpl(): changed role to org.javagroups.protocols.p bcast.CoordGmsImpl [17:27:45[421]] [INFO] GMS.becomeCoordinator(): PCRKU:4034 became coordinator [17:27:45[421]] [INFO] ClientGmsImpl.becomeSingletonMember(): created group (fir st member). My view is [PCRKU:4034|0], impl is org.javagroups.protocols.pbcast.C oordGmsImpl [17:27:45[421]] [INFO] STABLE.startStableTask(): stable task started; num_gossip _runs=3, max_gossip_runs=3 [17:27:45[437]] [INFO] FD_SOCK.down(): VIEW_CHANGE received: [PCRKU:4034] [17:27:45[437]] [INFO] FD_SOCK.getCacheFromCoordinator(): first member; cache is empty [17:27:45[484]] [INFO] FD_SOCK.up(): i-have-sock: PCRKU:4034 --> PCRKU:10000 (ca che is {PCRKU:4034=PCRKU:10000}) ** View=[PCRKU:4034|0] [PCRKU:4034] [17:27:46[296]] [INFO] FD_SOCK.up(): i-have-sock: PCRKU:4036 --> PCRKU:10001 (ca che is {PCRKU:4034=PCRKU:10000, PCRKU:4036=PCRKU:10001}) [17:27:52[140]] [INFO] PING.up(): received GET_MBRS_REQ from PCRKU:4036, returni ng [PING: type=GET_MBRS_RSP, arg=[own_addr=PCRKU:4034, coord_addr=PCRKU:4034]] [17:27:54[000]] [INFO] PING.down(): FIND_INITIAL_MBRS [17:27:54[000]] [INFO] PING.up(): received GET_MBRS_REQ from PCRKU:4034, returni ng [PING: type=GET_MBRS_RSP, arg=[own_addr=PCRKU:4034, coord_addr=PCRKU:4034]] [17:27:54[015]] [INFO] PING.up(): received FIND_INITAL_MBRS_RSP, rsp=[own_addr=P CRKU:4036, coord_addr=PCRKU:4036] [17:27:54[015]] [INFO] PING.up(): received FIND_INITAL_MBRS_RSP, rsp=[own_addr=P CRKU:4034, coord_addr=PCRKU:4034] [17:27:55[421]] [INFO] STABLE.sendStableMessage(): mcasting digest [PCRKU:4034: [0 : 0] [17:27:56[015]] [INFO] PING.down(): initial mbrs are [[own_addr=PCRKU:4036, coor d_addr=PCRKU:4036], [own_addr=PCRKU:4034, coord_addr=PCRKU:4034]] [17:27:56[031]] [INFO] MERGE2.FindSubgroups.run(): initial_mbrs=[[own_addr=PCRKU :4036, coord_addr=PCRKU:4036], [own_addr=PCRKU:4034, coord_addr=PCRKU:4034]] [17:27:56[031]] [INFO] MERGE2.FindSubgroups.run(): found multiple coordinators: [PCRKU:4036, PCRKU:4034]; sending up MERGE event [17:27:56[031]] [INFO] CoordGmsImpl.merge(): coordinators in merge protocol are: [PCRKU:4034, PCRKU:4036] [17:27:56[031]] [INFO] CoordGmsImpl.merge(): I will be the leader. Starting the merge task [17:27:56[031]] [INFO] STABLE.handleStableGossip(): received digest PCRKU:4034#0 (-1) from PCRKU:4034 [17:27:56[031]] [INFO] STABLE.handleStableGossip(): sending stability msg PCRKU: 4034#0 (-1) [17:27:56[031]] [INFO] STABLE.sendStabilityMessage(): stability_task=null, delay is 5093 [17:27:56[046]] [INFO] CoordGmsImpl.MergeTask.run(): merge task started [17:27:56[046]] [INFO] CoordGmsImpl.getMergeDataFromSubgroupCoordinators(): send ing MERGE_REQ to [PCRKU:4036, PCRKU:4034] [17:27:56[062]] [INFO] CoordGmsImpl.handleMergeRequest(): sender=PCRKU:4034, mer ge_id=[PCRKU:4034|1031758076046] [17:27:56[062]] [INFO] CoordGmsImpl.sendMergeResponse(): response=GmsHeader[MERG E_RSP]: view=[PCRKU:4034|0] [PCRKU:4034], digest=[PCRKU:4034: [0 : 0], merge_rej ected=false, merge_id=[PCRKU:4034|1031758076046] [17:27:56[078]] [INFO] CoordGmsImpl.getMergeDataFromSubgroupCoordinators(): wait ing for 10000 msecs for merge responses [17:27:56[078]] [INFO] CoordGmsImpl.getMergeDataFromSubgroupCoordinators(): rece ived response: num_rsps_expected=2, actual responses=1 [17:27:56[078]] [INFO] CoordGmsImpl.getMergeDataFromSubgroupCoordinators(): wait ing for 10000 msecs for merge responses [17:27:56[109]] [INFO] CoordGmsImpl.getMergeDataFromSubgroupCoordinators(): rece ived response: num_rsps_expected=2, actual responses=2 [17:27:56[109]] [INFO] CoordGmsImpl.consolidateMergeData(): merge data is sender =PCRKU:4034, view=[PCRKU:4034|0] [PCRKU:4034], digest=[PCRKU:4034: [0 : 0] [17:27:56[109]] [INFO] CoordGmsImpl.consolidateMergeData(): merge data is sender =PCRKU:4036, view=[PCRKU:4036|0] [PCRKU:4036], digest=[PCRKU:4036: [0 : 0] [17:27:56[109]] [INFO] CoordGmsImpl.consolidateMergeData(): new merged view will be MergeView::[PCRKU:4034|1] [PCRKU:4034, PCRKU:4036], subgroups=[[PCRKU:4034|0 ] [PCRKU:4034], [PCRKU:4036|0] [PCRKU:4036]] [17:27:56[109]] [INFO] CoordGmsImpl.consolidateMergeData(): consolidated digest= [PCRKU:4034: [0 : 0, PCRKU:4036: [0 : 0] [17:27:56[109]] [INFO] CoordGmsImpl.MergeTask.run(): merge task terminated [17:27:56[125]] [INFO] GMS.castViewChange(): mcasting view {MergeView::[PCRKU:40 34|1] [PCRKU:4034, PCRKU:4036], subgroups=[[PCRKU:4034|0] [PCRKU:4034], [PCRKU:4 036|0] [PCRKU:4036]]} (2 mbrs) [17:27:56[125]] [INFO] NAKACK.send(): sending msg #0 [17:27:56[125]] [INFO] CoordGmsImpl.handleViewChange(): view=MergeView::[PCRKU:4 034|1] [PCRKU:4034, PCRKU:4036], subgroups=[[PCRKU:4034|0] [PCRKU:4034], [PCRKU: 4036|0] [PCRKU:4036]] [17:27:56[140]] [INFO] GMS.installView(): view is MergeView::[PCRKU:4034|1] [PCR KU:4034, PCRKU:4036], subgroups=[[PCRKU:4034|0] [PCRKU:4034], [PCRKU:4036|0] [PC RKU:4036]] [17:27:56[140]] [INFO] FD_SOCK.down(): VIEW_CHANGE received: [PCRKU:4034, PCRKU: 4036] ** View=MergeView::[PCRKU:4034|1] [PCRKU:4034, PCRKU:4036], subgroups=[[PCRKU:40 34|0] [PCRKU:4034], [PCRKU:4036|0] [PCRKU:4036]] [17:27:56[218]] [INFO] FD_SOCK.run(): determinePingDest()=PCRKU:4036, pingable_m brs=[PCRKU:4034, PCRKU:4036] [17:27:56[218]] [INFO] FD_SOCK.run(): ping_dest=PCRKU:4036, ping_sock=Socket[add r=PCRKU.elca.ch/10.40.20.115,port=10001,localport=4037], cache={PCRKU:4034=PCRKU :10000, PCRKU:4036=PCRKU:10001} [17:27:56[234]] [INFO] CoordGmsImpl.handleViewChange(): view=MergeView::[PCRKU:4 034|1] [PCRKU:4034, PCRKU:4036], subgroups=[[PCRKU:4034|0] [PCRKU:4034], [PCRKU: 4036|0] [PCRKU:4036]] [17:27:56[234]] [ERROR] GMS.installView(): received view <= current view; discar ding it ! (current vid: [PCRKU:4034|1], new vid: [PCRKU:4034|1]) [17:27:56[265]] [INFO] STABLE.handleStableGossip(): received digest PCRKU:4034#0 (-1), PCRKU:4036#0 (0) from PCRKU:4036 [17:27:56[265]] [INFO] STABLE.handleStableGossip(): already received gossip from PCRKU:4036 Draw B, debug trace: ~~~~~~~~~~~~~~~~~~~~ [17:27:44[125]] [INFO] GMS.setImpl(): changed role to org.javagroups.protocols.p bcast.ClientGmsImpl [17:27:44[125]] [INFO] GMS.becomeClient: null became client [17:27:44[140]] [INFO] UDP.startWork(): creating sockets and starting threads [17:27:44[187]] [INFO] UDP.createSockets(): all sockets will use interface 10.40 .20.115 [17:27:44[203]] [INFO] UDP.createSockets(): socket information: local_addr=PCRKU:4036, mcast_addr=228.8.8.8:45566, ttl=32 send socket: bound to 10.40.20.115:4035, send buffer size=32000 receive socket: bound to 10.40.20.115:4036, receive buffer size=64000 multicast socket: bound to 10.40.20.115:45566, send buffer size=150000, receive buffer size=80000 [17:27:44[203]] [INFO] UDP.startThreads(): created unicast receiver thread ------------------------------------------------------- GMS: address is PCRKU:4036 ------------------------------------------------------- [17:27:44[218]] [INFO] PING.down(): FIND_INITIAL_MBRS [17:27:44[218]] [INFO] FD_SOCK.run(): determinePingDest()=null, pingable_mbrs=[] [17:27:44[218]] [INFO] FD_SOCK.run(): pinger thread terminated [17:27:45[468]] [INFO] FD_SOCK.up(): i-have-sock: PCRKU:4034 --> PCRKU:10000 (ca che is {PCRKU:4034=PCRKU:10000}) [17:27:46[234]] [INFO] PING.down(): initial mbrs are [] [17:27:46[234]] [DEBUG] ClientGmsImpl.join(): initial_mbrs are [] [17:27:46[234]] [INFO] ClientGmsImpl.join(): no initial members discovered: crea ting group as first member [17:27:46[250]] [INFO] GMS.installView(): view is [PCRKU:4036|0] [PCRKU:4036] [17:27:46[250]] [INFO] GMS.setImpl(): changed role to org.javagroups.protocols.p bcast.CoordGmsImpl [17:27:46[250]] [INFO] GMS.becomeCoordinator(): PCRKU:4036 became coordinator [17:27:46[250]] [INFO] ClientGmsImpl.becomeSingletonMember(): created group (fir st member). My view is [PCRKU:4036|0], impl is org.javagroups.protocols.pbcast.C oordGmsImpl [17:27:46[265]] [INFO] STABLE.startStableTask(): stable task started; num_gossip _runs=3, max_gossip_runs=3 [17:27:46[296]] [INFO] FD_SOCK.down(): VIEW_CHANGE received: [PCRKU:4036] [17:27:46[296]] [INFO] FD_SOCK.getCacheFromCoordinator(): first member; cache is empty [17:27:46[296]] [INFO] FD_SOCK.up(): i-have-sock: PCRKU:4036 --> PCRKU:10001 (ca che is {PCRKU:4034=PCRKU:10000, PCRKU:4036=PCRKU:10001}) ** View=[PCRKU:4036|0] [PCRKU:4036] [17:27:52[140]] [INFO] PING.down(): FIND_INITIAL_MBRS [17:27:52[140]] [INFO] PING.up(): received GET_MBRS_REQ from PCRKU:4036, returni ng [PING: type=GET_MBRS_RSP, arg=[own_addr=PCRKU:4036, coord_addr=PCRKU:4036]] [17:27:52[156]] [INFO] PING.up(): received FIND_INITAL_MBRS_RSP, rsp=[own_addr=P CRKU:4036, coord_addr=PCRKU:4036] [17:27:52[171]] [INFO] PING.up(): received FIND_INITAL_MBRS_RSP, rsp=[own_addr=P CRKU:4034, coord_addr=PCRKU:4034] [17:27:54[000]] [INFO] PING.up(): received GET_MBRS_REQ from PCRKU:4034, returni ng [PING: type=GET_MBRS_RSP, arg=[own_addr=PCRKU:4036, coord_addr=PCRKU:4036]] [17:27:54[125]] [INFO] PING.down(): initial mbrs are [[own_addr=PCRKU:4036, coor d_addr=PCRKU:4036], [own_addr=PCRKU:4034, coord_addr=PCRKU:4034]] [17:27:54[125]] [INFO] MERGE2.FindSubgroups.run(): initial_mbrs=[[own_addr=PCRKU :4036, coord_addr=PCRKU:4036], [own_addr=PCRKU:4034, coord_addr=PCRKU:4034]] [17:27:54[125]] [INFO] MERGE2.FindSubgroups.run(): found multiple coordinators: [PCRKU:4036, PCRKU:4034]; sending up MERGE event [17:27:54[125]] [INFO] CoordGmsImpl.merge(): coordinators in merge protocol are: [PCRKU:4034, PCRKU:4036] [17:27:56[046]] [INFO] STABLE.handleStableGossip(): received digest PCRKU:4034#0 (-1) from PCRKU:4034 [17:27:56[078]] [INFO] STABLE.handleStableGossip(): already received gossip from PCRKU:4034 [17:27:56[078]] [INFO] CoordGmsImpl.handleMergeRequest(): sender=PCRKU:4034, mer ge_id=[PCRKU:4034|1031758076046] [17:27:56[078]] [INFO] CoordGmsImpl.sendMergeResponse(): response=GmsHeader[MERG E_RSP]: view=[PCRKU:4036|0] [PCRKU:4036], digest=[PCRKU:4036: [0 : 0], merge_rej ected=false, merge_id=[PCRKU:4034|1031758076046] [17:27:56[203]] [WARN] NAKACK.handleMessage(): [PCRKU:4036] discarded message fr om non-member PCRKU:4034 [17:27:56[218]] [INFO] GMS.castViewChange(): mcasting view {MergeView::[PCRKU:40 34|1] [PCRKU:4034, PCRKU:4036], subgroups=[[PCRKU:4034|0] [PCRKU:4034], [PCRKU:4 036|0] [PCRKU:4036]]} (2 mbrs) [17:27:56[218]] [INFO] NAKACK.send(): sending msg #0 [17:27:56[234]] [INFO] CoordGmsImpl.handleViewChange(): view=MergeView::[PCRKU:4 034|1] [PCRKU:4034, PCRKU:4036], subgroups=[[PCRKU:4034|0] [PCRKU:4034], [PCRKU: 4036|0] [PCRKU:4036]] [17:27:56[234]] [INFO] GMS.installView(): view is MergeView::[PCRKU:4034|1] [PCR KU:4034, PCRKU:4036], subgroups=[[PCRKU:4034|0] [PCRKU:4034], [PCRKU:4036|0] [PC RKU:4036]] [17:27:56[234]] [INFO] GMS.setImpl(): changed role to org.javagroups.protocols.p bcast.ParticipantGmsImpl ** View=MergeView::[PCRKU:4034|1] [PCRKU:4034, PCRKU:4036], subgroups=[[PCRKU:40 34|0] [PCRKU:4034], [PCRKU:4036|0] [PCRKU:4036]] [17:27:56[250]] [INFO] GMS.becomeParticipant(): PCRKU:4036 became participant [17:27:56[250]] [INFO] FD_SOCK.down(): VIEW_CHANGE received: [PCRKU:4034, PCRKU: 4036] [17:27:56[250]] [INFO] FD_SOCK.run(): determinePingDest()=PCRKU:4034, pingable_m brs=[PCRKU:4034, PCRKU:4036] [17:27:56[265]] [INFO] STABLE.sendStableMessage(): mcasting digest [PCRKU:4034: [0 : 0, PCRKU:4036: [0 : 0 (0)]] [17:27:56[265]] [INFO] STABLE.handleStableGossip(): received digest PCRKU:4034#0 (-1), PCRKU:4036#0 (0) from PCRKU:4036 [17:27:56[562]] [INFO] FD_SOCK.up(): i-have-sock: PCRKU:4034 --> PCRKU:10000 (ca che is {PCRKU:4034=PCRKU:10000, PCRKU:4036=PCRKU:10001}) [17:27:56[562]] [INFO] FD_SOCK.run(): ping_dest=PCRKU:4034, ping_sock=Socket[add r=PCRKU.elca.ch/10.40.20.115,port=10000,localport=4038], cache={PCRKU:4034=PCRKU :10000, PCRKU:4036=PCRKU:10001} |