From: SourceForge.net <no...@so...> - 2008-03-10 21:42:08
|
Patches item #1882928, was opened at 2008-01-30 11:52
Message generated for change (Comment added) made by breed
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1882928&group_id=209147

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: java client
Group: None
>Status: Closed
>Resolution: Accepted
Priority: 5
Private: No
Submitted By: Benjamin Reed (breed)
Assigned to: Nobody/Anonymous (nobody)
Summary: Log the uncaught exceptions from the SendThread and EventThr

Initial Comment:
If the SendThread or EventThread exits unexpectedly due to some runtime error, we should log it for post-mortem purposes. This patch also widens ZooLog to take Throwable rather than just Exception.

----------------------------------------------------------------------

>Comment By: Benjamin Reed (breed)
Date: 2008-03-10 14:42

Message:
Logged In: YES
user_id=154690
Originator: YES

Committed revision 110.

----------------------------------------------------------------------

Comment By: Mahadev Konar (mahadevkonar)
Date: 2008-03-10 13:40

Message:
Logged In: YES
user_id=1926680
Originator: NO

+1 make it happen!

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1882928&group_id=209147
|
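For readers who do not have the patch at hand, the idea in item #1882928 can be sketched roughly as follows. This is only an illustration, not the committed code: the class `UncaughtLoggingSketch`, the in-memory `LOG` list, and the helpers `logException` and `startLogged` are hypothetical stand-ins, and the real patch routes the failure through ZooLog instead.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for the client's worker threads; not the ZooKeeper source.
class UncaughtLoggingSketch {
    // Stand-in for the project's log; entries are kept in memory for the demo.
    static final List<String> LOG = new CopyOnWriteArrayList<>();

    // Accepts Throwable rather than Exception so that Errors are recorded too.
    static void logException(String threadName, Throwable t) {
        LOG.add(threadName + " exited unexpectedly: " + t);
    }

    // Starts a named thread whose uncaught failures are routed to logException.
    static Thread startLogged(String name, Runnable body) {
        Thread thread = new Thread(body, name);
        thread.setUncaughtExceptionHandler((th, err) -> logException(th.getName(), err));
        thread.start();
        return thread;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t = startLogged("SendThread", () -> {
            throw new IllegalStateException("simulated runtime error");
        });
        t.join(); // the handler runs on the dying thread, so it has finished by now
        LOG.forEach(System.out::println);
    }
}
```

Taking `Throwable` in the logging method matters because an `Error` such as `OutOfMemoryError` is not an `Exception` and would otherwise not fit the logger's signature.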
From: SourceForge.net <no...@so...> - 2008-03-10 20:40:20
|
Patches item #1882928, was opened at 2008-01-30 19:52
Message generated for change (Comment added) made by mahadevkonar
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1882928&group_id=209147

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: java client
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Benjamin Reed (breed)
Assigned to: Nobody/Anonymous (nobody)
Summary: Log the uncaught exceptions from the SendThread and EventThr

Initial Comment:
If the SendThread or EventThread exits unexpectedly due to some runtime error, we should log it for post-mortem purposes. This patch also widens ZooLog to take Throwable rather than just Exception.

----------------------------------------------------------------------

>Comment By: Mahadev Konar (mahadevkonar)
Date: 2008-03-10 20:40

Message:
Logged In: YES
user_id=1926680
Originator: NO

+1 make it happen!

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1882928&group_id=209147
|
From: SourceForge.net <no...@so...> - 2008-03-10 15:17:41
|
Patches item #1881204, was opened at 2008-01-28 07:28
Message generated for change (Comment added) made by breed
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1881204&group_id=209147

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: server
Group: None
>Status: Closed
>Resolution: Accepted
Priority: 7
Private: No
Submitted By: fpj (fpj)
Assigned to: fpj (fpj)
Summary: New leader election algorithm with TCP.

Initial Comment:
This is a feature request for a new leader election algorithm with TCP.

----------------------------------------------------------------------

>Comment By: Benjamin Reed (breed)
Date: 2008-03-10 08:17

Message:
Logged In: YES
user_id=154690
Originator: NO

Revision 190

----------------------------------------------------------------------

Comment By: fpj (fpj)
Date: 2008-03-04 09:29

Message:
Logged In: YES
user_id=1926444
Originator: YES

File Added: patch-le-tcp-v6e.txt

----------------------------------------------------------------------

Comment By: fpj (fpj)
Date: 2008-02-27 02:35

Message:
Logged In: YES
user_id=1926444
Originator: YES

Got rid of the "property changes".

-Flavio
File Added: patch-le-tcp-v6d.txt

----------------------------------------------------------------------

Comment By: Mahadev Konar (mahadevkonar)
Date: 2008-02-26 21:20

Message:
Logged In: YES
user_id=1926680
Originator: NO

sorry to comment again but there are some svn changes that make some files executable... like

Property changes on: /mnt/filer/Noronha/workspace/zookeeper/java/src/com/yahoo/zookeeper/server/quorum/Vote.java
___________________________________________________________________
Name: svn:executable
   + *

could we remove those svn changes? sorry abt that... I should have mentioned it earlier.

----------------------------------------------------------------------

Comment By: fpj (fpj)
Date: 2008-02-26 03:01

Message:
Logged In: YES
user_id=1926444
Originator: YES

I have reformatted the code, and wrapped lines that seemed too long on my screen.
File Added: patch-le-tcp-v6c.txt

----------------------------------------------------------------------

Comment By: Mahadev Konar (mahadevkonar)
Date: 2008-02-25 11:35

Message:
Logged In: YES
user_id=1926680
Originator: NO

other than that: +1 for the patch.

----------------------------------------------------------------------

Comment By: Mahadev Konar (mahadevkonar)
Date: 2008-02-25 11:35

Message:
Logged In: YES
user_id=1926680
Originator: NO

only comment: the patch has tabs ... :) and also some of the lines are really long (we could wrap them around).

----------------------------------------------------------------------

Comment By: Mahadev Konar (mahadevkonar)
Date: 2008-02-25 11:35

Message:
Logged In: YES
user_id=1926680
Originator: NO

+1 make it happen

----------------------------------------------------------------------

Comment By: fpj (fpj)
Date: 2008-02-22 08:12

Message:
Logged In: YES
user_id=1926444
Originator: YES

File Added: patch-le-tcp-v6b.txt

----------------------------------------------------------------------

Comment By: fpj (fpj)
Date: 2008-02-22 04:50

Message:
Logged In: YES
user_id=1926444
Originator: YES

A few bugs fixed. These bugs were generating some race conditions, in particular for the cross-colo cases.

Thanks,
-Flavio
File Added: patch-le-tcp-v6.txt

----------------------------------------------------------------------

Comment By: fpj (fpj)
Date: 2008-02-19 08:07

Message:
Logged In: YES
user_id=1926444
Originator: YES

In version 5b, I simply removed some output messages, which I had forgotten to remove. Everything else should be the same as with version 5.

-Flavio
File Added: patch-le-tcp-v5b.txt

----------------------------------------------------------------------

Comment By: fpj (fpj)
Date: 2008-02-19 07:59

Message:
Logged In: YES
user_id=1926444
Originator: YES

Version 5 has an input parameter (electionAlg) to select which implementation of the algorithm to use. There are currently 4 flavors: the original UDP-based ZK leader election algorithm (0), UDP-based without authentication (1), UDP-based with authentication (2), TCP-based (3).

I tested with three different clusters, and locally all four work fine. However, when adding a remote machine the zookeeper servers go wild when using any of the UDP-based versions. It seems that state gets corrupted, and the servers stop behaving correctly. The TCP-based version presented no problem when servers were in different clusters, though. I can't explain why state gets corrupted, but it seems to happen when there is at least one remote machine and I use a UDP-based implementation.

-Flavio
File Added: patch-le-tcp-v5.txt

----------------------------------------------------------------------

Comment By: Benjamin Reed (breed)
Date: 2008-02-18 07:50

Message:
Logged In: YES
user_id=154690
Originator: NO

+1 make it happen

----------------------------------------------------------------------

Comment By: fpj (fpj)
Date: 2008-02-14 06:30

Message:
Logged In: YES
user_id=1926444
Originator: YES

I have tested the new implementation using machines across the Atlantic, and I found a problem with the way we were opening a connection to the new leader in Follower.java. In Follower.followLeader(), there was a for loop that supposedly tried to connect to the new leader 3 times. The way it was implemented before didn't work because in the second iteration the Socket object was not valid anymore. This was causing the follower to try only once, as in the second iteration it was throwing an exception that was not ConnectException, and it was leaving the for loop. I have modified the code to create a socket inside the loop for every iteration. In this way the Socket object is valid on every attempt.

Because the follower was trying only once, the follower would sometimes try to connect to the leader before the leader was ready to accept connections. To avoid this problem, we had this hack in the leader election implementation (the previous version was doing the same) to make the follower wait for a fixed amount of time, which we had set to 100ms. When I tried with high-latency connections, the value of 100ms was not sufficient, and I was observing runs in which the system was never making progress because the leader election would succeed, but the single follower of my experiment was not able to connect to the leader, as the leader was not ready yet and the bug in the code allowed the follower to try only once.

With this fix, the follower may still experience unsuccessful attempts to connect to the leader, but given that it waits one second until the next try, it often succeeds in the second attempt. Moreover, with this fix, I've been able to get rid of the 100ms timer at the end of leader election, so it now terminates even faster.

-Flavio
File Added: patch-le-tcp-v4.txt

----------------------------------------------------------------------

Comment By: fpj (fpj)
Date: 2008-02-05 06:09

Message:
Logged In: YES
user_id=1926444
Originator: YES

I'm attaching a third version of the patch, in which I fixed a few bugs.

In response to Mahadev's comment, I could use the server id to initialize the Random object, but since I'm not passing it to the constructor, I've chosen to initialize the seed in a different way, using IP and time. The idea of using IP is to have different seeds for different servers. As Mahadev pointed out before, we have to consider the case in which one computer runs multiple ZK servers. For this special case, I add the current time to break ties with respect to challenge. In my understanding, the way we generate the challenge value doesn't really matter. I think it is more general in the way I've implemented as it doesn't depend upon any particular scheme of identification for the servers.

It is important to note, though, that it is not always necessary to break ties as there are times in which the attempt to initiate a connection is one way. In these cases, if we try to break ties, we may end up with no connection. The logic to overcome such corner cases is fairly simple. If a server A doesn't have a connection to server B, and A has received a connection request from B, then A must accept it. To make it work despite the utilization of a tie-break mechanism, A changes its challenge to the smallest possible value the challenge can have. In this way, it makes sure that it loses the challenge. If I use a mechanism that doesn't exchange challenge values (such as using IP and port), then it is not possible to implement the trick I describe above.

File Added: patch-le-tcp-v3.txt

----------------------------------------------------------------------

Comment By: Mahadev Konar (mahadevkonar)
Date: 2008-02-04 12:21

Message:
Logged In: YES
user_id=1926680
Originator: NO

a few comments:

new Random(System.currentTimeMillis() + + localIP.hashCode());

what is the idea behind adding the ip address hashcode and currentTimeMillis? why not just use the serverid, or in that case why not just use the raw server id for connection resolving? something like: the server with the higher server id is the server that the other connects to...

----------------------------------------------------------------------

Comment By: fpj (fpj)
Date: 2008-01-31 09:14

Message:
Logged In: YES
user_id=1926444
Originator: YES

File Added: patch-le-tcp-v2.txt

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1881204&group_id=209147
|
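The connection fix described in the comment that accompanies patch-le-tcp-v4.txt (create the Socket inside the loop, wait before the next try) could look roughly like the sketch below. It is not the code from Follower.followLeader(): the class `ConnectRetrySketch`, the method `connectWithRetry`, and its parameters are hypothetical names chosen for illustration, and the sketch retries on any IOException, which may be broader than what the patch does.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical illustration of "new socket per attempt"; not the ZooKeeper source.
class ConnectRetrySketch {
    // Tries up to maxAttempts times, building a new Socket for every attempt
    // because a Socket whose connect() failed is generally not reusable.
    static Socket connectWithRetry(InetSocketAddress leader, int maxAttempts,
                                   int connectTimeoutMs, long waitMs)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Socket sock = new Socket(); // fresh, unconnected socket for this attempt
            try {
                sock.connect(leader, connectTimeoutMs);
                return sock;
            } catch (IOException e) {
                lastFailure = e;
                sock.close();
                if (attempt < maxAttempts) {
                    Thread.sleep(waitMs); // give the leader time to start accepting
                }
            }
        }
        throw lastFailure;
    }

    public static void main(String[] args) throws Exception {
        // Demo against a local listener; a real follower would use the leader's address.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            InetSocketAddress addr =
                    new InetSocketAddress(server.getInetAddress(), server.getLocalPort());
            try (Socket s = connectWithRetry(addr, 3, 1000, 1000)) {
                System.out.println("connected: " + s.isConnected());
            }
        }
    }
}
```

The demo uses three attempts and a one-second wait to mirror the numbers mentioned in the thread; both are plain parameters here.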
From: SourceForge.net <no...@so...> - 2008-03-04 17:29:00
|
Patches item #1881204, was opened at 2008-01-28 16:28
Message generated for change (Comment added) made by fpj
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1881204&group_id=209147

Category: server
Group: None
Status: Open
Resolution: None
Priority: 7
Private: No
Submitted By: fpj (fpj)
Assigned to: fpj (fpj)
Summary: New leader election algorithm with TCP.

Initial Comment:
This is a feature request for a new leader election algorithm with TCP.

----------------------------------------------------------------------

>Comment By: fpj (fpj)
Date: 2008-03-04 18:29

Message:
Logged In: YES
user_id=1926444
Originator: YES

File Added: patch-le-tcp-v6e.txt

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1881204&group_id=209147
|
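The challenge-based tie-break discussed in the comment that accompanies patch-le-tcp-v3.txt can be illustrated with a small sketch. Several things here are assumptions rather than facts from the patch: that challenges are `long` values, that the higher challenge wins and the loser accepts the peer's connection, and the names `ChallengeTieBreakSketch`, `newChallengeSource`, and `acceptIncoming`. Only the seed expression (current time plus the hash code of the local IP) and the "drop to the smallest value to lose on purpose" trick come from the thread.

```java
import java.util.Random;

// Hypothetical illustration of the tie-break idea; not the ZooKeeper source.
class ChallengeTieBreakSketch {
    // Smallest value a challenge can take in this sketch (challenges assumed to be longs).
    static final long MIN_CHALLENGE = Long.MIN_VALUE;

    // Seeds the generator from the clock plus the local address hash, so that
    // several servers on one machine are still likely to draw different values.
    static Random newChallengeSource(String localIp) {
        return new Random(System.currentTimeMillis() + localIp.hashCode());
    }

    // Decides whether this server accepts the peer's incoming connection.
    // Assumption: the higher challenge wins and the loser accepts.
    // Equal challenges are not resolved here; a real protocol would have to redraw.
    static boolean acceptIncoming(long myChallenge, long peerChallenge,
                                  boolean iAmAlsoConnecting) {
        // One-way attempt: there is no tie to break, so lose on purpose by
        // replacing our challenge with the smallest possible value.
        long effective = iAmAlsoConnecting ? myChallenge : MIN_CHALLENGE;
        return effective <= peerChallenge;
    }

    public static void main(String[] args) {
        Random source = newChallengeSource("192.0.2.10"); // documentation-range address
        long mine = source.nextLong();
        long peer = source.nextLong();
        System.out.println("both connecting, accept incoming: "
                + acceptIncoming(mine, peer, true));
        System.out.println("only peer connecting, accept incoming: "
                + acceptIncoming(mine, peer, false));
    }
}
```

A scheme that compares fixed identifiers such as IP and port has no value to lower, which is the limitation the comment points out.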
From: SourceForge.net <no...@so...> - 2008-02-27 10:35:10
|
Patches item #1881204, was opened at 2008-01-28 16:28
Message generated for change (Comment added) made by fpj
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1881204&group_id=209147

Category: server
Group: None
Status: Open
Resolution: None
Priority: 7
Private: No
Submitted By: fpj (fpj)
Assigned to: fpj (fpj)
Summary: New leader election algorithm with TCP.

Initial Comment:
This is a feature request for a new leader election algorithm with TCP.

----------------------------------------------------------------------

>Comment By: fpj (fpj)
Date: 2008-02-27 11:35

Message:
Logged In: YES
user_id=1926444
Originator: YES

Got rid of the "property changes".

-Flavio
File Added: patch-le-tcp-v6d.txt

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1881204&group_id=209147
|
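The electionAlg parameter described in the comment that accompanies patch-le-tcp-v5.txt maps four integer codes to four implementations. One way to picture that mapping is the sketch below; the enum `ElectionAlgSketch`, its constant names, and `fromCode` are hypothetical, and only the codes 0 to 3 and their meanings are taken from the thread.

```java
// Hypothetical illustration of the electionAlg codes; not the ZooKeeper source.
enum ElectionAlgSketch {
    ORIGINAL_UDP(0),      // the original UDP-based ZK leader election
    UDP_WITHOUT_AUTH(1),  // UDP-based without authentication
    UDP_WITH_AUTH(2),     // UDP-based with authentication
    TCP(3);               // TCP-based

    final int code;

    ElectionAlgSketch(int code) {
        this.code = code;
    }

    // Resolves the numeric parameter and rejects values outside 0..3.
    static ElectionAlgSketch fromCode(int code) {
        for (ElectionAlgSketch alg : values()) {
            if (alg.code == code) {
                return alg;
            }
        }
        throw new IllegalArgumentException("unknown electionAlg value: " + code);
    }

    public static void main(String[] args) {
        int electionAlg = args.length > 0 ? Integer.parseInt(args[0]) : 3;
        System.out.println(electionAlg + " selects " + fromCode(electionAlg));
    }
}
```

Rejecting unknown codes up front keeps a mistyped parameter from silently falling back to one of the UDP variants that misbehaved across clusters in the reported tests.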
From: SourceForge.net <no...@so...> - 2008-02-27 05:19:55
|
Patches item #1881204, was opened at 2008-01-28 15:28 Message generated for change (Comment added) made by mahadevkonar You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1881204&group_id=209147 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: server Group: None Status: Open Resolution: None Priority: 7 Private: No Submitted By: fpj (fpj) Assigned to: fpj (fpj) Summary: New leader election algorithm with TCP. Initial Comment: This is a feature request for a new leader election algorithm with TCP. ---------------------------------------------------------------------- >Comment By: Mahadev Konar (mahadevkonar) Date: 2008-02-27 05:20 Message: Logged In: YES user_id=1926680 Originator: NO sorry to comment again but there are some svn changes that make some files executable... like Property changes on: /mnt/filer/Noronha/workspace/zookeeper/java/src/com/yahoo/zookeeper/server/quorum/Vote.java ___________________________________________________________________ Name: svn:executable + * could we remove those svn changes ? sorry abt that... should I mentioned earlier.. ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-02-26 11:01 Message: Logged In: YES user_id=1926444 Originator: YES I have reformatted the code, and wrapped around lines that seemed too long in my screen. File Added: patch-le-tcp-v6c.txt ---------------------------------------------------------------------- Comment By: Mahadev Konar (mahadevkonar) Date: 2008-02-25 19:35 Message: Logged In: YES user_id=1926680 Originator: NO other than that: +1 for the patch. ---------------------------------------------------------------------- Comment By: Mahadev Konar (mahadevkonar) Date: 2008-02-25 19:35 Message: Logged In: YES user_id=1926680 Originator: NO only comment: the patch has tabs ... 
:) and also some of the lines are really long (we could wrap them around). ---------------------------------------------------------------------- Comment By: Mahadev Konar (mahadevkonar) Date: 2008-02-25 19:35 Message: Logged In: YES user_id=1926680 Originator: NO +1 make it happen ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-02-22 16:12 Message: Logged In: YES user_id=1926444 Originator: YES File Added: patch-le-tcp-v6b.txt ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-02-22 12:50 Message: Logged In: YES user_id=1926444 Originator: YES A few bugs fixed. These bugs were generating some race conditions, in particular for the cross-colo cases. Thanks, -Flavio File Added: patch-le-tcp-v6.txt ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-02-19 16:07 Message: Logged In: YES user_id=1926444 Originator: YES In version 5b, I simply removed some output messages, which I had forgotten to remove. Everything else should be the same as with version 5. -Flavio File Added: patch-le-tcp-v5b.txt ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-02-19 15:59 Message: Logged In: YES user_id=1926444 Originator: YES Version 5 has an input parameter (electionAlg) to select which implementation of the algorithm to use. There are currently 4 flavors: the original UDP-based ZK leader election algorithm (0), UDP-based without authentication (1), UDP-based with authentication (2), TCP-based (3). I tested with three different clusters, and locally all four work fine. However, when adding a remote machine, the zookeeper servers go wild when using any of the UDP-based versions. It seems that state gets corrupted, and the servers stop behaving correctly. The TCP-based version presented no problem when servers were in different clusters, though. 
I can't explain why state gets corrupted, but it seems to happen when there is at least one remote machine and I use a UDP-based implementation. -Flavio File Added: patch-le-tcp-v5.txt ---------------------------------------------------------------------- Comment By: Benjamin Reed (breed) Date: 2008-02-18 15:50 Message: Logged In: YES user_id=154690 Originator: NO +1 make it happen ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-02-14 14:30 Message: Logged In: YES user_id=1926444 Originator: YES I have tested the new implementation using machines across the Atlantic, and I found a problem with the way we were opening a connection to the new leader in Follower.java. In Follower.followLeader(), there was a for loop that supposedly tried to connect to the new leader 3 times. The way it was implemented before didn't work because in the second iteration the Socket object was not valid anymore. This was causing the follower to try only once, as in the second iteration it was throwing an exception that was not ConnectException, and it was leaving the for loop. I have modified the code to create a socket inside the loop for every iteration. In this way the Socket object is valid on every attempt. Because the follower was trying only once, it was happening that the follower would try to connect to the leader before the leader was ready to accept connections. To avoid this problem, we had this hack in the leader election implementation (the previous version was doing the same) to make the follower wait for a fixed amount of time, which we had set to 100ms. 
When I tried with high-latency connections, the value of 100ms was not sufficient, and I was observing runs in which the system was never making progress because the leader election would succeed, but the single follower of my experiment was unable to connect to the leader, as the leader was not ready yet and the bug in the code allowed the follower to try only once. With this fix, the follower may still experience unsuccessful attempts to connect to the leader, but given that it waits one second until the next try, it often succeeds in the second attempt. Moreover, with this fix, I've been able to get rid of the 100ms timer at the end of leader election, so it now terminates even faster. -Flavio File Added: patch-le-tcp-v4.txt ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-02-05 14:09 Message: Logged In: YES user_id=1926444 Originator: YES I'm attaching a third version of the patch, in which I fixed a few bugs. In response to Mahadev's comment, I could use the server id to initialize the Random object, but since I'm not passing it to the constructor, I've chosen to initialize the seed in a different way, using IP and time. The idea of using IP is to have different seeds for different servers. As Mahadev pointed out before, we have to consider the case in which one computer runs multiple ZK servers. For this special case, I add the current time to break ties with respect to challenge. In my understanding, the way we generate the challenge value doesn't really matter. I think the way I've implemented it is more general, as it doesn't depend upon any particular scheme of identification for the servers. It is important to note, though, that it is not always necessary to break ties as there are times in which the attempt to initiate a connection is one way. In these cases, if we try to break ties, we may end up with no connection. The logic to overcome such corner cases is fairly simple. 
If server A doesn't have a connection to server B, and A has received a connection request from B, then A must accept it. To make it work despite the utilization of a tie-break mechanism, A changes its challenge to the smallest possible value the challenge can have. In this way, it makes sure that it loses the challenge. If I use a mechanism that doesn't exchange challenge values (such as using IP and port), then it is not possible to implement the trick I describe above. File Added: patch-le-tcp-v3.txt ---------------------------------------------------------------------- Comment By: Mahadev Konar (mahadevkonar) Date: 2008-02-04 20:21 Message: Logged In: YES user_id=1926680 Originator: NO a few comments: new Random(System.currentTimeMillis() + + localIP.hashCode()); what is the idea behind adding ipaddress hashcode and currentTimeMillis? why not just use the serverid or in that case why not just use the raw server id for connection resolving? something like a server with higher server id is the server that the other connects to... ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-01-31 17:14 Message: Logged In: YES user_id=1926444 Originator: YES File Added: patch-le-tcp-v2.txt ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1881204&group_id=209147 |
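Editorial sketch of the retry fix described in the 2008-02-14 comment: a fresh Socket is created inside the loop on every attempt, with a pause between attempts. This is not the code from patch-le-tcp-v4.txt; the class name RetryConnectSketch, the method connectWithRetry and all of its parameters are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not taken from the ZooKeeper sources.
final class RetryConnectSketch {
    /**
     * Tries to connect up to maxAttempts times. A new Socket is built on every
     * iteration, because a Socket whose connect() failed cannot be reused.
     */
    static Socket connectWithRetry(InetSocketAddress addr, int maxAttempts,
                                   int connectTimeoutMs, long pauseMs)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Socket sock = new Socket();          // fresh socket for this attempt
            try {
                sock.connect(addr, connectTimeoutMs);
                return sock;                     // connected: caller owns the socket
            } catch (IOException e) {
                last = e;                        // remember the failure and clean up
                sock.close();
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMs);       // e.g. 1000 ms, as in the comment
                }
            }
        }
        throw last;                              // every attempt failed
    }
}
```

With three attempts and a one-second pause, a follower that reaches the leader slightly too early would get further chances instead of giving up after the first failure.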
From: SourceForge.net <no...@so...> - 2008-02-26 11:01:15
|
Patches item #1881204, was opened at 2008-01-28 16:28 Message generated for change (Comment added) made by fpj You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1881204&group_id=209147 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: server Group: None Status: Open Resolution: None Priority: 7 Private: No Submitted By: fpj (fpj) Assigned to: fpj (fpj) Summary: New leader election algorithm with TCP. Initial Comment: This is a feature request for a new leader election algorithm with TCP. ---------------------------------------------------------------------- >Comment By: fpj (fpj) Date: 2008-02-26 12:01 Message: Logged In: YES user_id=1926444 Originator: YES I have reformatted the code, and wrapped lines that seemed too long on my screen. File Added: patch-le-tcp-v6c.txt ---------------------------------------------------------------------- Comment By: Mahadev Konar (mahadevkonar) Date: 2008-02-25 20:35 Message: Logged In: YES user_id=1926680 Originator: NO other than that: +1 for the patch. ---------------------------------------------------------------------- Comment By: Mahadev Konar (mahadevkonar) Date: 2008-02-25 20:35 Message: Logged In: YES user_id=1926680 Originator: NO only comment: the patch has tabs ... :) and also some of the lines are really long (we could wrap them around). 
---------------------------------------------------------------------- Comment By: Mahadev Konar (mahadevkonar) Date: 2008-02-25 20:35 Message: Logged In: YES user_id=1926680 Originator: NO +1 make it happen ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-02-22 17:12 Message: Logged In: YES user_id=1926444 Originator: YES File Added: patch-le-tcp-v6b.txt ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-02-22 13:50 Message: Logged In: YES user_id=1926444 Originator: YES A few bugs fixed. These bugs were generating some race conditions, in particular for the cross-colo cases. Thanks, -Flavio File Added: patch-le-tcp-v6.txt ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-02-19 17:07 Message: Logged In: YES user_id=1926444 Originator: YES In version 5b, I simply removed some output messages, which I had forgotten to remove. Everything else should be the same as with version 5. -Flavio File Added: patch-le-tcp-v5b.txt ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-02-19 16:59 Message: Logged In: YES user_id=1926444 Originator: YES Version 5 has an input parameter (electionAlg) to select which implementation of the algorithm to use. There are currently 4 flavors: the original UDP-based ZK leader election algorithm (0), UDP-based without authentication (1), UDP-based with authentication (2), TCP-based (3). I tested with three different clusters, and locally all four work fine. However, when adding a remote machine, the zookeeper servers go wild when using any of the UDP-based versions. It seems that state gets corrupted, and the servers stop behaving correctly. The TCP-based version presented no problem when servers were in different clusters, though. 
I can't explain why state gets corrupted, but it seems to happen when there is at least one remote machine and I use a UDP-based implementation. -Flavio File Added: patch-le-tcp-v5.txt ---------------------------------------------------------------------- Comment By: Benjamin Reed (breed) Date: 2008-02-18 16:50 Message: Logged In: YES user_id=154690 Originator: NO +1 make it happen ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-02-14 15:30 Message: Logged In: YES user_id=1926444 Originator: YES I have tested the new implementation using machines across the Atlantic, and I found a problem with the way we were opening a connection to the new leader in Follower.java. In Follower.followLeader(), there was a for loop that supposedly tried to connect to the new leader 3 times. The way it was implemented before didn't work because in the second iteration the Socket object was not valid anymore. This was causing the follower to try only once, as in the second iteration it was throwing an exception that was not ConnectException, and it was leaving the for loop. I have modified the code to create a socket inside the loop for every iteration. In this way the Socket object is valid on every attempt. Because the follower was trying only once, it was happening that the follower would try to connect to the leader before the leader was ready to accept connections. To avoid this problem, we had this hack in the leader election implementation (the previous version was doing the same) to make the follower wait for a fixed amount of time, which we had set to 100ms. 
When I tried with high-latency connections, the value of 100ms was not sufficient, and I was observing runs in which the system was never making progress because the leader election would succeed, but the single follower of my experiment was unable to connect to the leader, as the leader was not ready yet and the bug in the code allowed the follower to try only once. With this fix, the follower may still experience unsuccessful attempts to connect to the leader, but given that it waits one second until the next try, it often succeeds in the second attempt. Moreover, with this fix, I've been able to get rid of the 100ms timer at the end of leader election, so it now terminates even faster. -Flavio File Added: patch-le-tcp-v4.txt ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-02-05 15:09 Message: Logged In: YES user_id=1926444 Originator: YES I'm attaching a third version of the patch, in which I fixed a few bugs. In response to Mahadev's comment, I could use the server id to initialize the Random object, but since I'm not passing it to the constructor, I've chosen to initialize the seed in a different way, using IP and time. The idea of using IP is to have different seeds for different servers. As Mahadev pointed out before, we have to consider the case in which one computer runs multiple ZK servers. For this special case, I add the current time to break ties with respect to challenge. In my understanding, the way we generate the challenge value doesn't really matter. I think the way I've implemented it is more general, as it doesn't depend upon any particular scheme of identification for the servers. It is important to note, though, that it is not always necessary to break ties as there are times in which the attempt to initiate a connection is one way. In these cases, if we try to break ties, we may end up with no connection. The logic to overcome such corner cases is fairly simple. 
If server A doesn't have a connection to server B, and A has received a connection request from B, then A must accept it. To make it work despite the utilization of a tie-break mechanism, A changes its challenge to the smallest possible value the challenge can have. In this way, it makes sure that it loses the challenge. If I use a mechanism that doesn't exchange challenge values (such as using IP and port), then it is not possible to implement the trick I describe above. File Added: patch-le-tcp-v3.txt ---------------------------------------------------------------------- Comment By: Mahadev Konar (mahadevkonar) Date: 2008-02-04 21:21 Message: Logged In: YES user_id=1926680 Originator: NO a few comments: new Random(System.currentTimeMillis() + + localIP.hashCode()); what is the idea behind adding ipaddress hashcode and currentTimeMillis? why not just use the serverid or in that case why not just use the raw server id for connection resolving? something like a server with higher server id is the server that the other connects to... ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-01-31 18:14 Message: Logged In: YES user_id=1926444 Originator: YES File Added: patch-le-tcp-v2.txt ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1881204&group_id=209147 |
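Editorial sketch of the challenge tie-break discussed in the 2008-02-05 comment. The thread does not say which side wins a comparison, so this sketch assumes that the server with the smaller challenge loses and therefore accepts the incoming connection. The seeding follows the snippet quoted by Mahadev; the class ChallengeSketch, its method names, and the use of a String for the local address are all hypothetical.

```java
import java.util.Random;

// Hypothetical illustration, not the code from patch-le-tcp-v3.txt.
final class ChallengeSketch {
    /** Smallest value a challenge can take; reserved for "lose on purpose". */
    static final long MIN_CHALLENGE = Long.MIN_VALUE;

    /** Seeds Random with current time plus a hash of the local address. */
    static long newChallenge(String localIp) {
        Random rand = new Random(System.currentTimeMillis() + localIp.hashCode());
        long c = rand.nextLong();
        // Keep MIN_CHALLENGE out of the normal range so it always loses.
        return c == MIN_CHALLENGE ? MIN_CHALLENGE + 1 : c;
    }

    /**
     * Decides whether the local server accepts a connection request.
     * If the local server has no outgoing attempt of its own, it drops its
     * challenge to MIN_CHALLENGE, so it loses and accepts the request.
     */
    static boolean acceptIncoming(boolean localAttemptPending,
                                  long localChallenge, long remoteChallenge) {
        long effective = localAttemptPending ? localChallenge : MIN_CHALLENGE;
        return effective < remoteChallenge;   // assumed rule: smaller value loses
    }
}
```

Under these assumptions a one-way attempt always ends with a connection, because the receiving side can never win the comparison against a normally generated challenge.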
From: SourceForge.net <no...@so...> - 2008-02-25 19:35:35
|
Patches item #1881204, was opened at 2008-01-28 15:28 Message generated for change (Comment added) made by mahadevkonar You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1881204&group_id=209147 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: server Group: None Status: Open Resolution: None Priority: 7 Private: No Submitted By: fpj (fpj) Assigned to: fpj (fpj) Summary: New leader election algorithm with TCP. Initial Comment: This is a feature request for a new leader election algorithm with TCP. ---------------------------------------------------------------------- >Comment By: Mahadev Konar (mahadevkonar) Date: 2008-02-25 19:35 Message: Logged In: YES user_id=1926680 Originator: NO other than that: +1 for the patch. ---------------------------------------------------------------------- Comment By: Mahadev Konar (mahadevkonar) Date: 2008-02-25 19:35 Message: Logged In: YES user_id=1926680 Originator: NO only comment: the patch has tabs ... :) and also some of the lines are really long (we could wrap them around). ---------------------------------------------------------------------- Comment By: Mahadev Konar (mahadevkonar) Date: 2008-02-25 19:35 Message: Logged In: YES user_id=1926680 Originator: NO +1 make it happen ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-02-22 16:12 Message: Logged In: YES user_id=1926444 Originator: YES File Added: patch-le-tcp-v6b.txt ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-02-22 12:50 Message: Logged In: YES user_id=1926444 Originator: YES A few bugs fixed. These bugs were generating some race conditions, in particular for the cross-colo cases. 
Thanks, -Flavio File Added: patch-le-tcp-v6.txt ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-02-19 16:07 Message: Logged In: YES user_id=1926444 Originator: YES In version 5b, I simply removed some output messages, which I had forgotten to remove. Everything else should be the same as with version 5. -Flavio File Added: patch-le-tcp-v5b.txt ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-02-19 15:59 Message: Logged In: YES user_id=1926444 Originator: YES Version 5 has an input parameter (electionAlg) to select which implementation of the algorithm to use. There are currently 4 flavors: the original UDP-based ZK leader election algorithm (0), UDP-based without authentication (1), UDP-based with authentication (2), TCP-based (3). I tested with three different clusters, and locally all four work fine. However, when adding a remote machine, the zookeeper servers go wild when using any of the UDP-based versions. It seems that state gets corrupted, and the servers stop behaving correctly. The TCP-based version presented no problem when servers were in different clusters, though. I can't explain why state gets corrupted, but it seems to happen when there is at least one remote machine and I use a UDP-based implementation. -Flavio File Added: patch-le-tcp-v5.txt ---------------------------------------------------------------------- Comment By: Benjamin Reed (breed) Date: 2008-02-18 15:50 Message: Logged In: YES user_id=154690 Originator: NO +1 make it happen ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-02-14 14:30 Message: Logged In: YES user_id=1926444 Originator: YES I have tested the new implementation using machines across the Atlantic, and I found a problem with the way we were opening a connection to the new leader in Follower.java. 
In Follower.followLeader(), there was a for loop that supposedly tried to connect to the new leader 3 times. The way it was implemented before didn't work because in the second iteration the Socket object was not valid anymore. This was causing the follower to try only once, as in the second iteration it was throwing an exception that was not ConnectException, and it was leaving the for loop. I have modified the code to create a socket inside the loop for every iteration. In this way the Socket object is valid on every attempt. Because the follower was trying only once, it was happening that the follower would try to connect to the leader before the leader was ready to accept connections. To avoid this problem, we had this hack in the leader election implementation (the previous version was doing the same) to make the follower wait for a fixed amount of time, which we had set to 100ms. When I tried with high-latency connections, the value of 100ms was not sufficient, and I was observing runs in which the system was never making progress because the leader election would succeed, but the single follower of my experiment was unable to connect to the leader, as the leader was not ready yet and the bug in the code allowed the follower to try only once. With this fix, the follower may still experience unsuccessful attempts to connect to the leader, but given that it waits one second until the next try, it often succeeds in the second attempt. Moreover, with this fix, I've been able to get rid of the 100ms timer at the end of leader election, so it now terminates even faster. -Flavio File Added: patch-le-tcp-v4.txt ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-02-05 14:09 Message: Logged In: YES user_id=1926444 Originator: YES I'm attaching a third version of the patch, in which I fixed a few bugs. 
In response to Mahadev's comment, I could use the server id to initialize the Random object, but since I'm not passing it to the constructor, I've chosen to initialize the seed in a different way, using IP and time. The idea of using IP is to have different seeds for different servers. As Mahadev pointed out before, we have to consider the case in which one computer runs multiple ZK servers. For this special case, I add the current time to break ties with respect to challenge. In my understanding, the way we generate the challenge value doesn't really matter. I think the way I've implemented it is more general, as it doesn't depend upon any particular scheme of identification for the servers. It is important to note, though, that it is not always necessary to break ties as there are times in which the attempt to initiate a connection is one way. In these cases, if we try to break ties, we may end up with no connection. The logic to overcome such corner cases is fairly simple. If server A doesn't have a connection to server B, and A has received a connection request from B, then A must accept it. To make it work despite the utilization of a tie-break mechanism, A changes its challenge to the smallest possible value the challenge can have. In this way, it makes sure that it loses the challenge. If I use a mechanism that doesn't exchange challenge values (such as using IP and port), then it is not possible to implement the trick I describe above. File Added: patch-le-tcp-v3.txt ---------------------------------------------------------------------- Comment By: Mahadev Konar (mahadevkonar) Date: 2008-02-04 20:21 Message: Logged In: YES user_id=1926680 Originator: NO a few comments: new Random(System.currentTimeMillis() + + localIP.hashCode()); what is the idea behind adding ipaddress hashcode and currentTimeMillis? why not just use the serverid or in that case why not just use the raw server id for connection resolving? 
something like a server with higher server id is the server that the other connects to... ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-01-31 17:14 Message: Logged In: YES user_id=1926444 Originator: YES File Added: patch-le-tcp-v2.txt ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1881204&group_id=209147 |
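Editorial sketch of how the electionAlg parameter from the version 5 comment could be mapped to its four flavors. Only the numeric codes 0 to 3 and their descriptions come from the thread; the enum ElectionAlgSketch and its constant names are hypothetical.

```java
// Hypothetical mapping of the electionAlg values listed in the thread.
enum ElectionAlgSketch {
    ORIGINAL_UDP(0),   // the original UDP-based ZK leader election
    UDP_NO_AUTH(1),    // UDP-based without authentication
    UDP_AUTH(2),       // UDP-based with authentication
    TCP(3);            // TCP-based

    final int code;

    ElectionAlgSketch(int code) {
        this.code = code;
    }

    /** Looks up the flavor for a configured electionAlg value. */
    static ElectionAlgSketch fromCode(int code) {
        for (ElectionAlgSketch alg : values()) {
            if (alg.code == code) {
                return alg;
            }
        }
        throw new IllegalArgumentException("unknown electionAlg: " + code);
    }
}
```

Rejecting unknown values early keeps a mistyped configuration from silently selecting a different algorithm.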
From: SourceForge.net <no...@so...> - 2008-02-25 19:35:00
|
Patches item #1881204, was opened at 2008-01-28 15:28 Message generated for change (Comment added) made by mahadevkonar You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1881204&group_id=209147 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: server Group: None Status: Open Resolution: None Priority: 7 Private: No Submitted By: fpj (fpj) Assigned to: fpj (fpj) Summary: New leader election algorithm with TCP. Initial Comment: This is a feature request for a new leader election algorithm with TCP. ---------------------------------------------------------------------- >Comment By: Mahadev Konar (mahadevkonar) Date: 2008-02-25 19:35 Message: Logged In: YES user_id=1926680 Originator: NO only comment: the patch has tabs ... :) and also some of the lines are really long (we could wrap them around). ---------------------------------------------------------------------- Comment By: Mahadev Konar (mahadevkonar) Date: 2008-02-25 19:35 Message: Logged In: YES user_id=1926680 Originator: NO +1 make it happen ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-02-22 16:12 Message: Logged In: YES user_id=1926444 Originator: YES File Added: patch-le-tcp-v6b.txt ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-02-22 12:50 Message: Logged In: YES user_id=1926444 Originator: YES A few bugs fixed. These bugs were generating some race conditions, in particular for the cross-colo cases. Thanks, -Flavio File Added: patch-le-tcp-v6.txt ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-02-19 16:07 Message: Logged In: YES user_id=1926444 Originator: YES In version 5b, I simply removed some output messages, which I had forgotten to remove. 
Everything else should be the same as with version 5. -Flavio File Added: patch-le-tcp-v5b.txt ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-02-19 15:59 Message: Logged In: YES user_id=1926444 Originator: YES Version 5 has an input parameter (electionAlg) to select which implementation of the algorithm to use. There are currently 4 flavors: the original UDP-based ZK leader election algorithm (0), UDP-based without authentication (1), UDP-based with authentication (2), TCP-based (3). I tested with three different clusters, and locally all four work fine. However, when adding a remote machine, the zookeeper servers go wild when using any of the UDP-based versions. It seems that state gets corrupted, and the servers stop behaving correctly. The TCP-based version presented no problem when servers were in different clusters, though. I can't explain why state gets corrupted, but it seems to happen when there is at least one remote machine and I use a UDP-based implementation. -Flavio File Added: patch-le-tcp-v5.txt ---------------------------------------------------------------------- Comment By: Benjamin Reed (breed) Date: 2008-02-18 15:50 Message: Logged In: YES user_id=154690 Originator: NO +1 make it happen ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-02-14 14:30 Message: Logged In: YES user_id=1926444 Originator: YES I have tested the new implementation using machines across the Atlantic, and I found a problem with the way we were opening a connection to the new leader in Follower.java. In Follower.followLeader(), there was a for loop that supposedly tried to connect to the new leader 3 times. The way it was implemented before didn't work because in the second iteration the Socket object was not valid anymore. 
This was causing the follower to try only once, as in the second iteration it was throwing an exception that was not ConnectException, and it was leaving the for loop. I have modified the code to create a socket inside the loop for every iteration. In this way the Socket object is valid on every attempt. Because the follower was trying only once, it was happening that the follower would try to connect to the leader before the leader was ready to accept connections. To avoid this problem, we had this hack in the leader election implementation (the previous version was doing the same) to make the follower wait for a fixed amount of time, which we had set to 100ms. When I tried with high-latency connections, the value of 100ms was not sufficient, and I was observing runs in which the system was never making progress because the leader election would succeed, but the single follower of my experiment was unable to connect to the leader, as the leader was not ready yet and the bug in the code allowed the follower to try only once. With this fix, the follower may still experience unsuccessful attempts to connect to the leader, but given that it waits one second until the next try, it often succeeds in the second attempt. Moreover, with this fix, I've been able to get rid of the 100ms timer at the end of leader election, so it now terminates even faster. -Flavio File Added: patch-le-tcp-v4.txt ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-02-05 14:09 Message: Logged In: YES user_id=1926444 Originator: YES I'm attaching a third version of the patch, in which I fixed a few bugs. In response to Mahadev's comment, I could use the server id to initialize the Random object, but since I'm not passing it to the constructor, I've chosen to initialize the seed in a different way, using IP and time. The idea of using IP is to have different seeds for different servers. 
As Mahadev pointed out before, we have to consider the case in which one computer runs multiple ZK servers. For this special case, I add the current time to break ties with respect to challenge. In my understanding, the way we generate the challenge value doesn't really matter. I think the way I've implemented it is more general, as it doesn't depend upon any particular scheme of identification for the servers. It is important to note, though, that it is not always necessary to break ties as there are times in which the attempt to initiate a connection is one way. In these cases, if we try to break ties, we may end up with no connection. The logic to overcome such corner cases is fairly simple. If server A doesn't have a connection to server B, and A has received a connection request from B, then A must accept it. To make it work despite the utilization of a tie-break mechanism, A changes its challenge to the smallest possible value the challenge can have. In this way, it makes sure that it loses the challenge. If I use a mechanism that doesn't exchange challenge values (such as using IP and port), then it is not possible to implement the trick I describe above. File Added: patch-le-tcp-v3.txt ---------------------------------------------------------------------- Comment By: Mahadev Konar (mahadevkonar) Date: 2008-02-04 20:21 Message: Logged In: YES user_id=1926680 Originator: NO a few comments: new Random(System.currentTimeMillis() + + localIP.hashCode()); what is the idea behind adding ipaddress hashcode and currentTimeMillis? why not just use the serverid or in that case why not just use the raw server id for connection resolving? something like a server with higher server id is the server that the other connects to... 
---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-01-31 17:14 Message: Logged In: YES user_id=1926444 Originator: YES File Added: patch-le-tcp-v2.txt ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1881204&group_id=209147 |
From: SourceForge.net <no...@so...> - 2008-02-24 01:12:29
|
Patches item #1898316, was opened at 2008-02-20 18:14 Message generated for change (Comment added) made by akornev You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1898316&group_id=209147 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: server Group: None >Status: Closed >Resolution: Accepted Priority: 5 Private: No Submitted By: Andrew Kornev (akornev) Assigned to: Nobody/Anonymous (nobody) Summary: Incremental update Initial Comment: This patch includes a number of code changes that aim to make it possible to add new functionality (for example, JMX support) without introducing unnecessary coupling. Also, I started adding getters to various server classes (for now it's only DataTree and WatchManager) for use with JMX MBeans. Overall, the patch only changes the way a few things get initialized and it doesn't affect the main flow. This is an incremental update. I intend to continue refactoring the server code as needed for JMX enablement. ---------------------------------------------------------------------- >Comment By: Andrew Kornev (akornev) Date: 2008-02-23 17:12 Message: Logged In: YES user_id=1926652 Originator: YES applied revision 107 ---------------------------------------------------------------------- Comment By: Benjamin Reed (breed) Date: 2008-02-22 17:26 Message: Logged In: YES user_id=154690 Originator: NO +1 make it happen ---------------------------------------------------------------------- Comment By: Andrew Kornev (akornev) Date: 2008-02-21 14:08 Message: Logged In: YES user_id=1926652 Originator: YES Please note that the ZooKeeperServer doesn't get *started* before the leader and the followers got a chance to sync up. 
It's just an instance of the server that gets created early, but the ZooKeeperServer.load() and ZooKeeperServer.start() methods are called only after the initial syncing has completed (just the way it used to be). ---------------------------------------------------------------------- Comment By: Benjamin Reed (breed) Date: 2008-02-21 13:36 Message: Logged In: YES user_id=154690 Originator: NO One effect on the main flow that I see is that the ZooKeeperServer gets started before the leaders and peers have synced up. This could cause bad data to be sent to clients. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1898316&group_id=209147 |
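The ordering discussed in this thread (server object created early, load() and start() deferred until after the initial sync) can be illustrated with a minimal sketch. SketchServer and SketchPeer are hypothetical stand-ins invented for this illustration; they are not the real ZooKeeperServer or peer classes, and the real sync logic is not shown.

```java
// Hypothetical stand-in for the server object; not the real ZooKeeperServer.
class SketchServer {
    private boolean loaded;
    private boolean running;

    // Loads state; in this sketch it only records that loading happened.
    void load() {
        loaded = true;
    }

    // Refuses to start unless load() has run first.
    void start() {
        if (!loaded) {
            throw new IllegalStateException("load() must run before start()");
        }
        running = true;
    }

    boolean isRunning() {
        return running;
    }
}

// Hypothetical stand-in for a leader or follower peer.
class SketchPeer {
    // The instance exists early so other components (for example JMX beans)
    // can hold a reference to it, but it is not serving yet.
    final SketchServer server = new SketchServer();
    private boolean synced;

    // Placeholder for the initial leader/follower synchronization.
    void syncWithLeader() {
        synced = true;
    }

    // load() and start() are only reached once the sync has completed.
    void run() {
        syncWithLeader();
        if (!synced) {
            throw new IllegalStateException("initial sync did not complete");
        }
        server.load();
        server.start();
    }
}
```

Creating the object early only hands out a reference; in this sketch nothing can reach clients until run() has passed the sync step.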
From: SourceForge.net <no...@so...> - 2008-02-23 01:26:02
|
Patches item #1898316, was opened at 2008-02-20 18:14 Message generated for change (Comment added) made by breed You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1898316&group_id=209147 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: server Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Andrew Kornev (akornev) Assigned to: Nobody/Anonymous (nobody) Summary: Incremental update Initial Comment: This patch includes a number of code changes that aim to make it possible to add new functionality (for example, JMX support) without introducing unnecessary coupling. Also, I started adding getters to various server classes (for now it's only DataTree and WatchManager) for use with JMX MBeans. Overall, the patch only changes the way a few things get initialized and it doesn't affect the main flow. This is an incremental update. I intend to continue refactoring the server code as needed for JMX enablement. ---------------------------------------------------------------------- Comment By: Benjamin Reed (breed) Date: 2008-02-22 17:26 Message: Logged In: YES user_id=154690 Originator: NO +1 make it happen ---------------------------------------------------------------------- Comment By: Andrew Kornev (akornev) Date: 2008-02-21 14:08 Message: Logged In: YES user_id=1926652 Originator: YES Please note that the ZooKeeperServer doesn't get *started* before the leader and the followers got a chance to sync up. It's just an instance of the server that gets created early, but the ZooKeeperServer.load() and ZooKeeperServer.start() methods are called only after the initial syncing has completed (just the way it used to be). 
---------------------------------------------------------------------- Comment By: Benjamin Reed (breed) Date: 2008-02-21 13:36 Message: Logged In: YES user_id=154690 Originator: NO One effect on the main flow that I see is that the ZooKeeperServer gets started before the leaders and peers have synced up. This could cause bad data to be sent to clients. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1898316&group_id=209147 |
From: SourceForge.net <no...@so...> - 2008-02-22 16:13:09
|
Patches item #1881204, was opened at 2008-01-28 16:28 Message generated for change (Comment added) made by fpj You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1881204&group_id=209147 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: server Group: None Status: Open Resolution: None Priority: 7 Private: No Submitted By: fpj (fpj) Assigned to: fpj (fpj) Summary: New leader election algorithm with TCP. Initial Comment: This is a feature request for a new leader election algorithm with TCP. ---------------------------------------------------------------------- >Comment By: fpj (fpj) Date: 2008-02-22 17:12 Message: Logged In: YES user_id=1926444 Originator: YES File Added: patch-le-tcp-v6b.txt ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-02-22 13:50 Message: Logged In: YES user_id=1926444 Originator: YES A few bugs fixed. These bugs were generating some race conditions, in particular for the cross-colo cases. Thanks, -Flavio File Added: patch-le-tcp-v6.txt ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-02-19 17:07 Message: Logged In: YES user_id=1926444 Originator: YES In version 5b, I simply removed some output messages, which I had forgotten to remove. Everything else should be the same as with version 5. -Flavio File Added: patch-le-tcp-v5b.txt ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-02-19 16:59 Message: Logged In: YES user_id=1926444 Originator: YES Version 5 has an input parameter (electionAlg) to select which implementation of the algorithm to use. There are currently 4 flavors: the original UDP-based ZK leader election algorithm (0), UDP-based without authentication (1), UDP-based with authentication (2), TCP-based (3). 
I tested with three different clusters, and locally all four work fine. However, when adding a remote machine the zookeeper servers go wild when using any of the UDP-based versions. It seems that state gets corrupted, and the servers stop behaving correctly. The TCP-based version presented no problem when servers were in different clusters, though. I can't explain why state gets corrupted, but it seems to happen when there is at least one remote machine and I use a UDP-based implementation. -Flavio File Added: patch-le-tcp-v5.txt ---------------------------------------------------------------------- Comment By: Benjamin Reed (breed) Date: 2008-02-18 16:50 Message: Logged In: YES user_id=154690 Originator: NO +1 make it happen ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-02-14 15:30 Message: Logged In: YES user_id=1926444 Originator: YES I have tested the new implementation using machines across the Atlantic, and I found a problem with the way we were opening a connection to the new leader on Follower.java. On Follower.followLeader(), there was a for loop that supposedly tried to connect to the new leader 3 times. The way it was implemented before didn't work because in the second iteration the Socket object was not valid anymore. This was causing the follower to try only once as in the second iteration it was throwing an exception that was not ConnectException, and it was leaving the for loop. I have modified the code to create a socket inside the loop for every iteration. In this way the Socket object is valid on every attempt. Because the follower was trying only once, it was happening that the follower would try to connect to the leader before the leader was ready to accept connections. To avoid this problem, we had this hack in the leader election implementation (the previous version was doing the same) to make the follower wait for a fixed amount of time, which we had set to 100ms. 
When I tried with high-latency connections, the value of 100ms was not sufficient, and I was observing runs in which the system was never making progress because the leader election would succeed, but the single follower of my experiment was not being able to connect to the leader as the leader was not ready yet and the bug in the code allowed the follower to try only once. With this fix, the follower may still experience unsuccessful attempts to connect to the leader, but given that it waits one second until the next try, it often succeeds in the second attempt. Moreover, with this fix, I've been able to get rid of the 100ms timer at the end of leader election, so it now terminates even faster. -Flavio File Added: patch-le-tcp-v4.txt ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-02-05 15:09 Message: Logged In: YES user_id=1926444 Originator: YES I'm attaching a third version of the patch, in which I fixed a few bugs. In response to Mahadev's comment, I could use the server id to initialize the Random object, but since I'm not passing it to the constructor, I've chosen to initialize the seed in a different way, using IP and time. The idea of using IP is to have different seeds for different servers. As Mahadev pointed out before, we have to consider the case in which one computer runs multiple ZK servers. For this special case, I add the current time to break ties with respect to challenge. In my understanding, the way we generate the challenge value doesn't really matter. I think it is more general in the way I've implemented as it doesn't depend upon any particular scheme of identification for the servers. It is important to note, though, that it is not always necessary to break ties as there are times in which the attempt to initiate a connection is one way. In these cases, if we try to break ties, we may end up with no connection. The logic to overcome such corner cases is fairly simple. 
If a server A doesn't have a connection to server B, and A has received a connection request from B, then A must accept it. To make it work despite the utilization of a tie-break mechanism, A changes its challenge to the smallest possible value the challenge can have. In this way, it makes sure that it loses the challenge. If I use a mechanism that doesn't exchange challenge values (such as using IP and port), then it is not possible to implement the trick I describe above. File Added: patch-le-tcp-v3.txt ---------------------------------------------------------------------- Comment By: Mahadev Konar (mahadevkonar) Date: 2008-02-04 21:21 Message: Logged In: YES user_id=1926680 Originator: NO a few comments: new Random(System.currentTimeMillis() + + localIP.hashCode()); what is the idea behind adding ipaddress hashcode and currentTimeMillis? why not just use the serverid or in that case why not just use the raw server id for connection resolving? something like a server with higher server id is the server that the other connects to... ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-01-31 18:14 Message: Logged In: YES user_id=1926444 Originator: YES File Added: patch-le-tcp-v2.txt ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1881204&group_id=209147 |
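The connection fix described in the 2008-02-14 comment (a new Socket for every attempt, a wait between attempts) might look roughly like the sketch below. It is not the actual Follower.followLeader() code: LeaderConnector, connectWithRetry, the parameter names and the 5-second connect timeout are all assumptions made for illustration. The sketch also retries on any IOException, whereas the comment only mentions ConnectException.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper; not part of the real Follower class.
class LeaderConnector {
    // Assumed connect timeout for each attempt, in milliseconds.
    static final int CONNECT_TIMEOUT_MS = 5000;

    // Tries to connect up to `attempts` times, sleeping `waitMillis`
    // between failed attempts. A fresh Socket is created per attempt,
    // because a Socket whose connect() has failed cannot be reused.
    static Socket connectWithRetry(InetSocketAddress leader, int attempts, long waitMillis)
            throws IOException, InterruptedException {
        if (attempts < 1) {
            throw new IllegalArgumentException("attempts must be at least 1");
        }
        IOException last = null;
        for (int i = 0; i < attempts; i++) {
            Socket sock = new Socket(); // new object on every iteration
            try {
                sock.connect(leader, CONNECT_TIMEOUT_MS);
                return sock;
            } catch (IOException e) {
                last = e;
                sock.close();
                if (i < attempts - 1) {
                    Thread.sleep(waitMillis); // give the leader time to get ready
                }
            }
        }
        throw last;
    }
}
```

With the values mentioned in the comment, a call would be connectWithRetry(leaderAddress, 3, 1000), where leaderAddress is whatever address the election produced.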
From: SourceForge.net <no...@so...> - 2008-02-22 12:50:44
|
Patches item #1881204, was opened at 2008-01-28 16:28 Message generated for change (Comment added) made by fpj You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1881204&group_id=209147 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: server Group: None Status: Open Resolution: None Priority: 7 Private: No Submitted By: fpj (fpj) Assigned to: fpj (fpj) Summary: New leader election algorithm with TCP. Initial Comment: This is a feature request for a new leader election algorithm with TCP. ---------------------------------------------------------------------- >Comment By: fpj (fpj) Date: 2008-02-22 13:50 Message: Logged In: YES user_id=1926444 Originator: YES A few bugs fixed. These bugs were generating some race conditions, in particular for the cross-colo cases. Thanks, -Flavio File Added: patch-le-tcp-v6.txt ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-02-19 17:07 Message: Logged In: YES user_id=1926444 Originator: YES In version 5b, I simply removed some output messages, which I had forgotten to remove. Everything else should be the same as with version 5. -Flavio File Added: patch-le-tcp-v5b.txt ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-02-19 16:59 Message: Logged In: YES user_id=1926444 Originator: YES Version 5 has an input parameter (electionAlg) to select which implementation of the algorithm to use. There are currently 4 flavors: the original UDP-based ZK leader election algorithm (0), UDP-based without authentication (1), UDP-based with authentication (2), TCP-based (3). I tested with three different clusters, and locally all four work fine. However, when adding a remote machine the zookeeper servers go wild when using any of the UDP-based versions. 
It seems that state gets corrupted, and the servers stop behaving correctly. The TCP-based version presented no problem when servers were in different clusters, though. I can't explain why state gets corrupted, but it seems to happen when there is at least one remote machine and I use a UDP-based implementation. -Flavio File Added: patch-le-tcp-v5.txt ---------------------------------------------------------------------- Comment By: Benjamin Reed (breed) Date: 2008-02-18 16:50 Message: Logged In: YES user_id=154690 Originator: NO +1 make it happen ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-02-14 15:30 Message: Logged In: YES user_id=1926444 Originator: YES I have tested the new implementation using machines across the Atlantic, and I found a problem with the way we were opening a connection to the new leader on Follower.java. On Follower.followLeader(), there was a for loop that supposedly tried to connect to the new leader 3 times. The way it was implemented before didn't work because in the second iteration the Socket object was not valid anymore. This was causing the follower to try only once as in the second iteration it was throwing an exception that was not ConnectException, and it was leaving the for loop. I have modified the code to create a socket inside the loop for every iteration. In this way the Socket object is valid on every attempt. Because the follower was trying only once, it was happening that the follower would try to connect to the leader before the leader was ready to accept connections. To avoid this problem, we had this hack in the leader election implementation (the previous version was doing the same) to make the follower wait for a fixed amount of time, which we had set to 100ms. 
When I tried with high-latency connections, the value of 100ms was not sufficient, and I was observing runs in which the system was never making progress because the leader election would succeed, but the single follower of my experiment was not being able to connect to the leader as the leader was not ready yet and the bug in the code allowed the follower to try only once. With this fix, the follower may still experience unsuccessful attempts to connect to the leader, but given that it waits one second until the next try, it often succeeds in the second attempt. Moreover, with this fix, I've been able to get rid of the 100ms timer at the end of leader election, so it now terminates even faster. -Flavio File Added: patch-le-tcp-v4.txt ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-02-05 15:09 Message: Logged In: YES user_id=1926444 Originator: YES I'm attaching a third version of the patch, in which I fixed a few bugs. In response to Mahadev's comment, I could use the server id to initialize the Random object, but since I'm not passing it to the constructor, I've chosen to initialize the seed in a different way, using IP and time. The idea of using IP is to have different seeds for different servers. As Mahadev pointed out before, we have to consider the case in which one computer runs multiple ZK servers. For this special case, I add the current time to break ties with respect to challenge. In my understanding, the way we generate the challenge value doesn't really matter. I think it is more general in the way I've implemented as it doesn't depend upon any particular scheme of identification for the servers. It is important to note, though, that it is not always necessary to break ties as there are times in which the attempt to initiate a connection is one way. In these cases, if we try to break ties, we may end up with no connection. The logic to overcome such corner cases is fairly simple. 
If a server A doesn't have a connection to server B, and A has received a connection request from B, then A must accept it. To make it work despite the utilization of a tie-break mechanism, A changes its challenge to the smallest possible value the challenge can have. In this way, it makes sure that it loses the challenge. If I use a mechanism that doesn't exchange challenge values (such as using IP and port), then it is not possible to implement the trick I describe above. File Added: patch-le-tcp-v3.txt ---------------------------------------------------------------------- Comment By: Mahadev Konar (mahadevkonar) Date: 2008-02-04 21:21 Message: Logged In: YES user_id=1926680 Originator: NO a few comments: new Random(System.currentTimeMillis() + + localIP.hashCode()); what is the idea behind adding ipaddress hashcode and currentTimeMillis? why not just use the serverid or in that case why not just use the raw server id for connection resolving? something like a server with higher server id is the server that the other connects to... ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-01-31 18:14 Message: Logged In: YES user_id=1926444 Originator: YES File Added: patch-le-tcp-v2.txt ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1881204&group_id=209147 |
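The challenge-based tie-break from the 2008-02-05 comment can be sketched as below. This is one reading of the comment, not the patch itself: the class and method names are hypothetical, the challenge is assumed to be a long, "losing the challenge" is taken to mean "accept the peer's incoming connection", and treating equal challenges as a loss is an extra assumption the thread does not cover.

```java
import java.util.Random;

// Hypothetical illustration of the tie-break; not code from the patch.
class ChallengeTieBreak {
    // Smallest value a challenge can take in this sketch (assumes a long challenge).
    static final long MIN_CHALLENGE = Long.MIN_VALUE;

    // Seed built from the current time plus the hash of the local IP string,
    // following the expression quoted in the review comment.
    static Random newRandom(String localIp) {
        return new Random(System.currentTimeMillis() + localIp.hashCode());
    }

    // If this server has no connection to the peer, it lowers its challenge
    // to the minimum so that it is certain to lose.
    static long effectiveChallenge(long myChallenge, boolean haveConnectionToPeer) {
        return haveConnectionToPeer ? myChallenge : MIN_CHALLENGE;
    }

    // Returns true when this server loses the challenge and therefore
    // accepts the incoming connection. Equal values count as a loss here,
    // which is an assumption of this sketch.
    static boolean acceptIncoming(long myChallenge, long peerChallenge,
                                  boolean haveConnectionToPeer) {
        long mine = effectiveChallenge(myChallenge, haveConnectionToPeer);
        return mine <= peerChallenge;
    }
}
```

The point of exchanging challenge values is visible in effectiveChallenge: a server can deliberately lose by changing its own value, which a fixed identifier such as IP and port would not allow.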
From: SourceForge.net <no...@so...> - 2008-02-21 22:21:22
|
Patches item #1898314, was opened at 2008-02-20 17:50 Message generated for change (Comment added) made by akornev You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1898314&group_id=209147 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: server Group: None >Status: Closed >Resolution: Accepted Priority: 5 Private: No Submitted By: Andrew Kornev (akornev) Assigned to: Nobody/Anonymous (nobody) Summary: Server version info support Initial Comment: This patch introduces two new Java classes: a) a compile time utility to generate a Java interface file generated/com/yahoo/zookeeper/version/Info.java, that defines the version-related constants: the current version, the SVN revision number and the build date. b) a utility class Version that uses the interface definition generated by step above to return the server version information. This can be used from the command line as well as programmatically (for example, by a JMX MBean) c) the ant build file has been modified to use the new Version utility to stamp the server jar's manifest file with the current version and SVN revision numbers. Also, the new build file target "release" can now be used for building the release server jar. The only difference between development (built by default) and release versions is the name of the jar file. 
The release jar file includes the version number in its name: zookeeper-1.1.1.jar ---------------------------------------------------------------------- >Comment By: Andrew Kornev (akornev) Date: 2008-02-21 14:21 Message: Logged In: YES user_id=1926652 Originator: YES Checked in at revision 106 ---------------------------------------------------------------------- Comment By: Benjamin Reed (breed) Date: 2008-02-21 13:41 Message: Logged In: YES user_id=154690 Originator: NO +1 patch looks good ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1898314&group_id=209147 |
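The two pieces described in the initial comment (a generated interface of constants and a Version utility that reads it) could be shaped roughly as follows. This is only a sketch: the constant names, the placeholder values and the getFullVersion method are assumptions, and the real generated Info.java lives in the com.yahoo.zookeeper.version package, which is left out here to keep the example in one file.

```java
// Stand-in for the generated Info.java. Constant names and values are
// hypothetical; a real build would fill them in at compile time.
interface Info {
    String VERSION = "1.1.1";      // taken from the jar name example in the comment
    int REVISION = 0;              // placeholder for the SVN revision number
    String BUILD_DATE = "unknown"; // placeholder for the build date
}

// Utility that exposes the constants, usable programmatically
// (for example from a JMX MBean) or from the command line.
class Version implements Info {
    static String getVersion() {
        return VERSION;
    }

    static int getRevision() {
        return REVISION;
    }

    // Hypothetical combined format; the real output format is not given in the thread.
    static String getFullVersion() {
        return VERSION + "-r" + REVISION + ", built on " + BUILD_DATE;
    }

    public static void main(String[] args) {
        System.out.println(getFullVersion());
    }
}
```

Keeping the generated part to plain constants means the build only has to rewrite one small file, and everything else reads from it.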
From: SourceForge.net <no...@so...> - 2008-02-21 22:08:11
|
Patches item #1898316, was opened at 2008-02-20 18:14 Message generated for change (Comment added) made by akornev You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1898316&group_id=209147 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: server Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Andrew Kornev (akornev) Assigned to: Nobody/Anonymous (nobody) Summary: Incremental update Initial Comment: This patch includes a number of code changes that aim to make it possible to add new functionality (for example, JMX support) without introducing unnecessary coupling. Also, I started adding getters to various server classes (for now it's only DataTree and WatchManager) for use with JMX MBeans. Overall, the patch only changes the way a few things get initialized and it doesn't affect the main flow. This is an incremental update. I intend to continue refactoring the server code as needed for JMX enablement. ---------------------------------------------------------------------- >Comment By: Andrew Kornev (akornev) Date: 2008-02-21 14:08 Message: Logged In: YES user_id=1926652 Originator: YES Please note that the ZooKeeperServer doesn't get *started* before the leader and the followers got a chance to sync up. It's just an instance of the server that gets created early, but the ZooKeeperServer.load() and ZooKeeperServer.start() methods are called only after the initial syncing has completed (just the way it used to be). ---------------------------------------------------------------------- Comment By: Benjamin Reed (breed) Date: 2008-02-21 13:36 Message: Logged In: YES user_id=154690 Originator: NO One effect on the main flow that I see is that the ZooKeeperServer gets started before the leaders and peers have synced up. This could cause bad data to be sent to clients. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1898316&group_id=209147 |
From: SourceForge.net <no...@so...> - 2008-02-21 21:41:43
|
Patches item #1898314, was opened at 2008-02-20 17:50 Message generated for change (Comment added) made by breed You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1898314&group_id=209147 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: server Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Andrew Kornev (akornev) Assigned to: Nobody/Anonymous (nobody) Summary: Server version info support Initial Comment: This patch introduces two new Java classes: a) a compile time utility to generate a Java interface file generated/com/yahoo/zookeeper/version/Info.java, that defines the version-related constants: the current version, the SVN revision number and the build date. b) a utility class Version that uses the interface definition generated by step above to return the server version information. This can be used from the command line as well as programmatically (for example, by a JMX MBean) c) the ant build file has been modified to use the new Version utility to stamp the server jar's manifest file with the current version and SVN revision numbers. Also, the new build file target "release" can now be used for building the release server jar. The only difference between development (built by default) and release versions is the name of the jar file. The release jar file includes the version number in its name: zookeeper-1.1.1.jar ---------------------------------------------------------------------- Comment By: Benjamin Reed (breed) Date: 2008-02-21 13:41 Message: Logged In: YES user_id=154690 Originator: NO +1 patch looks good ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1898314&group_id=209147 |
From: SourceForge.net <no...@so...> - 2008-02-21 21:36:12
|
Patches item #1898316, was opened at 2008-02-20 18:14 Message generated for change (Comment added) made by breed You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1898316&group_id=209147 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: server Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Andrew Kornev (akornev) Assigned to: Nobody/Anonymous (nobody) Summary: Incremental update Initial Comment: This patch includes a number of code changes that aim to make it possible to add new functionality (for example, JMX support) without introducing unnecessary coupling. Also, I started adding getters to various server classes (for now it's only DataTree and WatchManager) for use with JMX MBeans. Overall, the patch only changes the way a few things get initialized and it doesn't affect the main flow. This is an incremental update. I intend to continue refactoring the server code as needed for JMX enablement. ---------------------------------------------------------------------- >Comment By: Benjamin Reed (breed) Date: 2008-02-21 13:36 Message: Logged In: YES user_id=154690 Originator: NO One effect on the main flow that I see is that the ZooKeeperServer gets started before the leaders and peers have synced up. This could cause bad data to be sent to clients. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1898316&group_id=209147 |
From: SourceForge.net <no...@so...> - 2008-02-21 15:47:07
|
Patches item #1892108, was opened at 2008-02-12 10:04 Message generated for change (Comment added) made by breed You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1892108&group_id=209147 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: server Group: None >Status: Closed >Resolution: Accepted Priority: 5 Private: No Submitted By: Benjamin Reed (breed) Assigned to: Nobody/Anonymous (nobody) Summary: Configurable packet sanity check Initial Comment: I added a jute.maxbuffer property to configure the packet sanity check. I also make the 2 sanity checks use the same number thus default to the same thing: 1M. ---------------------------------------------------------------------- >Comment By: Benjamin Reed (breed) Date: 2008-02-21 07:47 Message: Logged In: YES user_id=154690 Originator: YES Committed revision 105. ---------------------------------------------------------------------- Comment By: Andrew Kornev (akornev) Date: 2008-02-14 15:33 Message: Logged In: YES user_id=1926652 Originator: NO +1 patch looks good ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1892108&group_id=209147 |
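A configurable sanity check of the kind described in this item could be sketched as follows. Only the property name jute.maxbuffer comes from the comment; the PacketSanity class, its methods, the error message and the reading of "1M" as 1024 * 1024 bytes are assumptions made for illustration.

```java
import java.io.IOException;

// Hypothetical helper showing one shared limit for packet length checks.
class PacketSanity {
    // Assumed meaning of the "1M" default mentioned in the comment.
    static final int DEFAULT_MAX_BUFFER = 1024 * 1024;

    // Reads the limit from the jute.maxbuffer system property,
    // falling back to the default when the property is unset or not a number.
    static int maxBuffer() {
        return Integer.getInteger("jute.maxbuffer", DEFAULT_MAX_BUFFER);
    }

    // Rejects lengths that are negative or above the configured limit.
    // Both sanity checks would call this so they share the same number.
    static void checkLength(int len) throws IOException {
        if (len < 0 || len > maxBuffer()) {
            throw new IOException("Unreasonable length = " + len);
        }
    }
}
```

The limit would then be raised at launch with a JVM option such as -Djute.maxbuffer=4194304.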
From: SourceForge.net <no...@so...> - 2008-02-21 02:14:54
|
Patches item #1898316, was opened at 2008-02-20 18:14 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1898316&group_id=209147 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: server Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Andrew Kornev (akornev) Assigned to: Nobody/Anonymous (nobody) Summary: Incremental update Initial Comment: This patch includes a number of code changes that aim to make it possible to add new functionality (for example, JMX support) without introducing unnecessary coupling. Also, I started adding getters to various server classes (for now it's only DataTree and WatchManager) for use with JMX MBeans. Overall, the patch only changes the way a few things get initialized and it doesn't affect the main flow. This is an incremental update. I intend to continue refactoring the server code as needed for JMX enablement. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1898316&group_id=209147 |
From: SourceForge.net <no...@so...> - 2008-02-21 01:50:11
|
Patches item #1898314, was opened at 2008-02-20 17:50 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1898314&group_id=209147 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: server Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Andrew Kornev (akornev) Assigned to: Nobody/Anonymous (nobody) Summary: Server version info support Initial Comment: This patch introduces two new Java classes: a) a compile time utility to generate a Java interface file generated/com/yahoo/zookeeper/version/Info.java, that defines the version-related constants: the current version, the SVN revision number and the build date. b) a utility class Version that uses the interface definition generated by step above to return the server version information. This can be used from the command line as well as programmatically (for example, by a JMX MBean) c) the ant build file has been modified to use the new Version utility to stamp the server jar's manifest file with the current version and SVN revision numbers. Also, the new build file target "release" can now be used for building the release server jar. The only difference between development (built by default) and release versions is the name of the jar file. The release jar file includes the version number in its name: zookeeper-1.1.1.jar ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1898314&group_id=209147 |
From: SourceForge.net <no...@so...> - 2008-02-19 16:07:01
|
Patches item #1881204, was opened at 2008-01-28 16:28 Message generated for change (Comment added) made by fpj You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1881204&group_id=209147 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: server Group: None Status: Open Resolution: None Priority: 7 Private: No Submitted By: fpj (fpj) Assigned to: fpj (fpj) Summary: New leader election algorithm with TCP. Initial Comment: This is a feature request for a new leader election algorithm with TCP. ---------------------------------------------------------------------- >Comment By: fpj (fpj) Date: 2008-02-19 17:07 Message: Logged In: YES user_id=1926444 Originator: YES In version 5b, I simply removed some output messages, which I had forgot to remove. Everything else should be the same as with version 5. -Flavio File Added: patch-le-tcp-v5b.txt ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-02-19 16:59 Message: Logged In: YES user_id=1926444 Originator: YES Version 5 has an input parameter (electionAlg) to select which implementation of the algorithm to use. There are currently 4 flavors: the original UDP-based ZK leader election algorithm (0), UDP-based without authentication (1), UDP-based with authentication (2), TCP-based (3). I tested with three different clusters, and locally all four work fine. However, when adding a remote machine the zookeeper servers go wild when using any of the UDP-based versions. It seems that state gets corrupted, and the servers stop beahving correctly. The TCP-based version presented no problem when servers were in different clusters, though. I can't explain why state gets corrupted, but it seems to happen when there is at least one remote machine and I use a UDP-based implementation. 
-Flavio File Added: patch-le-tcp-v5.txt ---------------------------------------------------------------------- Comment By: Benjamin Reed (breed) Date: 2008-02-18 16:50 Message: Logged In: YES user_id=154690 Originator: NO +1 make it happen ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-02-14 15:30 Message: Logged In: YES user_id=1926444 Originator: YES I have tested the new implementation using machines across the Atlantic, and I found a problem with the way we were opening a connection to the new leader on Follower.java. On Follower.followLeader(), there was a for loop that supposedly tried to connect to the new leader 3 times. The way it was implemented before didn't work because in the second iteration the Socket object was not valid anymore. This was causing the follower to try only once as in the second iteration it was throwing an exception that was not ConnectException, and it was leaving the for loop. I have modified the code to create a socket inside the loop for every iteration. In this way the Socket object is valid on every attempt. Because the follower was trying only once, it was happening that the follower would try to connect to the leader before the leader was ready to accept connections. To avoid this problem, we had this hack in the leader election implementation (the previous version was doing the same) to make the follower wait for a fixed amount of time, which we had set to 100ms. When I tried with high-latency connections, the value of 100ms was not sufficient, and I was observing runs in which the system was never making progress because the leader election would succeed, but the single follower of my experiment was unable to connect to the leader as the leader was not ready yet and the bug in the code allowed the follower to try only once. 
With this fix, the follower may still experience unsuccessful attempts to connect to the leader, but given that it waits one second until the next try, it often succeeds in the second attempt. Moreover, with this fix, I've been able to get rid of the 100ms timer at the end of leader election, so it now terminates even faster. -Flavio File Added: patch-le-tcp-v4.txt ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-02-05 15:09 Message: Logged In: YES user_id=1926444 Originator: YES I'm attaching a third version of the patch, in which I fixed a few bugs. In response to Mahadev's comment, I could use the server id to initialize the Random object, but since I'm not passing it to the constructor, I've chosen to initialize the seed in a different way, using IP and time. The idea of using IP is to have different seeds for different servers. As Mahadev pointed out before, we have to consider the case in which one computer runs multiple ZK servers. For this special case, I add the current time to break ties with respect to challenge. In my understanding, the way we generate the challenge value doesn't really matter. I think it is more general in the way I've implemented as it doesn't depend upon any particular scheme of identification for the servers. It is important to note, though, that it is not always necessary to break ties as there are times in which the attempt to initiate a connection is one way. In these cases, if we try to break ties, we may end up with no connection. The logic to overcome such corner cases is fairly simple. If a server A doesn't have a connection to server B, and A has received a connection request from B, then A must accept it. To make it work despite the utilization of a tie-break mechanism, A changes its challenge to the smallest possible value the challenge can have. In this way, it makes sure that it loses the challenge. 
If I use a mechanism that doesn't exchange challenge values (such as using IP and port), then it is not possible to implement the trick I describe above. File Added: patch-le-tcp-v3.txt ---------------------------------------------------------------------- Comment By: Mahadev Konar (mahadevkonar) Date: 2008-02-04 21:21 Message: Logged In: YES user_id=1926680 Originator: NO a few comments: new Random(System.currentTimeMillis() + + localIP.hashCode()); what is the idea behind adding ipaddress hashcode and currentTimeMillis? why not just use the serverid or in that case why not just use the raw server id for connection resolving? something like a server with higher server id is the server that the other connects to... ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-01-31 18:14 Message: Logged In: YES user_id=1926444 Originator: YES File Added: patch-le-tcp-v2.txt ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1881204&group_id=209147 |
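The connection fix described in the 2008-02-14 comment (create a new Socket on every iteration and wait before the next try) can be sketched roughly as follows. This is not the code from patch-le-tcp-v4.txt: the class name, method name, and parameters are hypothetical, and catching any IOException (rather than only ConnectException) is an assumption made for the illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not taken from the patch.
final class ConnectRetrySketch {
    /**
     * Tries to connect up to maxAttempts times. A fresh Socket is created
     * inside the loop, because a Socket whose connect() has failed cannot
     * be reused for another attempt.
     */
    static Socket connectWithRetry(InetSocketAddress addr, int maxAttempts,
                                   int connectTimeoutMs, long retryDelayMs)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Socket sock = new Socket(); // new, unconnected socket per attempt
            try {
                sock.connect(addr, connectTimeoutMs);
                return sock; // connected; the caller owns the socket
            } catch (IOException e) {
                last = e;
                sock.close(); // discard the failed socket
                if (attempt < maxAttempts) {
                    Thread.sleep(retryDelayMs); // e.g. 1000 ms, as in the comment
                }
            }
        }
        throw last; // every attempt failed
    }
}
```

With three attempts and a one-second delay, a follower that reaches the leader slightly too early gets further chances instead of giving up after the first failure.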
From: SourceForge.net <no...@so...> - 2008-02-19 15:59:38
|
Patches item #1881204, was opened at 2008-01-28 16:28 Message generated for change (Comment added) made by fpj You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1881204&group_id=209147 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: server Group: None Status: Open Resolution: None Priority: 7 Private: No Submitted By: fpj (fpj) Assigned to: fpj (fpj) Summary: New leader election algorithm with TCP. Initial Comment: This is a feature request for a new leader election algorithm with TCP. ---------------------------------------------------------------------- >Comment By: fpj (fpj) Date: 2008-02-19 16:59 Message: Logged In: YES user_id=1926444 Originator: YES Version 5 has an input parameter (electionAlg) to select which implementation of the algorithm to use. There are currently 4 flavors: the original UDP-based ZK leader election algorithm (0), UDP-based without authentication (1), UDP-based with authentication (2), TCP-based (3). I tested with three different clusters, and locally all four work fine. However, when adding a remote machine the zookeeper servers go wild when using any of the UDP-based versions. It seems that state gets corrupted, and the servers stop behaving correctly. The TCP-based version presented no problem when servers were in different clusters, though. I can't explain why state gets corrupted, but it seems to happen when there is at least one remote machine and I use a UDP-based implementation. 
-Flavio File Added: patch-le-tcp-v5.txt ---------------------------------------------------------------------- Comment By: Benjamin Reed (breed) Date: 2008-02-18 16:50 Message: Logged In: YES user_id=154690 Originator: NO +1 make it happen ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-02-14 15:30 Message: Logged In: YES user_id=1926444 Originator: YES I have tested the new implementation using machines across the Atlantic, and I found a problem with the way we were opening a connection to the new leader on Follower.java. On Follower.followLeader(), there was a for loop that supposedly tried to connect to the new leader 3 times. The way it was implemented before didn't work because in the second iteration the Socket object was not valid anymore. This was causing the follower to try only once as in the second iteration it was throwing an exception that was not ConnectException, and it was leaving the for loop. I have modified the code to create a socket inside the loop for every iteration. In this way the Socket object is valid on every attempt. Because the follower was trying only once, it was happening that the follower would try to connect to the leader before the leader was ready to accept connections. To avoid this problem, we had this hack in the leader election implementation (the previous version was doing the same) to make the follower wait for a fixed amount of time, which we had set to 100ms. When I tried with high-latency connections, the value of 100ms was not sufficient, and I was observing runs in which the system was never making progress because the leader election would succeed, but the single follower of my experiment was unable to connect to the leader as the leader was not ready yet and the bug in the code allowed the follower to try only once. 
With this fix, the follower may still experience unsuccessful attempts to connect to the leader, but given that it waits one second until the next try, it often succeeds in the second attempt. Moreover, with this fix, I've been able to get rid of the 100ms timer at the end of leader election, so it now terminates even faster. -Flavio File Added: patch-le-tcp-v4.txt ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-02-05 15:09 Message: Logged In: YES user_id=1926444 Originator: YES I'm attaching a third version of the patch, in which I fixed a few bugs. In response to Mahadev's comment, I could use the server id to initialize the Random object, but since I'm not passing it to the constructor, I've chosen to initialize the seed in a different way, using IP and time. The idea of using IP is to have different seeds for different servers. As Mahadev pointed out before, we have to consider the case in which one computer runs multiple ZK servers. For this special case, I add the current time to break ties with respect to challenge. In my understanding, the way we generate the challenge value doesn't really matter. I think it is more general in the way I've implemented as it doesn't depend upon any particular scheme of identification for the servers. It is important to note, though, that it is not always necessary to break ties as there are times in which the attempt to initiate a connection is one way. In these cases, if we try to break ties, we may end up with no connection. The logic to overcome such corner cases is fairly simple. If a server A doesn't have a connection to server B, and A has received a connection request from B, then A must accept it. To make it work despite the utilization of a tie-break mechanism, A changes its challenge to the smallest possible value the challenge can have. In this way, it makes sure that it loses the challenge. 
If I use a mechanism that doesn't exchange challenge values (such as using IP and port), then it is not possible to implement the trick I describe above. File Added: patch-le-tcp-v3.txt ---------------------------------------------------------------------- Comment By: Mahadev Konar (mahadevkonar) Date: 2008-02-04 21:21 Message: Logged In: YES user_id=1926680 Originator: NO a few comments: new Random(System.currentTimeMillis() + + localIP.hashCode()); what is the idea behind adding ipaddress hashcode and currentTimeMillis? why not just use the serverid or in that case why not just use the raw server id for connection resolving? something like a server with higher server id is the server that the other connects to... ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-01-31 18:14 Message: Logged In: YES user_id=1926444 Originator: YES File Added: patch-le-tcp-v2.txt ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1881204&group_id=209147 |
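The challenge-based tie-break from the 2008-02-05 comment can be illustrated with a small sketch. It is a guess at the shape of the logic, not the patch itself: the names are hypothetical, the challenge is assumed to be a long, the side with the lower challenge is assumed to be the one that accepts the incoming connection, and equal challenges are counted as a loss only so that the minimum value always loses. The thread does not say how the patch treats equal challenges.

```java
import java.util.Random;

// Hypothetical names; only the seeding idea (time + IP hash) comes from the thread.
final class ChallengeSketch {
    // Smallest value a long challenge can take (assumes a long challenge).
    static final long MIN_CHALLENGE = Long.MIN_VALUE;

    /** Draws a challenge from a Random seeded with current time plus IP hash. */
    static long newChallenge(long nowMillis, int localIpHash) {
        return new Random(nowMillis + localIpHash).nextLong();
    }

    /**
     * A server with no connection to the peer lowers its challenge to the
     * minimum, so it is guaranteed to lose and accept the peer's request.
     */
    static long effectiveChallenge(boolean hasConnectionToPeer, long localChallenge) {
        return hasConnectionToPeer ? localChallenge : MIN_CHALLENGE;
    }

    /** Returns true if the local server loses and accepts the incoming connection. */
    static boolean acceptIncoming(boolean hasConnectionToPeer,
                                  long localChallenge, long peerChallenge) {
        // Ties count as a loss here (assumption, see the note above).
        return effectiveChallenge(hasConnectionToPeer, localChallenge) <= peerChallenge;
    }
}
```

A scheme that compares fixed identifiers such as IP and port has no value it can lower in this way, which is the limitation the comment points out.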
From: SourceForge.net <no...@so...> - 2008-02-18 15:50:03
|
Patches item #1881204, was opened at 2008-01-28 07:28 Message generated for change (Comment added) made by breed You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1881204&group_id=209147 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: server Group: None Status: Open Resolution: None Priority: 7 Private: No Submitted By: fpj (fpj) Assigned to: fpj (fpj) Summary: New leader election algorithm with TCP. Initial Comment: This is a feature request for a new leader election algorithm with TCP. ---------------------------------------------------------------------- Comment By: Benjamin Reed (breed) Date: 2008-02-18 07:50 Message: Logged In: YES user_id=154690 Originator: NO +1 make it happen ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-02-14 06:30 Message: Logged In: YES user_id=1926444 Originator: YES I have tested the new implementation using machines across the Atlantic, and I found a problem with the way we were opening a connection to the new leader on Follower.java. On Follower.followLeader(), there was a for loop that supposedly tried to connect to the new leader 3 times. The way it was implemented before didn't work because in the second iteration the Socket object was not valid anymore. This was causing the follower to try only once as in the second iteration it was throwing an exception that was not ConnectException, and it was leaving the for loop. I have modified the code to create a socket inside the loop for every iteration. In this way the Socket object is valid on every attempt. Because the follower was trying only once, it was happening that the follower would try to connect to the leader before the leader was ready to accept connections. 
To avoid this problem, we had this hack in the leader election implementation (the previous version was doing the same) to make the follower wait for a fixed amount of time, which we had set to 100ms. When I tried with high-latency connections, the value of 100ms was not sufficient, and I was observing runs in which the system was never making progress because the leader election would succeed, but the single follower of my experiment was unable to connect to the leader as the leader was not ready yet and the bug in the code allowed the follower to try only once. With this fix, the follower may still experience unsuccessful attempts to connect to the leader, but given that it waits one second until the next try, it often succeeds in the second attempt. Moreover, with this fix, I've been able to get rid of the 100ms timer at the end of leader election, so it now terminates even faster. -Flavio File Added: patch-le-tcp-v4.txt ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-02-05 06:09 Message: Logged In: YES user_id=1926444 Originator: YES I'm attaching a third version of the patch, in which I fixed a few bugs. In response to Mahadev's comment, I could use the server id to initialize the Random object, but since I'm not passing it to the constructor, I've chosen to initialize the seed in a different way, using IP and time. The idea of using IP is to have different seeds for different servers. As Mahadev pointed out before, we have to consider the case in which one computer runs multiple ZK servers. For this special case, I add the current time to break ties with respect to challenge. In my understanding, the way we generate the challenge value doesn't really matter. I think it is more general in the way I've implemented as it doesn't depend upon any particular scheme of identification for the servers. 
It is important to note, though, that it is not always necessary to break ties as there are times in which the attempt to initiate a connection is one way. In these cases, if we try to break ties, we may end up with no connection. The logic to overcome such corner cases is fairly simple. If a server A doesn't have a connection to server B, and A has received a connection request from B, then A must accept it. To make it work despite the utilization of a tie-break mechanism, A changes its challenge to the smallest possible value the challenge can have. In this way, it makes sure that it loses the challenge. If I use a mechanism that doesn't exchange challenge values (such as using IP and port), then it is not possible to implement the trick I describe above. File Added: patch-le-tcp-v3.txt ---------------------------------------------------------------------- Comment By: Mahadev Konar (mahadevkonar) Date: 2008-02-04 12:21 Message: Logged In: YES user_id=1926680 Originator: NO a few comments: new Random(System.currentTimeMillis() + + localIP.hashCode()); what is the idea behind adding ipaddress hashcode and currentTimeMillis? why not just use the serverid or in that case why not just use the raw server id for connection resolving? something like a server with higher server id is the server that the other connects to... ---------------------------------------------------------------------- Comment By: fpj (fpj) Date: 2008-01-31 09:14 Message: Logged In: YES user_id=1926444 Originator: YES File Added: patch-le-tcp-v2.txt ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1881204&group_id=209147 |
From: SourceForge.net <no...@so...> - 2008-02-14 23:34:01
|
Patches item #1892108, was opened at 2008-02-12 10:04 Message generated for change (Comment added) made by akornev You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1892108&group_id=209147 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: server Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Benjamin Reed (breed) Assigned to: Nobody/Anonymous (nobody) Summary: Configurable packet sanity check Initial Comment: I added a jute.maxbuffer property to configure the packet sanity check. I also made the two sanity checks use the same number, so they default to the same value: 1M. ---------------------------------------------------------------------- Comment By: Andrew Kornev (akornev) Date: 2008-02-14 15:33 Message: Logged In: YES user_id=1926652 Originator: NO +1 patch looks good ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1892108&group_id=209147 |
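A configurable packet sanity check of the kind described in this item might look roughly like the sketch below. Only the property name jute.maxbuffer and the 1M default come from the message; the class and method names are hypothetical, reading the value as a Java system property is an assumption, and "1M" is assumed to mean 1024 * 1024 bytes.

```java
import java.io.IOException;

// Hypothetical helper, not the code from the patch.
final class PacketSanitySketch {
    // Assumption: "1M" in the message means 1 MiB.
    static final int DEFAULT_MAX_BUFFER = 1024 * 1024;

    /** Reads the limit from the jute.maxbuffer system property, if set. */
    static int maxBuffer() {
        return Integer.getInteger("jute.maxbuffer", DEFAULT_MAX_BUFFER);
    }

    /** Rejects a declared packet length that is negative or above the limit. */
    static void checkLength(int len) throws IOException {
        if (len < 0 || len > maxBuffer()) {
            throw new IOException("Unreasonable packet length: " + len);
        }
    }
}
```

Under these assumptions, starting the JVM with -Djute.maxbuffer=4194304 would raise the limit to 4 MiB for every check that goes through maxBuffer().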
From: SourceForge.net <no...@so...> - 2008-02-14 23:30:26
|
Patches item #1889354, was opened at 2008-02-07 18:24 Message generated for change (Settings changed) made by akornev You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1889354&group_id=209147 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None >Status: Closed >Resolution: Accepted Priority: 5 Private: No Submitted By: Andrew Kornev (akornev) Assigned to: Nobody/Anonymous (nobody) Summary: revision number and other metadata in the JAR manifest Initial Comment: Here's what the new manifest is going to look like: Manifest-Version: 1.0 Ant-Version: Apache Ant 1.7.0 Created-By: 1.6.0_02-b06 (Sun Microsystems Inc.) Main-Class: com.yahoo.zookeeper.server.quorum.QuorumPeer Built-By: reallycooldude Built-At: February 7 2008 Implementation-Title: com.yahoo.zookeeper Implementation-Version: SVN revision 93 Implementation-Vendor: Yahoo! Inc. Please note that in order to extract the last revision number from the SVN repository I had to use Collabnet's Subversion Ant task svnant-1.1.0-RC2: http://subclipse.tigris.org/svnant.html. Download their latest release 1.1.0-RC2. Next, create the svnant directory under zookeeper/java/lib and copy all jar files from the zip's lib directory to the newly created svnant directory. Apply the attached build.xml patch and build the server jar as usual. ---------------------------------------------------------------------- Comment By: Benjamin Reed (breed) Date: 2008-02-08 11:17 Message: Logged In: YES user_id=154690 Originator: NO +1 Looks good. Let's put the needed libraries with their licenses in the lib directory. I think we should also put a readme.txt in the lib directory indicating that the libraries are only used to do builds and are not part of the resulting binaries. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=1008546&aid=1889354&group_id=209147 |