From: Jonathan L. <jon...@no...> - 2008-11-04 14:31:09
On Nov 3, 2008, at 3:46 PM, Eric Chan-Tin wrote:

> Jonathan,
>
> I can insert some code to fix this and it doesn't look too hard, but I
> do have some concerns.
>
>> I don't remember observing this behavior in our PL deployment, but
>> that could have been because we had the ping time set to 1-2 minutes,
>> IIRC.
>
> The default ping time was set to 10 seconds plus some Gaussian-
> distributed offset, so that's what I used. Changing the ping time to
> 1 minute or more will result in the same effect, except it will take
> longer for the "cycle" to occur.
>
> What I have in mind is the following:
>
> 1) NCClient.addNeighbor() would be changed to (in pseudocode):
>
>    protected boolean addNeighbor(RemoteState<T> guy) {
>        boolean added = false;
>        // if neighbors.size() >= MAX_NEIGHBORS:
>        //     check the last update time of each neighbor in
>        //     neighbors (neighbor.getLastUpdateTime());
>        //     if any of them is older than some threshold TIMEOUT,
>        //     remove the oldest one, then add "guy" to neighbors;
>        //     otherwise don't add "guy" to neighbors
>        // else add "guy" to neighbors
>    }

IMO this shouldn't be the behavior of this function, because it is
called "addNeighbor." Instead, the removal of old neighbors could happen
in the performMaintenance() function.

> 2) In NCManager, getGossipNode() would just pick a random node from
> localNC.primaryNC.getNeighbors() to send a request to every 10
> seconds.
>
> 3) Pyxida will still keep track of all GossipNodes in the network. It
> could also (say, every 10 minutes) check its NCClient.neighbors list
> to see if any neighbor is older than TIMEOUT, remove it, then pick a
> random GossipNode to send a ping request.
>
> The main problem with this scheme is that one Pyxida node would most
> likely stay with the same set of neighbors all the time. Its
> coordinates would be greatly affected by those 32 nodes (I am assuming
> MAX_NEIGHBORS = 32).

Instead of (2), why not have the getNeighborToPing function determine
the behavior? If it returns a node, then ping it. If it returns null,
then have the NCManager pick some other node to use. If things are
hunky-dory, getNeighborToPing() would return a node (a current member
of the neighbor set); if we're short on neighbors, it would return
null. The periodic maintenance would have already tossed old nodes, so
this would be self-sustaining.

The "problem" you mention is fairly central to network coordinates in
general. Sure, the coordinates will be a little better the more
neighbors you have, but in a very large system, talking to everybody
isn't feasible. Peter and I have a graph in one of the workshop papers
which shows the effect of increasing the number of neighbors.
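[For concreteness, here is a minimal sketch of the division of labor
proposed above: NCClient evicts stale neighbors during periodic
maintenance and answers getNeighborToPing(), while NCManager falls back
to a fresh gossip node when the answer is null. The helper names
(NeighborSketch, lastUpdateTime(), NEIGHBOR_TIMEOUT_MS) are hypothetical
and need not match the actual Pyxida source.]

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;

    // Hypothetical stand-in for Pyxida's RemoteState: all that matters
    // here is when we last heard from the node.
    interface NeighborSketch {
        long lastUpdateTime();
    }

    class NCClientSketch {
        static final int MAX_NEIGHBORS = 32;
        static final long NEIGHBOR_TIMEOUT_MS = 10 * 60 * 1000L; // 10 min

        private final List<NeighborSketch> neighbors = new ArrayList<>();
        private final Random rand = new Random();

        // Periodic maintenance: quietly evict any neighbor we have not
        // heard from within the timeout, instead of evicting on add.
        void performMaintenance(long nowMs) {
            neighbors.removeIf(n ->
                nowMs - n.lastUpdateTime() > NEIGHBOR_TIMEOUT_MS);
        }

        // NCManager calls this every ping interval: returns a current
        // neighbor to measure, or null when the set is short and the
        // manager should recruit a fresh node from the gossip list.
        NeighborSketch getNeighborToPing() {
            if (neighbors.size() < MAX_NEIGHBORS)
                return null;
            return neighbors.get(rand.nextInt(neighbors.size()));
        }
    }

[On the NCManager side, the ping loop would then be: if
getNeighborToPing() returns null, pick a random gossip node instead;
either way, ping the chosen target and feed the resulting sample to the
coordinate computation.]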
> I don't think the network would be partitioned, but it could happen.
> For example, the bootstrap nodes have each other in their neighbors
> list. Every new node bootstrapping to the network would contact those
> bootstrap nodes first and add them to their neighbors. So every Pyxida
> node could have the "bootstrap nodes" as its neighbors, which would
> make Pyxida become a centralized system.

As long as the initial bootstrap list is large enough, I think you are
right that partitions won't be a problem.

> One last thing: when Pyxida receives a ping request from a node, it
> does not update its coordinates. Is there any particular reason for
> that? Is it because Pyxida will not be able to calculate the RTT?

Yes.

> Also, there is no "gossiping" of other nodes' coordinates, right? That
> is, if A has B in its neighbors list, it would from time to time tell
> other nodes about B's coordinates. I realize this has a lot of
> complications, but the original Vivaldi paper was very fuzzy about
> this point and the previous one, so it seems it is up to the
> implementors to decide what to do.

I don't think we are doing this. It could be useful in certain
circumstances (e.g., picking potential new neighbors), but we're not,
IIRC.

> I will have to think some more about this as there are a lot of nasty
> little side effects that could happen, but these are my thoughts.
>
> Thanks,
>
> -Eric

Sounds good.

-jonathan

> On Nov 3, 2008, Jonathan Ledlie wrote:
>
>> Eric,
>>
>> Based on a look at the code just now, I see where you are coming
>> from: with these params on PL, there isn't anything stopping the
>> neighbor list from circulating pretty quickly through all nodes. In
>> the Azureus implementation, the guys fed to the alg (fed to
>> processSample) weren't changing much, because they were coming from
>> the routing finger tables. I don't remember observing this behavior
>> in our PL deployment, but that could have been because we had the
>> ping time set to 1-2 minutes, IIRC.
>>
>> Do you want to insert some code that will fix this? Basically, we
>> want to only add a new neighbor if our oldest neighbor should be
>> tossed (or if we haven't reached MAX_NEIGHBORS). It's probably best
>> from a clarity standpoint if addNeighbor does continue to actually
>> add the neighbor in all cases. Probably the best thing to do is to
>> have NCClient drop old neighbors internally and have NCManager ask it
>> which nodes should be pinged via getNeighborToPing. So NCClient would
>> continue to have the same behavior on processSample (it would not
>> reject new information), and NCManager and NCClient would cooperate
>> to measure RTTs to current neighbors. We do want to make sure,
>> however, that the gossip set stays large enough so that partitions
>> don't occur.
>>
>> Do you want to be added to the SourceForge Pyxida developers to make
>> this change? (Please test it out carefully first.)
>>
>> -jonathan
>>
>> On Oct 30, 2008, at 11:32 AM, ext Eric Chan-Tin wrote:
>>
>>> Jonathan,
>>>
>>> From what I saw, eventually you learn about all the nodes in the
>>> network. Every 10 seconds, Pyxida picks a random "Gossip" node from
>>> its list and sends a request. The Gossip node will reply back with
>>> an "upNeighbor", that is, a "Gossip" node that is "up" or, with 10%
>>> probability, a node that is "pending". Since naturally all the nodes
>>> bootstrap off the same set of nodes, every node will eventually know
>>> of every other node.
>>>
>>> Pyxida might not be rotating through everybody continuously
>>> (although eventually, since it knows everybody, it will have added
>>> everybody to its Neighbors list at some point), but it could be
>>> removing a node that is "alive". So, if MAX_NEIGHBORS = 32, let's
>>> call the neighbors A_1, A_2, ..., A_32. Pyxida picks a random Gossip
>>> node B, sends it a request, and B replies back. Pyxida will add B to
>>> its Neighbors list, see that the size of Neighbors > MAX_NEIGHBORS,
>>> and kick A_1 out. Then from those 32 neighbors A_2, ..., A_32, B, it
>>> will calculate the force and apply it to the current system
>>> coordinates. Thus, if every 10 seconds Pyxida picks a different node
>>> (C, D, E, etc.), it would keep kicking nodes out of the neighbors
>>> list.
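[Restated as code, the behavior described above would look roughly like
the following. This is a paraphrase of the description in this thread,
not the actual Pyxida source, and the names are made up.]

    import java.util.ArrayDeque;
    import java.util.Deque;

    // The add-then-evict behavior described above, paraphrased with
    // hypothetical names: every gossip reply is added to the neighbor
    // set, and the oldest entry is kicked out once the set is full,
    // whether or not it is still alive.
    class FifoNeighborsSketch {
        static final int MAX_NEIGHBORS = 32;
        private final Deque<String> neighbors = new ArrayDeque<>();

        // Called when gossip node b replies (roughly every 10 seconds).
        void onGossipReply(String b) {
            if (!neighbors.contains(b)) {
                neighbors.addLast(b);
                if (neighbors.size() > MAX_NEIGHBORS)
                    neighbors.removeFirst(); // kicks A_1 out, alive or not
            }
            // The force is then computed against the remaining 32
            // neighbors and applied to the node's current coordinate.
        }
    }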
>>>
>>> It's the "kicking A_1 out" part that seemed a bit confusing. Churn
>>> is a good reason, but you don't check whether A_1 might still be
>>> alive or whether A_1 was just added. I have been printing the
>>> neighbors list every 10 minutes, and the lists are completely
>>> different from one snapshot to the next.
>>>
>>> One possible problem might be that I am choosing the value of
>>> MAX_NEIGHBORS to be too small. Originally it was set to 512 (which
>>> was meant for Azureus). Since I am using PlanetLab, where there are
>>> about 400-500 "stable" nodes, I changed the value to 32. But I don't
>>> think that should be a problem, since log2(512) = 9.
>>>
>>> Thanks,
>>>
>>> -Eric
>>>
>>> Jonathan Ledlie wrote:
>>>>
>>>> Eric,
>>>>
>>>> Glad to hear it. Let Peter and me know if you publish anything
>>>> using Pyxida and we can put it on the web page.
>>>>
>>>> "Gossip" nodes are for learning about the network. "Neighbors" are
>>>> for constructing the coordinate.
>>>>
>>>> So, IIRC, the gossip process learns about nodes that may or may not
>>>> be applied to the neighbor list. That is, we may eventually learn
>>>> about a large portion of the network via gossiping, but we should
>>>> only be using MAX_NEIGHBORS at any given time for coordinate
>>>> computation. The reason we don't want to use a fixed set of
>>>> neighbors for coordinate computation is that nodes may come and go
>>>> from the system. However, it should not be the case that we just
>>>> rotate through everybody, because then, as you note, we'd be
>>>> computing against everyone's coordinate/latency, and this is
>>>> counter to the point of network coordinates. With this gradual
>>>> transience as a desirable quality in mind, I am pretty sure we
>>>> maintain neighbors in a fairly fixed manner. Does this not seem to
>>>> be the case to you?
>>>>
>>>> -jonathan
>>>>
>>>> On Oct 29, 2008, at 9:20 PM, Eric Chan-Tin wrote:
>>>>
>>>>> Hi Jonathan and Peter,
>>>>>
>>>>> First of all, I just wanted to say that Pyxida has been working
>>>>> great on PlanetLab for the past 4 months or so. We are getting
>>>>> good data, and the experiments are going well.
>>>>>
>>>>> However, I do have one question/comment.
>>>>>
>>>>> Every Pyxida node contains a list of Neighbors (in NCClient) and a
>>>>> list of Gossip nodes (they are called neighbors in NCManager, but
>>>>> I will call them gossip nodes for now so as not to confuse them
>>>>> with the 'real' neighbors in NCClient). Every 10 seconds, Pyxida
>>>>> will pick a random node (call it A) from Gossip and send it a
>>>>> gossip request (basically asking for rtt, coordinate, error). But
>>>>> Gossip contains all the nodes in the network. The comment in the
>>>>> code said that was to avoid Pyxida becoming lonely with no one to
>>>>> gossip with, which makes sense.
>>>>>
>>>>> Once A responds, Pyxida will add A to its Neighbors (if Neighbors
>>>>> does not contain A) and remove the first element from the
>>>>> Neighbors list. This seems a little bit odd. Was there any
>>>>> particular reasoning behind this? Because it would seem like
>>>>> you're getting the coordinate from every node in the network
>>>>> eventually.
>>>>>
>>>>> Thanks a lot,
>>>>>
>>>>> -Eric
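[The samples mentioned above (rtt, coordinate, error) feed a
Vivaldi-style update, where "calculating the force" moves the local
coordinate. For reference, here is the textbook form of that update; it
is an illustration, not a quote of Pyxida's actual processSample, and
the names and constants are generic.]

    import java.util.Random;

    // Generic Vivaldi-style coordinate update (the textbook algorithm,
    // not Pyxida's actual processSample). A sample is the neighbor's
    // coordinate, its reported error, and the measured RTT.
    class VivaldiSketch {
        static final double CC = 0.25, CE = 0.25; // common tuning values
        final double[] coord = new double[3];     // our coordinate
        double error = 1.0;                       // our error estimate
        final Random rand = new Random();

        void processSample(double[] remote, double remoteError,
                           double rttMs) {
            double dist = 0;
            for (int i = 0; i < coord.length; i++)
                dist += (coord[i] - remote[i]) * (coord[i] - remote[i]);
            dist = Math.sqrt(dist);

            // How much to trust this sample: our error vs. the remote's.
            double w = error / (error + remoteError);
            double relErr = Math.abs(dist - rttMs) / rttMs;

            // Exponentially weighted update of our own error estimate.
            error = relErr * CE * w + error * (1 - CE * w);

            // Apply a force of magnitude (rtt - dist) along the unit
            // vector from the neighbor to us: too close pushes us away,
            // too far pulls us in.
            double delta = CC * w;
            for (int i = 0; i < coord.length; i++) {
                double dir = dist > 1e-9
                    ? (coord[i] - remote[i]) / dist
                    : rand.nextGaussian(); // coincident: random direction
                coord[i] += delta * (rttMs - dist) * dir;
            }
        }
    }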
>>>>> Jonathan Ledlie wrote:
>>>>>>
>>>>>> Your inclination is correct. See the forwarded message.
>>>>>>
>>>>>> -jonathan
>>>>>>
>>>>>> On Apr 9, 2008, at 1:53 PM, ext Eric Chan-Tin wrote:
>>>>>>
>>>>>>> Thank you for the clarification.
>>>>>>>
>>>>>>> I've been reading through your published papers and code, and I
>>>>>>> must say I learned a lot about how Pyxida works :)
>>>>>>>
>>>>>>> I still have a quick and (hopefully) easy question. Each Pyxida
>>>>>>> node has a small number of neighbors and learns about other
>>>>>>> nodes through them. I am running on about 200 machines, and it
>>>>>>> seems like all the Pyxida nodes are adding each other to their
>>>>>>> upNeighbors set. NCClient.java has MAX_NEIGHBORS set to 512.
>>>>>>>
>>>>>>> I wanted to make sure that I am not misreading the code:
>>>>>>> shouldn't each Pyxida node keep a small number of neighbors in
>>>>>>> its upNeighbors set? Otherwise, it is a lot of updates to send a
>>>>>>> gossipMessage to all 200 neighbors, and it also means that a
>>>>>>> Pyxida node has the whole network in its upNeighbors set - that
>>>>>>> seems wrong. I am inclined to change the value of MAX_NEIGHBORS
>>>>>>> but wanted to make sure that my understanding is correct first.
>>>>>>>
>>>>>>> Thank you,
>>>>>>>
>>>>>>> Eric
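[A quick arithmetic check on the neighbor-set sizes discussed in this
thread, using the log-of-system-size rule of thumb cited above. The
numbers are illustrative.]

    // Worked check of the log2(n) rule of thumb cited above: even for
    // the ~500 stable PlanetLab nodes, log2(n) is about 9, so
    // MAX_NEIGHBORS = 32 leaves a wide margin, while 512 would let the
    // neighbor set span the whole deployment.
    public class NeighborSizing {
        public static void main(String[] args) {
            for (int n : new int[] {200, 500, 512}) {
                double log2 = Math.log(n) / Math.log(2);
                System.out.printf("n=%d -> log2(n) = %.1f%n", n, log2);
            }
        }
    }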