From: Eric Chan-T. <dch...@cs...> - 2008-11-03 20:46:21
Jonathan,

I can insert some code to fix this and it doesn't look too hard, but I do have some concerns.

> I don't remember observing this behavior in our PL deployment, but that
> could have been because we had the ping time set to 1-2 minutes, IIRC.

The default ping time was set to 10 seconds plus some Gaussian noise, so that's what I used, but changing the ping time to 1 minute or more will result in the same effect; the "cycle" will just take longer to occur.

What I have in mind is the following:

1) NCClient.addNeighbor() would be changed to something like this, where TIMEOUT is a new constant (in milliseconds, the same unit as getLastUpdateTime()):

    protected boolean addNeighbor(RemoteState<T> guy) {
        if (neighbors.contains(guy)) return false;     // already a neighbor
        if (neighbors.size() < MAX_NEIGHBORS) return neighbors.add(guy);
        // At capacity: find the least recently updated neighbor
        RemoteState<T> oldest = neighbors.get(0);
        for (RemoteState<T> n : neighbors)
            if (n.getLastUpdateTime() < oldest.getLastUpdateTime()) oldest = n;
        // Evict it only if it is older than TIMEOUT; otherwise reject guy
        if (System.currentTimeMillis() - oldest.getLastUpdateTime() > TIMEOUT) {
            neighbors.remove(oldest);
            return neighbors.add(guy);
        }
        return false;                                   // all neighbors still fresh
    }

2) In NCManager, getGossipNode() would just pick a random node from localNC.primaryNC.getNeighbors() to send a request to every 10 seconds.

3) Pyxida will still keep track of all GossipNodes in the network. It could also (say, every 10 minutes) check its NCClient.neighbors list, see if any neighbor is older than TIMEOUT, remove it, and then pick a random GossipNode to send a ping request. (A rough sketch of 2) and 3) appears at the end of this message.)

The main problem with this scheme is that a Pyxida node would most likely stay with the same set of neighbors all the time. Its coordinates would be strongly affected by those 32 nodes (I am assuming MAX_NEIGHBORS = 32). I don't think the network would become partitioned, but it could happen. For example, the bootstrap nodes have each other in their neighbors lists, and every new node bootstrapping into the network contacts those bootstrap nodes first and adds them to its neighbors. So every Pyxida node could end up with the bootstrap nodes as its neighbors, which would make Pyxida effectively a centralized system.

One last thing: when Pyxida receives a ping request from a node, it does not update its coordinates. Is there any particular reason for that? Is it because Pyxida would not be able to calculate the RTT? Also, there is no "gossiping" of other nodes' coordinates, right? That is, if A has B in its neighbors list, will A from time to time tell other nodes about B's coordinates?

I realize this has a lot of complications, but the original Vivaldi paper was very fuzzy about this point and the previous one, so it seems it is up to the implementers to decide what to do. I will have to think some more about this, as there are a lot of nasty little side effects that could happen, but these are my thoughts.
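To make 2) and 3) concrete, here is a rough sketch. The names getNeighbors(), getRandomGossipNode(), sendPingRequest(), and rand are made up for illustration (they are not the current Pyxida API), and the timer wiring is omitted:

    // In NCManager: pick the gossip target from the current neighbor set,
    // falling back to a random GossipNode if we have no neighbors yet.
    protected RemoteState<T> getGossipNode() {
        List<RemoteState<T>> nbrs = localNC.primaryNC.getNeighbors();
        if (nbrs.isEmpty()) return getRandomGossipNode();
        return nbrs.get(rand.nextInt(nbrs.size()));   // rand: java.util.Random
    }

    // In NCClient, run every 10 minutes: drop stale neighbors, and for each
    // freed slot probe a random GossipNode so the slot can be refilled.
    protected void reapStaleNeighbors() {
        long now = System.currentTimeMillis();
        for (Iterator<RemoteState<T>> it = neighbors.iterator(); it.hasNext(); ) {
            if (now - it.next().getLastUpdateTime() > TIMEOUT) {
                it.remove();
                sendPingRequest(getRandomGossipNode());
            }
        }
    }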
Thanks,

-Eric

On Nov 3 2008, Jonathan Ledlie wrote:
>
> Eric,
>
> Based on a look at the code just now, I see where you are coming from:
> with these params on PL, there isn't anything stopping the neighbor list
> from circulating pretty quickly through all nodes. In the Azureus
> implementation, the guys fed to the alg (fed to processSample) weren't
> changing much because they were coming from the routing finger tables. I
> don't remember observing this behavior in our PL deployment, but that
> could have been because we had the ping time set to 1-2 minutes, IIRC.
>
> Do you want to insert some code that will fix this? Basically we want to
> only add a new neighbor if our oldest neighbor should be tossed (or if
> we haven't reached MAX_NEIGHBORS). It's probably best from a clarity
> standpoint if addNeighbor does continue to actually add the neighbor in
> all cases. Probably the best thing to do is to have NCClient drop old
> neighbors internally and have NCManager ask it which nodes should be
> pinged via getNeighborToPing. So NCClient would continue to have the
> same behavior on processSample (it would not reject new information) and
> NCManager and NCClient would cooperate to measure RTTs to current
> neighbors. We do want to make sure, however, that the gossip set stays
> large enough so that partitions don't occur.
>
> Do you want to be added to the SourceForge Pyxida developers to make
> this change? (Please test it out carefully first.)
>
> -jonathan
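A sketch of the getNeighborToPing split described above, reusing the reapStaleNeighbors() sketch from earlier in this message. The method name comes from Jonathan's mail, but the bodies are only a guess at what the cooperation could look like:

    // In NCClient: hand NCManager a current, non-stale neighbor to measure.
    public RemoteState<T> getNeighborToPing() {
        reapStaleNeighbors();                 // drop old neighbors internally
        if (neighbors.isEmpty()) return null;
        return neighbors.get(rand.nextInt(neighbors.size()));
    }

    // In NCManager's 10-second loop: ping a current neighbor if there is
    // one; otherwise fall back to a random GossipNode so the neighbor set
    // gets refilled and the gossip graph stays connected.
    RemoteState<T> target = localNC.primaryNC.getNeighborToPing();
    if (target == null) target = getRandomGossipNode();
    sendPingRequest(target);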
> On Oct 30, 2008, at 11:32 AM, ext Eric Chan-Tin wrote:
>
>> Jonathan,
>>
>> From what I saw, you eventually learn about all the nodes in the
>> network. Every 10 seconds, Pyxida picks a random "Gossip" node from its
>> list and sends it a request. The Gossip node replies with an
>> "upNeighbor", that is, a "Gossip" node that is "up" or, with 10%
>> probability, a node that is "pending". Since all the nodes naturally
>> bootstrap off the same set of nodes, every node will eventually know of
>> every other node.
>>
>> Pyxida might not be rotating through everybody continuously (although
>> eventually, since it knows everybody, it will have added everybody to
>> its Neighbors list at some point), but it could be removing a node that
>> is "alive". So, if MAX_NEIGHBORS = 32, let's call the neighbors A_1,
>> A_2, ..., A_32. Pyxida picks a random Gossip node B, sends it a
>> request, and B replies. Pyxida will add B to its Neighbors list, see
>> that the size of Neighbors > MAX_NEIGHBORS, and kick A_1 out. Then from
>> those 32 neighbors A_2, ..., A_32, B, it will calculate the force and
>> apply it to the current system coordinates. Thus, if every 10 seconds
>> Pyxida picks a different node C, D, E, etc., it will keep kicking nodes
>> out of the neighbors list.
>>
>> It's the "kicking A_1 out" part that seemed a bit confusing. Churn is a
>> good reason, but you don't check whether A_1 might still be alive, or
>> whether A_1 was just added. I have been printing the neighbors list
>> every 10 minutes, and the lists are completely different from one
>> interval to the next.
>>
>> One possible problem might be that I am choosing the value of
>> MAX_NEIGHBORS to be too small. Originally it was set to 512 (which was
>> meant for Azureus). Since I am using PlanetLab, where there are about
>> 400-500 "stable" nodes, I changed the value to 32. But I don't think
>> that should be a problem, since log2(512) = 9.
>>
>> Thanks,
>>
>> -Eric
>>
>> Jonathan Ledlie wrote:
>>>
>>> Eric,
>>>
>>> Glad to hear it. Let Peter and me know if you publish anything using
>>> Pyxida and we can put it on the web page.
>>>
>>> "Gossip" nodes are for learning about the network. "Neighbors" are for
>>> constructing the coordinate.
>>>
>>> So, IIRC, the gossip process learns about nodes that may or may not be
>>> applied to the neighbor list. That is, we may eventually learn about a
>>> large portion of the network via gossiping, but we should only be
>>> using MAX_NEIGHBORS at any given time for coordinate computation. The
>>> reason we don't want to use a fixed set of neighbors for coordinate
>>> computation is that nodes may come and go from the system. However, it
>>> should not be the case that we just rotate through everybody, because
>>> then, as you note, we'd be computing against everyone's
>>> coordinate/latency, and this is counter to the point of network
>>> coordinates. With this gradual transience as a desirable quality in
>>> mind, I am pretty sure we maintain neighbors in a fairly fixed manner.
>>> Does this not seem to be the case to you?
>>>
>>> -jonathan
>>>
>>> On Oct 29, 2008, at 9:20 PM, Eric Chan-Tin wrote:
>>>
>>>> Hi Jonathan and Peter,
>>>>
>>>> First of all, just wanted to say that Pyxida has been working great
>>>> on PlanetLab for the past 4 months or so. We are getting good data,
>>>> and the experiments are going well.
>>>>
>>>> However, I do have one question/comment.
>>>>
>>>> Every Pyxida node contains a list of Neighbors (in NCClient) and a
>>>> list of Gossip nodes (they are called neighbors in NCManager, but I
>>>> will call them Gossip for now so as not to confuse them with the
>>>> 'real' neighbors in NCClient). Every 10 seconds, Pyxida will pick a
>>>> random node (call it A) from Gossip and send it a gossip request
>>>> (basically asking for RTT, coordinate, and error). But Gossip
>>>> contains all the nodes in the network. The comment in the code said
>>>> that this was to avoid Pyxida becoming lonely, with no one to gossip
>>>> with, which makes sense.
>>>>
>>>> Once A responds, Pyxida will add A to its Neighbors (if Neighbors
>>>> does not contain A) and remove the first element from the Neighbors
>>>> list. This seems a little bit odd. Was there any particular reasoning
>>>> behind this? Because it would seem like you're getting the coordinate
>>>> from every node in the network eventually.
>>>>
>>>> Thanks a lot,
>>>>
>>>> -Eric
>>>>
>>>> Jonathan Ledlie wrote:
>>>>>
>>>>> Your inclination is correct. See forwarded message.
>>>>>
>>>>> -jonathan
>>>>>
>>>>> On Apr 9, 2008, at 1:53 PM, ext Eric Chan-Tin wrote:
>>>>>
>>>>>> Thank you for the clarification.
>>>>>>
>>>>>> I've been reading through your published papers and code, and I
>>>>>> must say I learned a lot about how Pyxida works :)
>>>>>>
>>>>>> I still have a quick and (hopefully) easy question. Each Pyxida
>>>>>> node has a small number of neighbors and learns about other nodes
>>>>>> through them. I am running on about 200 machines, and it seems like
>>>>>> all the Pyxida nodes are adding each other to their upNeighbors
>>>>>> set. NCClient.java has MAX_NEIGHBORS set to 512.
>>>>>>
>>>>>> I wanted to make sure that I am not misreading the code: shouldn't
>>>>>> each Pyxida node keep a small number of neighbors in its
>>>>>> upNeighbors set? Otherwise it takes a lot of updates to send a
>>>>>> gossipMessage to all 200 neighbors, and it also means that a Pyxida
>>>>>> node has the whole network in its upNeighbors set - that seems
>>>>>> wrong. I am inclined to change the value of MAX_NEIGHBORS but
>>>>>> wanted to make sure that my understanding is correct first.
>>>>>>
>>>>>> Thank you,
>>>>>>
>>>>>> Eric
>>>>
>>>> _______________________________________________
>>>> Pyxida-users mailing list
>>>> Pyx...@li...
>>>> https://lists.sourceforge.net/lists/listinfo/pyxida-users
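For reference, the add-and-evict behavior Eric describes in the quoted messages amounts to something like the following (a paraphrase for illustration, not the actual Pyxida source):

    // Every gossip reply unconditionally enters the Neighbors list, and the
    // oldest entry is evicted FIFO, whether or not it is still alive.
    protected void addNeighbor(RemoteState<T> guy) {
        if (!neighbors.contains(guy)) {
            neighbors.add(guy);
        }
        while (neighbors.size() > MAX_NEIGHBORS) {
            neighbors.remove(0);   // A_1 is kicked out even if still alive
        }
    }
    // With a 10-second gossip period and MAX_NEIGHBORS = 32, this window
    // slides across the whole network, which is the churn Eric observed.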