From: Peter P. <pr...@do...> - 2008-12-09 09:37:04
Hi,

Eric Chan-Tin wrote:
> for (RemoteState<T> neighbor : neighbors) { //this is line 1029
>     if (neighbor.getLastUpdateTime() < (curr_time - MAINTENANCE_PERIOD)) {
>         neighbors.remove(neighbor);
>     }
> }

IIRC you can't modify a collection while iterating over it using the above
for loop. Instead you have to use a proper Iterator<RemoteState<T>> and use
its remove() method.
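For example (a minimal sketch only, assuming neighbors is a
java.util.Collection<RemoteState<T>> and that curr_time and
MAINTENANCE_PERIOD are as in the snippet above):

    // Remove stale neighbors through the iterator itself; calling
    // neighbors.remove() inside a for-each loop triggers a
    // ConcurrentModificationException on the next iteration.
    java.util.Iterator<RemoteState<T>> it = neighbors.iterator();
    while (it.hasNext()) {
        RemoteState<T> neighbor = it.next();
        if (neighbor.getLastUpdateTime() < (curr_time - MAINTENANCE_PERIOD)) {
            it.remove();
        }
    }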
Cheers,
Peter

> Jonathan Ledlie wrote:
>> On Nov 3, 2008, at 3:46 PM, Eric Chan-Tin wrote:
>>
>>> Jonathan,
>>>
>>> I can insert some code to fix this and it doesn't look too hard, but I
>>> do have some concerns.
>>>
>>>> I don't remember observing this behavior in our PL deployment, but
>>>> that could have been because we had the ping time set to 1-2 minutes,
>>>> IIRC.
>>>
>>> The default ping time was set to 10 seconds + some Gaussian
>>> distribution, so that's what I used, but changing the ping time to
>>> 1 minute or more will result in the same effect, except it will take
>>> longer for the "cycle" to occur.
>>>
>>> What I have in mind is the following:
>>>
>>> 1) NCClient.addNeighbor() would be changed to (in pseudocode):
>>>
>>> protected boolean addNeighbor(RemoteState<T> guy) {
>>>     boolean added = false;
>>>     // if neighbors.size() >= MAX_NEIGHBORS
>>>     //     check the last update time of each neighbor in neighbors
>>>     //     (neighbor.getLastUpdateTime());
>>>     //     if any of them is older than some number TIMEOUT, remove
>>>     //     the oldest one, then add "guy" to neighbors
>>>     //     else don't add "guy" to neighbors
>>>     // else add "guy" to neighbors
>>> }
>>>
>> IMO this shouldn't be the behavior of this function, because it is
>> called "addNeighbor." Instead, part of the performMaintenance() function
>> could remove old neighbors.
>>
>>> 2) In NCManager, getGossipNode() would just pick a random node from
>>> localNC.primaryNC.getNeighbors() to send a request to every 10 seconds.
>>>
>>> 3) Pyxida will still keep track of all GossipNodes in the network. It
>>> could also (say every 10 minutes) check its NCClient.neighbors list and
>>> see if any neighbor is older than TIMEOUT, remove it, then pick a
>>> random GossipNode to send a ping request.
>>>
>>> The main problem with this scheme is that one Pyxida node would most
>>> likely stay with the same set of neighbors all the time. Its
>>> coordinates would be greatly affected by those 32 nodes (I am assuming
>>> MAX_NEIGHBORS = 32).
>>>
>> Instead of (2), why not have the getNeighborToPing function determine
>> the behavior? If it returns a node, then ping it. If it returns null,
>> then have the NCManager pick some other node to use. If things are
>> hunky-dory, getNeighborToPing() would return a node (a current member of
>> the neighbor set); if we're short on neighbors, it would return null.
>> The periodic maintenance would have already tossed old nodes, so this
>> would be self-sustaining.
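[A rough sketch of that division of labor; this is not the actual Pyxida
code, and TIMEOUT, pickRandomGossipNode(), ping(), and the exact method
signatures are assumptions for illustration only:]

    // In NCClient: periodic maintenance evicts stale neighbors, so
    // addNeighbor() can keep adding new neighbors unconditionally.
    void performMaintenance(long currTime) {
        java.util.Iterator<RemoteState<T>> it = neighbors.iterator();
        while (it.hasNext()) {
            if (it.next().getLastUpdateTime() < currTime - TIMEOUT) {
                it.remove();
            }
        }
    }

    // In NCManager: each gossip round, prefer a current neighbor and only
    // fall back to a random gossip node when NCClient reports it is short
    // on neighbors (getNeighborToPing() returns null).
    void gossipRound(long currTime) {
        RemoteState<T> target = localNC.primaryNC.getNeighborToPing(currTime);
        if (target == null) {
            target = pickRandomGossipNode();   // assumed helper
        }
        ping(target);                          // assumed helper
    }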
>>
>> The "problem" you mention is fairly central to network coordinates in
>> general. Sure, the coordinates will be a little better the more
>> neighbors you have, but in a very large system talking to everybody
>> isn't feasible. Peter and I have a graph in one of the workshop papers
>> which shows the effect of increasing the number of neighbors.
>>
>>> I don't think the network would be partitioned, but it could happen.
>>> For example, the bootstrap nodes have each other in their neighbors
>>> list. Every new node bootstrapping to the network would contact those
>>> bootstrap nodes first and add them to their neighbors. So every Pyxida
>>> node could have as its neighbors the "bootstrap nodes", which would
>>> make Pyxida become a centralized system.
>>>
>> As long as the initial bootstrap list is large enough, I think you are
>> right that partitions won't be a problem.
>>
>>> One last thing: when Pyxida receives a ping request from a node, it
>>> does not update its coordinates. Is there any particular reason for
>>> that? Is it because Pyxida will not be able to calculate the RTT?
>>>
>> Yes.
>>
>>> Also, there is no "gossiping" of other nodes' coordinates, right? That
>>> is, if A has B in its neighbors list, A does not from time to time tell
>>> other nodes about B's coordinates. I realize this has a lot of
>>> complications, but the original Vivaldi paper was very fuzzy about this
>>> point and the previous one; it seems it is up to the implementors to
>>> decide what to do.
>>>
>> I don't think we are doing this. It could be useful in certain
>> circumstances (e.g., picking potential new neighbors), but we're not,
>> IIRC.
>>
>>> I will have to think some more about this, as there are a lot of nasty
>>> little side effects that could happen, but these are my thoughts.
>>>
>>> Thanks,
>>>
>>> -Eric
>>>
>> Sounds good.
>>
>> -jonathan
>>
>>> On Nov 3 2008, Jonathan Ledlie wrote:
>>>
>>>> Eric,
>>>>
>>>> Based on a look at the code just now, I see where you are coming from:
>>>> with these params on PL, there isn't anything stopping the neighbor
>>>> list from circulating pretty quickly through all nodes. In the Azureus
>>>> implementation, the guys fed to the alg (fed to processSample) weren't
>>>> changing much because they were coming from the routing finger tables.
>>>> I don't remember observing this behavior in our PL deployment, but
>>>> that could have been because we had the ping time set to 1-2 minutes,
>>>> IIRC.
>>>>
>>>> Do you want to insert some code that will fix this? Basically we want
>>>> to only add a new neighbor if our oldest neighbor should be tossed (or
>>>> if we haven't reached MAX_NEIGHBORS). It's probably best from a
>>>> clarity standpoint if addNeighbor does continue to actually add the
>>>> neighbor in all cases. Probably the best thing to do is to have
>>>> NCClient drop old neighbors internally and have NCManager ask it which
>>>> nodes should be pinged via getNeighborToPing. So NCClient would
>>>> continue to have the same behavior on processSample (it would not
>>>> reject new information), and NCManager and NCClient would cooperate to
>>>> measure RTTs to current neighbors. We do want to make sure, however,
>>>> that the gossip set stays large enough so that partitions don't occur.
>>>>
>>>> Do you want to be added to the SourceForge Pyxida developers to make
>>>> this change? (Please test it out carefully first.)
>>>>
>>>> -jonathan
>>>>
>>>> On Oct 30, 2008, at 11:32 AM, ext Eric Chan-Tin wrote:
>>>>
>>>>> Jonathan,
>>>>>
>>>>> From what I saw, eventually you learn about all the nodes in the
>>>>> network. Every 10 seconds, Pyxida picks a random "Gossip" node from
>>>>> its list and sends a request. The Gossip node will reply back with an
>>>>> "upNeighbor", that is, a "Gossip" node that is "up" or, with 10%
>>>>> probability, a node that is "pending". Since naturally all the nodes
>>>>> bootstrap off the same set of nodes, every node will eventually know
>>>>> of every other node.
>>>>>
>>>>> Pyxida might not be rotating through everybody continuously (although
>>>>> eventually, since it knows everybody, it will have added everybody to
>>>>> its Neighbors list at some point), but it could be removing a node
>>>>> that is "alive". So, if MAX_NEIGHBORS = 32, let's call them A_1, A_2,
>>>>> ..., A_32. Pyxida picks a random Gossip node B, sends it a request,
>>>>> and B replies back. Pyxida will add B to its Neighbors list, see that
>>>>> the size of Neighbors > MAX_NEIGHBORS, and kick A_1 out. Then from
>>>>> those 32 neighbors A_2, ..., A_32, B, it will calculate the force and
>>>>> apply it to the current system coordinates. Thus, if every 10 seconds
>>>>> Pyxida picks a different set of nodes C, D, E, etc., it would keep
>>>>> kicking nodes out of the neighbors list.
>>>>>
>>>>> It's the "kicking A_1 out" part that seemed a bit confusing. Churn is
>>>>> a good reason, but you don't check whether A_1 might still be alive
>>>>> or whether A_1 was just added. I have been printing the neighbors
>>>>> list every 10 minutes, and it is completely different from one
>>>>> interval to the next.
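[A minimal sketch of the add-and-evict behaviour described above;
illustration only, not the actual NCClient code, and neighbors is assumed
to be an insertion-ordered List<RemoteState<T>>:]

    boolean addNeighbor(RemoteState<T> guy) {
        if (!neighbors.contains(guy)) {
            neighbors.add(guy);            // B is appended
        }
        if (neighbors.size() > MAX_NEIGHBORS) {
            neighbors.remove(0);           // A_1 is evicted, alive or not
        }
        return true;
    }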
>>>>>
>>>>> One possible problem might be that I am choosing the value of
>>>>> MAX_NEIGHBORS to be too small. Originally it was set to 512 (which
>>>>> was meant for Azureus). Since I am using PlanetLab, where there are
>>>>> about 400-500 "stable" nodes, I changed the value to 32. But I don't
>>>>> think that should be a problem, since log(n) = log(512) = 9.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> -Eric
>>>>>
>>>>> Jonathan Ledlie wrote:
>>>>>> Eric,
>>>>>>
>>>>>> Glad to hear it. Let Peter and me know if you publish anything using
>>>>>> Pyxida and we can put it on the web page.
>>>>>>
>>>>>> "Gossip" nodes are for learning about the network. "Neighbors" are
>>>>>> for constructing the coordinate.
>>>>>>
>>>>>> So, IIRC, the gossip process learns about nodes that may or may not
>>>>>> be applied to the neighbor list. That is, we may eventually learn
>>>>>> about a large portion of the network via gossiping, but we should
>>>>>> only be using MAX_NEIGHBORS at any given time for coordinate
>>>>>> computation. The reason we don't want to use a fixed set of
>>>>>> neighbors for coordinate computation is because nodes may come and
>>>>>> go from the system. However, it should not be the case that we just
>>>>>> rotate through everybody, because then, as you note, we'd be
>>>>>> computing against everyone's coordinate/latency, and this is counter
>>>>>> to the point of network coordinates. With this gradual transience as
>>>>>> a desirable quality in mind, I am pretty sure we maintain neighbors
>>>>>> in a fairly fixed manner. Does this not seem to be the case to you?
>>>>>>
>>>>>> -jonathan
>>>>>>
>>>>>> On Oct 29, 2008, at 9:20 PM, Eric Chan-Tin wrote:
>>>>>>
>>>>>>> Hi Jonathan and Peter,
>>>>>>>
>>>>>>> First of all, just wanted to say that Pyxida has been working great
>>>>>>> on PlanetLab for the past 4 months or so. Getting good data, and
>>>>>>> the experiments are going well.
>>>>>>>
>>>>>>> However, I do have one question/comment.
>>>>>>>
>>>>>>> Every Pyxida node contains a list of Neighbors (in NCClient) and a
>>>>>>> list of Gossip nodes (they are called neighbors in NCManager, but I
>>>>>>> will call them gossip for now so as not to confuse them with the
>>>>>>> 'real' neighbors in NCClient). Every 10 seconds, Pyxida will pick a
>>>>>>> random node (call it A) from Gossip and send a gossip request
>>>>>>> (basically asking for rtt, coordinate, error). But Gossip contains
>>>>>>> all the nodes in the network. The comment in the code said that was
>>>>>>> to avoid Pyxida becoming lonely, with no one to gossip with, which
>>>>>>> makes sense.
>>>>>>>
>>>>>>> Once A responds, Pyxida will add A to its Neighbors (if Neighbors
>>>>>>> does not contain A) and remove the first element from the Neighbors
>>>>>>> list. This seems a little bit odd. Was there any particular
>>>>>>> reasoning behind this? Because it would seem like you're getting
>>>>>>> the coordinate from every node in the network eventually.
>>>>>>>
>>>>>>> Thanks a lot,
>>>>>>>
>>>>>>> -Eric
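[A rough sketch of the gossip exchange described above; GossipResponse,
handleGossipResponse(), and the processSample() arguments shown here are
assumptions for illustration, not the actual NCManager/NCClient API:]

    // On receiving A's reply: compute the RTT of this request/response
    // pair and feed one (coordinate, error, RTT) observation into the
    // coordinate update; this is also where A may be added to Neighbors.
    void handleGossipResponse(T remoteAddr, GossipResponse reply, long sentTime) {
        long now = System.currentTimeMillis();
        long rtt = now - sentTime;               // measured round trip
        client.processSample(remoteAddr, reply.getCoordinate(),
                             reply.getError(), rtt, now);
    }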
>>>>>>>
>>>>>>> Jonathan Ledlie wrote:
>>>>>>>> Your inclination is correct. See forwarded message.
>>>>>>>>
>>>>>>>> -jonathan
>>>>>>>>
>>>>>>>> On Apr 9, 2008, at 1:53 PM, ext Eric Chan-Tin wrote:
>>>>>>>>
>>>>>>>>> Thank you for the clarification.
>>>>>>>>>
>>>>>>>>> I've been reading through your published papers and code, and I
>>>>>>>>> must say I learned a lot about how Pyxida works :)
>>>>>>>>>
>>>>>>>>> I still have a quick and easy (hopefully) question. Each Pyxida
>>>>>>>>> node has a small number of neighbors and learns about other
>>>>>>>>> nodes through other nodes. I am running on about 200 machines,
>>>>>>>>> and it seems like all the Pyxida nodes are adding each other to
>>>>>>>>> their upNeighbors set. NCClient.java has MAX_NEIGHBORS set to
>>>>>>>>> 512.
>>>>>>>>>
>>>>>>>>> I wanted to make sure that I am not misreading the code, but
>>>>>>>>> shouldn't each Pyxida node keep a small number of neighbors in
>>>>>>>>> its upNeighbors set? Otherwise, it will be a lot of updates to
>>>>>>>>> send a gossipMessage to all 200 neighbors, and it also means
>>>>>>>>> that a Pyxida node has the whole network in its upNeighbors set
>>>>>>>>> - that seems wrong. I am inclined to change the value of
>>>>>>>>> MAX_NEIGHBORS but wanted to make sure that my understanding is
>>>>>>>>> correct first.
>>>>>>>>>
>>>>>>>>> Thank you,
>>>>>>>>>
>>>>>>>>> Eric

--
Dr Peter R Pietzuch
Department of Computing, Imperial College London
South Kensington Campus, 180 Queen's Gate, London SW7 2AZ, UK
Telephone: +44 (20) 7594 8314
http://www.doc.ic.ac.uk/~prp