From: Peter P. <pr...@do...> - 2008-12-09 09:37:04
Hi,
Eric Chan-Tin wrote:
> for (RemoteState<T> neighbor : neighbors) { //this is line 1029
> if (neighbor.getLastUpdateTime() < (curr_time - MAINTENANCE_PERIOD)) {
> neighbors.remove(neighbor);
> }
> }
IIRC you can't modify a collection while iterating over it with the
for-each loop above. Instead you have to use a proper
Iterator<RemoteState<T>> and call its remove() method.
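For instance, a minimal sketch of the iterator-based version (assuming
neighbors is a java.util.List<RemoteState<T>> and that curr_time and
MAINTENANCE_PERIOD are the same fields as in your snippet):

  // import java.util.Iterator;
  Iterator<RemoteState<T>> it = neighbors.iterator();
  while (it.hasNext()) {
      RemoteState<T> neighbor = it.next();
      if (neighbor.getLastUpdateTime() < (curr_time - MAINTENANCE_PERIOD)) {
          // removing through the iterator is safe; calling
          // neighbors.remove() here would typically throw a
          // ConcurrentModificationException
          it.remove();
      }
  }
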
Cheers,
Peter
> Jonathan Ledlie wrote:
>> On Nov 3, 2008, at 3:46 PM, Eric Chan-Tin wrote:
>>
>>> Jonathan,
>>>
>>> I can insert some code to fix this and it doesn't look too hard but I do
>>> have some concerns.
>>>
>>>> I don't remember observing this behavior in our PL
>>>> deployment, but that could have been because we had the ping time set
>>>> to 1-2 minutes, IIRC.
>>> The default ping time was set to 10 seconds + some Gaussian
>>> distribution, so that's what I used, but changing the ping time to
>>> 1 minute or more will result in the same effect except it will take
>>> longer for the "cycle" to occur.
>>>
>>> What I have in mind was the following:
>>>
>>> 1) NCClient.addNeighbor() would be changed to (in pseudocode):
>>>
>>> protected boolean addNeighbor(RemoteState<T> guy) {
>>>     boolean added = false;
>>>     // if neighbors.size() >= MAX_NEIGHBORS:
>>>     //     check the last update time of each neighbor in neighbors
>>>     //     (neighbor.getLastUpdateTime());
>>>     //     if any of them is older than some number TIMEOUT,
>>>     //         remove the oldest one, then add "guy" to neighbors;
>>>     //     else don't add "guy" to neighbors
>>>     // else add "guy" to neighbors
>>> }
>>>
>> IMO this shouldn't be the behavior of this function because it is called
>> "addNeighbor." Instead what could happen is that part of the
>> performMaintenance() function could remove old neighbors.
>>
>>> 2) In NCManager, getGossipNode() would just pick a random node from
>>> localNC.primaryNC.getNeighbors() to send a request to every 10 seconds.
>>>
>>> 3) Pyxida will still keep track of all GossipNodes in the network.
>>> It could also (say every 10 minutes) check its NCClient.neighbors
>>> list and see if any neighbor is older than TIMEOUT, remove it, then
>>> pick a random GossipNode to send a ping request.
>>>
>>> The main problem with this scheme is that one Pyxida node would most
>>> likely stay with the same set of neighbors all the time. Its
>>> coordinates would be greatly affected by those 32 nodes (I am
>>> assuming MAX_NEIGHBORS = 32).
>>>
>> Instead of (2), why not have the getNeighborToPing function determine
>> the behavior? If it returns a node, then ping it; if it returns null,
>> then have the NCManager pick some other node to use. If things are
>> hunky-dory, getNeighborToPing() would return a node (a current member
>> of the neighbor set); if we're short on neighbors, it would return
>> null. The periodic maintenance would have already tossed old nodes, so
>> this would be self-sustaining.
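>>
>> Roughly, as a sketch only (assuming NCManager holds its NCClient in a
>> field called ncClient, that getNeighborToPing() returns a
>> RemoteState<T> or null, and with pickRandomGossipNode() and sendPing()
>> as placeholder names rather than the actual API), the gossip timer
>> could then do:
>>
>>   RemoteState<T> target = ncClient.getNeighborToPing();
>>   if (target == null) {
>>       // short on neighbors: maintenance has already tossed the stale
>>       // ones, so fall back to a random node from the gossip list
>>       target = pickRandomGossipNode();
>>   }
>>   if (target != null) {
>>       sendPing(target);
>>   }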
>>
>> The "problem" you mention is fairly central to network coordinates in
>> general. Sure the coordinates will be a little better the more
>> neighbors you have. But in a very large system talking to everybody
>> isn't feasible. Peter and I have a graph in one of the workshop papers
>> which shows the effect of increasing the number of neighbors.
>>
>>> I don't think the network would be partitioned, but it could
>>> happen. For example, the bootstrap nodes have each other in their
>>> neighbors list. Every new node bootstrapping to the network would
>>> contact those bootstrap nodes first and add them to their neighbors.
>>> So every Pyxida node could have the bootstrap nodes as its neighbors,
>>> which would effectively make Pyxida a centralized system.
>>>
>> As long as the initial bootstrap list is large enough, I think you are
>> right, that partitions won't be a problem.
>>
>>> One last thing: When Pyxida receives a ping request from a node, it
>>> does not update its coordinates. Is there any particular reason for
>>> that? Is it because Pyxida will not be able to calculate the RTT?
>>>
>> Yes.
>>
>>> Also, there is no "gossiping" of other nodes' coordinates, right?
>>> That is, if A has B in its neighbors list, A does not from time to
>>> time tell other nodes about B's coordinates. I realize this has a lot
>>> of complications, but the original Vivaldi paper was very fuzzy about
>>> this point and the previous one, so it seems it is up to the
>>> implementors to decide what to do.
>>>
>> I don't think we are doing this. It could be useful in certain
>> circumstances (e.g., picking potential new neighbors), but we're not,
>> IIRC.
>>
>>> I will have to think some more about this as there are a lot of nasty
>>> little side effects that could happen, but these are my thoughts.
>>>
>>> Thanks,
>>>
>>> -Eric
>>>
>> Sounds good.
>>
>> -jonathan
>>
>>> On Nov 3 2008, Jonathan Ledlie wrote:
>>>
>>>> Eric,
>>>>
>>>> Based on a look at the code just now, I see where you are coming from:
>>>> with these params on PL, there isn't anything stopping the neighbor
>>>> list from circulating pretty quickly through all nodes. In the
>>>> Azureus implementation, the guys fed to the alg (fed to processSample)
>>>> weren't changing much because they were coming from the routing finger
>>>> tables. I don't remember observing this behavior in our PL
>>>> deployment, but that could have been because we had the ping time set
>>>> to 1-2 minutes, IIRC.
>>>>
>>>> Do you want to insert some code that will fix this? Basically we want
>>>> to only add a new neighbor if our oldest neighbor should be tossed (or
>>>> if we haven't reached MAX_NEIGHBORS). It's probably best from a
>>>> clarity standpoint if addNeighbor does continue to actually add the
>>>> neighbor in all cases. Probably the best thing to do is to have
>>>> NCClient drop old neighbors internally and have NCManager ask it which
>>>> nodes should be pinged via getNeighborToPing. So NCClient would
>>>> continue to have the same behavior on processSample (it would not
>>>> reject new information) and NCManager and NCClient would cooperate to
>>>> measure RTTs to current neighbors. We do want to make sure, however,
>>>> that the gossip set stays large enough so that partitions don't occur.
>>>>
>>>> Do you want to be added to the SourceForge Pyxida developers to make
>>>> this change? (Please test it out carefully first.)
>>>>
>>>> -jonathan
>>>>
>>>> On Oct 30, 2008, at 11:32 AM, ext Eric Chan-Tin wrote:
>>>>
>>>>> Jonathan,
>>>>>
>>>>> From what I saw, eventually you learn about all the nodes in the
>>>>> network. Every 10 seconds, Pyxida picks a random "Gossip" node
>>>>> from its list and sends a request. The Gossip node will reply back
>>>>> with an "upNeighbor", that is, a "Gossip" node that is "up", or
>>>>> with 10% probability a node that is "pending". Since naturally all
>>>>> the nodes bootstrap off the same set of nodes, every node will
>>>>> eventually know of every other node.
>>>>>
>>>>> Pyxida might not be rotating through everybody continuously
>>>>> (although eventually, since it knows everybody, it will have added
>>>>> everybody to its Neighbors list at some point), but it could be
>>>>> removing a node that is "alive". So, if MAX_NEIGHBORS = 32, let's
>>>>> call the current neighbors A_1, A_2, ..., A_32.
>>>>> Pyxida picks a random Gossip node B, sends it a request and B replies
>>>>> back. Pyxida will add B to its Neighbors list, see that size of
>>>>> Neighbors > MAX_NEIGHBORS, kick A_1 out. Then from those 32 neighbors
>>>>> A_2, ...., A_32, B, it will calculate the force, and apply it to the
>>>>> current system coordinates. Thus, if Pyxida picks a different node
>>>>> (C, D, E, etc.) every 10 seconds, it would keep kicking nodes out
>>>>> of the neighbors list.
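>>>>>
>>>>> In pseudocode, the behavior I am seeing is roughly this (the
>>>>> variable names are my shorthand from reading the code, not
>>>>> verbatim):
>>>>>
>>>>>   if (!neighbors.contains(b)) {   // b is the gossip node B above
>>>>>       neighbors.add(b);           // newest neighbor goes on the end
>>>>>   }
>>>>>   if (neighbors.size() > MAX_NEIGHBORS) {
>>>>>       neighbors.remove(0);        // A_1, the oldest entry, is kicked out
>>>>>   }
>>>>>   // the force is then computed against the remaining 32 entries
>>>>>   // (A_2, ..., A_32, B) and applied to the current coordinate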
>>>>>
>>>>> It's the "kicking A_1 out" part that seemed a bit confusing. Churn
>>>>> is a good reason, but you don't check whether A_1 might still be
>>>>> alive, or whether A_1 was only just added. I have been printing the
>>>>> neighbors list every 10 minutes, and it is completely different from
>>>>> one printout to the next.
>>>>>
>>>>> One possible problem might be that I am choosing the value of
>>>>> MAX_NEIGHBORS to be too small. Originally it was set to 512 (which
>>>>> was meant for Azureus). Since I am using PlanetLab, where there are
>>>>> about 400-500 "stable" nodes, I changed the value to 32. But I don't
>>>>> think that should be a problem, since 32 is still well above
>>>>> log2(n) = log2(512) = 9.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> -Eric
>>>>>
>>>>> Jonathan Ledlie wrote:
>>>>>> Eric,
>>>>>>
>>>>>> Glad to hear it. Let Peter and me know if you publish anything using
>>>>>> Pyxida and we can put it on the web page.
>>>>>>
>>>>>> "Gossip" nodes are for learning about the network. "Neighbors" are
>>>>>> for
>>>>>> constructing the coordinate.
>>>>>>
>>>>>> So, IIRC, the gossip process learns about nodes that may or may
>>>>>> not be applied to the neighbor list. That is, we may eventually
>>>>>> learn about a large portion of the network, via gossiping, but
>>>>>> should only be using MAX_NEIGHBORS at any given time for
>>>>>> coordinate computation. The reason we don't want to use a fixed
>>>>>> set of neighbors for coordinate computation is because nodes may
>>>>>> come and go from the system. However, it should not be the case
>>>>>> that we just rotate through everybody, because then, as you note,
>>>>>> we'd be computing against everyone's coordinate/latency, and this
>>>>>> is counter to the point of network coordinates. With this gradual
>>>>>> transience as a desirable quality in mind, I am pretty sure we
>>>>>> maintain neighbors in a fairly fixed manner. Does this not seem
>>>>>> to be the case to you?
>>>>>>
>>>>>> -jonathan
>>>>>>
>>>>>> On Oct 29, 2008, at 9:20 PM, Eric Chan-Tin wrote:
>>>>>>
>>>>>>> Hi Jonathan and Peter,
>>>>>>>
>>>>>>> First of all, just wanted to say that Pyxida has been working
>>>>>>> great on PlanetLab for the past 4 months or so. I am getting good
>>>>>>> data and the experiments are going well.
>>>>>>>
>>>>>>> However, I do have one question/comment.
>>>>>>>
>>>>>>> Every Pyxida node contains a list of Neighbors (in NCClient) and
>>>>>>> a list of Gossip nodes (they are called neighbors in NCManager,
>>>>>>> but I will call them gossip for now so as not to confuse them
>>>>>>> with the 'real' neighbors in NCClient). Every 10 seconds, Pyxida
>>>>>>> will pick a random node (call it A) from Gossip and send a gossip
>>>>>>> request (basically asking for rtt, coordinate, error). But Gossip
>>>>>>> contains all the nodes in the network. The comment in the code
>>>>>>> said that was to avoid Pyxida becoming lonely with no-one to
>>>>>>> gossip with, which makes sense.
>>>>>>>
>>>>>>> Once A responds, Pyxida will add A to its Neighbors (if Neighbors
>>>>>>> does not contain A) and remove the first element from the
>>>>>>> Neighbors list. This seems a little bit odd. Was there any
>>>>>>> particular reasoning behind this? Because it would seem like
>>>>>>> you're getting the coordinate from every node in the network
>>>>>>> eventually.
>>>>>>>
>>>>>>> Thanks a lot,
>>>>>>>
>>>>>>> -Eric
>>>>>>>
>>>>>>> Jonathan Ledlie wrote:
>>>>>>>> Your inclination is correct. See forwarded message.
>>>>>>>>
>>>>>>>> -jonathan
>>>>>>>>
>>>>>>>> On Apr 9, 2008, at 1:53 PM, ext Eric Chan-Tin wrote:
>>>>>>>>
>>>>>>>>> Thank you for the clarification.
>>>>>>>>>
>>>>>>>>> I've been reading through your published papers and code and I
>>>>>>>>> must say I learned a lot about how Pyxida works :)
>>>>>>>>>
>>>>>>>>> I still have a quick and easy (hopefully) question. Each Pyxida
>>>>>>>>> node has a small number of neighbors and learns about other
>>>>>>>>> nodes through other nodes. I am running on about 200 machines,
>>>>>>>>> and it seems like all the Pyxida nodes are adding each other to
>>>>>>>>> their upNeighbors set. NCClient.java has MAX_NEIGHBORS set to
>>>>>>>>> 512.
>>>>>>>>>
>>>>>>>>> I wanted to make sure that I am not misreading the code, but
>>>>>>>>> shouldn't each Pyxida node keep a small number of neighbors in
>>>>>>>>> its upNeighbors set? Otherwise it will be a lot of updates to
>>>>>>>>> send a gossipMessage to all 200 neighbors, and it also means
>>>>>>>>> that a Pyxida node has the whole network in its upNeighbors set
>>>>>>>>> - that seems wrong. I am inclined to change the value of
>>>>>>>>> MAX_NEIGHBORS but wanted to make sure that my understanding is
>>>>>>>>> correct first.
>>>>>>>>>
>>>>>>>>> Thank you,
>>>>>>>>>
>>>>>>>>> Eric
>>>>>>>>>
>>>>>>>
>>>>>>>
>
--
Dr Peter R Pietzuch
Department of Computing, Imperial College London
South Kensington Campus, 180 Queen's Gate, London SW7 2AZ, UK
Telephone: +44 (20) 7594 8314 http://www.doc.ic.ac.uk/~prp