From: Jonathan L. <jon...@ee...> - 2008-12-10 15:31:14
This is an interesting situation. Let's think about the goal. The goal is to keep pinging the same nodes as long as they are alive, and to replace a node soon after it dies, keeping the total number of up neighbors at approximately MAX_NEIGHBORS. How about the following:

- when you pick a neighbor to ping, you record the time of that event (in a new per-neighbor timestamp, LastPingTime)
- when you first insert a neighbor, set this timestamp to 0.
- when you loop in performMaintenance, remove nodes whose LastPingTime - LastUpdateTime > threshold (like 10 minutes)
- when picking a node, pick the guy with the oldest LastPingTime (not sure about this one -- I don't like situations where you pick the absolute *oldest*, because then we could get in a situation where an old guy sits there, waiting to be kicked out, and we keep picking him to be pinged; I'd prefer to pick randomly, or among random older guys).
- a simple way around this is to have a self-adjusting threshold: we walk through the nodes, and everyone whose LastPingTime is older than MIN_UPDATE_TIME_TO_PING ago (a relative time) gets added to a set; we pick a guy randomly from the set to ping. If the set is empty, we increase MIN_UPDATE_TIME_TO_PING and wait for the next time the function is called. If the set is too large (relative to a fraction of MAX_NEIGHBORS), we decrease it.

We still might get some churn from lost UDP packets. You might also give nodes a second chance to be pinged by including LastPingTime - LastUpdateTime > 1 minute in the set. Note that increases and decreases in MIN_UPDATE_TIME_TO_PING shouldn't be affected by the number of nodes that fall into this category.

We still probably need a way for new potential neighbors to be rejected if our current set of neighbors is fine. Because I still want addNeighbor to really add the neighbor, how about putting a check before calling it to see whether we need more neighbors?

What do you think?

-jonathan
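(In rough Java, the self-adjusting selection described above could look something like the sketch below. A sketch only: getLastPingTime/setLastPingTime, minUpdateTimeToPing, ADJUST_STEP, and SECOND_CHANCE are illustrative names rather than existing Pyxida members, and the adjustment directions simply mirror the description above.)

    // Assumed helper state for this sketch (illustrative only):
    private long minUpdateTimeToPing = 5 * 60 * 1000;       // self-adjusting threshold (ms)
    private static final long ADJUST_STEP = 30 * 1000;      // how much to adjust it by (ms)
    private static final long SECOND_CHANCE = 60 * 1000;    // LastPingTime - LastUpdateTime limit (ms)
    private final java.util.Random rand = new java.util.Random();

    protected RemoteState<T> getNeighborToPing() {
        long now = System.currentTimeMillis();
        java.util.List<RemoteState<T>> due = new java.util.ArrayList<RemoteState<T>>();
        for (RemoteState<T> n : neighbors) {
            // due if not pinged within the threshold, or if its last ping has gone unanswered a while
            boolean notPingedRecently = (now - n.getLastPingTime()) > minUpdateTimeToPing;
            boolean secondChance = (n.getLastPingTime() - n.getLastUpdateTime()) > SECOND_CHANCE;
            if (notPingedRecently || secondChance) {
                due.add(n);
            }
        }
        if (due.isEmpty()) {
            // nobody is due: adjust the threshold, as described above, and wait for the next call
            minUpdateTimeToPing += ADJUST_STEP;
            return null;
        }
        if (due.size() > MAX_NEIGHBORS / 4) {
            // set is too large relative to a fraction of MAX_NEIGHBORS: adjust the other way
            minUpdateTimeToPing = Math.max(ADJUST_STEP, minUpdateTimeToPing - ADJUST_STEP);
        }
        RemoteState<T> pick = due.get(rand.nextInt(due.size()));
        pick.setLastPingTime(now);   // record the ping time on the chosen neighbor
        return pick;
    }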
On Dec 9, 2008, at 11:37 PM, ext Eric Chan-Tin wrote:

> Peter,

> Thanks. That fixed it.

> However, here is another problem I found while deploying Pyxida nodes and running them for a while.

> In performMaintenance(), I loop through every neighbor and if (neighbor.getLastUpdateTime() < current_time - 10 minutes), then I remove that neighbor.

> In NCManager, I call getNeighborToPing, which randomly picks one of the neighbors (32 in this case). Pinging is performed every 10 seconds, so in 10 minutes you can ping at most 60 nodes. However, if a neighbor A is not among those 60 nodes, it will get kicked out in performMaintenance(), although A is still alive (it just hasn't been contacted yet).

> There are some possible options.

> 1) Increase the timeout period from 10 minutes to ?? minutes. This doesn't quite work because, say, in Azureus the number of neighbors is 512, so the timeout needs to be increased even more... It could be a function of MAX_NEIGHBORS...

> 2) When Pyxida replies to a ping request, it will update the neighbor's lastUpdateTime. This does not quite work either because it assumes a symmetric relationship, which is most likely not the case.

> 3) Introduce some more logic to Pyxida so it remembers what nodes it contacted and does not send, say, more than 2 requests in 10 minutes to the same node. This solution isn't that great either.

> Let me know what you guys think. I am currently leaning towards option 1), but making the timeout period a function of MAX_NEIGHBORS, although I haven't quite found the perfect function yet.

> Here are some preliminary results. I compared the number of neighbors which are the same at each reported time (10 minutes). Before the fix, more than 50% of the neighbors changed within 10 minutes. After the fix, only about 30% of the neighbors changed within 10 minutes. It is more stable, but I think 30% is still quite a big number.

> Sorry for the long email :)

> -Eric

> Peter Pietzuch wrote:

>> Hi,

>> Eric Chan-Tin wrote:

>>> for (RemoteState<T> neighbor : neighbors) { //this is line 1029
>>>     if (neighbor.getLastUpdateTime() < (curr_time - MAINTENANCE_PERIOD)) {
>>>         neighbors.remove(neighbor);
>>>     }
>>> }

>> IIRC you can't modify a collection while iterating over it using the above for loop. Instead you have to use a proper Iterator<RemoteState<T>> and use its remove method.

>> Cheers,
>> Peter
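(For reference, the Iterator-based removal Peter describes would look roughly like this inside performMaintenance() -- a sketch only, reusing the names from Eric's snippet:)

    // java.util.Iterator lets us remove while iterating without a ConcurrentModificationException.
    Iterator<RemoteState<T>> it = neighbors.iterator();
    while (it.hasNext()) {
        RemoteState<T> neighbor = it.next();
        if (neighbor.getLastUpdateTime() < (curr_time - MAINTENANCE_PERIOD)) {
            it.remove();   // remove through the iterator, so the loop stays valid
        }
    }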
>>> Jonathan Ledlie wrote:

>>>> On Nov 3, 2008, at 3:46 PM, Eric Chan-Tin wrote:

>>>>> Jonathan,

>>>>> I can insert some code to fix this and it doesn't look too hard, but I do have some concerns.

>>>>>> I don't remember observing this behavior in our PL deployment, but that could have been because we had the ping time set to 1-2 minutes, IIRC.

>>>>> The default ping time was set to 10 seconds + some Gaussian distribution, so that's what I used, but changing the ping time to 1 minute or more will result in the same effect, except it will take longer for the "cycle" to occur.

>>>>> What I have in mind was the following:

>>>>> 1) NCClient.addNeighbor() would be changed to (in pseudocode):

>>>>> protected boolean addNeighbor(RemoteState<T> guy) {
>>>>>     boolean added = false;
>>>>>     // if neighbors.size() >= MAX_NEIGHBORS
>>>>>     //     check the last update time of each neighbor in neighbors (neighbor.getLastUpdateTime())
>>>>>     //     if any of them is older than some number TIMEOUT, remove the oldest one, then add "guy" to neighbors
>>>>>     //     else don't add "guy" to neighbors
>>>>>     // else add "guy" to neighbors
>>>>> }

>>>> IMO this shouldn't be the behavior of this function, because it is called "addNeighbor." Instead, what could happen is that part of the performMaintenance() function could remove old neighbors.

>>>>> 2) In NCManager, getGossipNode() would just pick a random node from localNC.primaryNC.getNeighbors() to send a request to every 10 seconds.

>>>>> 3) Pyxida will still keep track of all GossipNodes in the network. It could also (say, every 10 minutes) check its NCClient.neighbors list and see if any neighbor is older than TIMEOUT, remove it, then pick a random GossipNode to send a ping request.

>>>>> The main problem with this scheme is that one Pyxida node would most likely stay with the same set of neighbors all the time. Its coordinates would be greatly affected by those 32 nodes (I am assuming MAX_NEIGHBORS = 32).

>>>> Instead of (2), why not have the getNeighborToPing function determine the behavior? If it returns a node, then ping it. If it returns null, then have the NCManager pick some other node to use. If things are hunky-dory, getNeighborToPing() would return a node (a current member of the neighbor set); if we're short on neighbors, then it would return null. The periodic maintenance would have already tossed old nodes, so this would be self-sustaining.

>>>> The "problem" you mention is fairly central to network coordinates in general. Sure, the coordinates will be a little better the more neighbors you have. But in a very large system, talking to everybody isn't feasible. Peter and I have a graph in one of the workshop papers which shows the effect of increasing the number of neighbors.

>>>>> I don't think the network would be partitioned, but it could happen. For example, the bootstrap nodes have each other in their neighbors list. Every new node bootstrapping to the network would contact those bootstrapping nodes first and add them to their neighbors. So every Pyxida node could have as its neighbors the "Bootstrap nodes", which would make Pyxida become a centralized system.

>>>> As long as the initial bootstrap list is large enough, I think you are right that partitions won't be a problem.

>>>>> One last thing: when Pyxida receives a ping request from a node, it does not update its coordinates. Is there any particular reason for that? Is it because Pyxida will not be able to calculate the RTT?

>>>> Yes.

>>>>> Also, there is no "gossiping" of other nodes' coordinates, right? So if A has B in its neighbors list, it will from time to time tell other nodes about B's coordinates. I realize this has a lot of complications, but the original Vivaldi paper was very fuzzy about this point and the previous one, so it seems it is up to the implementors to decide what to do.

>>>> I don't think we are doing this. It could be useful in certain circumstances (e.g., picking potential new neighbors), but we're not, IIRC.

>>>>> I will have to think some more about this, as there are a lot of nasty little side effects that could happen, but these are my thoughts.

>>>>> Thanks,

>>>>> -Eric

>>>> Sounds good.

>>>> -jonathan
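(A rough sketch of the split described above, seen from the NCManager side. pingNode and pickRandomGossipNode are placeholders for whatever NCManager already does, not actual method names; only getNeighborToPing comes from the discussion above:)

    // Runs every ~10 seconds in NCManager.
    RemoteState<T> target = primaryNC.getNeighborToPing();
    if (target != null) {
        // Neighbor set is healthy: keep measuring RTTs to a current neighbor.
        pingNode(target);
    } else {
        // Short on neighbors (or nobody is due yet): gossip with someone new,
        // so NCClient can refill the neighbor set as samples come back.
        pingNode(pickRandomGossipNode());
    }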
>>>>> On Nov 3 2008, Jonathan Ledlie wrote:

>>>>>> Eric,

>>>>>> Based on a look at the code just now, I see where you are coming from: with these params on PL, there isn't anything stopping the neighbor list from circulating pretty quickly through all nodes. In the Azureus implementation, the guys fed to the alg (fed to processSample) weren't changing much because they were coming from the routing finger tables. I don't remember observing this behavior in our PL deployment, but that could have been because we had the ping time set to 1-2 minutes, IIRC.

>>>>>> Do you want to insert some code that will fix this? Basically we want to only add a new neighbor if our oldest neighbor should be tossed (or if we haven't reached MAX_NEIGHBORS). It's probably best from a clarity standpoint if addNeighbor does continue to actually add the neighbor in all cases. Probably the best thing to do is to have NCClient drop old neighbors internally and have NCManager ask it which nodes should be pinged via getNeighborToPing. So NCClient would continue to have the same behavior on processSample (it would not reject new information), and NCManager and NCClient would cooperate to measure RTTs to current neighbors. We do want to make sure, however, that the gossip set stays large enough so that partitions don't occur.

>>>>>> Do you want to be added to the SourceForge Pyxida developers to make this change? (Please test it out carefully first.)

>>>>>> -jonathan

>>>>>> On Oct 30, 2008, at 11:32 AM, ext Eric Chan-Tin wrote:

>>>>>>> Jonathan,

>>>>>>> From what I saw, eventually you learn about all the nodes in the network. Every 10 seconds, Pyxida picks a random "Gossip" node from its list and sends a request. The Gossip node will reply back with an "upNeighbor", that is, a "Gossip" node that is "up" or, with 10% probability, a node that is "pending". Since, naturally, all the nodes bootstrap off the same set of nodes, every node will eventually know of every other node.

>>>>>>> Pyxida might not be rotating through everybody continuously (although eventually, since it knows everybody, it will have added everybody to its Neighbors list at some point), but it could be removing a node that is "alive". So, if MAX_NEIGHBORS = 32, let's call them A_1, A_2, ..., A_32. Pyxida picks a random Gossip node B, sends it a request, and B replies back. Pyxida will add B to its Neighbors list, see that the size of Neighbors > MAX_NEIGHBORS, and kick A_1 out. Then, from those 32 neighbors A_2, ..., A_32, B, it will calculate the force and apply it to the current system coordinates. Thus, if every 10 seconds Pyxida picks a different node C, D, E, etc., it would keep kicking nodes out of the neighbors list.

>>>>>>> It's the "kicking A_1 out" part that seemed a bit confusing. Churn is a good reason, but you don't check whether A_1 might still be alive, or whether A_1 was just added. I have been printing the neighbors list every 10 minutes, and it is completely different between those 10 minutes.

>>>>>>> One possible problem might be that I am choosing the value of MAX_NEIGHBORS to be too small. Originally it was set to 512 (which was meant for Azureus). Since I am using PlanetLab, where there are about 400-500 "stable" nodes, I changed the value to 32. But I don't think that should be a problem, since log2(512) = 9.

>>>>>>> Thanks,

>>>>>>> -Eric
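(For concreteness, the add-then-evict behavior Eric describes amounts to roughly the following. This is a sketch of the described behavior, not the actual NCClient code, and it assumes neighbors is a list, as "remove the first element" suggests:)

    protected boolean addNeighbor(RemoteState<T> guy) {
        if (!neighbors.contains(guy)) {
            neighbors.add(guy);        // B joins the neighbor list
        }
        if (neighbors.size() > MAX_NEIGHBORS) {
            neighbors.remove(0);       // A_1 is evicted without checking whether it is still alive
        }
        return true;
    }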
>>>>>>> Jonathan Ledlie wrote:

>>>>>>>> Eric,

>>>>>>>> Glad to hear it. Let Peter and me know if you publish anything using Pyxida and we can put it on the web page.

>>>>>>>> "Gossip" nodes are for learning about the network. "Neighbors" are for constructing the coordinate.

>>>>>>>> So, IIRC, the gossip process learns about nodes that may or may not be applied to the neighbor list. That is, we may eventually learn about a large portion of the network via gossiping, but we should only be using MAX_NEIGHBORS at any given time for coordinate computation. The reason we don't want to use a fixed set of neighbors for coordinate computation is that nodes may come and go from the system. However, it should not be the case that we just rotate through everybody, because then, as you note, we'd be computing against everyone's coordinate/latency, and this is counter to the point of network coordinates. With this gradual transience as a desirable quality in mind, I am pretty sure we maintain neighbors in a fairly fixed manner. Does this not seem to be the case to you?

>>>>>>>> -jonathan

>>>>>>>> On Oct 29, 2008, at 9:20 PM, Eric Chan-Tin wrote:

>>>>>>>>> Hi Jonathan and Peter,

>>>>>>>>> First of all, I just wanted to say that Pyxida has been working great on PlanetLab for the past 4 months or so. Getting good data and the experiments are going well.

>>>>>>>>> However, I do have one question/comment.

>>>>>>>>> Every Pyxida node contains a list of Neighbors (in NCClient) and a list of Gossip nodes (they are called neighbors in NCManager, but I will call them gossip for now so as not to confuse them with the 'real' neighbors in NCClient). Every 10 seconds, Pyxida will pick a random node (call it A) from Gossip and send a gossip request (basically asking for rtt, coordinate, error). But Gossip contains all the nodes in the network. The comment in the code said that this was to avoid Pyxida becoming lonely with no one to gossip with, which makes sense.

>>>>>>>>> Once A responds, Pyxida will add A to its Neighbors (if Neighbors does not contain A) and remove the first element from the Neighbors list. This seems a little bit odd. Was there any particular reasoning behind this? Because it would seem like you're getting the coordinate from every node in the network eventually.

>>>>>>>>> Thanks a lot,

>>>>>>>>> -Eric
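(On the responding side, the gossip reply selection described in the messages above -- an up neighbor, or with 10% probability a pending node -- would reduce to roughly this. A sketch with illustrative names; upNodes and pendingNodes stand in for whatever structures actually back the up and pending sets:)

    // Assumed structures for this sketch (illustrative only):
    private final java.util.List<RemoteState<T>> upNodes = new java.util.ArrayList<RemoteState<T>>();
    private final java.util.List<RemoteState<T>> pendingNodes = new java.util.ArrayList<RemoteState<T>>();
    private final java.util.Random rand = new java.util.Random();

    private RemoteState<T> pickNodeForGossipReply() {
        if (!pendingNodes.isEmpty() && rand.nextDouble() < 0.10) {
            return pendingNodes.get(rand.nextInt(pendingNodes.size()));   // 10%: hand out a pending node
        }
        if (upNodes.isEmpty()) {
            return null;                                                  // nothing useful to report
        }
        return upNodes.get(rand.nextInt(upNodes.size()));                 // otherwise a random up neighbor
    }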
>>>>>>>>> Jonathan Ledlie wrote:

>>>>>>>>>> Your inclination is correct. See forwarded message.

>>>>>>>>>> -jonathan

>>>>>>>>>> On Apr 9, 2008, at 1:53 PM, ext Eric Chan-Tin wrote:

>>>>>>>>>>> Thank you for the clarification.

>>>>>>>>>>> I've been reading through your published papers and code, and I must say I learned a lot about how Pyxida works :)

>>>>>>>>>>> I still have a quick and easy (hopefully) question. Each Pyxida node has a small number of neighbors and learns about other nodes through them. I am running on about 200 machines, and it seems like all the Pyxida nodes are adding each other to their upNeighbors set. NCClient.java has MAX_NEIGHBORS set to 512.

>>>>>>>>>>> I wanted to make sure that I am not misreading the code, but shouldn't each Pyxida node keep a small number of neighbors in its upNeighbors set? Otherwise, it will be a lot of updates to send a gossipMessage to all 200 neighbors, and it also means that a Pyxida node has the whole network in its upNeighbors set -- that seems wrong. I am inclined to change the value of MAX_NEIGHBORS, but wanted to make sure that my understanding is correct first.

>>>>>>>>>>> Thank you,

>>>>>>>>>>> Eric