From: Jonathan L. <jon...@ee...> - 2008-12-19 01:23:12
Eric,

I'm getting an error (an old favorite):

WA 1229649353501 NetUtil : Could not deserialize java object: java.io.StreamCorruptedException: invalid type code: CC
WA 1229649353501 ObjComm$ReadConnHa: No valid msgObject received from remoteAddr=planet1.scs.stanford.edu:26500. Ignoring.

What Java version are you running on PL?

-jonathan

On Dec 18, 2008, at 4:49 PM, Eric Chan-Tin wrote:

> Do you mean the Pyxida code running on PlanetLab? I killed them after running them overnight. I am currently restarting the experiments. Try again in about 15 minutes and it should be ready.

> Fix1 is on port 28500 (and api.port 28501). Fix2 is on port 26500 (api.port 26501). The node you can contact is "planet1.scs.stanford.edu" or "planetlab01.cs.washington.edu".

> About your other email: yes, I know. The fix was to go through the list again, but I didn't want to implement that since it seems to be working fine. So: go through the list once to calculate weights, and add neighbors into another list (sorted by weight). Then go through that new list. The new list is sorted from higher to lower weight, so older neighbors will be "tested" first. If you want it that way, I can do that. However, I do not think we need to. Say at iteration 0 the first node gets picked; at iteration 1, that same node will have a very small probability of getting picked again.

> -Eric

> On Dec 18 2008, Jonathan Ledlie wrote:

>> Eric,

>> Do you have a PL Pyxida slice that I can connect to to try this out? That is, I'll run it on my machine, but connect to your slice via the right ports.

>> -jonathan

>> On Dec 18, 2008, at 11:35 AM, Eric Chan-Tin wrote:

>>> Sorry for the delay. I was busy with final exams and projects.

>>> I made the changes. They can be found at

>>> http://www-users.cs.umn.edu/~dchantin/NCClient.java
>>> http://www-users.cs.umn.edu/~dchantin/NCManager.java
>>> http://www-users.cs.umn.edu/~dchantin/RemoteState.java

>>> All the changes are enclosed within

>>> //ERIC START
>>> ...
>>> //ERIC END

>>> I compared every reported "neighbors list" of each Pyxida node (about 370 total), counted how many neighbors did not change within 10 minutes, and then averaged over all the nodes.

>>> Original: 43.37% of the neighbors stayed the same on average.

>>> Fix0 (before we found the interesting situation): 74.73%

>>> Fix1 (with the weight probability of picking a neighbor to ping): 95.79%

>>> Fix2 (with the weight probability and the MIN_UPDATE_PING): 98.81%

>>> Also, the reported error, rrl, etc. were much lower in the fixed Pyxida than in the original one (this may not be significant, since the original experiment was performed 1-2 months ago), but, as expected, they were not higher.

>>> You can take a look at the code and test it if you want. If everything looks fine, I'll remove some of the debugging code and other "useless" comments (like //ERIC). Also, let me know which "fix" you prefer. I will then email you the updated code.

>>> Thanks,

>>> -Eric

>>> On Dec 10 2008, Jonathan Ledlie wrote:

>>>> Seems reasonable.

>>>> Both ideas (lastPingTime and a weighted probability based on age) seem to be independent.
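As an illustration of the age-weighted pick behind Eric's Fix1 (a sketch, not Pyxida's actual code): it does roulette-wheel selection over the neighbor list, with each neighbor's weight equal to (currentTime - lastUpdateTime) / 15 minutes, as Eric proposed earlier in the thread. The RemoteState interface below is a minimal stand-in for Pyxida's real RemoteState class.

    import java.util.List;
    import java.util.Random;

    class WeightedNeighborPicker {
        // Minimal stand-in for Pyxida's RemoteState; only the one accessor we need.
        interface RemoteState {
            long getLastUpdateTime(); // ms timestamp of the last successful update
        }

        private static final double FIFTEEN_MINUTES_MS = 15 * 60 * 1000.0;
        private final Random random = new Random();

        // Pick a neighbor with probability proportional to its age, so the
        // longest-unpinged neighbors are likely (but never certain) to be chosen.
        RemoteState getNeighborToPing(List<RemoteState> neighbors, long now) {
            if (neighbors.isEmpty()) return null;
            double[] weights = new double[neighbors.size()];
            double total = 0.0;
            for (int i = 0; i < neighbors.size(); i++) {
                weights[i] = (now - neighbors.get(i).getLastUpdateTime()) / FIFTEEN_MINUTES_MS;
                total += weights[i];
            }
            if (total <= 0.0) // everyone just updated: fall back to a uniform pick
                return neighbors.get(random.nextInt(neighbors.size()));
            double r = random.nextDouble() * total; // roulette-wheel draw
            for (int i = 0; i < neighbors.size(); i++) {
                r -= weights[i];
                if (r <= 0.0) return neighbors.get(i);
            }
            return neighbors.get(neighbors.size() - 1); // guard against rounding
        }
    }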
>>>> Some of the code is more CPU intensive than it should be (e.g., looping through potentially long lists), and that is a slight downside to the function approach vs. a set, but this probably isn't a big deal.

>>>> -jonathan

>>>> On Dec 10, 2008, at 10:40 PM, Eric Chan-Tin wrote:

>>>>> Okay, I think I might have misunderstood you. Your idea of lastPingTime looks interesting.

>>>>> My idea is to introduce a probability function. Right now, it just randomly picks one of the neighbors. What I am thinking is that instead of just randomly picking a neighbor, it could pick a neighbor based on some function. So the neighbor whose lastUpdateTime is oldest would have a higher chance of being picked than a neighbor who was just updated.

>>>>> Does that make more sense?

>>>>> -Eric

>>>>> Jonathan Ledlie wrote:

>>>>>> Hmm. The advantage of lastPingTime is that you learn whether or not you've tried to contact a neighbor and they haven't responded. Then you can use that knowledge to determine whether or not to toss them.

>>>>>> Maybe I'm not understanding your suggestion then.

>>>>>> -jonathan

>>>>>> On Dec 10, 2008, at 12:51 PM, Eric Chan-Tin wrote:

>>>>>>> That's a good idea. Here is a little twist on it.

>>>>>>> Put a "weight" on each neighbor, based on its lastUpdateTime: the larger currentTime - lastUpdateTime is, the higher the weight. Then, in getNeighborToPing, neighbors with higher weight have a higher probability of getting picked than neighbors with lower weight.

>>>>>>> The weight could be ((currentTime - lastUpdateTime) / 15 minutes). It's 15 minutes rather than 10 so as to avoid a probability of 1 when lastUpdateTime dates from the previous performMaintenance. This should also avoid the problem of always pinging the oldest neighbor, since it is about probability (weight).

>>>>>>> It seems a little bit less complicated than having lastPingTime and MIN_UPDATE_TIME.

>>>>>>> -Eric

>>>>>>> On Dec 10 2008, Jonathan Ledlie wrote:

>>>>>>>> This is an interesting situation. Let's think about the goal: keep pinging the same nodes as long as they are alive, and replace a node soon after it dies, keeping the total number of up neighbors at approximately MAX_NEIGHBORS.

>>>>>>>> How about the following:

>>>>>>>> - when you pick a neighbor to ping, you set a time on that event (in a new per-neighbor timestamp LastPingTime)
>>>>>>>> - when you first insert a neighbor, set this timestamp to 0.
>>>>>>>> - when you loop in performMaintenance, remove nodes whose LastPingTime - LastUpdateTime > threshold (like 10 minutes)
>>>>>>>> - when picking a node, pick the guy with the oldest LastPingTime (not sure about this one -- I don't like situations where you pick the absolute *oldest*, because then we could get into a situation where an old guy sits there, waiting to be kicked out, and we keep picking him to be pinged; I'd prefer to pick randomly, or among random older guys).
>>>>>>>> - a simple way around this is to have a self-adjusting threshold: we walk through the nodes, and everyone whose LastPingTime is older than MIN_UPDATE_TIME_TO_PING ago (a relative time) gets added to a set; we pick a guy randomly from the set to ping. If the set is empty, we increase MIN_UPDATE_TIME_TO_PING and wait for the next time the function is called. If the set is too large (relative to a fraction of MAX_NEIGHBORS), we decrease it.

>>>>>>>> We still might get some churn from lost UDP packets. You might also give nodes a second chance to be pinged by including LastPingTime - LastUpdateTime > 1 minute in the set. Note that increases and decreases in MIN_UPDATE_TIME_TO_PING shouldn't be affected by the number of nodes that fall into this category.

>>>>>>>> We still probably need a way for new potential neighbors to be rejected if our current set of neighbors is fine. Because I still want addNeighbor to really add the neighbor, how about putting a check before calling it to see whether we need more neighbors.

>>>>>>>> What thinks?

>>>>>>>> -jonathan

>>>>>>>> On Dec 9, 2008, at 11:37 PM, ext Eric Chan-Tin wrote:

>>>>>>>>> Peter,

>>>>>>>>> Thanks. That fixed it.

>>>>>>>>> However, here is another problem I found while deploying Pyxida and running it for a while.

>>>>>>>>> In performMaintenance(), I loop through every neighbor, and if (neighbor.getLastUpdateTime() < current_time - 10 minutes), I remove that neighbor.

>>>>>>>>> In NCManager, I call getNeighborToPing, which randomly picks one of the neighbors (32 in this case). Pinging is performed every 10 seconds, so in 10 minutes you can ping 60 nodes. However, if a neighbor A is not among those 60 nodes, it will get kicked out in performMaintenance(), although A is still alive (it just hasn't been contacted yet).

>>>>>>>>> There are some possible options.

>>>>>>>>> 1) Increase the timeout period from 10 minutes to ?? minutes. This doesn't quite work because, say, in Azureus the number of neighbors is 512, so the timeout needs to be increased even more... It could be a function of MAX_NEIGHBORS...

>>>>>>>>> 2) When Pyxida replies to a ping request, it updates the neighbor's lastUpdateTime. This does not quite work either, because it assumes a symmetric relationship, which is most likely not the case.

>>>>>>>>> 3) Introduce some more logic into Pyxida so it remembers which nodes it has contacted and does not send, say, more than 2 requests in 10 minutes to the same node. This solution isn't that great either.

>>>>>>>>> Let me know what you guys think. I am currently leaning towards option 1), but making the timeout period a function of MAX_NEIGHBORS, although I haven't quite found the perfect function yet.
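A back-of-the-envelope calculation (an editorial aside, not from the thread) shows why option 1 alone cannot eliminate the churn Eric describes. With 32 neighbors, a uniformly random pick every 10 seconds, and a 10-minute timeout, each window holds 60 pings, and the probability that a given neighbor is never picked in a window is (31/32)^60 ≈ 0.15. So even if every neighbor is alive and no packets are lost, roughly 15% of live neighbors miss the window and get evicted by performMaintenance in every cycle; lengthening the timeout only shrinks this rate, it never zeroes it.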
>>>>>>>>> Here are some preliminary results. I compared the number of neighbors which are the same at each reported time (10 minutes). Before the fix, more than 50% of the neighbors changed within 10 minutes. After the fix, only about 30% of the neighbors changed within 10 minutes. It is more stable, but I think 30% is still quite a big number.

>>>>>>>>> Sorry for the long email :)

>>>>>>>>> -Eric

>>>>>>>>> Peter Pietzuch wrote:

>>>>>>>>>> Hi,

>>>>>>>>>> Eric Chan-Tin wrote:

>>>>>>>>>>> for (RemoteState<T> neighbor : neighbors) { // this is line 1029
>>>>>>>>>>>     if (neighbor.getLastUpdateTime() < (curr_time - MAINTENANCE_PERIOD)) {
>>>>>>>>>>>         neighbors.remove(neighbor);
>>>>>>>>>>>     }
>>>>>>>>>>> }

>>>>>>>>>> IIRC you can't modify a collection while iterating over it using the above for loop. Instead you have to use a proper Iterator<RemoteState<T>> and use its remove method.

>>>>>>>>>> Cheers,
>>>>>>>>>> Peter
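Peter's fix, spelled out as a minimal sketch (not the actual NCClient code): calling neighbors.remove() inside the for-each loop above makes the next step of the hidden iterator throw ConcurrentModificationException on the standard collections, so the removal has to go through an explicit Iterator. The stub interface and parameter names below are illustrative.

    import java.util.Iterator;
    import java.util.List;

    class MaintenanceSketch {
        interface RemoteState { long getLastUpdateTime(); } // minimal stub

        // Remove stale neighbors via the Iterator, the only safe way to
        // delete from the list while walking it.
        static void removeStaleNeighbors(List<RemoteState> neighbors,
                                         long currTime, long maintenancePeriod) {
            Iterator<RemoteState> it = neighbors.iterator();
            while (it.hasNext()) {
                RemoteState neighbor = it.next();
                if (neighbor.getLastUpdateTime() < currTime - maintenancePeriod) {
                    it.remove(); // goes through the iterator, not the list
                }
            }
        }
    }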
>>>>>>>>>>> Jonathan Ledlie wrote:

>>>>>>>>>>>> On Nov 3, 2008, at 3:46 PM, Eric Chan-Tin wrote:

>>>>>>>>>>>>> Jonathan,

>>>>>>>>>>>>> I can insert some code to fix this, and it doesn't look too hard, but I do have some concerns.

>>>>>>>>>>>>>> I don't remember observing this behavior in our PL deployment, but that could have been because we had the ping time set to 1-2 minutes, IIRC.

>>>>>>>>>>>>> The default ping time was set to 10 seconds plus some Gaussian noise, so that's what I used, but changing the ping time to 1 minute or more will result in the same effect; it will just take longer for the "cycle" to occur.

>>>>>>>>>>>>> What I have in mind is the following:

>>>>>>>>>>>>> 1) NCClient.addNeighbor() would be changed to (in pseudocode):

>>>>>>>>>>>>> protected boolean addNeighbor(RemoteState<T> guy) {
>>>>>>>>>>>>>     boolean added = false;
>>>>>>>>>>>>>     // if neighbors.size() >= MAX_NEIGHBORS
>>>>>>>>>>>>>     //     check the last update time of each neighbor (neighbor.getLastUpdateTime())
>>>>>>>>>>>>>     //     if any of them is older than some number TIMEOUT, remove the oldest one, then add "guy" to neighbors
>>>>>>>>>>>>>     //     else don't add "guy" to neighbors
>>>>>>>>>>>>>     // else add "guy" to neighbors
>>>>>>>>>>>>> }

>>>>>>>>>>>> IMO this shouldn't be the behavior of this function, because it is called "addNeighbor." Instead, part of the performMaintenance() function could remove old neighbors.

>>>>>>>>>>>>> 2) In NCManager, getGossipNode() would just pick a random node from localNC.primaryNC.getNeighbors() to send a request to every 10 seconds.

>>>>>>>>>>>>> 3) Pyxida will still keep track of all GossipNodes in the network. It could also (say every 10 minutes) check its NCClient.neighbors list, see if any neighbor is older than TIMEOUT, remove it, and then pick a random GossipNode to send a ping request to.

>>>>>>>>>>>>> The main problem with this scheme is that one Pyxida node would most likely stay with the same set of neighbors all the time. Its coordinates would be greatly affected by those 32 nodes (I am assuming MAX_NEIGHBORS = 32).

>>>>>>>>>>>> Instead of (2), why not have the getNeighborToPing function determine the behavior? If it returns a node, ping it; if it returns null, have the NCManager pick some other node to use. If things are hunky-dory, getNeighborToPing() would return a node (a current member of the neighbor set); if we're short on neighbors, it would return null. The periodic maintenance would have already tossed old nodes, so this would be self-sustaining.

>>>>>>>>>>>> The "problem" you mention is fairly central to network coordinates in general. Sure, the coordinates will be a little better the more neighbors you have, but in a very large system talking to everybody isn't feasible. Peter and I have a graph in one of the workshop papers which shows the effect of increasing the number of neighbors.

>>>>>>>>>>>>> I don't think the network would be partitioned, but it could happen. For example, the bootstrap nodes have each other in their neighbors list. Every new node bootstrapping to the network would contact those bootstrap nodes first and add them to its neighbors. So every Pyxida node could have as its neighbors the "bootstrap nodes", which would effectively make Pyxida a centralized system.

>>>>>>>>>>>> As long as the initial bootstrap list is large enough, I think you are right that partitions won't be a problem.

>>>>>>>>>>>>> One last thing: when Pyxida receives a ping request from a node, it does not update its coordinates. Is there any particular reason for that? Is it because Pyxida will not be able to calculate the RTT?

>>>>>>>>>>>> Yes.

>>>>>>>>>>>>> Also, there is no "gossiping" of other nodes' coordinates, right? That is, if A has B in its neighbors list, it would from time to time tell other nodes about B's coordinates. I realize this has a lot of complications, but the original Vivaldi paper was very fuzzy about this point and the previous one; it seems it is up to the implementors to decide what to do.
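Jonathan's "Instead of (2)" suggestion above can be read as a simple control-flow contract between NCManager and NCClient. The sketch below is editorial, not the real NCManager code: the Node and CoordClient types and the sendPing helper are invented purely to show the contract (non-null means re-measure a current neighbor; null means the neighbor set is short, so recruit from the gossip list).

    import java.util.List;
    import java.util.Random;

    class PingLoopSketch {
        interface Node { String address(); }
        // null from getNeighborToPing signals "short on neighbors".
        interface CoordClient { Node getNeighborToPing(long now); }

        private final Random random = new Random();

        // Would be called every 10 seconds by the manager's timer.
        void pingOnce(CoordClient client, List<Node> gossipNodes) {
            Node target = client.getNeighborToPing(System.currentTimeMillis());
            if (target == null && !gossipNodes.isEmpty()) {
                // Neighbor set needs members: recruit a random gossip node.
                target = gossipNodes.get(random.nextInt(gossipNodes.size()));
            }
            if (target != null) {
                sendPing(target);
            }
        }

        void sendPing(Node n) {
            System.out.println("ping " + n.address()); // placeholder for the UDP ping
        }
    }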
>>>>>>>>>>>> I don't think we are doing this. It could be useful in certain circumstances (e.g., picking potential new neighbors), but we're not, IIRC.

>>>>>>>>>>>>> I will have to think some more about this, as there are a lot of nasty little side effects that could happen, but these are my thoughts.

>>>>>>>>>>>>> Thanks,

>>>>>>>>>>>>> -Eric

>>>>>>>>>>>> Sounds good.

>>>>>>>>>>>> -jonathan

>>>>>>>>>>>>> On Nov 3 2008, Jonathan Ledlie wrote:

>>>>>>>>>>>>>> Eric,

>>>>>>>>>>>>>> Based on a look at the code just now, I see where you are coming from: with these params on PL, there isn't anything stopping the neighbor list from circulating pretty quickly through all nodes. In the Azureus implementation, the guys fed to the alg (fed to processSample) weren't changing much because they were coming from the routing finger tables. I don't remember observing this behavior in our PL deployment, but that could have been because we had the ping time set to 1-2 minutes, IIRC.

>>>>>>>>>>>>>> Do you want to insert some code that will fix this? Basically, we want to add a new neighbor only if our oldest neighbor should be tossed (or if we haven't reached MAX_NEIGHBORS). It's probably best from a clarity standpoint if addNeighbor does continue to actually add the neighbor in all cases. Probably the best thing to do is to have NCClient drop old neighbors internally and have NCManager ask it which nodes should be pinged via getNeighborToPing. So NCClient would continue to have the same behavior on processSample (it would not reject new information), and NCManager and NCClient would cooperate to measure RTTs to current neighbors. We do want to make sure, however, that the gossip set stays large enough that partitions don't occur.

>>>>>>>>>>>>>> Do you want to be added to the SourceForge Pyxida developers to make this change? (Please test it out carefully first.)

>>>>>>>>>>>>>> -jonathan

>>>>>>>>>>>>>> On Oct 30, 2008, at 11:32 AM, ext Eric Chan-Tin wrote:
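For reference, here is one consistent reading of the self-adjusting MIN_UPDATE_TIME_TO_PING idea from Jonathan's Dec 10 message, as an editorial sketch. The constants, the halving/doubling step, and the NeighborStub type are all invented; the adjustment directions are chosen so the candidate set stays usable (an empty set loosens the threshold, an oversized set tightens it), and, per Jonathan's note, second-chance nodes do not influence the adjustment.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;

    class ThresholdPicker {
        static class NeighborStub {
            long lastPingTime;   // 0 until first pinged, per the proposal
            long lastUpdateTime; // last time we heard back from this neighbor
        }

        private long minUpdateTimeToPingMs = 60_000; // invented starting value
        private final Random random = new Random();

        NeighborStub getNeighborToPing(List<NeighborStub> neighbors, long now, int maxNeighbors) {
            List<NeighborStub> candidates = new ArrayList<>();
            int dueCount = 0;
            for (NeighborStub n : neighbors) {
                boolean due = now - n.lastPingTime > minUpdateTimeToPingMs;
                // Second chance for pings possibly lost to UDP drops:
                boolean unanswered = n.lastPingTime - n.lastUpdateTime > 60_000;
                if (due) dueCount++;
                if (due || unanswered) candidates.add(n);
            }
            // Adjust only on the "due" count, not the second-chance nodes.
            if (dueCount == 0) {
                minUpdateTimeToPingMs /= 2;  // loosen; more will qualify next call
            } else if (dueCount > maxNeighbors / 4) {
                minUpdateTimeToPingMs *= 2;  // tighten; too many due at once
            }
            if (candidates.isEmpty()) return null; // wait for the next call
            return candidates.get(random.nextInt(candidates.size()));
        }
    }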
The Gossip node will reply >>>>>>>>>>>>>>> back >>>>>>>>>>>>>>> with an >>>>>>>>>>>>>>> "upNeighbor", that is, a "Gossip" node that is "up" or >>>>>>>>>>>>>>> with 10% >>>>>>>>>>>>>>> probability, a node that is "pending". Since naturally, >>>>>>>>>>>>>>> all >>>>>>>>>>>>>>> the nodes >>>>>>>>>>>>>>> bootstrap off the same set of nodes, every node will >>>>>>>>>>>>>>> eventually know >>>>>>>>>>>>>>> of >>>>>>>>>>>>>>> every other node. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Pyxida might not be rotating through everybody >>>>>>>>>>>>>>> continuously >>>>>>>>>>>>>>> (although >>>>>>>>>>>>>>> eventually, since it knows everybody, it will have added >>>>>>>>>>>>>>> everybody to >>>>>>>>>>>>>>> its Neighbors list at some point), but could be >>>>>>>>>>>>>>> removing a >>>>>>>>>>>>>>> node that >>>>>>>>>>>>>>> is >>>>>>>>>>>>>>> "alive". So, if MAX_NEIGHBORS = 32, let's call them A_1, >>>>>>>>>>>>>>> A_2, >>>>>>>>>>>>>>> ..., >>>>>>>>>>>>>>> A_32. >>>>>>>>>>>>>>> Pyxida picks a random Gossip node B, sends it a request >>>>>>>>>>>>>>> and B >>>>>>>>>>>>>>> replies >>>>>>>>>>>>>>> back. Pyxida will add B to its Neighbors list, see that >>>>>>>>>>>>>>> size of >>>>>>>>>>>>>>> Neighbors > MAX_NEIGHBORS, kick A_1 out. Then from those >>>>>>>>>>>>>>> 32 >>>>>>>>>>>>>>> neighbors >>>>>>>>>>>>>>> A_2, ...., A_32, B, it will calculate the force, and >>>>>>>>>>>>>>> apply >>>>>>>>>>>>>>> it >>>>>>>>>>>>>>> to the >>>>>>>>>>>>>>> current system coordinates. Thus, if every 10 seconds, >>>>>>>>>>>>>>> Pyxida >>>>>>>>>>>>>>> picks a >>>>>>>>>>>>>>> different set of nodes C, D, E, etc... it would keep >>>>>>>>>>>>>>> kicking >>>>>>>>>>>>>>> nodes out >>>>>>>>>>>>>>> of the neighbors list. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> It's the "kicking A_1 out" part that seemed a bit >>>>>>>>>>>>>>> confusing. >>>>>>>>>>>>>>> Churn >>>>>>>>>>>>>>> is a >>>>>>>>>>>>>>> good reason but you don't check that A_1 might still be >>>>>>>>>>>>>>> alive >>>>>>>>>>>>>>> or maybe >>>>>>>>>>>>>>> A_1 was just added. I have been printing the neighbors >>>>>>>>>>>>>>> list >>>>>>>>>>>>>>> every 10 >>>>>>>>>>>>>>> minutes, and they are completely different between those >>>>>>>>>>>>>>> 10 >>>>>>>>>>>>>>> minutes. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> One possible problem might be that I am choosing the >>>>>>>>>>>>>>> value >>>>>>>>>>>>>>> of >>>>>>>>>>>>>>> MAX_NEIGHBORS to be too small. Originally it was set to >>>>>>>>>>>>>>> 512 >>>>>>>>>>>>>>> (which was >>>>>>>>>>>>>>> meant for Azureus). Since I am using Planetlab where >>>>>>>>>>>>>>> there >>>>>>>>>>>>>>> are about >>>>>>>>>>>>>>> 400-500 "stable" nodes, I changed the value to 32. But I >>>>>>>>>>>>>>> don't think >>>>>>>>>>>>>>> that should be a problem since log(n) = log(512) = 9. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -Eric >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Jonathan Ledlie wrote: >>>>>>>>>>>>>>>> Eric, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Glad to hear it. Let Peter and me know if you publish >>>>>>>>>>>>>>>> anything using >>>>>>>>>>>>>>>> Pyxida and we can put it on the web page. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> "Gossip" nodes are for learning about the network. >>>>>>>>>>>>>>>> "Neighbors" are >>>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>> constructing the coordinate. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> So, IIRC, the gossip process learns about nodes that >>>>>>>>>>>>>>>> may or >>>>>>>>>>>>>>>> may not >>>>>>>>>>>>>>>> be >>>>>>>>>>>>>>>> applied to the neighbor list. 
>>>>>>>>>>>>>>>> So, IIRC, the gossip process learns about nodes that may or may not be applied to the neighbor list. That is, we may eventually learn about a large portion of the network via gossiping, but we should only be using MAX_NEIGHBORS at any given time for coordinate computation. The reason we don't want to use a fixed set of neighbors for coordinate computation is that nodes may come and go from the system. However, it should not be the case that we just rotate through everybody, because then, as you note, we'd be computing against everyone's coordinate/latency, and this is counter to the point of network coordinates. With this gradual transience as a desirable quality in mind, I am pretty sure we maintain neighbors in a fairly fixed manner. Does this not seem to be the case to you?

>>>>>>>>>>>>>>>> -jonathan

>>>>>>>>>>>>>>>> On Oct 29, 2008, at 9:20 PM, Eric Chan-Tin wrote:

>>>>>>>>>>>>>>>>> Hi Jonathan and Peter,

>>>>>>>>>>>>>>>>> First of all, just wanted to say that Pyxida has been working great on PlanetLab for the past 4 months or so. I'm getting good data and the experiments are going well.

>>>>>>>>>>>>>>>>> However, I do have one question/comment.

>>>>>>>>>>>>>>>>> Every Pyxida node contains a list of Neighbors (in NCClient) and a list of Gossip nodes (they are called neighbors in NCManager, but I will call them gossip for now so as not to confuse them with the 'real' neighbors in NCClient). Every 10 seconds, Pyxida will pick a random node (call it A) from Gossip and send a gossip request (basically asking for rtt, coordinate, error). But Gossip contains all the nodes in the network. The comment in the code said this was to avoid Pyxida becoming lonely with no one to gossip with, which makes sense.

>>>>>>>>>>>>>>>>> Once A responds, Pyxida will add A to its Neighbors (if Neighbors does not contain A) and remove the first element from the Neighbors list. This seems a little bit odd. Was there any particular reasoning behind this? Because it would seem like you're getting the coordinate from every node in the network eventually.
>>>>>>>>>>>>>>>>> Thanks a lot,

>>>>>>>>>>>>>>>>> -Eric

>>>>>>>>>>>>>>>>> Jonathan Ledlie wrote:

>>>>>>>>>>>>>>>>>> Your inclination is correct. See the forwarded message.

>>>>>>>>>>>>>>>>>> -jonathan

>>>>>>>>>>>>>>>>>> On Apr 9, 2008, at 1:53 PM, ext Eric Chan-Tin wrote:

>>>>>>>>>>>>>>>>>>> Thank you for the clarification.

>>>>>>>>>>>>>>>>>>> I've been reading through your published papers and code, and I must say I learned a lot about how Pyxida works :)

>>>>>>>>>>>>>>>>>>> I still have a quick and (hopefully) easy question. Each Pyxida node has a small number of neighbors and learns about other nodes through them. I am running on about 200 machines, and it seems like all the Pyxida nodes are adding each other to their upNeighbors set. NCClient.java has MAX_NEIGHBORS set to 512.

>>>>>>>>>>>>>>>>>>> I wanted to make sure that I am not misreading the code, but shouldn't each Pyxida node keep a small number of neighbors in its upNeighbors set? Otherwise, it is a lot of updates to send a gossipMessage to all 200 neighbors, and it also means that a Pyxida node has the whole network in its upNeighbors set - that seems wrong. I am inclined to change the value of MAX_NEIGHBORS, but wanted to make sure that my understanding is correct first.

>>>>>>>>>>>>>>>>>>> Thank you,

>>>>>>>>>>>>>>>>>>> Eric