From: Eric Chan-T. <dch...@cs...> - 2008-12-19 21:18:25
Let me know if the error goes back up. I will then just run the new code on my PL slice and see what happens.

-Eric

Jonathan Ledlie wrote:
>
> What's strange is it did stabilize. Then it became unstable. It was
> down to 0.1-0.2 error and then shot up to >0.9.
>
> I've restarted my node (jameson.eecs.harvard.edu).
>
> -jonathan
>
> On Dec 19, 2008, at 12:26 PM, Eric Chan-Tin wrote:
>
>> Looked at your changes and they look fine. One thing is that your node
>> is never "stabilized" (in your code it's based on error, in mine it's
>> based on time). Did you try to restart it and do you still get a high
>> error? The error on my nodes is about 0.1-0.2.
>>
>> I could also run the new code on Planetlab and see what happens (I'll
>> probably kill my experiments and restart them at the same time)
>>
>> -Eric
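For context, the "stabilized" check mentioned in the quoted exchange differs between the two code bases: one is keyed off the node's current error, the other off elapsed running time. A minimal illustration of the two criteria, using hypothetical names and thresholds rather than Pyxida's actual fields, might look like:

```java
// Hypothetical sketch of the two stabilization criteria discussed above.
// Method names, fields, and thresholds are illustrative assumptions only.
public class StabilizationSketch {

    // Error-based: treat the coordinate as stable once the relative error
    // drops below a threshold (the thread reports errors around 0.1-0.2).
    static boolean stabilizedByError(double relativeError, double errorThreshold) {
        return relativeError < errorThreshold;
    }

    // Time-based: treat the coordinate as stable once the node has been
    // running longer than a fixed warm-up period.
    static boolean stabilizedByTime(long startMillis, long nowMillis, long warmupMillis) {
        return (nowMillis - startMillis) > warmupMillis;
    }
}
```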
From: Jonathan L. <jon...@ee...> - 2008-12-19 15:09:57
Eric, I ran the new code last night and got a weird result. Things were humming along smoothly and then at about midnight the system seemed to fall over. All node's errors (including my own) went very high. Any ideas? -jonathan On Dec 18, 2008, at 8:29 PM, Jonathan Ledlie wrote: > > http://java.sun.com/javase/downloads/index_jdk5.jsp > > "J2SE 5.0 is in its Java Technology End of Life (EOL) transition > period." > > :-) > > On Dec 18, 2008, at 8:24 PM, Eric Chan-Tin wrote: > >> jdk-1.5.0_09 >> >> Jonathan Ledlie wrote: >>> >>> Eric, >>> >>> I'm getting an error (an old favorite): >>> WA 1229649353501 NetUtil : Could not deserialize java >>> object: >>> java.io.StreamCorruptedException: invalid type code: CC >>> WA 1229649353501 ObjComm$ReadConnHa: No valid msgObject received >>> from >>> remoteAddr=planet1.scs.stanford.edu:26500. Ignoring. >>> >>> What java version are you running on PL? >>> >>> -jonathan >>> >>> On Dec 18, 2008, at 4:49 PM, Eric Chan-Tin wrote: >>> >>>> Do you mean the Pyxida code running on PlanetLab? I killed them >>>> after >>>> running it overnight. I am currently restarting the experiments. >>>> Try >>>> again >>>> in about 15 minutes and it should be ready. >>>> >>>> Fix1 is on port 28500 (and api.port 28501). Fix2 is on port 26500 >>>> (api.port >>>> 26501). The node you can contact is "planet1.scs.stanford.edu" or >>>> "planetlab01.cs.washington.edu". >>>> >>>> About your other email: Yes I know. The fix was to go through the >>>> list >>>> again, but I didn't want to implement that since it seems to be >>>> working >>>> fine. So go through the list once to calculate weight, and add >>>> neighbors >>>> into another list (sorted by weights). Then go through that new >>>> list. The >>>> new list will be sorted by higher weights, so older neighbors >>>> will be >>>> "tested" first. If you want it that way, I can do that. However, >>>> I do not >>>> think we need to. Let's assume iteration 0, the first node gets >>>> picked. At >>>> iteration 1, that first node will have a very small probability of >>>> getting >>>> picked again. >>>> >>>> -Eric >>>> >>>> >>>> On Dec 18 2008, Jonathan Ledlie wrote: >>>> >>>>> >>>>> Eric, >>>>> >>>>> Do you have a PL Pyxida slice that I can connect to to try this >>>>> out? >>>>> >>>>> That is, I'll run it on my machine, but connect to your slice >>>>> via the >>>>> right ports. >>>>> >>>>> -jonathan >>>>> >>>>> On Dec 18, 2008, at 11:35 AM, Eric Chan-Tin wrote: >>>>> >>>>>> Sorry for the delay. Was busy with final exams and projects. >>>>>> >>>>>> I made the changes. They can be found at >>>>>> >>>>>> http://www-users.cs.umn.edu/~dchantin/NCClient.java >>>>>> http://www-users.cs.umn.edu/~dchantin/NCManager.java >>>>>> http://www-users.cs.umn.edu/~dchantin/RemoteState.java >>>>>> >>>>>> All the changes are enclosed within >>>>>> //ERIC START >>>>>> ... >>>>>> //ERIC END >>>>>> >>>>>> I compared every reported "neighbors list" of each Pyxida node >>>>>> (about 370 >>>>>> total) and see how many neighbors did not change within 10 >>>>>> minutes. >>>>>> Then >>>>>> did the average for all the nodes. >>>>>> >>>>>> Original: 43.37% of the neighbors stayed the same on average. >>>>>> >>>>>> Fix0 (before we found the interesting situation): 74.73% >>>>>> >>>>>> Fix1 (with the weight probability of picking a neighbor to ping): >>>>>> 95.79% >>>>>> >>>>>> Fix2 (with the weight probability and the MIN_UPDATE_PING): >>>>>> 98.81% >>>>>> >>>>>> Also, the reported error, rrl, etc.. 
were much lower in the Fixed >>>>>> Pyxida >>>>>> than in the original one (this could not be significant, since >>>>>> the >>>>>> Original >>>>>> experiment was performed 1-2 months ago), but as expected, it >>>>>> was not >>>>>> higher. >>>>>> >>>>>> You can take a look at the code and test it if you want. If >>>>>> everything >>>>>> looks fine, I'll remove some of the debugging code and other >>>>>> "useless" >>>>>> comments (like //ERIC). Also, let me know which "fix" you >>>>>> prefer. I >>>>>> will >>>>>> then email you the updated code. >>>>>> >>>>>> Thanks, >>>>>> >>>>>> -Eric >>>>>> >>>>>> >>>>>> On Dec 10 2008, Jonathan Ledlie wrote: >>>>>> >>>>>>> >>>>>>> Seems reasonable. >>>>>>> >>>>>>> Both ideas (lastPingTime and a weighted probability based on >>>>>>> age) >>>>>>> seem >>>>>>> to be independent. >>>>>>> >>>>>>> Some of the code is more CPU intensive than it should be (e.g., >>>>>>> looping through potentially long lists), and that is a slight >>>>>>> downside >>>>>>> to the function approach vs a set, but this probably isn't a big >>>>>>> deal. >>>>>>> >>>>>>> -jonathan >>>>>>> >>>>>>> On Dec 10, 2008, at 10:40 PM, Eric Chan-Tin wrote: >>>>>>> >>>>>>>> okay. I think I might have misunderstood you. Your idea of >>>>>>>> lastPingTime >>>>>>>> looks interesting. >>>>>>>> >>>>>>>> My idea is to introduce a probability function. Right now, it >>>>>>>> just >>>>>>>> randomly picks one of the neighbors. What I am thinking it >>>>>>>> might >>>>>>>> do is >>>>>>>> that instead of just randomly picking a neighbor, pick a >>>>>>>> neighbor >>>>>>>> based >>>>>>>> on some function. So, the neighbor whose lastUpdateTime is >>>>>>>> oldest >>>>>>>> will >>>>>>>> have a higher chance of being picked than a neighbor who was >>>>>>>> just >>>>>>>> updated. >>>>>>>> >>>>>>>> Does that make more sense? >>>>>>>> >>>>>>>> -Eric >>>>>>>> >>>>>>>> Jonathan Ledlie wrote: >>>>>>>>> >>>>>>>>> Hmm. The advantage of lastPingTime is that you learn >>>>>>>>> whether or >>>>>>>>> not >>>>>>>>> you've tried to contact a neighbor and they haven't responded. >>>>>>>>> Then you >>>>>>>>> can use that knowledge to determine whether or not to toss >>>>>>>>> them. >>>>>>>>> >>>>>>>>> Maybe I'm not understanding your suggestion then. >>>>>>>>> >>>>>>>>> -jonathan >>>>>>>>> >>>>>>>>> On Dec 10, 2008, at 12:51 PM, Eric Chan-Tin wrote: >>>>>>>>> >>>>>>>>>> That's a good idea. Here is a little twist on it. >>>>>>>>>> >>>>>>>>>> Put a "weight" on each neighbor, based on the >>>>>>>>>> lastUpdateTime. So >>>>>>>>>> if >>>>>>>>>> currentTime - neighbor's lastUpdateTime, then the weight is >>>>>>>>>> higher. >>>>>>>>>> Then in getNeighborToPing, neighbor's with higher weight >>>>>>>>>> have a >>>>>>>>>> higher >>>>>>>>>> probability of getting picked than neighbor's with lower >>>>>>>>>> weight. >>>>>>>>>> >>>>>>>>>> The weight could be ( (currentTime - lastUpdateTime) / 15 >>>>>>>>>> minutes). >>>>>>>>>> It's not 10 minutes so as to avoid a probability of 1 if the >>>>>>>>>> lastUpdateTime was the previous performMaintenance. This >>>>>>>>>> should >>>>>>>>>> also >>>>>>>>>> avoid the problem of always pinging the oldest neighbor >>>>>>>>>> since it's >>>>>>>>>> about probability (weight). >>>>>>>>>> >>>>>>>>>> It seems a little bit less complicated than having >>>>>>>>>> lastPingTime >>>>>>>>>> and >>>>>>>>>> MIN_UPDATE_TIME. >>>>>>>>>> >>>>>>>>>> -Eric >>>>>>>>>> >>>>>>>>>> On Dec 10 2008, Jonathan Ledlie wrote: >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> This is an interesting situation. 
Let's think about the >>>>>>>>>>> goal. >>>>>>>>>>> The >>>>>>>>>>> goal is to keep pinging the same nodes as long as they are >>>>>>>>>>> alive, >>>>>>>>>>> and to replace a node soon after it dies, keeping the total >>>>>>>>>>> number >>>>>>>>>>> of up neighbors at approximately MAX_NEIGHBORS. >>>>>>>>>>> >>>>>>>>>>> How about the following: >>>>>>>>>>> - when you pick a neighbor to ping, you set a time on that >>>>>>>>>>> event >>>>>>>>>>> (in >>>>>>>>>>> a new per-neighbor timestamp LastPingTime) >>>>>>>>>>> - when you first insert a neighbor, set this timestamp to 0. >>>>>>>>>>> - when you loop in performMaintenance, remove nodes whose >>>>>>>>>>> LastPingTime - LastUpdateTime > threshold (like 10 minutes) >>>>>>>>>>> >>>>>>>>>>> - when picking a node, pick the guy with the oldest >>>>>>>>>>> LastPingTime >>>>>>>>>>> (not sure about this one -- I don't like situations where >>>>>>>>>>> you >>>>>>>>>>> pick >>>>>>>>>>> the absolute *oldest* because then we could get in a >>>>>>>>>>> situation >>>>>>>>>>> where >>>>>>>>>>> an old guy sits there, waiting to be kicked out, and we >>>>>>>>>>> keep >>>>>>>>>>> picking >>>>>>>>>>> him to be pinged; I'd prefer to pick randomly, or random >>>>>>>>>>> older >>>>>>>>>>> guys). >>>>>>>>>>> - a simple way around this is to simply have a self- >>>>>>>>>>> adjusting >>>>>>>>>>> threshold: so, we walk through the nodes, and everyone whose >>>>>>>>>>> LastPingTime is older than MIN_UPDATE_TIME_TO_PING ago (a >>>>>>>>>>> relative >>>>>>>>>>> time) get added to a set; we pick a guy randomly from the >>>>>>>>>>> set to >>>>>>>>>>> ping. If the set is empty, we increase >>>>>>>>>>> MIN_UPDATE_TIME_TO_PING, >>>>>>>>>>> and >>>>>>>>>>> wait for the next time the function is called. If the set >>>>>>>>>>> is too >>>>>>>>>>> large (relative to a fraction of MAX_NEIGHBORS), we >>>>>>>>>>> decrease it. >>>>>>>>>>> >>>>>>>>>>> We still might get some churn from lost UDP packets. You >>>>>>>>>>> might >>>>>>>>>>> also >>>>>>>>>>> give nodes a second chance to be pinged by including >>>>>>>>>>> LastPingTime - >>>>>>>>>>> LastUpdateTime > 1 minute in the set. Note that increases >>>>>>>>>>> and >>>>>>>>>>> decreases in MIN_UPDATE_TIME_TO_PING shouldn't be affected >>>>>>>>>>> by the >>>>>>>>>>> number of nodes that fall into this category. >>>>>>>>>>> >>>>>>>>>>> We still probably need a way for new potential neighbors >>>>>>>>>>> to be >>>>>>>>>>> rejected if our current set of neighbors is fine. Because I >>>>>>>>>>> still >>>>>>>>>>> want addNeigbor to really add the neighbor, how about >>>>>>>>>>> putting a >>>>>>>>>>> check before calling that to see if we need more neighbors. >>>>>>>>>>> >>>>>>>>>>> What thinks? >>>>>>>>>>> >>>>>>>>>>> -jonathan >>>>>>>>>>> >>>>>>>>>>> On Dec 9, 2008, at 11:37 PM, ext Eric Chan-Tin wrote: >>>>>>>>>>> >>>>>>>>>>>> Peter, >>>>>>>>>>>> >>>>>>>>>>>> Thanks. That fixed it. >>>>>>>>>>>> >>>>>>>>>>>> However, here is another problem I found while deploying >>>>>>>>>>>> Pyxida >>>>>>>>>>>> and >>>>>>>>>>>> running them for a while. >>>>>>>>>>>> >>>>>>>>>>>> In performMaintenance(), I loop through every neighbor >>>>>>>>>>>> and if >>>>>>>>>>>> (neighbor.getLastUpdateTime() < current_time - 10 minutes), >>>>>>>>>>>> then I >>>>>>>>>>>> remove that neighbor. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> In NCManager, I call getNeighborToPing, which randomly >>>>>>>>>>>> picks one >>>>>>>>>>>> of >>>>>>>>>>>> the >>>>>>>>>>>> neighbors (32 in this case). 
Pinging is performed every 10 >>>>>>>>>>>> seconds, so >>>>>>>>>>>> in 10 minutes, you can ping 60 nodes. However, if a >>>>>>>>>>>> neighbor A >>>>>>>>>>>> is not >>>>>>>>>>>> among those 60 nodes, it will get kicked out in >>>>>>>>>>>> performMaintenance(), >>>>>>>>>>>> although A is still alive (it just hasn't been contacted >>>>>>>>>>>> yet). >>>>>>>>>>>> >>>>>>>>>>>> There are some possible options. >>>>>>>>>>>> >>>>>>>>>>>> 1) Increase the timeout period from 10 minutes to ?? >>>>>>>>>>>> minutes. >>>>>>>>>>>> This >>>>>>>>>>>> doesn't quite work because say in Azureus, the number of >>>>>>>>>>>> neighbors is >>>>>>>>>>>> 512, so the timeout needs to be increased even more... It >>>>>>>>>>>> could >>>>>>>>>>>> be a >>>>>>>>>>>> function of MAX_NEIGHBORS... >>>>>>>>>>>> >>>>>>>>>>>> 2) When Pyxida replies to a ping request, it will update >>>>>>>>>>>> the >>>>>>>>>>>> neighbor's >>>>>>>>>>>> lastUpdateTime. This does not quite work either because it >>>>>>>>>>>> assumes a >>>>>>>>>>>> symmetric relationship which is most likely not the case. >>>>>>>>>>>> >>>>>>>>>>>> 3) Introduce some more logic to Pyxida so it remembers what >>>>>>>>>>>> nodes it >>>>>>>>>>>> contacted and does not send, say, more than 2 requests in >>>>>>>>>>>> 10 >>>>>>>>>>>> minutes to >>>>>>>>>>>> the same node. This solution isn't that great either. >>>>>>>>>>>> >>>>>>>>>>>> Let me know what you guys think. I am currently leaning >>>>>>>>>>>> towards >>>>>>>>>>>> option >>>>>>>>>>>> 1) but make the timeout period a function of MAX_NEIGHBORS, >>>>>>>>>>>> although I >>>>>>>>>>>> haven't quite found the perfect function yet. >>>>>>>>>>>> >>>>>>>>>>>> Here are some preliminary results. I compared the number of >>>>>>>>>>>> neighbors >>>>>>>>>>>> which are the same at each reported time (10 minutes). >>>>>>>>>>>> Before >>>>>>>>>>>> the fix, >>>>>>>>>>>> more than 50% of the neighbors changed within 10 minutes. >>>>>>>>>>>> After >>>>>>>>>>>> the >>>>>>>>>>>> fix, >>>>>>>>>>>> only about 30% of the neighbors changed within 10 >>>>>>>>>>>> minutes. It is >>>>>>>>>>>> more >>>>>>>>>>>> stable but I think 30% is still quite a big number. >>>>>>>>>>>> >>>>>>>>>>>> Sorry for the long email :) >>>>>>>>>>>> >>>>>>>>>>>> -Eric >>>>>>>>>>>> >>>>>>>>>>>> Peter Pietzuch wrote: >>>>>>>>>>>>> Hi, >>>>>>>>>>>>> >>>>>>>>>>>>> Eric Chan-Tin wrote: >>>>>>>>>>>>>> for (RemoteState<T> neighbor : neighbors) { //this is >>>>>>>>>>>>>> line >>>>>>>>>>>>>> 1029 >>>>>>>>>>>>>> if (neighbor.getLastUpdateTime() < (curr_time - >>>>>>>>>>>>>> MAINTENANCE_PERIOD)) { >>>>>>>>>>>>>> neighbors.remove(neighbor); >>>>>>>>>>>>>> } >>>>>>>>>>>>>> } >>>>>>>>>>>>> >>>>>>>>>>>>> IIRC you can't modify a collection while iterating over it >>>>>>>>>>>>> using the >>>>>>>>>>>>> above for loop. Instead you have to use a proper >>>>>>>>>>>>> Iterator<RemoteState<T>> and use it's remove method. >>>>>>>>>>>>> >>>>>>>>>>>>> Cheers, >>>>>>>>>>>>> Peter >>>>>>>>>>>>> >>>>>>>>>>>>>> Jonathan Ledlie wrote: >>>>>>>>>>>>>>> On Nov 3, 2008, at 3:46 PM, Eric Chan-Tin wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Jonathan, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I can insert some code to fix this and it doesn't >>>>>>>>>>>>>>>> look too >>>>>>>>>>>>>>>> hard >>>>>>>>>>>>>>>> but I do >>>>>>>>>>>>>>>> have some concerns. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I don't remember observing this behavior in our PL >>>>>>>>>>>>>>>>> deployment, but that could have been because we had >>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>> ping >>>>>>>>>>>>>>>>> time set >>>>>>>>>>>>>>>>> to 1-2 minutes, IIRC. 
>>>>>>>>>>>>>>>> The default ping time was set to 10 seconds + some >>>>>>>>>>>>>>>> Gaussian >>>>>>>>>>>>>>>> distribution, >>>>>>>>>>>>>>>> so that's what I used, but changing the ping time to 1 >>>>>>>>>>>>>>>> minute >>>>>>>>>>>>>>>> or more >>>>>>>>>>>>>>>> will >>>>>>>>>>>>>>>> result in the same effect except it will take longer >>>>>>>>>>>>>>>> for the >>>>>>>>>>>>>>>> "cycle" to >>>>>>>>>>>>>>>> occur. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> What I have in mind was the following: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 1) In NCClient.addNeighbor() would be changed to (in >>>>>>>>>>>>>>>> pseudocode) >>>>>>>>>>>>>>>> protected boolean addNeighbor(RemoteState<T> guy) { >>>>>>>>>>>>>>>> boolean added = false; >>>>>>>>>>>>>>>> //if neighbors.size() >= MAX_NEIGHBORS >>>>>>>>>>>>>>>> //Check last update time of each >>>>>>>>>>>>>>>> neighbor in >>>>>>>>>>>>>>>> neighbors >>>>>>>>>>>>>>>> //neighbor.getLastUpdateTime() >>>>>>>>>>>>>>>> //if any of them is older than some number >>>>>>>>>>>>>>>> TIMEOUT, >>>>>>>>>>>>>>>> remove the oldest one, then add "guy" to neighbors >>>>>>>>>>>>>>>> //else don't add "guy" to neighbors >>>>>>>>>>>>>>>> //else add "guy" to neighbors >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> IMO this shouldn't be the behavior of this function >>>>>>>>>>>>>>> because >>>>>>>>>>>>>>> it >>>>>>>>>>>>>>> is called >>>>>>>>>>>>>>> "addNeighbor." Instead what could happen is that part >>>>>>>>>>>>>>> of the >>>>>>>>>>>>>>> performMaintenance() function could remove old >>>>>>>>>>>>>>> neighbors. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 2) In NCManager, getGossipNode() would just pick a >>>>>>>>>>>>>>>> random >>>>>>>>>>>>>>>> node >>>>>>>>>>>>>>>> from >>>>>>>>>>>>>>>> localNC.primaryNC.getNeighbors() to send a request to >>>>>>>>>>>>>>>> every 10 >>>>>>>>>>>>>>>> seconds. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 3) Pyxida will still keep track of all GossipNodes in >>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>> network. It >>>>>>>>>>>>>>>> could >>>>>>>>>>>>>>>> also (say every 10 minutes) check its >>>>>>>>>>>>>>>> NCClient.neighbors >>>>>>>>>>>>>>>> list >>>>>>>>>>>>>>>> and see if >>>>>>>>>>>>>>>> any neighbor is older than TIMEOUT, remove it, then >>>>>>>>>>>>>>>> pick a >>>>>>>>>>>>>>>> random >>>>>>>>>>>>>>>> GossipNode to send a ping request. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The main problem with this scheme is that one Pyxida >>>>>>>>>>>>>>>> node >>>>>>>>>>>>>>>> would >>>>>>>>>>>>>>>> most >>>>>>>>>>>>>>>> likely >>>>>>>>>>>>>>>> stay with the same set of neighbors all the time. Its >>>>>>>>>>>>>>>> coordinates >>>>>>>>>>>>>>>> would be >>>>>>>>>>>>>>>> greatly affected by those 32 nodes (I am assuming >>>>>>>>>>>>>>>> MAX_NEIGHBORS >>>>>>>>>>>>>>>> = 32). >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Instead of (2), why not have the getNeighborToPing >>>>>>>>>>>>>>> function >>>>>>>>>>>>>>> determine >>>>>>>>>>>>>>> the behavior? If it returns a node, then ping it. If >>>>>>>>>>>>>>> it >>>>>>>>>>>>>>> returns null, >>>>>>>>>>>>>>> then have the NCManager pick some other node to use? If >>>>>>>>>>>>>>> things are >>>>>>>>>>>>>>> hunky-dory, getNeighborToPing() would return a node (a >>>>>>>>>>>>>>> current >>>>>>>>>>>>>>> member of >>>>>>>>>>>>>>> the neighbor set), if we're short on neighbors, then >>>>>>>>>>>>>>> it would >>>>>>>>>>>>>>> return >>>>>>>>>>>>>>> null. The periodic maintenance would have already >>>>>>>>>>>>>>> tossed old >>>>>>>>>>>>>>> nodes, so >>>>>>>>>>>>>>> this would be self-sustaining. 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The "problem" you mention is fairly central to network >>>>>>>>>>>>>>> coordinates in >>>>>>>>>>>>>>> general. Sure the coordinates will be a little better >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> more >>>>>>>>>>>>>>> neighbors you have. But in a very large system >>>>>>>>>>>>>>> talking to >>>>>>>>>>>>>>> everybody >>>>>>>>>>>>>>> isn't feasible. Peter and I have a graph in one of the >>>>>>>>>>>>>>> workshop >>>>>>>>>>>>>>> papers >>>>>>>>>>>>>>> which shows the effect of increasing the number of >>>>>>>>>>>>>>> neighbors >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I don't think the network would be partitioned, but >>>>>>>>>>>>>>>> it could >>>>>>>>>>>>>>>> happen. For >>>>>>>>>>>>>>>> example, the boostrap nodes have each other in their >>>>>>>>>>>>>>>> neighbors >>>>>>>>>>>>>>>> list. >>>>>>>>>>>>>>>> Every >>>>>>>>>>>>>>>> new node bootstrapping to the network would contact >>>>>>>>>>>>>>>> those >>>>>>>>>>>>>>>> bootstrapping >>>>>>>>>>>>>>>> nodes first and add them to their neighbors. So every >>>>>>>>>>>>>>>> Pyxida >>>>>>>>>>>>>>>> node could >>>>>>>>>>>>>>>> have as its neighbors the "Bootstrap nodes", which >>>>>>>>>>>>>>>> would >>>>>>>>>>>>>>>> make >>>>>>>>>>>>>>>> Pyxida >>>>>>>>>>>>>>>> become >>>>>>>>>>>>>>>> a centralized system. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> As long as the initial bootstrap list is large enough, I >>>>>>>>>>>>>>> think >>>>>>>>>>>>>>> you are >>>>>>>>>>>>>>> right, that partitions won't be a problem. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> One last thing: When Pyxida receives a ping request >>>>>>>>>>>>>>>> from a >>>>>>>>>>>>>>>> node, it does >>>>>>>>>>>>>>>> not update its coordinates. Is there any particular >>>>>>>>>>>>>>>> reason >>>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>> that? >>>>>>>>>>>>>>>> Is it >>>>>>>>>>>>>>>> because Pyxida will not be able to calculate the RTT? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Yes. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Also, there is no "gossiping" of other nodes' >>>>>>>>>>>>>>>> coordinates, >>>>>>>>>>>>>>>> right? So if A >>>>>>>>>>>>>>>> has B in its neighbors list, it will from time to >>>>>>>>>>>>>>>> time tell >>>>>>>>>>>>>>>> other nodes >>>>>>>>>>>>>>>> about B's coordinates. I realize this has a lot of >>>>>>>>>>>>>>>> complications but the >>>>>>>>>>>>>>>> original Vivaldi paper was very fuzzy about this >>>>>>>>>>>>>>>> point and >>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>> previous >>>>>>>>>>>>>>>> point that it seems it is up to the implementors to >>>>>>>>>>>>>>>> decide >>>>>>>>>>>>>>>> what >>>>>>>>>>>>>>>> to do. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I don't think we are doing this. It could be useful in >>>>>>>>>>>>>>> certain >>>>>>>>>>>>>>> circumstance (e.g., picking potential new neighbors), >>>>>>>>>>>>>>> but >>>>>>>>>>>>>>> we're >>>>>>>>>>>>>>> not, IIRC. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I will have to think some more about this as there >>>>>>>>>>>>>>>> are a lot >>>>>>>>>>>>>>>> of >>>>>>>>>>>>>>>> nasty >>>>>>>>>>>>>>>> little side effects that could happen, but these are my >>>>>>>>>>>>>>>> thoughts. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -Eric >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Sounds good. 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -jonathan >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Nov 3 2008, Jonathan Ledlie wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Eric, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Based on a look at the code just now, I see where >>>>>>>>>>>>>>>>> you are >>>>>>>>>>>>>>>>> coming from: >>>>>>>>>>>>>>>>> with these params on PL, there isn't anything >>>>>>>>>>>>>>>>> stopping the >>>>>>>>>>>>>>>>> neighbor >>>>>>>>>>>>>>>>> list from circulating pretty quickly through all >>>>>>>>>>>>>>>>> nodes. In >>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>> Azureus implementation, the guys fed to the alg (fed >>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>> processSample) >>>>>>>>>>>>>>>>> weren't changing much because they were coming from >>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>> routing finger >>>>>>>>>>>>>>>>> tables. I don't remember observing this behavior in >>>>>>>>>>>>>>>>> our PL >>>>>>>>>>>>>>>>> deployment, but that could have been because we had >>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>> ping >>>>>>>>>>>>>>>>> time set >>>>>>>>>>>>>>>>> to 1-2 minutes, IIRC. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Do you want to insert some code that will fix this? >>>>>>>>>>>>>>>>> Basically >>>>>>>>>>>>>>>>> we want >>>>>>>>>>>>>>>>> to only add a new neighbor if our oldest neighbor >>>>>>>>>>>>>>>>> should be >>>>>>>>>>>>>>>>> tossed (or >>>>>>>>>>>>>>>>> if we haven't reached MAX_NEIGHBORS). It's probably >>>>>>>>>>>>>>>>> best >>>>>>>>>>>>>>>>> from a >>>>>>>>>>>>>>>>> clarity standpoint if addNeighbor does continue to >>>>>>>>>>>>>>>>> actually >>>>>>>>>>>>>>>>> add the >>>>>>>>>>>>>>>>> neighbor in all cases. Probably the best thing to >>>>>>>>>>>>>>>>> do is to >>>>>>>>>>>>>>>>> have >>>>>>>>>>>>>>>>> NCClient drop old neighbors internally and have >>>>>>>>>>>>>>>>> NCManager >>>>>>>>>>>>>>>>> ask >>>>>>>>>>>>>>>>> it which >>>>>>>>>>>>>>>>> nodes should be pinged via getNeighborToPing. So >>>>>>>>>>>>>>>>> NCClient >>>>>>>>>>>>>>>>> would >>>>>>>>>>>>>>>>> continue to have the same behavior on processSample >>>>>>>>>>>>>>>>> (it >>>>>>>>>>>>>>>>> would not >>>>>>>>>>>>>>>>> reject new information) and NCManager and NCClient >>>>>>>>>>>>>>>>> would >>>>>>>>>>>>>>>>> cooperate to >>>>>>>>>>>>>>>>> measure RTTs to current neighbors. We do want to make >>>>>>>>>>>>>>>>> sure, >>>>>>>>>>>>>>>>> however, >>>>>>>>>>>>>>>>> that the gossip set stays large enough so that >>>>>>>>>>>>>>>>> partitions >>>>>>>>>>>>>>>>> don't occur. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Do you want to be added to the SourceForge Pyxida >>>>>>>>>>>>>>>>> developers >>>>>>>>>>>>>>>>> to make >>>>>>>>>>>>>>>>> this change? (Please test it out carefully first.) >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -jonathan >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Oct 30, 2008, at 11:32 AM, ext Eric Chan-Tin wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Jonathan, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> From what I saw, eventually, you learn about all >>>>>>>>>>>>>>>>>> the nodes >>>>>>>>>>>>>>>>>> in >>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>> network. Every 10 seconds, Pyxida picks a random >>>>>>>>>>>>>>>>>> "Gossip" >>>>>>>>>>>>>>>>>> node from >>>>>>>>>>>>>>>>>> its >>>>>>>>>>>>>>>>>> list and sends a request. The Gossip node will >>>>>>>>>>>>>>>>>> reply back >>>>>>>>>>>>>>>>>> with an >>>>>>>>>>>>>>>>>> "upNeighbor", that is, a "Gossip" node that is "up" >>>>>>>>>>>>>>>>>> or >>>>>>>>>>>>>>>>>> with 10% >>>>>>>>>>>>>>>>>> probability, a node that is "pending". 
Since >>>>>>>>>>>>>>>>>> naturally, >>>>>>>>>>>>>>>>>> all >>>>>>>>>>>>>>>>>> the nodes >>>>>>>>>>>>>>>>>> bootstrap off the same set of nodes, every node will >>>>>>>>>>>>>>>>>> eventually know >>>>>>>>>>>>>>>>>> of >>>>>>>>>>>>>>>>>> every other node. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Pyxida might not be rotating through everybody >>>>>>>>>>>>>>>>>> continuously >>>>>>>>>>>>>>>>>> (although >>>>>>>>>>>>>>>>>> eventually, since it knows everybody, it will have >>>>>>>>>>>>>>>>>> added >>>>>>>>>>>>>>>>>> everybody to >>>>>>>>>>>>>>>>>> its Neighbors list at some point), but could be >>>>>>>>>>>>>>>>>> removing a >>>>>>>>>>>>>>>>>> node that >>>>>>>>>>>>>>>>>> is >>>>>>>>>>>>>>>>>> "alive". So, if MAX_NEIGHBORS = 32, let's call them >>>>>>>>>>>>>>>>>> A_1, >>>>>>>>>>>>>>>>>> A_2, >>>>>>>>>>>>>>>>>> ..., >>>>>>>>>>>>>>>>>> A_32. >>>>>>>>>>>>>>>>>> Pyxida picks a random Gossip node B, sends it a >>>>>>>>>>>>>>>>>> request >>>>>>>>>>>>>>>>>> and B >>>>>>>>>>>>>>>>>> replies >>>>>>>>>>>>>>>>>> back. Pyxida will add B to its Neighbors list, see >>>>>>>>>>>>>>>>>> that >>>>>>>>>>>>>>>>>> size of >>>>>>>>>>>>>>>>>> Neighbors > MAX_NEIGHBORS, kick A_1 out. Then from >>>>>>>>>>>>>>>>>> those >>>>>>>>>>>>>>>>>> 32 >>>>>>>>>>>>>>>>>> neighbors >>>>>>>>>>>>>>>>>> A_2, ...., A_32, B, it will calculate the force, >>>>>>>>>>>>>>>>>> and apply >>>>>>>>>>>>>>>>>> it >>>>>>>>>>>>>>>>>> to the >>>>>>>>>>>>>>>>>> current system coordinates. Thus, if every 10 >>>>>>>>>>>>>>>>>> seconds, >>>>>>>>>>>>>>>>>> Pyxida >>>>>>>>>>>>>>>>>> picks a >>>>>>>>>>>>>>>>>> different set of nodes C, D, E, etc... it would keep >>>>>>>>>>>>>>>>>> kicking >>>>>>>>>>>>>>>>>> nodes out >>>>>>>>>>>>>>>>>> of the neighbors list. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> It's the "kicking A_1 out" part that seemed a bit >>>>>>>>>>>>>>>>>> confusing. >>>>>>>>>>>>>>>>>> Churn >>>>>>>>>>>>>>>>>> is a >>>>>>>>>>>>>>>>>> good reason but you don't check that A_1 might >>>>>>>>>>>>>>>>>> still be >>>>>>>>>>>>>>>>>> alive >>>>>>>>>>>>>>>>>> or maybe >>>>>>>>>>>>>>>>>> A_1 was just added. I have been printing the >>>>>>>>>>>>>>>>>> neighbors >>>>>>>>>>>>>>>>>> list >>>>>>>>>>>>>>>>>> every 10 >>>>>>>>>>>>>>>>>> minutes, and they are completely different between >>>>>>>>>>>>>>>>>> those >>>>>>>>>>>>>>>>>> 10 >>>>>>>>>>>>>>>>>> minutes. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> One possible problem might be that I am choosing >>>>>>>>>>>>>>>>>> the value >>>>>>>>>>>>>>>>>> of >>>>>>>>>>>>>>>>>> MAX_NEIGHBORS to be too small. Originally it was >>>>>>>>>>>>>>>>>> set to >>>>>>>>>>>>>>>>>> 512 >>>>>>>>>>>>>>>>>> (which was >>>>>>>>>>>>>>>>>> meant for Azureus). Since I am using Planetlab >>>>>>>>>>>>>>>>>> where there >>>>>>>>>>>>>>>>>> are about >>>>>>>>>>>>>>>>>> 400-500 "stable" nodes, I changed the value to 32. >>>>>>>>>>>>>>>>>> But I >>>>>>>>>>>>>>>>>> don't think >>>>>>>>>>>>>>>>>> that should be a problem since log(n) = log(512) = 9. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> -Eric >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Jonathan Ledlie wrote: >>>>>>>>>>>>>>>>>>> Eric, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Glad to hear it. Let Peter and me know if you >>>>>>>>>>>>>>>>>>> publish >>>>>>>>>>>>>>>>>>> anything using >>>>>>>>>>>>>>>>>>> Pyxida and we can put it on the web page. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "Gossip" nodes are for learning about the network. >>>>>>>>>>>>>>>>>>> "Neighbors" are >>>>>>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>>>>> constructing the coordinate. 
>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> So, IIRC, the gossip process learns about nodes that >>>>>>>>>>>>>>>>>>> may or >>>>>>>>>>>>>>>>>>> may not >>>>>>>>>>>>>>>>>>> be >>>>>>>>>>>>>>>>>>> applied to the neighbor list. That is, we may >>>>>>>>>>>>>>>>>>> eventually >>>>>>>>>>>>>>>>>>> learn >>>>>>>>>>>>>>>>>>> about a >>>>>>>>>>>>>>>>>>> large portion of the network, via gossiping, but >>>>>>>>>>>>>>>>>>> should >>>>>>>>>>>>>>>>>>> only >>>>>>>>>>>>>>>>>>> be using >>>>>>>>>>>>>>>>>>> MAX_NEIGHBORS at any given time for coordinate >>>>>>>>>>>>>>>>>>> computation. >>>>>>>>>>>>>>>>>>> The >>>>>>>>>>>>>>>>>>> reason >>>>>>>>>>>>>>>>>>> we don't want to use a fixed set of neighbors for >>>>>>>>>>>>>>>>>>> coordinate >>>>>>>>>>>>>>>>>>> computation >>>>>>>>>>>>>>>>>>> is because nodes may come and go from the system. >>>>>>>>>>>>>>>>>>> However, >>>>>>>>>>>>>>>>>>> it should >>>>>>>>>>>>>>>>>>> not be the case that we just rotate through >>>>>>>>>>>>>>>>>>> everybody, >>>>>>>>>>>>>>>>>>> because >>>>>>>>>>>>>>>>>>> then, as >>>>>>>>>>>>>>>>>>> you note, we'd be computing against everyone's >>>>>>>>>>>>>>>>>>> coordinate/ >>>>>>>>>>>>>>>>>>> latency, >>>>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>> this is counter to the point of network coordinates. >>>>>>>>>>>>>>>>>>> With this >>>>>>>>>>>>>>>>>>> gradual >>>>>>>>>>>>>>>>>>> transience as a desirable quality in mind, I am >>>>>>>>>>>>>>>>>>> pretty >>>>>>>>>>>>>>>>>>> sure we >>>>>>>>>>>>>>>>>>> maintain >>>>>>>>>>>>>>>>>>> neighbors in a fairly fixed manner. Does this not >>>>>>>>>>>>>>>>>>> seem >>>>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>> be the >>>>>>>>>>>>>>>>>>> case >>>>>>>>>>>>>>>>>>> to you? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> -jonathan >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Oct 29, 2008, at 9:20 PM, Eric Chan-Tin wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Hi Jonathan and Peter, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> First of all, just wanted to say that Pyxida has >>>>>>>>>>>>>>>>>>>> been >>>>>>>>>>>>>>>>>>>> working >>>>>>>>>>>>>>>>>>>> great on >>>>>>>>>>>>>>>>>>>> Planetlab for the past 4 months or so. Getting >>>>>>>>>>>>>>>>>>>> good data >>>>>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>>> experiments >>>>>>>>>>>>>>>>>>>> going well. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> However, I do have one question/comment. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Every Pyxida node contains a list of Neighbors (in >>>>>>>>>>>>>>>>>>>> NCClient) and a >>>>>>>>>>>>>>>>>>>> list >>>>>>>>>>>>>>>>>>>> of Gossip nodes (they are called neighbors in >>>>>>>>>>>>>>>>>>>> NCManager >>>>>>>>>>>>>>>>>>>> but >>>>>>>>>>>>>>>>>>>> I will >>>>>>>>>>>>>>>>>>>> call >>>>>>>>>>>>>>>>>>>> them gossip for now so as not to confuse with the >>>>>>>>>>>>>>>>>>>> 'real' >>>>>>>>>>>>>>>>>>>> neighbors >>>>>>>>>>>>>>>>>>>> in >>>>>>>>>>>>>>>>>>>> NCClient). Every 10 seconds, Pyxida will pick a >>>>>>>>>>>>>>>>>>>> random >>>>>>>>>>>>>>>>>>>> node >>>>>>>>>>>>>>>>>>>> (call >>>>>>>>>>>>>>>>>>>> it A) >>>>>>>>>>>>>>>>>>>> from Gossip and send a gossip request (basically >>>>>>>>>>>>>>>>>>>> asking >>>>>>>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>>>>>> rtt, >>>>>>>>>>>>>>>>>>>> coordinate, error). But Gossip contains all the >>>>>>>>>>>>>>>>>>>> nodes in >>>>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>> network. >>>>>>>>>>>>>>>>>>>> The comment in the code said that was to avoid >>>>>>>>>>>>>>>>>>>> Pyxida >>>>>>>>>>>>>>>>>>>> becoming >>>>>>>>>>>>>>>>>>>> lonely >>>>>>>>>>>>>>>>>>>> and no-one to gossip with, which makes sense. 
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Once A responds, Pyxida will add A to its >>>>>>>>>>>>>>>>>>>> Neighbors (if >>>>>>>>>>>>>>>>>>>> Neighbors >>>>>>>>>>>>>>>>>>>> does >>>>>>>>>>>>>>>>>>>> not contain A) and remove the first element from >>>>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>> Neighbors list. >>>>>>>>>>>>>>>>>>>> This seems a little bit odd. Was there any >>>>>>>>>>>>>>>>>>>> particular >>>>>>>>>>>>>>>>>>>> reasoning >>>>>>>>>>>>>>>>>>>> behind >>>>>>>>>>>>>>>>>>>> this? Because it would seem like you're getting >>>>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>> coordinate from >>>>>>>>>>>>>>>>>>>> every node in the network eventually. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Thanks a lot, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> -Eric >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Jonathan Ledlie wrote: >>>>>>>>>>>>>>>>>>>>> Your inclination is correct. See forwarded >>>>>>>>>>>>>>>>>>>>> message. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> -jonathan >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Apr 9, 2008, at 1:53 PM, ext Eric Chan-Tin >>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Thank you for the clarification. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> I've been reading through your published papers >>>>>>>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>>>>> code >>>>>>>>>>>>>>>>>>>>>> and I >>>>>>>>>>>>>>>>>>>>>> must >>>>>>>>>>>>>>>>>>>>>> say I learned a lot about how Pyxida works :) >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> I still have a quick and easy (hopefully) >>>>>>>>>>>>>>>>>>>>>> question. >>>>>>>>>>>>>>>>>>>>>> Each >>>>>>>>>>>>>>>>>>>>>> Pyxida >>>>>>>>>>>>>>>>>>>>>> node >>>>>>>>>>>>>>>>>>>>>> has a small amount of neighbors and learn about >>>>>>>>>>>>>>>>>>>>>> other >>>>>>>>>>>>>>>>>>>>>> nodes >>>>>>>>>>>>>>>>>>>>>> through >>>>>>>>>>>>>>>>>>>>>> other nodes. I am running on about 200 >>>>>>>>>>>>>>>>>>>>>> machines, and >>>>>>>>>>>>>>>>>>>>>> it >>>>>>>>>>>>>>>>>>>>>> seems >>>>>>>>>>>>>>>>>>>>>> like all >>>>>>>>>>>>>>>>>>>>>> the Pyxida nodes are adding each other to their >>>>>>>>>>>>>>>>>>>>>> upNeighbors set. >>>>>>>>>>>>>>>>>>>>>> NCClient.java has the MAX_NEIGHBORS set to 512. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> I wanted to make sure that I am not misreading >>>>>>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>>> code, but >>>>>>>>>>>>>>>>>>>>>> shouldn't >>>>>>>>>>>>>>>>>>>>>> each Pyxida node keep a small number of >>>>>>>>>>>>>>>>>>>>>> neighbors in >>>>>>>>>>>>>>>>>>>>>> its >>>>>>>>>>>>>>>>>>>>>> upNeighbors >>>>>>>>>>>>>>>>>>>>>> set. Else, it will be a lot of updates to send a >>>>>>>>>>>>>>>>>>>>>> gossipMessage >>>>>>>>>>>>>>>>>>>>>> to all >>>>>>>>>>>>>>>>>>>>>> 200 neighbors, and it also means that a Pyxida >>>>>>>>>>>>>>>>>>>>>> node >>>>>>>>>>>>>>>>>>>>>> has >>>>>>>>>>>>>>>>>>>>>> the whole >>>>>>>>>>>>>>>>>>>>>> network in its upNeighbors set - that seems >>>>>>>>>>>>>>>>>>>>>> wrong. I >>>>>>>>>>>>>>>>>>>>>> am >>>>>>>>>>>>>>>>>>>>>> inclined >>>>>>>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>>>>> change the value of MAX_NEIGHBORS but wanted to >>>>>>>>>>>>>>>>>>>>>> make >>>>>>>>>>>>>>>>>>>>>> sure >>>>>>>>>>>>>>>>>>>>>> that my >>>>>>>>>>>>>>>>>>>>>> understanding is correct first. 
>>>>>>>>>>>>>>>>>>>>>> Thank you,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Eric
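For reference, here is a minimal, self-contained sketch of the two fixes discussed in the quoted thread above: removing stale neighbors through an Iterator, which Peter points out is required when deleting while iterating, and Eric's age-weighted random choice of a neighbor to ping, with weight (currentTime - lastUpdateTime) / 15 minutes. This is not the actual Pyxida source; the class shape, field names, and helper structure are simplified assumptions, with the 10-minute maintenance cut-off and 15-minute weight window taken from the thread.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

// Reduced stand-in for Pyxida's RemoteState: only the accessor used here.
class RemoteState<T> {
    private final T address;
    private long lastUpdateTime;

    RemoteState(T address, long lastUpdateTime) {
        this.address = address;
        this.lastUpdateTime = lastUpdateTime;
    }

    long getLastUpdateTime() { return lastUpdateTime; }
    T getAddress() { return address; }
}

class NeighborMaintenanceSketch<T> {
    // Values mentioned in the thread: drop neighbors not updated for 10 minutes,
    // and normalize weights against a 15-minute window.
    private static final long MAINTENANCE_PERIOD = 10 * 60 * 1000L;
    private static final double WEIGHT_WINDOW_MS = 15 * 60 * 1000.0;

    private final List<RemoteState<T>> neighbors = new ArrayList<>();
    private final Random random = new Random();

    // Peter's fix: removal during iteration must go through the Iterator,
    // not through neighbors.remove() inside a for-each loop, otherwise the
    // loop throws ConcurrentModificationException.
    void performMaintenance(long currTime) {
        Iterator<RemoteState<T>> it = neighbors.iterator();
        while (it.hasNext()) {
            RemoteState<T> neighbor = it.next();
            if (neighbor.getLastUpdateTime() < (currTime - MAINTENANCE_PERIOD)) {
                it.remove();  // safe removal via the Iterator
            }
        }
    }

    // Eric's proposal: pick a neighbor with probability proportional to its age,
    // weight = (currTime - lastUpdateTime) / 15 minutes, so recently updated
    // neighbors are unlikely to be pinged again right away.
    RemoteState<T> getNeighborToPing(long currTime) {
        if (neighbors.isEmpty()) {
            return null;
        }
        double totalWeight = 0.0;
        for (RemoteState<T> neighbor : neighbors) {
            totalWeight += (currTime - neighbor.getLastUpdateTime()) / WEIGHT_WINDOW_MS;
        }
        if (totalWeight <= 0.0) {
            // All neighbors were updated at currTime; fall back to a uniform pick.
            return neighbors.get(random.nextInt(neighbors.size()));
        }
        // Roulette-wheel selection over the accumulated weights.
        double target = random.nextDouble() * totalWeight;
        double cumulative = 0.0;
        for (RemoteState<T> neighbor : neighbors) {
            cumulative += (currTime - neighbor.getLastUpdateTime()) / WEIGHT_WINDOW_MS;
            if (cumulative >= target) {
                return neighbor;
            }
        }
        return neighbors.get(neighbors.size() - 1);  // numerical safety net
    }
}
```

The weighted pick keeps long-unpinged neighbors likely to be refreshed before the 10-minute maintenance cut-off removes them, which is the churn that the neighbor-stability percentages reported above are measuring.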
From: Jonathan L. <jon...@ee...> - 2008-12-19 01:29:49
http://java.sun.com/javase/downloads/index_jdk5.jsp "J2SE 5.0 is in its Java Technology End of Life (EOL) transition period." :-) On Dec 18, 2008, at 8:24 PM, Eric Chan-Tin wrote: > jdk-1.5.0_09 > > Jonathan Ledlie wrote: >> >> Eric, >> >> I'm getting an error (an old favorite): >> WA 1229649353501 NetUtil : Could not deserialize java >> object: >> java.io.StreamCorruptedException: invalid type code: CC >> WA 1229649353501 ObjComm$ReadConnHa: No valid msgObject received from >> remoteAddr=planet1.scs.stanford.edu:26500. Ignoring. >> >> What java version are you running on PL? >> >> -jonathan >> >> On Dec 18, 2008, at 4:49 PM, Eric Chan-Tin wrote: >> >>> Do you mean the Pyxida code running on PlanetLab? I killed them >>> after >>> running it overnight. I am currently restarting the experiments. Try >>> again >>> in about 15 minutes and it should be ready. >>> >>> Fix1 is on port 28500 (and api.port 28501). Fix2 is on port 26500 >>> (api.port >>> 26501). The node you can contact is "planet1.scs.stanford.edu" or >>> "planetlab01.cs.washington.edu". >>> >>> About your other email: Yes I know. The fix was to go through the >>> list >>> again, but I didn't want to implement that since it seems to be >>> working >>> fine. So go through the list once to calculate weight, and add >>> neighbors >>> into another list (sorted by weights). Then go through that new >>> list. The >>> new list will be sorted by higher weights, so older neighbors >>> will be >>> "tested" first. If you want it that way, I can do that. However, >>> I do not >>> think we need to. Let's assume iteration 0, the first node gets >>> picked. At >>> iteration 1, that first node will have a very small probability of >>> getting >>> picked again. >>> >>> -Eric >>> >>> >>> On Dec 18 2008, Jonathan Ledlie wrote: >>> >>>> >>>> Eric, >>>> >>>> Do you have a PL Pyxida slice that I can connect to to try this >>>> out? >>>> >>>> That is, I'll run it on my machine, but connect to your slice >>>> via the >>>> right ports. >>>> >>>> -jonathan >>>> >>>> On Dec 18, 2008, at 11:35 AM, Eric Chan-Tin wrote: >>>> >>>>> Sorry for the delay. Was busy with final exams and projects. >>>>> >>>>> I made the changes. They can be found at >>>>> >>>>> http://www-users.cs.umn.edu/~dchantin/NCClient.java >>>>> http://www-users.cs.umn.edu/~dchantin/NCManager.java >>>>> http://www-users.cs.umn.edu/~dchantin/RemoteState.java >>>>> >>>>> All the changes are enclosed within >>>>> //ERIC START >>>>> ... >>>>> //ERIC END >>>>> >>>>> I compared every reported "neighbors list" of each Pyxida node >>>>> (about 370 >>>>> total) and see how many neighbors did not change within 10 >>>>> minutes. >>>>> Then >>>>> did the average for all the nodes. >>>>> >>>>> Original: 43.37% of the neighbors stayed the same on average. >>>>> >>>>> Fix0 (before we found the interesting situation): 74.73% >>>>> >>>>> Fix1 (with the weight probability of picking a neighbor to ping): >>>>> 95.79% >>>>> >>>>> Fix2 (with the weight probability and the MIN_UPDATE_PING): 98.81% >>>>> >>>>> Also, the reported error, rrl, etc.. were much lower in the Fixed >>>>> Pyxida >>>>> than in the original one (this could not be significant, since the >>>>> Original >>>>> experiment was performed 1-2 months ago), but as expected, it >>>>> was not >>>>> higher. >>>>> >>>>> You can take a look at the code and test it if you want. If >>>>> everything >>>>> looks fine, I'll remove some of the debugging code and other >>>>> "useless" >>>>> comments (like //ERIC). Also, let me know which "fix" you >>>>> prefer. 
I >>>>> will >>>>> then email you the updated code. >>>>> >>>>> Thanks, >>>>> >>>>> -Eric >>>>> >>>>> >>>>> On Dec 10 2008, Jonathan Ledlie wrote: >>>>> >>>>>> >>>>>> Seems reasonable. >>>>>> >>>>>> Both ideas (lastPingTime and a weighted probability based on age) >>>>>> seem >>>>>> to be independent. >>>>>> >>>>>> Some of the code is more CPU intensive than it should be (e.g., >>>>>> looping through potentially long lists), and that is a slight >>>>>> downside >>>>>> to the function approach vs a set, but this probably isn't a big >>>>>> deal. >>>>>> >>>>>> -jonathan >>>>>> >>>>>> On Dec 10, 2008, at 10:40 PM, Eric Chan-Tin wrote: >>>>>> >>>>>>> okay. I think I might have misunderstood you. Your idea of >>>>>>> lastPingTime >>>>>>> looks interesting. >>>>>>> >>>>>>> My idea is to introduce a probability function. Right now, it >>>>>>> just >>>>>>> randomly picks one of the neighbors. What I am thinking it might >>>>>>> do is >>>>>>> that instead of just randomly picking a neighbor, pick a >>>>>>> neighbor >>>>>>> based >>>>>>> on some function. So, the neighbor whose lastUpdateTime is >>>>>>> oldest >>>>>>> will >>>>>>> have a higher chance of being picked than a neighbor who was >>>>>>> just >>>>>>> updated. >>>>>>> >>>>>>> Does that make more sense? >>>>>>> >>>>>>> -Eric >>>>>>> >>>>>>> Jonathan Ledlie wrote: >>>>>>>> >>>>>>>> Hmm. The advantage of lastPingTime is that you learn >>>>>>>> whether or >>>>>>>> not >>>>>>>> you've tried to contact a neighbor and they haven't responded. >>>>>>>> Then you >>>>>>>> can use that knowledge to determine whether or not to toss >>>>>>>> them. >>>>>>>> >>>>>>>> Maybe I'm not understanding your suggestion then. >>>>>>>> >>>>>>>> -jonathan >>>>>>>> >>>>>>>> On Dec 10, 2008, at 12:51 PM, Eric Chan-Tin wrote: >>>>>>>> >>>>>>>>> That's a good idea. Here is a little twist on it. >>>>>>>>> >>>>>>>>> Put a "weight" on each neighbor, based on the >>>>>>>>> lastUpdateTime. So >>>>>>>>> if >>>>>>>>> currentTime - neighbor's lastUpdateTime, then the weight is >>>>>>>>> higher. >>>>>>>>> Then in getNeighborToPing, neighbor's with higher weight >>>>>>>>> have a >>>>>>>>> higher >>>>>>>>> probability of getting picked than neighbor's with lower >>>>>>>>> weight. >>>>>>>>> >>>>>>>>> The weight could be ( (currentTime - lastUpdateTime) / 15 >>>>>>>>> minutes). >>>>>>>>> It's not 10 minutes so as to avoid a probability of 1 if the >>>>>>>>> lastUpdateTime was the previous performMaintenance. This >>>>>>>>> should >>>>>>>>> also >>>>>>>>> avoid the problem of always pinging the oldest neighbor >>>>>>>>> since it's >>>>>>>>> about probability (weight). >>>>>>>>> >>>>>>>>> It seems a little bit less complicated than having >>>>>>>>> lastPingTime >>>>>>>>> and >>>>>>>>> MIN_UPDATE_TIME. >>>>>>>>> >>>>>>>>> -Eric >>>>>>>>> >>>>>>>>> On Dec 10 2008, Jonathan Ledlie wrote: >>>>>>>>> >>>>>>>>>> >>>>>>>>>> This is an interesting situation. Let's think about the >>>>>>>>>> goal. >>>>>>>>>> The >>>>>>>>>> goal is to keep pinging the same nodes as long as they are >>>>>>>>>> alive, >>>>>>>>>> and to replace a node soon after it dies, keeping the total >>>>>>>>>> number >>>>>>>>>> of up neighbors at approximately MAX_NEIGHBORS. >>>>>>>>>> >>>>>>>>>> How about the following: >>>>>>>>>> - when you pick a neighbor to ping, you set a time on that >>>>>>>>>> event >>>>>>>>>> (in >>>>>>>>>> a new per-neighbor timestamp LastPingTime) >>>>>>>>>> - when you first insert a neighbor, set this timestamp to 0. 
>>>>>>>>>> - when you loop in performMaintenance, remove nodes whose >>>>>>>>>> LastPingTime - LastUpdateTime > threshold (like 10 minutes) >>>>>>>>>> >>>>>>>>>> - when picking a node, pick the guy with the oldest >>>>>>>>>> LastPingTime >>>>>>>>>> (not sure about this one -- I don't like situations where >>>>>>>>>> you >>>>>>>>>> pick >>>>>>>>>> the absolute *oldest* because then we could get in a >>>>>>>>>> situation >>>>>>>>>> where >>>>>>>>>> an old guy sits there, waiting to be kicked out, and we keep >>>>>>>>>> picking >>>>>>>>>> him to be pinged; I'd prefer to pick randomly, or random >>>>>>>>>> older >>>>>>>>>> guys). >>>>>>>>>> - a simple way around this is to simply have a self-adjusting >>>>>>>>>> threshold: so, we walk through the nodes, and everyone whose >>>>>>>>>> LastPingTime is older than MIN_UPDATE_TIME_TO_PING ago (a >>>>>>>>>> relative >>>>>>>>>> time) get added to a set; we pick a guy randomly from the >>>>>>>>>> set to >>>>>>>>>> ping. If the set is empty, we increase >>>>>>>>>> MIN_UPDATE_TIME_TO_PING, >>>>>>>>>> and >>>>>>>>>> wait for the next time the function is called. If the set >>>>>>>>>> is too >>>>>>>>>> large (relative to a fraction of MAX_NEIGHBORS), we >>>>>>>>>> decrease it. >>>>>>>>>> >>>>>>>>>> We still might get some churn from lost UDP packets. You >>>>>>>>>> might >>>>>>>>>> also >>>>>>>>>> give nodes a second chance to be pinged by including >>>>>>>>>> LastPingTime - >>>>>>>>>> LastUpdateTime > 1 minute in the set. Note that increases >>>>>>>>>> and >>>>>>>>>> decreases in MIN_UPDATE_TIME_TO_PING shouldn't be affected >>>>>>>>>> by the >>>>>>>>>> number of nodes that fall into this category. >>>>>>>>>> >>>>>>>>>> We still probably need a way for new potential neighbors >>>>>>>>>> to be >>>>>>>>>> rejected if our current set of neighbors is fine. Because I >>>>>>>>>> still >>>>>>>>>> want addNeigbor to really add the neighbor, how about >>>>>>>>>> putting a >>>>>>>>>> check before calling that to see if we need more neighbors. >>>>>>>>>> >>>>>>>>>> What thinks? >>>>>>>>>> >>>>>>>>>> -jonathan >>>>>>>>>> >>>>>>>>>> On Dec 9, 2008, at 11:37 PM, ext Eric Chan-Tin wrote: >>>>>>>>>> >>>>>>>>>>> Peter, >>>>>>>>>>> >>>>>>>>>>> Thanks. That fixed it. >>>>>>>>>>> >>>>>>>>>>> However, here is another problem I found while deploying >>>>>>>>>>> Pyxida >>>>>>>>>>> and >>>>>>>>>>> running them for a while. >>>>>>>>>>> >>>>>>>>>>> In performMaintenance(), I loop through every neighbor >>>>>>>>>>> and if >>>>>>>>>>> (neighbor.getLastUpdateTime() < current_time - 10 minutes), >>>>>>>>>>> then I >>>>>>>>>>> remove that neighbor. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> In NCManager, I call getNeighborToPing, which randomly >>>>>>>>>>> picks one >>>>>>>>>>> of >>>>>>>>>>> the >>>>>>>>>>> neighbors (32 in this case). Pinging is performed every 10 >>>>>>>>>>> seconds, so >>>>>>>>>>> in 10 minutes, you can ping 60 nodes. However, if a >>>>>>>>>>> neighbor A >>>>>>>>>>> is not >>>>>>>>>>> among those 60 nodes, it will get kicked out in >>>>>>>>>>> performMaintenance(), >>>>>>>>>>> although A is still alive (it just hasn't been contacted >>>>>>>>>>> yet). >>>>>>>>>>> >>>>>>>>>>> There are some possible options. >>>>>>>>>>> >>>>>>>>>>> 1) Increase the timeout period from 10 minutes to ?? >>>>>>>>>>> minutes. >>>>>>>>>>> This >>>>>>>>>>> doesn't quite work because say in Azureus, the number of >>>>>>>>>>> neighbors is >>>>>>>>>>> 512, so the timeout needs to be increased even more... 
It >>>>>>>>>>> could >>>>>>>>>>> be a >>>>>>>>>>> function of MAX_NEIGHBORS... >>>>>>>>>>> >>>>>>>>>>> 2) When Pyxida replies to a ping request, it will update the >>>>>>>>>>> neighbor's >>>>>>>>>>> lastUpdateTime. This does not quite work either because it >>>>>>>>>>> assumes a >>>>>>>>>>> symmetric relationship which is most likely not the case. >>>>>>>>>>> >>>>>>>>>>> 3) Introduce some more logic to Pyxida so it remembers what >>>>>>>>>>> nodes it >>>>>>>>>>> contacted and does not send, say, more than 2 requests in 10 >>>>>>>>>>> minutes to >>>>>>>>>>> the same node. This solution isn't that great either. >>>>>>>>>>> >>>>>>>>>>> Let me know what you guys think. I am currently leaning >>>>>>>>>>> towards >>>>>>>>>>> option >>>>>>>>>>> 1) but make the timeout period a function of MAX_NEIGHBORS, >>>>>>>>>>> although I >>>>>>>>>>> haven't quite found the perfect function yet. >>>>>>>>>>> >>>>>>>>>>> Here are some preliminary results. I compared the number of >>>>>>>>>>> neighbors >>>>>>>>>>> which are the same at each reported time (10 minutes). >>>>>>>>>>> Before >>>>>>>>>>> the fix, >>>>>>>>>>> more than 50% of the neighbors changed within 10 minutes. >>>>>>>>>>> After >>>>>>>>>>> the >>>>>>>>>>> fix, >>>>>>>>>>> only about 30% of the neighbors changed within 10 >>>>>>>>>>> minutes. It is >>>>>>>>>>> more >>>>>>>>>>> stable but I think 30% is still quite a big number. >>>>>>>>>>> >>>>>>>>>>> Sorry for the long email :) >>>>>>>>>>> >>>>>>>>>>> -Eric >>>>>>>>>>> >>>>>>>>>>> Peter Pietzuch wrote: >>>>>>>>>>>> Hi, >>>>>>>>>>>> >>>>>>>>>>>> Eric Chan-Tin wrote: >>>>>>>>>>>>> for (RemoteState<T> neighbor : neighbors) { //this is line >>>>>>>>>>>>> 1029 >>>>>>>>>>>>> if (neighbor.getLastUpdateTime() < (curr_time - >>>>>>>>>>>>> MAINTENANCE_PERIOD)) { >>>>>>>>>>>>> neighbors.remove(neighbor); >>>>>>>>>>>>> } >>>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> IIRC you can't modify a collection while iterating over it >>>>>>>>>>>> using the >>>>>>>>>>>> above for loop. Instead you have to use a proper >>>>>>>>>>>> Iterator<RemoteState<T>> and use it's remove method. >>>>>>>>>>>> >>>>>>>>>>>> Cheers, >>>>>>>>>>>> Peter >>>>>>>>>>>> >>>>>>>>>>>>> Jonathan Ledlie wrote: >>>>>>>>>>>>>> On Nov 3, 2008, at 3:46 PM, Eric Chan-Tin wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Jonathan, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I can insert some code to fix this and it doesn't >>>>>>>>>>>>>>> look too >>>>>>>>>>>>>>> hard >>>>>>>>>>>>>>> but I do >>>>>>>>>>>>>>> have some concerns. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I don't remember observing this behavior in our PL >>>>>>>>>>>>>>>> deployment, but that could have been because we had the >>>>>>>>>>>>>>>> ping >>>>>>>>>>>>>>>> time set >>>>>>>>>>>>>>>> to 1-2 minutes, IIRC. >>>>>>>>>>>>>>> The default ping time was set to 10 seconds + some >>>>>>>>>>>>>>> Gaussian >>>>>>>>>>>>>>> distribution, >>>>>>>>>>>>>>> so that's what I used, but changing the ping time to 1 >>>>>>>>>>>>>>> minute >>>>>>>>>>>>>>> or more >>>>>>>>>>>>>>> will >>>>>>>>>>>>>>> result in the same effect except it will take longer >>>>>>>>>>>>>>> for the >>>>>>>>>>>>>>> "cycle" to >>>>>>>>>>>>>>> occur. 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> What I have in mind was the following: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 1) In NCClient.addNeighbor() would be changed to (in >>>>>>>>>>>>>>> pseudocode) >>>>>>>>>>>>>>> protected boolean addNeighbor(RemoteState<T> guy) { >>>>>>>>>>>>>>> boolean added = false; >>>>>>>>>>>>>>> //if neighbors.size() >= MAX_NEIGHBORS >>>>>>>>>>>>>>> //Check last update time of each >>>>>>>>>>>>>>> neighbor in >>>>>>>>>>>>>>> neighbors >>>>>>>>>>>>>>> //neighbor.getLastUpdateTime() >>>>>>>>>>>>>>> //if any of them is older than some number >>>>>>>>>>>>>>> TIMEOUT, >>>>>>>>>>>>>>> remove the oldest one, then add "guy" to neighbors >>>>>>>>>>>>>>> //else don't add "guy" to neighbors >>>>>>>>>>>>>>> //else add "guy" to neighbors >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> >>>>>>>>>>>>>> IMO this shouldn't be the behavior of this function >>>>>>>>>>>>>> because >>>>>>>>>>>>>> it >>>>>>>>>>>>>> is called >>>>>>>>>>>>>> "addNeighbor." Instead what could happen is that part >>>>>>>>>>>>>> of the >>>>>>>>>>>>>> performMaintenance() function could remove old neighbors. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> 2) In NCManager, getGossipNode() would just pick a >>>>>>>>>>>>>>> random >>>>>>>>>>>>>>> node >>>>>>>>>>>>>>> from >>>>>>>>>>>>>>> localNC.primaryNC.getNeighbors() to send a request to >>>>>>>>>>>>>>> every 10 >>>>>>>>>>>>>>> seconds. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 3) Pyxida will still keep track of all GossipNodes in >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> network. It >>>>>>>>>>>>>>> could >>>>>>>>>>>>>>> also (say every 10 minutes) check its NCClient.neighbors >>>>>>>>>>>>>>> list >>>>>>>>>>>>>>> and see if >>>>>>>>>>>>>>> any neighbor is older than TIMEOUT, remove it, then >>>>>>>>>>>>>>> pick a >>>>>>>>>>>>>>> random >>>>>>>>>>>>>>> GossipNode to send a ping request. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The main problem with this scheme is that one Pyxida >>>>>>>>>>>>>>> node >>>>>>>>>>>>>>> would >>>>>>>>>>>>>>> most >>>>>>>>>>>>>>> likely >>>>>>>>>>>>>>> stay with the same set of neighbors all the time. Its >>>>>>>>>>>>>>> coordinates >>>>>>>>>>>>>>> would be >>>>>>>>>>>>>>> greatly affected by those 32 nodes (I am assuming >>>>>>>>>>>>>>> MAX_NEIGHBORS >>>>>>>>>>>>>>> = 32). >>>>>>>>>>>>>>> >>>>>>>>>>>>>> Instead of (2), why not have the getNeighborToPing >>>>>>>>>>>>>> function >>>>>>>>>>>>>> determine >>>>>>>>>>>>>> the behavior? If it returns a node, then ping it. If it >>>>>>>>>>>>>> returns null, >>>>>>>>>>>>>> then have the NCManager pick some other node to use? If >>>>>>>>>>>>>> things are >>>>>>>>>>>>>> hunky-dory, getNeighborToPing() would return a node (a >>>>>>>>>>>>>> current >>>>>>>>>>>>>> member of >>>>>>>>>>>>>> the neighbor set), if we're short on neighbors, then >>>>>>>>>>>>>> it would >>>>>>>>>>>>>> return >>>>>>>>>>>>>> null. The periodic maintenance would have already >>>>>>>>>>>>>> tossed old >>>>>>>>>>>>>> nodes, so >>>>>>>>>>>>>> this would be self-sustaining. >>>>>>>>>>>>>> >>>>>>>>>>>>>> The "problem" you mention is fairly central to network >>>>>>>>>>>>>> coordinates in >>>>>>>>>>>>>> general. Sure the coordinates will be a little better >>>>>>>>>>>>>> the >>>>>>>>>>>>>> more >>>>>>>>>>>>>> neighbors you have. But in a very large system >>>>>>>>>>>>>> talking to >>>>>>>>>>>>>> everybody >>>>>>>>>>>>>> isn't feasible. 
Peter and I have a graph in one of the >>>>>>>>>>>>>> workshop >>>>>>>>>>>>>> papers >>>>>>>>>>>>>> which shows the effect of increasing the number of >>>>>>>>>>>>>> neighbors >>>>>>>>>>>>>> >>>>>>>>>>>>>>> I don't think the network would be partitioned, but >>>>>>>>>>>>>>> it could >>>>>>>>>>>>>>> happen. For >>>>>>>>>>>>>>> example, the boostrap nodes have each other in their >>>>>>>>>>>>>>> neighbors >>>>>>>>>>>>>>> list. >>>>>>>>>>>>>>> Every >>>>>>>>>>>>>>> new node bootstrapping to the network would contact >>>>>>>>>>>>>>> those >>>>>>>>>>>>>>> bootstrapping >>>>>>>>>>>>>>> nodes first and add them to their neighbors. So every >>>>>>>>>>>>>>> Pyxida >>>>>>>>>>>>>>> node could >>>>>>>>>>>>>>> have as its neighbors the "Bootstrap nodes", which would >>>>>>>>>>>>>>> make >>>>>>>>>>>>>>> Pyxida >>>>>>>>>>>>>>> become >>>>>>>>>>>>>>> a centralized system. >>>>>>>>>>>>>>> >>>>>>>>>>>>>> As long as the initial bootstrap list is large enough, I >>>>>>>>>>>>>> think >>>>>>>>>>>>>> you are >>>>>>>>>>>>>> right, that partitions won't be a problem. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> One last thing: When Pyxida receives a ping request >>>>>>>>>>>>>>> from a >>>>>>>>>>>>>>> node, it does >>>>>>>>>>>>>>> not update its coordinates. Is there any particular >>>>>>>>>>>>>>> reason >>>>>>>>>>>>>>> for >>>>>>>>>>>>>>> that? >>>>>>>>>>>>>>> Is it >>>>>>>>>>>>>>> because Pyxida will not be able to calculate the RTT? >>>>>>>>>>>>>>> >>>>>>>>>>>>>> Yes. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Also, there is no "gossiping" of other nodes' >>>>>>>>>>>>>>> coordinates, >>>>>>>>>>>>>>> right? So if A >>>>>>>>>>>>>>> has B in its neighbors list, it will from time to >>>>>>>>>>>>>>> time tell >>>>>>>>>>>>>>> other nodes >>>>>>>>>>>>>>> about B's coordinates. I realize this has a lot of >>>>>>>>>>>>>>> complications but the >>>>>>>>>>>>>>> original Vivaldi paper was very fuzzy about this >>>>>>>>>>>>>>> point and >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> previous >>>>>>>>>>>>>>> point that it seems it is up to the implementors to >>>>>>>>>>>>>>> decide >>>>>>>>>>>>>>> what >>>>>>>>>>>>>>> to do. >>>>>>>>>>>>>>> >>>>>>>>>>>>>> I don't think we are doing this. It could be useful in >>>>>>>>>>>>>> certain >>>>>>>>>>>>>> circumstance (e.g., picking potential new neighbors), but >>>>>>>>>>>>>> we're >>>>>>>>>>>>>> not, IIRC. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> I will have to think some more about this as there >>>>>>>>>>>>>>> are a lot >>>>>>>>>>>>>>> of >>>>>>>>>>>>>>> nasty >>>>>>>>>>>>>>> little side effects that could happen, but these are my >>>>>>>>>>>>>>> thoughts. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -Eric >>>>>>>>>>>>>>> >>>>>>>>>>>>>> Sounds good. >>>>>>>>>>>>>> >>>>>>>>>>>>>> -jonathan >>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Nov 3 2008, Jonathan Ledlie wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Eric, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Based on a look at the code just now, I see where >>>>>>>>>>>>>>>> you are >>>>>>>>>>>>>>>> coming from: >>>>>>>>>>>>>>>> with these params on PL, there isn't anything >>>>>>>>>>>>>>>> stopping the >>>>>>>>>>>>>>>> neighbor >>>>>>>>>>>>>>>> list from circulating pretty quickly through all >>>>>>>>>>>>>>>> nodes. In >>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>> Azureus implementation, the guys fed to the alg (fed to >>>>>>>>>>>>>>>> processSample) >>>>>>>>>>>>>>>> weren't changing much because they were coming from the >>>>>>>>>>>>>>>> routing finger >>>>>>>>>>>>>>>> tables. 
I don't remember observing this behavior in >>>>>>>>>>>>>>>> our PL >>>>>>>>>>>>>>>> deployment, but that could have been because we had the >>>>>>>>>>>>>>>> ping >>>>>>>>>>>>>>>> time set >>>>>>>>>>>>>>>> to 1-2 minutes, IIRC. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Do you want to insert some code that will fix this? >>>>>>>>>>>>>>>> Basically >>>>>>>>>>>>>>>> we want >>>>>>>>>>>>>>>> to only add a new neighbor if our oldest neighbor >>>>>>>>>>>>>>>> should be >>>>>>>>>>>>>>>> tossed (or >>>>>>>>>>>>>>>> if we haven't reached MAX_NEIGHBORS). It's probably >>>>>>>>>>>>>>>> best >>>>>>>>>>>>>>>> from a >>>>>>>>>>>>>>>> clarity standpoint if addNeighbor does continue to >>>>>>>>>>>>>>>> actually >>>>>>>>>>>>>>>> add the >>>>>>>>>>>>>>>> neighbor in all cases. Probably the best thing to >>>>>>>>>>>>>>>> do is to >>>>>>>>>>>>>>>> have >>>>>>>>>>>>>>>> NCClient drop old neighbors internally and have >>>>>>>>>>>>>>>> NCManager >>>>>>>>>>>>>>>> ask >>>>>>>>>>>>>>>> it which >>>>>>>>>>>>>>>> nodes should be pinged via getNeighborToPing. So >>>>>>>>>>>>>>>> NCClient >>>>>>>>>>>>>>>> would >>>>>>>>>>>>>>>> continue to have the same behavior on processSample (it >>>>>>>>>>>>>>>> would not >>>>>>>>>>>>>>>> reject new information) and NCManager and NCClient >>>>>>>>>>>>>>>> would >>>>>>>>>>>>>>>> cooperate to >>>>>>>>>>>>>>>> measure RTTs to current neighbors. We do want to make >>>>>>>>>>>>>>>> sure, >>>>>>>>>>>>>>>> however, >>>>>>>>>>>>>>>> that the gossip set stays large enough so that >>>>>>>>>>>>>>>> partitions >>>>>>>>>>>>>>>> don't occur. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Do you want to be added to the SourceForge Pyxida >>>>>>>>>>>>>>>> developers >>>>>>>>>>>>>>>> to make >>>>>>>>>>>>>>>> this change? (Please test it out carefully first.) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -jonathan >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Oct 30, 2008, at 11:32 AM, ext Eric Chan-Tin wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Jonathan, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> From what I saw, eventually, you learn about all >>>>>>>>>>>>>>>>> the nodes >>>>>>>>>>>>>>>>> in >>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>> network. Every 10 seconds, Pyxida picks a random >>>>>>>>>>>>>>>>> "Gossip" >>>>>>>>>>>>>>>>> node from >>>>>>>>>>>>>>>>> its >>>>>>>>>>>>>>>>> list and sends a request. The Gossip node will >>>>>>>>>>>>>>>>> reply back >>>>>>>>>>>>>>>>> with an >>>>>>>>>>>>>>>>> "upNeighbor", that is, a "Gossip" node that is "up" or >>>>>>>>>>>>>>>>> with 10% >>>>>>>>>>>>>>>>> probability, a node that is "pending". Since >>>>>>>>>>>>>>>>> naturally, >>>>>>>>>>>>>>>>> all >>>>>>>>>>>>>>>>> the nodes >>>>>>>>>>>>>>>>> bootstrap off the same set of nodes, every node will >>>>>>>>>>>>>>>>> eventually know >>>>>>>>>>>>>>>>> of >>>>>>>>>>>>>>>>> every other node. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Pyxida might not be rotating through everybody >>>>>>>>>>>>>>>>> continuously >>>>>>>>>>>>>>>>> (although >>>>>>>>>>>>>>>>> eventually, since it knows everybody, it will have >>>>>>>>>>>>>>>>> added >>>>>>>>>>>>>>>>> everybody to >>>>>>>>>>>>>>>>> its Neighbors list at some point), but could be >>>>>>>>>>>>>>>>> removing a >>>>>>>>>>>>>>>>> node that >>>>>>>>>>>>>>>>> is >>>>>>>>>>>>>>>>> "alive". So, if MAX_NEIGHBORS = 32, let's call them >>>>>>>>>>>>>>>>> A_1, >>>>>>>>>>>>>>>>> A_2, >>>>>>>>>>>>>>>>> ..., >>>>>>>>>>>>>>>>> A_32. >>>>>>>>>>>>>>>>> Pyxida picks a random Gossip node B, sends it a >>>>>>>>>>>>>>>>> request >>>>>>>>>>>>>>>>> and B >>>>>>>>>>>>>>>>> replies >>>>>>>>>>>>>>>>> back. 
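The "only add a new neighbor if there is room" behaviour Jonathan describes can live in a guard in front of addNeighbor rather than inside it, keeping addNeighbor's literal semantics. A hypothetical caller-side helper, with assumed field names:

    // addNeighbor() still simply adds; the caller only invokes it when periodic
    // maintenance has already expired an old entry or the set is below MAX_NEIGHBORS.
    void maybeAddNeighbor(RemoteState<T> candidate) {
        if (neighbors.size() < MAX_NEIGHBORS && !neighbors.contains(candidate)) {
            addNeighbor(candidate);
        }
    }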
Pyxida will add B to its Neighbors list, see >>>>>>>>>>>>>>>>> that >>>>>>>>>>>>>>>>> size of >>>>>>>>>>>>>>>>> Neighbors > MAX_NEIGHBORS, kick A_1 out. Then from >>>>>>>>>>>>>>>>> those >>>>>>>>>>>>>>>>> 32 >>>>>>>>>>>>>>>>> neighbors >>>>>>>>>>>>>>>>> A_2, ...., A_32, B, it will calculate the force, >>>>>>>>>>>>>>>>> and apply >>>>>>>>>>>>>>>>> it >>>>>>>>>>>>>>>>> to the >>>>>>>>>>>>>>>>> current system coordinates. Thus, if every 10 seconds, >>>>>>>>>>>>>>>>> Pyxida >>>>>>>>>>>>>>>>> picks a >>>>>>>>>>>>>>>>> different set of nodes C, D, E, etc... it would keep >>>>>>>>>>>>>>>>> kicking >>>>>>>>>>>>>>>>> nodes out >>>>>>>>>>>>>>>>> of the neighbors list. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> It's the "kicking A_1 out" part that seemed a bit >>>>>>>>>>>>>>>>> confusing. >>>>>>>>>>>>>>>>> Churn >>>>>>>>>>>>>>>>> is a >>>>>>>>>>>>>>>>> good reason but you don't check that A_1 might >>>>>>>>>>>>>>>>> still be >>>>>>>>>>>>>>>>> alive >>>>>>>>>>>>>>>>> or maybe >>>>>>>>>>>>>>>>> A_1 was just added. I have been printing the neighbors >>>>>>>>>>>>>>>>> list >>>>>>>>>>>>>>>>> every 10 >>>>>>>>>>>>>>>>> minutes, and they are completely different between >>>>>>>>>>>>>>>>> those >>>>>>>>>>>>>>>>> 10 >>>>>>>>>>>>>>>>> minutes. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> One possible problem might be that I am choosing >>>>>>>>>>>>>>>>> the value >>>>>>>>>>>>>>>>> of >>>>>>>>>>>>>>>>> MAX_NEIGHBORS to be too small. Originally it was >>>>>>>>>>>>>>>>> set to >>>>>>>>>>>>>>>>> 512 >>>>>>>>>>>>>>>>> (which was >>>>>>>>>>>>>>>>> meant for Azureus). Since I am using Planetlab >>>>>>>>>>>>>>>>> where there >>>>>>>>>>>>>>>>> are about >>>>>>>>>>>>>>>>> 400-500 "stable" nodes, I changed the value to 32. >>>>>>>>>>>>>>>>> But I >>>>>>>>>>>>>>>>> don't think >>>>>>>>>>>>>>>>> that should be a problem since log(n) = log(512) = 9. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -Eric >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Jonathan Ledlie wrote: >>>>>>>>>>>>>>>>>> Eric, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Glad to hear it. Let Peter and me know if you >>>>>>>>>>>>>>>>>> publish >>>>>>>>>>>>>>>>>> anything using >>>>>>>>>>>>>>>>>> Pyxida and we can put it on the web page. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "Gossip" nodes are for learning about the network. >>>>>>>>>>>>>>>>>> "Neighbors" are >>>>>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>>>> constructing the coordinate. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> So, IIRC, the gossip process learns about nodes that >>>>>>>>>>>>>>>>>> may or >>>>>>>>>>>>>>>>>> may not >>>>>>>>>>>>>>>>>> be >>>>>>>>>>>>>>>>>> applied to the neighbor list. That is, we may >>>>>>>>>>>>>>>>>> eventually >>>>>>>>>>>>>>>>>> learn >>>>>>>>>>>>>>>>>> about a >>>>>>>>>>>>>>>>>> large portion of the network, via gossiping, but >>>>>>>>>>>>>>>>>> should >>>>>>>>>>>>>>>>>> only >>>>>>>>>>>>>>>>>> be using >>>>>>>>>>>>>>>>>> MAX_NEIGHBORS at any given time for coordinate >>>>>>>>>>>>>>>>>> computation. >>>>>>>>>>>>>>>>>> The >>>>>>>>>>>>>>>>>> reason >>>>>>>>>>>>>>>>>> we don't want to use a fixed set of neighbors for >>>>>>>>>>>>>>>>>> coordinate >>>>>>>>>>>>>>>>>> computation >>>>>>>>>>>>>>>>>> is because nodes may come and go from the system. 
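The "calculate the force and apply it" step Eric refers to is the standard Vivaldi update. A generic, self-contained sketch is below, in textbook form with typical constants; it is not the Pyxida implementation, and the per-node error-estimate update is elided for brevity.

    import java.util.Random;

    // Generic Vivaldi coordinate update: move the local coordinate along the line
    // to/from the neighbor, in proportion to how badly the current coordinates
    // predict the measured RTT, damped by the relative confidence of both sides.
    class VivaldiSketch {
        static final double CC = 0.25;            // coordinate-damping constant (typical)
        final Random random = new Random();

        void update(double[] xi, double[] xj, double ei, double ej, double rttMs) {
            double dist = distance(xi, xj);
            double w = ei / (ei + ej);             // trust low-error neighbors more
            double delta = CC * w;
            for (int d = 0; d < xi.length; d++) {
                double dir = (dist == 0) ? random.nextGaussian()   // random push if coincident
                                         : (xi[d] - xj[d]) / dist; // unit vector from xj to xi
                xi[d] += delta * (rttMs - dist) * dir;             // apply the "force"
            }
        }

        static double distance(double[] a, double[] b) {
            double s = 0;
            for (int d = 0; d < a.length; d++) {
                s += (a[d] - b[d]) * (a[d] - b[d]);
            }
            return Math.sqrt(s);
        }
    }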
>>>>>>>>>>>>>>>>>> However, >>>>>>>>>>>>>>>>>> it should >>>>>>>>>>>>>>>>>> not be the case that we just rotate through >>>>>>>>>>>>>>>>>> everybody, >>>>>>>>>>>>>>>>>> because >>>>>>>>>>>>>>>>>> then, as >>>>>>>>>>>>>>>>>> you note, we'd be computing against everyone's >>>>>>>>>>>>>>>>>> coordinate/ >>>>>>>>>>>>>>>>>> latency, >>>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>> this is counter to the point of network coordinates. >>>>>>>>>>>>>>>>>> With this >>>>>>>>>>>>>>>>>> gradual >>>>>>>>>>>>>>>>>> transience as a desirable quality in mind, I am >>>>>>>>>>>>>>>>>> pretty >>>>>>>>>>>>>>>>>> sure we >>>>>>>>>>>>>>>>>> maintain >>>>>>>>>>>>>>>>>> neighbors in a fairly fixed manner. Does this not >>>>>>>>>>>>>>>>>> seem >>>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>> be the >>>>>>>>>>>>>>>>>> case >>>>>>>>>>>>>>>>>> to you? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> -jonathan >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Oct 29, 2008, at 9:20 PM, Eric Chan-Tin wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Hi Jonathan and Peter, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> First of all, just wanted to say that Pyxida has >>>>>>>>>>>>>>>>>>> been >>>>>>>>>>>>>>>>>>> working >>>>>>>>>>>>>>>>>>> great on >>>>>>>>>>>>>>>>>>> Planetlab for the past 4 months or so. Getting >>>>>>>>>>>>>>>>>>> good data >>>>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>> experiments >>>>>>>>>>>>>>>>>>> going well. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> However, I do have one question/comment. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Every Pyxida node contains a list of Neighbors (in >>>>>>>>>>>>>>>>>>> NCClient) and a >>>>>>>>>>>>>>>>>>> list >>>>>>>>>>>>>>>>>>> of Gossip nodes (they are called neighbors in >>>>>>>>>>>>>>>>>>> NCManager >>>>>>>>>>>>>>>>>>> but >>>>>>>>>>>>>>>>>>> I will >>>>>>>>>>>>>>>>>>> call >>>>>>>>>>>>>>>>>>> them gossip for now so as not to confuse with the >>>>>>>>>>>>>>>>>>> 'real' >>>>>>>>>>>>>>>>>>> neighbors >>>>>>>>>>>>>>>>>>> in >>>>>>>>>>>>>>>>>>> NCClient). Every 10 seconds, Pyxida will pick a >>>>>>>>>>>>>>>>>>> random >>>>>>>>>>>>>>>>>>> node >>>>>>>>>>>>>>>>>>> (call >>>>>>>>>>>>>>>>>>> it A) >>>>>>>>>>>>>>>>>>> from Gossip and send a gossip request (basically >>>>>>>>>>>>>>>>>>> asking >>>>>>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>>>>> rtt, >>>>>>>>>>>>>>>>>>> coordinate, error). But Gossip contains all the >>>>>>>>>>>>>>>>>>> nodes in >>>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>> network. >>>>>>>>>>>>>>>>>>> The comment in the code said that was to avoid >>>>>>>>>>>>>>>>>>> Pyxida >>>>>>>>>>>>>>>>>>> becoming >>>>>>>>>>>>>>>>>>> lonely >>>>>>>>>>>>>>>>>>> and no-one to gossip with, which makes sense. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Once A responds, Pyxida will add A to its >>>>>>>>>>>>>>>>>>> Neighbors (if >>>>>>>>>>>>>>>>>>> Neighbors >>>>>>>>>>>>>>>>>>> does >>>>>>>>>>>>>>>>>>> not contain A) and remove the first element from the >>>>>>>>>>>>>>>>>>> Neighbors list. >>>>>>>>>>>>>>>>>>> This seems a little bit odd. Was there any >>>>>>>>>>>>>>>>>>> particular >>>>>>>>>>>>>>>>>>> reasoning >>>>>>>>>>>>>>>>>>> behind >>>>>>>>>>>>>>>>>>> this? Because it would seem like you're getting the >>>>>>>>>>>>>>>>>>> coordinate from >>>>>>>>>>>>>>>>>>> every node in the network eventually. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thanks a lot, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> -Eric >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Jonathan Ledlie wrote: >>>>>>>>>>>>>>>>>>>> Your inclination is correct. See forwarded >>>>>>>>>>>>>>>>>>>> message. 
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> -jonathan >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Apr 9, 2008, at 1:53 PM, ext Eric Chan-Tin >>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Thank you for the clarification. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I've been reading through your published papers >>>>>>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>>>> code >>>>>>>>>>>>>>>>>>>>> and I >>>>>>>>>>>>>>>>>>>>> must >>>>>>>>>>>>>>>>>>>>> say I learned a lot about how Pyxida works :) >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I still have a quick and easy (hopefully) >>>>>>>>>>>>>>>>>>>>> question. >>>>>>>>>>>>>>>>>>>>> Each >>>>>>>>>>>>>>>>>>>>> Pyxida >>>>>>>>>>>>>>>>>>>>> node >>>>>>>>>>>>>>>>>>>>> has a small amount of neighbors and learn about >>>>>>>>>>>>>>>>>>>>> other >>>>>>>>>>>>>>>>>>>>> nodes >>>>>>>>>>>>>>>>>>>>> through >>>>>>>>>>>>>>>>>>>>> other nodes. I am running on about 200 >>>>>>>>>>>>>>>>>>>>> machines, and >>>>>>>>>>>>>>>>>>>>> it >>>>>>>>>>>>>>>>>>>>> seems >>>>>>>>>>>>>>>>>>>>> like all >>>>>>>>>>>>>>>>>>>>> the Pyxida nodes are adding each other to their >>>>>>>>>>>>>>>>>>>>> upNeighbors set. >>>>>>>>>>>>>>>>>>>>> NCClient.java has the MAX_NEIGHBORS set to 512. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I wanted to make sure that I am not misreading the >>>>>>>>>>>>>>>>>>>>> code, but >>>>>>>>>>>>>>>>>>>>> shouldn't >>>>>>>>>>>>>>>>>>>>> each Pyxida node keep a small number of >>>>>>>>>>>>>>>>>>>>> neighbors in >>>>>>>>>>>>>>>>>>>>> its >>>>>>>>>>>>>>>>>>>>> upNeighbors >>>>>>>>>>>>>>>>>>>>> set. Else, it will be a lot of updates to send a >>>>>>>>>>>>>>>>>>>>> gossipMessage >>>>>>>>>>>>>>>>>>>>> to all >>>>>>>>>>>>>>>>>>>>> 200 neighbors, and it also means that a Pyxida >>>>>>>>>>>>>>>>>>>>> node >>>>>>>>>>>>>>>>>>>>> has >>>>>>>>>>>>>>>>>>>>> the whole >>>>>>>>>>>>>>>>>>>>> network in its upNeighbors set - that seems >>>>>>>>>>>>>>>>>>>>> wrong. I >>>>>>>>>>>>>>>>>>>>> am >>>>>>>>>>>>>>>>>>>>> inclined >>>>>>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>>>> change the value of MAX_NEIGHBORS but wanted to >>>>>>>>>>>>>>>>>>>>> make >>>>>>>>>>>>>>>>>>>>> sure >>>>>>>>>>>>>>>>>>>>> that my >>>>>>>>>>>>>>>>>>>>> understanding is correct first. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Thank you, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Eric >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> ---------------------------------------------------- >>>>>>>>>>>>>>>>>>> --------------------- >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> This SF.Net email is sponsored by the Moblin Your >>>>>>>>>>>>>>>>>>> Move >>>>>>>>>>>>>>>>>>> Developer's >>>>>>>>>>>>>>>>>>> challenge Build the coolest Linux based applications >>>>>>>>>>>>>>>>>>> with >>>>>>>>>>>>>>>>>>> Moblin >>>>>>>>>>>>>>>>>>> SDK & >>>>>>>>>>>>>>>>>>> win great prizes Grand prize is a trip for two to an >>>>>>>>>>>>>>>>>>> Open >>>>>>>>>>>>>>>>>>> Source >>>>>>>>>>>>>>>>>>> event >>>>>>>>>>>>>>>>>>> anywhere in the world >>>>>>>>>>>>>>>>>>> http://moblin-contest.org/redirect.php? >>>>>>>>>>>>>>>>>>> banner_id=100&url=/ >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>>>>> Pyxida-users mailing >>>>>>>>>>>>>>>>>>> list Pyx...@li... 
>>>>>>>>>>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/pyxida- >>>>>>>>>>>>>>>>>>> users >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -------------------------------------------------------- >>>>>>>>>>>>>>> ----------------- >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> This SF.Net email is sponsored by the Moblin Your Move >>>>>>>>>>>>>>> Developer's challenge Build the coolest Linux based >>>>>>>>>>>>>>> applications with Moblin SDK & win great prizes Grand >>>>>>>>>>>>>>> prize is >>>>>>>>>>>>>>> a trip for two to an Open Source event anywhere in >>>>>>>>>>>>>>> the world >>>>>>>>>>>>>>> http://moblin-contest.org/redirect.php? >>>>>>>>>>>>>>> banner_id=100&url=/ >>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>> Pyxida-users >>>>>>>>>>>>>>> mailing list Pyx...@li... >>>>>>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/pyxida- >>>>>>>>>>>>>>> users >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> ---------------------------------------------------------- >>>>>>>>>>>>> -------------------- >>>>>>>>>>>>> >>>>>>>>>>>>> SF.Net email is Sponsored by MIX09, March 18-20, 2009 >>>>>>>>>>>>> in Las >>>>>>>>>>>>> Vegas, Nevada. The future of the web can't happen >>>>>>>>>>>>> without you. >>>>>>>>>>>>> Join us at MIX09 to help pave the way to the Next Web now. >>>>>>>>>>>>> Learn >>>>>>>>>>>>> more and register at >>>>>>>>>>>>> http://ad.doubleclick.net/clk;208669438;13503038;i? >>>>>>>>>>>>> http://2009.visitmix.com/ >>>>>>>>>>>>> >>>>>>>>>>>>> _______________________________________________ Pyxida- >>>>>>>>>>>>> users >>>>>>>>>>>>> mailing list Pyx...@li... >>>>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/pyxida-users >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>> >>>>> >>>>> >>>>> >>>>> >>>>> ------------------------------------------------------------------ >>>>> ------------ >>>>> >>>>> SF.Net email is Sponsored by MIX09, March 18-20, 2009 in Las >>>>> Vegas, >>>>> Nevada. The future of the web can't happen without you. Join us at >>>>> MIX09 >>>>> to help pave the way to the Next Web now. Learn more and >>>>> register at >>>>> http://ad.doubleclick.net/clk;208669438;13503038;i?http:// >>>>> 2009.visitmix.com/ >>>>> >>>>> _______________________________________________ Pyxida-users >>>>> mailing >>>>> list Pyx...@li... >>>>> https://lists.sourceforge.net/lists/listinfo/pyxida-users >>>> >>> >>> >>> >>> -------------------------------------------------------------------- >>> ---------- >>> >>> SF.Net email is Sponsored by MIX09, March 18-20, 2009 in Las Vegas, >>> Nevada. >>> The future of the web can't happen without you. Join us at MIX09 >>> to help >>> pave the way to the Next Web now. Learn more and register at >>> http://ad.doubleclick.net/clk;208669438;13503038;i?http:// >>> 2009.visitmix.com/ >>> >>> _______________________________________________ >>> Pyxida-users mailing list >>> Pyx...@li... >>> https://lists.sourceforge.net/lists/listinfo/pyxida-users > > ---------------------------------------------------------------------- > -------- > SF.Net email is Sponsored by MIX09, March 18-20, 2009 in Las Vegas, > Nevada. > The future of the web can't happen without you. Join us at MIX09 > to help > pave the way to the Next Web now. Learn more and register at > http://ad.doubleclick.net/clk;208669438;13503038;i?http:// > 2009.visitmix.com/ > _______________________________________________ > Pyxida-users mailing list > Pyx...@li... > https://lists.sourceforge.net/lists/listinfo/pyxida-users |
From: Eric Chan-T. <dch...@cs...> - 2008-12-19 01:24:49
|
jdk-1.5.0_09 Jonathan Ledlie wrote: > > Eric, > > I'm getting an error (an old favorite): > WA 1229649353501 NetUtil : Could not deserialize java object: > java.io.StreamCorruptedException: invalid type code: CC > WA 1229649353501 ObjComm$ReadConnHa: No valid msgObject received from > remoteAddr=planet1.scs.stanford.edu:26500. Ignoring. > > What java version are you running on PL? > > -jonathan > > On Dec 18, 2008, at 4:49 PM, Eric Chan-Tin wrote: > >> Do you mean the Pyxida code running on PlanetLab? I killed them after >> running it overnight. I am currently restarting the experiments. Try >> again >> in about 15 minutes and it should be ready. >> >> Fix1 is on port 28500 (and api.port 28501). Fix2 is on port 26500 >> (api.port >> 26501). The node you can contact is "planet1.scs.stanford.edu" or >> "planetlab01.cs.washington.edu". >> >> About your other email: Yes I know. The fix was to go through the list >> again, but I didn't want to implement that since it seems to be working >> fine. So go through the list once to calculate weight, and add neighbors >> into another list (sorted by weights). Then go through that new list. The >> new list will be sorted by higher weights, so older neighbors will be >> "tested" first. If you want it that way, I can do that. However, I do not >> think we need to. Let's assume iteration 0, the first node gets >> picked. At >> iteration 1, that first node will have a very small probability of >> getting >> picked again. >> >> -Eric >> >> >> On Dec 18 2008, Jonathan Ledlie wrote: >> >>> >>> Eric, >>> >>> Do you have a PL Pyxida slice that I can connect to to try this out? >>> >>> That is, I'll run it on my machine, but connect to your slice via the >>> right ports. >>> >>> -jonathan >>> >>> On Dec 18, 2008, at 11:35 AM, Eric Chan-Tin wrote: >>> >>>> Sorry for the delay. Was busy with final exams and projects. >>>> >>>> I made the changes. They can be found at >>>> >>>> http://www-users.cs.umn.edu/~dchantin/NCClient.java >>>> http://www-users.cs.umn.edu/~dchantin/NCManager.java >>>> http://www-users.cs.umn.edu/~dchantin/RemoteState.java >>>> >>>> All the changes are enclosed within >>>> //ERIC START >>>> ... >>>> //ERIC END >>>> >>>> I compared every reported "neighbors list" of each Pyxida node >>>> (about 370 >>>> total) and see how many neighbors did not change within 10 minutes. >>>> Then >>>> did the average for all the nodes. >>>> >>>> Original: 43.37% of the neighbors stayed the same on average. >>>> >>>> Fix0 (before we found the interesting situation): 74.73% >>>> >>>> Fix1 (with the weight probability of picking a neighbor to ping): >>>> 95.79% >>>> >>>> Fix2 (with the weight probability and the MIN_UPDATE_PING): 98.81% >>>> >>>> Also, the reported error, rrl, etc.. were much lower in the Fixed >>>> Pyxida >>>> than in the original one (this could not be significant, since the >>>> Original >>>> experiment was performed 1-2 months ago), but as expected, it was not >>>> higher. >>>> >>>> You can take a look at the code and test it if you want. If everything >>>> looks fine, I'll remove some of the debugging code and other "useless" >>>> comments (like //ERIC). Also, let me know which "fix" you prefer. I >>>> will >>>> then email you the updated code. >>>> >>>> Thanks, >>>> >>>> -Eric >>>> >>>> >>>> On Dec 10 2008, Jonathan Ledlie wrote: >>>> >>>>> >>>>> Seems reasonable. >>>>> >>>>> Both ideas (lastPingTime and a weighted probability based on age) >>>>> seem >>>>> to be independent. 
>>>>> >>>>> Some of the code is more CPU intensive than it should be (e.g., >>>>> looping through potentially long lists), and that is a slight >>>>> downside >>>>> to the function approach vs a set, but this probably isn't a big >>>>> deal. >>>>> >>>>> -jonathan >>>>> >>>>> On Dec 10, 2008, at 10:40 PM, Eric Chan-Tin wrote: >>>>> >>>>>> okay. I think I might have misunderstood you. Your idea of >>>>>> lastPingTime >>>>>> looks interesting. >>>>>> >>>>>> My idea is to introduce a probability function. Right now, it just >>>>>> randomly picks one of the neighbors. What I am thinking it might >>>>>> do is >>>>>> that instead of just randomly picking a neighbor, pick a neighbor >>>>>> based >>>>>> on some function. So, the neighbor whose lastUpdateTime is oldest >>>>>> will >>>>>> have a higher chance of being picked than a neighbor who was just >>>>>> updated. >>>>>> >>>>>> Does that make more sense? >>>>>> >>>>>> -Eric >>>>>> >>>>>> Jonathan Ledlie wrote: >>>>>>> >>>>>>> Hmm. The advantage of lastPingTime is that you learn whether or >>>>>>> not >>>>>>> you've tried to contact a neighbor and they haven't responded. >>>>>>> Then you >>>>>>> can use that knowledge to determine whether or not to toss them. >>>>>>> >>>>>>> Maybe I'm not understanding your suggestion then. >>>>>>> >>>>>>> -jonathan >>>>>>> >>>>>>> On Dec 10, 2008, at 12:51 PM, Eric Chan-Tin wrote: >>>>>>> >>>>>>>> That's a good idea. Here is a little twist on it. >>>>>>>> >>>>>>>> Put a "weight" on each neighbor, based on the lastUpdateTime. So >>>>>>>> if >>>>>>>> currentTime - neighbor's lastUpdateTime, then the weight is >>>>>>>> higher. >>>>>>>> Then in getNeighborToPing, neighbor's with higher weight have a >>>>>>>> higher >>>>>>>> probability of getting picked than neighbor's with lower weight. >>>>>>>> >>>>>>>> The weight could be ( (currentTime - lastUpdateTime) / 15 >>>>>>>> minutes). >>>>>>>> It's not 10 minutes so as to avoid a probability of 1 if the >>>>>>>> lastUpdateTime was the previous performMaintenance. This should >>>>>>>> also >>>>>>>> avoid the problem of always pinging the oldest neighbor since it's >>>>>>>> about probability (weight). >>>>>>>> >>>>>>>> It seems a little bit less complicated than having lastPingTime >>>>>>>> and >>>>>>>> MIN_UPDATE_TIME. >>>>>>>> >>>>>>>> -Eric >>>>>>>> >>>>>>>> On Dec 10 2008, Jonathan Ledlie wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> This is an interesting situation. Let's think about the goal. >>>>>>>>> The >>>>>>>>> goal is to keep pinging the same nodes as long as they are alive, >>>>>>>>> and to replace a node soon after it dies, keeping the total >>>>>>>>> number >>>>>>>>> of up neighbors at approximately MAX_NEIGHBORS. >>>>>>>>> >>>>>>>>> How about the following: >>>>>>>>> - when you pick a neighbor to ping, you set a time on that event >>>>>>>>> (in >>>>>>>>> a new per-neighbor timestamp LastPingTime) >>>>>>>>> - when you first insert a neighbor, set this timestamp to 0. >>>>>>>>> - when you loop in performMaintenance, remove nodes whose >>>>>>>>> LastPingTime - LastUpdateTime > threshold (like 10 minutes) >>>>>>>>> >>>>>>>>> - when picking a node, pick the guy with the oldest LastPingTime >>>>>>>>> (not sure about this one -- I don't like situations where you >>>>>>>>> pick >>>>>>>>> the absolute *oldest* because then we could get in a situation >>>>>>>>> where >>>>>>>>> an old guy sits there, waiting to be kicked out, and we keep >>>>>>>>> picking >>>>>>>>> him to be pinged; I'd prefer to pick randomly, or random older >>>>>>>>> guys). 
>>>>>>>>> - a simple way around this is to simply have a self-adjusting >>>>>>>>> threshold: so, we walk through the nodes, and everyone whose >>>>>>>>> LastPingTime is older than MIN_UPDATE_TIME_TO_PING ago (a >>>>>>>>> relative >>>>>>>>> time) get added to a set; we pick a guy randomly from the set to >>>>>>>>> ping. If the set is empty, we increase MIN_UPDATE_TIME_TO_PING, >>>>>>>>> and >>>>>>>>> wait for the next time the function is called. If the set is too >>>>>>>>> large (relative to a fraction of MAX_NEIGHBORS), we decrease it. >>>>>>>>> >>>>>>>>> We still might get some churn from lost UDP packets. You might >>>>>>>>> also >>>>>>>>> give nodes a second chance to be pinged by including >>>>>>>>> LastPingTime - >>>>>>>>> LastUpdateTime > 1 minute in the set. Note that increases and >>>>>>>>> decreases in MIN_UPDATE_TIME_TO_PING shouldn't be affected by the >>>>>>>>> number of nodes that fall into this category. >>>>>>>>> >>>>>>>>> We still probably need a way for new potential neighbors to be >>>>>>>>> rejected if our current set of neighbors is fine. Because I >>>>>>>>> still >>>>>>>>> want addNeigbor to really add the neighbor, how about putting a >>>>>>>>> check before calling that to see if we need more neighbors. >>>>>>>>> >>>>>>>>> What thinks? >>>>>>>>> >>>>>>>>> -jonathan >>>>>>>>> >>>>>>>>> On Dec 9, 2008, at 11:37 PM, ext Eric Chan-Tin wrote: >>>>>>>>> >>>>>>>>>> Peter, >>>>>>>>>> >>>>>>>>>> Thanks. That fixed it. >>>>>>>>>> >>>>>>>>>> However, here is another problem I found while deploying Pyxida >>>>>>>>>> and >>>>>>>>>> running them for a while. >>>>>>>>>> >>>>>>>>>> In performMaintenance(), I loop through every neighbor and if >>>>>>>>>> (neighbor.getLastUpdateTime() < current_time - 10 minutes), >>>>>>>>>> then I >>>>>>>>>> remove that neighbor. >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> In NCManager, I call getNeighborToPing, which randomly picks one >>>>>>>>>> of >>>>>>>>>> the >>>>>>>>>> neighbors (32 in this case). Pinging is performed every 10 >>>>>>>>>> seconds, so >>>>>>>>>> in 10 minutes, you can ping 60 nodes. However, if a neighbor A >>>>>>>>>> is not >>>>>>>>>> among those 60 nodes, it will get kicked out in >>>>>>>>>> performMaintenance(), >>>>>>>>>> although A is still alive (it just hasn't been contacted yet). >>>>>>>>>> >>>>>>>>>> There are some possible options. >>>>>>>>>> >>>>>>>>>> 1) Increase the timeout period from 10 minutes to ?? minutes. >>>>>>>>>> This >>>>>>>>>> doesn't quite work because say in Azureus, the number of >>>>>>>>>> neighbors is >>>>>>>>>> 512, so the timeout needs to be increased even more... It could >>>>>>>>>> be a >>>>>>>>>> function of MAX_NEIGHBORS... >>>>>>>>>> >>>>>>>>>> 2) When Pyxida replies to a ping request, it will update the >>>>>>>>>> neighbor's >>>>>>>>>> lastUpdateTime. This does not quite work either because it >>>>>>>>>> assumes a >>>>>>>>>> symmetric relationship which is most likely not the case. >>>>>>>>>> >>>>>>>>>> 3) Introduce some more logic to Pyxida so it remembers what >>>>>>>>>> nodes it >>>>>>>>>> contacted and does not send, say, more than 2 requests in 10 >>>>>>>>>> minutes to >>>>>>>>>> the same node. This solution isn't that great either. >>>>>>>>>> >>>>>>>>>> Let me know what you guys think. I am currently leaning towards >>>>>>>>>> option >>>>>>>>>> 1) but make the timeout period a function of MAX_NEIGHBORS, >>>>>>>>>> although I >>>>>>>>>> haven't quite found the perfect function yet. >>>>>>>>>> >>>>>>>>>> Here are some preliminary results. 
I compared the number of >>>>>>>>>> neighbors >>>>>>>>>> which are the same at each reported time (10 minutes). Before >>>>>>>>>> the fix, >>>>>>>>>> more than 50% of the neighbors changed within 10 minutes. After >>>>>>>>>> the >>>>>>>>>> fix, >>>>>>>>>> only about 30% of the neighbors changed within 10 minutes. It is >>>>>>>>>> more >>>>>>>>>> stable but I think 30% is still quite a big number. >>>>>>>>>> >>>>>>>>>> Sorry for the long email :) >>>>>>>>>> >>>>>>>>>> -Eric >>>>>>>>>> >>>>>>>>>> Peter Pietzuch wrote: >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> Eric Chan-Tin wrote: >>>>>>>>>>>> for (RemoteState<T> neighbor : neighbors) { //this is line >>>>>>>>>>>> 1029 >>>>>>>>>>>> if (neighbor.getLastUpdateTime() < (curr_time - >>>>>>>>>>>> MAINTENANCE_PERIOD)) { >>>>>>>>>>>> neighbors.remove(neighbor); >>>>>>>>>>>> } >>>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> IIRC you can't modify a collection while iterating over it >>>>>>>>>>> using the >>>>>>>>>>> above for loop. Instead you have to use a proper >>>>>>>>>>> Iterator<RemoteState<T>> and use it's remove method. >>>>>>>>>>> >>>>>>>>>>> Cheers, >>>>>>>>>>> Peter >>>>>>>>>>> >>>>>>>>>>>> Jonathan Ledlie wrote: >>>>>>>>>>>>> On Nov 3, 2008, at 3:46 PM, Eric Chan-Tin wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Jonathan, >>>>>>>>>>>>>> >>>>>>>>>>>>>> I can insert some code to fix this and it doesn't look too >>>>>>>>>>>>>> hard >>>>>>>>>>>>>> but I do >>>>>>>>>>>>>> have some concerns. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> I don't remember observing this behavior in our PL >>>>>>>>>>>>>>> deployment, but that could have been because we had the >>>>>>>>>>>>>>> ping >>>>>>>>>>>>>>> time set >>>>>>>>>>>>>>> to 1-2 minutes, IIRC. >>>>>>>>>>>>>> The default ping time was set to 10 seconds + some Gaussian >>>>>>>>>>>>>> distribution, >>>>>>>>>>>>>> so that's what I used, but changing the ping time to 1 >>>>>>>>>>>>>> minute >>>>>>>>>>>>>> or more >>>>>>>>>>>>>> will >>>>>>>>>>>>>> result in the same effect except it will take longer for the >>>>>>>>>>>>>> "cycle" to >>>>>>>>>>>>>> occur. >>>>>>>>>>>>>> >>>>>>>>>>>>>> What I have in mind was the following: >>>>>>>>>>>>>> >>>>>>>>>>>>>> 1) In NCClient.addNeighbor() would be changed to (in >>>>>>>>>>>>>> pseudocode) >>>>>>>>>>>>>> protected boolean addNeighbor(RemoteState<T> guy) { >>>>>>>>>>>>>> boolean added = false; >>>>>>>>>>>>>> //if neighbors.size() >= MAX_NEIGHBORS >>>>>>>>>>>>>> //Check last update time of each neighbor in >>>>>>>>>>>>>> neighbors >>>>>>>>>>>>>> //neighbor.getLastUpdateTime() >>>>>>>>>>>>>> //if any of them is older than some number >>>>>>>>>>>>>> TIMEOUT, >>>>>>>>>>>>>> remove the oldest one, then add "guy" to neighbors >>>>>>>>>>>>>> //else don't add "guy" to neighbors >>>>>>>>>>>>>> //else add "guy" to neighbors >>>>>>>>>>>>>> } >>>>>>>>>>>>>> >>>>>>>>>>>>> IMO this shouldn't be the behavior of this function because >>>>>>>>>>>>> it >>>>>>>>>>>>> is called >>>>>>>>>>>>> "addNeighbor." Instead what could happen is that part of the >>>>>>>>>>>>> performMaintenance() function could remove old neighbors. >>>>>>>>>>>>> >>>>>>>>>>>>>> 2) In NCManager, getGossipNode() would just pick a random >>>>>>>>>>>>>> node >>>>>>>>>>>>>> from >>>>>>>>>>>>>> localNC.primaryNC.getNeighbors() to send a request to >>>>>>>>>>>>>> every 10 >>>>>>>>>>>>>> seconds. >>>>>>>>>>>>>> >>>>>>>>>>>>>> 3) Pyxida will still keep track of all GossipNodes in the >>>>>>>>>>>>>> network. 
It >>>>>>>>>>>>>> could >>>>>>>>>>>>>> also (say every 10 minutes) check its NCClient.neighbors >>>>>>>>>>>>>> list >>>>>>>>>>>>>> and see if >>>>>>>>>>>>>> any neighbor is older than TIMEOUT, remove it, then pick a >>>>>>>>>>>>>> random >>>>>>>>>>>>>> GossipNode to send a ping request. >>>>>>>>>>>>>> >>>>>>>>>>>>>> The main problem with this scheme is that one Pyxida node >>>>>>>>>>>>>> would >>>>>>>>>>>>>> most >>>>>>>>>>>>>> likely >>>>>>>>>>>>>> stay with the same set of neighbors all the time. Its >>>>>>>>>>>>>> coordinates >>>>>>>>>>>>>> would be >>>>>>>>>>>>>> greatly affected by those 32 nodes (I am assuming >>>>>>>>>>>>>> MAX_NEIGHBORS >>>>>>>>>>>>>> = 32). >>>>>>>>>>>>>> >>>>>>>>>>>>> Instead of (2), why not have the getNeighborToPing function >>>>>>>>>>>>> determine >>>>>>>>>>>>> the behavior? If it returns a node, then ping it. If it >>>>>>>>>>>>> returns null, >>>>>>>>>>>>> then have the NCManager pick some other node to use? If >>>>>>>>>>>>> things are >>>>>>>>>>>>> hunky-dory, getNeighborToPing() would return a node (a >>>>>>>>>>>>> current >>>>>>>>>>>>> member of >>>>>>>>>>>>> the neighbor set), if we're short on neighbors, then it would >>>>>>>>>>>>> return >>>>>>>>>>>>> null. The periodic maintenance would have already tossed old >>>>>>>>>>>>> nodes, so >>>>>>>>>>>>> this would be self-sustaining. >>>>>>>>>>>>> >>>>>>>>>>>>> The "problem" you mention is fairly central to network >>>>>>>>>>>>> coordinates in >>>>>>>>>>>>> general. Sure the coordinates will be a little better the >>>>>>>>>>>>> more >>>>>>>>>>>>> neighbors you have. But in a very large system talking to >>>>>>>>>>>>> everybody >>>>>>>>>>>>> isn't feasible. Peter and I have a graph in one of the >>>>>>>>>>>>> workshop >>>>>>>>>>>>> papers >>>>>>>>>>>>> which shows the effect of increasing the number of neighbors >>>>>>>>>>>>> >>>>>>>>>>>>>> I don't think the network would be partitioned, but it could >>>>>>>>>>>>>> happen. For >>>>>>>>>>>>>> example, the boostrap nodes have each other in their >>>>>>>>>>>>>> neighbors >>>>>>>>>>>>>> list. >>>>>>>>>>>>>> Every >>>>>>>>>>>>>> new node bootstrapping to the network would contact those >>>>>>>>>>>>>> bootstrapping >>>>>>>>>>>>>> nodes first and add them to their neighbors. So every Pyxida >>>>>>>>>>>>>> node could >>>>>>>>>>>>>> have as its neighbors the "Bootstrap nodes", which would >>>>>>>>>>>>>> make >>>>>>>>>>>>>> Pyxida >>>>>>>>>>>>>> become >>>>>>>>>>>>>> a centralized system. >>>>>>>>>>>>>> >>>>>>>>>>>>> As long as the initial bootstrap list is large enough, I >>>>>>>>>>>>> think >>>>>>>>>>>>> you are >>>>>>>>>>>>> right, that partitions won't be a problem. >>>>>>>>>>>>> >>>>>>>>>>>>>> One last thing: When Pyxida receives a ping request from a >>>>>>>>>>>>>> node, it does >>>>>>>>>>>>>> not update its coordinates. Is there any particular reason >>>>>>>>>>>>>> for >>>>>>>>>>>>>> that? >>>>>>>>>>>>>> Is it >>>>>>>>>>>>>> because Pyxida will not be able to calculate the RTT? >>>>>>>>>>>>>> >>>>>>>>>>>>> Yes. >>>>>>>>>>>>> >>>>>>>>>>>>>> Also, there is no "gossiping" of other nodes' coordinates, >>>>>>>>>>>>>> right? So if A >>>>>>>>>>>>>> has B in its neighbors list, it will from time to time tell >>>>>>>>>>>>>> other nodes >>>>>>>>>>>>>> about B's coordinates. I realize this has a lot of >>>>>>>>>>>>>> complications but the >>>>>>>>>>>>>> original Vivaldi paper was very fuzzy about this point and >>>>>>>>>>>>>> the >>>>>>>>>>>>>> previous >>>>>>>>>>>>>> point that it seems it is up to the implementors to decide >>>>>>>>>>>>>> what >>>>>>>>>>>>>> to do. 
>>>>>>>>>>>>>> >>>>>>>>>>>>> I don't think we are doing this. It could be useful in >>>>>>>>>>>>> certain >>>>>>>>>>>>> circumstance (e.g., picking potential new neighbors), but >>>>>>>>>>>>> we're >>>>>>>>>>>>> not, IIRC. >>>>>>>>>>>>> >>>>>>>>>>>>>> I will have to think some more about this as there are a lot >>>>>>>>>>>>>> of >>>>>>>>>>>>>> nasty >>>>>>>>>>>>>> little side effects that could happen, but these are my >>>>>>>>>>>>>> thoughts. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> >>>>>>>>>>>>>> -Eric >>>>>>>>>>>>>> >>>>>>>>>>>>> Sounds good. >>>>>>>>>>>>> >>>>>>>>>>>>> -jonathan >>>>>>>>>>>>> >>>>>>>>>>>>>> On Nov 3 2008, Jonathan Ledlie wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Eric, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Based on a look at the code just now, I see where you are >>>>>>>>>>>>>>> coming from: >>>>>>>>>>>>>>> with these params on PL, there isn't anything stopping the >>>>>>>>>>>>>>> neighbor >>>>>>>>>>>>>>> list from circulating pretty quickly through all nodes. In >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> Azureus implementation, the guys fed to the alg (fed to >>>>>>>>>>>>>>> processSample) >>>>>>>>>>>>>>> weren't changing much because they were coming from the >>>>>>>>>>>>>>> routing finger >>>>>>>>>>>>>>> tables. I don't remember observing this behavior in our PL >>>>>>>>>>>>>>> deployment, but that could have been because we had the >>>>>>>>>>>>>>> ping >>>>>>>>>>>>>>> time set >>>>>>>>>>>>>>> to 1-2 minutes, IIRC. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Do you want to insert some code that will fix this? >>>>>>>>>>>>>>> Basically >>>>>>>>>>>>>>> we want >>>>>>>>>>>>>>> to only add a new neighbor if our oldest neighbor should be >>>>>>>>>>>>>>> tossed (or >>>>>>>>>>>>>>> if we haven't reached MAX_NEIGHBORS). It's probably best >>>>>>>>>>>>>>> from a >>>>>>>>>>>>>>> clarity standpoint if addNeighbor does continue to actually >>>>>>>>>>>>>>> add the >>>>>>>>>>>>>>> neighbor in all cases. Probably the best thing to do is to >>>>>>>>>>>>>>> have >>>>>>>>>>>>>>> NCClient drop old neighbors internally and have NCManager >>>>>>>>>>>>>>> ask >>>>>>>>>>>>>>> it which >>>>>>>>>>>>>>> nodes should be pinged via getNeighborToPing. So NCClient >>>>>>>>>>>>>>> would >>>>>>>>>>>>>>> continue to have the same behavior on processSample (it >>>>>>>>>>>>>>> would not >>>>>>>>>>>>>>> reject new information) and NCManager and NCClient would >>>>>>>>>>>>>>> cooperate to >>>>>>>>>>>>>>> measure RTTs to current neighbors. We do want to make >>>>>>>>>>>>>>> sure, >>>>>>>>>>>>>>> however, >>>>>>>>>>>>>>> that the gossip set stays large enough so that partitions >>>>>>>>>>>>>>> don't occur. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Do you want to be added to the SourceForge Pyxida >>>>>>>>>>>>>>> developers >>>>>>>>>>>>>>> to make >>>>>>>>>>>>>>> this change? (Please test it out carefully first.) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -jonathan >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Oct 30, 2008, at 11:32 AM, ext Eric Chan-Tin wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Jonathan, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> From what I saw, eventually, you learn about all the nodes >>>>>>>>>>>>>>>> in >>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>> network. Every 10 seconds, Pyxida picks a random "Gossip" >>>>>>>>>>>>>>>> node from >>>>>>>>>>>>>>>> its >>>>>>>>>>>>>>>> list and sends a request. The Gossip node will reply back >>>>>>>>>>>>>>>> with an >>>>>>>>>>>>>>>> "upNeighbor", that is, a "Gossip" node that is "up" or >>>>>>>>>>>>>>>> with 10% >>>>>>>>>>>>>>>> probability, a node that is "pending". 
Since naturally, >>>>>>>>>>>>>>>> all >>>>>>>>>>>>>>>> the nodes >>>>>>>>>>>>>>>> bootstrap off the same set of nodes, every node will >>>>>>>>>>>>>>>> eventually know >>>>>>>>>>>>>>>> of >>>>>>>>>>>>>>>> every other node. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Pyxida might not be rotating through everybody >>>>>>>>>>>>>>>> continuously >>>>>>>>>>>>>>>> (although >>>>>>>>>>>>>>>> eventually, since it knows everybody, it will have added >>>>>>>>>>>>>>>> everybody to >>>>>>>>>>>>>>>> its Neighbors list at some point), but could be removing a >>>>>>>>>>>>>>>> node that >>>>>>>>>>>>>>>> is >>>>>>>>>>>>>>>> "alive". So, if MAX_NEIGHBORS = 32, let's call them A_1, >>>>>>>>>>>>>>>> A_2, >>>>>>>>>>>>>>>> ..., >>>>>>>>>>>>>>>> A_32. >>>>>>>>>>>>>>>> Pyxida picks a random Gossip node B, sends it a request >>>>>>>>>>>>>>>> and B >>>>>>>>>>>>>>>> replies >>>>>>>>>>>>>>>> back. Pyxida will add B to its Neighbors list, see that >>>>>>>>>>>>>>>> size of >>>>>>>>>>>>>>>> Neighbors > MAX_NEIGHBORS, kick A_1 out. Then from those >>>>>>>>>>>>>>>> 32 >>>>>>>>>>>>>>>> neighbors >>>>>>>>>>>>>>>> A_2, ...., A_32, B, it will calculate the force, and apply >>>>>>>>>>>>>>>> it >>>>>>>>>>>>>>>> to the >>>>>>>>>>>>>>>> current system coordinates. Thus, if every 10 seconds, >>>>>>>>>>>>>>>> Pyxida >>>>>>>>>>>>>>>> picks a >>>>>>>>>>>>>>>> different set of nodes C, D, E, etc... it would keep >>>>>>>>>>>>>>>> kicking >>>>>>>>>>>>>>>> nodes out >>>>>>>>>>>>>>>> of the neighbors list. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> It's the "kicking A_1 out" part that seemed a bit >>>>>>>>>>>>>>>> confusing. >>>>>>>>>>>>>>>> Churn >>>>>>>>>>>>>>>> is a >>>>>>>>>>>>>>>> good reason but you don't check that A_1 might still be >>>>>>>>>>>>>>>> alive >>>>>>>>>>>>>>>> or maybe >>>>>>>>>>>>>>>> A_1 was just added. I have been printing the neighbors >>>>>>>>>>>>>>>> list >>>>>>>>>>>>>>>> every 10 >>>>>>>>>>>>>>>> minutes, and they are completely different between those >>>>>>>>>>>>>>>> 10 >>>>>>>>>>>>>>>> minutes. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> One possible problem might be that I am choosing the value >>>>>>>>>>>>>>>> of >>>>>>>>>>>>>>>> MAX_NEIGHBORS to be too small. Originally it was set to >>>>>>>>>>>>>>>> 512 >>>>>>>>>>>>>>>> (which was >>>>>>>>>>>>>>>> meant for Azureus). Since I am using Planetlab where there >>>>>>>>>>>>>>>> are about >>>>>>>>>>>>>>>> 400-500 "stable" nodes, I changed the value to 32. But I >>>>>>>>>>>>>>>> don't think >>>>>>>>>>>>>>>> that should be a problem since log(n) = log(512) = 9. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -Eric >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Jonathan Ledlie wrote: >>>>>>>>>>>>>>>>> Eric, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Glad to hear it. Let Peter and me know if you publish >>>>>>>>>>>>>>>>> anything using >>>>>>>>>>>>>>>>> Pyxida and we can put it on the web page. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "Gossip" nodes are for learning about the network. >>>>>>>>>>>>>>>>> "Neighbors" are >>>>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>>> constructing the coordinate. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> So, IIRC, the gossip process learns about nodes that >>>>>>>>>>>>>>>>> may or >>>>>>>>>>>>>>>>> may not >>>>>>>>>>>>>>>>> be >>>>>>>>>>>>>>>>> applied to the neighbor list. That is, we may eventually >>>>>>>>>>>>>>>>> learn >>>>>>>>>>>>>>>>> about a >>>>>>>>>>>>>>>>> large portion of the network, via gossiping, but should >>>>>>>>>>>>>>>>> only >>>>>>>>>>>>>>>>> be using >>>>>>>>>>>>>>>>> MAX_NEIGHBORS at any given time for coordinate >>>>>>>>>>>>>>>>> computation. 
>>>>>>>>>>>>>>>>> The >>>>>>>>>>>>>>>>> reason >>>>>>>>>>>>>>>>> we don't want to use a fixed set of neighbors for >>>>>>>>>>>>>>>>> coordinate >>>>>>>>>>>>>>>>> computation >>>>>>>>>>>>>>>>> is because nodes may come and go from the system. >>>>>>>>>>>>>>>>> However, >>>>>>>>>>>>>>>>> it should >>>>>>>>>>>>>>>>> not be the case that we just rotate through everybody, >>>>>>>>>>>>>>>>> because >>>>>>>>>>>>>>>>> then, as >>>>>>>>>>>>>>>>> you note, we'd be computing against everyone's >>>>>>>>>>>>>>>>> coordinate/ >>>>>>>>>>>>>>>>> latency, >>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>> this is counter to the point of network coordinates. >>>>>>>>>>>>>>>>> With this >>>>>>>>>>>>>>>>> gradual >>>>>>>>>>>>>>>>> transience as a desirable quality in mind, I am pretty >>>>>>>>>>>>>>>>> sure we >>>>>>>>>>>>>>>>> maintain >>>>>>>>>>>>>>>>> neighbors in a fairly fixed manner. Does this not seem >>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>> be the >>>>>>>>>>>>>>>>> case >>>>>>>>>>>>>>>>> to you? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -jonathan >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Oct 29, 2008, at 9:20 PM, Eric Chan-Tin wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hi Jonathan and Peter, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> First of all, just wanted to say that Pyxida has been >>>>>>>>>>>>>>>>>> working >>>>>>>>>>>>>>>>>> great on >>>>>>>>>>>>>>>>>> Planetlab for the past 4 months or so. Getting good data >>>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>> experiments >>>>>>>>>>>>>>>>>> going well. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> However, I do have one question/comment. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Every Pyxida node contains a list of Neighbors (in >>>>>>>>>>>>>>>>>> NCClient) and a >>>>>>>>>>>>>>>>>> list >>>>>>>>>>>>>>>>>> of Gossip nodes (they are called neighbors in NCManager >>>>>>>>>>>>>>>>>> but >>>>>>>>>>>>>>>>>> I will >>>>>>>>>>>>>>>>>> call >>>>>>>>>>>>>>>>>> them gossip for now so as not to confuse with the 'real' >>>>>>>>>>>>>>>>>> neighbors >>>>>>>>>>>>>>>>>> in >>>>>>>>>>>>>>>>>> NCClient). Every 10 seconds, Pyxida will pick a random >>>>>>>>>>>>>>>>>> node >>>>>>>>>>>>>>>>>> (call >>>>>>>>>>>>>>>>>> it A) >>>>>>>>>>>>>>>>>> from Gossip and send a gossip request (basically asking >>>>>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>>>> rtt, >>>>>>>>>>>>>>>>>> coordinate, error). But Gossip contains all the nodes in >>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>> network. >>>>>>>>>>>>>>>>>> The comment in the code said that was to avoid Pyxida >>>>>>>>>>>>>>>>>> becoming >>>>>>>>>>>>>>>>>> lonely >>>>>>>>>>>>>>>>>> and no-one to gossip with, which makes sense. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Once A responds, Pyxida will add A to its Neighbors (if >>>>>>>>>>>>>>>>>> Neighbors >>>>>>>>>>>>>>>>>> does >>>>>>>>>>>>>>>>>> not contain A) and remove the first element from the >>>>>>>>>>>>>>>>>> Neighbors list. >>>>>>>>>>>>>>>>>> This seems a little bit odd. Was there any particular >>>>>>>>>>>>>>>>>> reasoning >>>>>>>>>>>>>>>>>> behind >>>>>>>>>>>>>>>>>> this? Because it would seem like you're getting the >>>>>>>>>>>>>>>>>> coordinate from >>>>>>>>>>>>>>>>>> every node in the network eventually. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks a lot, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> -Eric >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Jonathan Ledlie wrote: >>>>>>>>>>>>>>>>>>> Your inclination is correct. See forwarded message. 
>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> -jonathan >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Apr 9, 2008, at 1:53 PM, ext Eric Chan-Tin wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Thank you for the clarification. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I've been reading through your published papers and >>>>>>>>>>>>>>>>>>>> code >>>>>>>>>>>>>>>>>>>> and I >>>>>>>>>>>>>>>>>>>> must >>>>>>>>>>>>>>>>>>>> say I learned a lot about how Pyxida works :) >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I still have a quick and easy (hopefully) question. >>>>>>>>>>>>>>>>>>>> Each >>>>>>>>>>>>>>>>>>>> Pyxida >>>>>>>>>>>>>>>>>>>> node >>>>>>>>>>>>>>>>>>>> has a small amount of neighbors and learn about other >>>>>>>>>>>>>>>>>>>> nodes >>>>>>>>>>>>>>>>>>>> through >>>>>>>>>>>>>>>>>>>> other nodes. I am running on about 200 machines, and >>>>>>>>>>>>>>>>>>>> it >>>>>>>>>>>>>>>>>>>> seems >>>>>>>>>>>>>>>>>>>> like all >>>>>>>>>>>>>>>>>>>> the Pyxida nodes are adding each other to their >>>>>>>>>>>>>>>>>>>> upNeighbors set. >>>>>>>>>>>>>>>>>>>> NCClient.java has the MAX_NEIGHBORS set to 512. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I wanted to make sure that I am not misreading the >>>>>>>>>>>>>>>>>>>> code, but >>>>>>>>>>>>>>>>>>>> shouldn't >>>>>>>>>>>>>>>>>>>> each Pyxida node keep a small number of neighbors in >>>>>>>>>>>>>>>>>>>> its >>>>>>>>>>>>>>>>>>>> upNeighbors >>>>>>>>>>>>>>>>>>>> set. Else, it will be a lot of updates to send a >>>>>>>>>>>>>>>>>>>> gossipMessage >>>>>>>>>>>>>>>>>>>> to all >>>>>>>>>>>>>>>>>>>> 200 neighbors, and it also means that a Pyxida node >>>>>>>>>>>>>>>>>>>> has >>>>>>>>>>>>>>>>>>>> the whole >>>>>>>>>>>>>>>>>>>> network in its upNeighbors set - that seems wrong. I >>>>>>>>>>>>>>>>>>>> am >>>>>>>>>>>>>>>>>>>> inclined >>>>>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>>> change the value of MAX_NEIGHBORS but wanted to make >>>>>>>>>>>>>>>>>>>> sure >>>>>>>>>>>>>>>>>>>> that my >>>>>>>>>>>>>>>>>>>> understanding is correct first. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Thank you, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Eric >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> ------------------------------------------------------------------------- >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> This SF.Net email is sponsored by the Moblin Your Move >>>>>>>>>>>>>>>>>> Developer's >>>>>>>>>>>>>>>>>> challenge Build the coolest Linux based applications >>>>>>>>>>>>>>>>>> with >>>>>>>>>>>>>>>>>> Moblin >>>>>>>>>>>>>>>>>> SDK & >>>>>>>>>>>>>>>>>> win great prizes Grand prize is a trip for two to an >>>>>>>>>>>>>>>>>> Open >>>>>>>>>>>>>>>>>> Source >>>>>>>>>>>>>>>>>> event >>>>>>>>>>>>>>>>>> anywhere in the world >>>>>>>>>>>>>>>>>> http://moblin-contest.org/redirect.php?banner_id=100&url=/ >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>>>> Pyxida-users mailing >>>>>>>>>>>>>>>>>> list Pyx...@li... 
>>>>>>>>>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/pyxida- >>>>>>>>>>>>>>>>>> users >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> ------------------------------------------------------------------------- >>>>>>>>>>>>>> >>>>>>>>>>>>>> This SF.Net email is sponsored by the Moblin Your Move >>>>>>>>>>>>>> Developer's challenge Build the coolest Linux based >>>>>>>>>>>>>> applications with Moblin SDK & win great prizes Grand >>>>>>>>>>>>>> prize is >>>>>>>>>>>>>> a trip for two to an Open Source event anywhere in the world >>>>>>>>>>>>>> http://moblin-contest.org/redirect.php?banner_id=100&url=/ >>>>>>>>>>>>>> _______________________________________________ Pyxida-users >>>>>>>>>>>>>> mailing list Pyx...@li... >>>>>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/pyxida-users >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> ------------------------------------------------------------------------------ >>>>>>>>>>>> >>>>>>>>>>>> SF.Net email is Sponsored by MIX09, March 18-20, 2009 in Las >>>>>>>>>>>> Vegas, Nevada. The future of the web can't happen without you. >>>>>>>>>>>> Join us at MIX09 to help pave the way to the Next Web now. >>>>>>>>>>>> Learn >>>>>>>>>>>> more and register at >>>>>>>>>>>> http://ad.doubleclick.net/clk;208669438;13503038;i?http://2009.visitmix.com/ >>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ Pyxida-users >>>>>>>>>>>> mailing list Pyx...@li... >>>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/pyxida-users >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>> >>>> >>>> >>>> >>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> SF.Net email is Sponsored by MIX09, March 18-20, 2009 in Las Vegas, >>>> Nevada. The future of the web can't happen without you. Join us at >>>> MIX09 >>>> to help pave the way to the Next Web now. Learn more and register at >>>> http://ad.doubleclick.net/clk;208669438;13503038;i?http://2009.visitmix.com/ >>>> >>>> _______________________________________________ Pyxida-users mailing >>>> list Pyx...@li... >>>> https://lists.sourceforge.net/lists/listinfo/pyxida-users >>> >> >> >> >> ------------------------------------------------------------------------------ >> >> SF.Net email is Sponsored by MIX09, March 18-20, 2009 in Las Vegas, >> Nevada. >> The future of the web can't happen without you. Join us at MIX09 to help >> pave the way to the Next Web now. Learn more and register at >> http://ad.doubleclick.net/clk;208669438;13503038;i?http://2009.visitmix.com/ >> >> _______________________________________________ >> Pyxida-users mailing list >> Pyx...@li... >> https://lists.sourceforge.net/lists/listinfo/pyxida-users |
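For reference, the age-weighted pick described in the quoted thread, with weight roughly (currentTime - lastUpdateTime) / 15 minutes, amounts to a roulette-wheel selection over the neighbors. A minimal sketch follows with assumed names; the actual Fix1/Fix2 code is in the NCClient.java that Eric posted.

    import java.util.List;
    import java.util.Random;

    // Pick a neighbor with probability proportional to how long ago it was last
    // updated, so stale neighbors tend to be re-pinged sooner, but no neighbor
    // is ever selected deterministically.
    static <T> RemoteState<T> pickWeighted(List<RemoteState<T>> neighbors,
                                           long now, Random random) {
        if (neighbors.isEmpty()) {
            return null;
        }
        final double FIFTEEN_MINUTES = 15 * 60 * 1000.0;
        double[] weights = new double[neighbors.size()];
        double total = 0;
        for (int i = 0; i < neighbors.size(); i++) {
            weights[i] = (now - neighbors.get(i).getLastUpdateTime()) / FIFTEEN_MINUTES;
            total += weights[i];
        }
        if (total <= 0) {                          // everyone updated just now: fall back to uniform
            return neighbors.get(random.nextInt(neighbors.size()));
        }
        double r = random.nextDouble() * total;    // roulette-wheel selection
        for (int i = 0; i < weights.length; i++) {
            r -= weights[i];
            if (r <= 0) {
                return neighbors.get(i);
            }
        }
        return neighbors.get(neighbors.size() - 1);
    }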
From: Jonathan L. <jon...@ee...> - 2008-12-19 01:23:12
|
Eric, I'm getting an error (an old favorite): WA 1229649353501 NetUtil : Could not deserialize java object: java.io.StreamCorruptedException: invalid type code: CC WA 1229649353501 ObjComm$ReadConnHa: No valid msgObject received from remoteAddr=planet1.scs.stanford.edu:26500. Ignoring. What java version are you running on PL? -jonathan On Dec 18, 2008, at 4:49 PM, Eric Chan-Tin wrote: > Do you mean the Pyxida code running on PlanetLab? I killed them after > running it overnight. I am currently restarting the experiments. > Try again > in about 15 minutes and it should be ready. > > Fix1 is on port 28500 (and api.port 28501). Fix2 is on port 26500 > (api.port > 26501). The node you can contact is "planet1.scs.stanford.edu" or > "planetlab01.cs.washington.edu". > > About your other email: Yes I know. The fix was to go through the list > again, but I didn't want to implement that since it seems to be > working > fine. So go through the list once to calculate weight, and add > neighbors > into another list (sorted by weights). Then go through that new > list. The > new list will be sorted by higher weights, so older neighbors will be > "tested" first. If you want it that way, I can do that. However, I > do not > think we need to. Let's assume iteration 0, the first node gets > picked. At > iteration 1, that first node will have a very small probability of > getting > picked again. > > -Eric > > > On Dec 18 2008, Jonathan Ledlie wrote: > >> >> Eric, >> >> Do you have a PL Pyxida slice that I can connect to to try this out? >> >> That is, I'll run it on my machine, but connect to your slice via the >> right ports. >> >> -jonathan >> >> On Dec 18, 2008, at 11:35 AM, Eric Chan-Tin wrote: >> >>> Sorry for the delay. Was busy with final exams and projects. >>> >>> I made the changes. They can be found at >>> >>> http://www-users.cs.umn.edu/~dchantin/NCClient.java >>> http://www-users.cs.umn.edu/~dchantin/NCManager.java >>> http://www-users.cs.umn.edu/~dchantin/RemoteState.java >>> >>> All the changes are enclosed within >>> //ERIC START >>> ... >>> //ERIC END >>> >>> I compared every reported "neighbors list" of each Pyxida node >>> (about 370 >>> total) and see how many neighbors did not change within 10 minutes. >>> Then >>> did the average for all the nodes. >>> >>> Original: 43.37% of the neighbors stayed the same on average. >>> >>> Fix0 (before we found the interesting situation): 74.73% >>> >>> Fix1 (with the weight probability of picking a neighbor to ping): >>> 95.79% >>> >>> Fix2 (with the weight probability and the MIN_UPDATE_PING): 98.81% >>> >>> Also, the reported error, rrl, etc.. were much lower in the Fixed >>> Pyxida >>> than in the original one (this could not be significant, since the >>> Original >>> experiment was performed 1-2 months ago), but as expected, it was >>> not >>> higher. >>> >>> You can take a look at the code and test it if you want. If >>> everything >>> looks fine, I'll remove some of the debugging code and other >>> "useless" >>> comments (like //ERIC). Also, let me know which "fix" you prefer. I >>> will >>> then email you the updated code. >>> >>> Thanks, >>> >>> -Eric >>> >>> >>> On Dec 10 2008, Jonathan Ledlie wrote: >>> >>>> >>>> Seems reasonable. >>>> >>>> Both ideas (lastPingTime and a weighted probability based on age) >>>> seem >>>> to be independent. 
>>>> >>>> Some of the code is more CPU intensive than it should be (e.g., >>>> looping through potentially long lists), and that is a slight >>>> downside >>>> to the function approach vs a set, but this probably isn't a big >>>> deal. >>>> >>>> -jonathan >>>> >>>> On Dec 10, 2008, at 10:40 PM, Eric Chan-Tin wrote: >>>> >>>>> okay. I think I might have misunderstood you. Your idea of >>>>> lastPingTime >>>>> looks interesting. >>>>> >>>>> My idea is to introduce a probability function. Right now, it just >>>>> randomly picks one of the neighbors. What I am thinking it might >>>>> do is >>>>> that instead of just randomly picking a neighbor, pick a neighbor >>>>> based >>>>> on some function. So, the neighbor whose lastUpdateTime is oldest >>>>> will >>>>> have a higher chance of being picked than a neighbor who was just >>>>> updated. >>>>> >>>>> Does that make more sense? >>>>> >>>>> -Eric >>>>> >>>>> Jonathan Ledlie wrote: >>>>>> >>>>>> Hmm. The advantage of lastPingTime is that you learn whether or >>>>>> not >>>>>> you've tried to contact a neighbor and they haven't responded. >>>>>> Then you >>>>>> can use that knowledge to determine whether or not to toss them. >>>>>> >>>>>> Maybe I'm not understanding your suggestion then. >>>>>> >>>>>> -jonathan >>>>>> >>>>>> On Dec 10, 2008, at 12:51 PM, Eric Chan-Tin wrote: >>>>>> >>>>>>> That's a good idea. Here is a little twist on it. >>>>>>> >>>>>>> Put a "weight" on each neighbor, based on the lastUpdateTime. So >>>>>>> if >>>>>>> currentTime - neighbor's lastUpdateTime, then the weight is >>>>>>> higher. >>>>>>> Then in getNeighborToPing, neighbor's with higher weight have a >>>>>>> higher >>>>>>> probability of getting picked than neighbor's with lower weight. >>>>>>> >>>>>>> The weight could be ( (currentTime - lastUpdateTime) / 15 >>>>>>> minutes). >>>>>>> It's not 10 minutes so as to avoid a probability of 1 if the >>>>>>> lastUpdateTime was the previous performMaintenance. This should >>>>>>> also >>>>>>> avoid the problem of always pinging the oldest neighbor since >>>>>>> it's >>>>>>> about probability (weight). >>>>>>> >>>>>>> It seems a little bit less complicated than having lastPingTime >>>>>>> and >>>>>>> MIN_UPDATE_TIME. >>>>>>> >>>>>>> -Eric >>>>>>> >>>>>>> On Dec 10 2008, Jonathan Ledlie wrote: >>>>>>> >>>>>>>> >>>>>>>> This is an interesting situation. Let's think about the goal. >>>>>>>> The >>>>>>>> goal is to keep pinging the same nodes as long as they are >>>>>>>> alive, >>>>>>>> and to replace a node soon after it dies, keeping the total >>>>>>>> number >>>>>>>> of up neighbors at approximately MAX_NEIGHBORS. >>>>>>>> >>>>>>>> How about the following: >>>>>>>> - when you pick a neighbor to ping, you set a time on that >>>>>>>> event >>>>>>>> (in >>>>>>>> a new per-neighbor timestamp LastPingTime) >>>>>>>> - when you first insert a neighbor, set this timestamp to 0. >>>>>>>> - when you loop in performMaintenance, remove nodes whose >>>>>>>> LastPingTime - LastUpdateTime > threshold (like 10 minutes) >>>>>>>> >>>>>>>> - when picking a node, pick the guy with the oldest >>>>>>>> LastPingTime >>>>>>>> (not sure about this one -- I don't like situations where you >>>>>>>> pick >>>>>>>> the absolute *oldest* because then we could get in a situation >>>>>>>> where >>>>>>>> an old guy sits there, waiting to be kicked out, and we keep >>>>>>>> picking >>>>>>>> him to be pinged; I'd prefer to pick randomly, or random older >>>>>>>> guys). 
>>>>>>>> - a simple way around this is to simply have a self-adjusting >>>>>>>> threshold: so, we walk through the nodes, and everyone whose >>>>>>>> LastPingTime is older than MIN_UPDATE_TIME_TO_PING ago (a >>>>>>>> relative >>>>>>>> time) get added to a set; we pick a guy randomly from the >>>>>>>> set to >>>>>>>> ping. If the set is empty, we increase >>>>>>>> MIN_UPDATE_TIME_TO_PING, >>>>>>>> and >>>>>>>> wait for the next time the function is called. If the set >>>>>>>> is too >>>>>>>> large (relative to a fraction of MAX_NEIGHBORS), we decrease >>>>>>>> it. >>>>>>>> >>>>>>>> We still might get some churn from lost UDP packets. You might >>>>>>>> also >>>>>>>> give nodes a second chance to be pinged by including >>>>>>>> LastPingTime - >>>>>>>> LastUpdateTime > 1 minute in the set. Note that increases and >>>>>>>> decreases in MIN_UPDATE_TIME_TO_PING shouldn't be affected >>>>>>>> by the >>>>>>>> number of nodes that fall into this category. >>>>>>>> >>>>>>>> We still probably need a way for new potential neighbors to be >>>>>>>> rejected if our current set of neighbors is fine. Because I >>>>>>>> still >>>>>>>> want addNeigbor to really add the neighbor, how about putting a >>>>>>>> check before calling that to see if we need more neighbors. >>>>>>>> >>>>>>>> What thinks? >>>>>>>> >>>>>>>> -jonathan >>>>>>>> >>>>>>>> On Dec 9, 2008, at 11:37 PM, ext Eric Chan-Tin wrote: >>>>>>>> >>>>>>>>> Peter, >>>>>>>>> >>>>>>>>> Thanks. That fixed it. >>>>>>>>> >>>>>>>>> However, here is another problem I found while deploying >>>>>>>>> Pyxida >>>>>>>>> and >>>>>>>>> running them for a while. >>>>>>>>> >>>>>>>>> In performMaintenance(), I loop through every neighbor and if >>>>>>>>> (neighbor.getLastUpdateTime() < current_time - 10 minutes), >>>>>>>>> then I >>>>>>>>> remove that neighbor. >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> In NCManager, I call getNeighborToPing, which randomly >>>>>>>>> picks one >>>>>>>>> of >>>>>>>>> the >>>>>>>>> neighbors (32 in this case). Pinging is performed every 10 >>>>>>>>> seconds, so >>>>>>>>> in 10 minutes, you can ping 60 nodes. However, if a neighbor A >>>>>>>>> is not >>>>>>>>> among those 60 nodes, it will get kicked out in >>>>>>>>> performMaintenance(), >>>>>>>>> although A is still alive (it just hasn't been contacted yet). >>>>>>>>> >>>>>>>>> There are some possible options. >>>>>>>>> >>>>>>>>> 1) Increase the timeout period from 10 minutes to ?? minutes. >>>>>>>>> This >>>>>>>>> doesn't quite work because say in Azureus, the number of >>>>>>>>> neighbors is >>>>>>>>> 512, so the timeout needs to be increased even more... It >>>>>>>>> could >>>>>>>>> be a >>>>>>>>> function of MAX_NEIGHBORS... >>>>>>>>> >>>>>>>>> 2) When Pyxida replies to a ping request, it will update the >>>>>>>>> neighbor's >>>>>>>>> lastUpdateTime. This does not quite work either because it >>>>>>>>> assumes a >>>>>>>>> symmetric relationship which is most likely not the case. >>>>>>>>> >>>>>>>>> 3) Introduce some more logic to Pyxida so it remembers what >>>>>>>>> nodes it >>>>>>>>> contacted and does not send, say, more than 2 requests in 10 >>>>>>>>> minutes to >>>>>>>>> the same node. This solution isn't that great either. >>>>>>>>> >>>>>>>>> Let me know what you guys think. I am currently leaning >>>>>>>>> towards >>>>>>>>> option >>>>>>>>> 1) but make the timeout period a function of MAX_NEIGHBORS, >>>>>>>>> although I >>>>>>>>> haven't quite found the perfect function yet. >>>>>>>>> >>>>>>>>> Here are some preliminary results. 
I compared the number of >>>>>>>>> neighbors >>>>>>>>> which are the same at each reported time (10 minutes). Before >>>>>>>>> the fix, >>>>>>>>> more than 50% of the neighbors changed within 10 minutes. >>>>>>>>> After >>>>>>>>> the >>>>>>>>> fix, >>>>>>>>> only about 30% of the neighbors changed within 10 minutes. >>>>>>>>> It is >>>>>>>>> more >>>>>>>>> stable but I think 30% is still quite a big number. >>>>>>>>> >>>>>>>>> Sorry for the long email :) >>>>>>>>> >>>>>>>>> -Eric >>>>>>>>> >>>>>>>>> Peter Pietzuch wrote: >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> Eric Chan-Tin wrote: >>>>>>>>>>> for (RemoteState<T> neighbor : neighbors) { //this is line >>>>>>>>>>> 1029 >>>>>>>>>>> if (neighbor.getLastUpdateTime() < (curr_time - >>>>>>>>>>> MAINTENANCE_PERIOD)) { >>>>>>>>>>> neighbors.remove(neighbor); >>>>>>>>>>> } >>>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> IIRC you can't modify a collection while iterating over it >>>>>>>>>> using the >>>>>>>>>> above for loop. Instead you have to use a proper >>>>>>>>>> Iterator<RemoteState<T>> and use it's remove method. >>>>>>>>>> >>>>>>>>>> Cheers, >>>>>>>>>> Peter >>>>>>>>>> >>>>>>>>>>> Jonathan Ledlie wrote: >>>>>>>>>>>> On Nov 3, 2008, at 3:46 PM, Eric Chan-Tin wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Jonathan, >>>>>>>>>>>>> >>>>>>>>>>>>> I can insert some code to fix this and it doesn't look too >>>>>>>>>>>>> hard >>>>>>>>>>>>> but I do >>>>>>>>>>>>> have some concerns. >>>>>>>>>>>>> >>>>>>>>>>>>>> I don't remember observing this behavior in our PL >>>>>>>>>>>>>> deployment, but that could have been because we had the >>>>>>>>>>>>>> ping >>>>>>>>>>>>>> time set >>>>>>>>>>>>>> to 1-2 minutes, IIRC. >>>>>>>>>>>>> The default ping time was set to 10 seconds + some >>>>>>>>>>>>> Gaussian >>>>>>>>>>>>> distribution, >>>>>>>>>>>>> so that's what I used, but changing the ping time to 1 >>>>>>>>>>>>> minute >>>>>>>>>>>>> or more >>>>>>>>>>>>> will >>>>>>>>>>>>> result in the same effect except it will take longer >>>>>>>>>>>>> for the >>>>>>>>>>>>> "cycle" to >>>>>>>>>>>>> occur. >>>>>>>>>>>>> >>>>>>>>>>>>> What I have in mind was the following: >>>>>>>>>>>>> >>>>>>>>>>>>> 1) In NCClient.addNeighbor() would be changed to (in >>>>>>>>>>>>> pseudocode) >>>>>>>>>>>>> protected boolean addNeighbor(RemoteState<T> guy) { >>>>>>>>>>>>> boolean added = false; >>>>>>>>>>>>> //if neighbors.size() >= MAX_NEIGHBORS >>>>>>>>>>>>> //Check last update time of each neighbor in >>>>>>>>>>>>> neighbors >>>>>>>>>>>>> //neighbor.getLastUpdateTime() >>>>>>>>>>>>> //if any of them is older than some number >>>>>>>>>>>>> TIMEOUT, >>>>>>>>>>>>> remove the oldest one, then add "guy" to neighbors >>>>>>>>>>>>> //else don't add "guy" to neighbors >>>>>>>>>>>>> //else add "guy" to neighbors >>>>>>>>>>>>> } >>>>>>>>>>>>> >>>>>>>>>>>> IMO this shouldn't be the behavior of this function because >>>>>>>>>>>> it >>>>>>>>>>>> is called >>>>>>>>>>>> "addNeighbor." Instead what could happen is that part >>>>>>>>>>>> of the >>>>>>>>>>>> performMaintenance() function could remove old neighbors. >>>>>>>>>>>> >>>>>>>>>>>>> 2) In NCManager, getGossipNode() would just pick a random >>>>>>>>>>>>> node >>>>>>>>>>>>> from >>>>>>>>>>>>> localNC.primaryNC.getNeighbors() to send a request to >>>>>>>>>>>>> every 10 >>>>>>>>>>>>> seconds. >>>>>>>>>>>>> >>>>>>>>>>>>> 3) Pyxida will still keep track of all GossipNodes in the >>>>>>>>>>>>> network. 
It >>>>>>>>>>>>> could >>>>>>>>>>>>> also (say every 10 minutes) check its NCClient.neighbors >>>>>>>>>>>>> list >>>>>>>>>>>>> and see if >>>>>>>>>>>>> any neighbor is older than TIMEOUT, remove it, then pick a >>>>>>>>>>>>> random >>>>>>>>>>>>> GossipNode to send a ping request. >>>>>>>>>>>>> >>>>>>>>>>>>> The main problem with this scheme is that one Pyxida node >>>>>>>>>>>>> would >>>>>>>>>>>>> most >>>>>>>>>>>>> likely >>>>>>>>>>>>> stay with the same set of neighbors all the time. Its >>>>>>>>>>>>> coordinates >>>>>>>>>>>>> would be >>>>>>>>>>>>> greatly affected by those 32 nodes (I am assuming >>>>>>>>>>>>> MAX_NEIGHBORS >>>>>>>>>>>>> = 32). >>>>>>>>>>>>> >>>>>>>>>>>> Instead of (2), why not have the getNeighborToPing function >>>>>>>>>>>> determine >>>>>>>>>>>> the behavior? If it returns a node, then ping it. If it >>>>>>>>>>>> returns null, >>>>>>>>>>>> then have the NCManager pick some other node to use? If >>>>>>>>>>>> things are >>>>>>>>>>>> hunky-dory, getNeighborToPing() would return a node (a >>>>>>>>>>>> current >>>>>>>>>>>> member of >>>>>>>>>>>> the neighbor set), if we're short on neighbors, then it >>>>>>>>>>>> would >>>>>>>>>>>> return >>>>>>>>>>>> null. The periodic maintenance would have already >>>>>>>>>>>> tossed old >>>>>>>>>>>> nodes, so >>>>>>>>>>>> this would be self-sustaining. >>>>>>>>>>>> >>>>>>>>>>>> The "problem" you mention is fairly central to network >>>>>>>>>>>> coordinates in >>>>>>>>>>>> general. Sure the coordinates will be a little better the >>>>>>>>>>>> more >>>>>>>>>>>> neighbors you have. But in a very large system talking to >>>>>>>>>>>> everybody >>>>>>>>>>>> isn't feasible. Peter and I have a graph in one of the >>>>>>>>>>>> workshop >>>>>>>>>>>> papers >>>>>>>>>>>> which shows the effect of increasing the number of >>>>>>>>>>>> neighbors >>>>>>>>>>>> >>>>>>>>>>>>> I don't think the network would be partitioned, but it >>>>>>>>>>>>> could >>>>>>>>>>>>> happen. For >>>>>>>>>>>>> example, the boostrap nodes have each other in their >>>>>>>>>>>>> neighbors >>>>>>>>>>>>> list. >>>>>>>>>>>>> Every >>>>>>>>>>>>> new node bootstrapping to the network would contact those >>>>>>>>>>>>> bootstrapping >>>>>>>>>>>>> nodes first and add them to their neighbors. So every >>>>>>>>>>>>> Pyxida >>>>>>>>>>>>> node could >>>>>>>>>>>>> have as its neighbors the "Bootstrap nodes", which would >>>>>>>>>>>>> make >>>>>>>>>>>>> Pyxida >>>>>>>>>>>>> become >>>>>>>>>>>>> a centralized system. >>>>>>>>>>>>> >>>>>>>>>>>> As long as the initial bootstrap list is large enough, I >>>>>>>>>>>> think >>>>>>>>>>>> you are >>>>>>>>>>>> right, that partitions won't be a problem. >>>>>>>>>>>> >>>>>>>>>>>>> One last thing: When Pyxida receives a ping request from a >>>>>>>>>>>>> node, it does >>>>>>>>>>>>> not update its coordinates. Is there any particular reason >>>>>>>>>>>>> for >>>>>>>>>>>>> that? >>>>>>>>>>>>> Is it >>>>>>>>>>>>> because Pyxida will not be able to calculate the RTT? >>>>>>>>>>>>> >>>>>>>>>>>> Yes. >>>>>>>>>>>> >>>>>>>>>>>>> Also, there is no "gossiping" of other nodes' coordinates, >>>>>>>>>>>>> right? So if A >>>>>>>>>>>>> has B in its neighbors list, it will from time to time >>>>>>>>>>>>> tell >>>>>>>>>>>>> other nodes >>>>>>>>>>>>> about B's coordinates. I realize this has a lot of >>>>>>>>>>>>> complications but the >>>>>>>>>>>>> original Vivaldi paper was very fuzzy about this point and >>>>>>>>>>>>> the >>>>>>>>>>>>> previous >>>>>>>>>>>>> point that it seems it is up to the implementors to decide >>>>>>>>>>>>> what >>>>>>>>>>>>> to do. 
>>>>>>>>>>>>> >>>>>>>>>>>> I don't think we are doing this. It could be useful in >>>>>>>>>>>> certain >>>>>>>>>>>> circumstance (e.g., picking potential new neighbors), but >>>>>>>>>>>> we're >>>>>>>>>>>> not, IIRC. >>>>>>>>>>>> >>>>>>>>>>>>> I will have to think some more about this as there are >>>>>>>>>>>>> a lot >>>>>>>>>>>>> of >>>>>>>>>>>>> nasty >>>>>>>>>>>>> little side effects that could happen, but these are my >>>>>>>>>>>>> thoughts. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> >>>>>>>>>>>>> -Eric >>>>>>>>>>>>> >>>>>>>>>>>> Sounds good. >>>>>>>>>>>> >>>>>>>>>>>> -jonathan >>>>>>>>>>>> >>>>>>>>>>>>> On Nov 3 2008, Jonathan Ledlie wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Eric, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Based on a look at the code just now, I see where you are >>>>>>>>>>>>>> coming from: >>>>>>>>>>>>>> with these params on PL, there isn't anything stopping >>>>>>>>>>>>>> the >>>>>>>>>>>>>> neighbor >>>>>>>>>>>>>> list from circulating pretty quickly through all >>>>>>>>>>>>>> nodes. In >>>>>>>>>>>>>> the >>>>>>>>>>>>>> Azureus implementation, the guys fed to the alg (fed to >>>>>>>>>>>>>> processSample) >>>>>>>>>>>>>> weren't changing much because they were coming from the >>>>>>>>>>>>>> routing finger >>>>>>>>>>>>>> tables. I don't remember observing this behavior in >>>>>>>>>>>>>> our PL >>>>>>>>>>>>>> deployment, but that could have been because we had the >>>>>>>>>>>>>> ping >>>>>>>>>>>>>> time set >>>>>>>>>>>>>> to 1-2 minutes, IIRC. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Do you want to insert some code that will fix this? >>>>>>>>>>>>>> Basically >>>>>>>>>>>>>> we want >>>>>>>>>>>>>> to only add a new neighbor if our oldest neighbor >>>>>>>>>>>>>> should be >>>>>>>>>>>>>> tossed (or >>>>>>>>>>>>>> if we haven't reached MAX_NEIGHBORS). It's probably best >>>>>>>>>>>>>> from a >>>>>>>>>>>>>> clarity standpoint if addNeighbor does continue to >>>>>>>>>>>>>> actually >>>>>>>>>>>>>> add the >>>>>>>>>>>>>> neighbor in all cases. Probably the best thing to do >>>>>>>>>>>>>> is to >>>>>>>>>>>>>> have >>>>>>>>>>>>>> NCClient drop old neighbors internally and have NCManager >>>>>>>>>>>>>> ask >>>>>>>>>>>>>> it which >>>>>>>>>>>>>> nodes should be pinged via getNeighborToPing. So >>>>>>>>>>>>>> NCClient >>>>>>>>>>>>>> would >>>>>>>>>>>>>> continue to have the same behavior on processSample (it >>>>>>>>>>>>>> would not >>>>>>>>>>>>>> reject new information) and NCManager and NCClient would >>>>>>>>>>>>>> cooperate to >>>>>>>>>>>>>> measure RTTs to current neighbors. We do want to make >>>>>>>>>>>>>> sure, >>>>>>>>>>>>>> however, >>>>>>>>>>>>>> that the gossip set stays large enough so that partitions >>>>>>>>>>>>>> don't occur. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Do you want to be added to the SourceForge Pyxida >>>>>>>>>>>>>> developers >>>>>>>>>>>>>> to make >>>>>>>>>>>>>> this change? (Please test it out carefully first.) >>>>>>>>>>>>>> >>>>>>>>>>>>>> -jonathan >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Oct 30, 2008, at 11:32 AM, ext Eric Chan-Tin wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Jonathan, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> From what I saw, eventually, you learn about all the >>>>>>>>>>>>>>> nodes >>>>>>>>>>>>>>> in >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> network. Every 10 seconds, Pyxida picks a random >>>>>>>>>>>>>>> "Gossip" >>>>>>>>>>>>>>> node from >>>>>>>>>>>>>>> its >>>>>>>>>>>>>>> list and sends a request. 
The Gossip node will reply >>>>>>>>>>>>>>> back >>>>>>>>>>>>>>> with an >>>>>>>>>>>>>>> "upNeighbor", that is, a "Gossip" node that is "up" or >>>>>>>>>>>>>>> with 10% >>>>>>>>>>>>>>> probability, a node that is "pending". Since naturally, >>>>>>>>>>>>>>> all >>>>>>>>>>>>>>> the nodes >>>>>>>>>>>>>>> bootstrap off the same set of nodes, every node will >>>>>>>>>>>>>>> eventually know >>>>>>>>>>>>>>> of >>>>>>>>>>>>>>> every other node. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Pyxida might not be rotating through everybody >>>>>>>>>>>>>>> continuously >>>>>>>>>>>>>>> (although >>>>>>>>>>>>>>> eventually, since it knows everybody, it will have added >>>>>>>>>>>>>>> everybody to >>>>>>>>>>>>>>> its Neighbors list at some point), but could be >>>>>>>>>>>>>>> removing a >>>>>>>>>>>>>>> node that >>>>>>>>>>>>>>> is >>>>>>>>>>>>>>> "alive". So, if MAX_NEIGHBORS = 32, let's call them A_1, >>>>>>>>>>>>>>> A_2, >>>>>>>>>>>>>>> ..., >>>>>>>>>>>>>>> A_32. >>>>>>>>>>>>>>> Pyxida picks a random Gossip node B, sends it a request >>>>>>>>>>>>>>> and B >>>>>>>>>>>>>>> replies >>>>>>>>>>>>>>> back. Pyxida will add B to its Neighbors list, see that >>>>>>>>>>>>>>> size of >>>>>>>>>>>>>>> Neighbors > MAX_NEIGHBORS, kick A_1 out. Then from those >>>>>>>>>>>>>>> 32 >>>>>>>>>>>>>>> neighbors >>>>>>>>>>>>>>> A_2, ...., A_32, B, it will calculate the force, and >>>>>>>>>>>>>>> apply >>>>>>>>>>>>>>> it >>>>>>>>>>>>>>> to the >>>>>>>>>>>>>>> current system coordinates. Thus, if every 10 seconds, >>>>>>>>>>>>>>> Pyxida >>>>>>>>>>>>>>> picks a >>>>>>>>>>>>>>> different set of nodes C, D, E, etc... it would keep >>>>>>>>>>>>>>> kicking >>>>>>>>>>>>>>> nodes out >>>>>>>>>>>>>>> of the neighbors list. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> It's the "kicking A_1 out" part that seemed a bit >>>>>>>>>>>>>>> confusing. >>>>>>>>>>>>>>> Churn >>>>>>>>>>>>>>> is a >>>>>>>>>>>>>>> good reason but you don't check that A_1 might still be >>>>>>>>>>>>>>> alive >>>>>>>>>>>>>>> or maybe >>>>>>>>>>>>>>> A_1 was just added. I have been printing the neighbors >>>>>>>>>>>>>>> list >>>>>>>>>>>>>>> every 10 >>>>>>>>>>>>>>> minutes, and they are completely different between those >>>>>>>>>>>>>>> 10 >>>>>>>>>>>>>>> minutes. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> One possible problem might be that I am choosing the >>>>>>>>>>>>>>> value >>>>>>>>>>>>>>> of >>>>>>>>>>>>>>> MAX_NEIGHBORS to be too small. Originally it was set to >>>>>>>>>>>>>>> 512 >>>>>>>>>>>>>>> (which was >>>>>>>>>>>>>>> meant for Azureus). Since I am using Planetlab where >>>>>>>>>>>>>>> there >>>>>>>>>>>>>>> are about >>>>>>>>>>>>>>> 400-500 "stable" nodes, I changed the value to 32. But I >>>>>>>>>>>>>>> don't think >>>>>>>>>>>>>>> that should be a problem since log(n) = log(512) = 9. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -Eric >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Jonathan Ledlie wrote: >>>>>>>>>>>>>>>> Eric, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Glad to hear it. Let Peter and me know if you publish >>>>>>>>>>>>>>>> anything using >>>>>>>>>>>>>>>> Pyxida and we can put it on the web page. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> "Gossip" nodes are for learning about the network. >>>>>>>>>>>>>>>> "Neighbors" are >>>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>> constructing the coordinate. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> So, IIRC, the gossip process learns about nodes that >>>>>>>>>>>>>>>> may or >>>>>>>>>>>>>>>> may not >>>>>>>>>>>>>>>> be >>>>>>>>>>>>>>>> applied to the neighbor list. 
That is, we may >>>>>>>>>>>>>>>> eventually >>>>>>>>>>>>>>>> learn >>>>>>>>>>>>>>>> about a >>>>>>>>>>>>>>>> large portion of the network, via gossiping, but should >>>>>>>>>>>>>>>> only >>>>>>>>>>>>>>>> be using >>>>>>>>>>>>>>>> MAX_NEIGHBORS at any given time for coordinate >>>>>>>>>>>>>>>> computation. >>>>>>>>>>>>>>>> The >>>>>>>>>>>>>>>> reason >>>>>>>>>>>>>>>> we don't want to use a fixed set of neighbors for >>>>>>>>>>>>>>>> coordinate >>>>>>>>>>>>>>>> computation >>>>>>>>>>>>>>>> is because nodes may come and go from the system. >>>>>>>>>>>>>>>> However, >>>>>>>>>>>>>>>> it should >>>>>>>>>>>>>>>> not be the case that we just rotate through everybody, >>>>>>>>>>>>>>>> because >>>>>>>>>>>>>>>> then, as >>>>>>>>>>>>>>>> you note, we'd be computing against everyone's >>>>>>>>>>>>>>>> coordinate/ >>>>>>>>>>>>>>>> latency, >>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>> this is counter to the point of network coordinates. >>>>>>>>>>>>>>>> With this >>>>>>>>>>>>>>>> gradual >>>>>>>>>>>>>>>> transience as a desirable quality in mind, I am pretty >>>>>>>>>>>>>>>> sure we >>>>>>>>>>>>>>>> maintain >>>>>>>>>>>>>>>> neighbors in a fairly fixed manner. Does this not seem >>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>> be the >>>>>>>>>>>>>>>> case >>>>>>>>>>>>>>>> to you? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -jonathan >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Oct 29, 2008, at 9:20 PM, Eric Chan-Tin wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hi Jonathan and Peter, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> First of all, just wanted to say that Pyxida has been >>>>>>>>>>>>>>>>> working >>>>>>>>>>>>>>>>> great on >>>>>>>>>>>>>>>>> Planetlab for the past 4 months or so. Getting good >>>>>>>>>>>>>>>>> data >>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>> experiments >>>>>>>>>>>>>>>>> going well. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> However, I do have one question/comment. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Every Pyxida node contains a list of Neighbors (in >>>>>>>>>>>>>>>>> NCClient) and a >>>>>>>>>>>>>>>>> list >>>>>>>>>>>>>>>>> of Gossip nodes (they are called neighbors in >>>>>>>>>>>>>>>>> NCManager >>>>>>>>>>>>>>>>> but >>>>>>>>>>>>>>>>> I will >>>>>>>>>>>>>>>>> call >>>>>>>>>>>>>>>>> them gossip for now so as not to confuse with the >>>>>>>>>>>>>>>>> 'real' >>>>>>>>>>>>>>>>> neighbors >>>>>>>>>>>>>>>>> in >>>>>>>>>>>>>>>>> NCClient). Every 10 seconds, Pyxida will pick a random >>>>>>>>>>>>>>>>> node >>>>>>>>>>>>>>>>> (call >>>>>>>>>>>>>>>>> it A) >>>>>>>>>>>>>>>>> from Gossip and send a gossip request (basically >>>>>>>>>>>>>>>>> asking >>>>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>>> rtt, >>>>>>>>>>>>>>>>> coordinate, error). But Gossip contains all the >>>>>>>>>>>>>>>>> nodes in >>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>> network. >>>>>>>>>>>>>>>>> The comment in the code said that was to avoid Pyxida >>>>>>>>>>>>>>>>> becoming >>>>>>>>>>>>>>>>> lonely >>>>>>>>>>>>>>>>> and no-one to gossip with, which makes sense. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Once A responds, Pyxida will add A to its Neighbors >>>>>>>>>>>>>>>>> (if >>>>>>>>>>>>>>>>> Neighbors >>>>>>>>>>>>>>>>> does >>>>>>>>>>>>>>>>> not contain A) and remove the first element from the >>>>>>>>>>>>>>>>> Neighbors list. >>>>>>>>>>>>>>>>> This seems a little bit odd. Was there any particular >>>>>>>>>>>>>>>>> reasoning >>>>>>>>>>>>>>>>> behind >>>>>>>>>>>>>>>>> this? Because it would seem like you're getting the >>>>>>>>>>>>>>>>> coordinate from >>>>>>>>>>>>>>>>> every node in the network eventually. 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks a lot, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -Eric >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Jonathan Ledlie wrote: >>>>>>>>>>>>>>>>>> Your inclination is correct. See forwarded message. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> -jonathan >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Apr 9, 2008, at 1:53 PM, ext Eric Chan-Tin wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thank you for the clarification. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I've been reading through your published papers and >>>>>>>>>>>>>>>>>>> code >>>>>>>>>>>>>>>>>>> and I >>>>>>>>>>>>>>>>>>> must >>>>>>>>>>>>>>>>>>> say I learned a lot about how Pyxida works :) >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I still have a quick and easy (hopefully) question. >>>>>>>>>>>>>>>>>>> Each >>>>>>>>>>>>>>>>>>> Pyxida >>>>>>>>>>>>>>>>>>> node >>>>>>>>>>>>>>>>>>> has a small amount of neighbors and learn about >>>>>>>>>>>>>>>>>>> other >>>>>>>>>>>>>>>>>>> nodes >>>>>>>>>>>>>>>>>>> through >>>>>>>>>>>>>>>>>>> other nodes. I am running on about 200 machines, and >>>>>>>>>>>>>>>>>>> it >>>>>>>>>>>>>>>>>>> seems >>>>>>>>>>>>>>>>>>> like all >>>>>>>>>>>>>>>>>>> the Pyxida nodes are adding each other to their >>>>>>>>>>>>>>>>>>> upNeighbors set. >>>>>>>>>>>>>>>>>>> NCClient.java has the MAX_NEIGHBORS set to 512. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I wanted to make sure that I am not misreading the >>>>>>>>>>>>>>>>>>> code, but >>>>>>>>>>>>>>>>>>> shouldn't >>>>>>>>>>>>>>>>>>> each Pyxida node keep a small number of neighbors in >>>>>>>>>>>>>>>>>>> its >>>>>>>>>>>>>>>>>>> upNeighbors >>>>>>>>>>>>>>>>>>> set. Else, it will be a lot of updates to send a >>>>>>>>>>>>>>>>>>> gossipMessage >>>>>>>>>>>>>>>>>>> to all >>>>>>>>>>>>>>>>>>> 200 neighbors, and it also means that a Pyxida node >>>>>>>>>>>>>>>>>>> has >>>>>>>>>>>>>>>>>>> the whole >>>>>>>>>>>>>>>>>>> network in its upNeighbors set - that seems wrong. I >>>>>>>>>>>>>>>>>>> am >>>>>>>>>>>>>>>>>>> inclined >>>>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>> change the value of MAX_NEIGHBORS but wanted to make >>>>>>>>>>>>>>>>>>> sure >>>>>>>>>>>>>>>>>>> that my >>>>>>>>>>>>>>>>>>> understanding is correct first. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thank you, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Eric >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ------------------------------------------------------ >>>>>>>>>>>>>>>>> ------------------- >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> This SF.Net email is sponsored by the Moblin Your Move >>>>>>>>>>>>>>>>> Developer's >>>>>>>>>>>>>>>>> challenge Build the coolest Linux based applications >>>>>>>>>>>>>>>>> with >>>>>>>>>>>>>>>>> Moblin >>>>>>>>>>>>>>>>> SDK & >>>>>>>>>>>>>>>>> win great prizes Grand prize is a trip for two to an >>>>>>>>>>>>>>>>> Open >>>>>>>>>>>>>>>>> Source >>>>>>>>>>>>>>>>> event >>>>>>>>>>>>>>>>> anywhere in the world >>>>>>>>>>>>>>>>> http://moblin-contest.org/redirect.php? >>>>>>>>>>>>>>>>> banner_id=100&url=/ >>>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>>> Pyxida-users mailing >>>>>>>>>>>>>>>>> list Pyx...@li... 
>>>>>>>>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/pyxida- >>>>>>>>>>>>>>>>> users >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> ---------------------------------------------------------- >>>>>>>>>>>>> --------------- >>>>>>>>>>>>> This SF.Net email is sponsored by the Moblin Your Move >>>>>>>>>>>>> Developer's challenge Build the coolest Linux based >>>>>>>>>>>>> applications with Moblin SDK & win great prizes Grand >>>>>>>>>>>>> prize is >>>>>>>>>>>>> a trip for two to an Open Source event anywhere in the >>>>>>>>>>>>> world >>>>>>>>>>>>> http://moblin-contest.org/redirect.php?banner_id=100&url=/ >>>>>>>>>>>>> _______________________________________________ Pyxida- >>>>>>>>>>>>> users >>>>>>>>>>>>> mailing list Pyx...@li... >>>>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/pyxida-users >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> ------------------------------------------------------------ >>>>>>>>>>> ------------------ >>>>>>>>>>> SF.Net email is Sponsored by MIX09, March 18-20, 2009 in Las >>>>>>>>>>> Vegas, Nevada. The future of the web can't happen without >>>>>>>>>>> you. >>>>>>>>>>> Join us at MIX09 to help pave the way to the Next Web >>>>>>>>>>> now. Learn >>>>>>>>>>> more and register at >>>>>>>>>>> http://ad.doubleclick.net/clk;208669438;13503038;i?http:// >>>>>>>>>>> 2009.visitmix.com/ >>>>>>>>>>> _______________________________________________ Pyxida-users >>>>>>>>>>> mailing list Pyx...@li... >>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/pyxida-users >>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>> >>> >>> >>> >>> >>> -------------------------------------------------------------------- >>> ---------- >>> SF.Net email is Sponsored by MIX09, March 18-20, 2009 in Las Vegas, >>> Nevada. The future of the web can't happen without you. Join us >>> at MIX09 >>> to help pave the way to the Next Web now. Learn more and register at >>> http://ad.doubleclick.net/clk;208669438;13503038;i?http:// >>> 2009.visitmix.com/ >>> _______________________________________________ Pyxida-users mailing >>> list Pyx...@li... >>> https://lists.sourceforge.net/lists/listinfo/pyxida-users >> > > > > ---------------------------------------------------------------------- > -------- > SF.Net email is Sponsored by MIX09, March 18-20, 2009 in Las Vegas, > Nevada. > The future of the web can't happen without you. Join us at MIX09 > to help > pave the way to the Next Web now. Learn more and register at > http://ad.doubleclick.net/clk;208669438;13503038;i?http:// > 2009.visitmix.com/ > _______________________________________________ > Pyxida-users mailing list > Pyx...@li... > https://lists.sourceforge.net/lists/listinfo/pyxida-users |
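The Iterator-based removal Peter recommends in the thread above can be sketched as follows. RemoteState, getLastUpdateTime() and the 10-minute MAINTENANCE_PERIOD come from the quoted snippet; the wrapper class, the method name, and the choice of Collection are stand-ins for illustration, not the actual Pyxida source.

import java.util.Collection;
import java.util.Iterator;

public class MaintenanceSketch {

    // Minimal stand-in for Pyxida's RemoteState; only the accessor the loop needs.
    interface RemoteState<T> {
        long getLastUpdateTime();
    }

    static final long MAINTENANCE_PERIOD = 10 * 60 * 1000L; // 10 minutes, as in the thread

    // Removing elements while iterating must go through the Iterator itself;
    // calling neighbors.remove(...) inside an enhanced for loop typically
    // fails with ConcurrentModificationException.
    static <T> void removeStaleNeighbors(Collection<RemoteState<T>> neighbors, long currTime) {
        Iterator<RemoteState<T>> it = neighbors.iterator();
        while (it.hasNext()) {
            RemoteState<T> neighbor = it.next();
            if (neighbor.getLastUpdateTime() < currTime - MAINTENANCE_PERIOD) {
                it.remove(); // safe structural modification during iteration
            }
        }
    }
}

On Java 8 and later, Collection.removeIf(n -> n.getLastUpdateTime() < currTime - MAINTENANCE_PERIOD) gives the same effect in one line, though it postdates this thread.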
From: Eric Chan-T. <dch...@cs...> - 2008-12-19 01:02:32
|
Do you mean the Pyxida code running on PlanetLab? I killed them after running it overnight. I am currently restarting the experiments. Try again in about 15 minutes and it should be ready. Fix1 is on port 28500 (and api.port 28501). Fix2 is on port 26500 (api.port 26501). The node you can contact is "planet1.scs.stanford.edu" or "planetlab01.cs.washington.edu". About your other email: Yes I know. The fix was to go through the list again, but I didn't want to implement that since it seems to be working fine. So go through the list once to calculate weight, and add neighbors into another list (sorted by weights). Then go through that new list. The new list will be sorted by higher weights, so older neighbors will be "tested" first. If you want it that way, I can do that. However, I do not think we need to. Let's assume iteration 0, the first node gets picked. At iteration 1, that first node will have a very small probability of getting picked again. -Eric On Dec 18 2008, Jonathan Ledlie wrote: > >Eric, > >Do you have a PL Pyxida slice that I can connect to to try this out? > >That is, I'll run it on my machine, but connect to your slice via the >right ports. > >-jonathan > >On Dec 18, 2008, at 11:35 AM, Eric Chan-Tin wrote: > >> Sorry for the delay. Was busy with final exams and projects. >> >> I made the changes. They can be found at >> >> http://www-users.cs.umn.edu/~dchantin/NCClient.java >> http://www-users.cs.umn.edu/~dchantin/NCManager.java >> http://www-users.cs.umn.edu/~dchantin/RemoteState.java >> >> All the changes are enclosed within >> //ERIC START >> ... >> //ERIC END >> >> I compared every reported "neighbors list" of each Pyxida node >> (about 370 >> total) and see how many neighbors did not change within 10 minutes. >> Then >> did the average for all the nodes. >> >> Original: 43.37% of the neighbors stayed the same on average. >> >> Fix0 (before we found the interesting situation): 74.73% >> >> Fix1 (with the weight probability of picking a neighbor to ping): >> 95.79% >> >> Fix2 (with the weight probability and the MIN_UPDATE_PING): 98.81% >> >> Also, the reported error, rrl, etc.. were much lower in the Fixed >> Pyxida >> than in the original one (this could not be significant, since the >> Original >> experiment was performed 1-2 months ago), but as expected, it was not >> higher. >> >> You can take a look at the code and test it if you want. If everything >> looks fine, I'll remove some of the debugging code and other "useless" >> comments (like //ERIC). Also, let me know which "fix" you prefer. I >> will >> then email you the updated code. >> >> Thanks, >> >> -Eric >> >> >> On Dec 10 2008, Jonathan Ledlie wrote: >> >>> >>> Seems reasonable. >>> >>> Both ideas (lastPingTime and a weighted probability based on age) >>> seem >>> to be independent. >>> >>> Some of the code is more CPU intensive than it should be (e.g., >>> looping through potentially long lists), and that is a slight >>> downside >>> to the function approach vs a set, but this probably isn't a big >>> deal. >>> >>> -jonathan >>> >>> On Dec 10, 2008, at 10:40 PM, Eric Chan-Tin wrote: >>> >>>> okay. I think I might have misunderstood you. Your idea of >>>> lastPingTime >>>> looks interesting. >>>> >>>> My idea is to introduce a probability function. Right now, it just >>>> randomly picks one of the neighbors. What I am thinking it might >>>> do is >>>> that instead of just randomly picking a neighbor, pick a neighbor >>>> based >>>> on some function. 
So, the neighbor whose lastUpdateTime is oldest >>>> will >>>> have a higher chance of being picked than a neighbor who was just >>>> updated. >>>> >>>> Does that make more sense? >>>> >>>> -Eric >>>> >>>> Jonathan Ledlie wrote: >>>>> >>>>> Hmm. The advantage of lastPingTime is that you learn whether or >>>>> not >>>>> you've tried to contact a neighbor and they haven't responded. >>>>> Then you >>>>> can use that knowledge to determine whether or not to toss them. >>>>> >>>>> Maybe I'm not understanding your suggestion then. >>>>> >>>>> -jonathan >>>>> >>>>> On Dec 10, 2008, at 12:51 PM, Eric Chan-Tin wrote: >>>>> >>>>>> That's a good idea. Here is a little twist on it. >>>>>> >>>>>> Put a "weight" on each neighbor, based on the lastUpdateTime. So >>>>>> if >>>>>> currentTime - neighbor's lastUpdateTime, then the weight is >>>>>> higher. >>>>>> Then in getNeighborToPing, neighbor's with higher weight have a >>>>>> higher >>>>>> probability of getting picked than neighbor's with lower weight. >>>>>> >>>>>> The weight could be ( (currentTime - lastUpdateTime) / 15 >>>>>> minutes). >>>>>> It's not 10 minutes so as to avoid a probability of 1 if the >>>>>> lastUpdateTime was the previous performMaintenance. This should >>>>>> also >>>>>> avoid the problem of always pinging the oldest neighbor since it's >>>>>> about probability (weight). >>>>>> >>>>>> It seems a little bit less complicated than having lastPingTime >>>>>> and >>>>>> MIN_UPDATE_TIME. >>>>>> >>>>>> -Eric >>>>>> >>>>>> On Dec 10 2008, Jonathan Ledlie wrote: >>>>>> >>>>>>> >>>>>>> This is an interesting situation. Let's think about the goal. >>>>>>> The >>>>>>> goal is to keep pinging the same nodes as long as they are alive, >>>>>>> and to replace a node soon after it dies, keeping the total >>>>>>> number >>>>>>> of up neighbors at approximately MAX_NEIGHBORS. >>>>>>> >>>>>>> How about the following: >>>>>>> - when you pick a neighbor to ping, you set a time on that event >>>>>>> (in >>>>>>> a new per-neighbor timestamp LastPingTime) >>>>>>> - when you first insert a neighbor, set this timestamp to 0. >>>>>>> - when you loop in performMaintenance, remove nodes whose >>>>>>> LastPingTime - LastUpdateTime > threshold (like 10 minutes) >>>>>>> >>>>>>> - when picking a node, pick the guy with the oldest LastPingTime >>>>>>> (not sure about this one -- I don't like situations where you >>>>>>> pick >>>>>>> the absolute *oldest* because then we could get in a situation >>>>>>> where >>>>>>> an old guy sits there, waiting to be kicked out, and we keep >>>>>>> picking >>>>>>> him to be pinged; I'd prefer to pick randomly, or random older >>>>>>> guys). >>>>>>> - a simple way around this is to simply have a self-adjusting >>>>>>> threshold: so, we walk through the nodes, and everyone whose >>>>>>> LastPingTime is older than MIN_UPDATE_TIME_TO_PING ago (a >>>>>>> relative >>>>>>> time) get added to a set; we pick a guy randomly from the set to >>>>>>> ping. If the set is empty, we increase MIN_UPDATE_TIME_TO_PING, >>>>>>> and >>>>>>> wait for the next time the function is called. If the set is too >>>>>>> large (relative to a fraction of MAX_NEIGHBORS), we decrease it. >>>>>>> >>>>>>> We still might get some churn from lost UDP packets. You might >>>>>>> also >>>>>>> give nodes a second chance to be pinged by including >>>>>>> LastPingTime - >>>>>>> LastUpdateTime > 1 minute in the set. Note that increases and >>>>>>> decreases in MIN_UPDATE_TIME_TO_PING shouldn't be affected by the >>>>>>> number of nodes that fall into this category. 
>>>>>>> >>>>>>> We still probably need a way for new potential neighbors to be >>>>>>> rejected if our current set of neighbors is fine. Because I >>>>>>> still >>>>>>> want addNeigbor to really add the neighbor, how about putting a >>>>>>> check before calling that to see if we need more neighbors. >>>>>>> >>>>>>> What thinks? >>>>>>> >>>>>>> -jonathan >>>>>>> >>>>>>> On Dec 9, 2008, at 11:37 PM, ext Eric Chan-Tin wrote: >>>>>>> >>>>>>>> Peter, >>>>>>>> >>>>>>>> Thanks. That fixed it. >>>>>>>> >>>>>>>> However, here is another problem I found while deploying Pyxida >>>>>>>> and >>>>>>>> running them for a while. >>>>>>>> >>>>>>>> In performMaintenance(), I loop through every neighbor and if >>>>>>>> (neighbor.getLastUpdateTime() < current_time - 10 minutes), >>>>>>>> then I >>>>>>>> remove that neighbor. >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> In NCManager, I call getNeighborToPing, which randomly picks one >>>>>>>> of >>>>>>>> the >>>>>>>> neighbors (32 in this case). Pinging is performed every 10 >>>>>>>> seconds, so >>>>>>>> in 10 minutes, you can ping 60 nodes. However, if a neighbor A >>>>>>>> is not >>>>>>>> among those 60 nodes, it will get kicked out in >>>>>>>> performMaintenance(), >>>>>>>> although A is still alive (it just hasn't been contacted yet). >>>>>>>> >>>>>>>> There are some possible options. >>>>>>>> >>>>>>>> 1) Increase the timeout period from 10 minutes to ?? minutes. >>>>>>>> This >>>>>>>> doesn't quite work because say in Azureus, the number of >>>>>>>> neighbors is >>>>>>>> 512, so the timeout needs to be increased even more... It could >>>>>>>> be a >>>>>>>> function of MAX_NEIGHBORS... >>>>>>>> >>>>>>>> 2) When Pyxida replies to a ping request, it will update the >>>>>>>> neighbor's >>>>>>>> lastUpdateTime. This does not quite work either because it >>>>>>>> assumes a >>>>>>>> symmetric relationship which is most likely not the case. >>>>>>>> >>>>>>>> 3) Introduce some more logic to Pyxida so it remembers what >>>>>>>> nodes it >>>>>>>> contacted and does not send, say, more than 2 requests in 10 >>>>>>>> minutes to >>>>>>>> the same node. This solution isn't that great either. >>>>>>>> >>>>>>>> Let me know what you guys think. I am currently leaning towards >>>>>>>> option >>>>>>>> 1) but make the timeout period a function of MAX_NEIGHBORS, >>>>>>>> although I >>>>>>>> haven't quite found the perfect function yet. >>>>>>>> >>>>>>>> Here are some preliminary results. I compared the number of >>>>>>>> neighbors >>>>>>>> which are the same at each reported time (10 minutes). Before >>>>>>>> the fix, >>>>>>>> more than 50% of the neighbors changed within 10 minutes. After >>>>>>>> the >>>>>>>> fix, >>>>>>>> only about 30% of the neighbors changed within 10 minutes. It is >>>>>>>> more >>>>>>>> stable but I think 30% is still quite a big number. >>>>>>>> >>>>>>>> Sorry for the long email :) >>>>>>>> >>>>>>>> -Eric >>>>>>>> >>>>>>>> Peter Pietzuch wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> Eric Chan-Tin wrote: >>>>>>>>>> for (RemoteState<T> neighbor : neighbors) { //this is line >>>>>>>>>> 1029 >>>>>>>>>> if (neighbor.getLastUpdateTime() < (curr_time - >>>>>>>>>> MAINTENANCE_PERIOD)) { >>>>>>>>>> neighbors.remove(neighbor); >>>>>>>>>> } >>>>>>>>>> } >>>>>>>>> >>>>>>>>> IIRC you can't modify a collection while iterating over it >>>>>>>>> using the >>>>>>>>> above for loop. Instead you have to use a proper >>>>>>>>> Iterator<RemoteState<T>> and use it's remove method. 
>>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> Peter >>>>>>>>> >>>>>>>>>> Jonathan Ledlie wrote: >>>>>>>>>>> On Nov 3, 2008, at 3:46 PM, Eric Chan-Tin wrote: >>>>>>>>>>> >>>>>>>>>>>> Jonathan, >>>>>>>>>>>> >>>>>>>>>>>> I can insert some code to fix this and it doesn't look too >>>>>>>>>>>> hard >>>>>>>>>>>> but I do >>>>>>>>>>>> have some concerns. >>>>>>>>>>>> >>>>>>>>>>>>> I don't remember observing this behavior in our PL >>>>>>>>>>>>> deployment, but that could have been because we had the >>>>>>>>>>>>> ping >>>>>>>>>>>>> time set >>>>>>>>>>>>> to 1-2 minutes, IIRC. >>>>>>>>>>>> The default ping time was set to 10 seconds + some Gaussian >>>>>>>>>>>> distribution, >>>>>>>>>>>> so that's what I used, but changing the ping time to 1 >>>>>>>>>>>> minute >>>>>>>>>>>> or more >>>>>>>>>>>> will >>>>>>>>>>>> result in the same effect except it will take longer for the >>>>>>>>>>>> "cycle" to >>>>>>>>>>>> occur. >>>>>>>>>>>> >>>>>>>>>>>> What I have in mind was the following: >>>>>>>>>>>> >>>>>>>>>>>> 1) In NCClient.addNeighbor() would be changed to (in >>>>>>>>>>>> pseudocode) >>>>>>>>>>>> protected boolean addNeighbor(RemoteState<T> guy) { >>>>>>>>>>>> boolean added = false; >>>>>>>>>>>> //if neighbors.size() >= MAX_NEIGHBORS >>>>>>>>>>>> //Check last update time of each neighbor in >>>>>>>>>>>> neighbors >>>>>>>>>>>> //neighbor.getLastUpdateTime() >>>>>>>>>>>> //if any of them is older than some number >>>>>>>>>>>> TIMEOUT, >>>>>>>>>>>> remove the oldest one, then add "guy" to neighbors >>>>>>>>>>>> //else don't add "guy" to neighbors >>>>>>>>>>>> //else add "guy" to neighbors >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>> IMO this shouldn't be the behavior of this function because >>>>>>>>>>> it >>>>>>>>>>> is called >>>>>>>>>>> "addNeighbor." Instead what could happen is that part of the >>>>>>>>>>> performMaintenance() function could remove old neighbors. >>>>>>>>>>> >>>>>>>>>>>> 2) In NCManager, getGossipNode() would just pick a random >>>>>>>>>>>> node >>>>>>>>>>>> from >>>>>>>>>>>> localNC.primaryNC.getNeighbors() to send a request to >>>>>>>>>>>> every 10 >>>>>>>>>>>> seconds. >>>>>>>>>>>> >>>>>>>>>>>> 3) Pyxida will still keep track of all GossipNodes in the >>>>>>>>>>>> network. It >>>>>>>>>>>> could >>>>>>>>>>>> also (say every 10 minutes) check its NCClient.neighbors >>>>>>>>>>>> list >>>>>>>>>>>> and see if >>>>>>>>>>>> any neighbor is older than TIMEOUT, remove it, then pick a >>>>>>>>>>>> random >>>>>>>>>>>> GossipNode to send a ping request. >>>>>>>>>>>> >>>>>>>>>>>> The main problem with this scheme is that one Pyxida node >>>>>>>>>>>> would >>>>>>>>>>>> most >>>>>>>>>>>> likely >>>>>>>>>>>> stay with the same set of neighbors all the time. Its >>>>>>>>>>>> coordinates >>>>>>>>>>>> would be >>>>>>>>>>>> greatly affected by those 32 nodes (I am assuming >>>>>>>>>>>> MAX_NEIGHBORS >>>>>>>>>>>> = 32). >>>>>>>>>>>> >>>>>>>>>>> Instead of (2), why not have the getNeighborToPing function >>>>>>>>>>> determine >>>>>>>>>>> the behavior? If it returns a node, then ping it. If it >>>>>>>>>>> returns null, >>>>>>>>>>> then have the NCManager pick some other node to use? If >>>>>>>>>>> things are >>>>>>>>>>> hunky-dory, getNeighborToPing() would return a node (a >>>>>>>>>>> current >>>>>>>>>>> member of >>>>>>>>>>> the neighbor set), if we're short on neighbors, then it would >>>>>>>>>>> return >>>>>>>>>>> null. The periodic maintenance would have already tossed old >>>>>>>>>>> nodes, so >>>>>>>>>>> this would be self-sustaining. 
>>>>>>>>>>> >>>>>>>>>>> The "problem" you mention is fairly central to network >>>>>>>>>>> coordinates in >>>>>>>>>>> general. Sure the coordinates will be a little better the >>>>>>>>>>> more >>>>>>>>>>> neighbors you have. But in a very large system talking to >>>>>>>>>>> everybody >>>>>>>>>>> isn't feasible. Peter and I have a graph in one of the >>>>>>>>>>> workshop >>>>>>>>>>> papers >>>>>>>>>>> which shows the effect of increasing the number of neighbors >>>>>>>>>>> >>>>>>>>>>>> I don't think the network would be partitioned, but it could >>>>>>>>>>>> happen. For >>>>>>>>>>>> example, the boostrap nodes have each other in their >>>>>>>>>>>> neighbors >>>>>>>>>>>> list. >>>>>>>>>>>> Every >>>>>>>>>>>> new node bootstrapping to the network would contact those >>>>>>>>>>>> bootstrapping >>>>>>>>>>>> nodes first and add them to their neighbors. So every Pyxida >>>>>>>>>>>> node could >>>>>>>>>>>> have as its neighbors the "Bootstrap nodes", which would >>>>>>>>>>>> make >>>>>>>>>>>> Pyxida >>>>>>>>>>>> become >>>>>>>>>>>> a centralized system. >>>>>>>>>>>> >>>>>>>>>>> As long as the initial bootstrap list is large enough, I >>>>>>>>>>> think >>>>>>>>>>> you are >>>>>>>>>>> right, that partitions won't be a problem. >>>>>>>>>>> >>>>>>>>>>>> One last thing: When Pyxida receives a ping request from a >>>>>>>>>>>> node, it does >>>>>>>>>>>> not update its coordinates. Is there any particular reason >>>>>>>>>>>> for >>>>>>>>>>>> that? >>>>>>>>>>>> Is it >>>>>>>>>>>> because Pyxida will not be able to calculate the RTT? >>>>>>>>>>>> >>>>>>>>>>> Yes. >>>>>>>>>>> >>>>>>>>>>>> Also, there is no "gossiping" of other nodes' coordinates, >>>>>>>>>>>> right? So if A >>>>>>>>>>>> has B in its neighbors list, it will from time to time tell >>>>>>>>>>>> other nodes >>>>>>>>>>>> about B's coordinates. I realize this has a lot of >>>>>>>>>>>> complications but the >>>>>>>>>>>> original Vivaldi paper was very fuzzy about this point and >>>>>>>>>>>> the >>>>>>>>>>>> previous >>>>>>>>>>>> point that it seems it is up to the implementors to decide >>>>>>>>>>>> what >>>>>>>>>>>> to do. >>>>>>>>>>>> >>>>>>>>>>> I don't think we are doing this. It could be useful in >>>>>>>>>>> certain >>>>>>>>>>> circumstance (e.g., picking potential new neighbors), but >>>>>>>>>>> we're >>>>>>>>>>> not, IIRC. >>>>>>>>>>> >>>>>>>>>>>> I will have to think some more about this as there are a lot >>>>>>>>>>>> of >>>>>>>>>>>> nasty >>>>>>>>>>>> little side effects that could happen, but these are my >>>>>>>>>>>> thoughts. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> >>>>>>>>>>>> -Eric >>>>>>>>>>>> >>>>>>>>>>> Sounds good. >>>>>>>>>>> >>>>>>>>>>> -jonathan >>>>>>>>>>> >>>>>>>>>>>> On Nov 3 2008, Jonathan Ledlie wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Eric, >>>>>>>>>>>>> >>>>>>>>>>>>> Based on a look at the code just now, I see where you are >>>>>>>>>>>>> coming from: >>>>>>>>>>>>> with these params on PL, there isn't anything stopping the >>>>>>>>>>>>> neighbor >>>>>>>>>>>>> list from circulating pretty quickly through all nodes. In >>>>>>>>>>>>> the >>>>>>>>>>>>> Azureus implementation, the guys fed to the alg (fed to >>>>>>>>>>>>> processSample) >>>>>>>>>>>>> weren't changing much because they were coming from the >>>>>>>>>>>>> routing finger >>>>>>>>>>>>> tables. I don't remember observing this behavior in our PL >>>>>>>>>>>>> deployment, but that could have been because we had the >>>>>>>>>>>>> ping >>>>>>>>>>>>> time set >>>>>>>>>>>>> to 1-2 minutes, IIRC. 
>>>>>>>>>>>>> >>>>>>>>>>>>> Do you want to insert some code that will fix this? >>>>>>>>>>>>> Basically >>>>>>>>>>>>> we want >>>>>>>>>>>>> to only add a new neighbor if our oldest neighbor should be >>>>>>>>>>>>> tossed (or >>>>>>>>>>>>> if we haven't reached MAX_NEIGHBORS). It's probably best >>>>>>>>>>>>> from a >>>>>>>>>>>>> clarity standpoint if addNeighbor does continue to actually >>>>>>>>>>>>> add the >>>>>>>>>>>>> neighbor in all cases. Probably the best thing to do is to >>>>>>>>>>>>> have >>>>>>>>>>>>> NCClient drop old neighbors internally and have NCManager >>>>>>>>>>>>> ask >>>>>>>>>>>>> it which >>>>>>>>>>>>> nodes should be pinged via getNeighborToPing. So NCClient >>>>>>>>>>>>> would >>>>>>>>>>>>> continue to have the same behavior on processSample (it >>>>>>>>>>>>> would not >>>>>>>>>>>>> reject new information) and NCManager and NCClient would >>>>>>>>>>>>> cooperate to >>>>>>>>>>>>> measure RTTs to current neighbors. We do want to make >>>>>>>>>>>>> sure, >>>>>>>>>>>>> however, >>>>>>>>>>>>> that the gossip set stays large enough so that partitions >>>>>>>>>>>>> don't occur. >>>>>>>>>>>>> >>>>>>>>>>>>> Do you want to be added to the SourceForge Pyxida >>>>>>>>>>>>> developers >>>>>>>>>>>>> to make >>>>>>>>>>>>> this change? (Please test it out carefully first.) >>>>>>>>>>>>> >>>>>>>>>>>>> -jonathan >>>>>>>>>>>>> >>>>>>>>>>>>> On Oct 30, 2008, at 11:32 AM, ext Eric Chan-Tin wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Jonathan, >>>>>>>>>>>>>> >>>>>>>>>>>>>> From what I saw, eventually, you learn about all the nodes >>>>>>>>>>>>>> in >>>>>>>>>>>>>> the >>>>>>>>>>>>>> network. Every 10 seconds, Pyxida picks a random "Gossip" >>>>>>>>>>>>>> node from >>>>>>>>>>>>>> its >>>>>>>>>>>>>> list and sends a request. The Gossip node will reply back >>>>>>>>>>>>>> with an >>>>>>>>>>>>>> "upNeighbor", that is, a "Gossip" node that is "up" or >>>>>>>>>>>>>> with 10% >>>>>>>>>>>>>> probability, a node that is "pending". Since naturally, >>>>>>>>>>>>>> all >>>>>>>>>>>>>> the nodes >>>>>>>>>>>>>> bootstrap off the same set of nodes, every node will >>>>>>>>>>>>>> eventually know >>>>>>>>>>>>>> of >>>>>>>>>>>>>> every other node. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Pyxida might not be rotating through everybody >>>>>>>>>>>>>> continuously >>>>>>>>>>>>>> (although >>>>>>>>>>>>>> eventually, since it knows everybody, it will have added >>>>>>>>>>>>>> everybody to >>>>>>>>>>>>>> its Neighbors list at some point), but could be removing a >>>>>>>>>>>>>> node that >>>>>>>>>>>>>> is >>>>>>>>>>>>>> "alive". So, if MAX_NEIGHBORS = 32, let's call them A_1, >>>>>>>>>>>>>> A_2, >>>>>>>>>>>>>> ..., >>>>>>>>>>>>>> A_32. >>>>>>>>>>>>>> Pyxida picks a random Gossip node B, sends it a request >>>>>>>>>>>>>> and B >>>>>>>>>>>>>> replies >>>>>>>>>>>>>> back. Pyxida will add B to its Neighbors list, see that >>>>>>>>>>>>>> size of >>>>>>>>>>>>>> Neighbors > MAX_NEIGHBORS, kick A_1 out. Then from those >>>>>>>>>>>>>> 32 >>>>>>>>>>>>>> neighbors >>>>>>>>>>>>>> A_2, ...., A_32, B, it will calculate the force, and apply >>>>>>>>>>>>>> it >>>>>>>>>>>>>> to the >>>>>>>>>>>>>> current system coordinates. Thus, if every 10 seconds, >>>>>>>>>>>>>> Pyxida >>>>>>>>>>>>>> picks a >>>>>>>>>>>>>> different set of nodes C, D, E, etc... it would keep >>>>>>>>>>>>>> kicking >>>>>>>>>>>>>> nodes out >>>>>>>>>>>>>> of the neighbors list. >>>>>>>>>>>>>> >>>>>>>>>>>>>> It's the "kicking A_1 out" part that seemed a bit >>>>>>>>>>>>>> confusing. 
>>>>>>>>>>>>>> Churn >>>>>>>>>>>>>> is a >>>>>>>>>>>>>> good reason but you don't check that A_1 might still be >>>>>>>>>>>>>> alive >>>>>>>>>>>>>> or maybe >>>>>>>>>>>>>> A_1 was just added. I have been printing the neighbors >>>>>>>>>>>>>> list >>>>>>>>>>>>>> every 10 >>>>>>>>>>>>>> minutes, and they are completely different between those >>>>>>>>>>>>>> 10 >>>>>>>>>>>>>> minutes. >>>>>>>>>>>>>> >>>>>>>>>>>>>> One possible problem might be that I am choosing the value >>>>>>>>>>>>>> of >>>>>>>>>>>>>> MAX_NEIGHBORS to be too small. Originally it was set to >>>>>>>>>>>>>> 512 >>>>>>>>>>>>>> (which was >>>>>>>>>>>>>> meant for Azureus). Since I am using Planetlab where there >>>>>>>>>>>>>> are about >>>>>>>>>>>>>> 400-500 "stable" nodes, I changed the value to 32. But I >>>>>>>>>>>>>> don't think >>>>>>>>>>>>>> that should be a problem since log(n) = log(512) = 9. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> >>>>>>>>>>>>>> -Eric >>>>>>>>>>>>>> >>>>>>>>>>>>>> Jonathan Ledlie wrote: >>>>>>>>>>>>>>> Eric, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Glad to hear it. Let Peter and me know if you publish >>>>>>>>>>>>>>> anything using >>>>>>>>>>>>>>> Pyxida and we can put it on the web page. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> "Gossip" nodes are for learning about the network. >>>>>>>>>>>>>>> "Neighbors" are >>>>>>>>>>>>>>> for >>>>>>>>>>>>>>> constructing the coordinate. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> So, IIRC, the gossip process learns about nodes that >>>>>>>>>>>>>>> may or >>>>>>>>>>>>>>> may not >>>>>>>>>>>>>>> be >>>>>>>>>>>>>>> applied to the neighbor list. That is, we may eventually >>>>>>>>>>>>>>> learn >>>>>>>>>>>>>>> about a >>>>>>>>>>>>>>> large portion of the network, via gossiping, but should >>>>>>>>>>>>>>> only >>>>>>>>>>>>>>> be using >>>>>>>>>>>>>>> MAX_NEIGHBORS at any given time for coordinate >>>>>>>>>>>>>>> computation. >>>>>>>>>>>>>>> The >>>>>>>>>>>>>>> reason >>>>>>>>>>>>>>> we don't want to use a fixed set of neighbors for >>>>>>>>>>>>>>> coordinate >>>>>>>>>>>>>>> computation >>>>>>>>>>>>>>> is because nodes may come and go from the system. >>>>>>>>>>>>>>> However, >>>>>>>>>>>>>>> it should >>>>>>>>>>>>>>> not be the case that we just rotate through everybody, >>>>>>>>>>>>>>> because >>>>>>>>>>>>>>> then, as >>>>>>>>>>>>>>> you note, we'd be computing against everyone's >>>>>>>>>>>>>>> coordinate/ >>>>>>>>>>>>>>> latency, >>>>>>>>>>>>>>> and >>>>>>>>>>>>>>> this is counter to the point of network coordinates. >>>>>>>>>>>>>>> With this >>>>>>>>>>>>>>> gradual >>>>>>>>>>>>>>> transience as a desirable quality in mind, I am pretty >>>>>>>>>>>>>>> sure we >>>>>>>>>>>>>>> maintain >>>>>>>>>>>>>>> neighbors in a fairly fixed manner. Does this not seem >>>>>>>>>>>>>>> to >>>>>>>>>>>>>>> be the >>>>>>>>>>>>>>> case >>>>>>>>>>>>>>> to you? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -jonathan >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Oct 29, 2008, at 9:20 PM, Eric Chan-Tin wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi Jonathan and Peter, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> First of all, just wanted to say that Pyxida has been >>>>>>>>>>>>>>>> working >>>>>>>>>>>>>>>> great on >>>>>>>>>>>>>>>> Planetlab for the past 4 months or so. Getting good data >>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>> experiments >>>>>>>>>>>>>>>> going well. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> However, I do have one question/comment. 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Every Pyxida node contains a list of Neighbors (in >>>>>>>>>>>>>>>> NCClient) and a >>>>>>>>>>>>>>>> list >>>>>>>>>>>>>>>> of Gossip nodes (they are called neighbors in NCManager >>>>>>>>>>>>>>>> but >>>>>>>>>>>>>>>> I will >>>>>>>>>>>>>>>> call >>>>>>>>>>>>>>>> them gossip for now so as not to confuse with the 'real' >>>>>>>>>>>>>>>> neighbors >>>>>>>>>>>>>>>> in >>>>>>>>>>>>>>>> NCClient). Every 10 seconds, Pyxida will pick a random >>>>>>>>>>>>>>>> node >>>>>>>>>>>>>>>> (call >>>>>>>>>>>>>>>> it A) >>>>>>>>>>>>>>>> from Gossip and send a gossip request (basically asking >>>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>> rtt, >>>>>>>>>>>>>>>> coordinate, error). But Gossip contains all the nodes in >>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>> network. >>>>>>>>>>>>>>>> The comment in the code said that was to avoid Pyxida >>>>>>>>>>>>>>>> becoming >>>>>>>>>>>>>>>> lonely >>>>>>>>>>>>>>>> and no-one to gossip with, which makes sense. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Once A responds, Pyxida will add A to its Neighbors (if >>>>>>>>>>>>>>>> Neighbors >>>>>>>>>>>>>>>> does >>>>>>>>>>>>>>>> not contain A) and remove the first element from the >>>>>>>>>>>>>>>> Neighbors list. >>>>>>>>>>>>>>>> This seems a little bit odd. Was there any particular >>>>>>>>>>>>>>>> reasoning >>>>>>>>>>>>>>>> behind >>>>>>>>>>>>>>>> this? Because it would seem like you're getting the >>>>>>>>>>>>>>>> coordinate from >>>>>>>>>>>>>>>> every node in the network eventually. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks a lot, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -Eric >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Jonathan Ledlie wrote: >>>>>>>>>>>>>>>>> Your inclination is correct. See forwarded message. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -jonathan >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Apr 9, 2008, at 1:53 PM, ext Eric Chan-Tin wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thank you for the clarification. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I've been reading through your published papers and >>>>>>>>>>>>>>>>>> code >>>>>>>>>>>>>>>>>> and I >>>>>>>>>>>>>>>>>> must >>>>>>>>>>>>>>>>>> say I learned a lot about how Pyxida works :) >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I still have a quick and easy (hopefully) question. >>>>>>>>>>>>>>>>>> Each >>>>>>>>>>>>>>>>>> Pyxida >>>>>>>>>>>>>>>>>> node >>>>>>>>>>>>>>>>>> has a small amount of neighbors and learn about other >>>>>>>>>>>>>>>>>> nodes >>>>>>>>>>>>>>>>>> through >>>>>>>>>>>>>>>>>> other nodes. I am running on about 200 machines, and >>>>>>>>>>>>>>>>>> it >>>>>>>>>>>>>>>>>> seems >>>>>>>>>>>>>>>>>> like all >>>>>>>>>>>>>>>>>> the Pyxida nodes are adding each other to their >>>>>>>>>>>>>>>>>> upNeighbors set. >>>>>>>>>>>>>>>>>> NCClient.java has the MAX_NEIGHBORS set to 512. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I wanted to make sure that I am not misreading the >>>>>>>>>>>>>>>>>> code, but >>>>>>>>>>>>>>>>>> shouldn't >>>>>>>>>>>>>>>>>> each Pyxida node keep a small number of neighbors in >>>>>>>>>>>>>>>>>> its >>>>>>>>>>>>>>>>>> upNeighbors >>>>>>>>>>>>>>>>>> set. Else, it will be a lot of updates to send a >>>>>>>>>>>>>>>>>> gossipMessage >>>>>>>>>>>>>>>>>> to all >>>>>>>>>>>>>>>>>> 200 neighbors, and it also means that a Pyxida node >>>>>>>>>>>>>>>>>> has >>>>>>>>>>>>>>>>>> the whole >>>>>>>>>>>>>>>>>> network in its upNeighbors set - that seems wrong. 
I >>>>>>>>>>>>>>>>>> am >>>>>>>>>>>>>>>>>> inclined >>>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>> change the value of MAX_NEIGHBORS but wanted to make >>>>>>>>>>>>>>>>>> sure >>>>>>>>>>>>>>>>>> that my >>>>>>>>>>>>>>>>>> understanding is correct first. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thank you, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Eric >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ------------------------------------------------------------------------- >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> This SF.Net email is sponsored by the Moblin Your Move >>>>>>>>>>>>>>>> Developer's >>>>>>>>>>>>>>>> challenge Build the coolest Linux based applications >>>>>>>>>>>>>>>> with >>>>>>>>>>>>>>>> Moblin >>>>>>>>>>>>>>>> SDK & >>>>>>>>>>>>>>>> win great prizes Grand prize is a trip for two to an >>>>>>>>>>>>>>>> Open >>>>>>>>>>>>>>>> Source >>>>>>>>>>>>>>>> event >>>>>>>>>>>>>>>> anywhere in the world >>>>>>>>>>>>>>>> http://moblin-contest.org/redirect.php?banner_id=100&url=/ >>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>> Pyxida-users mailing >>>>>>>>>>>>>>>> list Pyx...@li... >>>>>>>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/pyxida- >>>>>>>>>>>>>>>> users >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> ------------------------------------------------------------------------- >>>>>>>>>>>> This SF.Net email is sponsored by the Moblin Your Move >>>>>>>>>>>> Developer's challenge Build the coolest Linux based >>>>>>>>>>>> applications with Moblin SDK & win great prizes Grand prize is >>>>>>>>>>>> a trip for two to an Open Source event anywhere in the world >>>>>>>>>>>> http://moblin-contest.org/redirect.php?banner_id=100&url=/ >>>>>>>>>>>> _______________________________________________ Pyxida-users >>>>>>>>>>>> mailing list Pyx...@li... >>>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/pyxida-users >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> ------------------------------------------------------------------------------ >>>>>>>>>> SF.Net email is Sponsored by MIX09, March 18-20, 2009 in Las >>>>>>>>>> Vegas, Nevada. The future of the web can't happen without you. >>>>>>>>>> Join us at MIX09 to help pave the way to the Next Web now. Learn >>>>>>>>>> more and register at >>>>>>>>>> http://ad.doubleclick.net/clk;208669438;13503038;i?http://2009.visitmix.com/ >>>>>>>>>> _______________________________________________ Pyxida-users >>>>>>>>>> mailing list Pyx...@li... >>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/pyxida-users >>>>>>>>> >>>>>>>>> >>>>>>> >>> >> >> >> >> >> ------------------------------------------------------------------------------ >> SF.Net email is Sponsored by MIX09, March 18-20, 2009 in Las Vegas, >> Nevada. The future of the web can't happen without you. Join us at MIX09 >> to help pave the way to the Next Web now. Learn more and register at >> http://ad.doubleclick.net/clk;208669438;13503038;i?http://2009.visitmix.com/ >> _______________________________________________ Pyxida-users mailing >> list Pyx...@li... >> https://lists.sourceforge.net/lists/listinfo/pyxida-users > |
From: Jonathan L. <jon...@ee...> - 2008-12-18 23:08:49
|
Eric, I've checked in the new code. It's completely untested, but hopefully I can connect to your slice tomorrow and try it out. -jonathan On Dec 18, 2008, at 3:51 PM, Jonathan Ledlie wrote: > > I could be wrong, but I think the way you've written getNeighborToPing > is biased toward elements earlier in the neighbors set. This is > because you traverse neighbor set in the same order each time. > > I'll see if I can code up a fix. > > -jonathan > > On Dec 18, 2008, at 11:35 AM, Eric Chan-Tin wrote: > >> Sorry for the delay. Was busy with final exams and projects. >> >> I made the changes. They can be found at >> >> http://www-users.cs.umn.edu/~dchantin/NCClient.java >> http://www-users.cs.umn.edu/~dchantin/NCManager.java >> http://www-users.cs.umn.edu/~dchantin/RemoteState.java >> >> All the changes are enclosed within >> //ERIC START >> ... >> //ERIC END >> >> I compared every reported "neighbors list" of each Pyxida node >> (about 370 >> total) and see how many neighbors did not change within 10 minutes. >> Then >> did the average for all the nodes. >> >> Original: 43.37% of the neighbors stayed the same on average. >> >> Fix0 (before we found the interesting situation): 74.73% >> >> Fix1 (with the weight probability of picking a neighbor to ping): >> 95.79% >> >> Fix2 (with the weight probability and the MIN_UPDATE_PING): 98.81% >> >> Also, the reported error, rrl, etc.. were much lower in the Fixed >> Pyxida >> than in the original one (this could not be significant, since the >> Original >> experiment was performed 1-2 months ago), but as expected, it was not >> higher. >> >> You can take a look at the code and test it if you want. If >> everything >> looks fine, I'll remove some of the debugging code and other >> "useless" >> comments (like //ERIC). Also, let me know which "fix" you prefer. I >> will >> then email you the updated code. >> >> Thanks, >> >> -Eric >> >> >> On Dec 10 2008, Jonathan Ledlie wrote: >> >>> >>> Seems reasonable. >>> >>> Both ideas (lastPingTime and a weighted probability based on age) >>> seem >>> to be independent. >>> >>> Some of the code is more CPU intensive than it should be (e.g., >>> looping through potentially long lists), and that is a slight >>> downside >>> to the function approach vs a set, but this probably isn't a big >>> deal. >>> >>> -jonathan >>> >>> On Dec 10, 2008, at 10:40 PM, Eric Chan-Tin wrote: >>> >>>> okay. I think I might have misunderstood you. Your idea of >>>> lastPingTime >>>> looks interesting. >>>> >>>> My idea is to introduce a probability function. Right now, it just >>>> randomly picks one of the neighbors. What I am thinking it might >>>> do is >>>> that instead of just randomly picking a neighbor, pick a neighbor >>>> based >>>> on some function. So, the neighbor whose lastUpdateTime is oldest >>>> will >>>> have a higher chance of being picked than a neighbor who was just >>>> updated. >>>> >>>> Does that make more sense? >>>> >>>> -Eric >>>> >>>> Jonathan Ledlie wrote: >>>>> >>>>> Hmm. The advantage of lastPingTime is that you learn whether or >>>>> not >>>>> you've tried to contact a neighbor and they haven't responded. >>>>> Then you >>>>> can use that knowledge to determine whether or not to toss them. >>>>> >>>>> Maybe I'm not understanding your suggestion then. >>>>> >>>>> -jonathan >>>>> >>>>> On Dec 10, 2008, at 12:51 PM, Eric Chan-Tin wrote: >>>>> >>>>>> That's a good idea. Here is a little twist on it. >>>>>> >>>>>> Put a "weight" on each neighbor, based on the lastUpdateTime. 
So >>>>>> if >>>>>> currentTime - neighbor's lastUpdateTime, then the weight is >>>>>> higher. >>>>>> Then in getNeighborToPing, neighbor's with higher weight have a >>>>>> higher >>>>>> probability of getting picked than neighbor's with lower weight. >>>>>> >>>>>> The weight could be ( (currentTime - lastUpdateTime) / 15 >>>>>> minutes). >>>>>> It's not 10 minutes so as to avoid a probability of 1 if the >>>>>> lastUpdateTime was the previous performMaintenance. This should >>>>>> also >>>>>> avoid the problem of always pinging the oldest neighbor since >>>>>> it's >>>>>> about probability (weight). >>>>>> >>>>>> It seems a little bit less complicated than having lastPingTime >>>>>> and >>>>>> MIN_UPDATE_TIME. >>>>>> >>>>>> -Eric >>>>>> >>>>>> On Dec 10 2008, Jonathan Ledlie wrote: >>>>>> >>>>>>> >>>>>>> This is an interesting situation. Let's think about the goal. >>>>>>> The >>>>>>> goal is to keep pinging the same nodes as long as they are >>>>>>> alive, >>>>>>> and to replace a node soon after it dies, keeping the total >>>>>>> number >>>>>>> of up neighbors at approximately MAX_NEIGHBORS. >>>>>>> >>>>>>> How about the following: >>>>>>> - when you pick a neighbor to ping, you set a time on that event >>>>>>> (in >>>>>>> a new per-neighbor timestamp LastPingTime) >>>>>>> - when you first insert a neighbor, set this timestamp to 0. >>>>>>> - when you loop in performMaintenance, remove nodes whose >>>>>>> LastPingTime - LastUpdateTime > threshold (like 10 minutes) >>>>>>> >>>>>>> - when picking a node, pick the guy with the oldest LastPingTime >>>>>>> (not sure about this one -- I don't like situations where you >>>>>>> pick >>>>>>> the absolute *oldest* because then we could get in a situation >>>>>>> where >>>>>>> an old guy sits there, waiting to be kicked out, and we keep >>>>>>> picking >>>>>>> him to be pinged; I'd prefer to pick randomly, or random older >>>>>>> guys). >>>>>>> - a simple way around this is to simply have a self-adjusting >>>>>>> threshold: so, we walk through the nodes, and everyone whose >>>>>>> LastPingTime is older than MIN_UPDATE_TIME_TO_PING ago (a >>>>>>> relative >>>>>>> time) get added to a set; we pick a guy randomly from the set to >>>>>>> ping. If the set is empty, we increase MIN_UPDATE_TIME_TO_PING, >>>>>>> and >>>>>>> wait for the next time the function is called. If the set is >>>>>>> too >>>>>>> large (relative to a fraction of MAX_NEIGHBORS), we decrease it. >>>>>>> >>>>>>> We still might get some churn from lost UDP packets. You might >>>>>>> also >>>>>>> give nodes a second chance to be pinged by including >>>>>>> LastPingTime - >>>>>>> LastUpdateTime > 1 minute in the set. Note that increases and >>>>>>> decreases in MIN_UPDATE_TIME_TO_PING shouldn't be affected by >>>>>>> the >>>>>>> number of nodes that fall into this category. >>>>>>> >>>>>>> We still probably need a way for new potential neighbors to be >>>>>>> rejected if our current set of neighbors is fine. Because I >>>>>>> still >>>>>>> want addNeigbor to really add the neighbor, how about putting a >>>>>>> check before calling that to see if we need more neighbors. >>>>>>> >>>>>>> What thinks? >>>>>>> >>>>>>> -jonathan >>>>>>> >>>>>>> On Dec 9, 2008, at 11:37 PM, ext Eric Chan-Tin wrote: >>>>>>> >>>>>>>> Peter, >>>>>>>> >>>>>>>> Thanks. That fixed it. >>>>>>>> >>>>>>>> However, here is another problem I found while deploying Pyxida >>>>>>>> and >>>>>>>> running them for a while. 
>>>>>>>> >>>>>>>> In performMaintenance(), I loop through every neighbor and if >>>>>>>> (neighbor.getLastUpdateTime() < current_time - 10 minutes), >>>>>>>> then I >>>>>>>> remove that neighbor. >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> In NCManager, I call getNeighborToPing, which randomly picks >>>>>>>> one >>>>>>>> of >>>>>>>> the >>>>>>>> neighbors (32 in this case). Pinging is performed every 10 >>>>>>>> seconds, so >>>>>>>> in 10 minutes, you can ping 60 nodes. However, if a neighbor A >>>>>>>> is not >>>>>>>> among those 60 nodes, it will get kicked out in >>>>>>>> performMaintenance(), >>>>>>>> although A is still alive (it just hasn't been contacted yet). >>>>>>>> >>>>>>>> There are some possible options. >>>>>>>> >>>>>>>> 1) Increase the timeout period from 10 minutes to ?? minutes. >>>>>>>> This >>>>>>>> doesn't quite work because say in Azureus, the number of >>>>>>>> neighbors is >>>>>>>> 512, so the timeout needs to be increased even more... It could >>>>>>>> be a >>>>>>>> function of MAX_NEIGHBORS... >>>>>>>> >>>>>>>> 2) When Pyxida replies to a ping request, it will update the >>>>>>>> neighbor's >>>>>>>> lastUpdateTime. This does not quite work either because it >>>>>>>> assumes a >>>>>>>> symmetric relationship which is most likely not the case. >>>>>>>> >>>>>>>> 3) Introduce some more logic to Pyxida so it remembers what >>>>>>>> nodes it >>>>>>>> contacted and does not send, say, more than 2 requests in 10 >>>>>>>> minutes to >>>>>>>> the same node. This solution isn't that great either. >>>>>>>> >>>>>>>> Let me know what you guys think. I am currently leaning towards >>>>>>>> option >>>>>>>> 1) but make the timeout period a function of MAX_NEIGHBORS, >>>>>>>> although I >>>>>>>> haven't quite found the perfect function yet. >>>>>>>> >>>>>>>> Here are some preliminary results. I compared the number of >>>>>>>> neighbors >>>>>>>> which are the same at each reported time (10 minutes). Before >>>>>>>> the fix, >>>>>>>> more than 50% of the neighbors changed within 10 minutes. After >>>>>>>> the >>>>>>>> fix, >>>>>>>> only about 30% of the neighbors changed within 10 minutes. It >>>>>>>> is >>>>>>>> more >>>>>>>> stable but I think 30% is still quite a big number. >>>>>>>> >>>>>>>> Sorry for the long email :) >>>>>>>> >>>>>>>> -Eric >>>>>>>> >>>>>>>> Peter Pietzuch wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> Eric Chan-Tin wrote: >>>>>>>>>> for (RemoteState<T> neighbor : neighbors) { //this is line >>>>>>>>>> 1029 >>>>>>>>>> if (neighbor.getLastUpdateTime() < (curr_time - >>>>>>>>>> MAINTENANCE_PERIOD)) { >>>>>>>>>> neighbors.remove(neighbor); >>>>>>>>>> } >>>>>>>>>> } >>>>>>>>> >>>>>>>>> IIRC you can't modify a collection while iterating over it >>>>>>>>> using the >>>>>>>>> above for loop. Instead you have to use a proper >>>>>>>>> Iterator<RemoteState<T>> and use it's remove method. >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> Peter >>>>>>>>> >>>>>>>>>> Jonathan Ledlie wrote: >>>>>>>>>>> On Nov 3, 2008, at 3:46 PM, Eric Chan-Tin wrote: >>>>>>>>>>> >>>>>>>>>>>> Jonathan, >>>>>>>>>>>> >>>>>>>>>>>> I can insert some code to fix this and it doesn't look too >>>>>>>>>>>> hard >>>>>>>>>>>> but I do >>>>>>>>>>>> have some concerns. >>>>>>>>>>>> >>>>>>>>>>>>> I don't remember observing this behavior in our PL >>>>>>>>>>>>> deployment, but that could have been because we had the >>>>>>>>>>>>> ping >>>>>>>>>>>>> time set >>>>>>>>>>>>> to 1-2 minutes, IIRC. 
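A minimal sketch of the Iterator-based removal Peter points to above; the names (neighbors, RemoteState, getLastUpdateTime, the maintenance period) follow the quoted snippet, but this is only an illustration of the pattern, not the code that went into Pyxida:

    import java.util.Iterator;
    import java.util.Set;

    // Prune neighbors not heard from within maintenancePeriod ms.
    // Removing through the Iterator avoids the ConcurrentModificationException
    // that fail-fast java.util collections throw when the enhanced for loop
    // quoted above calls neighbors.remove(...) mid-iteration.
    static <T> void pruneStaleNeighbors(Set<RemoteState<T>> neighbors,
                                        long currTime, long maintenancePeriod) {
        Iterator<RemoteState<T>> it = neighbors.iterator();
        while (it.hasNext()) {
            RemoteState<T> neighbor = it.next();
            if (neighbor.getLastUpdateTime() < currTime - maintenancePeriod) {
                it.remove();  // safe removal through the iterator itself
            }
        }
    }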
>>>>>>>>>>>> The default ping time was set to 10 seconds + some Gaussian >>>>>>>>>>>> distribution, >>>>>>>>>>>> so that's what I used, but changing the ping time to 1 >>>>>>>>>>>> minute >>>>>>>>>>>> or more >>>>>>>>>>>> will >>>>>>>>>>>> result in the same effect except it will take longer for >>>>>>>>>>>> the >>>>>>>>>>>> "cycle" to >>>>>>>>>>>> occur. >>>>>>>>>>>> >>>>>>>>>>>> What I have in mind was the following: >>>>>>>>>>>> >>>>>>>>>>>> 1) In NCClient.addNeighbor() would be changed to (in >>>>>>>>>>>> pseudocode) >>>>>>>>>>>> protected boolean addNeighbor(RemoteState<T> guy) { >>>>>>>>>>>> boolean added = false; >>>>>>>>>>>> //if neighbors.size() >= MAX_NEIGHBORS >>>>>>>>>>>> //Check last update time of each neighbor in >>>>>>>>>>>> neighbors >>>>>>>>>>>> //neighbor.getLastUpdateTime() >>>>>>>>>>>> //if any of them is older than some number >>>>>>>>>>>> TIMEOUT, >>>>>>>>>>>> remove the oldest one, then add "guy" to neighbors >>>>>>>>>>>> //else don't add "guy" to neighbors >>>>>>>>>>>> //else add "guy" to neighbors >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>> IMO this shouldn't be the behavior of this function because >>>>>>>>>>> it >>>>>>>>>>> is called >>>>>>>>>>> "addNeighbor." Instead what could happen is that part of >>>>>>>>>>> the >>>>>>>>>>> performMaintenance() function could remove old neighbors. >>>>>>>>>>> >>>>>>>>>>>> 2) In NCManager, getGossipNode() would just pick a random >>>>>>>>>>>> node >>>>>>>>>>>> from >>>>>>>>>>>> localNC.primaryNC.getNeighbors() to send a request to >>>>>>>>>>>> every 10 >>>>>>>>>>>> seconds. >>>>>>>>>>>> >>>>>>>>>>>> 3) Pyxida will still keep track of all GossipNodes in the >>>>>>>>>>>> network. It >>>>>>>>>>>> could >>>>>>>>>>>> also (say every 10 minutes) check its NCClient.neighbors >>>>>>>>>>>> list >>>>>>>>>>>> and see if >>>>>>>>>>>> any neighbor is older than TIMEOUT, remove it, then pick a >>>>>>>>>>>> random >>>>>>>>>>>> GossipNode to send a ping request. >>>>>>>>>>>> >>>>>>>>>>>> The main problem with this scheme is that one Pyxida node >>>>>>>>>>>> would >>>>>>>>>>>> most >>>>>>>>>>>> likely >>>>>>>>>>>> stay with the same set of neighbors all the time. Its >>>>>>>>>>>> coordinates >>>>>>>>>>>> would be >>>>>>>>>>>> greatly affected by those 32 nodes (I am assuming >>>>>>>>>>>> MAX_NEIGHBORS >>>>>>>>>>>> = 32). >>>>>>>>>>>> >>>>>>>>>>> Instead of (2), why not have the getNeighborToPing function >>>>>>>>>>> determine >>>>>>>>>>> the behavior? If it returns a node, then ping it. If it >>>>>>>>>>> returns null, >>>>>>>>>>> then have the NCManager pick some other node to use? If >>>>>>>>>>> things are >>>>>>>>>>> hunky-dory, getNeighborToPing() would return a node (a >>>>>>>>>>> current >>>>>>>>>>> member of >>>>>>>>>>> the neighbor set), if we're short on neighbors, then it >>>>>>>>>>> would >>>>>>>>>>> return >>>>>>>>>>> null. The periodic maintenance would have already tossed >>>>>>>>>>> old >>>>>>>>>>> nodes, so >>>>>>>>>>> this would be self-sustaining. >>>>>>>>>>> >>>>>>>>>>> The "problem" you mention is fairly central to network >>>>>>>>>>> coordinates in >>>>>>>>>>> general. Sure the coordinates will be a little better the >>>>>>>>>>> more >>>>>>>>>>> neighbors you have. But in a very large system talking to >>>>>>>>>>> everybody >>>>>>>>>>> isn't feasible. Peter and I have a graph in one of the >>>>>>>>>>> workshop >>>>>>>>>>> papers >>>>>>>>>>> which shows the effect of increasing the number of neighbors >>>>>>>>>>> >>>>>>>>>>>> I don't think the network would be partitioned, but it >>>>>>>>>>>> could >>>>>>>>>>>> happen. 
For >>>>>>>>>>>> example, the boostrap nodes have each other in their >>>>>>>>>>>> neighbors >>>>>>>>>>>> list. >>>>>>>>>>>> Every >>>>>>>>>>>> new node bootstrapping to the network would contact those >>>>>>>>>>>> bootstrapping >>>>>>>>>>>> nodes first and add them to their neighbors. So every >>>>>>>>>>>> Pyxida >>>>>>>>>>>> node could >>>>>>>>>>>> have as its neighbors the "Bootstrap nodes", which would >>>>>>>>>>>> make >>>>>>>>>>>> Pyxida >>>>>>>>>>>> become >>>>>>>>>>>> a centralized system. >>>>>>>>>>>> >>>>>>>>>>> As long as the initial bootstrap list is large enough, I >>>>>>>>>>> think >>>>>>>>>>> you are >>>>>>>>>>> right, that partitions won't be a problem. >>>>>>>>>>> >>>>>>>>>>>> One last thing: When Pyxida receives a ping request from a >>>>>>>>>>>> node, it does >>>>>>>>>>>> not update its coordinates. Is there any particular reason >>>>>>>>>>>> for >>>>>>>>>>>> that? >>>>>>>>>>>> Is it >>>>>>>>>>>> because Pyxida will not be able to calculate the RTT? >>>>>>>>>>>> >>>>>>>>>>> Yes. >>>>>>>>>>> >>>>>>>>>>>> Also, there is no "gossiping" of other nodes' coordinates, >>>>>>>>>>>> right? So if A >>>>>>>>>>>> has B in its neighbors list, it will from time to time tell >>>>>>>>>>>> other nodes >>>>>>>>>>>> about B's coordinates. I realize this has a lot of >>>>>>>>>>>> complications but the >>>>>>>>>>>> original Vivaldi paper was very fuzzy about this point and >>>>>>>>>>>> the >>>>>>>>>>>> previous >>>>>>>>>>>> point that it seems it is up to the implementors to decide >>>>>>>>>>>> what >>>>>>>>>>>> to do. >>>>>>>>>>>> >>>>>>>>>>> I don't think we are doing this. It could be useful in >>>>>>>>>>> certain >>>>>>>>>>> circumstance (e.g., picking potential new neighbors), but >>>>>>>>>>> we're >>>>>>>>>>> not, IIRC. >>>>>>>>>>> >>>>>>>>>>>> I will have to think some more about this as there are a >>>>>>>>>>>> lot >>>>>>>>>>>> of >>>>>>>>>>>> nasty >>>>>>>>>>>> little side effects that could happen, but these are my >>>>>>>>>>>> thoughts. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> >>>>>>>>>>>> -Eric >>>>>>>>>>>> >>>>>>>>>>> Sounds good. >>>>>>>>>>> >>>>>>>>>>> -jonathan >>>>>>>>>>> >>>>>>>>>>>> On Nov 3 2008, Jonathan Ledlie wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Eric, >>>>>>>>>>>>> >>>>>>>>>>>>> Based on a look at the code just now, I see where you are >>>>>>>>>>>>> coming from: >>>>>>>>>>>>> with these params on PL, there isn't anything stopping the >>>>>>>>>>>>> neighbor >>>>>>>>>>>>> list from circulating pretty quickly through all nodes. >>>>>>>>>>>>> In >>>>>>>>>>>>> the >>>>>>>>>>>>> Azureus implementation, the guys fed to the alg (fed to >>>>>>>>>>>>> processSample) >>>>>>>>>>>>> weren't changing much because they were coming from the >>>>>>>>>>>>> routing finger >>>>>>>>>>>>> tables. I don't remember observing this behavior in our >>>>>>>>>>>>> PL >>>>>>>>>>>>> deployment, but that could have been because we had the >>>>>>>>>>>>> ping >>>>>>>>>>>>> time set >>>>>>>>>>>>> to 1-2 minutes, IIRC. >>>>>>>>>>>>> >>>>>>>>>>>>> Do you want to insert some code that will fix this? >>>>>>>>>>>>> Basically >>>>>>>>>>>>> we want >>>>>>>>>>>>> to only add a new neighbor if our oldest neighbor should >>>>>>>>>>>>> be >>>>>>>>>>>>> tossed (or >>>>>>>>>>>>> if we haven't reached MAX_NEIGHBORS). It's probably best >>>>>>>>>>>>> from a >>>>>>>>>>>>> clarity standpoint if addNeighbor does continue to >>>>>>>>>>>>> actually >>>>>>>>>>>>> add the >>>>>>>>>>>>> neighbor in all cases. 
Probably the best thing to do is >>>>>>>>>>>>> to >>>>>>>>>>>>> have >>>>>>>>>>>>> NCClient drop old neighbors internally and have NCManager >>>>>>>>>>>>> ask >>>>>>>>>>>>> it which >>>>>>>>>>>>> nodes should be pinged via getNeighborToPing. So NCClient >>>>>>>>>>>>> would >>>>>>>>>>>>> continue to have the same behavior on processSample (it >>>>>>>>>>>>> would not >>>>>>>>>>>>> reject new information) and NCManager and NCClient would >>>>>>>>>>>>> cooperate to >>>>>>>>>>>>> measure RTTs to current neighbors. We do want to make >>>>>>>>>>>>> sure, >>>>>>>>>>>>> however, >>>>>>>>>>>>> that the gossip set stays large enough so that partitions >>>>>>>>>>>>> don't occur. >>>>>>>>>>>>> >>>>>>>>>>>>> Do you want to be added to the SourceForge Pyxida >>>>>>>>>>>>> developers >>>>>>>>>>>>> to make >>>>>>>>>>>>> this change? (Please test it out carefully first.) >>>>>>>>>>>>> >>>>>>>>>>>>> -jonathan >>>>>>>>>>>>> >>>>>>>>>>>>> On Oct 30, 2008, at 11:32 AM, ext Eric Chan-Tin wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Jonathan, >>>>>>>>>>>>>> >>>>>>>>>>>>>> From what I saw, eventually, you learn about all the >>>>>>>>>>>>>> nodes >>>>>>>>>>>>>> in >>>>>>>>>>>>>> the >>>>>>>>>>>>>> network. Every 10 seconds, Pyxida picks a random "Gossip" >>>>>>>>>>>>>> node from >>>>>>>>>>>>>> its >>>>>>>>>>>>>> list and sends a request. The Gossip node will reply back >>>>>>>>>>>>>> with an >>>>>>>>>>>>>> "upNeighbor", that is, a "Gossip" node that is "up" or >>>>>>>>>>>>>> with 10% >>>>>>>>>>>>>> probability, a node that is "pending". Since naturally, >>>>>>>>>>>>>> all >>>>>>>>>>>>>> the nodes >>>>>>>>>>>>>> bootstrap off the same set of nodes, every node will >>>>>>>>>>>>>> eventually know >>>>>>>>>>>>>> of >>>>>>>>>>>>>> every other node. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Pyxida might not be rotating through everybody >>>>>>>>>>>>>> continuously >>>>>>>>>>>>>> (although >>>>>>>>>>>>>> eventually, since it knows everybody, it will have added >>>>>>>>>>>>>> everybody to >>>>>>>>>>>>>> its Neighbors list at some point), but could be >>>>>>>>>>>>>> removing a >>>>>>>>>>>>>> node that >>>>>>>>>>>>>> is >>>>>>>>>>>>>> "alive". So, if MAX_NEIGHBORS = 32, let's call them A_1, >>>>>>>>>>>>>> A_2, >>>>>>>>>>>>>> ..., >>>>>>>>>>>>>> A_32. >>>>>>>>>>>>>> Pyxida picks a random Gossip node B, sends it a request >>>>>>>>>>>>>> and B >>>>>>>>>>>>>> replies >>>>>>>>>>>>>> back. Pyxida will add B to its Neighbors list, see that >>>>>>>>>>>>>> size of >>>>>>>>>>>>>> Neighbors > MAX_NEIGHBORS, kick A_1 out. Then from those >>>>>>>>>>>>>> 32 >>>>>>>>>>>>>> neighbors >>>>>>>>>>>>>> A_2, ...., A_32, B, it will calculate the force, and >>>>>>>>>>>>>> apply >>>>>>>>>>>>>> it >>>>>>>>>>>>>> to the >>>>>>>>>>>>>> current system coordinates. Thus, if every 10 seconds, >>>>>>>>>>>>>> Pyxida >>>>>>>>>>>>>> picks a >>>>>>>>>>>>>> different set of nodes C, D, E, etc... it would keep >>>>>>>>>>>>>> kicking >>>>>>>>>>>>>> nodes out >>>>>>>>>>>>>> of the neighbors list. >>>>>>>>>>>>>> >>>>>>>>>>>>>> It's the "kicking A_1 out" part that seemed a bit >>>>>>>>>>>>>> confusing. >>>>>>>>>>>>>> Churn >>>>>>>>>>>>>> is a >>>>>>>>>>>>>> good reason but you don't check that A_1 might still be >>>>>>>>>>>>>> alive >>>>>>>>>>>>>> or maybe >>>>>>>>>>>>>> A_1 was just added. I have been printing the neighbors >>>>>>>>>>>>>> list >>>>>>>>>>>>>> every 10 >>>>>>>>>>>>>> minutes, and they are completely different between those >>>>>>>>>>>>>> 10 >>>>>>>>>>>>>> minutes. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> One possible problem might be that I am choosing the >>>>>>>>>>>>>> value >>>>>>>>>>>>>> of >>>>>>>>>>>>>> MAX_NEIGHBORS to be too small. Originally it was set to >>>>>>>>>>>>>> 512 >>>>>>>>>>>>>> (which was >>>>>>>>>>>>>> meant for Azureus). Since I am using Planetlab where >>>>>>>>>>>>>> there >>>>>>>>>>>>>> are about >>>>>>>>>>>>>> 400-500 "stable" nodes, I changed the value to 32. But I >>>>>>>>>>>>>> don't think >>>>>>>>>>>>>> that should be a problem since log(n) = log(512) = 9. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> >>>>>>>>>>>>>> -Eric >>>>>>>>>>>>>> >>>>>>>>>>>>>> Jonathan Ledlie wrote: >>>>>>>>>>>>>>> Eric, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Glad to hear it. Let Peter and me know if you publish >>>>>>>>>>>>>>> anything using >>>>>>>>>>>>>>> Pyxida and we can put it on the web page. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> "Gossip" nodes are for learning about the network. >>>>>>>>>>>>>>> "Neighbors" are >>>>>>>>>>>>>>> for >>>>>>>>>>>>>>> constructing the coordinate. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> So, IIRC, the gossip process learns about nodes that >>>>>>>>>>>>>>> may or >>>>>>>>>>>>>>> may not >>>>>>>>>>>>>>> be >>>>>>>>>>>>>>> applied to the neighbor list. That is, we may >>>>>>>>>>>>>>> eventually >>>>>>>>>>>>>>> learn >>>>>>>>>>>>>>> about a >>>>>>>>>>>>>>> large portion of the network, via gossiping, but should >>>>>>>>>>>>>>> only >>>>>>>>>>>>>>> be using >>>>>>>>>>>>>>> MAX_NEIGHBORS at any given time for coordinate >>>>>>>>>>>>>>> computation. >>>>>>>>>>>>>>> The >>>>>>>>>>>>>>> reason >>>>>>>>>>>>>>> we don't want to use a fixed set of neighbors for >>>>>>>>>>>>>>> coordinate >>>>>>>>>>>>>>> computation >>>>>>>>>>>>>>> is because nodes may come and go from the system. >>>>>>>>>>>>>>> However, >>>>>>>>>>>>>>> it should >>>>>>>>>>>>>>> not be the case that we just rotate through everybody, >>>>>>>>>>>>>>> because >>>>>>>>>>>>>>> then, as >>>>>>>>>>>>>>> you note, we'd be computing against everyone's >>>>>>>>>>>>>>> coordinate/ >>>>>>>>>>>>>>> latency, >>>>>>>>>>>>>>> and >>>>>>>>>>>>>>> this is counter to the point of network coordinates. >>>>>>>>>>>>>>> With this >>>>>>>>>>>>>>> gradual >>>>>>>>>>>>>>> transience as a desirable quality in mind, I am pretty >>>>>>>>>>>>>>> sure we >>>>>>>>>>>>>>> maintain >>>>>>>>>>>>>>> neighbors in a fairly fixed manner. Does this not seem >>>>>>>>>>>>>>> to >>>>>>>>>>>>>>> be the >>>>>>>>>>>>>>> case >>>>>>>>>>>>>>> to you? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -jonathan >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Oct 29, 2008, at 9:20 PM, Eric Chan-Tin wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi Jonathan and Peter, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> First of all, just wanted to say that Pyxida has been >>>>>>>>>>>>>>>> working >>>>>>>>>>>>>>>> great on >>>>>>>>>>>>>>>> Planetlab for the past 4 months or so. Getting good >>>>>>>>>>>>>>>> data >>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>> experiments >>>>>>>>>>>>>>>> going well. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> However, I do have one question/comment. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Every Pyxida node contains a list of Neighbors (in >>>>>>>>>>>>>>>> NCClient) and a >>>>>>>>>>>>>>>> list >>>>>>>>>>>>>>>> of Gossip nodes (they are called neighbors in NCManager >>>>>>>>>>>>>>>> but >>>>>>>>>>>>>>>> I will >>>>>>>>>>>>>>>> call >>>>>>>>>>>>>>>> them gossip for now so as not to confuse with the >>>>>>>>>>>>>>>> 'real' >>>>>>>>>>>>>>>> neighbors >>>>>>>>>>>>>>>> in >>>>>>>>>>>>>>>> NCClient). 
Every 10 seconds, Pyxida will pick a random >>>>>>>>>>>>>>>> node >>>>>>>>>>>>>>>> (call >>>>>>>>>>>>>>>> it A) >>>>>>>>>>>>>>>> from Gossip and send a gossip request (basically asking >>>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>> rtt, >>>>>>>>>>>>>>>> coordinate, error). But Gossip contains all the nodes >>>>>>>>>>>>>>>> in >>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>> network. >>>>>>>>>>>>>>>> The comment in the code said that was to avoid Pyxida >>>>>>>>>>>>>>>> becoming >>>>>>>>>>>>>>>> lonely >>>>>>>>>>>>>>>> and no-one to gossip with, which makes sense. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Once A responds, Pyxida will add A to its Neighbors (if >>>>>>>>>>>>>>>> Neighbors >>>>>>>>>>>>>>>> does >>>>>>>>>>>>>>>> not contain A) and remove the first element from the >>>>>>>>>>>>>>>> Neighbors list. >>>>>>>>>>>>>>>> This seems a little bit odd. Was there any particular >>>>>>>>>>>>>>>> reasoning >>>>>>>>>>>>>>>> behind >>>>>>>>>>>>>>>> this? Because it would seem like you're getting the >>>>>>>>>>>>>>>> coordinate from >>>>>>>>>>>>>>>> every node in the network eventually. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks a lot, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -Eric >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Jonathan Ledlie wrote: >>>>>>>>>>>>>>>>> Your inclination is correct. See forwarded message. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -jonathan >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Apr 9, 2008, at 1:53 PM, ext Eric Chan-Tin wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thank you for the clarification. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I've been reading through your published papers and >>>>>>>>>>>>>>>>>> code >>>>>>>>>>>>>>>>>> and I >>>>>>>>>>>>>>>>>> must >>>>>>>>>>>>>>>>>> say I learned a lot about how Pyxida works :) >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I still have a quick and easy (hopefully) question. >>>>>>>>>>>>>>>>>> Each >>>>>>>>>>>>>>>>>> Pyxida >>>>>>>>>>>>>>>>>> node >>>>>>>>>>>>>>>>>> has a small amount of neighbors and learn about other >>>>>>>>>>>>>>>>>> nodes >>>>>>>>>>>>>>>>>> through >>>>>>>>>>>>>>>>>> other nodes. I am running on about 200 machines, and >>>>>>>>>>>>>>>>>> it >>>>>>>>>>>>>>>>>> seems >>>>>>>>>>>>>>>>>> like all >>>>>>>>>>>>>>>>>> the Pyxida nodes are adding each other to their >>>>>>>>>>>>>>>>>> upNeighbors set. >>>>>>>>>>>>>>>>>> NCClient.java has the MAX_NEIGHBORS set to 512. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I wanted to make sure that I am not misreading the >>>>>>>>>>>>>>>>>> code, but >>>>>>>>>>>>>>>>>> shouldn't >>>>>>>>>>>>>>>>>> each Pyxida node keep a small number of neighbors in >>>>>>>>>>>>>>>>>> its >>>>>>>>>>>>>>>>>> upNeighbors >>>>>>>>>>>>>>>>>> set. Else, it will be a lot of updates to send a >>>>>>>>>>>>>>>>>> gossipMessage >>>>>>>>>>>>>>>>>> to all >>>>>>>>>>>>>>>>>> 200 neighbors, and it also means that a Pyxida node >>>>>>>>>>>>>>>>>> has >>>>>>>>>>>>>>>>>> the whole >>>>>>>>>>>>>>>>>> network in its upNeighbors set - that seems wrong. I >>>>>>>>>>>>>>>>>> am >>>>>>>>>>>>>>>>>> inclined >>>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>> change the value of MAX_NEIGHBORS but wanted to make >>>>>>>>>>>>>>>>>> sure >>>>>>>>>>>>>>>>>> that my >>>>>>>>>>>>>>>>>> understanding is correct first. 
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thank you, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Eric >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ------------------------------------------------------------------------- >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> This SF.Net email is sponsored by the Moblin Your Move >>>>>>>>>>>>>>>> Developer's >>>>>>>>>>>>>>>> challenge Build the coolest Linux based applications >>>>>>>>>>>>>>>> with >>>>>>>>>>>>>>>> Moblin >>>>>>>>>>>>>>>> SDK & >>>>>>>>>>>>>>>> win great prizes Grand prize is a trip for two to an >>>>>>>>>>>>>>>> Open >>>>>>>>>>>>>>>> Source >>>>>>>>>>>>>>>> event >>>>>>>>>>>>>>>> anywhere in the world >>>>>>>>>>>>>>>> http://moblin-contest.org/redirect.php?banner_id=100&url=/ >>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>> Pyxida-users mailing >>>>>>>>>>>>>>>> list Pyx...@li... >>>>>>>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/pyxida- >>>>>>>>>>>>>>>> users >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> ------------------------------------------------------------------------- >>>>>>>>>>>> This SF.Net email is sponsored by the Moblin Your Move >>>>>>>>>>>> Developer's challenge Build the coolest Linux based >>>>>>>>>>>> applications >>>>>>>>>>>> with Moblin SDK & win great prizes Grand prize is a trip >>>>>>>>>>>> for two >>>>>>>>>>>> to an Open Source event anywhere in the world >>>>>>>>>>>> http://moblin-contest.org/redirect.php?banner_id=100&url=/ >>>>>>>>>>>> _______________________________________________ Pyxida- >>>>>>>>>>>> users >>>>>>>>>>>> mailing list Pyx...@li... >>>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/pyxida-users >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> ------------------------------------------------------------------------------ >>>>>>>>>> SF.Net email is Sponsored by MIX09, March 18-20, 2009 in Las >>>>>>>>>> Vegas, Nevada. The future of the web can't happen without >>>>>>>>>> you. >>>>>>>>>> Join us at MIX09 to help pave the way to the Next Web now. >>>>>>>>>> Learn >>>>>>>>>> more and register at >>>>>>>>>> http://ad.doubleclick.net/clk;208669438;13503038;i?http://2009.visitmix.com/ >>>>>>>>>> _______________________________________________ >>>>>>>>>> Pyxida-users mailing list Pyx...@li... >>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/pyxida-users >>>>>>>>> >>>>>>>>> >>>>>>> >>> >> >> >> ------------------------------------------------------------------------------ >> SF.Net email is Sponsored by MIX09, March 18-20, 2009 in Las Vegas, >> Nevada. >> The future of the web can't happen without you. Join us at MIX09 to >> help >> pave the way to the Next Web now. Learn more and register at >> http://ad.doubleclick.net/clk;208669438;13503038;i?http://2009.visitmix.com/ >> _______________________________________________ >> Pyxida-users mailing list >> Pyx...@li... >> https://lists.sourceforge.net/lists/listinfo/pyxida-users > > > ------------------------------------------------------------------------------ > SF.Net email is Sponsored by MIX09, March 18-20, 2009 in Las Vegas, > Nevada. > The future of the web can't happen without you. Join us at MIX09 to > help > pave the way to the Next Web now. Learn more and register at > http://ad.doubleclick.net/clk;208669438;13503038;i?http://2009.visitmix.com/ > _______________________________________________ > Pyxida-users mailing list > Pyx...@li... > https://lists.sourceforge.net/lists/listinfo/pyxida-users |
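As a companion to the message above: the Fix2 variant pairs the age-weighted pick with a MIN_UPDATE_PING-style guard. A rough sketch of that guard, assuming per-neighbor lastPingTime bookkeeping (getLastPingTime() is a hypothetical accessor, not necessarily Pyxida's API): filter out neighbors pinged too recently, pick randomly among the rest, and return null so NCManager can fall back to a gossip node, as suggested earlier in the thread. The threshold could additionally be widened or narrowed depending on how large the candidate set turns out to be.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;

    // Candidates are neighbors we have not tried to ping within
    // minUpdateTimeToPing ms. Returning null signals the caller to pick
    // some other (gossip) node for this round instead.
    // getLastPingTime() is assumed bookkeeping, not the released API.
    static <T> RemoteState<T> getNeighborToPing(List<RemoteState<T>> neighbors,
                                                long now,
                                                long minUpdateTimeToPing,
                                                Random rand) {
        List<RemoteState<T>> candidates = new ArrayList<RemoteState<T>>();
        for (RemoteState<T> n : neighbors) {
            if (now - n.getLastPingTime() > minUpdateTimeToPing) {
                candidates.add(n);
            }
        }
        if (candidates.isEmpty()) {
            return null;  // nothing is due for a ping; NCManager gossips instead
        }
        return candidates.get(rand.nextInt(candidates.size()));
    }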
From: Jonathan L. <jon...@ee...> - 2008-12-18 20:51:30
|
I could be wrong, but I think the way you've written getNeighborToPing is biased toward elements earlier in the neighbors set. This is because you traverse neighbor set in the same order each time. I'll see if I can code up a fix. -jonathan On Dec 18, 2008, at 11:35 AM, Eric Chan-Tin wrote: > Sorry for the delay. Was busy with final exams and projects. > > I made the changes. They can be found at > > http://www-users.cs.umn.edu/~dchantin/NCClient.java > http://www-users.cs.umn.edu/~dchantin/NCManager.java > http://www-users.cs.umn.edu/~dchantin/RemoteState.java > > All the changes are enclosed within > //ERIC START > ... > //ERIC END > > I compared every reported "neighbors list" of each Pyxida node > (about 370 > total) and see how many neighbors did not change within 10 minutes. > Then > did the average for all the nodes. > > Original: 43.37% of the neighbors stayed the same on average. > > Fix0 (before we found the interesting situation): 74.73% > > Fix1 (with the weight probability of picking a neighbor to ping): > 95.79% > > Fix2 (with the weight probability and the MIN_UPDATE_PING): 98.81% > > Also, the reported error, rrl, etc.. were much lower in the Fixed > Pyxida > than in the original one (this could not be significant, since the > Original > experiment was performed 1-2 months ago), but as expected, it was not > higher. > > You can take a look at the code and test it if you want. If everything > looks fine, I'll remove some of the debugging code and other "useless" > comments (like //ERIC). Also, let me know which "fix" you prefer. I > will > then email you the updated code. > > Thanks, > > -Eric > > > On Dec 10 2008, Jonathan Ledlie wrote: > >> >> Seems reasonable. >> >> Both ideas (lastPingTime and a weighted probability based on age) >> seem >> to be independent. >> >> Some of the code is more CPU intensive than it should be (e.g., >> looping through potentially long lists), and that is a slight >> downside >> to the function approach vs a set, but this probably isn't a big >> deal. >> >> -jonathan >> >> On Dec 10, 2008, at 10:40 PM, Eric Chan-Tin wrote: >> >>> okay. I think I might have misunderstood you. Your idea of >>> lastPingTime >>> looks interesting. >>> >>> My idea is to introduce a probability function. Right now, it just >>> randomly picks one of the neighbors. What I am thinking it might >>> do is >>> that instead of just randomly picking a neighbor, pick a neighbor >>> based >>> on some function. So, the neighbor whose lastUpdateTime is oldest >>> will >>> have a higher chance of being picked than a neighbor who was just >>> updated. >>> >>> Does that make more sense? >>> >>> -Eric >>> >>> Jonathan Ledlie wrote: >>>> >>>> Hmm. The advantage of lastPingTime is that you learn whether or >>>> not >>>> you've tried to contact a neighbor and they haven't responded. >>>> Then you >>>> can use that knowledge to determine whether or not to toss them. >>>> >>>> Maybe I'm not understanding your suggestion then. >>>> >>>> -jonathan >>>> >>>> On Dec 10, 2008, at 12:51 PM, Eric Chan-Tin wrote: >>>> >>>>> That's a good idea. Here is a little twist on it. >>>>> >>>>> Put a "weight" on each neighbor, based on the lastUpdateTime. So >>>>> if >>>>> currentTime - neighbor's lastUpdateTime, then the weight is >>>>> higher. >>>>> Then in getNeighborToPing, neighbor's with higher weight have a >>>>> higher >>>>> probability of getting picked than neighbor's with lower weight. >>>>> >>>>> The weight could be ( (currentTime - lastUpdateTime) / 15 >>>>> minutes). 
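A minimal sketch of the weighted pick described here, written so the traversal order does not matter (addressing the bias noted at the top of this message): sum the weights in one pass, then spend a single random draw against the cumulative total. Method and constant names are illustrative, not the Fix1 code itself.

    import java.util.List;
    import java.util.Random;

    // Pick a neighbor with probability proportional to its age, using
    // weight = (currentTime - lastUpdateTime) / FIFTEEN_MINUTES as above.
    static <T> RemoteState<T> pickWeighted(List<RemoteState<T>> neighbors,
                                           long currentTime, Random rand) {
        final double FIFTEEN_MINUTES = 15 * 60 * 1000.0;
        double[] weights = new double[neighbors.size()];
        double total = 0.0;
        for (int i = 0; i < neighbors.size(); i++) {
            long age = currentTime - neighbors.get(i).getLastUpdateTime();
            weights[i] = Math.max(0.0, age / FIFTEEN_MINUTES);
            total += weights[i];
        }
        if (total <= 0.0) {
            return null;  // every neighbor was updated just now
        }
        double draw = rand.nextDouble() * total;  // one draw over the cumulative mass
        for (int i = 0; i < neighbors.size(); i++) {
            draw -= weights[i];
            if (draw <= 0.0) {
                return neighbors.get(i);
            }
        }
        return neighbors.get(neighbors.size() - 1);  // guard against rounding
    }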
>>>>> It's not 10 minutes so as to avoid a probability of 1 if the >>>>> lastUpdateTime was the previous performMaintenance. This should >>>>> also >>>>> avoid the problem of always pinging the oldest neighbor since it's >>>>> about probability (weight). >>>>> >>>>> It seems a little bit less complicated than having lastPingTime >>>>> and >>>>> MIN_UPDATE_TIME. >>>>> >>>>> -Eric >>>>> >>>>> On Dec 10 2008, Jonathan Ledlie wrote: >>>>> >>>>>> >>>>>> This is an interesting situation. Let's think about the goal. >>>>>> The >>>>>> goal is to keep pinging the same nodes as long as they are alive, >>>>>> and to replace a node soon after it dies, keeping the total >>>>>> number >>>>>> of up neighbors at approximately MAX_NEIGHBORS. >>>>>> >>>>>> How about the following: >>>>>> - when you pick a neighbor to ping, you set a time on that event >>>>>> (in >>>>>> a new per-neighbor timestamp LastPingTime) >>>>>> - when you first insert a neighbor, set this timestamp to 0. >>>>>> - when you loop in performMaintenance, remove nodes whose >>>>>> LastPingTime - LastUpdateTime > threshold (like 10 minutes) >>>>>> >>>>>> - when picking a node, pick the guy with the oldest LastPingTime >>>>>> (not sure about this one -- I don't like situations where you >>>>>> pick >>>>>> the absolute *oldest* because then we could get in a situation >>>>>> where >>>>>> an old guy sits there, waiting to be kicked out, and we keep >>>>>> picking >>>>>> him to be pinged; I'd prefer to pick randomly, or random older >>>>>> guys). >>>>>> - a simple way around this is to simply have a self-adjusting >>>>>> threshold: so, we walk through the nodes, and everyone whose >>>>>> LastPingTime is older than MIN_UPDATE_TIME_TO_PING ago (a >>>>>> relative >>>>>> time) get added to a set; we pick a guy randomly from the set to >>>>>> ping. If the set is empty, we increase MIN_UPDATE_TIME_TO_PING, >>>>>> and >>>>>> wait for the next time the function is called. If the set is too >>>>>> large (relative to a fraction of MAX_NEIGHBORS), we decrease it. >>>>>> >>>>>> We still might get some churn from lost UDP packets. You might >>>>>> also >>>>>> give nodes a second chance to be pinged by including >>>>>> LastPingTime - >>>>>> LastUpdateTime > 1 minute in the set. Note that increases and >>>>>> decreases in MIN_UPDATE_TIME_TO_PING shouldn't be affected by the >>>>>> number of nodes that fall into this category. >>>>>> >>>>>> We still probably need a way for new potential neighbors to be >>>>>> rejected if our current set of neighbors is fine. Because I >>>>>> still >>>>>> want addNeigbor to really add the neighbor, how about putting a >>>>>> check before calling that to see if we need more neighbors. >>>>>> >>>>>> What thinks? >>>>>> >>>>>> -jonathan >>>>>> >>>>>> On Dec 9, 2008, at 11:37 PM, ext Eric Chan-Tin wrote: >>>>>> >>>>>>> Peter, >>>>>>> >>>>>>> Thanks. That fixed it. >>>>>>> >>>>>>> However, here is another problem I found while deploying Pyxida >>>>>>> and >>>>>>> running them for a while. >>>>>>> >>>>>>> In performMaintenance(), I loop through every neighbor and if >>>>>>> (neighbor.getLastUpdateTime() < current_time - 10 minutes), >>>>>>> then I >>>>>>> remove that neighbor. >>>>>> >>>>>> >>>>>>> >>>>>>> >>>>>>> In NCManager, I call getNeighborToPing, which randomly picks one >>>>>>> of >>>>>>> the >>>>>>> neighbors (32 in this case). Pinging is performed every 10 >>>>>>> seconds, so >>>>>>> in 10 minutes, you can ping 60 nodes. 
However, if a neighbor A >>>>>>> is not >>>>>>> among those 60 nodes, it will get kicked out in >>>>>>> performMaintenance(), >>>>>>> although A is still alive (it just hasn't been contacted yet). >>>>>>> >>>>>>> There are some possible options. >>>>>>> >>>>>>> 1) Increase the timeout period from 10 minutes to ?? minutes. >>>>>>> This >>>>>>> doesn't quite work because say in Azureus, the number of >>>>>>> neighbors is >>>>>>> 512, so the timeout needs to be increased even more... It could >>>>>>> be a >>>>>>> function of MAX_NEIGHBORS... >>>>>>> >>>>>>> 2) When Pyxida replies to a ping request, it will update the >>>>>>> neighbor's >>>>>>> lastUpdateTime. This does not quite work either because it >>>>>>> assumes a >>>>>>> symmetric relationship which is most likely not the case. >>>>>>> >>>>>>> 3) Introduce some more logic to Pyxida so it remembers what >>>>>>> nodes it >>>>>>> contacted and does not send, say, more than 2 requests in 10 >>>>>>> minutes to >>>>>>> the same node. This solution isn't that great either. >>>>>>> >>>>>>> Let me know what you guys think. I am currently leaning towards >>>>>>> option >>>>>>> 1) but make the timeout period a function of MAX_NEIGHBORS, >>>>>>> although I >>>>>>> haven't quite found the perfect function yet. >>>>>>> >>>>>>> Here are some preliminary results. I compared the number of >>>>>>> neighbors >>>>>>> which are the same at each reported time (10 minutes). Before >>>>>>> the fix, >>>>>>> more than 50% of the neighbors changed within 10 minutes. After >>>>>>> the >>>>>>> fix, >>>>>>> only about 30% of the neighbors changed within 10 minutes. It is >>>>>>> more >>>>>>> stable but I think 30% is still quite a big number. >>>>>>> >>>>>>> Sorry for the long email :) >>>>>>> >>>>>>> -Eric >>>>>>> >>>>>>> Peter Pietzuch wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> Eric Chan-Tin wrote: >>>>>>>>> for (RemoteState<T> neighbor : neighbors) { //this is line >>>>>>>>> 1029 >>>>>>>>> if (neighbor.getLastUpdateTime() < (curr_time - >>>>>>>>> MAINTENANCE_PERIOD)) { >>>>>>>>> neighbors.remove(neighbor); >>>>>>>>> } >>>>>>>>> } >>>>>>>> >>>>>>>> IIRC you can't modify a collection while iterating over it >>>>>>>> using the >>>>>>>> above for loop. Instead you have to use a proper >>>>>>>> Iterator<RemoteState<T>> and use it's remove method. >>>>>>>> >>>>>>>> Cheers, >>>>>>>> Peter >>>>>>>> >>>>>>>>> Jonathan Ledlie wrote: >>>>>>>>>> On Nov 3, 2008, at 3:46 PM, Eric Chan-Tin wrote: >>>>>>>>>> >>>>>>>>>>> Jonathan, >>>>>>>>>>> >>>>>>>>>>> I can insert some code to fix this and it doesn't look too >>>>>>>>>>> hard >>>>>>>>>>> but I do >>>>>>>>>>> have some concerns. >>>>>>>>>>> >>>>>>>>>>>> I don't remember observing this behavior in our PL >>>>>>>>>>>> deployment, but that could have been because we had the >>>>>>>>>>>> ping >>>>>>>>>>>> time set >>>>>>>>>>>> to 1-2 minutes, IIRC. >>>>>>>>>>> The default ping time was set to 10 seconds + some Gaussian >>>>>>>>>>> distribution, >>>>>>>>>>> so that's what I used, but changing the ping time to 1 >>>>>>>>>>> minute >>>>>>>>>>> or more >>>>>>>>>>> will >>>>>>>>>>> result in the same effect except it will take longer for the >>>>>>>>>>> "cycle" to >>>>>>>>>>> occur. 
>>>>>>>>>>> >>>>>>>>>>> What I have in mind was the following: >>>>>>>>>>> >>>>>>>>>>> 1) In NCClient.addNeighbor() would be changed to (in >>>>>>>>>>> pseudocode) >>>>>>>>>>> protected boolean addNeighbor(RemoteState<T> guy) { >>>>>>>>>>> boolean added = false; >>>>>>>>>>> //if neighbors.size() >= MAX_NEIGHBORS >>>>>>>>>>> //Check last update time of each neighbor in >>>>>>>>>>> neighbors >>>>>>>>>>> //neighbor.getLastUpdateTime() >>>>>>>>>>> //if any of them is older than some number >>>>>>>>>>> TIMEOUT, >>>>>>>>>>> remove the oldest one, then add "guy" to neighbors >>>>>>>>>>> //else don't add "guy" to neighbors >>>>>>>>>>> //else add "guy" to neighbors >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>> IMO this shouldn't be the behavior of this function because >>>>>>>>>> it >>>>>>>>>> is called >>>>>>>>>> "addNeighbor." Instead what could happen is that part of the >>>>>>>>>> performMaintenance() function could remove old neighbors. >>>>>>>>>> >>>>>>>>>>> 2) In NCManager, getGossipNode() would just pick a random >>>>>>>>>>> node >>>>>>>>>>> from >>>>>>>>>>> localNC.primaryNC.getNeighbors() to send a request to >>>>>>>>>>> every 10 >>>>>>>>>>> seconds. >>>>>>>>>>> >>>>>>>>>>> 3) Pyxida will still keep track of all GossipNodes in the >>>>>>>>>>> network. It >>>>>>>>>>> could >>>>>>>>>>> also (say every 10 minutes) check its NCClient.neighbors >>>>>>>>>>> list >>>>>>>>>>> and see if >>>>>>>>>>> any neighbor is older than TIMEOUT, remove it, then pick a >>>>>>>>>>> random >>>>>>>>>>> GossipNode to send a ping request. >>>>>>>>>>> >>>>>>>>>>> The main problem with this scheme is that one Pyxida node >>>>>>>>>>> would >>>>>>>>>>> most >>>>>>>>>>> likely >>>>>>>>>>> stay with the same set of neighbors all the time. Its >>>>>>>>>>> coordinates >>>>>>>>>>> would be >>>>>>>>>>> greatly affected by those 32 nodes (I am assuming >>>>>>>>>>> MAX_NEIGHBORS >>>>>>>>>>> = 32). >>>>>>>>>>> >>>>>>>>>> Instead of (2), why not have the getNeighborToPing function >>>>>>>>>> determine >>>>>>>>>> the behavior? If it returns a node, then ping it. If it >>>>>>>>>> returns null, >>>>>>>>>> then have the NCManager pick some other node to use? If >>>>>>>>>> things are >>>>>>>>>> hunky-dory, getNeighborToPing() would return a node (a >>>>>>>>>> current >>>>>>>>>> member of >>>>>>>>>> the neighbor set), if we're short on neighbors, then it would >>>>>>>>>> return >>>>>>>>>> null. The periodic maintenance would have already tossed old >>>>>>>>>> nodes, so >>>>>>>>>> this would be self-sustaining. >>>>>>>>>> >>>>>>>>>> The "problem" you mention is fairly central to network >>>>>>>>>> coordinates in >>>>>>>>>> general. Sure the coordinates will be a little better the >>>>>>>>>> more >>>>>>>>>> neighbors you have. But in a very large system talking to >>>>>>>>>> everybody >>>>>>>>>> isn't feasible. Peter and I have a graph in one of the >>>>>>>>>> workshop >>>>>>>>>> papers >>>>>>>>>> which shows the effect of increasing the number of neighbors >>>>>>>>>> >>>>>>>>>>> I don't think the network would be partitioned, but it could >>>>>>>>>>> happen. For >>>>>>>>>>> example, the boostrap nodes have each other in their >>>>>>>>>>> neighbors >>>>>>>>>>> list. >>>>>>>>>>> Every >>>>>>>>>>> new node bootstrapping to the network would contact those >>>>>>>>>>> bootstrapping >>>>>>>>>>> nodes first and add them to their neighbors. So every Pyxida >>>>>>>>>>> node could >>>>>>>>>>> have as its neighbors the "Bootstrap nodes", which would >>>>>>>>>>> make >>>>>>>>>>> Pyxida >>>>>>>>>>> become >>>>>>>>>>> a centralized system. 
>>>>>>>>>>> >>>>>>>>>> As long as the initial bootstrap list is large enough, I >>>>>>>>>> think >>>>>>>>>> you are >>>>>>>>>> right, that partitions won't be a problem. >>>>>>>>>> >>>>>>>>>>> One last thing: When Pyxida receives a ping request from a >>>>>>>>>>> node, it does >>>>>>>>>>> not update its coordinates. Is there any particular reason >>>>>>>>>>> for >>>>>>>>>>> that? >>>>>>>>>>> Is it >>>>>>>>>>> because Pyxida will not be able to calculate the RTT? >>>>>>>>>>> >>>>>>>>>> Yes. >>>>>>>>>> >>>>>>>>>>> Also, there is no "gossiping" of other nodes' coordinates, >>>>>>>>>>> right? So if A >>>>>>>>>>> has B in its neighbors list, it will from time to time tell >>>>>>>>>>> other nodes >>>>>>>>>>> about B's coordinates. I realize this has a lot of >>>>>>>>>>> complications but the >>>>>>>>>>> original Vivaldi paper was very fuzzy about this point and >>>>>>>>>>> the >>>>>>>>>>> previous >>>>>>>>>>> point that it seems it is up to the implementors to decide >>>>>>>>>>> what >>>>>>>>>>> to do. >>>>>>>>>>> >>>>>>>>>> I don't think we are doing this. It could be useful in >>>>>>>>>> certain >>>>>>>>>> circumstance (e.g., picking potential new neighbors), but >>>>>>>>>> we're >>>>>>>>>> not, IIRC. >>>>>>>>>> >>>>>>>>>>> I will have to think some more about this as there are a lot >>>>>>>>>>> of >>>>>>>>>>> nasty >>>>>>>>>>> little side effects that could happen, but these are my >>>>>>>>>>> thoughts. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> >>>>>>>>>>> -Eric >>>>>>>>>>> >>>>>>>>>> Sounds good. >>>>>>>>>> >>>>>>>>>> -jonathan >>>>>>>>>> >>>>>>>>>>> On Nov 3 2008, Jonathan Ledlie wrote: >>>>>>>>>>> >>>>>>>>>>>> Eric, >>>>>>>>>>>> >>>>>>>>>>>> Based on a look at the code just now, I see where you are >>>>>>>>>>>> coming from: >>>>>>>>>>>> with these params on PL, there isn't anything stopping the >>>>>>>>>>>> neighbor >>>>>>>>>>>> list from circulating pretty quickly through all nodes. In >>>>>>>>>>>> the >>>>>>>>>>>> Azureus implementation, the guys fed to the alg (fed to >>>>>>>>>>>> processSample) >>>>>>>>>>>> weren't changing much because they were coming from the >>>>>>>>>>>> routing finger >>>>>>>>>>>> tables. I don't remember observing this behavior in our PL >>>>>>>>>>>> deployment, but that could have been because we had the >>>>>>>>>>>> ping >>>>>>>>>>>> time set >>>>>>>>>>>> to 1-2 minutes, IIRC. >>>>>>>>>>>> >>>>>>>>>>>> Do you want to insert some code that will fix this? >>>>>>>>>>>> Basically >>>>>>>>>>>> we want >>>>>>>>>>>> to only add a new neighbor if our oldest neighbor should be >>>>>>>>>>>> tossed (or >>>>>>>>>>>> if we haven't reached MAX_NEIGHBORS). It's probably best >>>>>>>>>>>> from a >>>>>>>>>>>> clarity standpoint if addNeighbor does continue to actually >>>>>>>>>>>> add the >>>>>>>>>>>> neighbor in all cases. Probably the best thing to do is to >>>>>>>>>>>> have >>>>>>>>>>>> NCClient drop old neighbors internally and have NCManager >>>>>>>>>>>> ask >>>>>>>>>>>> it which >>>>>>>>>>>> nodes should be pinged via getNeighborToPing. So NCClient >>>>>>>>>>>> would >>>>>>>>>>>> continue to have the same behavior on processSample (it >>>>>>>>>>>> would not >>>>>>>>>>>> reject new information) and NCManager and NCClient would >>>>>>>>>>>> cooperate to >>>>>>>>>>>> measure RTTs to current neighbors. We do want to make >>>>>>>>>>>> sure, >>>>>>>>>>>> however, >>>>>>>>>>>> that the gossip set stays large enough so that partitions >>>>>>>>>>>> don't occur. 
>>>>>>>>>>>> >>>>>>>>>>>> Do you want to be added to the SourceForge Pyxida >>>>>>>>>>>> developers >>>>>>>>>>>> to make >>>>>>>>>>>> this change? (Please test it out carefully first.) >>>>>>>>>>>> >>>>>>>>>>>> -jonathan >>>>>>>>>>>> >>>>>>>>>>>> On Oct 30, 2008, at 11:32 AM, ext Eric Chan-Tin wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Jonathan, >>>>>>>>>>>>> >>>>>>>>>>>>> From what I saw, eventually, you learn about all the nodes >>>>>>>>>>>>> in >>>>>>>>>>>>> the >>>>>>>>>>>>> network. Every 10 seconds, Pyxida picks a random "Gossip" >>>>>>>>>>>>> node from >>>>>>>>>>>>> its >>>>>>>>>>>>> list and sends a request. The Gossip node will reply back >>>>>>>>>>>>> with an >>>>>>>>>>>>> "upNeighbor", that is, a "Gossip" node that is "up" or >>>>>>>>>>>>> with 10% >>>>>>>>>>>>> probability, a node that is "pending". Since naturally, >>>>>>>>>>>>> all >>>>>>>>>>>>> the nodes >>>>>>>>>>>>> bootstrap off the same set of nodes, every node will >>>>>>>>>>>>> eventually know >>>>>>>>>>>>> of >>>>>>>>>>>>> every other node. >>>>>>>>>>>>> >>>>>>>>>>>>> Pyxida might not be rotating through everybody >>>>>>>>>>>>> continuously >>>>>>>>>>>>> (although >>>>>>>>>>>>> eventually, since it knows everybody, it will have added >>>>>>>>>>>>> everybody to >>>>>>>>>>>>> its Neighbors list at some point), but could be removing a >>>>>>>>>>>>> node that >>>>>>>>>>>>> is >>>>>>>>>>>>> "alive". So, if MAX_NEIGHBORS = 32, let's call them A_1, >>>>>>>>>>>>> A_2, >>>>>>>>>>>>> ..., >>>>>>>>>>>>> A_32. >>>>>>>>>>>>> Pyxida picks a random Gossip node B, sends it a request >>>>>>>>>>>>> and B >>>>>>>>>>>>> replies >>>>>>>>>>>>> back. Pyxida will add B to its Neighbors list, see that >>>>>>>>>>>>> size of >>>>>>>>>>>>> Neighbors > MAX_NEIGHBORS, kick A_1 out. Then from those >>>>>>>>>>>>> 32 >>>>>>>>>>>>> neighbors >>>>>>>>>>>>> A_2, ...., A_32, B, it will calculate the force, and apply >>>>>>>>>>>>> it >>>>>>>>>>>>> to the >>>>>>>>>>>>> current system coordinates. Thus, if every 10 seconds, >>>>>>>>>>>>> Pyxida >>>>>>>>>>>>> picks a >>>>>>>>>>>>> different set of nodes C, D, E, etc... it would keep >>>>>>>>>>>>> kicking >>>>>>>>>>>>> nodes out >>>>>>>>>>>>> of the neighbors list. >>>>>>>>>>>>> >>>>>>>>>>>>> It's the "kicking A_1 out" part that seemed a bit >>>>>>>>>>>>> confusing. >>>>>>>>>>>>> Churn >>>>>>>>>>>>> is a >>>>>>>>>>>>> good reason but you don't check that A_1 might still be >>>>>>>>>>>>> alive >>>>>>>>>>>>> or maybe >>>>>>>>>>>>> A_1 was just added. I have been printing the neighbors >>>>>>>>>>>>> list >>>>>>>>>>>>> every 10 >>>>>>>>>>>>> minutes, and they are completely different between those >>>>>>>>>>>>> 10 >>>>>>>>>>>>> minutes. >>>>>>>>>>>>> >>>>>>>>>>>>> One possible problem might be that I am choosing the value >>>>>>>>>>>>> of >>>>>>>>>>>>> MAX_NEIGHBORS to be too small. Originally it was set to >>>>>>>>>>>>> 512 >>>>>>>>>>>>> (which was >>>>>>>>>>>>> meant for Azureus). Since I am using Planetlab where there >>>>>>>>>>>>> are about >>>>>>>>>>>>> 400-500 "stable" nodes, I changed the value to 32. But I >>>>>>>>>>>>> don't think >>>>>>>>>>>>> that should be a problem since log(n) = log(512) = 9. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> >>>>>>>>>>>>> -Eric >>>>>>>>>>>>> >>>>>>>>>>>>> Jonathan Ledlie wrote: >>>>>>>>>>>>>> Eric, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Glad to hear it. Let Peter and me know if you publish >>>>>>>>>>>>>> anything using >>>>>>>>>>>>>> Pyxida and we can put it on the web page. >>>>>>>>>>>>>> >>>>>>>>>>>>>> "Gossip" nodes are for learning about the network. 
>>>>>>>>>>>>>> "Neighbors" are >>>>>>>>>>>>>> for >>>>>>>>>>>>>> constructing the coordinate. >>>>>>>>>>>>>> >>>>>>>>>>>>>> So, IIRC, the gossip process learns about nodes that >>>>>>>>>>>>>> may or >>>>>>>>>>>>>> may not >>>>>>>>>>>>>> be >>>>>>>>>>>>>> applied to the neighbor list. That is, we may eventually >>>>>>>>>>>>>> learn >>>>>>>>>>>>>> about a >>>>>>>>>>>>>> large portion of the network, via gossiping, but should >>>>>>>>>>>>>> only >>>>>>>>>>>>>> be using >>>>>>>>>>>>>> MAX_NEIGHBORS at any given time for coordinate >>>>>>>>>>>>>> computation. >>>>>>>>>>>>>> The >>>>>>>>>>>>>> reason >>>>>>>>>>>>>> we don't want to use a fixed set of neighbors for >>>>>>>>>>>>>> coordinate >>>>>>>>>>>>>> computation >>>>>>>>>>>>>> is because nodes may come and go from the system. >>>>>>>>>>>>>> However, >>>>>>>>>>>>>> it should >>>>>>>>>>>>>> not be the case that we just rotate through everybody, >>>>>>>>>>>>>> because >>>>>>>>>>>>>> then, as >>>>>>>>>>>>>> you note, we'd be computing against everyone's >>>>>>>>>>>>>> coordinate/ >>>>>>>>>>>>>> latency, >>>>>>>>>>>>>> and >>>>>>>>>>>>>> this is counter to the point of network coordinates. >>>>>>>>>>>>>> With this >>>>>>>>>>>>>> gradual >>>>>>>>>>>>>> transience as a desirable quality in mind, I am pretty >>>>>>>>>>>>>> sure we >>>>>>>>>>>>>> maintain >>>>>>>>>>>>>> neighbors in a fairly fixed manner. Does this not seem >>>>>>>>>>>>>> to >>>>>>>>>>>>>> be the >>>>>>>>>>>>>> case >>>>>>>>>>>>>> to you? >>>>>>>>>>>>>> >>>>>>>>>>>>>> -jonathan >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Oct 29, 2008, at 9:20 PM, Eric Chan-Tin wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Jonathan and Peter, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> First of all, just wanted to say that Pyxida has been >>>>>>>>>>>>>>> working >>>>>>>>>>>>>>> great on >>>>>>>>>>>>>>> Planetlab for the past 4 months or so. Getting good data >>>>>>>>>>>>>>> and >>>>>>>>>>>>>>> experiments >>>>>>>>>>>>>>> going well. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> However, I do have one question/comment. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Every Pyxida node contains a list of Neighbors (in >>>>>>>>>>>>>>> NCClient) and a >>>>>>>>>>>>>>> list >>>>>>>>>>>>>>> of Gossip nodes (they are called neighbors in NCManager >>>>>>>>>>>>>>> but >>>>>>>>>>>>>>> I will >>>>>>>>>>>>>>> call >>>>>>>>>>>>>>> them gossip for now so as not to confuse with the 'real' >>>>>>>>>>>>>>> neighbors >>>>>>>>>>>>>>> in >>>>>>>>>>>>>>> NCClient). Every 10 seconds, Pyxida will pick a random >>>>>>>>>>>>>>> node >>>>>>>>>>>>>>> (call >>>>>>>>>>>>>>> it A) >>>>>>>>>>>>>>> from Gossip and send a gossip request (basically asking >>>>>>>>>>>>>>> for >>>>>>>>>>>>>>> rtt, >>>>>>>>>>>>>>> coordinate, error). But Gossip contains all the nodes in >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> network. >>>>>>>>>>>>>>> The comment in the code said that was to avoid Pyxida >>>>>>>>>>>>>>> becoming >>>>>>>>>>>>>>> lonely >>>>>>>>>>>>>>> and no-one to gossip with, which makes sense. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Once A responds, Pyxida will add A to its Neighbors (if >>>>>>>>>>>>>>> Neighbors >>>>>>>>>>>>>>> does >>>>>>>>>>>>>>> not contain A) and remove the first element from the >>>>>>>>>>>>>>> Neighbors list. >>>>>>>>>>>>>>> This seems a little bit odd. Was there any particular >>>>>>>>>>>>>>> reasoning >>>>>>>>>>>>>>> behind >>>>>>>>>>>>>>> this? Because it would seem like you're getting the >>>>>>>>>>>>>>> coordinate from >>>>>>>>>>>>>>> every node in the network eventually. 
>>>>>>>>>>>>>>> Thanks a lot, >>>>>>>>>>>>>>> -Eric >>>>>>>>>>>>>>> Jonathan Ledlie wrote: >>>>>>>>>>>>>>>> Your inclination is correct. See forwarded message. >>>>>>>>>>>>>>>> -jonathan >>>>>>>>>>>>>>>> On Apr 9, 2008, at 1:53 PM, ext Eric Chan-Tin wrote: >>>>>>>>>>>>>>>>> Thank you for the clarification. >>>>>>>>>>>>>>>>> I've been reading through your published papers and code and I must say I learned a lot about how Pyxida works :) >>>>>>>>>>>>>>>>> I still have a quick and easy (hopefully) question. Each Pyxida node has a small amount of neighbors and learn about other nodes through other nodes. I am running on about 200 machines, and it seems like all the Pyxida nodes are adding each other to their upNeighbors set. NCClient.java has the MAX_NEIGHBORS set to 512. >>>>>>>>>>>>>>>>> I wanted to make sure that I am not misreading the code, but shouldn't each Pyxida node keep a small number of neighbors in its upNeighbors set. Else, it will be a lot of updates to send a gossipMessage to all 200 neighbors, and it also means that a Pyxida node has the whole network in its upNeighbors set - that seems wrong. I am inclined to change the value of MAX_NEIGHBORS but wanted to make sure that my understanding is correct first. >>>>>>>>>>>>>>>>> Thank you, >>>>>>>>>>>>>>>>> Eric
|
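The guard Jonathan describes in the quoted Nov 3 message -- only admit a new neighbor when the set is below MAX_NEIGHBORS or an existing entry has gone stale -- might look roughly like the sketch below. The Neighbor class is a stand-in for Pyxida's RemoteState (which does expose getLastUpdateTime()); the class, method name, and constants here are illustrative assumptions, not the project's actual API.

    import java.util.List;

    // Sketch only: a pre-check used by the caller of addNeighbor(), assuming a
    // 32-entry neighbor set and a 10-minute staleness bound.
    class NeighborAdmission {

        static final int MAX_NEIGHBORS = 32;
        static final long NEIGHBOR_TIMEOUT_MS = 10L * 60 * 1000;

        // Stand-in for RemoteState<T>: only the timestamp this sketch needs.
        static class Neighbor {
            long lastUpdateTime; // last time a sample from this node was processed
        }

        // Returns true if a new candidate should actually be handed to addNeighbor().
        static boolean needsNeighbor(List<Neighbor> neighbors, long now) {
            if (neighbors.size() < MAX_NEIGHBORS) {
                return true; // still room: always admit
            }
            for (Neighbor n : neighbors) {
                if (n.lastUpdateTime < now - NEIGHBOR_TIMEOUT_MS) {
                    return true; // a stale entry exists and can be evicted first
                }
            }
            return false; // set is full and healthy: skip addNeighbor()
        }
    }

With a check like this on the caller's side, addNeighbor() itself keeps its plain add-the-neighbor semantics, which is what the thread settles on.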
From: Jonathan L. <jon...@ee...> - 2008-12-18 20:05:27
|
Eric, Do you have a PL Pyxida slice that I can connect to to try this out? That is, I'll run it on my machine, but connect to your slice via the right ports. -jonathan On Dec 18, 2008, at 11:35 AM, Eric Chan-Tin wrote: > Sorry for the delay. Was busy with final exams and projects. > > I made the changes. They can be found at > > http://www-users.cs.umn.edu/~dchantin/NCClient.java > http://www-users.cs.umn.edu/~dchantin/NCManager.java > http://www-users.cs.umn.edu/~dchantin/RemoteState.java > > All the changes are enclosed within > //ERIC START > ... > //ERIC END > > I compared every reported "neighbors list" of each Pyxida node > (about 370 > total) and see how many neighbors did not change within 10 minutes. > Then > did the average for all the nodes. > > Original: 43.37% of the neighbors stayed the same on average. > > Fix0 (before we found the interesting situation): 74.73% > > Fix1 (with the weight probability of picking a neighbor to ping): > 95.79% > > Fix2 (with the weight probability and the MIN_UPDATE_PING): 98.81% > > Also, the reported error, rrl, etc.. were much lower in the Fixed > Pyxida > than in the original one (this could not be significant, since the > Original > experiment was performed 1-2 months ago), but as expected, it was not > higher. > > You can take a look at the code and test it if you want. If everything > looks fine, I'll remove some of the debugging code and other "useless" > comments (like //ERIC). Also, let me know which "fix" you prefer. I > will > then email you the updated code. > > Thanks, > > -Eric > > > On Dec 10 2008, Jonathan Ledlie wrote: > >> >> Seems reasonable. >> >> Both ideas (lastPingTime and a weighted probability based on age) >> seem >> to be independent. >> >> Some of the code is more CPU intensive than it should be (e.g., >> looping through potentially long lists), and that is a slight >> downside >> to the function approach vs a set, but this probably isn't a big >> deal. >> >> -jonathan >> >> On Dec 10, 2008, at 10:40 PM, Eric Chan-Tin wrote: >> >>> okay. I think I might have misunderstood you. Your idea of >>> lastPingTime >>> looks interesting. >>> >>> My idea is to introduce a probability function. Right now, it just >>> randomly picks one of the neighbors. What I am thinking it might >>> do is >>> that instead of just randomly picking a neighbor, pick a neighbor >>> based >>> on some function. So, the neighbor whose lastUpdateTime is oldest >>> will >>> have a higher chance of being picked than a neighbor who was just >>> updated. >>> >>> Does that make more sense? >>> >>> -Eric >>> >>> Jonathan Ledlie wrote: >>>> >>>> Hmm. The advantage of lastPingTime is that you learn whether or >>>> not >>>> you've tried to contact a neighbor and they haven't responded. >>>> Then you >>>> can use that knowledge to determine whether or not to toss them. >>>> >>>> Maybe I'm not understanding your suggestion then. >>>> >>>> -jonathan >>>> >>>> On Dec 10, 2008, at 12:51 PM, Eric Chan-Tin wrote: >>>> >>>>> That's a good idea. Here is a little twist on it. >>>>> >>>>> Put a "weight" on each neighbor, based on the lastUpdateTime. So >>>>> if >>>>> currentTime - neighbor's lastUpdateTime, then the weight is >>>>> higher. >>>>> Then in getNeighborToPing, neighbor's with higher weight have a >>>>> higher >>>>> probability of getting picked than neighbor's with lower weight. >>>>> >>>>> The weight could be ( (currentTime - lastUpdateTime) / 15 >>>>> minutes). 
>>>>> It's not 10 minutes so as to avoid a probability of 1 if the >>>>> lastUpdateTime was the previous performMaintenance. This should >>>>> also >>>>> avoid the problem of always pinging the oldest neighbor since it's >>>>> about probability (weight). >>>>> >>>>> It seems a little bit less complicated than having lastPingTime >>>>> and >>>>> MIN_UPDATE_TIME. >>>>> >>>>> -Eric >>>>> >>>>> On Dec 10 2008, Jonathan Ledlie wrote: >>>>> >>>>>> >>>>>> This is an interesting situation. Let's think about the goal. >>>>>> The >>>>>> goal is to keep pinging the same nodes as long as they are alive, >>>>>> and to replace a node soon after it dies, keeping the total >>>>>> number >>>>>> of up neighbors at approximately MAX_NEIGHBORS. >>>>>> >>>>>> How about the following: >>>>>> - when you pick a neighbor to ping, you set a time on that event >>>>>> (in >>>>>> a new per-neighbor timestamp LastPingTime) >>>>>> - when you first insert a neighbor, set this timestamp to 0. >>>>>> - when you loop in performMaintenance, remove nodes whose >>>>>> LastPingTime - LastUpdateTime > threshold (like 10 minutes) >>>>>> >>>>>> - when picking a node, pick the guy with the oldest LastPingTime >>>>>> (not sure about this one -- I don't like situations where you >>>>>> pick >>>>>> the absolute *oldest* because then we could get in a situation >>>>>> where >>>>>> an old guy sits there, waiting to be kicked out, and we keep >>>>>> picking >>>>>> him to be pinged; I'd prefer to pick randomly, or random older >>>>>> guys). >>>>>> - a simple way around this is to simply have a self-adjusting >>>>>> threshold: so, we walk through the nodes, and everyone whose >>>>>> LastPingTime is older than MIN_UPDATE_TIME_TO_PING ago (a >>>>>> relative >>>>>> time) get added to a set; we pick a guy randomly from the set to >>>>>> ping. If the set is empty, we increase MIN_UPDATE_TIME_TO_PING, >>>>>> and >>>>>> wait for the next time the function is called. If the set is too >>>>>> large (relative to a fraction of MAX_NEIGHBORS), we decrease it. >>>>>> >>>>>> We still might get some churn from lost UDP packets. You might >>>>>> also >>>>>> give nodes a second chance to be pinged by including >>>>>> LastPingTime - >>>>>> LastUpdateTime > 1 minute in the set. Note that increases and >>>>>> decreases in MIN_UPDATE_TIME_TO_PING shouldn't be affected by the >>>>>> number of nodes that fall into this category. >>>>>> >>>>>> We still probably need a way for new potential neighbors to be >>>>>> rejected if our current set of neighbors is fine. Because I >>>>>> still >>>>>> want addNeigbor to really add the neighbor, how about putting a >>>>>> check before calling that to see if we need more neighbors. >>>>>> >>>>>> What thinks? >>>>>> >>>>>> -jonathan >>>>>> >>>>>> On Dec 9, 2008, at 11:37 PM, ext Eric Chan-Tin wrote: >>>>>> >>>>>>> Peter, >>>>>>> >>>>>>> Thanks. That fixed it. >>>>>>> >>>>>>> However, here is another problem I found while deploying Pyxida >>>>>>> and >>>>>>> running them for a while. >>>>>>> >>>>>>> In performMaintenance(), I loop through every neighbor and if >>>>>>> (neighbor.getLastUpdateTime() < current_time - 10 minutes), >>>>>>> then I >>>>>>> remove that neighbor. >>>>>> >>>>>> >>>>>>> >>>>>>> >>>>>>> In NCManager, I call getNeighborToPing, which randomly picks one >>>>>>> of >>>>>>> the >>>>>>> neighbors (32 in this case). Pinging is performed every 10 >>>>>>> seconds, so >>>>>>> in 10 minutes, you can ping 60 nodes. 
However, if a neighbor A >>>>>>> is not >>>>>>> among those 60 nodes, it will get kicked out in >>>>>>> performMaintenance(), >>>>>>> although A is still alive (it just hasn't been contacted yet). >>>>>>> >>>>>>> There are some possible options. >>>>>>> >>>>>>> 1) Increase the timeout period from 10 minutes to ?? minutes. >>>>>>> This >>>>>>> doesn't quite work because say in Azureus, the number of >>>>>>> neighbors is >>>>>>> 512, so the timeout needs to be increased even more... It could >>>>>>> be a >>>>>>> function of MAX_NEIGHBORS... >>>>>>> >>>>>>> 2) When Pyxida replies to a ping request, it will update the >>>>>>> neighbor's >>>>>>> lastUpdateTime. This does not quite work either because it >>>>>>> assumes a >>>>>>> symmetric relationship which is most likely not the case. >>>>>>> >>>>>>> 3) Introduce some more logic to Pyxida so it remembers what >>>>>>> nodes it >>>>>>> contacted and does not send, say, more than 2 requests in 10 >>>>>>> minutes to >>>>>>> the same node. This solution isn't that great either. >>>>>>> >>>>>>> Let me know what you guys think. I am currently leaning towards >>>>>>> option >>>>>>> 1) but make the timeout period a function of MAX_NEIGHBORS, >>>>>>> although I >>>>>>> haven't quite found the perfect function yet. >>>>>>> >>>>>>> Here are some preliminary results. I compared the number of >>>>>>> neighbors >>>>>>> which are the same at each reported time (10 minutes). Before >>>>>>> the fix, >>>>>>> more than 50% of the neighbors changed within 10 minutes. After >>>>>>> the >>>>>>> fix, >>>>>>> only about 30% of the neighbors changed within 10 minutes. It is >>>>>>> more >>>>>>> stable but I think 30% is still quite a big number. >>>>>>> >>>>>>> Sorry for the long email :) >>>>>>> >>>>>>> -Eric >>>>>>> >>>>>>> Peter Pietzuch wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> Eric Chan-Tin wrote: >>>>>>>>> for (RemoteState<T> neighbor : neighbors) { //this is line >>>>>>>>> 1029 >>>>>>>>> if (neighbor.getLastUpdateTime() < (curr_time - >>>>>>>>> MAINTENANCE_PERIOD)) { >>>>>>>>> neighbors.remove(neighbor); >>>>>>>>> } >>>>>>>>> } >>>>>>>> >>>>>>>> IIRC you can't modify a collection while iterating over it >>>>>>>> using the >>>>>>>> above for loop. Instead you have to use a proper >>>>>>>> Iterator<RemoteState<T>> and use it's remove method. >>>>>>>> >>>>>>>> Cheers, >>>>>>>> Peter >>>>>>>> >>>>>>>>> Jonathan Ledlie wrote: >>>>>>>>>> On Nov 3, 2008, at 3:46 PM, Eric Chan-Tin wrote: >>>>>>>>>> >>>>>>>>>>> Jonathan, >>>>>>>>>>> >>>>>>>>>>> I can insert some code to fix this and it doesn't look too >>>>>>>>>>> hard >>>>>>>>>>> but I do >>>>>>>>>>> have some concerns. >>>>>>>>>>> >>>>>>>>>>>> I don't remember observing this behavior in our PL >>>>>>>>>>>> deployment, but that could have been because we had the >>>>>>>>>>>> ping >>>>>>>>>>>> time set >>>>>>>>>>>> to 1-2 minutes, IIRC. >>>>>>>>>>> The default ping time was set to 10 seconds + some Gaussian >>>>>>>>>>> distribution, >>>>>>>>>>> so that's what I used, but changing the ping time to 1 >>>>>>>>>>> minute >>>>>>>>>>> or more >>>>>>>>>>> will >>>>>>>>>>> result in the same effect except it will take longer for the >>>>>>>>>>> "cycle" to >>>>>>>>>>> occur. 
>>>>>>>>>>> >>>>>>>>>>> What I have in mind was the following: >>>>>>>>>>> >>>>>>>>>>> 1) In NCClient.addNeighbor() would be changed to (in >>>>>>>>>>> pseudocode) >>>>>>>>>>> protected boolean addNeighbor(RemoteState<T> guy) { >>>>>>>>>>> boolean added = false; >>>>>>>>>>> //if neighbors.size() >= MAX_NEIGHBORS >>>>>>>>>>> //Check last update time of each neighbor in >>>>>>>>>>> neighbors >>>>>>>>>>> //neighbor.getLastUpdateTime() >>>>>>>>>>> //if any of them is older than some number >>>>>>>>>>> TIMEOUT, >>>>>>>>>>> remove the oldest one, then add "guy" to neighbors >>>>>>>>>>> //else don't add "guy" to neighbors >>>>>>>>>>> //else add "guy" to neighbors >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>> IMO this shouldn't be the behavior of this function because >>>>>>>>>> it >>>>>>>>>> is called >>>>>>>>>> "addNeighbor." Instead what could happen is that part of the >>>>>>>>>> performMaintenance() function could remove old neighbors. >>>>>>>>>> >>>>>>>>>>> 2) In NCManager, getGossipNode() would just pick a random >>>>>>>>>>> node >>>>>>>>>>> from >>>>>>>>>>> localNC.primaryNC.getNeighbors() to send a request to >>>>>>>>>>> every 10 >>>>>>>>>>> seconds. >>>>>>>>>>> >>>>>>>>>>> 3) Pyxida will still keep track of all GossipNodes in the >>>>>>>>>>> network. It >>>>>>>>>>> could >>>>>>>>>>> also (say every 10 minutes) check its NCClient.neighbors >>>>>>>>>>> list >>>>>>>>>>> and see if >>>>>>>>>>> any neighbor is older than TIMEOUT, remove it, then pick a >>>>>>>>>>> random >>>>>>>>>>> GossipNode to send a ping request. >>>>>>>>>>> >>>>>>>>>>> The main problem with this scheme is that one Pyxida node >>>>>>>>>>> would >>>>>>>>>>> most >>>>>>>>>>> likely >>>>>>>>>>> stay with the same set of neighbors all the time. Its >>>>>>>>>>> coordinates >>>>>>>>>>> would be >>>>>>>>>>> greatly affected by those 32 nodes (I am assuming >>>>>>>>>>> MAX_NEIGHBORS >>>>>>>>>>> = 32). >>>>>>>>>>> >>>>>>>>>> Instead of (2), why not have the getNeighborToPing function >>>>>>>>>> determine >>>>>>>>>> the behavior? If it returns a node, then ping it. If it >>>>>>>>>> returns null, >>>>>>>>>> then have the NCManager pick some other node to use? If >>>>>>>>>> things are >>>>>>>>>> hunky-dory, getNeighborToPing() would return a node (a >>>>>>>>>> current >>>>>>>>>> member of >>>>>>>>>> the neighbor set), if we're short on neighbors, then it would >>>>>>>>>> return >>>>>>>>>> null. The periodic maintenance would have already tossed old >>>>>>>>>> nodes, so >>>>>>>>>> this would be self-sustaining. >>>>>>>>>> >>>>>>>>>> The "problem" you mention is fairly central to network >>>>>>>>>> coordinates in >>>>>>>>>> general. Sure the coordinates will be a little better the >>>>>>>>>> more >>>>>>>>>> neighbors you have. But in a very large system talking to >>>>>>>>>> everybody >>>>>>>>>> isn't feasible. Peter and I have a graph in one of the >>>>>>>>>> workshop >>>>>>>>>> papers >>>>>>>>>> which shows the effect of increasing the number of neighbors >>>>>>>>>> >>>>>>>>>>> I don't think the network would be partitioned, but it could >>>>>>>>>>> happen. For >>>>>>>>>>> example, the boostrap nodes have each other in their >>>>>>>>>>> neighbors >>>>>>>>>>> list. >>>>>>>>>>> Every >>>>>>>>>>> new node bootstrapping to the network would contact those >>>>>>>>>>> bootstrapping >>>>>>>>>>> nodes first and add them to their neighbors. So every Pyxida >>>>>>>>>>> node could >>>>>>>>>>> have as its neighbors the "Bootstrap nodes", which would >>>>>>>>>>> make >>>>>>>>>>> Pyxida >>>>>>>>>>> become >>>>>>>>>>> a centralized system. 
>>>>>>>>>>> >>>>>>>>>> As long as the initial bootstrap list is large enough, I >>>>>>>>>> think >>>>>>>>>> you are >>>>>>>>>> right, that partitions won't be a problem. >>>>>>>>>> >>>>>>>>>>> One last thing: When Pyxida receives a ping request from a >>>>>>>>>>> node, it does >>>>>>>>>>> not update its coordinates. Is there any particular reason >>>>>>>>>>> for >>>>>>>>>>> that? >>>>>>>>>>> Is it >>>>>>>>>>> because Pyxida will not be able to calculate the RTT? >>>>>>>>>>> >>>>>>>>>> Yes. >>>>>>>>>> >>>>>>>>>>> Also, there is no "gossiping" of other nodes' coordinates, >>>>>>>>>>> right? So if A >>>>>>>>>>> has B in its neighbors list, it will from time to time tell >>>>>>>>>>> other nodes >>>>>>>>>>> about B's coordinates. I realize this has a lot of >>>>>>>>>>> complications but the >>>>>>>>>>> original Vivaldi paper was very fuzzy about this point and >>>>>>>>>>> the >>>>>>>>>>> previous >>>>>>>>>>> point that it seems it is up to the implementors to decide >>>>>>>>>>> what >>>>>>>>>>> to do. >>>>>>>>>>> >>>>>>>>>> I don't think we are doing this. It could be useful in >>>>>>>>>> certain >>>>>>>>>> circumstance (e.g., picking potential new neighbors), but >>>>>>>>>> we're >>>>>>>>>> not, IIRC. >>>>>>>>>> >>>>>>>>>>> I will have to think some more about this as there are a lot >>>>>>>>>>> of >>>>>>>>>>> nasty >>>>>>>>>>> little side effects that could happen, but these are my >>>>>>>>>>> thoughts. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> >>>>>>>>>>> -Eric >>>>>>>>>>> >>>>>>>>>> Sounds good. >>>>>>>>>> >>>>>>>>>> -jonathan >>>>>>>>>> >>>>>>>>>>> On Nov 3 2008, Jonathan Ledlie wrote: >>>>>>>>>>> >>>>>>>>>>>> Eric, >>>>>>>>>>>> >>>>>>>>>>>> Based on a look at the code just now, I see where you are >>>>>>>>>>>> coming from: >>>>>>>>>>>> with these params on PL, there isn't anything stopping the >>>>>>>>>>>> neighbor >>>>>>>>>>>> list from circulating pretty quickly through all nodes. In >>>>>>>>>>>> the >>>>>>>>>>>> Azureus implementation, the guys fed to the alg (fed to >>>>>>>>>>>> processSample) >>>>>>>>>>>> weren't changing much because they were coming from the >>>>>>>>>>>> routing finger >>>>>>>>>>>> tables. I don't remember observing this behavior in our PL >>>>>>>>>>>> deployment, but that could have been because we had the >>>>>>>>>>>> ping >>>>>>>>>>>> time set >>>>>>>>>>>> to 1-2 minutes, IIRC. >>>>>>>>>>>> >>>>>>>>>>>> Do you want to insert some code that will fix this? >>>>>>>>>>>> Basically >>>>>>>>>>>> we want >>>>>>>>>>>> to only add a new neighbor if our oldest neighbor should be >>>>>>>>>>>> tossed (or >>>>>>>>>>>> if we haven't reached MAX_NEIGHBORS). It's probably best >>>>>>>>>>>> from a >>>>>>>>>>>> clarity standpoint if addNeighbor does continue to actually >>>>>>>>>>>> add the >>>>>>>>>>>> neighbor in all cases. Probably the best thing to do is to >>>>>>>>>>>> have >>>>>>>>>>>> NCClient drop old neighbors internally and have NCManager >>>>>>>>>>>> ask >>>>>>>>>>>> it which >>>>>>>>>>>> nodes should be pinged via getNeighborToPing. So NCClient >>>>>>>>>>>> would >>>>>>>>>>>> continue to have the same behavior on processSample (it >>>>>>>>>>>> would not >>>>>>>>>>>> reject new information) and NCManager and NCClient would >>>>>>>>>>>> cooperate to >>>>>>>>>>>> measure RTTs to current neighbors. We do want to make >>>>>>>>>>>> sure, >>>>>>>>>>>> however, >>>>>>>>>>>> that the gossip set stays large enough so that partitions >>>>>>>>>>>> don't occur. 
>>>>>>>>>>>> >>>>>>>>>>>> Do you want to be added to the SourceForge Pyxida >>>>>>>>>>>> developers >>>>>>>>>>>> to make >>>>>>>>>>>> this change? (Please test it out carefully first.) >>>>>>>>>>>> >>>>>>>>>>>> -jonathan >>>>>>>>>>>> >>>>>>>>>>>> On Oct 30, 2008, at 11:32 AM, ext Eric Chan-Tin wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Jonathan, >>>>>>>>>>>>> >>>>>>>>>>>>> From what I saw, eventually, you learn about all the nodes >>>>>>>>>>>>> in >>>>>>>>>>>>> the >>>>>>>>>>>>> network. Every 10 seconds, Pyxida picks a random "Gossip" >>>>>>>>>>>>> node from >>>>>>>>>>>>> its >>>>>>>>>>>>> list and sends a request. The Gossip node will reply back >>>>>>>>>>>>> with an >>>>>>>>>>>>> "upNeighbor", that is, a "Gossip" node that is "up" or >>>>>>>>>>>>> with 10% >>>>>>>>>>>>> probability, a node that is "pending". Since naturally, >>>>>>>>>>>>> all >>>>>>>>>>>>> the nodes >>>>>>>>>>>>> bootstrap off the same set of nodes, every node will >>>>>>>>>>>>> eventually know >>>>>>>>>>>>> of >>>>>>>>>>>>> every other node. >>>>>>>>>>>>> >>>>>>>>>>>>> Pyxida might not be rotating through everybody >>>>>>>>>>>>> continuously >>>>>>>>>>>>> (although >>>>>>>>>>>>> eventually, since it knows everybody, it will have added >>>>>>>>>>>>> everybody to >>>>>>>>>>>>> its Neighbors list at some point), but could be removing a >>>>>>>>>>>>> node that >>>>>>>>>>>>> is >>>>>>>>>>>>> "alive". So, if MAX_NEIGHBORS = 32, let's call them A_1, >>>>>>>>>>>>> A_2, >>>>>>>>>>>>> ..., >>>>>>>>>>>>> A_32. >>>>>>>>>>>>> Pyxida picks a random Gossip node B, sends it a request >>>>>>>>>>>>> and B >>>>>>>>>>>>> replies >>>>>>>>>>>>> back. Pyxida will add B to its Neighbors list, see that >>>>>>>>>>>>> size of >>>>>>>>>>>>> Neighbors > MAX_NEIGHBORS, kick A_1 out. Then from those >>>>>>>>>>>>> 32 >>>>>>>>>>>>> neighbors >>>>>>>>>>>>> A_2, ...., A_32, B, it will calculate the force, and apply >>>>>>>>>>>>> it >>>>>>>>>>>>> to the >>>>>>>>>>>>> current system coordinates. Thus, if every 10 seconds, >>>>>>>>>>>>> Pyxida >>>>>>>>>>>>> picks a >>>>>>>>>>>>> different set of nodes C, D, E, etc... it would keep >>>>>>>>>>>>> kicking >>>>>>>>>>>>> nodes out >>>>>>>>>>>>> of the neighbors list. >>>>>>>>>>>>> >>>>>>>>>>>>> It's the "kicking A_1 out" part that seemed a bit >>>>>>>>>>>>> confusing. >>>>>>>>>>>>> Churn >>>>>>>>>>>>> is a >>>>>>>>>>>>> good reason but you don't check that A_1 might still be >>>>>>>>>>>>> alive >>>>>>>>>>>>> or maybe >>>>>>>>>>>>> A_1 was just added. I have been printing the neighbors >>>>>>>>>>>>> list >>>>>>>>>>>>> every 10 >>>>>>>>>>>>> minutes, and they are completely different between those >>>>>>>>>>>>> 10 >>>>>>>>>>>>> minutes. >>>>>>>>>>>>> >>>>>>>>>>>>> One possible problem might be that I am choosing the value >>>>>>>>>>>>> of >>>>>>>>>>>>> MAX_NEIGHBORS to be too small. Originally it was set to >>>>>>>>>>>>> 512 >>>>>>>>>>>>> (which was >>>>>>>>>>>>> meant for Azureus). Since I am using Planetlab where there >>>>>>>>>>>>> are about >>>>>>>>>>>>> 400-500 "stable" nodes, I changed the value to 32. But I >>>>>>>>>>>>> don't think >>>>>>>>>>>>> that should be a problem since log(n) = log(512) = 9. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> >>>>>>>>>>>>> -Eric >>>>>>>>>>>>> >>>>>>>>>>>>> Jonathan Ledlie wrote: >>>>>>>>>>>>>> Eric, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Glad to hear it. Let Peter and me know if you publish >>>>>>>>>>>>>> anything using >>>>>>>>>>>>>> Pyxida and we can put it on the web page. >>>>>>>>>>>>>> >>>>>>>>>>>>>> "Gossip" nodes are for learning about the network. 
>>>>>>>>>>>>>> "Neighbors" are >>>>>>>>>>>>>> for >>>>>>>>>>>>>> constructing the coordinate. >>>>>>>>>>>>>> >>>>>>>>>>>>>> So, IIRC, the gossip process learns about nodes that >>>>>>>>>>>>>> may or >>>>>>>>>>>>>> may not >>>>>>>>>>>>>> be >>>>>>>>>>>>>> applied to the neighbor list. That is, we may eventually >>>>>>>>>>>>>> learn >>>>>>>>>>>>>> about a >>>>>>>>>>>>>> large portion of the network, via gossiping, but should >>>>>>>>>>>>>> only >>>>>>>>>>>>>> be using >>>>>>>>>>>>>> MAX_NEIGHBORS at any given time for coordinate >>>>>>>>>>>>>> computation. >>>>>>>>>>>>>> The >>>>>>>>>>>>>> reason >>>>>>>>>>>>>> we don't want to use a fixed set of neighbors for >>>>>>>>>>>>>> coordinate >>>>>>>>>>>>>> computation >>>>>>>>>>>>>> is because nodes may come and go from the system. >>>>>>>>>>>>>> However, >>>>>>>>>>>>>> it should >>>>>>>>>>>>>> not be the case that we just rotate through everybody, >>>>>>>>>>>>>> because >>>>>>>>>>>>>> then, as >>>>>>>>>>>>>> you note, we'd be computing against everyone's >>>>>>>>>>>>>> coordinate/ >>>>>>>>>>>>>> latency, >>>>>>>>>>>>>> and >>>>>>>>>>>>>> this is counter to the point of network coordinates. >>>>>>>>>>>>>> With this >>>>>>>>>>>>>> gradual >>>>>>>>>>>>>> transience as a desirable quality in mind, I am pretty >>>>>>>>>>>>>> sure we >>>>>>>>>>>>>> maintain >>>>>>>>>>>>>> neighbors in a fairly fixed manner. Does this not seem >>>>>>>>>>>>>> to >>>>>>>>>>>>>> be the >>>>>>>>>>>>>> case >>>>>>>>>>>>>> to you? >>>>>>>>>>>>>> >>>>>>>>>>>>>> -jonathan >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Oct 29, 2008, at 9:20 PM, Eric Chan-Tin wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Jonathan and Peter, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> First of all, just wanted to say that Pyxida has been >>>>>>>>>>>>>>> working >>>>>>>>>>>>>>> great on >>>>>>>>>>>>>>> Planetlab for the past 4 months or so. Getting good data >>>>>>>>>>>>>>> and >>>>>>>>>>>>>>> experiments >>>>>>>>>>>>>>> going well. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> However, I do have one question/comment. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Every Pyxida node contains a list of Neighbors (in >>>>>>>>>>>>>>> NCClient) and a >>>>>>>>>>>>>>> list >>>>>>>>>>>>>>> of Gossip nodes (they are called neighbors in NCManager >>>>>>>>>>>>>>> but >>>>>>>>>>>>>>> I will >>>>>>>>>>>>>>> call >>>>>>>>>>>>>>> them gossip for now so as not to confuse with the 'real' >>>>>>>>>>>>>>> neighbors >>>>>>>>>>>>>>> in >>>>>>>>>>>>>>> NCClient). Every 10 seconds, Pyxida will pick a random >>>>>>>>>>>>>>> node >>>>>>>>>>>>>>> (call >>>>>>>>>>>>>>> it A) >>>>>>>>>>>>>>> from Gossip and send a gossip request (basically asking >>>>>>>>>>>>>>> for >>>>>>>>>>>>>>> rtt, >>>>>>>>>>>>>>> coordinate, error). But Gossip contains all the nodes in >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> network. >>>>>>>>>>>>>>> The comment in the code said that was to avoid Pyxida >>>>>>>>>>>>>>> becoming >>>>>>>>>>>>>>> lonely >>>>>>>>>>>>>>> and no-one to gossip with, which makes sense. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Once A responds, Pyxida will add A to its Neighbors (if >>>>>>>>>>>>>>> Neighbors >>>>>>>>>>>>>>> does >>>>>>>>>>>>>>> not contain A) and remove the first element from the >>>>>>>>>>>>>>> Neighbors list. >>>>>>>>>>>>>>> This seems a little bit odd. Was there any particular >>>>>>>>>>>>>>> reasoning >>>>>>>>>>>>>>> behind >>>>>>>>>>>>>>> this? Because it would seem like you're getting the >>>>>>>>>>>>>>> coordinate from >>>>>>>>>>>>>>> every node in the network eventually. 
>>>>>>>>>>>>>>> Thanks a lot, >>>>>>>>>>>>>>> -Eric >>>>>>>>>>>>>>> Jonathan Ledlie wrote: >>>>>>>>>>>>>>>> Your inclination is correct. See forwarded message. >>>>>>>>>>>>>>>> -jonathan >>>>>>>>>>>>>>>> On Apr 9, 2008, at 1:53 PM, ext Eric Chan-Tin wrote: >>>>>>>>>>>>>>>>> Thank you for the clarification. >>>>>>>>>>>>>>>>> I've been reading through your published papers and code and I must say I learned a lot about how Pyxida works :) >>>>>>>>>>>>>>>>> I still have a quick and easy (hopefully) question. Each Pyxida node has a small amount of neighbors and learn about other nodes through other nodes. I am running on about 200 machines, and it seems like all the Pyxida nodes are adding each other to their upNeighbors set. NCClient.java has the MAX_NEIGHBORS set to 512. >>>>>>>>>>>>>>>>> I wanted to make sure that I am not misreading the code, but shouldn't each Pyxida node keep a small number of neighbors in its upNeighbors set. Else, it will be a lot of updates to send a gossipMessage to all 200 neighbors, and it also means that a Pyxida node has the whole network in its upNeighbors set - that seems wrong. I am inclined to change the value of MAX_NEIGHBORS but wanted to make sure that my understanding is correct first. >>>>>>>>>>>>>>>>> Thank you, >>>>>>>>>>>>>>>>> Eric
|
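Fix1's age-weighted pick, as Eric sketches it in the quoted discussion (weight proportional to currentTime - lastUpdateTime over a roughly 15-minute window, with heavier neighbors more likely to be chosen), could be implemented along the following lines. This is a stand-alone illustration, not the code Eric posted at the umn.edu URLs; the Neighbor class and constants are assumptions.

    import java.util.List;
    import java.util.Random;

    // Sketch of a weighted getNeighborToPing(): neighbors with an older
    // lastUpdateTime are proportionally more likely to be chosen.
    class WeightedPingChoice {

        static final long WEIGHT_WINDOW_MS = 15L * 60 * 1000; // ~15 minutes, per the thread
        static final Random RNG = new Random();

        // Stand-in for RemoteState.getLastUpdateTime().
        static class Neighbor {
            long lastUpdateTime;
        }

        static Neighbor getNeighborToPing(List<Neighbor> neighbors, long now) {
            if (neighbors.isEmpty()) {
                return null;
            }
            double[] weights = new double[neighbors.size()];
            double total = 0;
            for (int i = 0; i < neighbors.size(); i++) {
                long age = Math.max(0L, now - neighbors.get(i).lastUpdateTime);
                weights[i] = (double) age / WEIGHT_WINDOW_MS; // older => heavier
                total += weights[i];
            }
            if (total == 0) {
                // Everyone was updated this instant: fall back to a uniform pick.
                return neighbors.get(RNG.nextInt(neighbors.size()));
            }
            double r = RNG.nextDouble() * total; // roulette-wheel selection
            for (int i = 0; i < neighbors.size(); i++) {
                r -= weights[i];
                if (r <= 0) {
                    return neighbors.get(i);
                }
            }
            return neighbors.get(neighbors.size() - 1); // guard against rounding
        }
    }

A neighbor that was just updated still has a small, nonzero chance of being picked again, which is the behavior Eric argues is acceptable for the next iteration.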
From: Eric Chan-T. <dch...@cs...> - 2008-12-18 16:35:20
|
Sorry for the delay. Was busy with final exams and projects. I made the changes. They can be found at http://www-users.cs.umn.edu/~dchantin/NCClient.java http://www-users.cs.umn.edu/~dchantin/NCManager.java http://www-users.cs.umn.edu/~dchantin/RemoteState.java All the changes are enclosed within //ERIC START ... //ERIC END I compared every reported "neighbors list" of each Pyxida node (about 370 total) and see how many neighbors did not change within 10 minutes. Then did the average for all the nodes. Original: 43.37% of the neighbors stayed the same on average. Fix0 (before we found the interesting situation): 74.73% Fix1 (with the weight probability of picking a neighbor to ping): 95.79% Fix2 (with the weight probability and the MIN_UPDATE_PING): 98.81% Also, the reported error, rrl, etc.. were much lower in the Fixed Pyxida than in the original one (this could not be significant, since the Original experiment was performed 1-2 months ago), but as expected, it was not higher. You can take a look at the code and test it if you want. If everything looks fine, I'll remove some of the debugging code and other "useless" comments (like //ERIC). Also, let me know which "fix" you prefer. I will then email you the updated code. Thanks, -Eric On Dec 10 2008, Jonathan Ledlie wrote: > >Seems reasonable. > >Both ideas (lastPingTime and a weighted probability based on age) seem >to be independent. > >Some of the code is more CPU intensive than it should be (e.g., >looping through potentially long lists), and that is a slight downside >to the function approach vs a set, but this probably isn't a big deal. > >-jonathan > >On Dec 10, 2008, at 10:40 PM, Eric Chan-Tin wrote: > >> okay. I think I might have misunderstood you. Your idea of >> lastPingTime >> looks interesting. >> >> My idea is to introduce a probability function. Right now, it just >> randomly picks one of the neighbors. What I am thinking it might do is >> that instead of just randomly picking a neighbor, pick a neighbor >> based >> on some function. So, the neighbor whose lastUpdateTime is oldest will >> have a higher chance of being picked than a neighbor who was just >> updated. >> >> Does that make more sense? >> >> -Eric >> >> Jonathan Ledlie wrote: >>> >>> Hmm. The advantage of lastPingTime is that you learn whether or not >>> you've tried to contact a neighbor and they haven't responded. >>> Then you >>> can use that knowledge to determine whether or not to toss them. >>> >>> Maybe I'm not understanding your suggestion then. >>> >>> -jonathan >>> >>> On Dec 10, 2008, at 12:51 PM, Eric Chan-Tin wrote: >>> >>>> That's a good idea. Here is a little twist on it. >>>> >>>> Put a "weight" on each neighbor, based on the lastUpdateTime. So if >>>> currentTime - neighbor's lastUpdateTime, then the weight is higher. >>>> Then in getNeighborToPing, neighbor's with higher weight have a >>>> higher >>>> probability of getting picked than neighbor's with lower weight. >>>> >>>> The weight could be ( (currentTime - lastUpdateTime) / 15 minutes). >>>> It's not 10 minutes so as to avoid a probability of 1 if the >>>> lastUpdateTime was the previous performMaintenance. This should also >>>> avoid the problem of always pinging the oldest neighbor since it's >>>> about probability (weight). >>>> >>>> It seems a little bit less complicated than having lastPingTime and >>>> MIN_UPDATE_TIME. >>>> >>>> -Eric >>>> >>>> On Dec 10 2008, Jonathan Ledlie wrote: >>>> >>>>> >>>>> This is an interesting situation. Let's think about the goal. 
The >>>>> goal is to keep pinging the same nodes as long as they are alive, >>>>> and to replace a node soon after it dies, keeping the total number >>>>> of up neighbors at approximately MAX_NEIGHBORS. >>>>> >>>>> How about the following: >>>>> - when you pick a neighbor to ping, you set a time on that event >>>>> (in >>>>> a new per-neighbor timestamp LastPingTime) >>>>> - when you first insert a neighbor, set this timestamp to 0. >>>>> - when you loop in performMaintenance, remove nodes whose >>>>> LastPingTime - LastUpdateTime > threshold (like 10 minutes) >>>>> >>>>> - when picking a node, pick the guy with the oldest LastPingTime >>>>> (not sure about this one -- I don't like situations where you pick >>>>> the absolute *oldest* because then we could get in a situation >>>>> where >>>>> an old guy sits there, waiting to be kicked out, and we keep >>>>> picking >>>>> him to be pinged; I'd prefer to pick randomly, or random older >>>>> guys). >>>>> - a simple way around this is to simply have a self-adjusting >>>>> threshold: so, we walk through the nodes, and everyone whose >>>>> LastPingTime is older than MIN_UPDATE_TIME_TO_PING ago (a relative >>>>> time) get added to a set; we pick a guy randomly from the set to >>>>> ping. If the set is empty, we increase MIN_UPDATE_TIME_TO_PING, >>>>> and >>>>> wait for the next time the function is called. If the set is too >>>>> large (relative to a fraction of MAX_NEIGHBORS), we decrease it. >>>>> >>>>> We still might get some churn from lost UDP packets. You might >>>>> also >>>>> give nodes a second chance to be pinged by including LastPingTime - >>>>> LastUpdateTime > 1 minute in the set. Note that increases and >>>>> decreases in MIN_UPDATE_TIME_TO_PING shouldn't be affected by the >>>>> number of nodes that fall into this category. >>>>> >>>>> We still probably need a way for new potential neighbors to be >>>>> rejected if our current set of neighbors is fine. Because I still >>>>> want addNeigbor to really add the neighbor, how about putting a >>>>> check before calling that to see if we need more neighbors. >>>>> >>>>> What thinks? >>>>> >>>>> -jonathan >>>>> >>>>> On Dec 9, 2008, at 11:37 PM, ext Eric Chan-Tin wrote: >>>>> >>>>>> Peter, >>>>>> >>>>>> Thanks. That fixed it. >>>>>> >>>>>> However, here is another problem I found while deploying Pyxida >>>>>> and >>>>>> running them for a while. >>>>>> >>>>>> In performMaintenance(), I loop through every neighbor and if >>>>>> (neighbor.getLastUpdateTime() < current_time - 10 minutes), then I >>>>>> remove that neighbor. >>>>> >>>>> >>>>>> >>>>>> >>>>>> In NCManager, I call getNeighborToPing, which randomly picks one >>>>>> of >>>>>> the >>>>>> neighbors (32 in this case). Pinging is performed every 10 >>>>>> seconds, so >>>>>> in 10 minutes, you can ping 60 nodes. However, if a neighbor A >>>>>> is not >>>>>> among those 60 nodes, it will get kicked out in >>>>>> performMaintenance(), >>>>>> although A is still alive (it just hasn't been contacted yet). >>>>>> >>>>>> There are some possible options. >>>>>> >>>>>> 1) Increase the timeout period from 10 minutes to ?? minutes. This >>>>>> doesn't quite work because say in Azureus, the number of >>>>>> neighbors is >>>>>> 512, so the timeout needs to be increased even more... It could >>>>>> be a >>>>>> function of MAX_NEIGHBORS... >>>>>> >>>>>> 2) When Pyxida replies to a ping request, it will update the >>>>>> neighbor's >>>>>> lastUpdateTime. 
This does not quite work either because it >>>>>> assumes a >>>>>> symmetric relationship which is most likely not the case. >>>>>> >>>>>> 3) Introduce some more logic to Pyxida so it remembers what >>>>>> nodes it >>>>>> contacted and does not send, say, more than 2 requests in 10 >>>>>> minutes to >>>>>> the same node. This solution isn't that great either. >>>>>> >>>>>> Let me know what you guys think. I am currently leaning towards >>>>>> option >>>>>> 1) but make the timeout period a function of MAX_NEIGHBORS, >>>>>> although I >>>>>> haven't quite found the perfect function yet. >>>>>> >>>>>> Here are some preliminary results. I compared the number of >>>>>> neighbors >>>>>> which are the same at each reported time (10 minutes). Before >>>>>> the fix, >>>>>> more than 50% of the neighbors changed within 10 minutes. After >>>>>> the >>>>>> fix, >>>>>> only about 30% of the neighbors changed within 10 minutes. It is >>>>>> more >>>>>> stable but I think 30% is still quite a big number. >>>>>> >>>>>> Sorry for the long email :) >>>>>> >>>>>> -Eric >>>>>> >>>>>> Peter Pietzuch wrote: >>>>>>> Hi, >>>>>>> >>>>>>> Eric Chan-Tin wrote: >>>>>>>> for (RemoteState<T> neighbor : neighbors) { //this is line 1029 >>>>>>>> if (neighbor.getLastUpdateTime() < (curr_time - >>>>>>>> MAINTENANCE_PERIOD)) { >>>>>>>> neighbors.remove(neighbor); >>>>>>>> } >>>>>>>> } >>>>>>> >>>>>>> IIRC you can't modify a collection while iterating over it >>>>>>> using the >>>>>>> above for loop. Instead you have to use a proper >>>>>>> Iterator<RemoteState<T>> and use it's remove method. >>>>>>> >>>>>>> Cheers, >>>>>>> Peter >>>>>>> >>>>>>>> Jonathan Ledlie wrote: >>>>>>>>> On Nov 3, 2008, at 3:46 PM, Eric Chan-Tin wrote: >>>>>>>>> >>>>>>>>>> Jonathan, >>>>>>>>>> >>>>>>>>>> I can insert some code to fix this and it doesn't look too >>>>>>>>>> hard >>>>>>>>>> but I do >>>>>>>>>> have some concerns. >>>>>>>>>> >>>>>>>>>>> I don't remember observing this behavior in our PL >>>>>>>>>>> deployment, but that could have been because we had the ping >>>>>>>>>>> time set >>>>>>>>>>> to 1-2 minutes, IIRC. >>>>>>>>>> The default ping time was set to 10 seconds + some Gaussian >>>>>>>>>> distribution, >>>>>>>>>> so that's what I used, but changing the ping time to 1 minute >>>>>>>>>> or more >>>>>>>>>> will >>>>>>>>>> result in the same effect except it will take longer for the >>>>>>>>>> "cycle" to >>>>>>>>>> occur. >>>>>>>>>> >>>>>>>>>> What I have in mind was the following: >>>>>>>>>> >>>>>>>>>> 1) In NCClient.addNeighbor() would be changed to (in >>>>>>>>>> pseudocode) >>>>>>>>>> protected boolean addNeighbor(RemoteState<T> guy) { >>>>>>>>>> boolean added = false; >>>>>>>>>> //if neighbors.size() >= MAX_NEIGHBORS >>>>>>>>>> //Check last update time of each neighbor in >>>>>>>>>> neighbors >>>>>>>>>> //neighbor.getLastUpdateTime() >>>>>>>>>> //if any of them is older than some number >>>>>>>>>> TIMEOUT, >>>>>>>>>> remove the oldest one, then add "guy" to neighbors >>>>>>>>>> //else don't add "guy" to neighbors >>>>>>>>>> //else add "guy" to neighbors >>>>>>>>>> } >>>>>>>>>> >>>>>>>>> IMO this shouldn't be the behavior of this function because it >>>>>>>>> is called >>>>>>>>> "addNeighbor." Instead what could happen is that part of the >>>>>>>>> performMaintenance() function could remove old neighbors. >>>>>>>>> >>>>>>>>>> 2) In NCManager, getGossipNode() would just pick a random node >>>>>>>>>> from >>>>>>>>>> localNC.primaryNC.getNeighbors() to send a request to every 10 >>>>>>>>>> seconds. 
>>>>>>>>>> >>>>>>>>>> 3) Pyxida will still keep track of all GossipNodes in the >>>>>>>>>> network. It >>>>>>>>>> could >>>>>>>>>> also (say every 10 minutes) check its NCClient.neighbors list >>>>>>>>>> and see if >>>>>>>>>> any neighbor is older than TIMEOUT, remove it, then pick a >>>>>>>>>> random >>>>>>>>>> GossipNode to send a ping request. >>>>>>>>>> >>>>>>>>>> The main problem with this scheme is that one Pyxida node >>>>>>>>>> would >>>>>>>>>> most >>>>>>>>>> likely >>>>>>>>>> stay with the same set of neighbors all the time. Its >>>>>>>>>> coordinates >>>>>>>>>> would be >>>>>>>>>> greatly affected by those 32 nodes (I am assuming >>>>>>>>>> MAX_NEIGHBORS >>>>>>>>>> = 32). >>>>>>>>>> >>>>>>>>> Instead of (2), why not have the getNeighborToPing function >>>>>>>>> determine >>>>>>>>> the behavior? If it returns a node, then ping it. If it >>>>>>>>> returns null, >>>>>>>>> then have the NCManager pick some other node to use? If >>>>>>>>> things are >>>>>>>>> hunky-dory, getNeighborToPing() would return a node (a current >>>>>>>>> member of >>>>>>>>> the neighbor set), if we're short on neighbors, then it would >>>>>>>>> return >>>>>>>>> null. The periodic maintenance would have already tossed old >>>>>>>>> nodes, so >>>>>>>>> this would be self-sustaining. >>>>>>>>> >>>>>>>>> The "problem" you mention is fairly central to network >>>>>>>>> coordinates in >>>>>>>>> general. Sure the coordinates will be a little better the more >>>>>>>>> neighbors you have. But in a very large system talking to >>>>>>>>> everybody >>>>>>>>> isn't feasible. Peter and I have a graph in one of the >>>>>>>>> workshop >>>>>>>>> papers >>>>>>>>> which shows the effect of increasing the number of neighbors >>>>>>>>> >>>>>>>>>> I don't think the network would be partitioned, but it could >>>>>>>>>> happen. For >>>>>>>>>> example, the boostrap nodes have each other in their neighbors >>>>>>>>>> list. >>>>>>>>>> Every >>>>>>>>>> new node bootstrapping to the network would contact those >>>>>>>>>> bootstrapping >>>>>>>>>> nodes first and add them to their neighbors. So every Pyxida >>>>>>>>>> node could >>>>>>>>>> have as its neighbors the "Bootstrap nodes", which would make >>>>>>>>>> Pyxida >>>>>>>>>> become >>>>>>>>>> a centralized system. >>>>>>>>>> >>>>>>>>> As long as the initial bootstrap list is large enough, I think >>>>>>>>> you are >>>>>>>>> right, that partitions won't be a problem. >>>>>>>>> >>>>>>>>>> One last thing: When Pyxida receives a ping request from a >>>>>>>>>> node, it does >>>>>>>>>> not update its coordinates. Is there any particular reason for >>>>>>>>>> that? >>>>>>>>>> Is it >>>>>>>>>> because Pyxida will not be able to calculate the RTT? >>>>>>>>>> >>>>>>>>> Yes. >>>>>>>>> >>>>>>>>>> Also, there is no "gossiping" of other nodes' coordinates, >>>>>>>>>> right? So if A >>>>>>>>>> has B in its neighbors list, it will from time to time tell >>>>>>>>>> other nodes >>>>>>>>>> about B's coordinates. I realize this has a lot of >>>>>>>>>> complications but the >>>>>>>>>> original Vivaldi paper was very fuzzy about this point and the >>>>>>>>>> previous >>>>>>>>>> point that it seems it is up to the implementors to decide >>>>>>>>>> what >>>>>>>>>> to do. >>>>>>>>>> >>>>>>>>> I don't think we are doing this. It could be useful in certain >>>>>>>>> circumstance (e.g., picking potential new neighbors), but we're >>>>>>>>> not, IIRC. 
>>>>>>>>> >>>>>>>>>> I will have to think some more about this as there are a lot >>>>>>>>>> of >>>>>>>>>> nasty >>>>>>>>>> little side effects that could happen, but these are my >>>>>>>>>> thoughts. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> >>>>>>>>>> -Eric >>>>>>>>>> >>>>>>>>> Sounds good. >>>>>>>>> >>>>>>>>> -jonathan >>>>>>>>> >>>>>>>>>> On Nov 3 2008, Jonathan Ledlie wrote: >>>>>>>>>> >>>>>>>>>>> Eric, >>>>>>>>>>> >>>>>>>>>>> Based on a look at the code just now, I see where you are >>>>>>>>>>> coming from: >>>>>>>>>>> with these params on PL, there isn't anything stopping the >>>>>>>>>>> neighbor >>>>>>>>>>> list from circulating pretty quickly through all nodes. In >>>>>>>>>>> the >>>>>>>>>>> Azureus implementation, the guys fed to the alg (fed to >>>>>>>>>>> processSample) >>>>>>>>>>> weren't changing much because they were coming from the >>>>>>>>>>> routing finger >>>>>>>>>>> tables. I don't remember observing this behavior in our PL >>>>>>>>>>> deployment, but that could have been because we had the ping >>>>>>>>>>> time set >>>>>>>>>>> to 1-2 minutes, IIRC. >>>>>>>>>>> >>>>>>>>>>> Do you want to insert some code that will fix this? >>>>>>>>>>> Basically >>>>>>>>>>> we want >>>>>>>>>>> to only add a new neighbor if our oldest neighbor should be >>>>>>>>>>> tossed (or >>>>>>>>>>> if we haven't reached MAX_NEIGHBORS). It's probably best >>>>>>>>>>> from a >>>>>>>>>>> clarity standpoint if addNeighbor does continue to actually >>>>>>>>>>> add the >>>>>>>>>>> neighbor in all cases. Probably the best thing to do is to >>>>>>>>>>> have >>>>>>>>>>> NCClient drop old neighbors internally and have NCManager ask >>>>>>>>>>> it which >>>>>>>>>>> nodes should be pinged via getNeighborToPing. So NCClient >>>>>>>>>>> would >>>>>>>>>>> continue to have the same behavior on processSample (it >>>>>>>>>>> would not >>>>>>>>>>> reject new information) and NCManager and NCClient would >>>>>>>>>>> cooperate to >>>>>>>>>>> measure RTTs to current neighbors. We do want to make sure, >>>>>>>>>>> however, >>>>>>>>>>> that the gossip set stays large enough so that partitions >>>>>>>>>>> don't occur. >>>>>>>>>>> >>>>>>>>>>> Do you want to be added to the SourceForge Pyxida developers >>>>>>>>>>> to make >>>>>>>>>>> this change? (Please test it out carefully first.) >>>>>>>>>>> >>>>>>>>>>> -jonathan >>>>>>>>>>> >>>>>>>>>>> On Oct 30, 2008, at 11:32 AM, ext Eric Chan-Tin wrote: >>>>>>>>>>> >>>>>>>>>>>> Jonathan, >>>>>>>>>>>> >>>>>>>>>>>> From what I saw, eventually, you learn about all the nodes >>>>>>>>>>>> in >>>>>>>>>>>> the >>>>>>>>>>>> network. Every 10 seconds, Pyxida picks a random "Gossip" >>>>>>>>>>>> node from >>>>>>>>>>>> its >>>>>>>>>>>> list and sends a request. The Gossip node will reply back >>>>>>>>>>>> with an >>>>>>>>>>>> "upNeighbor", that is, a "Gossip" node that is "up" or >>>>>>>>>>>> with 10% >>>>>>>>>>>> probability, a node that is "pending". Since naturally, all >>>>>>>>>>>> the nodes >>>>>>>>>>>> bootstrap off the same set of nodes, every node will >>>>>>>>>>>> eventually know >>>>>>>>>>>> of >>>>>>>>>>>> every other node. >>>>>>>>>>>> >>>>>>>>>>>> Pyxida might not be rotating through everybody continuously >>>>>>>>>>>> (although >>>>>>>>>>>> eventually, since it knows everybody, it will have added >>>>>>>>>>>> everybody to >>>>>>>>>>>> its Neighbors list at some point), but could be removing a >>>>>>>>>>>> node that >>>>>>>>>>>> is >>>>>>>>>>>> "alive". So, if MAX_NEIGHBORS = 32, let's call them A_1, >>>>>>>>>>>> A_2, >>>>>>>>>>>> ..., >>>>>>>>>>>> A_32. 
>>>>>>>>>>>> Pyxida picks a random Gossip node B, sends it a request >>>>>>>>>>>> and B >>>>>>>>>>>> replies >>>>>>>>>>>> back. Pyxida will add B to its Neighbors list, see that >>>>>>>>>>>> size of >>>>>>>>>>>> Neighbors > MAX_NEIGHBORS, kick A_1 out. Then from those 32 >>>>>>>>>>>> neighbors >>>>>>>>>>>> A_2, ...., A_32, B, it will calculate the force, and apply >>>>>>>>>>>> it >>>>>>>>>>>> to the >>>>>>>>>>>> current system coordinates. Thus, if every 10 seconds, >>>>>>>>>>>> Pyxida >>>>>>>>>>>> picks a >>>>>>>>>>>> different set of nodes C, D, E, etc... it would keep kicking >>>>>>>>>>>> nodes out >>>>>>>>>>>> of the neighbors list. >>>>>>>>>>>> >>>>>>>>>>>> It's the "kicking A_1 out" part that seemed a bit confusing. >>>>>>>>>>>> Churn >>>>>>>>>>>> is a >>>>>>>>>>>> good reason but you don't check that A_1 might still be >>>>>>>>>>>> alive >>>>>>>>>>>> or maybe >>>>>>>>>>>> A_1 was just added. I have been printing the neighbors list >>>>>>>>>>>> every 10 >>>>>>>>>>>> minutes, and they are completely different between those 10 >>>>>>>>>>>> minutes. >>>>>>>>>>>> >>>>>>>>>>>> One possible problem might be that I am choosing the value >>>>>>>>>>>> of >>>>>>>>>>>> MAX_NEIGHBORS to be too small. Originally it was set to 512 >>>>>>>>>>>> (which was >>>>>>>>>>>> meant for Azureus). Since I am using Planetlab where there >>>>>>>>>>>> are about >>>>>>>>>>>> 400-500 "stable" nodes, I changed the value to 32. But I >>>>>>>>>>>> don't think >>>>>>>>>>>> that should be a problem since log(n) = log(512) = 9. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> >>>>>>>>>>>> -Eric >>>>>>>>>>>> >>>>>>>>>>>> Jonathan Ledlie wrote: >>>>>>>>>>>>> Eric, >>>>>>>>>>>>> >>>>>>>>>>>>> Glad to hear it. Let Peter and me know if you publish >>>>>>>>>>>>> anything using >>>>>>>>>>>>> Pyxida and we can put it on the web page. >>>>>>>>>>>>> >>>>>>>>>>>>> "Gossip" nodes are for learning about the network. >>>>>>>>>>>>> "Neighbors" are >>>>>>>>>>>>> for >>>>>>>>>>>>> constructing the coordinate. >>>>>>>>>>>>> >>>>>>>>>>>>> So, IIRC, the gossip process learns about nodes that may or >>>>>>>>>>>>> may not >>>>>>>>>>>>> be >>>>>>>>>>>>> applied to the neighbor list. That is, we may eventually >>>>>>>>>>>>> learn >>>>>>>>>>>>> about a >>>>>>>>>>>>> large portion of the network, via gossiping, but should >>>>>>>>>>>>> only >>>>>>>>>>>>> be using >>>>>>>>>>>>> MAX_NEIGHBORS at any given time for coordinate computation. >>>>>>>>>>>>> The >>>>>>>>>>>>> reason >>>>>>>>>>>>> we don't want to use a fixed set of neighbors for >>>>>>>>>>>>> coordinate >>>>>>>>>>>>> computation >>>>>>>>>>>>> is because nodes may come and go from the system. However, >>>>>>>>>>>>> it should >>>>>>>>>>>>> not be the case that we just rotate through everybody, >>>>>>>>>>>>> because >>>>>>>>>>>>> then, as >>>>>>>>>>>>> you note, we'd be computing against everyone's coordinate/ >>>>>>>>>>>>> latency, >>>>>>>>>>>>> and >>>>>>>>>>>>> this is counter to the point of network coordinates. >>>>>>>>>>>>> With this >>>>>>>>>>>>> gradual >>>>>>>>>>>>> transience as a desirable quality in mind, I am pretty >>>>>>>>>>>>> sure we >>>>>>>>>>>>> maintain >>>>>>>>>>>>> neighbors in a fairly fixed manner. Does this not seem to >>>>>>>>>>>>> be the >>>>>>>>>>>>> case >>>>>>>>>>>>> to you? 
>>>>>>>>>>>>> >>>>>>>>>>>>> -jonathan >>>>>>>>>>>>> >>>>>>>>>>>>> On Oct 29, 2008, at 9:20 PM, Eric Chan-Tin wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi Jonathan and Peter, >>>>>>>>>>>>>> >>>>>>>>>>>>>> First of all, just wanted to say that Pyxida has been >>>>>>>>>>>>>> working >>>>>>>>>>>>>> great on >>>>>>>>>>>>>> Planetlab for the past 4 months or so. Getting good data >>>>>>>>>>>>>> and >>>>>>>>>>>>>> experiments >>>>>>>>>>>>>> going well. >>>>>>>>>>>>>> >>>>>>>>>>>>>> However, I do have one question/comment. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Every Pyxida node contains a list of Neighbors (in >>>>>>>>>>>>>> NCClient) and a >>>>>>>>>>>>>> list >>>>>>>>>>>>>> of Gossip nodes (they are called neighbors in NCManager >>>>>>>>>>>>>> but >>>>>>>>>>>>>> I will >>>>>>>>>>>>>> call >>>>>>>>>>>>>> them gossip for now so as not to confuse with the 'real' >>>>>>>>>>>>>> neighbors >>>>>>>>>>>>>> in >>>>>>>>>>>>>> NCClient). Every 10 seconds, Pyxida will pick a random >>>>>>>>>>>>>> node >>>>>>>>>>>>>> (call >>>>>>>>>>>>>> it A) >>>>>>>>>>>>>> from Gossip and send a gossip request (basically asking >>>>>>>>>>>>>> for >>>>>>>>>>>>>> rtt, >>>>>>>>>>>>>> coordinate, error). But Gossip contains all the nodes in >>>>>>>>>>>>>> the >>>>>>>>>>>>>> network. >>>>>>>>>>>>>> The comment in the code said that was to avoid Pyxida >>>>>>>>>>>>>> becoming >>>>>>>>>>>>>> lonely >>>>>>>>>>>>>> and no-one to gossip with, which makes sense. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Once A responds, Pyxida will add A to its Neighbors (if >>>>>>>>>>>>>> Neighbors >>>>>>>>>>>>>> does >>>>>>>>>>>>>> not contain A) and remove the first element from the >>>>>>>>>>>>>> Neighbors list. >>>>>>>>>>>>>> This seems a little bit odd. Was there any particular >>>>>>>>>>>>>> reasoning >>>>>>>>>>>>>> behind >>>>>>>>>>>>>> this? Because it would seem like you're getting the >>>>>>>>>>>>>> coordinate from >>>>>>>>>>>>>> every node in the network eventually. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks a lot, >>>>>>>>>>>>>> >>>>>>>>>>>>>> -Eric >>>>>>>>>>>>>> >>>>>>>>>>>>>> Jonathan Ledlie wrote: >>>>>>>>>>>>>>> Your inclination is correct. See forwarded message. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -jonathan >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Apr 9, 2008, at 1:53 PM, ext Eric Chan-Tin wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thank you for the clarification. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I've been reading through your published papers and code >>>>>>>>>>>>>>>> and I >>>>>>>>>>>>>>>> must >>>>>>>>>>>>>>>> say I learned a lot about how Pyxida works :) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I still have a quick and easy (hopefully) question. Each >>>>>>>>>>>>>>>> Pyxida >>>>>>>>>>>>>>>> node >>>>>>>>>>>>>>>> has a small amount of neighbors and learn about other >>>>>>>>>>>>>>>> nodes >>>>>>>>>>>>>>>> through >>>>>>>>>>>>>>>> other nodes. I am running on about 200 machines, and it >>>>>>>>>>>>>>>> seems >>>>>>>>>>>>>>>> like all >>>>>>>>>>>>>>>> the Pyxida nodes are adding each other to their >>>>>>>>>>>>>>>> upNeighbors set. >>>>>>>>>>>>>>>> NCClient.java has the MAX_NEIGHBORS set to 512. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I wanted to make sure that I am not misreading the >>>>>>>>>>>>>>>> code, but >>>>>>>>>>>>>>>> shouldn't >>>>>>>>>>>>>>>> each Pyxida node keep a small number of neighbors in its >>>>>>>>>>>>>>>> upNeighbors >>>>>>>>>>>>>>>> set. 
Else, it will be a lot of updates to send a >>>>>>>>>>>>>>>> gossipMessage >>>>>>>>>>>>>>>> to all >>>>>>>>>>>>>>>> 200 neighbors, and it also means that a Pyxida node has >>>>>>>>>>>>>>>> the whole >>>>>>>>>>>>>>>> network in its upNeighbors set - that seems wrong. I am >>>>>>>>>>>>>>>> inclined >>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>> change the value of MAX_NEIGHBORS but wanted to make >>>>>>>>>>>>>>>> sure >>>>>>>>>>>>>>>> that my >>>>>>>>>>>>>>>> understanding is correct first. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thank you, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Eric >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> ------------------------------------------------------------------------- >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> This SF.Net email is sponsored by the Moblin Your Move >>>>>>>>>>>>>> Developer's >>>>>>>>>>>>>> challenge Build the coolest Linux based applications with >>>>>>>>>>>>>> Moblin >>>>>>>>>>>>>> SDK & >>>>>>>>>>>>>> win great prizes Grand prize is a trip for two to an Open >>>>>>>>>>>>>> Source >>>>>>>>>>>>>> event >>>>>>>>>>>>>> anywhere in the world >>>>>>>>>>>>>> http://moblin-contest.org/redirect.php?banner_id=100&url=/ >>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>> Pyxida-users mailing >>>>>>>>>>>>>> list Pyx...@li... >>>>>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/pyxida-users >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> ------------------------------------------------------------------------- >>>>>>>>>> This SF.Net email is sponsored by the Moblin Your Move >>>>>>>>>> Developer's challenge Build the coolest Linux based applications >>>>>>>>>> with Moblin SDK & win great prizes Grand prize is a trip for two >>>>>>>>>> to an Open Source event anywhere in the world >>>>>>>>>> http://moblin-contest.org/redirect.php?banner_id=100&url=/ >>>>>>>>>> _______________________________________________ Pyxida-users >>>>>>>>>> mailing list Pyx...@li... >>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/pyxida-users >>>>>>>> >>>>>>>> >>>>>>>> ------------------------------------------------------------------------------ >>>>>>>> SF.Net email is Sponsored by MIX09, March 18-20, 2009 in Las >>>>>>>> Vegas, Nevada. The future of the web can't happen without you. >>>>>>>> Join us at MIX09 to help pave the way to the Next Web now. Learn >>>>>>>> more and register at >>>>>>>> http://ad.doubleclick.net/clk;208669438;13503038;i?http://2009.visitmix.com/ >>>>>>>> _______________________________________________ >>>>>>>> Pyxida-users mailing list Pyx...@li... >>>>>>>> https://lists.sourceforge.net/lists/listinfo/pyxida-users >>>>>>> >>>>>>> >>>>> > |
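The eviction behavior questioned in the thread above boils down to the following (a minimal sketch with MAX_NEIGHBORS = 32 as in the discussion; the class, method, and field names are illustrative and not Pyxida's actual API):

import java.util.LinkedList;

// Every gossip responder is appended to the neighbor list; once the list
// exceeds MAX_NEIGHBORS the oldest entry is dropped, whether or not it is
// still alive. This is the "kicking A_1 out" step under discussion.
class NeighborListSketch {
    static final int MAX_NEIGHBORS = 32;
    private final LinkedList<String> neighbors = new LinkedList<String>();

    void onGossipReply(String responder) {
        if (!neighbors.contains(responder)) {
            neighbors.addLast(responder);          // add B
            if (neighbors.size() > MAX_NEIGHBORS) {
                neighbors.removeFirst();           // evict A_1, no liveness check
            }
        }
        // the force computation then runs against the current neighbor set
    }
}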
From: Jonathan L. <jon...@ee...> - 2008-12-11 04:02:29
|
Seems reasonable. Both ideas (lastPingTime and a weighted probability based on age) seem to be independent. Some of the code is more CPU intensive than it should be (e.g., looping through potentially long lists), and that is a slight downside to the function approach vs a set, but this probably isn't a big deal. -jonathan On Dec 10, 2008, at 10:40 PM, Eric Chan-Tin wrote: > okay. I think I might have misunderstood you. Your idea of > lastPingTime > looks interesting. > > My idea is to introduce a probability function. Right now, it just > randomly picks one of the neighbors. What I am thinking it might do is > that instead of just randomly picking a neighbor, pick a neighbor > based > on some function. So, the neighbor whose lastUpdateTime is oldest will > have a higher chance of being picked than a neighbor who was just > updated. > > Does that make more sense? > > -Eric > > Jonathan Ledlie wrote: >> >> Hmm. The advantage of lastPingTime is that you learn whether or not >> you've tried to contact a neighbor and they haven't responded. >> Then you >> can use that knowledge to determine whether or not to toss them. >> >> Maybe I'm not understanding your suggestion then. >> >> -jonathan >> >> On Dec 10, 2008, at 12:51 PM, Eric Chan-Tin wrote: >> >>> That's a good idea. Here is a little twist on it. >>> >>> Put a "weight" on each neighbor, based on the lastUpdateTime. So if >>> currentTime - neighbor's lastUpdateTime, then the weight is higher. >>> Then in getNeighborToPing, neighbor's with higher weight have a >>> higher >>> probability of getting picked than neighbor's with lower weight. >>> >>> The weight could be ( (currentTime - lastUpdateTime) / 15 minutes). >>> It's not 10 minutes so as to avoid a probability of 1 if the >>> lastUpdateTime was the previous performMaintenance. This should also >>> avoid the problem of always pinging the oldest neighbor since it's >>> about probability (weight). >>> >>> It seems a little bit less complicated than having lastPingTime and >>> MIN_UPDATE_TIME. >>> >>> -Eric >>> >>> On Dec 10 2008, Jonathan Ledlie wrote: >>> >>>> >>>> This is an interesting situation. Let's think about the goal. The >>>> goal is to keep pinging the same nodes as long as they are alive, >>>> and to replace a node soon after it dies, keeping the total number >>>> of up neighbors at approximately MAX_NEIGHBORS. >>>> >>>> How about the following: >>>> - when you pick a neighbor to ping, you set a time on that event >>>> (in >>>> a new per-neighbor timestamp LastPingTime) >>>> - when you first insert a neighbor, set this timestamp to 0. >>>> - when you loop in performMaintenance, remove nodes whose >>>> LastPingTime - LastUpdateTime > threshold (like 10 minutes) >>>> >>>> - when picking a node, pick the guy with the oldest LastPingTime >>>> (not sure about this one -- I don't like situations where you pick >>>> the absolute *oldest* because then we could get in a situation >>>> where >>>> an old guy sits there, waiting to be kicked out, and we keep >>>> picking >>>> him to be pinged; I'd prefer to pick randomly, or random older >>>> guys). >>>> - a simple way around this is to simply have a self-adjusting >>>> threshold: so, we walk through the nodes, and everyone whose >>>> LastPingTime is older than MIN_UPDATE_TIME_TO_PING ago (a relative >>>> time) get added to a set; we pick a guy randomly from the set to >>>> ping. If the set is empty, we increase MIN_UPDATE_TIME_TO_PING, >>>> and >>>> wait for the next time the function is called. 
If the set is too >>>> large (relative to a fraction of MAX_NEIGHBORS), we decrease it. >>>> >>>> We still might get some churn from lost UDP packets. You might >>>> also >>>> give nodes a second chance to be pinged by including LastPingTime - >>>> LastUpdateTime > 1 minute in the set. Note that increases and >>>> decreases in MIN_UPDATE_TIME_TO_PING shouldn't be affected by the >>>> number of nodes that fall into this category. >>>> >>>> We still probably need a way for new potential neighbors to be >>>> rejected if our current set of neighbors is fine. Because I still >>>> want addNeigbor to really add the neighbor, how about putting a >>>> check before calling that to see if we need more neighbors. >>>> >>>> What thinks? >>>> >>>> -jonathan >>>> >>>> On Dec 9, 2008, at 11:37 PM, ext Eric Chan-Tin wrote: >>>> >>>>> Peter, >>>>> >>>>> Thanks. That fixed it. >>>>> >>>>> However, here is another problem I found while deploying Pyxida >>>>> and >>>>> running them for a while. >>>>> >>>>> In performMaintenance(), I loop through every neighbor and if >>>>> (neighbor.getLastUpdateTime() < current_time - 10 minutes), then I >>>>> remove that neighbor. >>>> >>>> >>>>> >>>>> >>>>> In NCManager, I call getNeighborToPing, which randomly picks one >>>>> of >>>>> the >>>>> neighbors (32 in this case). Pinging is performed every 10 >>>>> seconds, so >>>>> in 10 minutes, you can ping 60 nodes. However, if a neighbor A >>>>> is not >>>>> among those 60 nodes, it will get kicked out in >>>>> performMaintenance(), >>>>> although A is still alive (it just hasn't been contacted yet). >>>>> >>>>> There are some possible options. >>>>> >>>>> 1) Increase the timeout period from 10 minutes to ?? minutes. This >>>>> doesn't quite work because say in Azureus, the number of >>>>> neighbors is >>>>> 512, so the timeout needs to be increased even more... It could >>>>> be a >>>>> function of MAX_NEIGHBORS... >>>>> >>>>> 2) When Pyxida replies to a ping request, it will update the >>>>> neighbor's >>>>> lastUpdateTime. This does not quite work either because it >>>>> assumes a >>>>> symmetric relationship which is most likely not the case. >>>>> >>>>> 3) Introduce some more logic to Pyxida so it remembers what >>>>> nodes it >>>>> contacted and does not send, say, more than 2 requests in 10 >>>>> minutes to >>>>> the same node. This solution isn't that great either. >>>>> >>>>> Let me know what you guys think. I am currently leaning towards >>>>> option >>>>> 1) but make the timeout period a function of MAX_NEIGHBORS, >>>>> although I >>>>> haven't quite found the perfect function yet. >>>>> >>>>> Here are some preliminary results. I compared the number of >>>>> neighbors >>>>> which are the same at each reported time (10 minutes). Before >>>>> the fix, >>>>> more than 50% of the neighbors changed within 10 minutes. After >>>>> the >>>>> fix, >>>>> only about 30% of the neighbors changed within 10 minutes. It is >>>>> more >>>>> stable but I think 30% is still quite a big number. >>>>> >>>>> Sorry for the long email :) >>>>> >>>>> -Eric >>>>> >>>>> Peter Pietzuch wrote: >>>>>> Hi, >>>>>> >>>>>> Eric Chan-Tin wrote: >>>>>>> for (RemoteState<T> neighbor : neighbors) { //this is line 1029 >>>>>>> if (neighbor.getLastUpdateTime() < (curr_time - >>>>>>> MAINTENANCE_PERIOD)) { >>>>>>> neighbors.remove(neighbor); >>>>>>> } >>>>>>> } >>>>>> >>>>>> IIRC you can't modify a collection while iterating over it >>>>>> using the >>>>>> above for loop. 
Instead you have to use a proper >>>>>> Iterator<RemoteState<T>> and use it's remove method. >>>>>> >>>>>> Cheers, >>>>>> Peter >>>>>> >>>>>>> Jonathan Ledlie wrote: >>>>>>>> On Nov 3, 2008, at 3:46 PM, Eric Chan-Tin wrote: >>>>>>>> >>>>>>>>> Jonathan, >>>>>>>>> >>>>>>>>> I can insert some code to fix this and it doesn't look too >>>>>>>>> hard >>>>>>>>> but I do >>>>>>>>> have some concerns. >>>>>>>>> >>>>>>>>>> I don't remember observing this behavior in our PL >>>>>>>>>> deployment, but that could have been because we had the ping >>>>>>>>>> time set >>>>>>>>>> to 1-2 minutes, IIRC. >>>>>>>>> The default ping time was set to 10 seconds + some Gaussian >>>>>>>>> distribution, >>>>>>>>> so that's what I used, but changing the ping time to 1 minute >>>>>>>>> or more >>>>>>>>> will >>>>>>>>> result in the same effect except it will take longer for the >>>>>>>>> "cycle" to >>>>>>>>> occur. >>>>>>>>> >>>>>>>>> What I have in mind was the following: >>>>>>>>> >>>>>>>>> 1) In NCClient.addNeighbor() would be changed to (in >>>>>>>>> pseudocode) >>>>>>>>> protected boolean addNeighbor(RemoteState<T> guy) { >>>>>>>>> boolean added = false; >>>>>>>>> //if neighbors.size() >= MAX_NEIGHBORS >>>>>>>>> //Check last update time of each neighbor in >>>>>>>>> neighbors >>>>>>>>> //neighbor.getLastUpdateTime() >>>>>>>>> //if any of them is older than some number >>>>>>>>> TIMEOUT, >>>>>>>>> remove the oldest one, then add "guy" to neighbors >>>>>>>>> //else don't add "guy" to neighbors >>>>>>>>> //else add "guy" to neighbors >>>>>>>>> } >>>>>>>>> >>>>>>>> IMO this shouldn't be the behavior of this function because it >>>>>>>> is called >>>>>>>> "addNeighbor." Instead what could happen is that part of the >>>>>>>> performMaintenance() function could remove old neighbors. >>>>>>>> >>>>>>>>> 2) In NCManager, getGossipNode() would just pick a random node >>>>>>>>> from >>>>>>>>> localNC.primaryNC.getNeighbors() to send a request to every 10 >>>>>>>>> seconds. >>>>>>>>> >>>>>>>>> 3) Pyxida will still keep track of all GossipNodes in the >>>>>>>>> network. It >>>>>>>>> could >>>>>>>>> also (say every 10 minutes) check its NCClient.neighbors list >>>>>>>>> and see if >>>>>>>>> any neighbor is older than TIMEOUT, remove it, then pick a >>>>>>>>> random >>>>>>>>> GossipNode to send a ping request. >>>>>>>>> >>>>>>>>> The main problem with this scheme is that one Pyxida node >>>>>>>>> would >>>>>>>>> most >>>>>>>>> likely >>>>>>>>> stay with the same set of neighbors all the time. Its >>>>>>>>> coordinates >>>>>>>>> would be >>>>>>>>> greatly affected by those 32 nodes (I am assuming >>>>>>>>> MAX_NEIGHBORS >>>>>>>>> = 32). >>>>>>>>> >>>>>>>> Instead of (2), why not have the getNeighborToPing function >>>>>>>> determine >>>>>>>> the behavior? If it returns a node, then ping it. If it >>>>>>>> returns null, >>>>>>>> then have the NCManager pick some other node to use? If >>>>>>>> things are >>>>>>>> hunky-dory, getNeighborToPing() would return a node (a current >>>>>>>> member of >>>>>>>> the neighbor set), if we're short on neighbors, then it would >>>>>>>> return >>>>>>>> null. The periodic maintenance would have already tossed old >>>>>>>> nodes, so >>>>>>>> this would be self-sustaining. >>>>>>>> >>>>>>>> The "problem" you mention is fairly central to network >>>>>>>> coordinates in >>>>>>>> general. Sure the coordinates will be a little better the more >>>>>>>> neighbors you have. But in a very large system talking to >>>>>>>> everybody >>>>>>>> isn't feasible. 
Peter and I have a graph in one of the >>>>>>>> workshop >>>>>>>> papers >>>>>>>> which shows the effect of increasing the number of neighbors >>>>>>>> >>>>>>>>> I don't think the network would be partitioned, but it could >>>>>>>>> happen. For >>>>>>>>> example, the boostrap nodes have each other in their neighbors >>>>>>>>> list. >>>>>>>>> Every >>>>>>>>> new node bootstrapping to the network would contact those >>>>>>>>> bootstrapping >>>>>>>>> nodes first and add them to their neighbors. So every Pyxida >>>>>>>>> node could >>>>>>>>> have as its neighbors the "Bootstrap nodes", which would make >>>>>>>>> Pyxida >>>>>>>>> become >>>>>>>>> a centralized system. >>>>>>>>> >>>>>>>> As long as the initial bootstrap list is large enough, I think >>>>>>>> you are >>>>>>>> right, that partitions won't be a problem. >>>>>>>> >>>>>>>>> One last thing: When Pyxida receives a ping request from a >>>>>>>>> node, it does >>>>>>>>> not update its coordinates. Is there any particular reason for >>>>>>>>> that? >>>>>>>>> Is it >>>>>>>>> because Pyxida will not be able to calculate the RTT? >>>>>>>>> >>>>>>>> Yes. >>>>>>>> >>>>>>>>> Also, there is no "gossiping" of other nodes' coordinates, >>>>>>>>> right? So if A >>>>>>>>> has B in its neighbors list, it will from time to time tell >>>>>>>>> other nodes >>>>>>>>> about B's coordinates. I realize this has a lot of >>>>>>>>> complications but the >>>>>>>>> original Vivaldi paper was very fuzzy about this point and the >>>>>>>>> previous >>>>>>>>> point that it seems it is up to the implementors to decide >>>>>>>>> what >>>>>>>>> to do. >>>>>>>>> >>>>>>>> I don't think we are doing this. It could be useful in certain >>>>>>>> circumstance (e.g., picking potential new neighbors), but we're >>>>>>>> not, IIRC. >>>>>>>> >>>>>>>>> I will have to think some more about this as there are a lot >>>>>>>>> of >>>>>>>>> nasty >>>>>>>>> little side effects that could happen, but these are my >>>>>>>>> thoughts. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> -Eric >>>>>>>>> >>>>>>>> Sounds good. >>>>>>>> >>>>>>>> -jonathan >>>>>>>> >>>>>>>>> On Nov 3 2008, Jonathan Ledlie wrote: >>>>>>>>> >>>>>>>>>> Eric, >>>>>>>>>> >>>>>>>>>> Based on a look at the code just now, I see where you are >>>>>>>>>> coming from: >>>>>>>>>> with these params on PL, there isn't anything stopping the >>>>>>>>>> neighbor >>>>>>>>>> list from circulating pretty quickly through all nodes. In >>>>>>>>>> the >>>>>>>>>> Azureus implementation, the guys fed to the alg (fed to >>>>>>>>>> processSample) >>>>>>>>>> weren't changing much because they were coming from the >>>>>>>>>> routing finger >>>>>>>>>> tables. I don't remember observing this behavior in our PL >>>>>>>>>> deployment, but that could have been because we had the ping >>>>>>>>>> time set >>>>>>>>>> to 1-2 minutes, IIRC. >>>>>>>>>> >>>>>>>>>> Do you want to insert some code that will fix this? >>>>>>>>>> Basically >>>>>>>>>> we want >>>>>>>>>> to only add a new neighbor if our oldest neighbor should be >>>>>>>>>> tossed (or >>>>>>>>>> if we haven't reached MAX_NEIGHBORS). It's probably best >>>>>>>>>> from a >>>>>>>>>> clarity standpoint if addNeighbor does continue to actually >>>>>>>>>> add the >>>>>>>>>> neighbor in all cases. Probably the best thing to do is to >>>>>>>>>> have >>>>>>>>>> NCClient drop old neighbors internally and have NCManager ask >>>>>>>>>> it which >>>>>>>>>> nodes should be pinged via getNeighborToPing. 
So NCClient >>>>>>>>>> would >>>>>>>>>> continue to have the same behavior on processSample (it >>>>>>>>>> would not >>>>>>>>>> reject new information) and NCManager and NCClient would >>>>>>>>>> cooperate to >>>>>>>>>> measure RTTs to current neighbors. We do want to make sure, >>>>>>>>>> however, >>>>>>>>>> that the gossip set stays large enough so that partitions >>>>>>>>>> don't occur. >>>>>>>>>> >>>>>>>>>> Do you want to be added to the SourceForge Pyxida developers >>>>>>>>>> to make >>>>>>>>>> this change? (Please test it out carefully first.) >>>>>>>>>> >>>>>>>>>> -jonathan >>>>>>>>>> >>>>>>>>>> On Oct 30, 2008, at 11:32 AM, ext Eric Chan-Tin wrote: >>>>>>>>>> >>>>>>>>>>> Jonathan, >>>>>>>>>>> >>>>>>>>>>> From what I saw, eventually, you learn about all the nodes >>>>>>>>>>> in >>>>>>>>>>> the >>>>>>>>>>> network. Every 10 seconds, Pyxida picks a random "Gossip" >>>>>>>>>>> node from >>>>>>>>>>> its >>>>>>>>>>> list and sends a request. The Gossip node will reply back >>>>>>>>>>> with an >>>>>>>>>>> "upNeighbor", that is, a "Gossip" node that is "up" or >>>>>>>>>>> with 10% >>>>>>>>>>> probability, a node that is "pending". Since naturally, all >>>>>>>>>>> the nodes >>>>>>>>>>> bootstrap off the same set of nodes, every node will >>>>>>>>>>> eventually know >>>>>>>>>>> of >>>>>>>>>>> every other node. >>>>>>>>>>> >>>>>>>>>>> Pyxida might not be rotating through everybody continuously >>>>>>>>>>> (although >>>>>>>>>>> eventually, since it knows everybody, it will have added >>>>>>>>>>> everybody to >>>>>>>>>>> its Neighbors list at some point), but could be removing a >>>>>>>>>>> node that >>>>>>>>>>> is >>>>>>>>>>> "alive". So, if MAX_NEIGHBORS = 32, let's call them A_1, >>>>>>>>>>> A_2, >>>>>>>>>>> ..., >>>>>>>>>>> A_32. >>>>>>>>>>> Pyxida picks a random Gossip node B, sends it a request >>>>>>>>>>> and B >>>>>>>>>>> replies >>>>>>>>>>> back. Pyxida will add B to its Neighbors list, see that >>>>>>>>>>> size of >>>>>>>>>>> Neighbors > MAX_NEIGHBORS, kick A_1 out. Then from those 32 >>>>>>>>>>> neighbors >>>>>>>>>>> A_2, ...., A_32, B, it will calculate the force, and apply >>>>>>>>>>> it >>>>>>>>>>> to the >>>>>>>>>>> current system coordinates. Thus, if every 10 seconds, >>>>>>>>>>> Pyxida >>>>>>>>>>> picks a >>>>>>>>>>> different set of nodes C, D, E, etc... it would keep kicking >>>>>>>>>>> nodes out >>>>>>>>>>> of the neighbors list. >>>>>>>>>>> >>>>>>>>>>> It's the "kicking A_1 out" part that seemed a bit confusing. >>>>>>>>>>> Churn >>>>>>>>>>> is a >>>>>>>>>>> good reason but you don't check that A_1 might still be >>>>>>>>>>> alive >>>>>>>>>>> or maybe >>>>>>>>>>> A_1 was just added. I have been printing the neighbors list >>>>>>>>>>> every 10 >>>>>>>>>>> minutes, and they are completely different between those 10 >>>>>>>>>>> minutes. >>>>>>>>>>> >>>>>>>>>>> One possible problem might be that I am choosing the value >>>>>>>>>>> of >>>>>>>>>>> MAX_NEIGHBORS to be too small. Originally it was set to 512 >>>>>>>>>>> (which was >>>>>>>>>>> meant for Azureus). Since I am using Planetlab where there >>>>>>>>>>> are about >>>>>>>>>>> 400-500 "stable" nodes, I changed the value to 32. But I >>>>>>>>>>> don't think >>>>>>>>>>> that should be a problem since log(n) = log(512) = 9. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> >>>>>>>>>>> -Eric >>>>>>>>>>> >>>>>>>>>>> Jonathan Ledlie wrote: >>>>>>>>>>>> Eric, >>>>>>>>>>>> >>>>>>>>>>>> Glad to hear it. Let Peter and me know if you publish >>>>>>>>>>>> anything using >>>>>>>>>>>> Pyxida and we can put it on the web page. 
>>>>>>>>>>>> >>>>>>>>>>>> "Gossip" nodes are for learning about the network. >>>>>>>>>>>> "Neighbors" are >>>>>>>>>>>> for >>>>>>>>>>>> constructing the coordinate. >>>>>>>>>>>> >>>>>>>>>>>> So, IIRC, the gossip process learns about nodes that may or >>>>>>>>>>>> may not >>>>>>>>>>>> be >>>>>>>>>>>> applied to the neighbor list. That is, we may eventually >>>>>>>>>>>> learn >>>>>>>>>>>> about a >>>>>>>>>>>> large portion of the network, via gossiping, but should >>>>>>>>>>>> only >>>>>>>>>>>> be using >>>>>>>>>>>> MAX_NEIGHBORS at any given time for coordinate computation. >>>>>>>>>>>> The >>>>>>>>>>>> reason >>>>>>>>>>>> we don't want to use a fixed set of neighbors for >>>>>>>>>>>> coordinate >>>>>>>>>>>> computation >>>>>>>>>>>> is because nodes may come and go from the system. However, >>>>>>>>>>>> it should >>>>>>>>>>>> not be the case that we just rotate through everybody, >>>>>>>>>>>> because >>>>>>>>>>>> then, as >>>>>>>>>>>> you note, we'd be computing against everyone's coordinate/ >>>>>>>>>>>> latency, >>>>>>>>>>>> and >>>>>>>>>>>> this is counter to the point of network coordinates. >>>>>>>>>>>> With this >>>>>>>>>>>> gradual >>>>>>>>>>>> transience as a desirable quality in mind, I am pretty >>>>>>>>>>>> sure we >>>>>>>>>>>> maintain >>>>>>>>>>>> neighbors in a fairly fixed manner. Does this not seem to >>>>>>>>>>>> be the >>>>>>>>>>>> case >>>>>>>>>>>> to you? >>>>>>>>>>>> >>>>>>>>>>>> -jonathan >>>>>>>>>>>> >>>>>>>>>>>> On Oct 29, 2008, at 9:20 PM, Eric Chan-Tin wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Jonathan and Peter, >>>>>>>>>>>>> >>>>>>>>>>>>> First of all, just wanted to say that Pyxida has been >>>>>>>>>>>>> working >>>>>>>>>>>>> great on >>>>>>>>>>>>> Planetlab for the past 4 months or so. Getting good data >>>>>>>>>>>>> and >>>>>>>>>>>>> experiments >>>>>>>>>>>>> going well. >>>>>>>>>>>>> >>>>>>>>>>>>> However, I do have one question/comment. >>>>>>>>>>>>> >>>>>>>>>>>>> Every Pyxida node contains a list of Neighbors (in >>>>>>>>>>>>> NCClient) and a >>>>>>>>>>>>> list >>>>>>>>>>>>> of Gossip nodes (they are called neighbors in NCManager >>>>>>>>>>>>> but >>>>>>>>>>>>> I will >>>>>>>>>>>>> call >>>>>>>>>>>>> them gossip for now so as not to confuse with the 'real' >>>>>>>>>>>>> neighbors >>>>>>>>>>>>> in >>>>>>>>>>>>> NCClient). Every 10 seconds, Pyxida will pick a random >>>>>>>>>>>>> node >>>>>>>>>>>>> (call >>>>>>>>>>>>> it A) >>>>>>>>>>>>> from Gossip and send a gossip request (basically asking >>>>>>>>>>>>> for >>>>>>>>>>>>> rtt, >>>>>>>>>>>>> coordinate, error). But Gossip contains all the nodes in >>>>>>>>>>>>> the >>>>>>>>>>>>> network. >>>>>>>>>>>>> The comment in the code said that was to avoid Pyxida >>>>>>>>>>>>> becoming >>>>>>>>>>>>> lonely >>>>>>>>>>>>> and no-one to gossip with, which makes sense. >>>>>>>>>>>>> >>>>>>>>>>>>> Once A responds, Pyxida will add A to its Neighbors (if >>>>>>>>>>>>> Neighbors >>>>>>>>>>>>> does >>>>>>>>>>>>> not contain A) and remove the first element from the >>>>>>>>>>>>> Neighbors list. >>>>>>>>>>>>> This seems a little bit odd. Was there any particular >>>>>>>>>>>>> reasoning >>>>>>>>>>>>> behind >>>>>>>>>>>>> this? Because it would seem like you're getting the >>>>>>>>>>>>> coordinate from >>>>>>>>>>>>> every node in the network eventually. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks a lot, >>>>>>>>>>>>> >>>>>>>>>>>>> -Eric >>>>>>>>>>>>> >>>>>>>>>>>>> Jonathan Ledlie wrote: >>>>>>>>>>>>>> Your inclination is correct. See forwarded message. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> -jonathan >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Apr 9, 2008, at 1:53 PM, ext Eric Chan-Tin wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thank you for the clarification. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I've been reading through your published papers and code >>>>>>>>>>>>>>> and I >>>>>>>>>>>>>>> must >>>>>>>>>>>>>>> say I learned a lot about how Pyxida works :) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I still have a quick and easy (hopefully) question. Each >>>>>>>>>>>>>>> Pyxida >>>>>>>>>>>>>>> node >>>>>>>>>>>>>>> has a small amount of neighbors and learn about other >>>>>>>>>>>>>>> nodes >>>>>>>>>>>>>>> through >>>>>>>>>>>>>>> other nodes. I am running on about 200 machines, and it >>>>>>>>>>>>>>> seems >>>>>>>>>>>>>>> like all >>>>>>>>>>>>>>> the Pyxida nodes are adding each other to their >>>>>>>>>>>>>>> upNeighbors set. >>>>>>>>>>>>>>> NCClient.java has the MAX_NEIGHBORS set to 512. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I wanted to make sure that I am not misreading the >>>>>>>>>>>>>>> code, but >>>>>>>>>>>>>>> shouldn't >>>>>>>>>>>>>>> each Pyxida node keep a small number of neighbors in its >>>>>>>>>>>>>>> upNeighbors >>>>>>>>>>>>>>> set. Else, it will be a lot of updates to send a >>>>>>>>>>>>>>> gossipMessage >>>>>>>>>>>>>>> to all >>>>>>>>>>>>>>> 200 neighbors, and it also means that a Pyxida node has >>>>>>>>>>>>>>> the whole >>>>>>>>>>>>>>> network in its upNeighbors set - that seems wrong. I am >>>>>>>>>>>>>>> inclined >>>>>>>>>>>>>>> to >>>>>>>>>>>>>>> change the value of MAX_NEIGHBORS but wanted to make >>>>>>>>>>>>>>> sure >>>>>>>>>>>>>>> that my >>>>>>>>>>>>>>> understanding is correct first. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thank you, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Eric >>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> ------------------------------------------------------------------------- >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> This SF.Net email is sponsored by the Moblin Your Move >>>>>>>>>>>>> Developer's >>>>>>>>>>>>> challenge Build the coolest Linux based applications with >>>>>>>>>>>>> Moblin >>>>>>>>>>>>> SDK & >>>>>>>>>>>>> win great prizes Grand prize is a trip for two to an Open >>>>>>>>>>>>> Source >>>>>>>>>>>>> event >>>>>>>>>>>>> anywhere in the world >>>>>>>>>>>>> http://moblin-contest.org/redirect.php?banner_id=100&url=/ >>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>> Pyxida-users mailing >>>>>>>>>>>>> list Pyx...@li... >>>>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/pyxida-users >>>>>>>>> ------------------------------------------------------------------------- >>>>>>>>> This SF.Net email is sponsored by the Moblin Your Move >>>>>>>>> Developer's challenge Build the coolest Linux based >>>>>>>>> applications >>>>>>>>> with Moblin SDK & win great prizes Grand prize is a trip for >>>>>>>>> two >>>>>>>>> to an Open Source event anywhere in the world >>>>>>>>> http://moblin-contest.org/redirect.php?banner_id=100&url=/ >>>>>>>>> _______________________________________________ >>>>>>>>> Pyxida-users mailing list Pyx...@li... >>>>>>>>> https://lists.sourceforge.net/lists/listinfo/pyxida-users >>>>>>> ------------------------------------------------------------------------------ >>>>>>> SF.Net email is Sponsored by MIX09, March 18-20, 2009 in Las >>>>>>> Vegas, Nevada. The future of the web can't happen without you. >>>>>>> Join us at MIX09 to help pave the way to the Next Web now. 
Learn >>>>>>> more and register at >>>>>>> http://ad.doubleclick.net/clk;208669438;13503038;i?http://2009.visitmix.com/ >>>>>>> _______________________________________________ >>>>>>> Pyxida-users mailing list Pyx...@li... >>>>>>> https://lists.sourceforge.net/lists/listinfo/pyxida-users >>>>>> >>>>>> >>>> |
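Two pieces of the proposal quoted above can be made concrete: removing stale neighbors with an explicit Iterator (the fix Peter points out for modifying a collection while iterating over it) and picking a ping target from the set of neighbors whose last ping is older than a self-adjusting MIN_UPDATE_TIME_TO_PING. The sketch below is illustrative only; the stale threshold, the fraction of MAX_NEIGHBORS used to resize the window, the direction of the adjustments, and all names are assumptions layered on the thread's wording, not Pyxida's source.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Random;

class PingSchedulerSketch {
    static final int MAX_NEIGHBORS = 32;
    static final long STALE_MS = 10 * 60 * 1000L;        // "like 10 minutes"

    private long minUpdateTimeToPingMs = 60 * 1000L;     // self-adjusting window
    private final Map<String, long[]> neighbors =        // node -> {lastPingTime, lastUpdateTime}
        new HashMap<String, long[]>();
    private final Random random = new Random();

    void addNeighbor(String node, long nowMs) {
        neighbors.put(node, new long[] { 0L, nowMs });    // lastPingTime starts at 0
    }

    void onReply(String node, long nowMs) {
        long[] t = neighbors.get(node);
        if (t != null) t[1] = nowMs;                      // refresh lastUpdateTime
    }

    // Remove neighbors we pinged long ago that never answered since.
    // Iterator.remove() avoids the exception that calling remove(...) on the
    // collection inside a for-each loop would eventually throw.
    void performMaintenance() {
        Iterator<Map.Entry<String, long[]>> it = neighbors.entrySet().iterator();
        while (it.hasNext()) {
            long[] t = it.next().getValue();              // t[0]=lastPing, t[1]=lastUpdate
            if (t[0] - t[1] > STALE_MS) {
                it.remove();
            }
        }
    }

    // Pick a random neighbor whose last ping is older than the window; widen
    // or narrow the window so a modest fraction of neighbors qualifies.
    String getNeighborToPing(long nowMs) {
        List<String> candidates = new ArrayList<String>();
        for (Map.Entry<String, long[]> e : neighbors.entrySet()) {
            if (nowMs - e.getValue()[0] > minUpdateTimeToPingMs) {
                candidates.add(e.getKey());
            }
        }
        if (candidates.isEmpty()) {
            minUpdateTimeToPingMs /= 2;                   // nobody qualifies: relax the window
            return null;                                  // caller falls back to a gossip node
        }
        if (candidates.size() > MAX_NEIGHBORS / 4) {
            minUpdateTimeToPingMs *= 2;                   // too many qualify: tighten it
        }
        String pick = candidates.get(random.nextInt(candidates.size()));
        neighbors.get(pick)[0] = nowMs;                   // record lastPingTime for the pick
        return pick;
    }
}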
From: Eric Chan-T. <dch...@cs...> - 2008-12-11 03:40:32
|
okay. I think I might have misunderstood you. Your idea of lastPingTime looks interesting. My idea is to introduce a probability function. Right now, it just randomly picks one of the neighbors. What I am thinking it might do is that instead of just randomly picking a neighbor, pick a neighbor based on some function. So, the neighbor whose lastUpdateTime is oldest will have a higher chance of being picked than a neighbor who was just updated. Does that make more sense? -Eric Jonathan Ledlie wrote: > > Hmm. The advantage of lastPingTime is that you learn whether or not > you've tried to contact a neighbor and they haven't responded. Then you > can use that knowledge to determine whether or not to toss them. > > Maybe I'm not understanding your suggestion then. > > -jonathan > > On Dec 10, 2008, at 12:51 PM, Eric Chan-Tin wrote: > >> That's a good idea. Here is a little twist on it. >> >> Put a "weight" on each neighbor, based on the lastUpdateTime. So if >> currentTime - neighbor's lastUpdateTime, then the weight is higher. >> Then in getNeighborToPing, neighbor's with higher weight have a higher >> probability of getting picked than neighbor's with lower weight. >> >> The weight could be ( (currentTime - lastUpdateTime) / 15 minutes). >> It's not 10 minutes so as to avoid a probability of 1 if the >> lastUpdateTime was the previous performMaintenance. This should also >> avoid the problem of always pinging the oldest neighbor since it's >> about probability (weight). >> >> It seems a little bit less complicated than having lastPingTime and >> MIN_UPDATE_TIME. >> >> -Eric >> >> On Dec 10 2008, Jonathan Ledlie wrote: >> >>> >>> This is an interesting situation. Let's think about the goal. The >>> goal is to keep pinging the same nodes as long as they are alive, >>> and to replace a node soon after it dies, keeping the total number >>> of up neighbors at approximately MAX_NEIGHBORS. >>> >>> How about the following: >>> - when you pick a neighbor to ping, you set a time on that event (in >>> a new per-neighbor timestamp LastPingTime) >>> - when you first insert a neighbor, set this timestamp to 0. >>> - when you loop in performMaintenance, remove nodes whose >>> LastPingTime - LastUpdateTime > threshold (like 10 minutes) >>> >>> - when picking a node, pick the guy with the oldest LastPingTime >>> (not sure about this one -- I don't like situations where you pick >>> the absolute *oldest* because then we could get in a situation where >>> an old guy sits there, waiting to be kicked out, and we keep picking >>> him to be pinged; I'd prefer to pick randomly, or random older guys). >>> - a simple way around this is to simply have a self-adjusting >>> threshold: so, we walk through the nodes, and everyone whose >>> LastPingTime is older than MIN_UPDATE_TIME_TO_PING ago (a relative >>> time) get added to a set; we pick a guy randomly from the set to >>> ping. If the set is empty, we increase MIN_UPDATE_TIME_TO_PING, and >>> wait for the next time the function is called. If the set is too >>> large (relative to a fraction of MAX_NEIGHBORS), we decrease it. >>> >>> We still might get some churn from lost UDP packets. You might also >>> give nodes a second chance to be pinged by including LastPingTime - >>> LastUpdateTime > 1 minute in the set. Note that increases and >>> decreases in MIN_UPDATE_TIME_TO_PING shouldn't be affected by the >>> number of nodes that fall into this category. 
>>> >>> We still probably need a way for new potential neighbors to be >>> rejected if our current set of neighbors is fine. Because I still >>> want addNeigbor to really add the neighbor, how about putting a >>> check before calling that to see if we need more neighbors. >>> >>> What thinks? >>> >>> -jonathan >>> >>> On Dec 9, 2008, at 11:37 PM, ext Eric Chan-Tin wrote: >>> >>>> Peter, >>>> >>>> Thanks. That fixed it. >>>> >>>> However, here is another problem I found while deploying Pyxida and >>>> running them for a while. >>>> >>>> In performMaintenance(), I loop through every neighbor and if >>>> (neighbor.getLastUpdateTime() < current_time - 10 minutes), then I >>>> remove that neighbor. >>> >>> >>>> >>>> >>>> In NCManager, I call getNeighborToPing, which randomly picks one of >>>> the >>>> neighbors (32 in this case). Pinging is performed every 10 seconds, so >>>> in 10 minutes, you can ping 60 nodes. However, if a neighbor A is not >>>> among those 60 nodes, it will get kicked out in performMaintenance(), >>>> although A is still alive (it just hasn't been contacted yet). >>>> >>>> There are some possible options. >>>> >>>> 1) Increase the timeout period from 10 minutes to ?? minutes. This >>>> doesn't quite work because say in Azureus, the number of neighbors is >>>> 512, so the timeout needs to be increased even more... It could be a >>>> function of MAX_NEIGHBORS... >>>> >>>> 2) When Pyxida replies to a ping request, it will update the >>>> neighbor's >>>> lastUpdateTime. This does not quite work either because it assumes a >>>> symmetric relationship which is most likely not the case. >>>> >>>> 3) Introduce some more logic to Pyxida so it remembers what nodes it >>>> contacted and does not send, say, more than 2 requests in 10 >>>> minutes to >>>> the same node. This solution isn't that great either. >>>> >>>> Let me know what you guys think. I am currently leaning towards option >>>> 1) but make the timeout period a function of MAX_NEIGHBORS, although I >>>> haven't quite found the perfect function yet. >>>> >>>> Here are some preliminary results. I compared the number of neighbors >>>> which are the same at each reported time (10 minutes). Before the fix, >>>> more than 50% of the neighbors changed within 10 minutes. After the >>>> fix, >>>> only about 30% of the neighbors changed within 10 minutes. It is more >>>> stable but I think 30% is still quite a big number. >>>> >>>> Sorry for the long email :) >>>> >>>> -Eric >>>> >>>> Peter Pietzuch wrote: >>>>> Hi, >>>>> >>>>> Eric Chan-Tin wrote: >>>>>> for (RemoteState<T> neighbor : neighbors) { //this is line 1029 >>>>>> if (neighbor.getLastUpdateTime() < (curr_time - >>>>>> MAINTENANCE_PERIOD)) { >>>>>> neighbors.remove(neighbor); >>>>>> } >>>>>> } >>>>> >>>>> IIRC you can't modify a collection while iterating over it using the >>>>> above for loop. Instead you have to use a proper >>>>> Iterator<RemoteState<T>> and use it's remove method. >>>>> >>>>> Cheers, >>>>> Peter >>>>> >>>>>> Jonathan Ledlie wrote: >>>>>>> On Nov 3, 2008, at 3:46 PM, Eric Chan-Tin wrote: >>>>>>> >>>>>>>> Jonathan, >>>>>>>> >>>>>>>> I can insert some code to fix this and it doesn't look too hard >>>>>>>> but I do >>>>>>>> have some concerns. >>>>>>>> >>>>>>>>> I don't remember observing this behavior in our PL >>>>>>>>> deployment, but that could have been because we had the ping >>>>>>>>> time set >>>>>>>>> to 1-2 minutes, IIRC. 
>>>>>>>> The default ping time was set to 10 seconds + some Gaussian >>>>>>>> distribution, >>>>>>>> so that's what I used, but changing the ping time to 1 minute >>>>>>>> or more >>>>>>>> will >>>>>>>> result in the same effect except it will take longer for the >>>>>>>> "cycle" to >>>>>>>> occur. >>>>>>>> >>>>>>>> What I have in mind was the following: >>>>>>>> >>>>>>>> 1) In NCClient.addNeighbor() would be changed to (in pseudocode) >>>>>>>> protected boolean addNeighbor(RemoteState<T> guy) { >>>>>>>> boolean added = false; >>>>>>>> //if neighbors.size() >= MAX_NEIGHBORS >>>>>>>> //Check last update time of each neighbor in >>>>>>>> neighbors >>>>>>>> //neighbor.getLastUpdateTime() >>>>>>>> //if any of them is older than some number >>>>>>>> TIMEOUT, >>>>>>>> remove the oldest one, then add "guy" to neighbors >>>>>>>> //else don't add "guy" to neighbors >>>>>>>> //else add "guy" to neighbors >>>>>>>> } >>>>>>>> >>>>>>> IMO this shouldn't be the behavior of this function because it >>>>>>> is called >>>>>>> "addNeighbor." Instead what could happen is that part of the >>>>>>> performMaintenance() function could remove old neighbors. >>>>>>> >>>>>>>> 2) In NCManager, getGossipNode() would just pick a random node >>>>>>>> from >>>>>>>> localNC.primaryNC.getNeighbors() to send a request to every 10 >>>>>>>> seconds. >>>>>>>> >>>>>>>> 3) Pyxida will still keep track of all GossipNodes in the >>>>>>>> network. It >>>>>>>> could >>>>>>>> also (say every 10 minutes) check its NCClient.neighbors list >>>>>>>> and see if >>>>>>>> any neighbor is older than TIMEOUT, remove it, then pick a random >>>>>>>> GossipNode to send a ping request. >>>>>>>> >>>>>>>> The main problem with this scheme is that one Pyxida node would >>>>>>>> most >>>>>>>> likely >>>>>>>> stay with the same set of neighbors all the time. Its coordinates >>>>>>>> would be >>>>>>>> greatly affected by those 32 nodes (I am assuming MAX_NEIGHBORS >>>>>>>> = 32). >>>>>>>> >>>>>>> Instead of (2), why not have the getNeighborToPing function >>>>>>> determine >>>>>>> the behavior? If it returns a node, then ping it. If it >>>>>>> returns null, >>>>>>> then have the NCManager pick some other node to use? If things are >>>>>>> hunky-dory, getNeighborToPing() would return a node (a current >>>>>>> member of >>>>>>> the neighbor set), if we're short on neighbors, then it would >>>>>>> return >>>>>>> null. The periodic maintenance would have already tossed old >>>>>>> nodes, so >>>>>>> this would be self-sustaining. >>>>>>> >>>>>>> The "problem" you mention is fairly central to network >>>>>>> coordinates in >>>>>>> general. Sure the coordinates will be a little better the more >>>>>>> neighbors you have. But in a very large system talking to >>>>>>> everybody >>>>>>> isn't feasible. Peter and I have a graph in one of the workshop >>>>>>> papers >>>>>>> which shows the effect of increasing the number of neighbors >>>>>>> >>>>>>>> I don't think the network would be partitioned, but it could >>>>>>>> happen. For >>>>>>>> example, the boostrap nodes have each other in their neighbors >>>>>>>> list. >>>>>>>> Every >>>>>>>> new node bootstrapping to the network would contact those >>>>>>>> bootstrapping >>>>>>>> nodes first and add them to their neighbors. So every Pyxida >>>>>>>> node could >>>>>>>> have as its neighbors the "Bootstrap nodes", which would make >>>>>>>> Pyxida >>>>>>>> become >>>>>>>> a centralized system. 
>>>>>>>> >>>>>>> As long as the initial bootstrap list is large enough, I think >>>>>>> you are >>>>>>> right, that partitions won't be a problem. >>>>>>> >>>>>>>> One last thing: When Pyxida receives a ping request from a >>>>>>>> node, it does >>>>>>>> not update its coordinates. Is there any particular reason for >>>>>>>> that? >>>>>>>> Is it >>>>>>>> because Pyxida will not be able to calculate the RTT? >>>>>>>> >>>>>>> Yes. >>>>>>> >>>>>>>> Also, there is no "gossiping" of other nodes' coordinates, >>>>>>>> right? So if A >>>>>>>> has B in its neighbors list, it will from time to time tell >>>>>>>> other nodes >>>>>>>> about B's coordinates. I realize this has a lot of >>>>>>>> complications but the >>>>>>>> original Vivaldi paper was very fuzzy about this point and the >>>>>>>> previous >>>>>>>> point that it seems it is up to the implementors to decide what >>>>>>>> to do. >>>>>>>> >>>>>>> I don't think we are doing this. It could be useful in certain >>>>>>> circumstance (e.g., picking potential new neighbors), but we're >>>>>>> not, IIRC. >>>>>>> >>>>>>>> I will have to think some more about this as there are a lot of >>>>>>>> nasty >>>>>>>> little side effects that could happen, but these are my thoughts. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> -Eric >>>>>>>> >>>>>>> Sounds good. >>>>>>> >>>>>>> -jonathan >>>>>>> >>>>>>>> On Nov 3 2008, Jonathan Ledlie wrote: >>>>>>>> >>>>>>>>> Eric, >>>>>>>>> >>>>>>>>> Based on a look at the code just now, I see where you are >>>>>>>>> coming from: >>>>>>>>> with these params on PL, there isn't anything stopping the >>>>>>>>> neighbor >>>>>>>>> list from circulating pretty quickly through all nodes. In the >>>>>>>>> Azureus implementation, the guys fed to the alg (fed to >>>>>>>>> processSample) >>>>>>>>> weren't changing much because they were coming from the >>>>>>>>> routing finger >>>>>>>>> tables. I don't remember observing this behavior in our PL >>>>>>>>> deployment, but that could have been because we had the ping >>>>>>>>> time set >>>>>>>>> to 1-2 minutes, IIRC. >>>>>>>>> >>>>>>>>> Do you want to insert some code that will fix this? Basically >>>>>>>>> we want >>>>>>>>> to only add a new neighbor if our oldest neighbor should be >>>>>>>>> tossed (or >>>>>>>>> if we haven't reached MAX_NEIGHBORS). It's probably best from a >>>>>>>>> clarity standpoint if addNeighbor does continue to actually >>>>>>>>> add the >>>>>>>>> neighbor in all cases. Probably the best thing to do is to have >>>>>>>>> NCClient drop old neighbors internally and have NCManager ask >>>>>>>>> it which >>>>>>>>> nodes should be pinged via getNeighborToPing. So NCClient would >>>>>>>>> continue to have the same behavior on processSample (it would not >>>>>>>>> reject new information) and NCManager and NCClient would >>>>>>>>> cooperate to >>>>>>>>> measure RTTs to current neighbors. We do want to make sure, >>>>>>>>> however, >>>>>>>>> that the gossip set stays large enough so that partitions >>>>>>>>> don't occur. >>>>>>>>> >>>>>>>>> Do you want to be added to the SourceForge Pyxida developers >>>>>>>>> to make >>>>>>>>> this change? (Please test it out carefully first.) >>>>>>>>> >>>>>>>>> -jonathan >>>>>>>>> >>>>>>>>> On Oct 30, 2008, at 11:32 AM, ext Eric Chan-Tin wrote: >>>>>>>>> >>>>>>>>>> Jonathan, >>>>>>>>>> >>>>>>>>>> From what I saw, eventually, you learn about all the nodes in >>>>>>>>>> the >>>>>>>>>> network. Every 10 seconds, Pyxida picks a random "Gossip" >>>>>>>>>> node from >>>>>>>>>> its >>>>>>>>>> list and sends a request. 
The Gossip node will reply back >>>>>>>>>> with an >>>>>>>>>> "upNeighbor", that is, a "Gossip" node that is "up" or with 10% >>>>>>>>>> probability, a node that is "pending". Since naturally, all >>>>>>>>>> the nodes >>>>>>>>>> bootstrap off the same set of nodes, every node will >>>>>>>>>> eventually know >>>>>>>>>> of >>>>>>>>>> every other node. >>>>>>>>>> >>>>>>>>>> Pyxida might not be rotating through everybody continuously >>>>>>>>>> (although >>>>>>>>>> eventually, since it knows everybody, it will have added >>>>>>>>>> everybody to >>>>>>>>>> its Neighbors list at some point), but could be removing a >>>>>>>>>> node that >>>>>>>>>> is >>>>>>>>>> "alive". So, if MAX_NEIGHBORS = 32, let's call them A_1, A_2, >>>>>>>>>> ..., >>>>>>>>>> A_32. >>>>>>>>>> Pyxida picks a random Gossip node B, sends it a request and B >>>>>>>>>> replies >>>>>>>>>> back. Pyxida will add B to its Neighbors list, see that size of >>>>>>>>>> Neighbors > MAX_NEIGHBORS, kick A_1 out. Then from those 32 >>>>>>>>>> neighbors >>>>>>>>>> A_2, ...., A_32, B, it will calculate the force, and apply it >>>>>>>>>> to the >>>>>>>>>> current system coordinates. Thus, if every 10 seconds, Pyxida >>>>>>>>>> picks a >>>>>>>>>> different set of nodes C, D, E, etc... it would keep kicking >>>>>>>>>> nodes out >>>>>>>>>> of the neighbors list. >>>>>>>>>> >>>>>>>>>> It's the "kicking A_1 out" part that seemed a bit confusing. >>>>>>>>>> Churn >>>>>>>>>> is a >>>>>>>>>> good reason but you don't check that A_1 might still be alive >>>>>>>>>> or maybe >>>>>>>>>> A_1 was just added. I have been printing the neighbors list >>>>>>>>>> every 10 >>>>>>>>>> minutes, and they are completely different between those 10 >>>>>>>>>> minutes. >>>>>>>>>> >>>>>>>>>> One possible problem might be that I am choosing the value of >>>>>>>>>> MAX_NEIGHBORS to be too small. Originally it was set to 512 >>>>>>>>>> (which was >>>>>>>>>> meant for Azureus). Since I am using Planetlab where there >>>>>>>>>> are about >>>>>>>>>> 400-500 "stable" nodes, I changed the value to 32. But I >>>>>>>>>> don't think >>>>>>>>>> that should be a problem since log(n) = log(512) = 9. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> >>>>>>>>>> -Eric >>>>>>>>>> >>>>>>>>>> Jonathan Ledlie wrote: >>>>>>>>>>> Eric, >>>>>>>>>>> >>>>>>>>>>> Glad to hear it. Let Peter and me know if you publish >>>>>>>>>>> anything using >>>>>>>>>>> Pyxida and we can put it on the web page. >>>>>>>>>>> >>>>>>>>>>> "Gossip" nodes are for learning about the network. >>>>>>>>>>> "Neighbors" are >>>>>>>>>>> for >>>>>>>>>>> constructing the coordinate. >>>>>>>>>>> >>>>>>>>>>> So, IIRC, the gossip process learns about nodes that may or >>>>>>>>>>> may not >>>>>>>>>>> be >>>>>>>>>>> applied to the neighbor list. That is, we may eventually learn >>>>>>>>>>> about a >>>>>>>>>>> large portion of the network, via gossiping, but should only >>>>>>>>>>> be using >>>>>>>>>>> MAX_NEIGHBORS at any given time for coordinate computation. >>>>>>>>>>> The >>>>>>>>>>> reason >>>>>>>>>>> we don't want to use a fixed set of neighbors for coordinate >>>>>>>>>>> computation >>>>>>>>>>> is because nodes may come and go from the system. However, >>>>>>>>>>> it should >>>>>>>>>>> not be the case that we just rotate through everybody, because >>>>>>>>>>> then, as >>>>>>>>>>> you note, we'd be computing against everyone's coordinate/ >>>>>>>>>>> latency, >>>>>>>>>>> and >>>>>>>>>>> this is counter to the point of network coordinates. 
With this >>>>>>>>>>> gradual >>>>>>>>>>> transience as a desirable quality in mind, I am pretty sure we >>>>>>>>>>> maintain >>>>>>>>>>> neighbors in a fairly fixed manner. Does this not seem to >>>>>>>>>>> be the >>>>>>>>>>> case >>>>>>>>>>> to you? >>>>>>>>>>> >>>>>>>>>>> -jonathan >>>>>>>>>>> >>>>>>>>>>> On Oct 29, 2008, at 9:20 PM, Eric Chan-Tin wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Jonathan and Peter, >>>>>>>>>>>> >>>>>>>>>>>> First of all, just wanted to say that Pyxida has been working >>>>>>>>>>>> great on >>>>>>>>>>>> Planetlab for the past 4 months or so. Getting good data and >>>>>>>>>>>> experiments >>>>>>>>>>>> going well. >>>>>>>>>>>> >>>>>>>>>>>> However, I do have one question/comment. >>>>>>>>>>>> >>>>>>>>>>>> Every Pyxida node contains a list of Neighbors (in >>>>>>>>>>>> NCClient) and a >>>>>>>>>>>> list >>>>>>>>>>>> of Gossip nodes (they are called neighbors in NCManager but >>>>>>>>>>>> I will >>>>>>>>>>>> call >>>>>>>>>>>> them gossip for now so as not to confuse with the 'real' >>>>>>>>>>>> neighbors >>>>>>>>>>>> in >>>>>>>>>>>> NCClient). Every 10 seconds, Pyxida will pick a random node >>>>>>>>>>>> (call >>>>>>>>>>>> it A) >>>>>>>>>>>> from Gossip and send a gossip request (basically asking for >>>>>>>>>>>> rtt, >>>>>>>>>>>> coordinate, error). But Gossip contains all the nodes in the >>>>>>>>>>>> network. >>>>>>>>>>>> The comment in the code said that was to avoid Pyxida becoming >>>>>>>>>>>> lonely >>>>>>>>>>>> and no-one to gossip with, which makes sense. >>>>>>>>>>>> >>>>>>>>>>>> Once A responds, Pyxida will add A to its Neighbors (if >>>>>>>>>>>> Neighbors >>>>>>>>>>>> does >>>>>>>>>>>> not contain A) and remove the first element from the >>>>>>>>>>>> Neighbors list. >>>>>>>>>>>> This seems a little bit odd. Was there any particular >>>>>>>>>>>> reasoning >>>>>>>>>>>> behind >>>>>>>>>>>> this? Because it would seem like you're getting the >>>>>>>>>>>> coordinate from >>>>>>>>>>>> every node in the network eventually. >>>>>>>>>>>> >>>>>>>>>>>> Thanks a lot, >>>>>>>>>>>> >>>>>>>>>>>> -Eric >>>>>>>>>>>> >>>>>>>>>>>> Jonathan Ledlie wrote: >>>>>>>>>>>>> Your inclination is correct. See forwarded message. >>>>>>>>>>>>> >>>>>>>>>>>>> -jonathan >>>>>>>>>>>>> >>>>>>>>>>>>> On Apr 9, 2008, at 1:53 PM, ext Eric Chan-Tin wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Thank you for the clarification. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I've been reading through your published papers and code >>>>>>>>>>>>>> and I >>>>>>>>>>>>>> must >>>>>>>>>>>>>> say I learned a lot about how Pyxida works :) >>>>>>>>>>>>>> >>>>>>>>>>>>>> I still have a quick and easy (hopefully) question. Each >>>>>>>>>>>>>> Pyxida >>>>>>>>>>>>>> node >>>>>>>>>>>>>> has a small amount of neighbors and learn about other nodes >>>>>>>>>>>>>> through >>>>>>>>>>>>>> other nodes. I am running on about 200 machines, and it >>>>>>>>>>>>>> seems >>>>>>>>>>>>>> like all >>>>>>>>>>>>>> the Pyxida nodes are adding each other to their >>>>>>>>>>>>>> upNeighbors set. >>>>>>>>>>>>>> NCClient.java has the MAX_NEIGHBORS set to 512. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I wanted to make sure that I am not misreading the code, but >>>>>>>>>>>>>> shouldn't >>>>>>>>>>>>>> each Pyxida node keep a small number of neighbors in its >>>>>>>>>>>>>> upNeighbors >>>>>>>>>>>>>> set. Else, it will be a lot of updates to send a >>>>>>>>>>>>>> gossipMessage >>>>>>>>>>>>>> to all >>>>>>>>>>>>>> 200 neighbors, and it also means that a Pyxida node has >>>>>>>>>>>>>> the whole >>>>>>>>>>>>>> network in its upNeighbors set - that seems wrong. 
I am >>>>>>>>>>>>>> inclined >>>>>>>>>>>>>> to >>>>>>>>>>>>>> change the value of MAX_NEIGHBORS but wanted to make sure >>>>>>>>>>>>>> that my >>>>>>>>>>>>>> understanding is correct first. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thank you, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Eric >>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> ------------------------------------------------------------------------- >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> This SF.Net email is sponsored by the Moblin Your Move >>>>>>>>>>>> Developer's >>>>>>>>>>>> challenge Build the coolest Linux based applications with >>>>>>>>>>>> Moblin >>>>>>>>>>>> SDK & >>>>>>>>>>>> win great prizes Grand prize is a trip for two to an Open >>>>>>>>>>>> Source >>>>>>>>>>>> event >>>>>>>>>>>> anywhere in the world >>>>>>>>>>>> http://moblin-contest.org/redirect.php?banner_id=100&url=/ >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> Pyxida-users mailing >>>>>>>>>>>> list Pyx...@li... >>>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/pyxida-users >>>>>>>> ------------------------------------------------------------------------- >>>>>>>> This SF.Net email is sponsored by the Moblin Your Move >>>>>>>> Developer's challenge Build the coolest Linux based applications >>>>>>>> with Moblin SDK & win great prizes Grand prize is a trip for two >>>>>>>> to an Open Source event anywhere in the world >>>>>>>> http://moblin-contest.org/redirect.php?banner_id=100&url=/ _______________________________________________ >>>>>>>> Pyxida-users mailing list Pyx...@li... >>>>>>>> https://lists.sourceforge.net/lists/listinfo/pyxida-users >>>>>> ------------------------------------------------------------------------------ >>>>>> SF.Net email is Sponsored by MIX09, March 18-20, 2009 in Las >>>>>> Vegas, Nevada. The future of the web can't happen without you. >>>>>> Join us at MIX09 to help pave the way to the Next Web now. Learn >>>>>> more and register at >>>>>> http://ad.doubleclick.net/clk;208669438;13503038;i?http://2009.visitmix.com/ _______________________________________________ >>>>>> Pyxida-users mailing list Pyx...@li... >>>>>> https://lists.sourceforge.net/lists/listinfo/pyxida-users >>>>> >>>>> >>> |
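The age-weighted pick Eric describes above amounts to a roulette-wheel draw over the neighbor set. A minimal sketch follows; the Neighbor holder type and the 15-minute normaliser mirror the email's wording, and none of this is presented as the code that was actually committed.

import java.util.List;
import java.util.Random;

class WeightedNeighborPicker {
    static final double WINDOW_MS = 15 * 60 * 1000.0;     // 15-minute normaliser from the email

    static final class Neighbor {
        final String addr;
        long lastUpdateTimeMs;
        Neighbor(String addr, long lastUpdateTimeMs) {
            this.addr = addr;
            this.lastUpdateTimeMs = lastUpdateTimeMs;
        }
    }

    private final Random random = new Random();

    // Weight grows with the time since the neighbor was last updated, so the
    // stalest neighbors are the most likely (but not certain) to be pinged next.
    Neighbor pick(List<Neighbor> neighbors, long nowMs) {
        if (neighbors.isEmpty()) {
            return null;
        }
        double[] weights = new double[neighbors.size()];
        double total = 0.0;
        for (int i = 0; i < neighbors.size(); i++) {
            weights[i] = Math.max(0.0, (nowMs - neighbors.get(i).lastUpdateTimeMs) / WINDOW_MS);
            total += weights[i];
        }
        if (total == 0.0) {                                // everyone updated just now: uniform pick
            return neighbors.get(random.nextInt(neighbors.size()));
        }
        double r = random.nextDouble() * total;            // roulette-wheel selection
        for (int i = 0; i < neighbors.size(); i++) {
            r -= weights[i];
            if (r <= 0.0) {
                return neighbors.get(i);
            }
        }
        return neighbors.get(neighbors.size() - 1);        // guard against rounding
    }
}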
From: Jonathan L. <jon...@ee...> - 2008-12-10 23:14:21
|
Hmm. The advantage of lastPingTime is that you learn whether or not you've tried to contact a neighbor and they haven't responded. Then you can use that knowledge to determine whether or not to toss them. Maybe I'm not understanding your suggestion then. -jonathan On Dec 10, 2008, at 12:51 PM, Eric Chan-Tin wrote: > That's a good idea. Here is a little twist on it. > > Put a "weight" on each neighbor, based on the lastUpdateTime. So if > currentTime - neighbor's lastUpdateTime, then the weight is higher. > Then in getNeighborToPing, neighbor's with higher weight have a > higher probability of getting picked than neighbor's with lower > weight. > > The weight could be ( (currentTime - lastUpdateTime) / 15 minutes). > It's not 10 minutes so as to avoid a probability of 1 if the > lastUpdateTime was the previous performMaintenance. This should also > avoid the problem of always pinging the oldest neighbor since it's > about probability (weight). > > It seems a little bit less complicated than having lastPingTime and > MIN_UPDATE_TIME. > > -Eric > > On Dec 10 2008, Jonathan Ledlie wrote: > >> >> This is an interesting situation. Let's think about the goal. >> The goal is to keep pinging the same nodes as long as they are >> alive, and to replace a node soon after it dies, keeping the total >> number of up neighbors at approximately MAX_NEIGHBORS. >> >> How about the following: >> - when you pick a neighbor to ping, you set a time on that event >> (in a new per-neighbor timestamp LastPingTime) >> - when you first insert a neighbor, set this timestamp to 0. >> - when you loop in performMaintenance, remove nodes whose >> LastPingTime - LastUpdateTime > threshold (like 10 minutes) >> >> - when picking a node, pick the guy with the oldest LastPingTime >> (not sure about this one -- I don't like situations where you pick >> the absolute *oldest* because then we could get in a situation >> where an old guy sits there, waiting to be kicked out, and we keep >> picking him to be pinged; I'd prefer to pick randomly, or random >> older guys). >> - a simple way around this is to simply have a self-adjusting >> threshold: so, we walk through the nodes, and everyone whose >> LastPingTime is older than MIN_UPDATE_TIME_TO_PING ago (a relative >> time) get added to a set; we pick a guy randomly from the set to >> ping. If the set is empty, we increase MIN_UPDATE_TIME_TO_PING, >> and wait for the next time the function is called. If the set is >> too large (relative to a fraction of MAX_NEIGHBORS), we decrease it. >> >> We still might get some churn from lost UDP packets. You might >> also give nodes a second chance to be pinged by including >> LastPingTime - LastUpdateTime > 1 minute in the set. Note that >> increases and decreases in MIN_UPDATE_TIME_TO_PING shouldn't be >> affected by the number of nodes that fall into this category. >> >> We still probably need a way for new potential neighbors to be >> rejected if our current set of neighbors is fine. Because I still >> want addNeigbor to really add the neighbor, how about putting a >> check before calling that to see if we need more neighbors. >> >> What thinks? >> >> -jonathan >> >> On Dec 9, 2008, at 11:37 PM, ext Eric Chan-Tin wrote: >> >>> Peter, >>> >>> Thanks. That fixed it. >>> >>> However, here is another problem I found while deploying Pyxida and >>> running them for a while. >>> >>> In performMaintenance(), I loop through every neighbor and if >>> (neighbor.getLastUpdateTime() < current_time - 10 minutes), then I >>> remove that neighbor. 
>> >> >>> >>> >>> In NCManager, I call getNeighborToPing, which randomly picks one >>> of the >>> neighbors (32 in this case). Pinging is performed every 10 >>> seconds, so >>> in 10 minutes, you can ping 60 nodes. However, if a neighbor A is >>> not >>> among those 60 nodes, it will get kicked out in >>> performMaintenance(), >>> although A is still alive (it just hasn't been contacted yet). >>> >>> There are some possible options. >>> >>> 1) Increase the timeout period from 10 minutes to ?? minutes. This >>> doesn't quite work because say in Azureus, the number of neighbors >>> is >>> 512, so the timeout needs to be increased even more... It could be a >>> function of MAX_NEIGHBORS... >>> >>> 2) When Pyxida replies to a ping request, it will update the >>> neighbor's >>> lastUpdateTime. This does not quite work either because it assumes a >>> symmetric relationship which is most likely not the case. >>> >>> 3) Introduce some more logic to Pyxida so it remembers what nodes it >>> contacted and does not send, say, more than 2 requests in 10 >>> minutes to >>> the same node. This solution isn't that great either. >>> >>> Let me know what you guys think. I am currently leaning towards >>> option >>> 1) but make the timeout period a function of MAX_NEIGHBORS, >>> although I >>> haven't quite found the perfect function yet. >>> >>> Here are some preliminary results. I compared the number of >>> neighbors >>> which are the same at each reported time (10 minutes). Before the >>> fix, >>> more than 50% of the neighbors changed within 10 minutes. After >>> the fix, >>> only about 30% of the neighbors changed within 10 minutes. It is >>> more >>> stable but I think 30% is still quite a big number. >>> >>> Sorry for the long email :) >>> >>> -Eric >>> >>> Peter Pietzuch wrote: >>>> Hi, >>>> >>>> Eric Chan-Tin wrote: >>>>> for (RemoteState<T> neighbor : neighbors) { //this is line 1029 >>>>> if (neighbor.getLastUpdateTime() < (curr_time - >>>>> MAINTENANCE_PERIOD)) { >>>>> neighbors.remove(neighbor); >>>>> } >>>>> } >>>> >>>> IIRC you can't modify a collection while iterating over it using >>>> the >>>> above for loop. Instead you have to use a proper >>>> Iterator<RemoteState<T>> and use it's remove method. >>>> >>>> Cheers, >>>> Peter >>>> >>>>> Jonathan Ledlie wrote: >>>>>> On Nov 3, 2008, at 3:46 PM, Eric Chan-Tin wrote: >>>>>> >>>>>>> Jonathan, >>>>>>> >>>>>>> I can insert some code to fix this and it doesn't look too >>>>>>> hard but I do >>>>>>> have some concerns. >>>>>>> >>>>>>>> I don't remember observing this behavior in our PL >>>>>>>> deployment, but that could have been because we had the ping >>>>>>>> time set >>>>>>>> to 1-2 minutes, IIRC. >>>>>>> The default ping time was set to 10 seconds + some Gaussian >>>>>>> distribution, >>>>>>> so that's what I used, but changing the ping time to 1 minute >>>>>>> or more >>>>>>> will >>>>>>> result in the same effect except it will take longer for the >>>>>>> "cycle" to >>>>>>> occur. 
>>>>>>> >>>>>>> What I have in mind was the following: >>>>>>> >>>>>>> 1) In NCClient.addNeighbor() would be changed to (in pseudocode) >>>>>>> protected boolean addNeighbor(RemoteState<T> guy) { >>>>>>> boolean added = false; >>>>>>> //if neighbors.size() >= MAX_NEIGHBORS >>>>>>> //Check last update time of each neighbor in >>>>>>> neighbors >>>>>>> //neighbor.getLastUpdateTime() >>>>>>> //if any of them is older than some number >>>>>>> TIMEOUT, >>>>>>> remove the oldest one, then add "guy" to neighbors >>>>>>> //else don't add "guy" to neighbors >>>>>>> //else add "guy" to neighbors >>>>>>> } >>>>>>> >>>>>> IMO this shouldn't be the behavior of this function because it >>>>>> is called >>>>>> "addNeighbor." Instead what could happen is that part of the >>>>>> performMaintenance() function could remove old neighbors. >>>>>> >>>>>>> 2) In NCManager, getGossipNode() would just pick a random >>>>>>> node from >>>>>>> localNC.primaryNC.getNeighbors() to send a request to every >>>>>>> 10 seconds. >>>>>>> >>>>>>> 3) Pyxida will still keep track of all GossipNodes in the >>>>>>> network. It >>>>>>> could >>>>>>> also (say every 10 minutes) check its NCClient.neighbors list >>>>>>> and see if >>>>>>> any neighbor is older than TIMEOUT, remove it, then pick a >>>>>>> random >>>>>>> GossipNode to send a ping request. >>>>>>> >>>>>>> The main problem with this scheme is that one Pyxida node >>>>>>> would most >>>>>>> likely >>>>>>> stay with the same set of neighbors all the time. Its >>>>>>> coordinates >>>>>>> would be >>>>>>> greatly affected by those 32 nodes (I am assuming >>>>>>> MAX_NEIGHBORS = 32). >>>>>>> >>>>>> Instead of (2), why not have the getNeighborToPing function >>>>>> determine >>>>>> the behavior? If it returns a node, then ping it. If it >>>>>> returns null, >>>>>> then have the NCManager pick some other node to use? If things >>>>>> are >>>>>> hunky-dory, getNeighborToPing() would return a node (a current >>>>>> member of >>>>>> the neighbor set), if we're short on neighbors, then it would >>>>>> return >>>>>> null. The periodic maintenance would have already tossed old >>>>>> nodes, so >>>>>> this would be self-sustaining. >>>>>> >>>>>> The "problem" you mention is fairly central to network >>>>>> coordinates in >>>>>> general. Sure the coordinates will be a little better the more >>>>>> neighbors you have. But in a very large system talking to >>>>>> everybody >>>>>> isn't feasible. Peter and I have a graph in one of the >>>>>> workshop papers >>>>>> which shows the effect of increasing the number of neighbors >>>>>> >>>>>>> I don't think the network would be partitioned, but it could >>>>>>> happen. For >>>>>>> example, the boostrap nodes have each other in their >>>>>>> neighbors list. >>>>>>> Every >>>>>>> new node bootstrapping to the network would contact those >>>>>>> bootstrapping >>>>>>> nodes first and add them to their neighbors. So every Pyxida >>>>>>> node could >>>>>>> have as its neighbors the "Bootstrap nodes", which would make >>>>>>> Pyxida >>>>>>> become >>>>>>> a centralized system. >>>>>>> >>>>>> As long as the initial bootstrap list is large enough, I think >>>>>> you are >>>>>> right, that partitions won't be a problem. >>>>>> >>>>>>> One last thing: When Pyxida receives a ping request from a >>>>>>> node, it does >>>>>>> not update its coordinates. Is there any particular reason >>>>>>> for that? >>>>>>> Is it >>>>>>> because Pyxida will not be able to calculate the RTT? >>>>>>> >>>>>> Yes. 
>>>>>> >>>>>>> Also, there is no "gossiping" of other nodes' coordinates, >>>>>>> right? So if A >>>>>>> has B in its neighbors list, it will from time to time tell >>>>>>> other nodes >>>>>>> about B's coordinates. I realize this has a lot of >>>>>>> complications but the >>>>>>> original Vivaldi paper was very fuzzy about this point and >>>>>>> the previous >>>>>>> point that it seems it is up to the implementors to decide >>>>>>> what to do. >>>>>>> >>>>>> I don't think we are doing this. It could be useful in certain >>>>>> circumstance (e.g., picking potential new neighbors), but >>>>>> we're not, IIRC. >>>>>> >>>>>>> I will have to think some more about this as there are a lot >>>>>>> of nasty >>>>>>> little side effects that could happen, but these are my >>>>>>> thoughts. >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> -Eric >>>>>>> >>>>>> Sounds good. >>>>>> >>>>>> -jonathan >>>>>> >>>>>>> On Nov 3 2008, Jonathan Ledlie wrote: >>>>>>> >>>>>>>> Eric, >>>>>>>> >>>>>>>> Based on a look at the code just now, I see where you are >>>>>>>> coming from: >>>>>>>> with these params on PL, there isn't anything stopping the >>>>>>>> neighbor >>>>>>>> list from circulating pretty quickly through all nodes. In the >>>>>>>> Azureus implementation, the guys fed to the alg (fed to >>>>>>>> processSample) >>>>>>>> weren't changing much because they were coming from the >>>>>>>> routing finger >>>>>>>> tables. I don't remember observing this behavior in our PL >>>>>>>> deployment, but that could have been because we had the ping >>>>>>>> time set >>>>>>>> to 1-2 minutes, IIRC. >>>>>>>> >>>>>>>> Do you want to insert some code that will fix this? >>>>>>>> Basically we want >>>>>>>> to only add a new neighbor if our oldest neighbor should be >>>>>>>> tossed (or >>>>>>>> if we haven't reached MAX_NEIGHBORS). It's probably best >>>>>>>> from a >>>>>>>> clarity standpoint if addNeighbor does continue to actually >>>>>>>> add the >>>>>>>> neighbor in all cases. Probably the best thing to do is to >>>>>>>> have >>>>>>>> NCClient drop old neighbors internally and have NCManager >>>>>>>> ask it which >>>>>>>> nodes should be pinged via getNeighborToPing. So NCClient >>>>>>>> would >>>>>>>> continue to have the same behavior on processSample (it would >>>>>>>> not >>>>>>>> reject new information) and NCManager and NCClient would >>>>>>>> cooperate to >>>>>>>> measure RTTs to current neighbors. We do want to make sure, >>>>>>>> however, >>>>>>>> that the gossip set stays large enough so that partitions >>>>>>>> don't occur. >>>>>>>> >>>>>>>> Do you want to be added to the SourceForge Pyxida developers >>>>>>>> to make >>>>>>>> this change? (Please test it out carefully first.) >>>>>>>> >>>>>>>> -jonathan >>>>>>>> >>>>>>>> On Oct 30, 2008, at 11:32 AM, ext Eric Chan-Tin wrote: >>>>>>>> >>>>>>>>> Jonathan, >>>>>>>>> >>>>>>>>> From what I saw, eventually, you learn about all the nodes >>>>>>>>> in the >>>>>>>>> network. Every 10 seconds, Pyxida picks a random "Gossip" >>>>>>>>> node from >>>>>>>>> its >>>>>>>>> list and sends a request. The Gossip node will reply back >>>>>>>>> with an >>>>>>>>> "upNeighbor", that is, a "Gossip" node that is "up" or with >>>>>>>>> 10% >>>>>>>>> probability, a node that is "pending". Since naturally, all >>>>>>>>> the nodes >>>>>>>>> bootstrap off the same set of nodes, every node will >>>>>>>>> eventually know >>>>>>>>> of >>>>>>>>> every other node. 
>>>>>>>>> >>>>>>>>> Pyxida might not be rotating through everybody continuously >>>>>>>>> (although >>>>>>>>> eventually, since it knows everybody, it will have added >>>>>>>>> everybody to >>>>>>>>> its Neighbors list at some point), but could be removing a >>>>>>>>> node that >>>>>>>>> is >>>>>>>>> "alive". So, if MAX_NEIGHBORS = 32, let's call them A_1, >>>>>>>>> A_2, ..., >>>>>>>>> A_32. >>>>>>>>> Pyxida picks a random Gossip node B, sends it a request and >>>>>>>>> B replies >>>>>>>>> back. Pyxida will add B to its Neighbors list, see that size >>>>>>>>> of >>>>>>>>> Neighbors > MAX_NEIGHBORS, kick A_1 out. Then from those 32 >>>>>>>>> neighbors >>>>>>>>> A_2, ...., A_32, B, it will calculate the force, and apply >>>>>>>>> it to the >>>>>>>>> current system coordinates. Thus, if every 10 seconds, >>>>>>>>> Pyxida picks a >>>>>>>>> different set of nodes C, D, E, etc... it would keep >>>>>>>>> kicking nodes out >>>>>>>>> of the neighbors list. >>>>>>>>> >>>>>>>>> It's the "kicking A_1 out" part that seemed a bit >>>>>>>>> confusing. Churn >>>>>>>>> is a >>>>>>>>> good reason but you don't check that A_1 might still be >>>>>>>>> alive or maybe >>>>>>>>> A_1 was just added. I have been printing the neighbors list >>>>>>>>> every 10 >>>>>>>>> minutes, and they are completely different between those 10 >>>>>>>>> minutes. >>>>>>>>> >>>>>>>>> One possible problem might be that I am choosing the value of >>>>>>>>> MAX_NEIGHBORS to be too small. Originally it was set to 512 >>>>>>>>> (which was >>>>>>>>> meant for Azureus). Since I am using Planetlab where there >>>>>>>>> are about >>>>>>>>> 400-500 "stable" nodes, I changed the value to 32. But I >>>>>>>>> don't think >>>>>>>>> that should be a problem since log(n) = log(512) = 9. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> -Eric >>>>>>>>> >>>>>>>>> Jonathan Ledlie wrote: >>>>>>>>>> Eric, >>>>>>>>>> >>>>>>>>>> Glad to hear it. Let Peter and me know if you publish >>>>>>>>>> anything using >>>>>>>>>> Pyxida and we can put it on the web page. >>>>>>>>>> >>>>>>>>>> "Gossip" nodes are for learning about the network. >>>>>>>>>> "Neighbors" are >>>>>>>>>> for >>>>>>>>>> constructing the coordinate. >>>>>>>>>> >>>>>>>>>> So, IIRC, the gossip process learns about nodes that may >>>>>>>>>> or may not >>>>>>>>>> be >>>>>>>>>> applied to the neighbor list. That is, we may eventually >>>>>>>>>> learn >>>>>>>>>> about a >>>>>>>>>> large portion of the network, via gossiping, but should >>>>>>>>>> only be using >>>>>>>>>> MAX_NEIGHBORS at any given time for coordinate >>>>>>>>>> computation. The >>>>>>>>>> reason >>>>>>>>>> we don't want to use a fixed set of neighbors for coordinate >>>>>>>>>> computation >>>>>>>>>> is because nodes may come and go from the system. >>>>>>>>>> However, it should >>>>>>>>>> not be the case that we just rotate through everybody, >>>>>>>>>> because >>>>>>>>>> then, as >>>>>>>>>> you note, we'd be computing against everyone's coordinate/ >>>>>>>>>> latency, >>>>>>>>>> and >>>>>>>>>> this is counter to the point of network coordinates. With >>>>>>>>>> this >>>>>>>>>> gradual >>>>>>>>>> transience as a desirable quality in mind, I am pretty sure >>>>>>>>>> we >>>>>>>>>> maintain >>>>>>>>>> neighbors in a fairly fixed manner. Does this not seem to >>>>>>>>>> be the >>>>>>>>>> case >>>>>>>>>> to you? 
>>>>>>>>>> >>>>>>>>>> -jonathan >>>>>>>>>> >>>>>>>>>> On Oct 29, 2008, at 9:20 PM, Eric Chan-Tin wrote: >>>>>>>>>> >>>>>>>>>>> Hi Jonathan and Peter, >>>>>>>>>>> >>>>>>>>>>> First of all, just wanted to say that Pyxida has been >>>>>>>>>>> working >>>>>>>>>>> great on >>>>>>>>>>> Planetlab for the past 4 months or so. Getting good data and >>>>>>>>>>> experiments >>>>>>>>>>> going well. >>>>>>>>>>> >>>>>>>>>>> However, I do have one question/comment. >>>>>>>>>>> >>>>>>>>>>> Every Pyxida node contains a list of Neighbors (in >>>>>>>>>>> NCClient) and a >>>>>>>>>>> list >>>>>>>>>>> of Gossip nodes (they are called neighbors in NCManager >>>>>>>>>>> but I will >>>>>>>>>>> call >>>>>>>>>>> them gossip for now so as not to confuse with the 'real' >>>>>>>>>>> neighbors >>>>>>>>>>> in >>>>>>>>>>> NCClient). Every 10 seconds, Pyxida will pick a random >>>>>>>>>>> node (call >>>>>>>>>>> it A) >>>>>>>>>>> from Gossip and send a gossip request (basically asking >>>>>>>>>>> for rtt, >>>>>>>>>>> coordinate, error). But Gossip contains all the nodes in the >>>>>>>>>>> network. >>>>>>>>>>> The comment in the code said that was to avoid Pyxida >>>>>>>>>>> becoming >>>>>>>>>>> lonely >>>>>>>>>>> and no-one to gossip with, which makes sense. >>>>>>>>>>> >>>>>>>>>>> Once A responds, Pyxida will add A to its Neighbors (if >>>>>>>>>>> Neighbors >>>>>>>>>>> does >>>>>>>>>>> not contain A) and remove the first element from the >>>>>>>>>>> Neighbors list. >>>>>>>>>>> This seems a little bit odd. Was there any particular >>>>>>>>>>> reasoning >>>>>>>>>>> behind >>>>>>>>>>> this? Because it would seem like you're getting the >>>>>>>>>>> coordinate from >>>>>>>>>>> every node in the network eventually. >>>>>>>>>>> >>>>>>>>>>> Thanks a lot, >>>>>>>>>>> >>>>>>>>>>> -Eric >>>>>>>>>>> >>>>>>>>>>> Jonathan Ledlie wrote: >>>>>>>>>>>> Your inclination is correct. See forwarded message. >>>>>>>>>>>> >>>>>>>>>>>> -jonathan >>>>>>>>>>>> >>>>>>>>>>>> On Apr 9, 2008, at 1:53 PM, ext Eric Chan-Tin wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Thank you for the clarification. >>>>>>>>>>>>> >>>>>>>>>>>>> I've been reading through your published papers and >>>>>>>>>>>>> code and I >>>>>>>>>>>>> must >>>>>>>>>>>>> say I learned a lot about how Pyxida works :) >>>>>>>>>>>>> >>>>>>>>>>>>> I still have a quick and easy (hopefully) question. >>>>>>>>>>>>> Each Pyxida >>>>>>>>>>>>> node >>>>>>>>>>>>> has a small amount of neighbors and learn about other >>>>>>>>>>>>> nodes >>>>>>>>>>>>> through >>>>>>>>>>>>> other nodes. I am running on about 200 machines, and it >>>>>>>>>>>>> seems >>>>>>>>>>>>> like all >>>>>>>>>>>>> the Pyxida nodes are adding each other to their >>>>>>>>>>>>> upNeighbors set. >>>>>>>>>>>>> NCClient.java has the MAX_NEIGHBORS set to 512. >>>>>>>>>>>>> >>>>>>>>>>>>> I wanted to make sure that I am not misreading the code, >>>>>>>>>>>>> but >>>>>>>>>>>>> shouldn't >>>>>>>>>>>>> each Pyxida node keep a small number of neighbors in its >>>>>>>>>>>>> upNeighbors >>>>>>>>>>>>> set. Else, it will be a lot of updates to send a >>>>>>>>>>>>> gossipMessage >>>>>>>>>>>>> to all >>>>>>>>>>>>> 200 neighbors, and it also means that a Pyxida node has >>>>>>>>>>>>> the whole >>>>>>>>>>>>> network in its upNeighbors set - that seems wrong. I am >>>>>>>>>>>>> inclined >>>>>>>>>>>>> to >>>>>>>>>>>>> change the value of MAX_NEIGHBORS but wanted to make >>>>>>>>>>>>> sure that my >>>>>>>>>>>>> understanding is correct first. 
>>>>>>>>>>>>> >>>>>>>>>>>>> Thank you, >>>>>>>>>>>>> >>>>>>>>>>>>> Eric >>>>>>>>>>>>> |
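A minimal Java sketch of the lastPingTime-based removal discussed above, under the assumption that RemoteState gains a getLastPingTime() accessor (that field is only proposed in this thread and is not part of the released Pyxida code). The removal uses an explicit Iterator, per Peter's point elsewhere in the thread about not removing from a collection inside a for-each loop. Class name and the 10-minute constant are illustrative only.

import java.util.Iterator;
import java.util.List;

// Sketch only: RemoteState is Pyxida's existing per-neighbor state class;
// getLastPingTime() is the proposed new timestamp and does not exist yet.
public class MaintenanceSketch<T> {
    private static final long TIMEOUT_MS = 10 * 60 * 1000; // 10 minutes, as in the thread

    void performMaintenance(List<RemoteState<T>> neighbors) {
        for (Iterator<RemoteState<T>> it = neighbors.iterator(); it.hasNext();) {
            RemoteState<T> n = it.next();
            // Drop a neighbor only if we pinged it and it stayed silent for TIMEOUT_MS.
            // A freshly added neighbor has lastPingTime == 0, so it is never removed here.
            if (n.getLastPingTime() - n.getLastUpdateTime() > TIMEOUT_MS) {
                it.remove();
            }
        }
    }
}

The difference from a pure lastUpdateTime timeout is that a neighbor is dropped only after it has actually been contacted and failed to respond, which is the advantage referred to at the top of this message.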
From: Eric Chan-T. <dch...@cs...> - 2008-12-10 18:09:12
|
That's a good idea. Here is a little twist on it. Put a "weight" on each neighbor, based on the lastUpdateTime. So if currentTime - neighbor's lastUpdateTime, then the weight is higher. Then in getNeighborToPing, neighbor's with higher weight have a higher probability of getting picked than neighbor's with lower weight. The weight could be ( (currentTime - lastUpdateTime) / 15 minutes). It's not 10 minutes so as to avoid a probability of 1 if the lastUpdateTime was the previous performMaintenance. This should also avoid the problem of always pinging the oldest neighbor since it's about probability (weight). It seems a little bit less complicated than having lastPingTime and MIN_UPDATE_TIME. -Eric On Dec 10 2008, Jonathan Ledlie wrote: > >This is an interesting situation. Let's think about the goal. The >goal is to keep pinging the same nodes as long as they are alive, and >to replace a node soon after it dies, keeping the total number of up >neighbors at approximately MAX_NEIGHBORS. > >How about the following: >- when you pick a neighbor to ping, you set a time on that event (in a >new per-neighbor timestamp LastPingTime) >- when you first insert a neighbor, set this timestamp to 0. >- when you loop in performMaintenance, remove nodes whose LastPingTime >- LastUpdateTime > threshold (like 10 minutes) > >- when picking a node, pick the guy with the oldest LastPingTime (not >sure about this one -- I don't like situations where you pick the >absolute *oldest* because then we could get in a situation where an >old guy sits there, waiting to be kicked out, and we keep picking him >to be pinged; I'd prefer to pick randomly, or random older guys). > - a simple way around this is to simply have a self-adjusting >threshold: so, we walk through the nodes, and everyone whose >LastPingTime is older than MIN_UPDATE_TIME_TO_PING ago (a relative >time) get added to a set; we pick a guy randomly from the set to >ping. If the set is empty, we increase MIN_UPDATE_TIME_TO_PING, and >wait for the next time the function is called. If the set is too >large (relative to a fraction of MAX_NEIGHBORS), we decrease it. > >We still might get some churn from lost UDP packets. You might also >give nodes a second chance to be pinged by including LastPingTime - >LastUpdateTime > 1 minute in the set. Note that increases and >decreases in MIN_UPDATE_TIME_TO_PING shouldn't be affected by the >number of nodes that fall into this category. > >We still probably need a way for new potential neighbors to be >rejected if our current set of neighbors is fine. Because I still >want addNeigbor to really add the neighbor, how about putting a check >before calling that to see if we need more neighbors. > >What thinks? > >-jonathan > >On Dec 9, 2008, at 11:37 PM, ext Eric Chan-Tin wrote: > >> Peter, >> >> Thanks. That fixed it. >> >> However, here is another problem I found while deploying Pyxida and >> running them for a while. >> >> In performMaintenance(), I loop through every neighbor and if >> (neighbor.getLastUpdateTime() < current_time - 10 minutes), then I >> remove that neighbor. > > >> >> >> In NCManager, I call getNeighborToPing, which randomly picks one of >> the >> neighbors (32 in this case). Pinging is performed every 10 seconds, so >> in 10 minutes, you can ping 60 nodes. However, if a neighbor A is not >> among those 60 nodes, it will get kicked out in performMaintenance(), >> although A is still alive (it just hasn't been contacted yet). >> >> There are some possible options. 
>> >> 1) Increase the timeout period from 10 minutes to ?? minutes. This >> doesn't quite work because say in Azureus, the number of neighbors is >> 512, so the timeout needs to be increased even more... It could be a >> function of MAX_NEIGHBORS... >> >> 2) When Pyxida replies to a ping request, it will update the >> neighbor's >> lastUpdateTime. This does not quite work either because it assumes a >> symmetric relationship which is most likely not the case. >> >> 3) Introduce some more logic to Pyxida so it remembers what nodes it >> contacted and does not send, say, more than 2 requests in 10 minutes >> to >> the same node. This solution isn't that great either. >> >> Let me know what you guys think. I am currently leaning towards option >> 1) but make the timeout period a function of MAX_NEIGHBORS, although I >> haven't quite found the perfect function yet. >> >> Here are some preliminary results. I compared the number of neighbors >> which are the same at each reported time (10 minutes). Before the fix, >> more than 50% of the neighbors changed within 10 minutes. After the >> fix, >> only about 30% of the neighbors changed within 10 minutes. It is more >> stable but I think 30% is still quite a big number. >> >> Sorry for the long email :) >> >> -Eric >> >> Peter Pietzuch wrote: >>> Hi, >>> >>> Eric Chan-Tin wrote: >>>> for (RemoteState<T> neighbor : neighbors) { //this is line 1029 >>>> if (neighbor.getLastUpdateTime() < (curr_time - >>>> MAINTENANCE_PERIOD)) { >>>> neighbors.remove(neighbor); >>>> } >>>> } >>> >>> IIRC you can't modify a collection while iterating over it using the >>> above for loop. Instead you have to use a proper >>> Iterator<RemoteState<T>> and use it's remove method. >>> >>> Cheers, >>> Peter >>> >>>> Jonathan Ledlie wrote: >>>>> On Nov 3, 2008, at 3:46 PM, Eric Chan-Tin wrote: >>>>> >>>>>> Jonathan, >>>>>> >>>>>> I can insert some code to fix this and it doesn't look too hard >>>>>> but I do >>>>>> have some concerns. >>>>>> >>>>>>> I don't remember observing this behavior in our PL >>>>>>> deployment, but that could have been because we had the ping >>>>>>> time set >>>>>>> to 1-2 minutes, IIRC. >>>>>> The default ping time was set to 10 seconds + some Gaussian >>>>>> distribution, >>>>>> so that's what I used, but changing the ping time to 1 minute or >>>>>> more >>>>>> will >>>>>> result in the same effect except it will take longer for the >>>>>> "cycle" to >>>>>> occur. >>>>>> >>>>>> What I have in mind was the following: >>>>>> >>>>>> 1) In NCClient.addNeighbor() would be changed to (in pseudocode) >>>>>> protected boolean addNeighbor(RemoteState<T> guy) { >>>>>> boolean added = false; >>>>>> //if neighbors.size() >= MAX_NEIGHBORS >>>>>> //Check last update time of each neighbor in >>>>>> neighbors >>>>>> //neighbor.getLastUpdateTime() >>>>>> //if any of them is older than some number >>>>>> TIMEOUT, >>>>>> remove the oldest one, then add "guy" to neighbors >>>>>> //else don't add "guy" to neighbors >>>>>> //else add "guy" to neighbors >>>>>> } >>>>>> >>>>> IMO this shouldn't be the behavior of this function because it is >>>>> called >>>>> "addNeighbor." Instead what could happen is that part of the >>>>> performMaintenance() function could remove old neighbors. >>>>> >>>>>> 2) In NCManager, getGossipNode() would just pick a random node >>>>>> from >>>>>> localNC.primaryNC.getNeighbors() to send a request to every 10 >>>>>> seconds. >>>>>> >>>>>> 3) Pyxida will still keep track of all GossipNodes in the >>>>>> network. 
It >>>>>> could >>>>>> also (say every 10 minutes) check its NCClient.neighbors list >>>>>> and see if >>>>>> any neighbor is older than TIMEOUT, remove it, then pick a random >>>>>> GossipNode to send a ping request. >>>>>> >>>>>> The main problem with this scheme is that one Pyxida node would >>>>>> most >>>>>> likely >>>>>> stay with the same set of neighbors all the time. Its coordinates >>>>>> would be >>>>>> greatly affected by those 32 nodes (I am assuming MAX_NEIGHBORS >>>>>> = 32). >>>>>> >>>>> Instead of (2), why not have the getNeighborToPing function >>>>> determine >>>>> the behavior? If it returns a node, then ping it. If it returns >>>>> null, >>>>> then have the NCManager pick some other node to use? If things are >>>>> hunky-dory, getNeighborToPing() would return a node (a current >>>>> member of >>>>> the neighbor set), if we're short on neighbors, then it would >>>>> return >>>>> null. The periodic maintenance would have already tossed old >>>>> nodes, so >>>>> this would be self-sustaining. >>>>> >>>>> The "problem" you mention is fairly central to network >>>>> coordinates in >>>>> general. Sure the coordinates will be a little better the more >>>>> neighbors you have. But in a very large system talking to >>>>> everybody >>>>> isn't feasible. Peter and I have a graph in one of the workshop >>>>> papers >>>>> which shows the effect of increasing the number of neighbors >>>>> >>>>>> I don't think the network would be partitioned, but it could >>>>>> happen. For >>>>>> example, the boostrap nodes have each other in their neighbors >>>>>> list. >>>>>> Every >>>>>> new node bootstrapping to the network would contact those >>>>>> bootstrapping >>>>>> nodes first and add them to their neighbors. So every Pyxida >>>>>> node could >>>>>> have as its neighbors the "Bootstrap nodes", which would make >>>>>> Pyxida >>>>>> become >>>>>> a centralized system. >>>>>> >>>>> As long as the initial bootstrap list is large enough, I think >>>>> you are >>>>> right, that partitions won't be a problem. >>>>> >>>>>> One last thing: When Pyxida receives a ping request from a node, >>>>>> it does >>>>>> not update its coordinates. Is there any particular reason for >>>>>> that? >>>>>> Is it >>>>>> because Pyxida will not be able to calculate the RTT? >>>>>> >>>>> Yes. >>>>> >>>>>> Also, there is no "gossiping" of other nodes' coordinates, >>>>>> right? So if A >>>>>> has B in its neighbors list, it will from time to time tell >>>>>> other nodes >>>>>> about B's coordinates. I realize this has a lot of complications >>>>>> but the >>>>>> original Vivaldi paper was very fuzzy about this point and the >>>>>> previous >>>>>> point that it seems it is up to the implementors to decide what >>>>>> to do. >>>>>> >>>>> I don't think we are doing this. It could be useful in certain >>>>> circumstance (e.g., picking potential new neighbors), but we're >>>>> not, IIRC. >>>>> >>>>>> I will have to think some more about this as there are a lot of >>>>>> nasty >>>>>> little side effects that could happen, but these are my thoughts. >>>>>> >>>>>> Thanks, >>>>>> >>>>>> -Eric >>>>>> >>>>> Sounds good. >>>>> >>>>> -jonathan >>>>> >>>>>> On Nov 3 2008, Jonathan Ledlie wrote: >>>>>> >>>>>>> Eric, >>>>>>> >>>>>>> Based on a look at the code just now, I see where you are >>>>>>> coming from: >>>>>>> with these params on PL, there isn't anything stopping the >>>>>>> neighbor >>>>>>> list from circulating pretty quickly through all nodes. 
In the >>>>>>> Azureus implementation, the guys fed to the alg (fed to >>>>>>> processSample) >>>>>>> weren't changing much because they were coming from the routing >>>>>>> finger >>>>>>> tables. I don't remember observing this behavior in our PL >>>>>>> deployment, but that could have been because we had the ping >>>>>>> time set >>>>>>> to 1-2 minutes, IIRC. >>>>>>> >>>>>>> Do you want to insert some code that will fix this? Basically >>>>>>> we want >>>>>>> to only add a new neighbor if our oldest neighbor should be >>>>>>> tossed (or >>>>>>> if we haven't reached MAX_NEIGHBORS). It's probably best from a >>>>>>> clarity standpoint if addNeighbor does continue to actually add >>>>>>> the >>>>>>> neighbor in all cases. Probably the best thing to do is to have >>>>>>> NCClient drop old neighbors internally and have NCManager ask >>>>>>> it which >>>>>>> nodes should be pinged via getNeighborToPing. So NCClient would >>>>>>> continue to have the same behavior on processSample (it would not >>>>>>> reject new information) and NCManager and NCClient would >>>>>>> cooperate to >>>>>>> measure RTTs to current neighbors. We do want to make sure, >>>>>>> however, >>>>>>> that the gossip set stays large enough so that partitions don't >>>>>>> occur. >>>>>>> >>>>>>> Do you want to be added to the SourceForge Pyxida developers to >>>>>>> make >>>>>>> this change? (Please test it out carefully first.) >>>>>>> >>>>>>> -jonathan >>>>>>> >>>>>>> On Oct 30, 2008, at 11:32 AM, ext Eric Chan-Tin wrote: >>>>>>> >>>>>>>> Jonathan, >>>>>>>> >>>>>>>> From what I saw, eventually, you learn about all the nodes in >>>>>>>> the >>>>>>>> network. Every 10 seconds, Pyxida picks a random "Gossip" node >>>>>>>> from >>>>>>>> its >>>>>>>> list and sends a request. The Gossip node will reply back with >>>>>>>> an >>>>>>>> "upNeighbor", that is, a "Gossip" node that is "up" or with 10% >>>>>>>> probability, a node that is "pending". Since naturally, all >>>>>>>> the nodes >>>>>>>> bootstrap off the same set of nodes, every node will >>>>>>>> eventually know >>>>>>>> of >>>>>>>> every other node. >>>>>>>> >>>>>>>> Pyxida might not be rotating through everybody continuously >>>>>>>> (although >>>>>>>> eventually, since it knows everybody, it will have added >>>>>>>> everybody to >>>>>>>> its Neighbors list at some point), but could be removing a >>>>>>>> node that >>>>>>>> is >>>>>>>> "alive". So, if MAX_NEIGHBORS = 32, let's call them A_1, >>>>>>>> A_2, ..., >>>>>>>> A_32. >>>>>>>> Pyxida picks a random Gossip node B, sends it a request and B >>>>>>>> replies >>>>>>>> back. Pyxida will add B to its Neighbors list, see that size of >>>>>>>> Neighbors > MAX_NEIGHBORS, kick A_1 out. Then from those 32 >>>>>>>> neighbors >>>>>>>> A_2, ...., A_32, B, it will calculate the force, and apply it >>>>>>>> to the >>>>>>>> current system coordinates. Thus, if every 10 seconds, Pyxida >>>>>>>> picks a >>>>>>>> different set of nodes C, D, E, etc... it would keep kicking >>>>>>>> nodes out >>>>>>>> of the neighbors list. >>>>>>>> >>>>>>>> It's the "kicking A_1 out" part that seemed a bit confusing. >>>>>>>> Churn >>>>>>>> is a >>>>>>>> good reason but you don't check that A_1 might still be alive >>>>>>>> or maybe >>>>>>>> A_1 was just added. I have been printing the neighbors list >>>>>>>> every 10 >>>>>>>> minutes, and they are completely different between those 10 >>>>>>>> minutes. >>>>>>>> >>>>>>>> One possible problem might be that I am choosing the value of >>>>>>>> MAX_NEIGHBORS to be too small. 
Originally it was set to 512 >>>>>>>> (which was >>>>>>>> meant for Azureus). Since I am using Planetlab where there are >>>>>>>> about >>>>>>>> 400-500 "stable" nodes, I changed the value to 32. But I don't >>>>>>>> think >>>>>>>> that should be a problem since log(n) = log(512) = 9. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> -Eric >>>>>>>> >>>>>>>> Jonathan Ledlie wrote: >>>>>>>>> Eric, >>>>>>>>> >>>>>>>>> Glad to hear it. Let Peter and me know if you publish >>>>>>>>> anything using >>>>>>>>> Pyxida and we can put it on the web page. >>>>>>>>> >>>>>>>>> "Gossip" nodes are for learning about the network. >>>>>>>>> "Neighbors" are >>>>>>>>> for >>>>>>>>> constructing the coordinate. >>>>>>>>> >>>>>>>>> So, IIRC, the gossip process learns about nodes that may or >>>>>>>>> may not >>>>>>>>> be >>>>>>>>> applied to the neighbor list. That is, we may eventually learn >>>>>>>>> about a >>>>>>>>> large portion of the network, via gossiping, but should only >>>>>>>>> be using >>>>>>>>> MAX_NEIGHBORS at any given time for coordinate computation. >>>>>>>>> The >>>>>>>>> reason >>>>>>>>> we don't want to use a fixed set of neighbors for coordinate >>>>>>>>> computation >>>>>>>>> is because nodes may come and go from the system. However, >>>>>>>>> it should >>>>>>>>> not be the case that we just rotate through everybody, because >>>>>>>>> then, as >>>>>>>>> you note, we'd be computing against everyone's coordinate/ >>>>>>>>> latency, >>>>>>>>> and >>>>>>>>> this is counter to the point of network coordinates. With this >>>>>>>>> gradual >>>>>>>>> transience as a desirable quality in mind, I am pretty sure we >>>>>>>>> maintain >>>>>>>>> neighbors in a fairly fixed manner. Does this not seem to be >>>>>>>>> the >>>>>>>>> case >>>>>>>>> to you? >>>>>>>>> >>>>>>>>> -jonathan >>>>>>>>> >>>>>>>>> On Oct 29, 2008, at 9:20 PM, Eric Chan-Tin wrote: >>>>>>>>> >>>>>>>>>> Hi Jonathan and Peter, >>>>>>>>>> >>>>>>>>>> First of all, just wanted to say that Pyxida has been working >>>>>>>>>> great on >>>>>>>>>> Planetlab for the past 4 months or so. Getting good data and >>>>>>>>>> experiments >>>>>>>>>> going well. >>>>>>>>>> >>>>>>>>>> However, I do have one question/comment. >>>>>>>>>> >>>>>>>>>> Every Pyxida node contains a list of Neighbors (in NCClient) >>>>>>>>>> and a >>>>>>>>>> list >>>>>>>>>> of Gossip nodes (they are called neighbors in NCManager but >>>>>>>>>> I will >>>>>>>>>> call >>>>>>>>>> them gossip for now so as not to confuse with the 'real' >>>>>>>>>> neighbors >>>>>>>>>> in >>>>>>>>>> NCClient). Every 10 seconds, Pyxida will pick a random node >>>>>>>>>> (call >>>>>>>>>> it A) >>>>>>>>>> from Gossip and send a gossip request (basically asking for >>>>>>>>>> rtt, >>>>>>>>>> coordinate, error). But Gossip contains all the nodes in the >>>>>>>>>> network. >>>>>>>>>> The comment in the code said that was to avoid Pyxida becoming >>>>>>>>>> lonely >>>>>>>>>> and no-one to gossip with, which makes sense. >>>>>>>>>> >>>>>>>>>> Once A responds, Pyxida will add A to its Neighbors (if >>>>>>>>>> Neighbors >>>>>>>>>> does >>>>>>>>>> not contain A) and remove the first element from the >>>>>>>>>> Neighbors list. >>>>>>>>>> This seems a little bit odd. Was there any particular >>>>>>>>>> reasoning >>>>>>>>>> behind >>>>>>>>>> this? Because it would seem like you're getting the >>>>>>>>>> coordinate from >>>>>>>>>> every node in the network eventually. >>>>>>>>>> >>>>>>>>>> Thanks a lot, >>>>>>>>>> >>>>>>>>>> -Eric >>>>>>>>>> >>>>>>>>>> Jonathan Ledlie wrote: >>>>>>>>>>> Your inclination is correct. 
See forwarded message. >>>>>>>>>>> >>>>>>>>>>> -jonathan >>>>>>>>>>> >>>>>>>>>>> On Apr 9, 2008, at 1:53 PM, ext Eric Chan-Tin wrote: >>>>>>>>>>> >>>>>>>>>>>> Thank you for the clarification. >>>>>>>>>>>> >>>>>>>>>>>> I've been reading through your published papers and code >>>>>>>>>>>> and I >>>>>>>>>>>> must >>>>>>>>>>>> say I learned a lot about how Pyxida works :) >>>>>>>>>>>> >>>>>>>>>>>> I still have a quick and easy (hopefully) question. Each >>>>>>>>>>>> Pyxida >>>>>>>>>>>> node >>>>>>>>>>>> has a small amount of neighbors and learn about other nodes >>>>>>>>>>>> through >>>>>>>>>>>> other nodes. I am running on about 200 machines, and it >>>>>>>>>>>> seems >>>>>>>>>>>> like all >>>>>>>>>>>> the Pyxida nodes are adding each other to their >>>>>>>>>>>> upNeighbors set. >>>>>>>>>>>> NCClient.java has the MAX_NEIGHBORS set to 512. >>>>>>>>>>>> >>>>>>>>>>>> I wanted to make sure that I am not misreading the code, but >>>>>>>>>>>> shouldn't >>>>>>>>>>>> each Pyxida node keep a small number of neighbors in its >>>>>>>>>>>> upNeighbors >>>>>>>>>>>> set. Else, it will be a lot of updates to send a >>>>>>>>>>>> gossipMessage >>>>>>>>>>>> to all >>>>>>>>>>>> 200 neighbors, and it also means that a Pyxida node has >>>>>>>>>>>> the whole >>>>>>>>>>>> network in its upNeighbors set - that seems wrong. I am >>>>>>>>>>>> inclined >>>>>>>>>>>> to >>>>>>>>>>>> change the value of MAX_NEIGHBORS but wanted to make sure >>>>>>>>>>>> that my >>>>>>>>>>>> understanding is correct first. >>>>>>>>>>>> >>>>>>>>>>>> Thank you, >>>>>>>>>>>> >>>>>>>>>>>> Eric >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> ------------------------------------------------------------------------- >>>>>>>>>> >>>>>>>>>> This SF.Net email is sponsored by the Moblin Your Move >>>>>>>>>> Developer's >>>>>>>>>> challenge Build the coolest Linux based applications with >>>>>>>>>> Moblin >>>>>>>>>> SDK & >>>>>>>>>> win great prizes Grand prize is a trip for two to an Open >>>>>>>>>> Source >>>>>>>>>> event >>>>>>>>>> anywhere in the world >>>>>>>>>> http://moblin-contest.org/redirect.php?banner_id=100&url=/ >>>>>>>>>> _______________________________________________ Pyxida-users >>>>>>>>>> mailing >>>>>>>>>> list Pyx...@li... >>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/pyxida-users >>>>>> >>>>>> >>>>>> ------------------------------------------------------------------------- >>>>>> This SF.Net email is sponsored by the Moblin Your Move Developer's >>>>>> challenge Build the coolest Linux based applications with Moblin SDK >>>>>> & win great prizes Grand prize is a trip for two to an Open Source >>>>>> event anywhere in the world >>>>>> http://moblin-contest.org/redirect.php?banner_id=100&url=/ >>>>>> _______________________________________________ Pyxida-users mailing >>>>>> list Pyx...@li... >>>>>> https://lists.sourceforge.net/lists/listinfo/pyxida-users >>>> >>>> >>>> ------------------------------------------------------------------------------ >>>> SF.Net email is Sponsored by MIX09, March 18-20, 2009 in Las Vegas, >>>> Nevada. The future of the web can't happen without you. Join us at >>>> MIX09 to help pave the way to the Next Web now. Learn more and >>>> register at >>>> http://ad.doubleclick.net/clk;208669438;13503038;i?http://2009.visitmix.com/ >>>> _______________________________________________ Pyxida-users mailing >>>> list Pyx...@li... >>>> https://lists.sourceforge.net/lists/listinfo/pyxida-users >>> >>> > |
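A rough Java sketch of the weighted pick proposed above: weight each neighbor by (currentTime - lastUpdateTime) / 15 minutes and choose one with probability proportional to its weight. The class name and selection code are illustrative only; RemoteState and getLastUpdateTime() are the existing Pyxida pieces this would plug into.

import java.util.List;
import java.util.Random;

// Sketch only: roulette-wheel selection over staleness weights, per Eric's proposal.
public class WeightedPickSketch<T> {
    private static final double WEIGHT_WINDOW_MS = 15 * 60 * 1000; // 15 minutes
    private final Random rand = new Random();

    RemoteState<T> getNeighborToPing(List<RemoteState<T>> neighbors) {
        if (neighbors.isEmpty()) return null;
        long now = System.currentTimeMillis();
        double[] weights = new double[neighbors.size()];
        double total = 0;
        for (int i = 0; i < neighbors.size(); i++) {
            // Older lastUpdateTime -> larger weight -> more likely to be pinged.
            weights[i] = (now - neighbors.get(i).getLastUpdateTime()) / WEIGHT_WINDOW_MS;
            total += weights[i];
        }
        if (total <= 0) return neighbors.get(rand.nextInt(neighbors.size()));
        double r = rand.nextDouble() * total; // roulette-wheel draw
        for (int i = 0; i < neighbors.size(); i++) {
            r -= weights[i];
            if (r <= 0) return neighbors.get(i);
        }
        return neighbors.get(neighbors.size() - 1); // guard against rounding error
    }
}

A neighbor refreshed in the last maintenance round ends up with a weight well under 1 and is rarely re-picked, while a stale neighbor's chance of being pinged grows with its age, which is the intent of the proposal.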
From: Jonathan L. <jon...@ee...> - 2008-12-10 15:31:14
|
This is an interesting situation. Let's think about the goal. The goal is to keep pinging the same nodes as long as they are alive, and to replace a node soon after it dies, keeping the total number of up neighbors at approximately MAX_NEIGHBORS. How about the following: - when you pick a neighbor to ping, you set a time on that event (in a new per-neighbor timestamp LastPingTime) - when you first insert a neighbor, set this timestamp to 0. - when you loop in performMaintenance, remove nodes whose LastPingTime - LastUpdateTime > threshold (like 10 minutes) - when picking a node, pick the guy with the oldest LastPingTime (not sure about this one -- I don't like situations where you pick the absolute *oldest* because then we could get in a situation where an old guy sits there, waiting to be kicked out, and we keep picking him to be pinged; I'd prefer to pick randomly, or random older guys). - a simple way around this is to simply have a self-adjusting threshold: so, we walk through the nodes, and everyone whose LastPingTime is older than MIN_UPDATE_TIME_TO_PING ago (a relative time) get added to a set; we pick a guy randomly from the set to ping. If the set is empty, we increase MIN_UPDATE_TIME_TO_PING, and wait for the next time the function is called. If the set is too large (relative to a fraction of MAX_NEIGHBORS), we decrease it. We still might get some churn from lost UDP packets. You might also give nodes a second chance to be pinged by including LastPingTime - LastUpdateTime > 1 minute in the set. Note that increases and decreases in MIN_UPDATE_TIME_TO_PING shouldn't be affected by the number of nodes that fall into this category. We still probably need a way for new potential neighbors to be rejected if our current set of neighbors is fine. Because I still want addNeigbor to really add the neighbor, how about putting a check before calling that to see if we need more neighbors. What thinks? -jonathan On Dec 9, 2008, at 11:37 PM, ext Eric Chan-Tin wrote: > Peter, > > Thanks. That fixed it. > > However, here is another problem I found while deploying Pyxida and > running them for a while. > > In performMaintenance(), I loop through every neighbor and if > (neighbor.getLastUpdateTime() < current_time - 10 minutes), then I > remove that neighbor. > > > In NCManager, I call getNeighborToPing, which randomly picks one of > the > neighbors (32 in this case). Pinging is performed every 10 seconds, so > in 10 minutes, you can ping 60 nodes. However, if a neighbor A is not > among those 60 nodes, it will get kicked out in performMaintenance(), > although A is still alive (it just hasn't been contacted yet). > > There are some possible options. > > 1) Increase the timeout period from 10 minutes to ?? minutes. This > doesn't quite work because say in Azureus, the number of neighbors is > 512, so the timeout needs to be increased even more... It could be a > function of MAX_NEIGHBORS... > > 2) When Pyxida replies to a ping request, it will update the > neighbor's > lastUpdateTime. This does not quite work either because it assumes a > symmetric relationship which is most likely not the case. > > 3) Introduce some more logic to Pyxida so it remembers what nodes it > contacted and does not send, say, more than 2 requests in 10 minutes > to > the same node. This solution isn't that great either. > > Let me know what you guys think. I am currently leaning towards option > 1) but make the timeout period a function of MAX_NEIGHBORS, although I > haven't quite found the perfect function yet. 
> > Here are some preliminary results. I compared the number of neighbors > which are the same at each reported time (10 minutes). Before the fix, > more than 50% of the neighbors changed within 10 minutes. After the > fix, > only about 30% of the neighbors changed within 10 minutes. It is more > stable but I think 30% is still quite a big number. > > Sorry for the long email :) > > -Eric > > Peter Pietzuch wrote: >> Hi, >> >> Eric Chan-Tin wrote: >>> for (RemoteState<T> neighbor : neighbors) { //this is line 1029 >>> if (neighbor.getLastUpdateTime() < (curr_time - >>> MAINTENANCE_PERIOD)) { >>> neighbors.remove(neighbor); >>> } >>> } >> >> IIRC you can't modify a collection while iterating over it using the >> above for loop. Instead you have to use a proper >> Iterator<RemoteState<T>> and use it's remove method. >> >> Cheers, >> Peter >> >>> Jonathan Ledlie wrote: >>>> On Nov 3, 2008, at 3:46 PM, Eric Chan-Tin wrote: >>>> >>>>> Jonathan, >>>>> >>>>> I can insert some code to fix this and it doesn't look too hard >>>>> but I do >>>>> have some concerns. >>>>> >>>>>> I don't remember observing this behavior in our PL >>>>>> deployment, but that could have been because we had the ping >>>>>> time set >>>>>> to 1-2 minutes, IIRC. >>>>> The default ping time was set to 10 seconds + some Gaussian >>>>> distribution, >>>>> so that's what I used, but changing the ping time to 1 minute or >>>>> more >>>>> will >>>>> result in the same effect except it will take longer for the >>>>> "cycle" to >>>>> occur. >>>>> >>>>> What I have in mind was the following: >>>>> >>>>> 1) In NCClient.addNeighbor() would be changed to (in pseudocode) >>>>> protected boolean addNeighbor(RemoteState<T> guy) { >>>>> boolean added = false; >>>>> //if neighbors.size() >= MAX_NEIGHBORS >>>>> //Check last update time of each neighbor in >>>>> neighbors >>>>> //neighbor.getLastUpdateTime() >>>>> //if any of them is older than some number >>>>> TIMEOUT, >>>>> remove the oldest one, then add "guy" to neighbors >>>>> //else don't add "guy" to neighbors >>>>> //else add "guy" to neighbors >>>>> } >>>>> >>>> IMO this shouldn't be the behavior of this function because it is >>>> called >>>> "addNeighbor." Instead what could happen is that part of the >>>> performMaintenance() function could remove old neighbors. >>>> >>>>> 2) In NCManager, getGossipNode() would just pick a random node >>>>> from >>>>> localNC.primaryNC.getNeighbors() to send a request to every 10 >>>>> seconds. >>>>> >>>>> 3) Pyxida will still keep track of all GossipNodes in the >>>>> network. It >>>>> could >>>>> also (say every 10 minutes) check its NCClient.neighbors list >>>>> and see if >>>>> any neighbor is older than TIMEOUT, remove it, then pick a random >>>>> GossipNode to send a ping request. >>>>> >>>>> The main problem with this scheme is that one Pyxida node would >>>>> most >>>>> likely >>>>> stay with the same set of neighbors all the time. Its coordinates >>>>> would be >>>>> greatly affected by those 32 nodes (I am assuming MAX_NEIGHBORS >>>>> = 32). >>>>> >>>> Instead of (2), why not have the getNeighborToPing function >>>> determine >>>> the behavior? If it returns a node, then ping it. If it returns >>>> null, >>>> then have the NCManager pick some other node to use? If things are >>>> hunky-dory, getNeighborToPing() would return a node (a current >>>> member of >>>> the neighbor set), if we're short on neighbors, then it would >>>> return >>>> null. 
The periodic maintenance would have already tossed old >>>> nodes, so >>>> this would be self-sustaining. >>>> >>>> The "problem" you mention is fairly central to network >>>> coordinates in >>>> general. Sure the coordinates will be a little better the more >>>> neighbors you have. But in a very large system talking to >>>> everybody >>>> isn't feasible. Peter and I have a graph in one of the workshop >>>> papers >>>> which shows the effect of increasing the number of neighbors >>>> >>>>> I don't think the network would be partitioned, but it could >>>>> happen. For >>>>> example, the boostrap nodes have each other in their neighbors >>>>> list. >>>>> Every >>>>> new node bootstrapping to the network would contact those >>>>> bootstrapping >>>>> nodes first and add them to their neighbors. So every Pyxida >>>>> node could >>>>> have as its neighbors the "Bootstrap nodes", which would make >>>>> Pyxida >>>>> become >>>>> a centralized system. >>>>> >>>> As long as the initial bootstrap list is large enough, I think >>>> you are >>>> right, that partitions won't be a problem. >>>> >>>>> One last thing: When Pyxida receives a ping request from a node, >>>>> it does >>>>> not update its coordinates. Is there any particular reason for >>>>> that? >>>>> Is it >>>>> because Pyxida will not be able to calculate the RTT? >>>>> >>>> Yes. >>>> >>>>> Also, there is no "gossiping" of other nodes' coordinates, >>>>> right? So if A >>>>> has B in its neighbors list, it will from time to time tell >>>>> other nodes >>>>> about B's coordinates. I realize this has a lot of complications >>>>> but the >>>>> original Vivaldi paper was very fuzzy about this point and the >>>>> previous >>>>> point that it seems it is up to the implementors to decide what >>>>> to do. >>>>> >>>> I don't think we are doing this. It could be useful in certain >>>> circumstance (e.g., picking potential new neighbors), but we're >>>> not, IIRC. >>>> >>>>> I will have to think some more about this as there are a lot of >>>>> nasty >>>>> little side effects that could happen, but these are my thoughts. >>>>> >>>>> Thanks, >>>>> >>>>> -Eric >>>>> >>>> Sounds good. >>>> >>>> -jonathan >>>> >>>>> On Nov 3 2008, Jonathan Ledlie wrote: >>>>> >>>>>> Eric, >>>>>> >>>>>> Based on a look at the code just now, I see where you are >>>>>> coming from: >>>>>> with these params on PL, there isn't anything stopping the >>>>>> neighbor >>>>>> list from circulating pretty quickly through all nodes. In the >>>>>> Azureus implementation, the guys fed to the alg (fed to >>>>>> processSample) >>>>>> weren't changing much because they were coming from the routing >>>>>> finger >>>>>> tables. I don't remember observing this behavior in our PL >>>>>> deployment, but that could have been because we had the ping >>>>>> time set >>>>>> to 1-2 minutes, IIRC. >>>>>> >>>>>> Do you want to insert some code that will fix this? Basically >>>>>> we want >>>>>> to only add a new neighbor if our oldest neighbor should be >>>>>> tossed (or >>>>>> if we haven't reached MAX_NEIGHBORS). It's probably best from a >>>>>> clarity standpoint if addNeighbor does continue to actually add >>>>>> the >>>>>> neighbor in all cases. Probably the best thing to do is to have >>>>>> NCClient drop old neighbors internally and have NCManager ask >>>>>> it which >>>>>> nodes should be pinged via getNeighborToPing. 
So NCClient would >>>>>> continue to have the same behavior on processSample (it would not >>>>>> reject new information) and NCManager and NCClient would >>>>>> cooperate to >>>>>> measure RTTs to current neighbors. We do want to make sure, >>>>>> however, >>>>>> that the gossip set stays large enough so that partitions don't >>>>>> occur. >>>>>> >>>>>> Do you want to be added to the SourceForge Pyxida developers to >>>>>> make >>>>>> this change? (Please test it out carefully first.) >>>>>> >>>>>> -jonathan >>>>>> >>>>>> On Oct 30, 2008, at 11:32 AM, ext Eric Chan-Tin wrote: >>>>>> >>>>>>> Jonathan, >>>>>>> >>>>>>> From what I saw, eventually, you learn about all the nodes in >>>>>>> the >>>>>>> network. Every 10 seconds, Pyxida picks a random "Gossip" node >>>>>>> from >>>>>>> its >>>>>>> list and sends a request. The Gossip node will reply back with >>>>>>> an >>>>>>> "upNeighbor", that is, a "Gossip" node that is "up" or with 10% >>>>>>> probability, a node that is "pending". Since naturally, all >>>>>>> the nodes >>>>>>> bootstrap off the same set of nodes, every node will >>>>>>> eventually know >>>>>>> of >>>>>>> every other node. >>>>>>> >>>>>>> Pyxida might not be rotating through everybody continuously >>>>>>> (although >>>>>>> eventually, since it knows everybody, it will have added >>>>>>> everybody to >>>>>>> its Neighbors list at some point), but could be removing a >>>>>>> node that >>>>>>> is >>>>>>> "alive". So, if MAX_NEIGHBORS = 32, let's call them A_1, >>>>>>> A_2, ..., >>>>>>> A_32. >>>>>>> Pyxida picks a random Gossip node B, sends it a request and B >>>>>>> replies >>>>>>> back. Pyxida will add B to its Neighbors list, see that size of >>>>>>> Neighbors > MAX_NEIGHBORS, kick A_1 out. Then from those 32 >>>>>>> neighbors >>>>>>> A_2, ...., A_32, B, it will calculate the force, and apply it >>>>>>> to the >>>>>>> current system coordinates. Thus, if every 10 seconds, Pyxida >>>>>>> picks a >>>>>>> different set of nodes C, D, E, etc... it would keep kicking >>>>>>> nodes out >>>>>>> of the neighbors list. >>>>>>> >>>>>>> It's the "kicking A_1 out" part that seemed a bit confusing. >>>>>>> Churn >>>>>>> is a >>>>>>> good reason but you don't check that A_1 might still be alive >>>>>>> or maybe >>>>>>> A_1 was just added. I have been printing the neighbors list >>>>>>> every 10 >>>>>>> minutes, and they are completely different between those 10 >>>>>>> minutes. >>>>>>> >>>>>>> One possible problem might be that I am choosing the value of >>>>>>> MAX_NEIGHBORS to be too small. Originally it was set to 512 >>>>>>> (which was >>>>>>> meant for Azureus). Since I am using Planetlab where there are >>>>>>> about >>>>>>> 400-500 "stable" nodes, I changed the value to 32. But I don't >>>>>>> think >>>>>>> that should be a problem since log(n) = log(512) = 9. >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> -Eric >>>>>>> >>>>>>> Jonathan Ledlie wrote: >>>>>>>> Eric, >>>>>>>> >>>>>>>> Glad to hear it. Let Peter and me know if you publish >>>>>>>> anything using >>>>>>>> Pyxida and we can put it on the web page. >>>>>>>> >>>>>>>> "Gossip" nodes are for learning about the network. >>>>>>>> "Neighbors" are >>>>>>>> for >>>>>>>> constructing the coordinate. >>>>>>>> >>>>>>>> So, IIRC, the gossip process learns about nodes that may or >>>>>>>> may not >>>>>>>> be >>>>>>>> applied to the neighbor list. 
That is, we may eventually learn >>>>>>>> about a >>>>>>>> large portion of the network, via gossiping, but should only >>>>>>>> be using >>>>>>>> MAX_NEIGHBORS at any given time for coordinate computation. >>>>>>>> The >>>>>>>> reason >>>>>>>> we don't want to use a fixed set of neighbors for coordinate >>>>>>>> computation >>>>>>>> is because nodes may come and go from the system. However, >>>>>>>> it should >>>>>>>> not be the case that we just rotate through everybody, because >>>>>>>> then, as >>>>>>>> you note, we'd be computing against everyone's coordinate/ >>>>>>>> latency, >>>>>>>> and >>>>>>>> this is counter to the point of network coordinates. With this >>>>>>>> gradual >>>>>>>> transience as a desirable quality in mind, I am pretty sure we >>>>>>>> maintain >>>>>>>> neighbors in a fairly fixed manner. Does this not seem to be >>>>>>>> the >>>>>>>> case >>>>>>>> to you? >>>>>>>> >>>>>>>> -jonathan >>>>>>>> >>>>>>>> On Oct 29, 2008, at 9:20 PM, Eric Chan-Tin wrote: >>>>>>>> >>>>>>>>> Hi Jonathan and Peter, >>>>>>>>> >>>>>>>>> First of all, just wanted to say that Pyxida has been working >>>>>>>>> great on >>>>>>>>> Planetlab for the past 4 months or so. Getting good data and >>>>>>>>> experiments >>>>>>>>> going well. >>>>>>>>> >>>>>>>>> However, I do have one question/comment. >>>>>>>>> >>>>>>>>> Every Pyxida node contains a list of Neighbors (in NCClient) >>>>>>>>> and a >>>>>>>>> list >>>>>>>>> of Gossip nodes (they are called neighbors in NCManager but >>>>>>>>> I will >>>>>>>>> call >>>>>>>>> them gossip for now so as not to confuse with the 'real' >>>>>>>>> neighbors >>>>>>>>> in >>>>>>>>> NCClient). Every 10 seconds, Pyxida will pick a random node >>>>>>>>> (call >>>>>>>>> it A) >>>>>>>>> from Gossip and send a gossip request (basically asking for >>>>>>>>> rtt, >>>>>>>>> coordinate, error). But Gossip contains all the nodes in the >>>>>>>>> network. >>>>>>>>> The comment in the code said that was to avoid Pyxida becoming >>>>>>>>> lonely >>>>>>>>> and no-one to gossip with, which makes sense. >>>>>>>>> >>>>>>>>> Once A responds, Pyxida will add A to its Neighbors (if >>>>>>>>> Neighbors >>>>>>>>> does >>>>>>>>> not contain A) and remove the first element from the >>>>>>>>> Neighbors list. >>>>>>>>> This seems a little bit odd. Was there any particular >>>>>>>>> reasoning >>>>>>>>> behind >>>>>>>>> this? Because it would seem like you're getting the >>>>>>>>> coordinate from >>>>>>>>> every node in the network eventually. >>>>>>>>> >>>>>>>>> Thanks a lot, >>>>>>>>> >>>>>>>>> -Eric >>>>>>>>> >>>>>>>>> Jonathan Ledlie wrote: >>>>>>>>>> Your inclination is correct. See forwarded message. >>>>>>>>>> >>>>>>>>>> -jonathan >>>>>>>>>> >>>>>>>>>> On Apr 9, 2008, at 1:53 PM, ext Eric Chan-Tin wrote: >>>>>>>>>> >>>>>>>>>>> Thank you for the clarification. >>>>>>>>>>> >>>>>>>>>>> I've been reading through your published papers and code >>>>>>>>>>> and I >>>>>>>>>>> must >>>>>>>>>>> say I learned a lot about how Pyxida works :) >>>>>>>>>>> >>>>>>>>>>> I still have a quick and easy (hopefully) question. Each >>>>>>>>>>> Pyxida >>>>>>>>>>> node >>>>>>>>>>> has a small amount of neighbors and learn about other nodes >>>>>>>>>>> through >>>>>>>>>>> other nodes. I am running on about 200 machines, and it >>>>>>>>>>> seems >>>>>>>>>>> like all >>>>>>>>>>> the Pyxida nodes are adding each other to their >>>>>>>>>>> upNeighbors set. >>>>>>>>>>> NCClient.java has the MAX_NEIGHBORS set to 512. 
>>>>>>>>>>> >>>>>>>>>>> I wanted to make sure that I am not misreading the code, but >>>>>>>>>>> shouldn't >>>>>>>>>>> each Pyxida node keep a small number of neighbors in its >>>>>>>>>>> upNeighbors >>>>>>>>>>> set. Else, it will be a lot of updates to send a >>>>>>>>>>> gossipMessage >>>>>>>>>>> to all >>>>>>>>>>> 200 neighbors, and it also means that a Pyxida node has >>>>>>>>>>> the whole >>>>>>>>>>> network in its upNeighbors set - that seems wrong. I am >>>>>>>>>>> inclined >>>>>>>>>>> to >>>>>>>>>>> change the value of MAX_NEIGHBORS but wanted to make sure >>>>>>>>>>> that my >>>>>>>>>>> understanding is correct first. >>>>>>>>>>> >>>>>>>>>>> Thank you, >>>>>>>>>>> >>>>>>>>>>> Eric >>>>>>>>>>> >>>>>>>>> >>>>>>>>> ------------------------------------------------------------------------- >>>>>>>>> >>>>>>>>> This SF.Net email is sponsored by the Moblin Your Move >>>>>>>>> Developer's >>>>>>>>> challenge Build the coolest Linux based applications with >>>>>>>>> Moblin >>>>>>>>> SDK & >>>>>>>>> win great prizes Grand prize is a trip for two to an Open >>>>>>>>> Source >>>>>>>>> event >>>>>>>>> anywhere in the world >>>>>>>>> http://moblin-contest.org/redirect.php?banner_id=100&url=/ >>>>>>>>> _______________________________________________ Pyxida-users >>>>>>>>> mailing >>>>>>>>> list Pyx...@li... >>>>>>>>> https://lists.sourceforge.net/lists/listinfo/pyxida-users >>>>> ------------------------------------------------------------------------- >>>>> This SF.Net email is sponsored by the Moblin Your Move Developer's >>>>> challenge >>>>> Build the coolest Linux based applications with Moblin SDK & win >>>>> great >>>>> prizes >>>>> Grand prize is a trip for two to an Open Source event anywhere >>>>> in the >>>>> world >>>>> http://moblin-contest.org/redirect.php?banner_id=100&url=/ >>>>> _______________________________________________ >>>>> Pyxida-users mailing list >>>>> Pyx...@li... >>>>> https://lists.sourceforge.net/lists/listinfo/pyxida-users >>> ------------------------------------------------------------------------------ >>> SF.Net email is Sponsored by MIX09, March 18-20, 2009 in Las >>> Vegas, Nevada. >>> The future of the web can't happen without you. Join us at MIX09 >>> to help >>> pave the way to the Next Web now. Learn more and register at >>> http://ad.doubleclick.net/clk;208669438;13503038;i?http://2009.visitmix.com/ >>> _______________________________________________ >>> Pyxida-users mailing list >>> Pyx...@li... >>> https://lists.sourceforge.net/lists/listinfo/pyxida-users >> >> |
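A sketch of the self-adjusting MIN_UPDATE_TIME_TO_PING idea from the message above, following its adjustment rule as written: grow the window when the candidate set is empty, shrink it when the set exceeds a fraction of MAX_NEIGHBORS. The starting value, the factor of two, and the fraction MAX_NEIGHBORS/4 are guesses for illustration; getLastPingTime() is the proposed new timestamp, and the "second chance" rule (LastPingTime - LastUpdateTime > 1 minute) is left out for brevity.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Sketch only: all constants are illustrative; RemoteState.getLastPingTime() is assumed.
public class AdaptivePingSketch<T> {
    private static final int MAX_NEIGHBORS = 32;
    private long minUpdateTimeToPing = 60 * 1000; // start at 1 minute, self-adjusting
    private final Random rand = new Random();

    RemoteState<T> getNeighborToPing(List<RemoteState<T>> neighbors) {
        long now = System.currentTimeMillis();
        List<RemoteState<T>> candidates = new ArrayList<RemoteState<T>>();
        for (RemoteState<T> n : neighbors) {
            // Candidates: neighbors we have not pinged within the current window.
            if (now - n.getLastPingTime() > minUpdateTimeToPing) {
                candidates.add(n);
            }
        }
        if (candidates.isEmpty()) {
            minUpdateTimeToPing *= 2;   // empty set: widen the window, ping nothing this round
            return null;                // caller (NCManager) can pick some other node instead
        }
        if (candidates.size() > MAX_NEIGHBORS / 4) {
            minUpdateTimeToPing = Math.max(10 * 1000, minUpdateTimeToPing / 2); // too many: narrow it
        }
        return candidates.get(rand.nextInt(candidates.size()));
    }
}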
From: Eric Chan-T. <dch...@cs...> - 2008-12-10 04:52:07
|
Peter, Thanks. That fixed it. However, here is another problem I found while deploying Pyxida and running them for a while. In performMaintenance(), I loop through every neighbor and if (neighbor.getLastUpdateTime() < current_time - 10 minutes), then I remove that neighbor. In NCManager, I call getNeighborToPing, which randomly picks one of the neighbors (32 in this case). Pinging is performed every 10 seconds, so in 10 minutes, you can ping 60 nodes. However, if a neighbor A is not among those 60 nodes, it will get kicked out in performMaintenance(), although A is still alive (it just hasn't been contacted yet). There are some possible options. 1) Increase the timeout period from 10 minutes to ?? minutes. This doesn't quite work because say in Azureus, the number of neighbors is 512, so the timeout needs to be increased even more... It could be a function of MAX_NEIGHBORS... 2) When Pyxida replies to a ping request, it will update the neighbor's lastUpdateTime. This does not quite work either because it assumes a symmetric relationship which is most likely not the case. 3) Introduce some more logic to Pyxida so it remembers what nodes it contacted and does not send, say, more than 2 requests in 10 minutes to the same node. This solution isn't that great either. Let me know what you guys think. I am currently leaning towards option 1) but make the timeout period a function of MAX_NEIGHBORS, although I haven't quite found the perfect function yet. Here are some preliminary results. I compared the number of neighbors which are the same at each reported time (10 minutes). Before the fix, more than 50% of the neighbors changed within 10 minutes. After the fix, only about 30% of the neighbors changed within 10 minutes. It is more stable but I think 30% is still quite a big number. Sorry for the long email :) -Eric Peter Pietzuch wrote: > Hi, > > Eric Chan-Tin wrote: >> for (RemoteState<T> neighbor : neighbors) { //this is line 1029 >> if (neighbor.getLastUpdateTime() < (curr_time - MAINTENANCE_PERIOD)) { >> neighbors.remove(neighbor); >> } >> } > > IIRC you can't modify a collection while iterating over it using the > above for loop. Instead you have to use a proper > Iterator<RemoteState<T>> and use it's remove method. > > Cheers, > Peter > >> Jonathan Ledlie wrote: >>> On Nov 3, 2008, at 3:46 PM, Eric Chan-Tin wrote: >>> >>>> Jonathan, >>>> >>>> I can insert some code to fix this and it doesn't look too hard but I do >>>> have some concerns. >>>> >>>>> I don't remember observing this behavior in our PL >>>>> deployment, but that could have been because we had the ping time set >>>>> to 1-2 minutes, IIRC. >>>> The default ping time was set to 10 seconds + some Gaussian distribution, >>>> so that's what I used, but changing the ping time to 1 minute or more >>>> will >>>> result in the same effect except it will take longer for the "cycle" to >>>> occur. >>>> >>>> What I have in mind was the following: >>>> >>>> 1) In NCClient.addNeighbor() would be changed to (in pseudocode) >>>> protected boolean addNeighbor(RemoteState<T> guy) { >>>> boolean added = false; >>>> //if neighbors.size() >= MAX_NEIGHBORS >>>> //Check last update time of each neighbor in neighbors >>>> //neighbor.getLastUpdateTime() >>>> //if any of them is older than some number TIMEOUT, >>>> remove the oldest one, then add "guy" to neighbors >>>> //else don't add "guy" to neighbors >>>> //else add "guy" to neighbors >>>> } >>>> >>> IMO this shouldn't be the behavior of this function because it is called >>> "addNeighbor." 
Instead what could happen is that part of the >>> performMaintenance() function could remove old neighbors. >>> >>>> 2) In NCManager, getGossipNode() would just pick a random node from >>>> localNC.primaryNC.getNeighbors() to send a request to every 10 seconds. >>>> >>>> 3) Pyxida will still keep track of all GossipNodes in the network. It >>>> could >>>> also (say every 10 minutes) check its NCClient.neighbors list and see if >>>> any neighbor is older than TIMEOUT, remove it, then pick a random >>>> GossipNode to send a ping request. >>>> >>>> The main problem with this scheme is that one Pyxida node would most >>>> likely >>>> stay with the same set of neighbors all the time. Its coordinates >>>> would be >>>> greatly affected by those 32 nodes (I am assuming MAX_NEIGHBORS = 32). >>>> >>> Instead of (2), why not have the getNeighborToPing function determine >>> the behavior? If it returns a node, then ping it. If it returns null, >>> then have the NCManager pick some other node to use? If things are >>> hunky-dory, getNeighborToPing() would return a node (a current member of >>> the neighbor set), if we're short on neighbors, then it would return >>> null. The periodic maintenance would have already tossed old nodes, so >>> this would be self-sustaining. >>> >>> The "problem" you mention is fairly central to network coordinates in >>> general. Sure the coordinates will be a little better the more >>> neighbors you have. But in a very large system talking to everybody >>> isn't feasible. Peter and I have a graph in one of the workshop papers >>> which shows the effect of increasing the number of neighbors >>> >>>> I don't think the network would be partitioned, but it could happen. For >>>> example, the boostrap nodes have each other in their neighbors list. >>>> Every >>>> new node bootstrapping to the network would contact those bootstrapping >>>> nodes first and add them to their neighbors. So every Pyxida node could >>>> have as its neighbors the "Bootstrap nodes", which would make Pyxida >>>> become >>>> a centralized system. >>>> >>> As long as the initial bootstrap list is large enough, I think you are >>> right, that partitions won't be a problem. >>> >>>> One last thing: When Pyxida receives a ping request from a node, it does >>>> not update its coordinates. Is there any particular reason for that? >>>> Is it >>>> because Pyxida will not be able to calculate the RTT? >>>> >>> Yes. >>> >>>> Also, there is no "gossiping" of other nodes' coordinates, right? So if A >>>> has B in its neighbors list, it will from time to time tell other nodes >>>> about B's coordinates. I realize this has a lot of complications but the >>>> original Vivaldi paper was very fuzzy about this point and the previous >>>> point that it seems it is up to the implementors to decide what to do. >>>> >>> I don't think we are doing this. It could be useful in certain >>> circumstance (e.g., picking potential new neighbors), but we're not, IIRC. >>> >>>> I will have to think some more about this as there are a lot of nasty >>>> little side effects that could happen, but these are my thoughts. >>>> >>>> Thanks, >>>> >>>> -Eric >>>> >>> Sounds good. >>> >>> -jonathan >>> >>>> On Nov 3 2008, Jonathan Ledlie wrote: >>>> >>>>> Eric, >>>>> >>>>> Based on a look at the code just now, I see where you are coming from: >>>>> with these params on PL, there isn't anything stopping the neighbor >>>>> list from circulating pretty quickly through all nodes. 
In the >>>>> Azureus implementation, the guys fed to the alg (fed to processSample) >>>>> weren't changing much because they were coming from the routing finger >>>>> tables. I don't remember observing this behavior in our PL >>>>> deployment, but that could have been because we had the ping time set >>>>> to 1-2 minutes, IIRC. >>>>> >>>>> Do you want to insert some code that will fix this? Basically we want >>>>> to only add a new neighbor if our oldest neighbor should be tossed (or >>>>> if we haven't reached MAX_NEIGHBORS). It's probably best from a >>>>> clarity standpoint if addNeighbor does continue to actually add the >>>>> neighbor in all cases. Probably the best thing to do is to have >>>>> NCClient drop old neighbors internally and have NCManager ask it which >>>>> nodes should be pinged via getNeighborToPing. So NCClient would >>>>> continue to have the same behavior on processSample (it would not >>>>> reject new information) and NCManager and NCClient would cooperate to >>>>> measure RTTs to current neighbors. We do want to make sure, however, >>>>> that the gossip set stays large enough so that partitions don't occur. >>>>> >>>>> Do you want to be added to the SourceForge Pyxida developers to make >>>>> this change? (Please test it out carefully first.) >>>>> >>>>> -jonathan >>>>> >>>>> On Oct 30, 2008, at 11:32 AM, ext Eric Chan-Tin wrote: >>>>> >>>>>> Jonathan, >>>>>> >>>>>> From what I saw, eventually, you learn about all the nodes in the >>>>>> network. Every 10 seconds, Pyxida picks a random "Gossip" node from >>>>>> its >>>>>> list and sends a request. The Gossip node will reply back with an >>>>>> "upNeighbor", that is, a "Gossip" node that is "up" or with 10% >>>>>> probability, a node that is "pending". Since naturally, all the nodes >>>>>> bootstrap off the same set of nodes, every node will eventually know >>>>>> of >>>>>> every other node. >>>>>> >>>>>> Pyxida might not be rotating through everybody continuously (although >>>>>> eventually, since it knows everybody, it will have added everybody to >>>>>> its Neighbors list at some point), but could be removing a node that >>>>>> is >>>>>> "alive". So, if MAX_NEIGHBORS = 32, let's call them A_1, A_2, ..., >>>>>> A_32. >>>>>> Pyxida picks a random Gossip node B, sends it a request and B replies >>>>>> back. Pyxida will add B to its Neighbors list, see that size of >>>>>> Neighbors > MAX_NEIGHBORS, kick A_1 out. Then from those 32 neighbors >>>>>> A_2, ...., A_32, B, it will calculate the force, and apply it to the >>>>>> current system coordinates. Thus, if every 10 seconds, Pyxida picks a >>>>>> different set of nodes C, D, E, etc... it would keep kicking nodes out >>>>>> of the neighbors list. >>>>>> >>>>>> It's the "kicking A_1 out" part that seemed a bit confusing. Churn >>>>>> is a >>>>>> good reason but you don't check that A_1 might still be alive or maybe >>>>>> A_1 was just added. I have been printing the neighbors list every 10 >>>>>> minutes, and they are completely different between those 10 minutes. >>>>>> >>>>>> One possible problem might be that I am choosing the value of >>>>>> MAX_NEIGHBORS to be too small. Originally it was set to 512 (which was >>>>>> meant for Azureus). Since I am using Planetlab where there are about >>>>>> 400-500 "stable" nodes, I changed the value to 32. But I don't think >>>>>> that should be a problem since log(n) = log(512) = 9. >>>>>> >>>>>> Thanks, >>>>>> >>>>>> -Eric >>>>>> >>>>>> Jonathan Ledlie wrote: >>>>>>> Eric, >>>>>>> >>>>>>> Glad to hear it. 
Let Peter and me know if you publish anything using >>>>>>> Pyxida and we can put it on the web page. >>>>>>> >>>>>>> "Gossip" nodes are for learning about the network. "Neighbors" are >>>>>>> for >>>>>>> constructing the coordinate. >>>>>>> >>>>>>> So, IIRC, the gossip process learns about nodes that may or may not >>>>>>> be >>>>>>> applied to the neighbor list. That is, we may eventually learn >>>>>>> about a >>>>>>> large portion of the network, via gossiping, but should only be using >>>>>>> MAX_NEIGHBORS at any given time for coordinate computation. The >>>>>>> reason >>>>>>> we don't want to use a fixed set of neighbors for coordinate >>>>>>> computation >>>>>>> is because nodes may come and go from the system. However, it should >>>>>>> not be the case that we just rotate through everybody, because >>>>>>> then, as >>>>>>> you note, we'd be computing against everyone's coordinate/latency, >>>>>>> and >>>>>>> this is counter to the point of network coordinates. With this >>>>>>> gradual >>>>>>> transience as a desirable quality in mind, I am pretty sure we >>>>>>> maintain >>>>>>> neighbors in a fairly fixed manner. Does this not seem to be the >>>>>>> case >>>>>>> to you? >>>>>>> >>>>>>> -jonathan >>>>>>> >>>>>>> On Oct 29, 2008, at 9:20 PM, Eric Chan-Tin wrote: >>>>>>> >>>>>>>> Hi Jonathan and Peter, >>>>>>>> >>>>>>>> First of all, just wanted to say that Pyxida has been working >>>>>>>> great on >>>>>>>> Planetlab for the past 4 months or so. Getting good data and >>>>>>>> experiments >>>>>>>> going well. >>>>>>>> >>>>>>>> However, I do have one question/comment. >>>>>>>> >>>>>>>> Every Pyxida node contains a list of Neighbors (in NCClient) and a >>>>>>>> list >>>>>>>> of Gossip nodes (they are called neighbors in NCManager but I will >>>>>>>> call >>>>>>>> them gossip for now so as not to confuse with the 'real' neighbors >>>>>>>> in >>>>>>>> NCClient). Every 10 seconds, Pyxida will pick a random node (call >>>>>>>> it A) >>>>>>>> from Gossip and send a gossip request (basically asking for rtt, >>>>>>>> coordinate, error). But Gossip contains all the nodes in the >>>>>>>> network. >>>>>>>> The comment in the code said that was to avoid Pyxida becoming >>>>>>>> lonely >>>>>>>> and no-one to gossip with, which makes sense. >>>>>>>> >>>>>>>> Once A responds, Pyxida will add A to its Neighbors (if Neighbors >>>>>>>> does >>>>>>>> not contain A) and remove the first element from the Neighbors list. >>>>>>>> This seems a little bit odd. Was there any particular reasoning >>>>>>>> behind >>>>>>>> this? Because it would seem like you're getting the coordinate from >>>>>>>> every node in the network eventually. >>>>>>>> >>>>>>>> Thanks a lot, >>>>>>>> >>>>>>>> -Eric >>>>>>>> >>>>>>>> Jonathan Ledlie wrote: >>>>>>>>> Your inclination is correct. See forwarded message. >>>>>>>>> >>>>>>>>> -jonathan >>>>>>>>> >>>>>>>>> On Apr 9, 2008, at 1:53 PM, ext Eric Chan-Tin wrote: >>>>>>>>> >>>>>>>>>> Thank you for the clarification. >>>>>>>>>> >>>>>>>>>> I've been reading through your published papers and code and I >>>>>>>>>> must >>>>>>>>>> say I learned a lot about how Pyxida works :) >>>>>>>>>> >>>>>>>>>> I still have a quick and easy (hopefully) question. Each Pyxida >>>>>>>>>> node >>>>>>>>>> has a small amount of neighbors and learn about other nodes >>>>>>>>>> through >>>>>>>>>> other nodes. I am running on about 200 machines, and it seems >>>>>>>>>> like all >>>>>>>>>> the Pyxida nodes are adding each other to their upNeighbors set. >>>>>>>>>> NCClient.java has the MAX_NEIGHBORS set to 512. 
>>>>>>>>>> >>>>>>>>>> I wanted to make sure that I am not misreading the code, but >>>>>>>>>> shouldn't >>>>>>>>>> each Pyxida node keep a small number of neighbors in its >>>>>>>>>> upNeighbors >>>>>>>>>> set. Else, it will be a lot of updates to send a gossipMessage >>>>>>>>>> to all >>>>>>>>>> 200 neighbors, and it also means that a Pyxida node has the whole >>>>>>>>>> network in its upNeighbors set - that seems wrong. I am inclined >>>>>>>>>> to >>>>>>>>>> change the value of MAX_NEIGHBORS but wanted to make sure that my >>>>>>>>>> understanding is correct first. >>>>>>>>>> >>>>>>>>>> Thank you, >>>>>>>>>> >>>>>>>>>> Eric >>>>>>>>>> >>>>>>>> >>>>>>>> ------------------------------------------------------------------------- >>>>>>>> >>>>>>>> This SF.Net email is sponsored by the Moblin Your Move Developer's >>>>>>>> challenge Build the coolest Linux based applications with Moblin >>>>>>>> SDK & >>>>>>>> win great prizes Grand prize is a trip for two to an Open Source >>>>>>>> event >>>>>>>> anywhere in the world >>>>>>>> http://moblin-contest.org/redirect.php?banner_id=100&url=/ >>>>>>>> _______________________________________________ Pyxida-users mailing >>>>>>>> list Pyx...@li... >>>>>>>> https://lists.sourceforge.net/lists/listinfo/pyxida-users >>>> ------------------------------------------------------------------------- >>>> This SF.Net email is sponsored by the Moblin Your Move Developer's >>>> challenge >>>> Build the coolest Linux based applications with Moblin SDK & win great >>>> prizes >>>> Grand prize is a trip for two to an Open Source event anywhere in the >>>> world >>>> http://moblin-contest.org/redirect.php?banner_id=100&url=/ >>>> _______________________________________________ >>>> Pyxida-users mailing list >>>> Pyx...@li... >>>> https://lists.sourceforge.net/lists/listinfo/pyxida-users >> ------------------------------------------------------------------------------ >> SF.Net email is Sponsored by MIX09, March 18-20, 2009 in Las Vegas, Nevada. >> The future of the web can't happen without you. Join us at MIX09 to help >> pave the way to the Next Web now. Learn more and register at >> http://ad.doubleclick.net/clk;208669438;13503038;i?http://2009.visitmix.com/ >> _______________________________________________ >> Pyxida-users mailing list >> Pyx...@li... >> https://lists.sourceforge.net/lists/listinfo/pyxida-users > > |
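A minimal sketch, in Java, of option 1) from the mail above, with hypothetical constant names; the only idea is to make the maintenance timeout grow with the neighbor count, since with uniformly random ping targets each individual neighbor is pinged on average only once every MAX_NEIGHBORS ping periods:

// Hypothetical sketch: scale the maintenance timeout with the neighbor set size.
// With one ping every PING_PERIOD_MS and targets picked uniformly at random,
// a given neighbor is touched roughly once per MAX_NEIGHBORS pings, so allow
// a few multiples of that before declaring it stale.
static final long PING_PERIOD_MS = 10 * 1000;   // assumed ping interval
static final int  MAX_NEIGHBORS  = 32;          // assumed neighbor cap
static final int  SLACK_FACTOR   = 3;           // assumed safety margin

static long maintenancePeriodMs() {
    return (long) SLACK_FACTOR * MAX_NEIGHBORS * PING_PERIOD_MS;  // 3 * 32 * 10 s = 16 min
}

With MAX_NEIGHBORS = 512 and the same 10-second ping interval this works out to roughly 4.3 hours, which matches the concern above that the timeout would have to keep growing with the neighbor count.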
From: Peter P. <pr...@do...> - 2008-12-09 09:37:04
|
Hi, Eric Chan-Tin wrote: > for (RemoteState<T> neighbor : neighbors) { //this is line 1029 > if (neighbor.getLastUpdateTime() < (curr_time - MAINTENANCE_PERIOD)) { > neighbors.remove(neighbor); > } > } IIRC you can't modify a collection while iterating over it using the above for loop. Instead you have to use a proper Iterator<RemoteState<T>> and use it's remove method. Cheers, Peter > Jonathan Ledlie wrote: >> On Nov 3, 2008, at 3:46 PM, Eric Chan-Tin wrote: >> >>> Jonathan, >>> >>> I can insert some code to fix this and it doesn't look too hard but I do >>> have some concerns. >>> >>>> I don't remember observing this behavior in our PL >>>> deployment, but that could have been because we had the ping time set >>>> to 1-2 minutes, IIRC. >>> The default ping time was set to 10 seconds + some Gaussian distribution, >>> so that's what I used, but changing the ping time to 1 minute or more >>> will >>> result in the same effect except it will take longer for the "cycle" to >>> occur. >>> >>> What I have in mind was the following: >>> >>> 1) In NCClient.addNeighbor() would be changed to (in pseudocode) >>> protected boolean addNeighbor(RemoteState<T> guy) { >>> boolean added = false; >>> //if neighbors.size() >= MAX_NEIGHBORS >>> //Check last update time of each neighbor in neighbors >>> //neighbor.getLastUpdateTime() >>> //if any of them is older than some number TIMEOUT, >>> remove the oldest one, then add "guy" to neighbors >>> //else don't add "guy" to neighbors >>> //else add "guy" to neighbors >>> } >>> >> IMO this shouldn't be the behavior of this function because it is called >> "addNeighbor." Instead what could happen is that part of the >> performMaintenance() function could remove old neighbors. >> >>> 2) In NCManager, getGossipNode() would just pick a random node from >>> localNC.primaryNC.getNeighbors() to send a request to every 10 seconds. >>> >>> 3) Pyxida will still keep track of all GossipNodes in the network. It >>> could >>> also (say every 10 minutes) check its NCClient.neighbors list and see if >>> any neighbor is older than TIMEOUT, remove it, then pick a random >>> GossipNode to send a ping request. >>> >>> The main problem with this scheme is that one Pyxida node would most >>> likely >>> stay with the same set of neighbors all the time. Its coordinates >>> would be >>> greatly affected by those 32 nodes (I am assuming MAX_NEIGHBORS = 32). >>> >> Instead of (2), why not have the getNeighborToPing function determine >> the behavior? If it returns a node, then ping it. If it returns null, >> then have the NCManager pick some other node to use? If things are >> hunky-dory, getNeighborToPing() would return a node (a current member of >> the neighbor set), if we're short on neighbors, then it would return >> null. The periodic maintenance would have already tossed old nodes, so >> this would be self-sustaining. >> >> The "problem" you mention is fairly central to network coordinates in >> general. Sure the coordinates will be a little better the more >> neighbors you have. But in a very large system talking to everybody >> isn't feasible. Peter and I have a graph in one of the workshop papers >> which shows the effect of increasing the number of neighbors >> >>> I don't think the network would be partitioned, but it could happen. For >>> example, the boostrap nodes have each other in their neighbors list. >>> Every >>> new node bootstrapping to the network would contact those bootstrapping >>> nodes first and add them to their neighbors. 
So every Pyxida node could >>> have as its neighbors the "Bootstrap nodes", which would make Pyxida >>> become >>> a centralized system. >>> >> As long as the initial bootstrap list is large enough, I think you are >> right, that partitions won't be a problem. >> >>> One last thing: When Pyxida receives a ping request from a node, it does >>> not update its coordinates. Is there any particular reason for that? >>> Is it >>> because Pyxida will not be able to calculate the RTT? >>> >> Yes. >> >>> Also, there is no "gossiping" of other nodes' coordinates, right? So if A >>> has B in its neighbors list, it will from time to time tell other nodes >>> about B's coordinates. I realize this has a lot of complications but the >>> original Vivaldi paper was very fuzzy about this point and the previous >>> point that it seems it is up to the implementors to decide what to do. >>> >> I don't think we are doing this. It could be useful in certain >> circumstance (e.g., picking potential new neighbors), but we're not, IIRC. >> >>> I will have to think some more about this as there are a lot of nasty >>> little side effects that could happen, but these are my thoughts. >>> >>> Thanks, >>> >>> -Eric >>> >> Sounds good. >> >> -jonathan >> >>> On Nov 3 2008, Jonathan Ledlie wrote: >>> >>>> Eric, >>>> >>>> Based on a look at the code just now, I see where you are coming from: >>>> with these params on PL, there isn't anything stopping the neighbor >>>> list from circulating pretty quickly through all nodes. In the >>>> Azureus implementation, the guys fed to the alg (fed to processSample) >>>> weren't changing much because they were coming from the routing finger >>>> tables. I don't remember observing this behavior in our PL >>>> deployment, but that could have been because we had the ping time set >>>> to 1-2 minutes, IIRC. >>>> >>>> Do you want to insert some code that will fix this? Basically we want >>>> to only add a new neighbor if our oldest neighbor should be tossed (or >>>> if we haven't reached MAX_NEIGHBORS). It's probably best from a >>>> clarity standpoint if addNeighbor does continue to actually add the >>>> neighbor in all cases. Probably the best thing to do is to have >>>> NCClient drop old neighbors internally and have NCManager ask it which >>>> nodes should be pinged via getNeighborToPing. So NCClient would >>>> continue to have the same behavior on processSample (it would not >>>> reject new information) and NCManager and NCClient would cooperate to >>>> measure RTTs to current neighbors. We do want to make sure, however, >>>> that the gossip set stays large enough so that partitions don't occur. >>>> >>>> Do you want to be added to the SourceForge Pyxida developers to make >>>> this change? (Please test it out carefully first.) >>>> >>>> -jonathan >>>> >>>> On Oct 30, 2008, at 11:32 AM, ext Eric Chan-Tin wrote: >>>> >>>>> Jonathan, >>>>> >>>>> From what I saw, eventually, you learn about all the nodes in the >>>>> network. Every 10 seconds, Pyxida picks a random "Gossip" node from >>>>> its >>>>> list and sends a request. The Gossip node will reply back with an >>>>> "upNeighbor", that is, a "Gossip" node that is "up" or with 10% >>>>> probability, a node that is "pending". Since naturally, all the nodes >>>>> bootstrap off the same set of nodes, every node will eventually know >>>>> of >>>>> every other node. 
>>>>> >>>>> Pyxida might not be rotating through everybody continuously (although >>>>> eventually, since it knows everybody, it will have added everybody to >>>>> its Neighbors list at some point), but could be removing a node that >>>>> is >>>>> "alive". So, if MAX_NEIGHBORS = 32, let's call them A_1, A_2, ..., >>>>> A_32. >>>>> Pyxida picks a random Gossip node B, sends it a request and B replies >>>>> back. Pyxida will add B to its Neighbors list, see that size of >>>>> Neighbors > MAX_NEIGHBORS, kick A_1 out. Then from those 32 neighbors >>>>> A_2, ...., A_32, B, it will calculate the force, and apply it to the >>>>> current system coordinates. Thus, if every 10 seconds, Pyxida picks a >>>>> different set of nodes C, D, E, etc... it would keep kicking nodes out >>>>> of the neighbors list. >>>>> >>>>> It's the "kicking A_1 out" part that seemed a bit confusing. Churn >>>>> is a >>>>> good reason but you don't check that A_1 might still be alive or maybe >>>>> A_1 was just added. I have been printing the neighbors list every 10 >>>>> minutes, and they are completely different between those 10 minutes. >>>>> >>>>> One possible problem might be that I am choosing the value of >>>>> MAX_NEIGHBORS to be too small. Originally it was set to 512 (which was >>>>> meant for Azureus). Since I am using Planetlab where there are about >>>>> 400-500 "stable" nodes, I changed the value to 32. But I don't think >>>>> that should be a problem since log(n) = log(512) = 9. >>>>> >>>>> Thanks, >>>>> >>>>> -Eric >>>>> >>>>> Jonathan Ledlie wrote: >>>>>> Eric, >>>>>> >>>>>> Glad to hear it. Let Peter and me know if you publish anything using >>>>>> Pyxida and we can put it on the web page. >>>>>> >>>>>> "Gossip" nodes are for learning about the network. "Neighbors" are >>>>>> for >>>>>> constructing the coordinate. >>>>>> >>>>>> So, IIRC, the gossip process learns about nodes that may or may not >>>>>> be >>>>>> applied to the neighbor list. That is, we may eventually learn >>>>>> about a >>>>>> large portion of the network, via gossiping, but should only be using >>>>>> MAX_NEIGHBORS at any given time for coordinate computation. The >>>>>> reason >>>>>> we don't want to use a fixed set of neighbors for coordinate >>>>>> computation >>>>>> is because nodes may come and go from the system. However, it should >>>>>> not be the case that we just rotate through everybody, because >>>>>> then, as >>>>>> you note, we'd be computing against everyone's coordinate/latency, >>>>>> and >>>>>> this is counter to the point of network coordinates. With this >>>>>> gradual >>>>>> transience as a desirable quality in mind, I am pretty sure we >>>>>> maintain >>>>>> neighbors in a fairly fixed manner. Does this not seem to be the >>>>>> case >>>>>> to you? >>>>>> >>>>>> -jonathan >>>>>> >>>>>> On Oct 29, 2008, at 9:20 PM, Eric Chan-Tin wrote: >>>>>> >>>>>>> Hi Jonathan and Peter, >>>>>>> >>>>>>> First of all, just wanted to say that Pyxida has been working >>>>>>> great on >>>>>>> Planetlab for the past 4 months or so. Getting good data and >>>>>>> experiments >>>>>>> going well. >>>>>>> >>>>>>> However, I do have one question/comment. >>>>>>> >>>>>>> Every Pyxida node contains a list of Neighbors (in NCClient) and a >>>>>>> list >>>>>>> of Gossip nodes (they are called neighbors in NCManager but I will >>>>>>> call >>>>>>> them gossip for now so as not to confuse with the 'real' neighbors >>>>>>> in >>>>>>> NCClient). 
Every 10 seconds, Pyxida will pick a random node (call >>>>>>> it A) >>>>>>> from Gossip and send a gossip request (basically asking for rtt, >>>>>>> coordinate, error). But Gossip contains all the nodes in the >>>>>>> network. >>>>>>> The comment in the code said that was to avoid Pyxida becoming >>>>>>> lonely >>>>>>> and no-one to gossip with, which makes sense. >>>>>>> >>>>>>> Once A responds, Pyxida will add A to its Neighbors (if Neighbors >>>>>>> does >>>>>>> not contain A) and remove the first element from the Neighbors list. >>>>>>> This seems a little bit odd. Was there any particular reasoning >>>>>>> behind >>>>>>> this? Because it would seem like you're getting the coordinate from >>>>>>> every node in the network eventually. >>>>>>> >>>>>>> Thanks a lot, >>>>>>> >>>>>>> -Eric >>>>>>> >>>>>>> Jonathan Ledlie wrote: >>>>>>>> Your inclination is correct. See forwarded message. >>>>>>>> >>>>>>>> -jonathan >>>>>>>> >>>>>>>> On Apr 9, 2008, at 1:53 PM, ext Eric Chan-Tin wrote: >>>>>>>> >>>>>>>>> Thank you for the clarification. >>>>>>>>> >>>>>>>>> I've been reading through your published papers and code and I >>>>>>>>> must >>>>>>>>> say I learned a lot about how Pyxida works :) >>>>>>>>> >>>>>>>>> I still have a quick and easy (hopefully) question. Each Pyxida >>>>>>>>> node >>>>>>>>> has a small amount of neighbors and learn about other nodes >>>>>>>>> through >>>>>>>>> other nodes. I am running on about 200 machines, and it seems >>>>>>>>> like all >>>>>>>>> the Pyxida nodes are adding each other to their upNeighbors set. >>>>>>>>> NCClient.java has the MAX_NEIGHBORS set to 512. >>>>>>>>> >>>>>>>>> I wanted to make sure that I am not misreading the code, but >>>>>>>>> shouldn't >>>>>>>>> each Pyxida node keep a small number of neighbors in its >>>>>>>>> upNeighbors >>>>>>>>> set. Else, it will be a lot of updates to send a gossipMessage >>>>>>>>> to all >>>>>>>>> 200 neighbors, and it also means that a Pyxida node has the whole >>>>>>>>> network in its upNeighbors set - that seems wrong. I am inclined >>>>>>>>> to >>>>>>>>> change the value of MAX_NEIGHBORS but wanted to make sure that my >>>>>>>>> understanding is correct first. >>>>>>>>> >>>>>>>>> Thank you, >>>>>>>>> >>>>>>>>> Eric >>>>>>>>> >>>>>>> >>>>>>> >>>>>>> ------------------------------------------------------------------------- >>>>>>> >>>>>>> This SF.Net email is sponsored by the Moblin Your Move Developer's >>>>>>> challenge Build the coolest Linux based applications with Moblin >>>>>>> SDK & >>>>>>> win great prizes Grand prize is a trip for two to an Open Source >>>>>>> event >>>>>>> anywhere in the world >>>>>>> http://moblin-contest.org/redirect.php?banner_id=100&url=/ >>>>>>> _______________________________________________ Pyxida-users mailing >>>>>>> list Pyx...@li... >>>>>>> https://lists.sourceforge.net/lists/listinfo/pyxida-users >>> ------------------------------------------------------------------------- >>> This SF.Net email is sponsored by the Moblin Your Move Developer's >>> challenge >>> Build the coolest Linux based applications with Moblin SDK & win great >>> prizes >>> Grand prize is a trip for two to an Open Source event anywhere in the >>> world >>> http://moblin-contest.org/redirect.php?banner_id=100&url=/ >>> _______________________________________________ >>> Pyxida-users mailing list >>> Pyx...@li... 
>>> https://lists.sourceforge.net/lists/listinfo/pyxida-users > > ------------------------------------------------------------------------------ > SF.Net email is Sponsored by MIX09, March 18-20, 2009 in Las Vegas, Nevada. > The future of the web can't happen without you. Join us at MIX09 to help > pave the way to the Next Web now. Learn more and register at > http://ad.doubleclick.net/clk;208669438;13503038;i?http://2009.visitmix.com/ > _______________________________________________ > Pyxida-users mailing list > Pyx...@li... > https://lists.sourceforge.net/lists/listinfo/pyxida-users -- Dr Peter R Pietzuch Department of Computing, Imperial College London South Kensington Campus, 180 Queen's Gate, London SW7 2AZ, UK Telephone: +44 (20) 7594 8314 http://www.doc.ic.ac.uk/~prp |
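A minimal sketch, in Java, of the Iterator-based removal Peter describes, reusing the names from the snippet above (neighbors, curr_time, MAINTENANCE_PERIOD) and requiring an import of java.util.Iterator:

// Remove stale neighbors through the iterator rather than through the list;
// removing through the list is the structural modification the enhanced for
// loop's hidden iterator cannot tolerate.
Iterator<RemoteState<T>> it = neighbors.iterator();
while (it.hasNext()) {
    RemoteState<T> neighbor = it.next();
    if (neighbor.getLastUpdateTime() < (curr_time - MAINTENANCE_PERIOD)) {
        it.remove();  // safe: the iterator updates its own bookkeeping
    }
}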
From: Eric Chan-T. <dch...@cs...> - 2008-12-09 02:05:57
|
Jonathan, I'm done adding in the changes and been testing/debugging over the weekend. I think this is something that might be easier to ask. Pyxida is crashing at the following place: Exception in thread "main" java.util.ConcurrentModificationException at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:449) at java.util.AbstractList$Itr.next(AbstractList.java:420) at edu.harvard.syrah.pyxida.nc.lib.NCClient.performMaintenance(NCClient.java:1029) at edu.harvard.syrah.pyxida.nc.lib.NCClient.processSample(NCClient.java:711) at edu.harvard.syrah.pyxida.nc.NCManager$CoordClient$2.cb(NCManager.java:675) at edu.harvard.syrah.pyxida.nc.NCManager$CoordClient$2.cb(NCManager.java:618) at edu.harvard.syrah.sbon.async.CallbacksIF$CB3.call(CallbacksIF.java:250) at edu.harvard.syrah.sbon.async.comm.obj.ObjComm$ReadConnHandler$4.cb(ObjComm.java:702) at edu.harvard.syrah.sbon.async.CallbacksIF$CB0.call(CallbacksIF.java:183) at edu.harvard.syrah.sbon.async.EL.handleNowEvent(EL.java:695) at edu.harvard.syrah.sbon.async.EL.handleEventQueues(EL.java:607) at edu.harvard.syrah.sbon.async.EL.main(EL.java:406) at edu.harvard.syrah.pyxida.Pyxida.main(Pyxida.java:169) In NCClient.performMaintenance, I have the following for (RemoteState<T> neighbor : neighbors) { //this is line 1029 if (neighbor.getLastUpdateTime() < (curr_time - MAINTENANCE_PERIOD)) { neighbors.remove(neighbor); } } I am not exactly sure why the exception is being thrown. Could it be that the method has to be synchronized (other methods that have the same line like updateSystemCoordinates are not synchronized)? or should I use an iterator instead? If you do not know, I'll just try the above and see which one works :) Thanks, -Eric Jonathan Ledlie wrote: > > On Nov 3, 2008, at 3:46 PM, Eric Chan-Tin wrote: > >> Jonathan, >> >> I can insert some code to fix this and it doesn't look too hard but I do >> have some concerns. >> >>> I don't remember observing this behavior in our PL >>> deployment, but that could have been because we had the ping time set >>> to 1-2 minutes, IIRC. >> >> The default ping time was set to 10 seconds + some Gaussian distribution, >> so that's what I used, but changing the ping time to 1 minute or more >> will >> result in the same effect except it will take longer for the "cycle" to >> occur. >> >> What I have in mind was the following: >> >> 1) In NCClient.addNeighbor() would be changed to (in pseudocode) >> protected boolean addNeighbor(RemoteState<T> guy) { >> boolean added = false; >> //if neighbors.size() >= MAX_NEIGHBORS >> //Check last update time of each neighbor in neighbors >> //neighbor.getLastUpdateTime() >> //if any of them is older than some number TIMEOUT, >> remove the oldest one, then add "guy" to neighbors >> //else don't add "guy" to neighbors >> //else add "guy" to neighbors >> } >> > > IMO this shouldn't be the behavior of this function because it is called > "addNeighbor." Instead what could happen is that part of the > performMaintenance() function could remove old neighbors. > >> 2) In NCManager, getGossipNode() would just pick a random node from >> localNC.primaryNC.getNeighbors() to send a request to every 10 seconds. >> >> 3) Pyxida will still keep track of all GossipNodes in the network. It >> could >> also (say every 10 minutes) check its NCClient.neighbors list and see if >> any neighbor is older than TIMEOUT, remove it, then pick a random >> GossipNode to send a ping request. 
>> >> The main problem with this scheme is that one Pyxida node would most >> likely >> stay with the same set of neighbors all the time. Its coordinates >> would be >> greatly affected by those 32 nodes (I am assuming MAX_NEIGHBORS = 32). >> > > Instead of (2), why not have the getNeighborToPing function determine > the behavior? If it returns a node, then ping it. If it returns null, > then have the NCManager pick some other node to use? If things are > hunky-dory, getNeighborToPing() would return a node (a current member of > the neighbor set), if we're short on neighbors, then it would return > null. The periodic maintenance would have already tossed old nodes, so > this would be self-sustaining. > > The "problem" you mention is fairly central to network coordinates in > general. Sure the coordinates will be a little better the more > neighbors you have. But in a very large system talking to everybody > isn't feasible. Peter and I have a graph in one of the workshop papers > which shows the effect of increasing the number of neighbors > >> I don't think the network would be partitioned, but it could happen. For >> example, the boostrap nodes have each other in their neighbors list. >> Every >> new node bootstrapping to the network would contact those bootstrapping >> nodes first and add them to their neighbors. So every Pyxida node could >> have as its neighbors the "Bootstrap nodes", which would make Pyxida >> become >> a centralized system. >> > > As long as the initial bootstrap list is large enough, I think you are > right, that partitions won't be a problem. > >> One last thing: When Pyxida receives a ping request from a node, it does >> not update its coordinates. Is there any particular reason for that? >> Is it >> because Pyxida will not be able to calculate the RTT? >> > > Yes. > >> Also, there is no "gossiping" of other nodes' coordinates, right? So if A >> has B in its neighbors list, it will from time to time tell other nodes >> about B's coordinates. I realize this has a lot of complications but the >> original Vivaldi paper was very fuzzy about this point and the previous >> point that it seems it is up to the implementors to decide what to do. >> > > I don't think we are doing this. It could be useful in certain > circumstance (e.g., picking potential new neighbors), but we're not, IIRC. > >> I will have to think some more about this as there are a lot of nasty >> little side effects that could happen, but these are my thoughts. >> >> Thanks, >> >> -Eric >> > > Sounds good. > > -jonathan > >> On Nov 3 2008, Jonathan Ledlie wrote: >> >>> >>> Eric, >>> >>> Based on a look at the code just now, I see where you are coming from: >>> with these params on PL, there isn't anything stopping the neighbor >>> list from circulating pretty quickly through all nodes. In the >>> Azureus implementation, the guys fed to the alg (fed to processSample) >>> weren't changing much because they were coming from the routing finger >>> tables. I don't remember observing this behavior in our PL >>> deployment, but that could have been because we had the ping time set >>> to 1-2 minutes, IIRC. >>> >>> Do you want to insert some code that will fix this? Basically we want >>> to only add a new neighbor if our oldest neighbor should be tossed (or >>> if we haven't reached MAX_NEIGHBORS). It's probably best from a >>> clarity standpoint if addNeighbor does continue to actually add the >>> neighbor in all cases. 
Probably the best thing to do is to have >>> NCClient drop old neighbors internally and have NCManager ask it which >>> nodes should be pinged via getNeighborToPing. So NCClient would >>> continue to have the same behavior on processSample (it would not >>> reject new information) and NCManager and NCClient would cooperate to >>> measure RTTs to current neighbors. We do want to make sure, however, >>> that the gossip set stays large enough so that partitions don't occur. >>> >>> Do you want to be added to the SourceForge Pyxida developers to make >>> this change? (Please test it out carefully first.) >>> >>> -jonathan >>> >>> On Oct 30, 2008, at 11:32 AM, ext Eric Chan-Tin wrote: >>> >>>> Jonathan, >>>> >>>> From what I saw, eventually, you learn about all the nodes in the >>>> network. Every 10 seconds, Pyxida picks a random "Gossip" node from >>>> its >>>> list and sends a request. The Gossip node will reply back with an >>>> "upNeighbor", that is, a "Gossip" node that is "up" or with 10% >>>> probability, a node that is "pending". Since naturally, all the nodes >>>> bootstrap off the same set of nodes, every node will eventually know >>>> of >>>> every other node. >>>> >>>> Pyxida might not be rotating through everybody continuously (although >>>> eventually, since it knows everybody, it will have added everybody to >>>> its Neighbors list at some point), but could be removing a node that >>>> is >>>> "alive". So, if MAX_NEIGHBORS = 32, let's call them A_1, A_2, ..., >>>> A_32. >>>> Pyxida picks a random Gossip node B, sends it a request and B replies >>>> back. Pyxida will add B to its Neighbors list, see that size of >>>> Neighbors > MAX_NEIGHBORS, kick A_1 out. Then from those 32 neighbors >>>> A_2, ...., A_32, B, it will calculate the force, and apply it to the >>>> current system coordinates. Thus, if every 10 seconds, Pyxida picks a >>>> different set of nodes C, D, E, etc... it would keep kicking nodes out >>>> of the neighbors list. >>>> >>>> It's the "kicking A_1 out" part that seemed a bit confusing. Churn >>>> is a >>>> good reason but you don't check that A_1 might still be alive or maybe >>>> A_1 was just added. I have been printing the neighbors list every 10 >>>> minutes, and they are completely different between those 10 minutes. >>>> >>>> One possible problem might be that I am choosing the value of >>>> MAX_NEIGHBORS to be too small. Originally it was set to 512 (which was >>>> meant for Azureus). Since I am using Planetlab where there are about >>>> 400-500 "stable" nodes, I changed the value to 32. But I don't think >>>> that should be a problem since log(n) = log(512) = 9. >>>> >>>> Thanks, >>>> >>>> -Eric >>>> >>>> Jonathan Ledlie wrote: >>>>> >>>>> Eric, >>>>> >>>>> Glad to hear it. Let Peter and me know if you publish anything using >>>>> Pyxida and we can put it on the web page. >>>>> >>>>> "Gossip" nodes are for learning about the network. "Neighbors" are >>>>> for >>>>> constructing the coordinate. >>>>> >>>>> So, IIRC, the gossip process learns about nodes that may or may not >>>>> be >>>>> applied to the neighbor list. That is, we may eventually learn >>>>> about a >>>>> large portion of the network, via gossiping, but should only be using >>>>> MAX_NEIGHBORS at any given time for coordinate computation. The >>>>> reason >>>>> we don't want to use a fixed set of neighbors for coordinate >>>>> computation >>>>> is because nodes may come and go from the system. 
However, it should >>>>> not be the case that we just rotate through everybody, because >>>>> then, as >>>>> you note, we'd be computing against everyone's coordinate/latency, >>>>> and >>>>> this is counter to the point of network coordinates. With this >>>>> gradual >>>>> transience as a desirable quality in mind, I am pretty sure we >>>>> maintain >>>>> neighbors in a fairly fixed manner. Does this not seem to be the >>>>> case >>>>> to you? >>>>> >>>>> -jonathan >>>>> >>>>> On Oct 29, 2008, at 9:20 PM, Eric Chan-Tin wrote: >>>>> >>>>>> Hi Jonathan and Peter, >>>>>> >>>>>> First of all, just wanted to say that Pyxida has been working >>>>>> great on >>>>>> Planetlab for the past 4 months or so. Getting good data and >>>>>> experiments >>>>>> going well. >>>>>> >>>>>> However, I do have one question/comment. >>>>>> >>>>>> Every Pyxida node contains a list of Neighbors (in NCClient) and a >>>>>> list >>>>>> of Gossip nodes (they are called neighbors in NCManager but I will >>>>>> call >>>>>> them gossip for now so as not to confuse with the 'real' neighbors >>>>>> in >>>>>> NCClient). Every 10 seconds, Pyxida will pick a random node (call >>>>>> it A) >>>>>> from Gossip and send a gossip request (basically asking for rtt, >>>>>> coordinate, error). But Gossip contains all the nodes in the >>>>>> network. >>>>>> The comment in the code said that was to avoid Pyxida becoming >>>>>> lonely >>>>>> and no-one to gossip with, which makes sense. >>>>>> >>>>>> Once A responds, Pyxida will add A to its Neighbors (if Neighbors >>>>>> does >>>>>> not contain A) and remove the first element from the Neighbors list. >>>>>> This seems a little bit odd. Was there any particular reasoning >>>>>> behind >>>>>> this? Because it would seem like you're getting the coordinate from >>>>>> every node in the network eventually. >>>>>> >>>>>> Thanks a lot, >>>>>> >>>>>> -Eric >>>>>> >>>>>> Jonathan Ledlie wrote: >>>>>>> >>>>>>> Your inclination is correct. See forwarded message. >>>>>>> >>>>>>> -jonathan >>>>>>> >>>>>>> On Apr 9, 2008, at 1:53 PM, ext Eric Chan-Tin wrote: >>>>>>> >>>>>>>> Thank you for the clarification. >>>>>>>> >>>>>>>> I've been reading through your published papers and code and I >>>>>>>> must >>>>>>>> say I learned a lot about how Pyxida works :) >>>>>>>> >>>>>>>> I still have a quick and easy (hopefully) question. Each Pyxida >>>>>>>> node >>>>>>>> has a small amount of neighbors and learn about other nodes >>>>>>>> through >>>>>>>> other nodes. I am running on about 200 machines, and it seems >>>>>>>> like all >>>>>>>> the Pyxida nodes are adding each other to their upNeighbors set. >>>>>>>> NCClient.java has the MAX_NEIGHBORS set to 512. >>>>>>>> >>>>>>>> I wanted to make sure that I am not misreading the code, but >>>>>>>> shouldn't >>>>>>>> each Pyxida node keep a small number of neighbors in its >>>>>>>> upNeighbors >>>>>>>> set. Else, it will be a lot of updates to send a gossipMessage >>>>>>>> to all >>>>>>>> 200 neighbors, and it also means that a Pyxida node has the whole >>>>>>>> network in its upNeighbors set - that seems wrong. I am inclined >>>>>>>> to >>>>>>>> change the value of MAX_NEIGHBORS but wanted to make sure that my >>>>>>>> understanding is correct first. 
>>>>>>>> >>>>>>>> Thank you, >>>>>>>> >>>>>>>> Eric >>>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> ------------------------------------------------------------------------- >>>>>> >>>>>> This SF.Net email is sponsored by the Moblin Your Move Developer's >>>>>> challenge Build the coolest Linux based applications with Moblin >>>>>> SDK & >>>>>> win great prizes Grand prize is a trip for two to an Open Source >>>>>> event >>>>>> anywhere in the world >>>>>> http://moblin-contest.org/redirect.php?banner_id=100&url=/ >>>>>> _______________________________________________ Pyxida-users mailing >>>>>> list Pyx...@li... >>>>>> https://lists.sourceforge.net/lists/listinfo/pyxida-users >>> >> >> ------------------------------------------------------------------------- >> This SF.Net email is sponsored by the Moblin Your Move Developer's >> challenge >> Build the coolest Linux based applications with Moblin SDK & win great >> prizes >> Grand prize is a trip for two to an Open Source event anywhere in the >> world >> http://moblin-contest.org/redirect.php?banner_id=100&url=/ >> _______________________________________________ >> Pyxida-users mailing list >> Pyx...@li... >> https://lists.sourceforge.net/lists/listinfo/pyxida-users |
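For reference, the exception above does not require a second thread: the list's iterator throws it whenever the list is structurally modified behind its back, even inside a single-threaded loop. A self-contained, hypothetical reproduction unrelated to the Pyxida sources:

import java.util.ArrayList;
import java.util.List;

public class CmeDemo {
    public static void main(String[] args) {
        List<String> items = new ArrayList<String>();
        items.add("a");
        items.add("b");
        items.add("c");
        // The enhanced for loop uses a hidden iterator; removing through the
        // list itself invalidates it, and the next call to next() throws
        // java.util.ConcurrentModificationException.
        for (String item : items) {
            if (item.equals("a")) {
                items.remove(item);
            }
        }
    }
}

So making performMaintenance() synchronized would not, by itself, make this exception go away; the removal has to go through an Iterator (or be deferred until after the loop).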
From: Jonathan L. <jon...@ee...> - 2008-12-03 18:07:53
|
Eric, Ah, that's a good point you raise. I concur. -jonathan On Dec 2, 2008, at 5:17 PM, Eric Chan-Tin wrote: > Sorry for the late reply. Was busy getting a paper ready for > submission. I > will make the changes as you described. Should be fairly > straightforward. > > One small thing though: I will have a time (say 1 hour) where Pyxida > will > do as it is doing now. This allows it to build its list of Gossip > nodes. > After 1 hour, then it will behave with the fix. This is to prevent the > following situation. > > neighbors list can contain max of N neighbors. There are ~ N bootstrap > nodes. Every bootstrap node will have each other in their neighbors > list > (which is okay). Every new node will have a subset of the bootstrap > nodes > in their neighbors list. If there are N or a little bit more than N > bootstrap nodes, then every node will basically have the same set of > nodes > as neighbors. > > To avoid this, the neighbors are "permanent" after 1 hour, then each > Pyxida > node would have randomly picked N nodes as its neighbors. > > I hope this made sense. I will send the snippets of code by email > unless > you wanted it via another method (for example, I post the java code > on my > website and you can download from there). > > -Eric > > On Nov 4 2008, Jonathan Ledlie wrote: > >> >> On Nov 3, 2008, at 3:46 PM, Eric Chan-Tin wrote: >> >>> Jonathan, >>> >>> I can insert some code to fix this and it doesn't look too hard but >>> I do >>> have some concerns. >>> >>>> I don't remember observing this behavior in our PL >>>> deployment, but that could have been because we had the ping time >>>> set >>>> to 1-2 minutes, IIRC. >>> >>> The default ping time was set to 10 seconds + some Gaussian >>> distribution, >>> so that's what I used, but changing the ping time to 1 minute or >>> more will >>> result in the same effect except it will take longer for the "cycle" >>> to >>> occur. >>> >>> What I have in mind was the following: >>> >>> 1) In NCClient.addNeighbor() would be changed to (in pseudocode) >>> protected boolean addNeighbor(RemoteState<T> guy) { >>> boolean added = false; >>> //if neighbors.size() >= MAX_NEIGHBORS >>> //Check last update time of each neighbor in >>> neighbors >>> //neighbor.getLastUpdateTime() >>> //if any of them is older than some number TIMEOUT, >>> remove the oldest one, then add "guy" to neighbors >>> //else don't add "guy" to neighbors >>> //else add "guy" to neighbors >>> } >>> >> >> IMO this shouldn't be the behavior of this function because it is >> called "addNeighbor." Instead what could happen is that part of the >> performMaintenance() function could remove old neighbors. >> >>> 2) In NCManager, getGossipNode() would just pick a random node from >>> localNC.primaryNC.getNeighbors() to send a request to every 10 >>> seconds. >>> >>> 3) Pyxida will still keep track of all GossipNodes in the network. >>> It could >>> also (say every 10 minutes) check its NCClient.neighbors list and >>> see if >>> any neighbor is older than TIMEOUT, remove it, then pick a random >>> GossipNode to send a ping request. >>> >>> The main problem with this scheme is that one Pyxida node would most >>> likely >>> stay with the same set of neighbors all the time. Its coordinates >>> would be >>> greatly affected by those 32 nodes (I am assuming MAX_NEIGHBORS = >>> 32). >>> >> >> Instead of (2), why not have the getNeighborToPing function determine >> the behavior? If it returns a node, then ping it. If it returns >> null, then have the NCManager pick some other node to use? 
If things >> are hunky-dory, getNeighborToPing() would return a node (a current >> member of the neighbor set), if we're short on neighbors, then it >> would return null. The periodic maintenance would have already >> tossed >> old nodes, so this would be self-sustaining. >> >> The "problem" you mention is fairly central to network coordinates in >> general. Sure the coordinates will be a little better the more >> neighbors you have. But in a very large system talking to everybody >> isn't feasible. Peter and I have a graph in one of the workshop >> papers which shows the effect of increasing the number of neighbors >> >>> I don't think the network would be partitioned, but it could happen. >>> For >>> example, the boostrap nodes have each other in their neighbors list. >>> Every >>> new node bootstrapping to the network would contact those >>> bootstrapping >>> nodes first and add them to their neighbors. So every Pyxida node >>> could >>> have as its neighbors the "Bootstrap nodes", which would make Pyxida >>> become >>> a centralized system. >>> >> >> As long as the initial bootstrap list is large enough, I think you >> are >> right, that partitions won't be a problem. >> >>> One last thing: When Pyxida receives a ping request from a node, it >>> does >>> not update its coordinates. Is there any particular reason for that? >>> Is it >>> because Pyxida will not be able to calculate the RTT? >>> >> >> Yes. >> >>> Also, there is no "gossiping" of other nodes' coordinates, right? So >>> if A >>> has B in its neighbors list, it will from time to time tell other >>> nodes >>> about B's coordinates. I realize this has a lot of complications but >>> the >>> original Vivaldi paper was very fuzzy about this point and the >>> previous >>> point that it seems it is up to the implementors to decide what to >>> do. >>> >> >> I don't think we are doing this. It could be useful in certain >> circumstance (e.g., picking potential new neighbors), but we're not, >> IIRC. >> >>> I will have to think some more about this as there are a lot of >>> nasty >>> little side effects that could happen, but these are my thoughts. >>> >>> Thanks, >>> >>> -Eric >>> >> >> Sounds good. >> >> -jonathan >> >>> On Nov 3 2008, Jonathan Ledlie wrote: >>> >>>> >>>> Eric, >>>> >>>> Based on a look at the code just now, I see where you are coming >>>> from: >>>> with these params on PL, there isn't anything stopping the neighbor >>>> list from circulating pretty quickly through all nodes. In the >>>> Azureus implementation, the guys fed to the alg (fed to >>>> processSample) >>>> weren't changing much because they were coming from the routing >>>> finger >>>> tables. I don't remember observing this behavior in our PL >>>> deployment, but that could have been because we had the ping time >>>> set >>>> to 1-2 minutes, IIRC. >>>> >>>> Do you want to insert some code that will fix this? Basically we >>>> want >>>> to only add a new neighbor if our oldest neighbor should be tossed >>>> (or >>>> if we haven't reached MAX_NEIGHBORS). It's probably best from a >>>> clarity standpoint if addNeighbor does continue to actually add the >>>> neighbor in all cases. Probably the best thing to do is to have >>>> NCClient drop old neighbors internally and have NCManager ask it >>>> which >>>> nodes should be pinged via getNeighborToPing. So NCClient would >>>> continue to have the same behavior on processSample (it would not >>>> reject new information) and NCManager and NCClient would >>>> cooperate to >>>> measure RTTs to current neighbors. 
We do want to make sure, >>>> however, >>>> that the gossip set stays large enough so that partitions don't >>>> occur. >>>> >>>> Do you want to be added to the SourceForge Pyxida developers to >>>> make >>>> this change? (Please test it out carefully first.) >>>> >>>> -jonathan >>>> >>>> On Oct 30, 2008, at 11:32 AM, ext Eric Chan-Tin wrote: >>>> >>>>> Jonathan, >>>>> >>>>> From what I saw, eventually, you learn about all the nodes in the >>>>> network. Every 10 seconds, Pyxida picks a random "Gossip" node >>>>> from >>>>> its >>>>> list and sends a request. The Gossip node will reply back with an >>>>> "upNeighbor", that is, a "Gossip" node that is "up" or with 10% >>>>> probability, a node that is "pending". Since naturally, all the >>>>> nodes >>>>> bootstrap off the same set of nodes, every node will eventually >>>>> know >>>>> of >>>>> every other node. >>>>> >>>>> Pyxida might not be rotating through everybody continuously >>>>> (although >>>>> eventually, since it knows everybody, it will have added everybody >>>>> to >>>>> its Neighbors list at some point), but could be removing a node >>>>> that >>>>> is >>>>> "alive". So, if MAX_NEIGHBORS = 32, let's call them A_1, A_2, ..., >>>>> A_32. >>>>> Pyxida picks a random Gossip node B, sends it a request and B >>>>> replies >>>>> back. Pyxida will add B to its Neighbors list, see that size of >>>>> Neighbors > MAX_NEIGHBORS, kick A_1 out. Then from those 32 >>>>> neighbors >>>>> A_2, ...., A_32, B, it will calculate the force, and apply it to >>>>> the >>>>> current system coordinates. Thus, if every 10 seconds, Pyxida >>>>> picks a >>>>> different set of nodes C, D, E, etc... it would keep kicking nodes >>>>> out >>>>> of the neighbors list. >>>>> >>>>> It's the "kicking A_1 out" part that seemed a bit confusing. Churn >>>>> is a >>>>> good reason but you don't check that A_1 might still be alive or >>>>> maybe >>>>> A_1 was just added. I have been printing the neighbors list >>>>> every 10 >>>>> minutes, and they are completely different between those 10 >>>>> minutes. >>>>> >>>>> One possible problem might be that I am choosing the value of >>>>> MAX_NEIGHBORS to be too small. Originally it was set to 512 (which >>>>> was >>>>> meant for Azureus). Since I am using Planetlab where there are >>>>> about >>>>> 400-500 "stable" nodes, I changed the value to 32. But I don't >>>>> think >>>>> that should be a problem since log(n) = log(512) = 9. >>>>> >>>>> Thanks, >>>>> >>>>> -Eric >>>>> >>>>> Jonathan Ledlie wrote: >>>>>> >>>>>> Eric, >>>>>> >>>>>> Glad to hear it. Let Peter and me know if you publish anything >>>>>> using >>>>>> Pyxida and we can put it on the web page. >>>>>> >>>>>> "Gossip" nodes are for learning about the network. "Neighbors" >>>>>> are >>>>>> for >>>>>> constructing the coordinate. >>>>>> >>>>>> So, IIRC, the gossip process learns about nodes that may or may >>>>>> not >>>>>> be >>>>>> applied to the neighbor list. That is, we may eventually learn >>>>>> about a >>>>>> large portion of the network, via gossiping, but should only be >>>>>> using >>>>>> MAX_NEIGHBORS at any given time for coordinate computation. The >>>>>> reason >>>>>> we don't want to use a fixed set of neighbors for coordinate >>>>>> computation >>>>>> is because nodes may come and go from the system. 
However, it >>>>>> should >>>>>> not be the case that we just rotate through everybody, because >>>>>> then, as >>>>>> you note, we'd be computing against everyone's coordinate/ >>>>>> latency, >>>>>> and >>>>>> this is counter to the point of network coordinates. With this >>>>>> gradual >>>>>> transience as a desirable quality in mind, I am pretty sure we >>>>>> maintain >>>>>> neighbors in a fairly fixed manner. Does this not seem to be the >>>>>> case >>>>>> to you? >>>>>> >>>>>> -jonathan >>>>>> >>>>>> On Oct 29, 2008, at 9:20 PM, Eric Chan-Tin wrote: >>>>>> >>>>>>> Hi Jonathan and Peter, >>>>>>> >>>>>>> First of all, just wanted to say that Pyxida has been working >>>>>>> great on >>>>>>> Planetlab for the past 4 months or so. Getting good data and >>>>>>> experiments >>>>>>> going well. >>>>>>> >>>>>>> However, I do have one question/comment. >>>>>>> >>>>>>> Every Pyxida node contains a list of Neighbors (in NCClient) >>>>>>> and a >>>>>>> list >>>>>>> of Gossip nodes (they are called neighbors in NCManager but I >>>>>>> will >>>>>>> call >>>>>>> them gossip for now so as not to confuse with the 'real' >>>>>>> neighbors >>>>>>> in >>>>>>> NCClient). Every 10 seconds, Pyxida will pick a random node >>>>>>> (call >>>>>>> it A) >>>>>>> from Gossip and send a gossip request (basically asking for rtt, >>>>>>> coordinate, error). But Gossip contains all the nodes in the >>>>>>> network. >>>>>>> The comment in the code said that was to avoid Pyxida becoming >>>>>>> lonely >>>>>>> and no-one to gossip with, which makes sense. >>>>>>> >>>>>>> Once A responds, Pyxida will add A to its Neighbors (if >>>>>>> Neighbors >>>>>>> does >>>>>>> not contain A) and remove the first element from the Neighbors >>>>>>> list. >>>>>>> This seems a little bit odd. Was there any particular reasoning >>>>>>> behind >>>>>>> this? Because it would seem like you're getting the coordinate >>>>>>> from >>>>>>> every node in the network eventually. >>>>>>> >>>>>>> Thanks a lot, >>>>>>> >>>>>>> -Eric >>>>>>> >>>>>>> Jonathan Ledlie wrote: >>>>>>>> >>>>>>>> Your inclination is correct. See forwarded message. >>>>>>>> >>>>>>>> -jonathan >>>>>>>> >>>>>>>> On Apr 9, 2008, at 1:53 PM, ext Eric Chan-Tin wrote: >>>>>>>> >>>>>>>>> Thank you for the clarification. >>>>>>>>> >>>>>>>>> I've been reading through your published papers and code and I >>>>>>>>> must >>>>>>>>> say I learned a lot about how Pyxida works :) >>>>>>>>> >>>>>>>>> I still have a quick and easy (hopefully) question. Each >>>>>>>>> Pyxida >>>>>>>>> node >>>>>>>>> has a small amount of neighbors and learn about other nodes >>>>>>>>> through >>>>>>>>> other nodes. I am running on about 200 machines, and it seems >>>>>>>>> like all >>>>>>>>> the Pyxida nodes are adding each other to their upNeighbors >>>>>>>>> set. >>>>>>>>> NCClient.java has the MAX_NEIGHBORS set to 512. >>>>>>>>> >>>>>>>>> I wanted to make sure that I am not misreading the code, but >>>>>>>>> shouldn't >>>>>>>>> each Pyxida node keep a small number of neighbors in its >>>>>>>>> upNeighbors >>>>>>>>> set. Else, it will be a lot of updates to send a gossipMessage >>>>>>>>> to all >>>>>>>>> 200 neighbors, and it also means that a Pyxida node has the >>>>>>>>> whole >>>>>>>>> network in its upNeighbors set - that seems wrong. I am >>>>>>>>> inclined >>>>>>>>> to >>>>>>>>> change the value of MAX_NEIGHBORS but wanted to make sure that >>>>>>>>> my >>>>>>>>> understanding is correct first. 
>>>>>>>>> >>>>>>>>> Thank you, >>>>>>>>> >>>>>>>>> Eric >>>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> ------------------------------------------------------------------------- >>>>>>> This SF.Net email is sponsored by the Moblin Your Move >>>>>>> Developer's >>>>>>> challenge Build the coolest Linux based applications with >>>>>>> Moblin SDK >>>>>>> & win great prizes Grand prize is a trip for two to an Open >>>>>>> Source >>>>>>> event anywhere in the world >>>>>>> http://moblin-contest.org/redirect.php?banner_id=100&url=/ >>>>>>> _______________________________________________ Pyxida-users >>>>>>> mailing >>>>>>> list Pyx...@li... >>>>>>> https://lists.sourceforge.net/lists/listinfo/pyxida-users >>>> >>> >>> ------------------------------------------------------------------------- >>> This SF.Net email is sponsored by the Moblin Your Move Developer's >>> challenge >>> Build the coolest Linux based applications with Moblin SDK & win >>> great prizes >>> Grand prize is a trip for two to an Open Source event anywhere in >>> the world >>> http://moblin-contest.org/redirect.php?banner_id=100&url=/ >>> _______________________________________________ >>> Pyxida-users mailing list >>> Pyx...@li... >>> https://lists.sourceforge.net/lists/listinfo/pyxida-users >> > > ------------------------------------------------------------------------- > This SF.Net email is sponsored by the Moblin Your Move Developer's > challenge > Build the coolest Linux based applications with Moblin SDK & win > great prizes > Grand prize is a trip for two to an Open Source event anywhere in > the world > http://moblin-contest.org/redirect.php?banner_id=100&url=/ > _______________________________________________ > Pyxida-users mailing list > Pyx...@li... > https://lists.sourceforge.net/lists/listinfo/pyxida-users -jonathan |
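A minimal sketch, in Java, of the ping-selection behaviour Jonathan suggests above; getNeighborToPing() is the accessor named in the thread, but the receiver chain, the node type, pickRandomGossipNode() and sendPingRequest() are hypothetical stand-ins for whatever NCManager actually uses:

// Sketch of NCManager's periodic ping step (names partly hypothetical).
// Let NCClient pick a current neighbor to measure; fall back to the gossip
// set only when it cannot supply one, e.g. after maintenance has tossed
// stale entries and the neighbor list is short.
Object target = localNC.primaryNC.getNeighborToPing();  // assumed to return null when short on neighbors
if (target == null) {
    target = pickRandomGossipNode();  // hypothetical fallback into the gossip list
}
if (target != null) {
    sendPingRequest(target);          // hypothetical ping/gossip request helper
}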
From: wgd(TsU) <wg...@ma...> - 2008-12-03 03:13:17
|
Hi, I am a student at Tsinghua University (China). Recently I have been trying to deploy Vivaldi on PlanetLab, but it seems that I cannot get the NC of some nodes. I have found that the XML-RPC connection is built successfully, but the RPC method pyxida.getLocalCoord does not work. In my experiments, I changed the api port and the pyxida port at the same time. Please help me with this problem. Thank you!

Guodong Wang
Lab of New Generation Network
Department of Electronic Engineering
Tsinghua University
Beijing 100084 P.R. China
|
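For reference, a minimal client-side probe for this kind of query could look like the sketch below. It assumes the Apache XML-RPC 3.x client library, and the host name, port, and endpoint path are hypothetical placeholders; the method name pyxida.getLocalCoord comes from the message above, but the exact URL, argument list, and return type should be checked against the XML-RPC handler your Pyxida build registers.

import java.net.URL;
import org.apache.xmlrpc.client.XmlRpcClient;
import org.apache.xmlrpc.client.XmlRpcClientConfigImpl;

public class CoordProbe {
    public static void main(String[] args) throws Exception {
        // Hypothetical host and API port -- substitute the values from your deployment.
        XmlRpcClientConfigImpl config = new XmlRpcClientConfigImpl();
        config.setServerURL(new URL("http://pyxida-node.example.org:55556/"));

        XmlRpcClient client = new XmlRpcClient();
        client.setConfig(config);

        // Method name as reported above; assuming it takes no arguments.
        Object coord = client.execute("pyxida.getLocalCoord", new Object[0]);
        System.out.println("local coordinate: " + coord);
    }
}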
From: Eric Chan-T. <dch...@cs...> - 2008-12-03 00:26:59
|
Sorry for the late reply. Was busy getting a paper ready for submission. I will make the changes as you described. Should be fairly straightforward. One small thing though: I will have a time (say 1 hour) where Pyxida will do as it is doing now. This allows it to build its list of Gossip nodes. After 1 hour, then it will behave with the fix. This is to prevent the following situation. neighbors list can contain max of N neighbors. There are ~ N bootstrap nodes. Every bootstrap node will have each other in their neighbors list (which is okay). Every new node will have a subset of the bootstrap nodes in their neighbors list. If there are N or a little bit more than N bootstrap nodes, then every node will basically have the same set of nodes as neighbors. To avoid this, the neighbors are "permanent" after 1 hour, then each Pyxida node would have randomly picked N nodes as its neighbors. I hope this made sense. I will send the snippets of code by email unless you wanted it via another method (for example, I post the java code on my website and you can download from there). -Eric On Nov 4 2008, Jonathan Ledlie wrote: > >On Nov 3, 2008, at 3:46 PM, Eric Chan-Tin wrote: > >> Jonathan, >> >> I can insert some code to fix this and it doesn't look too hard but >> I do >> have some concerns. >> >>> I don't remember observing this behavior in our PL >>> deployment, but that could have been because we had the ping time set >>> to 1-2 minutes, IIRC. >> >> The default ping time was set to 10 seconds + some Gaussian >> distribution, >> so that's what I used, but changing the ping time to 1 minute or >> more will >> result in the same effect except it will take longer for the "cycle" >> to >> occur. >> >> What I have in mind was the following: >> >> 1) In NCClient.addNeighbor() would be changed to (in pseudocode) >> protected boolean addNeighbor(RemoteState<T> guy) { >> boolean added = false; >> //if neighbors.size() >= MAX_NEIGHBORS >> //Check last update time of each neighbor in >> neighbors >> //neighbor.getLastUpdateTime() >> //if any of them is older than some number TIMEOUT, >> remove the oldest one, then add "guy" to neighbors >> //else don't add "guy" to neighbors >> //else add "guy" to neighbors >> } >> > >IMO this shouldn't be the behavior of this function because it is >called "addNeighbor." Instead what could happen is that part of the >performMaintenance() function could remove old neighbors. > >> 2) In NCManager, getGossipNode() would just pick a random node from >> localNC.primaryNC.getNeighbors() to send a request to every 10 >> seconds. >> >> 3) Pyxida will still keep track of all GossipNodes in the network. >> It could >> also (say every 10 minutes) check its NCClient.neighbors list and >> see if >> any neighbor is older than TIMEOUT, remove it, then pick a random >> GossipNode to send a ping request. >> >> The main problem with this scheme is that one Pyxida node would most >> likely >> stay with the same set of neighbors all the time. Its coordinates >> would be >> greatly affected by those 32 nodes (I am assuming MAX_NEIGHBORS = 32). >> > >Instead of (2), why not have the getNeighborToPing function determine >the behavior? If it returns a node, then ping it. If it returns >null, then have the NCManager pick some other node to use? If things >are hunky-dory, getNeighborToPing() would return a node (a current >member of the neighbor set), if we're short on neighbors, then it >would return null. 
The periodic maintenance would have already tossed >old nodes, so this would be self-sustaining. > >The "problem" you mention is fairly central to network coordinates in >general. Sure the coordinates will be a little better the more >neighbors you have. But in a very large system talking to everybody >isn't feasible. Peter and I have a graph in one of the workshop >papers which shows the effect of increasing the number of neighbors > >> I don't think the network would be partitioned, but it could happen. >> For >> example, the boostrap nodes have each other in their neighbors list. >> Every >> new node bootstrapping to the network would contact those >> bootstrapping >> nodes first and add them to their neighbors. So every Pyxida node >> could >> have as its neighbors the "Bootstrap nodes", which would make Pyxida >> become >> a centralized system. >> > >As long as the initial bootstrap list is large enough, I think you are >right, that partitions won't be a problem. > >> One last thing: When Pyxida receives a ping request from a node, it >> does >> not update its coordinates. Is there any particular reason for that? >> Is it >> because Pyxida will not be able to calculate the RTT? >> > >Yes. > >> Also, there is no "gossiping" of other nodes' coordinates, right? So >> if A >> has B in its neighbors list, it will from time to time tell other >> nodes >> about B's coordinates. I realize this has a lot of complications but >> the >> original Vivaldi paper was very fuzzy about this point and the >> previous >> point that it seems it is up to the implementors to decide what to do. >> > >I don't think we are doing this. It could be useful in certain >circumstance (e.g., picking potential new neighbors), but we're not, >IIRC. > >> I will have to think some more about this as there are a lot of nasty >> little side effects that could happen, but these are my thoughts. >> >> Thanks, >> >> -Eric >> > >Sounds good. > >-jonathan > >> On Nov 3 2008, Jonathan Ledlie wrote: >> >>> >>> Eric, >>> >>> Based on a look at the code just now, I see where you are coming >>> from: >>> with these params on PL, there isn't anything stopping the neighbor >>> list from circulating pretty quickly through all nodes. In the >>> Azureus implementation, the guys fed to the alg (fed to >>> processSample) >>> weren't changing much because they were coming from the routing >>> finger >>> tables. I don't remember observing this behavior in our PL >>> deployment, but that could have been because we had the ping time set >>> to 1-2 minutes, IIRC. >>> >>> Do you want to insert some code that will fix this? Basically we >>> want >>> to only add a new neighbor if our oldest neighbor should be tossed >>> (or >>> if we haven't reached MAX_NEIGHBORS). It's probably best from a >>> clarity standpoint if addNeighbor does continue to actually add the >>> neighbor in all cases. Probably the best thing to do is to have >>> NCClient drop old neighbors internally and have NCManager ask it >>> which >>> nodes should be pinged via getNeighborToPing. So NCClient would >>> continue to have the same behavior on processSample (it would not >>> reject new information) and NCManager and NCClient would cooperate to >>> measure RTTs to current neighbors. We do want to make sure, however, >>> that the gossip set stays large enough so that partitions don't >>> occur. >>> >>> Do you want to be added to the SourceForge Pyxida developers to make >>> this change? (Please test it out carefully first.) 
>>> >>> -jonathan >>> >>> On Oct 30, 2008, at 11:32 AM, ext Eric Chan-Tin wrote: >>> >>>> Jonathan, >>>> >>>> From what I saw, eventually, you learn about all the nodes in the >>>> network. Every 10 seconds, Pyxida picks a random "Gossip" node from >>>> its >>>> list and sends a request. The Gossip node will reply back with an >>>> "upNeighbor", that is, a "Gossip" node that is "up" or with 10% >>>> probability, a node that is "pending". Since naturally, all the >>>> nodes >>>> bootstrap off the same set of nodes, every node will eventually know >>>> of >>>> every other node. >>>> >>>> Pyxida might not be rotating through everybody continuously >>>> (although >>>> eventually, since it knows everybody, it will have added everybody >>>> to >>>> its Neighbors list at some point), but could be removing a node that >>>> is >>>> "alive". So, if MAX_NEIGHBORS = 32, let's call them A_1, A_2, ..., >>>> A_32. >>>> Pyxida picks a random Gossip node B, sends it a request and B >>>> replies >>>> back. Pyxida will add B to its Neighbors list, see that size of >>>> Neighbors > MAX_NEIGHBORS, kick A_1 out. Then from those 32 >>>> neighbors >>>> A_2, ...., A_32, B, it will calculate the force, and apply it to the >>>> current system coordinates. Thus, if every 10 seconds, Pyxida >>>> picks a >>>> different set of nodes C, D, E, etc... it would keep kicking nodes >>>> out >>>> of the neighbors list. >>>> >>>> It's the "kicking A_1 out" part that seemed a bit confusing. Churn >>>> is a >>>> good reason but you don't check that A_1 might still be alive or >>>> maybe >>>> A_1 was just added. I have been printing the neighbors list every 10 >>>> minutes, and they are completely different between those 10 minutes. >>>> >>>> One possible problem might be that I am choosing the value of >>>> MAX_NEIGHBORS to be too small. Originally it was set to 512 (which >>>> was >>>> meant for Azureus). Since I am using Planetlab where there are about >>>> 400-500 "stable" nodes, I changed the value to 32. But I don't think >>>> that should be a problem since log(n) = log(512) = 9. >>>> >>>> Thanks, >>>> >>>> -Eric >>>> >>>> Jonathan Ledlie wrote: >>>>> >>>>> Eric, >>>>> >>>>> Glad to hear it. Let Peter and me know if you publish anything >>>>> using >>>>> Pyxida and we can put it on the web page. >>>>> >>>>> "Gossip" nodes are for learning about the network. "Neighbors" are >>>>> for >>>>> constructing the coordinate. >>>>> >>>>> So, IIRC, the gossip process learns about nodes that may or may not >>>>> be >>>>> applied to the neighbor list. That is, we may eventually learn >>>>> about a >>>>> large portion of the network, via gossiping, but should only be >>>>> using >>>>> MAX_NEIGHBORS at any given time for coordinate computation. The >>>>> reason >>>>> we don't want to use a fixed set of neighbors for coordinate >>>>> computation >>>>> is because nodes may come and go from the system. However, it >>>>> should >>>>> not be the case that we just rotate through everybody, because >>>>> then, as >>>>> you note, we'd be computing against everyone's coordinate/latency, >>>>> and >>>>> this is counter to the point of network coordinates. With this >>>>> gradual >>>>> transience as a desirable quality in mind, I am pretty sure we >>>>> maintain >>>>> neighbors in a fairly fixed manner. Does this not seem to be the >>>>> case >>>>> to you? 
>>>>> >>>>> -jonathan >>>>> >>>>> On Oct 29, 2008, at 9:20 PM, Eric Chan-Tin wrote: >>>>> >>>>>> Hi Jonathan and Peter, >>>>>> >>>>>> First of all, just wanted to say that Pyxida has been working >>>>>> great on >>>>>> Planetlab for the past 4 months or so. Getting good data and >>>>>> experiments >>>>>> going well. >>>>>> >>>>>> However, I do have one question/comment. >>>>>> >>>>>> Every Pyxida node contains a list of Neighbors (in NCClient) and a >>>>>> list >>>>>> of Gossip nodes (they are called neighbors in NCManager but I will >>>>>> call >>>>>> them gossip for now so as not to confuse with the 'real' neighbors >>>>>> in >>>>>> NCClient). Every 10 seconds, Pyxida will pick a random node (call >>>>>> it A) >>>>>> from Gossip and send a gossip request (basically asking for rtt, >>>>>> coordinate, error). But Gossip contains all the nodes in the >>>>>> network. >>>>>> The comment in the code said that was to avoid Pyxida becoming >>>>>> lonely >>>>>> and no-one to gossip with, which makes sense. >>>>>> >>>>>> Once A responds, Pyxida will add A to its Neighbors (if Neighbors >>>>>> does >>>>>> not contain A) and remove the first element from the Neighbors >>>>>> list. >>>>>> This seems a little bit odd. Was there any particular reasoning >>>>>> behind >>>>>> this? Because it would seem like you're getting the coordinate >>>>>> from >>>>>> every node in the network eventually. >>>>>> >>>>>> Thanks a lot, >>>>>> >>>>>> -Eric >>>>>> >>>>>> Jonathan Ledlie wrote: >>>>>>> >>>>>>> Your inclination is correct. See forwarded message. >>>>>>> >>>>>>> -jonathan >>>>>>> >>>>>>> On Apr 9, 2008, at 1:53 PM, ext Eric Chan-Tin wrote: >>>>>>> >>>>>>>> Thank you for the clarification. >>>>>>>> >>>>>>>> I've been reading through your published papers and code and I >>>>>>>> must >>>>>>>> say I learned a lot about how Pyxida works :) >>>>>>>> >>>>>>>> I still have a quick and easy (hopefully) question. Each Pyxida >>>>>>>> node >>>>>>>> has a small amount of neighbors and learn about other nodes >>>>>>>> through >>>>>>>> other nodes. I am running on about 200 machines, and it seems >>>>>>>> like all >>>>>>>> the Pyxida nodes are adding each other to their upNeighbors set. >>>>>>>> NCClient.java has the MAX_NEIGHBORS set to 512. >>>>>>>> >>>>>>>> I wanted to make sure that I am not misreading the code, but >>>>>>>> shouldn't >>>>>>>> each Pyxida node keep a small number of neighbors in its >>>>>>>> upNeighbors >>>>>>>> set. Else, it will be a lot of updates to send a gossipMessage >>>>>>>> to all >>>>>>>> 200 neighbors, and it also means that a Pyxida node has the >>>>>>>> whole >>>>>>>> network in its upNeighbors set - that seems wrong. I am inclined >>>>>>>> to >>>>>>>> change the value of MAX_NEIGHBORS but wanted to make sure that >>>>>>>> my >>>>>>>> understanding is correct first. >>>>>>>> >>>>>>>> Thank you, >>>>>>>> >>>>>>>> Eric >>>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> ------------------------------------------------------------------------- >>>>>> This SF.Net email is sponsored by the Moblin Your Move Developer's >>>>>> challenge Build the coolest Linux based applications with Moblin SDK >>>>>> & win great prizes Grand prize is a trip for two to an Open Source >>>>>> event anywhere in the world >>>>>> http://moblin-contest.org/redirect.php?banner_id=100&url=/ >>>>>> _______________________________________________ Pyxida-users mailing >>>>>> list Pyx...@li... 
>>>>>> https://lists.sourceforge.net/lists/listinfo/pyxida-users
|
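A minimal sketch of the warm-up gate Eric describes at the top of this message: keep the current add-and-evict behaviour for the first hour so the node can build its candidate set from the bootstrap nodes, then treat the neighbor set as effectively permanent. The STABILIZE_PERIOD constant, the startTime field, and the List operations are illustrative guesses layered on the addNeighbor signature quoted in this thread, not the fix that was actually mailed around.

// Fragment for NCClient; names other than addNeighbor/neighbors/MAX_NEIGHBORS are illustrative.
private static final long STABILIZE_PERIOD = 60 * 60 * 1000L;   // 1 hour, per the message above
private final long startTime = System.currentTimeMillis();

private boolean inWarmup() {
    return System.currentTimeMillis() - startTime < STABILIZE_PERIOD;
}

protected boolean addNeighbor(RemoteState<T> guy) {
    if (neighbors.contains(guy)) {
        return false;
    }
    if (inWarmup()) {
        // First hour: keep the original behaviour so the gossip/neighbor lists fill up.
        neighbors.add(guy);
        if (neighbors.size() > MAX_NEIGHBORS) {
            neighbors.remove(0);   // original FIFO eviction
        }
        return true;
    }
    // After warm-up: only fill empty slots; stale entries are expected to be
    // removed by the periodic maintenance pass, not on every new sample.
    if (neighbors.size() < MAX_NEIGHBORS) {
        neighbors.add(guy);
        return true;
    }
    return false;
}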
From: Jonathan L. <jon...@no...> - 2008-11-04 14:31:09
|
On Nov 3, 2008, at 3:46 PM, Eric Chan-Tin wrote: > Jonathan, > > I can insert some code to fix this and it doesn't look too hard but > I do > have some concerns. > >> I don't remember observing this behavior in our PL >> deployment, but that could have been because we had the ping time set >> to 1-2 minutes, IIRC. > > The default ping time was set to 10 seconds + some Gaussian > distribution, > so that's what I used, but changing the ping time to 1 minute or > more will > result in the same effect except it will take longer for the "cycle" > to > occur. > > What I have in mind was the following: > > 1) In NCClient.addNeighbor() would be changed to (in pseudocode) > protected boolean addNeighbor(RemoteState<T> guy) { > boolean added = false; > //if neighbors.size() >= MAX_NEIGHBORS > //Check last update time of each neighbor in > neighbors > //neighbor.getLastUpdateTime() > //if any of them is older than some number TIMEOUT, > remove the oldest one, then add "guy" to neighbors > //else don't add "guy" to neighbors > //else add "guy" to neighbors > } > IMO this shouldn't be the behavior of this function because it is called "addNeighbor." Instead what could happen is that part of the performMaintenance() function could remove old neighbors. > 2) In NCManager, getGossipNode() would just pick a random node from > localNC.primaryNC.getNeighbors() to send a request to every 10 > seconds. > > 3) Pyxida will still keep track of all GossipNodes in the network. > It could > also (say every 10 minutes) check its NCClient.neighbors list and > see if > any neighbor is older than TIMEOUT, remove it, then pick a random > GossipNode to send a ping request. > > The main problem with this scheme is that one Pyxida node would most > likely > stay with the same set of neighbors all the time. Its coordinates > would be > greatly affected by those 32 nodes (I am assuming MAX_NEIGHBORS = 32). > Instead of (2), why not have the getNeighborToPing function determine the behavior? If it returns a node, then ping it. If it returns null, then have the NCManager pick some other node to use? If things are hunky-dory, getNeighborToPing() would return a node (a current member of the neighbor set), if we're short on neighbors, then it would return null. The periodic maintenance would have already tossed old nodes, so this would be self-sustaining. The "problem" you mention is fairly central to network coordinates in general. Sure the coordinates will be a little better the more neighbors you have. But in a very large system talking to everybody isn't feasible. Peter and I have a graph in one of the workshop papers which shows the effect of increasing the number of neighbors > I don't think the network would be partitioned, but it could happen. > For > example, the boostrap nodes have each other in their neighbors list. > Every > new node bootstrapping to the network would contact those > bootstrapping > nodes first and add them to their neighbors. So every Pyxida node > could > have as its neighbors the "Bootstrap nodes", which would make Pyxida > become > a centralized system. > As long as the initial bootstrap list is large enough, I think you are right, that partitions won't be a problem. > One last thing: When Pyxida receives a ping request from a node, it > does > not update its coordinates. Is there any particular reason for that? > Is it > because Pyxida will not be able to calculate the RTT? > Yes. > Also, there is no "gossiping" of other nodes' coordinates, right? 
So > if A > has B in its neighbors list, it will from time to time tell other > nodes > about B's coordinates. I realize this has a lot of complications but > the > original Vivaldi paper was very fuzzy about this point and the > previous > point that it seems it is up to the implementors to decide what to do. > I don't think we are doing this. It could be useful in certain circumstance (e.g., picking potential new neighbors), but we're not, IIRC. > I will have to think some more about this as there are a lot of nasty > little side effects that could happen, but these are my thoughts. > > Thanks, > > -Eric > Sounds good. -jonathan > On Nov 3 2008, Jonathan Ledlie wrote: > >> >> Eric, >> >> Based on a look at the code just now, I see where you are coming >> from: >> with these params on PL, there isn't anything stopping the neighbor >> list from circulating pretty quickly through all nodes. In the >> Azureus implementation, the guys fed to the alg (fed to >> processSample) >> weren't changing much because they were coming from the routing >> finger >> tables. I don't remember observing this behavior in our PL >> deployment, but that could have been because we had the ping time set >> to 1-2 minutes, IIRC. >> >> Do you want to insert some code that will fix this? Basically we >> want >> to only add a new neighbor if our oldest neighbor should be tossed >> (or >> if we haven't reached MAX_NEIGHBORS). It's probably best from a >> clarity standpoint if addNeighbor does continue to actually add the >> neighbor in all cases. Probably the best thing to do is to have >> NCClient drop old neighbors internally and have NCManager ask it >> which >> nodes should be pinged via getNeighborToPing. So NCClient would >> continue to have the same behavior on processSample (it would not >> reject new information) and NCManager and NCClient would cooperate to >> measure RTTs to current neighbors. We do want to make sure, however, >> that the gossip set stays large enough so that partitions don't >> occur. >> >> Do you want to be added to the SourceForge Pyxida developers to make >> this change? (Please test it out carefully first.) >> >> -jonathan >> >> On Oct 30, 2008, at 11:32 AM, ext Eric Chan-Tin wrote: >> >>> Jonathan, >>> >>> From what I saw, eventually, you learn about all the nodes in the >>> network. Every 10 seconds, Pyxida picks a random "Gossip" node from >>> its >>> list and sends a request. The Gossip node will reply back with an >>> "upNeighbor", that is, a "Gossip" node that is "up" or with 10% >>> probability, a node that is "pending". Since naturally, all the >>> nodes >>> bootstrap off the same set of nodes, every node will eventually know >>> of >>> every other node. >>> >>> Pyxida might not be rotating through everybody continuously >>> (although >>> eventually, since it knows everybody, it will have added everybody >>> to >>> its Neighbors list at some point), but could be removing a node that >>> is >>> "alive". So, if MAX_NEIGHBORS = 32, let's call them A_1, A_2, ..., >>> A_32. >>> Pyxida picks a random Gossip node B, sends it a request and B >>> replies >>> back. Pyxida will add B to its Neighbors list, see that size of >>> Neighbors > MAX_NEIGHBORS, kick A_1 out. Then from those 32 >>> neighbors >>> A_2, ...., A_32, B, it will calculate the force, and apply it to the >>> current system coordinates. Thus, if every 10 seconds, Pyxida >>> picks a >>> different set of nodes C, D, E, etc... it would keep kicking nodes >>> out >>> of the neighbors list. 
>>> >>> It's the "kicking A_1 out" part that seemed a bit confusing. Churn >>> is a >>> good reason but you don't check that A_1 might still be alive or >>> maybe >>> A_1 was just added. I have been printing the neighbors list every 10 >>> minutes, and they are completely different between those 10 minutes. >>> >>> One possible problem might be that I am choosing the value of >>> MAX_NEIGHBORS to be too small. Originally it was set to 512 (which >>> was >>> meant for Azureus). Since I am using Planetlab where there are about >>> 400-500 "stable" nodes, I changed the value to 32. But I don't think >>> that should be a problem since log(n) = log(512) = 9. >>> >>> Thanks, >>> >>> -Eric >>> >>> Jonathan Ledlie wrote: >>>> >>>> Eric, >>>> >>>> Glad to hear it. Let Peter and me know if you publish anything >>>> using >>>> Pyxida and we can put it on the web page. >>>> >>>> "Gossip" nodes are for learning about the network. "Neighbors" are >>>> for >>>> constructing the coordinate. >>>> >>>> So, IIRC, the gossip process learns about nodes that may or may not >>>> be >>>> applied to the neighbor list. That is, we may eventually learn >>>> about a >>>> large portion of the network, via gossiping, but should only be >>>> using >>>> MAX_NEIGHBORS at any given time for coordinate computation. The >>>> reason >>>> we don't want to use a fixed set of neighbors for coordinate >>>> computation >>>> is because nodes may come and go from the system. However, it >>>> should >>>> not be the case that we just rotate through everybody, because >>>> then, as >>>> you note, we'd be computing against everyone's coordinate/latency, >>>> and >>>> this is counter to the point of network coordinates. With this >>>> gradual >>>> transience as a desirable quality in mind, I am pretty sure we >>>> maintain >>>> neighbors in a fairly fixed manner. Does this not seem to be the >>>> case >>>> to you? >>>> >>>> -jonathan >>>> >>>> On Oct 29, 2008, at 9:20 PM, Eric Chan-Tin wrote: >>>> >>>>> Hi Jonathan and Peter, >>>>> >>>>> First of all, just wanted to say that Pyxida has been working >>>>> great on >>>>> Planetlab for the past 4 months or so. Getting good data and >>>>> experiments >>>>> going well. >>>>> >>>>> However, I do have one question/comment. >>>>> >>>>> Every Pyxida node contains a list of Neighbors (in NCClient) and a >>>>> list >>>>> of Gossip nodes (they are called neighbors in NCManager but I will >>>>> call >>>>> them gossip for now so as not to confuse with the 'real' neighbors >>>>> in >>>>> NCClient). Every 10 seconds, Pyxida will pick a random node (call >>>>> it A) >>>>> from Gossip and send a gossip request (basically asking for rtt, >>>>> coordinate, error). But Gossip contains all the nodes in the >>>>> network. >>>>> The comment in the code said that was to avoid Pyxida becoming >>>>> lonely >>>>> and no-one to gossip with, which makes sense. >>>>> >>>>> Once A responds, Pyxida will add A to its Neighbors (if Neighbors >>>>> does >>>>> not contain A) and remove the first element from the Neighbors >>>>> list. >>>>> This seems a little bit odd. Was there any particular reasoning >>>>> behind >>>>> this? Because it would seem like you're getting the coordinate >>>>> from >>>>> every node in the network eventually. >>>>> >>>>> Thanks a lot, >>>>> >>>>> -Eric >>>>> >>>>> Jonathan Ledlie wrote: >>>>>> >>>>>> Your inclination is correct. See forwarded message. >>>>>> >>>>>> -jonathan >>>>>> >>>>>> On Apr 9, 2008, at 1:53 PM, ext Eric Chan-Tin wrote: >>>>>> >>>>>>> Thank you for the clarification. 
>>>>>>> >>>>>>> I've been reading through your published papers and code and I >>>>>>> must >>>>>>> say I learned a lot about how Pyxida works :) >>>>>>> >>>>>>> I still have a quick and easy (hopefully) question. Each Pyxida >>>>>>> node >>>>>>> has a small amount of neighbors and learn about other nodes >>>>>>> through >>>>>>> other nodes. I am running on about 200 machines, and it seems >>>>>>> like all >>>>>>> the Pyxida nodes are adding each other to their upNeighbors set. >>>>>>> NCClient.java has the MAX_NEIGHBORS set to 512. >>>>>>> >>>>>>> I wanted to make sure that I am not misreading the code, but >>>>>>> shouldn't >>>>>>> each Pyxida node keep a small number of neighbors in its >>>>>>> upNeighbors >>>>>>> set. Else, it will be a lot of updates to send a gossipMessage >>>>>>> to all >>>>>>> 200 neighbors, and it also means that a Pyxida node has the >>>>>>> whole >>>>>>> network in its upNeighbors set - that seems wrong. I am inclined >>>>>>> to >>>>>>> change the value of MAX_NEIGHBORS but wanted to make sure that >>>>>>> my >>>>>>> understanding is correct first. >>>>>>> >>>>>>> Thank you, >>>>>>> >>>>>>> Eric >>>>>>> >>>>> >>>>> >>>>> >>>>> ------------------------------------------------------------------------- >>>>> This SF.Net email is sponsored by the Moblin Your Move Developer's >>>>> challenge Build the coolest Linux based applications with Moblin >>>>> SDK & >>>>> win great prizes Grand prize is a trip for two to an Open Source >>>>> event >>>>> anywhere in the world >>>>> http://moblin-contest.org/redirect.php?banner_id=100&url=/ >>>>> _______________________________________________ Pyxida-users >>>>> mailing >>>>> list Pyx...@li... >>>>> https://lists.sourceforge.net/lists/listinfo/pyxida-users >> > > ------------------------------------------------------------------------- > This SF.Net email is sponsored by the Moblin Your Move Developer's > challenge > Build the coolest Linux based applications with Moblin SDK & win > great prizes > Grand prize is a trip for two to an Open Source event anywhere in > the world > http://moblin-contest.org/redirect.php?banner_id=100&url=/ > _______________________________________________ > Pyxida-users mailing list > Pyx...@li... > https://lists.sourceforge.net/lists/listinfo/pyxida-users |
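A rough sketch of the split Jonathan proposes in this message: NCClient keeps re-measuring its current neighbors via getNeighborToPing and only signals for new blood by returning null, while NCManager falls back to its gossip list in that case. getNeighborToPing, performMaintenance, getGossipNode, and localNC.primaryNC appear in the thread; random, pingRound, sendPingRequest, and getAddress are placeholders, and this is a sketch of the proposal rather than the committed code.

// Fragment for NCClient: hand back a current neighbor to re-measure,
// or null when we are short on neighbors.  'random' is a placeholder field.
private final java.util.Random random = new java.util.Random();

protected RemoteState<T> getNeighborToPing() {
    if (neighbors.size() < MAX_NEIGHBORS) {
        return null;                              // signal NCManager to recruit instead
    }
    return neighbors.get(random.nextInt(neighbors.size()));
}

// Fragment for NCManager's periodic task (every ~10 s); sendPingRequest and
// getAddress are placeholder helpers, getGossipNode is the existing picker.
void pingRound() {
    RemoteState<?> target = localNC.primaryNC.getNeighborToPing();
    if (target != null) {
        sendPingRequest(target.getAddress());     // healthy case: refresh an existing neighbor
    } else {
        sendPingRequest(getGossipNode());         // short on neighbors: pull from the gossip set
    }
}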
From: Eric Chan-T. <dch...@cs...> - 2008-11-03 20:46:21
|
Jonathan, I can insert some code to fix this and it doesn't look too hard but I do have some concerns. >I don't remember observing this behavior in our PL >deployment, but that could have been because we had the ping time set >to 1-2 minutes, IIRC. The default ping time was set to 10 seconds + some Gaussian distribution, so that's what I used, but changing the ping time to 1 minute or more will result in the same effect except it will take longer for the "cycle" to occur. What I have in mind was the following: 1) In NCClient.addNeighbor() would be changed to (in pseudocode) protected boolean addNeighbor(RemoteState<T> guy) { boolean added = false; //if neighbors.size() >= MAX_NEIGHBORS //Check last update time of each neighbor in neighbors //neighbor.getLastUpdateTime() //if any of them is older than some number TIMEOUT, remove the oldest one, then add "guy" to neighbors //else don't add "guy" to neighbors //else add "guy" to neighbors } 2) In NCManager, getGossipNode() would just pick a random node from localNC.primaryNC.getNeighbors() to send a request to every 10 seconds. 3) Pyxida will still keep track of all GossipNodes in the network. It could also (say every 10 minutes) check its NCClient.neighbors list and see if any neighbor is older than TIMEOUT, remove it, then pick a random GossipNode to send a ping request. The main problem with this scheme is that one Pyxida node would most likely stay with the same set of neighbors all the time. Its coordinates would be greatly affected by those 32 nodes (I am assuming MAX_NEIGHBORS = 32). I don't think the network would be partitioned, but it could happen. For example, the boostrap nodes have each other in their neighbors list. Every new node bootstrapping to the network would contact those bootstrapping nodes first and add them to their neighbors. So every Pyxida node could have as its neighbors the "Bootstrap nodes", which would make Pyxida become a centralized system. One last thing: When Pyxida receives a ping request from a node, it does not update its coordinates. Is there any particular reason for that? Is it because Pyxida will not be able to calculate the RTT? Also, there is no "gossiping" of other nodes' coordinates, right? So if A has B in its neighbors list, it will from time to time tell other nodes about B's coordinates. I realize this has a lot of complications but the original Vivaldi paper was very fuzzy about this point and the previous point that it seems it is up to the implementors to decide what to do. I will have to think some more about this as there are a lot of nasty little side effects that could happen, but these are my thoughts. Thanks, -Eric On Nov 3 2008, Jonathan Ledlie wrote: > >Eric, > >Based on a look at the code just now, I see where you are coming from: >with these params on PL, there isn't anything stopping the neighbor >list from circulating pretty quickly through all nodes. In the >Azureus implementation, the guys fed to the alg (fed to processSample) >weren't changing much because they were coming from the routing finger >tables. I don't remember observing this behavior in our PL >deployment, but that could have been because we had the ping time set >to 1-2 minutes, IIRC. > >Do you want to insert some code that will fix this? Basically we want >to only add a new neighbor if our oldest neighbor should be tossed (or >if we haven't reached MAX_NEIGHBORS). It's probably best from a >clarity standpoint if addNeighbor does continue to actually add the >neighbor in all cases. 
Probably the best thing to do is to have >NCClient drop old neighbors internally and have NCManager ask it which >nodes should be pinged via getNeighborToPing. So NCClient would >continue to have the same behavior on processSample (it would not >reject new information) and NCManager and NCClient would cooperate to >measure RTTs to current neighbors. We do want to make sure, however, >that the gossip set stays large enough so that partitions don't occur. > >Do you want to be added to the SourceForge Pyxida developers to make >this change? (Please test it out carefully first.) > >-jonathan > >On Oct 30, 2008, at 11:32 AM, ext Eric Chan-Tin wrote: > >> Jonathan, >> >> From what I saw, eventually, you learn about all the nodes in the >> network. Every 10 seconds, Pyxida picks a random "Gossip" node from >> its >> list and sends a request. The Gossip node will reply back with an >> "upNeighbor", that is, a "Gossip" node that is "up" or with 10% >> probability, a node that is "pending". Since naturally, all the nodes >> bootstrap off the same set of nodes, every node will eventually know >> of >> every other node. >> >> Pyxida might not be rotating through everybody continuously (although >> eventually, since it knows everybody, it will have added everybody to >> its Neighbors list at some point), but could be removing a node that >> is >> "alive". So, if MAX_NEIGHBORS = 32, let's call them A_1, A_2, ..., >> A_32. >> Pyxida picks a random Gossip node B, sends it a request and B replies >> back. Pyxida will add B to its Neighbors list, see that size of >> Neighbors > MAX_NEIGHBORS, kick A_1 out. Then from those 32 neighbors >> A_2, ...., A_32, B, it will calculate the force, and apply it to the >> current system coordinates. Thus, if every 10 seconds, Pyxida picks a >> different set of nodes C, D, E, etc... it would keep kicking nodes out >> of the neighbors list. >> >> It's the "kicking A_1 out" part that seemed a bit confusing. Churn >> is a >> good reason but you don't check that A_1 might still be alive or maybe >> A_1 was just added. I have been printing the neighbors list every 10 >> minutes, and they are completely different between those 10 minutes. >> >> One possible problem might be that I am choosing the value of >> MAX_NEIGHBORS to be too small. Originally it was set to 512 (which was >> meant for Azureus). Since I am using Planetlab where there are about >> 400-500 "stable" nodes, I changed the value to 32. But I don't think >> that should be a problem since log(n) = log(512) = 9. >> >> Thanks, >> >> -Eric >> >> Jonathan Ledlie wrote: >>> >>> Eric, >>> >>> Glad to hear it. Let Peter and me know if you publish anything using >>> Pyxida and we can put it on the web page. >>> >>> "Gossip" nodes are for learning about the network. "Neighbors" are >>> for >>> constructing the coordinate. >>> >>> So, IIRC, the gossip process learns about nodes that may or may not >>> be >>> applied to the neighbor list. That is, we may eventually learn >>> about a >>> large portion of the network, via gossiping, but should only be using >>> MAX_NEIGHBORS at any given time for coordinate computation. The >>> reason >>> we don't want to use a fixed set of neighbors for coordinate >>> computation >>> is because nodes may come and go from the system. However, it should >>> not be the case that we just rotate through everybody, because >>> then, as >>> you note, we'd be computing against everyone's coordinate/latency, >>> and >>> this is counter to the point of network coordinates. 
With this >>> gradual >>> transience as a desirable quality in mind, I am pretty sure we >>> maintain >>> neighbors in a fairly fixed manner. Does this not seem to be the >>> case >>> to you? >>> >>> -jonathan >>> >>> On Oct 29, 2008, at 9:20 PM, Eric Chan-Tin wrote: >>> >>>> Hi Jonathan and Peter, >>>> >>>> First of all, just wanted to say that Pyxida has been working >>>> great on >>>> Planetlab for the past 4 months or so. Getting good data and >>>> experiments >>>> going well. >>>> >>>> However, I do have one question/comment. >>>> >>>> Every Pyxida node contains a list of Neighbors (in NCClient) and a >>>> list >>>> of Gossip nodes (they are called neighbors in NCManager but I will >>>> call >>>> them gossip for now so as not to confuse with the 'real' neighbors >>>> in >>>> NCClient). Every 10 seconds, Pyxida will pick a random node (call >>>> it A) >>>> from Gossip and send a gossip request (basically asking for rtt, >>>> coordinate, error). But Gossip contains all the nodes in the >>>> network. >>>> The comment in the code said that was to avoid Pyxida becoming >>>> lonely >>>> and no-one to gossip with, which makes sense. >>>> >>>> Once A responds, Pyxida will add A to its Neighbors (if Neighbors >>>> does >>>> not contain A) and remove the first element from the Neighbors list. >>>> This seems a little bit odd. Was there any particular reasoning >>>> behind >>>> this? Because it would seem like you're getting the coordinate from >>>> every node in the network eventually. >>>> >>>> Thanks a lot, >>>> >>>> -Eric >>>> >>>> Jonathan Ledlie wrote: >>>>> >>>>> Your inclination is correct. See forwarded message. >>>>> >>>>> -jonathan >>>>> >>>>> On Apr 9, 2008, at 1:53 PM, ext Eric Chan-Tin wrote: >>>>> >>>>>> Thank you for the clarification. >>>>>> >>>>>> I've been reading through your published papers and code and I >>>>>> must >>>>>> say I learned a lot about how Pyxida works :) >>>>>> >>>>>> I still have a quick and easy (hopefully) question. Each Pyxida >>>>>> node >>>>>> has a small amount of neighbors and learn about other nodes >>>>>> through >>>>>> other nodes. I am running on about 200 machines, and it seems >>>>>> like all >>>>>> the Pyxida nodes are adding each other to their upNeighbors set. >>>>>> NCClient.java has the MAX_NEIGHBORS set to 512. >>>>>> >>>>>> I wanted to make sure that I am not misreading the code, but >>>>>> shouldn't >>>>>> each Pyxida node keep a small number of neighbors in its >>>>>> upNeighbors >>>>>> set. Else, it will be a lot of updates to send a gossipMessage >>>>>> to all >>>>>> 200 neighbors, and it also means that a Pyxida node has the whole >>>>>> network in its upNeighbors set - that seems wrong. I am inclined >>>>>> to >>>>>> change the value of MAX_NEIGHBORS but wanted to make sure that my >>>>>> understanding is correct first. >>>>>> >>>>>> Thank you, >>>>>> >>>>>> Eric >>>>>> >>>> >>>> >>>> >>>> ------------------------------------------------------------------------- >>>> This SF.Net email is sponsored by the Moblin Your Move Developer's >>>> challenge Build the coolest Linux based applications with Moblin SDK & >>>> win great prizes Grand prize is a trip for two to an Open Source event >>>> anywhere in the world >>>> http://moblin-contest.org/redirect.php?banner_id=100&url=/ >>>> _______________________________________________ Pyxida-users mailing >>>> list Pyx...@li... >>>> https://lists.sourceforge.net/lists/listinfo/pyxida-users > |
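To make the pseudocode above concrete in the shape suggested in the reply (eviction in performMaintenance rather than in addNeighbor), here is a fragment assuming neighbors is a java.util.List<RemoteState<T>>, that getLastUpdateTime() returns a millisecond timestamp, and an illustrative TIMEOUT value; none of this is the code that was eventually committed.

// Fragment for NCClient; assumes neighbors is a java.util.List<RemoteState<T>>.
private static final long TIMEOUT = 30 * 60 * 1000L;   // illustrative; the thread leaves this open

protected void performMaintenance() {
    long now = System.currentTimeMillis();
    // Evict neighbors we have not heard from within TIMEOUT, so addNeighbor can
    // stay a plain "add" and removal of stale entries lives in maintenance.
    for (java.util.Iterator<RemoteState<T>> it = neighbors.iterator(); it.hasNext();) {
        if (now - it.next().getLastUpdateTime() > TIMEOUT) {
            it.remove();
        }
    }
}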
From: Jonathan L. <jon...@no...> - 2008-11-03 11:46:30
|
Eric, Based on a look at the code just now, I see where you are coming from: with these params on PL, there isn't anything stopping the neighbor list from circulating pretty quickly through all nodes. In the Azureus implementation, the guys fed to the alg (fed to processSample) weren't changing much because they were coming from the routing finger tables. I don't remember observing this behavior in our PL deployment, but that could have been because we had the ping time set to 1-2 minutes, IIRC. Do you want to insert some code that will fix this? Basically we want to only add a new neighbor if our oldest neighbor should be tossed (or if we haven't reached MAX_NEIGHBORS). It's probably best from a clarity standpoint if addNeighbor does continue to actually add the neighbor in all cases. Probably the best thing to do is to have NCClient drop old neighbors internally and have NCManager ask it which nodes should be pinged via getNeighborToPing. So NCClient would continue to have the same behavior on processSample (it would not reject new information) and NCManager and NCClient would cooperate to measure RTTs to current neighbors. We do want to make sure, however, that the gossip set stays large enough so that partitions don't occur. Do you want to be added to the SourceForge Pyxida developers to make this change? (Please test it out carefully first.) -jonathan On Oct 30, 2008, at 11:32 AM, ext Eric Chan-Tin wrote: > Jonathan, > > From what I saw, eventually, you learn about all the nodes in the > network. Every 10 seconds, Pyxida picks a random "Gossip" node from > its > list and sends a request. The Gossip node will reply back with an > "upNeighbor", that is, a "Gossip" node that is "up" or with 10% > probability, a node that is "pending". Since naturally, all the nodes > bootstrap off the same set of nodes, every node will eventually know > of > every other node. > > Pyxida might not be rotating through everybody continuously (although > eventually, since it knows everybody, it will have added everybody to > its Neighbors list at some point), but could be removing a node that > is > "alive". So, if MAX_NEIGHBORS = 32, let's call them A_1, A_2, ..., > A_32. > Pyxida picks a random Gossip node B, sends it a request and B replies > back. Pyxida will add B to its Neighbors list, see that size of > Neighbors > MAX_NEIGHBORS, kick A_1 out. Then from those 32 neighbors > A_2, ...., A_32, B, it will calculate the force, and apply it to the > current system coordinates. Thus, if every 10 seconds, Pyxida picks a > different set of nodes C, D, E, etc... it would keep kicking nodes out > of the neighbors list. > > It's the "kicking A_1 out" part that seemed a bit confusing. Churn > is a > good reason but you don't check that A_1 might still be alive or maybe > A_1 was just added. I have been printing the neighbors list every 10 > minutes, and they are completely different between those 10 minutes. > > One possible problem might be that I am choosing the value of > MAX_NEIGHBORS to be too small. Originally it was set to 512 (which was > meant for Azureus). Since I am using Planetlab where there are about > 400-500 "stable" nodes, I changed the value to 32. But I don't think > that should be a problem since log(n) = log(512) = 9. > > Thanks, > > -Eric > > Jonathan Ledlie wrote: >> >> Eric, >> >> Glad to hear it. Let Peter and me know if you publish anything using >> Pyxida and we can put it on the web page. >> >> "Gossip" nodes are for learning about the network. 
"Neighbors" are >> for >> constructing the coordinate. >> >> So, IIRC, the gossip process learns about nodes that may or may not >> be >> applied to the neighbor list. That is, we may eventually learn >> about a >> large portion of the network, via gossiping, but should only be using >> MAX_NEIGHBORS at any given time for coordinate computation. The >> reason >> we don't want to use a fixed set of neighbors for coordinate >> computation >> is because nodes may come and go from the system. However, it should >> not be the case that we just rotate through everybody, because >> then, as >> you note, we'd be computing against everyone's coordinate/latency, >> and >> this is counter to the point of network coordinates. With this >> gradual >> transience as a desirable quality in mind, I am pretty sure we >> maintain >> neighbors in a fairly fixed manner. Does this not seem to be the >> case >> to you? >> >> -jonathan >> >> On Oct 29, 2008, at 9:20 PM, Eric Chan-Tin wrote: >> >>> Hi Jonathan and Peter, >>> >>> First of all, just wanted to say that Pyxida has been working >>> great on >>> Planetlab for the past 4 months or so. Getting good data and >>> experiments >>> going well. >>> >>> However, I do have one question/comment. >>> >>> Every Pyxida node contains a list of Neighbors (in NCClient) and a >>> list >>> of Gossip nodes (they are called neighbors in NCManager but I will >>> call >>> them gossip for now so as not to confuse with the 'real' neighbors >>> in >>> NCClient). Every 10 seconds, Pyxida will pick a random node (call >>> it A) >>> from Gossip and send a gossip request (basically asking for rtt, >>> coordinate, error). But Gossip contains all the nodes in the >>> network. >>> The comment in the code said that was to avoid Pyxida becoming >>> lonely >>> and no-one to gossip with, which makes sense. >>> >>> Once A responds, Pyxida will add A to its Neighbors (if Neighbors >>> does >>> not contain A) and remove the first element from the Neighbors list. >>> This seems a little bit odd. Was there any particular reasoning >>> behind >>> this? Because it would seem like you're getting the coordinate from >>> every node in the network eventually. >>> >>> Thanks a lot, >>> >>> -Eric >>> >>> Jonathan Ledlie wrote: >>>> >>>> Your inclination is correct. See forwarded message. >>>> >>>> -jonathan >>>> >>>> On Apr 9, 2008, at 1:53 PM, ext Eric Chan-Tin wrote: >>>> >>>>> Thank you for the clarification. >>>>> >>>>> I've been reading through your published papers and code and I >>>>> must >>>>> say I learned a lot about how Pyxida works :) >>>>> >>>>> I still have a quick and easy (hopefully) question. Each Pyxida >>>>> node >>>>> has a small amount of neighbors and learn about other nodes >>>>> through >>>>> other nodes. I am running on about 200 machines, and it seems >>>>> like all >>>>> the Pyxida nodes are adding each other to their upNeighbors set. >>>>> NCClient.java has the MAX_NEIGHBORS set to 512. >>>>> >>>>> I wanted to make sure that I am not misreading the code, but >>>>> shouldn't >>>>> each Pyxida node keep a small number of neighbors in its >>>>> upNeighbors >>>>> set. Else, it will be a lot of updates to send a gossipMessage >>>>> to all >>>>> 200 neighbors, and it also means that a Pyxida node has the whole >>>>> network in its upNeighbors set - that seems wrong. I am inclined >>>>> to >>>>> change the value of MAX_NEIGHBORS but wanted to make sure that my >>>>> understanding is correct first. 
>>>>> >>>>> Thank you, >>>>> >>>>> Eric >>>>> >>> >>> ------------------------------------------------------------------------- >>> This SF.Net email is sponsored by the Moblin Your Move Developer's >>> challenge >>> Build the coolest Linux based applications with Moblin SDK & win >>> great >>> prizes >>> Grand prize is a trip for two to an Open Source event anywhere in >>> the >>> world >>> http://moblin-contest.org/redirect.php?banner_id=100&url=/ >>> _______________________________________________ >>> Pyxida-users mailing list >>> Pyx...@li... >>> https://lists.sourceforge.net/lists/listinfo/pyxida-users |
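The gossip behaviour referenced in this exchange and spelled out in the next message -- reply with a node believed to be up, or with 10% probability a pending one -- could be sketched roughly as below. The upNodes/pendingNodes lists and the String return type are placeholders; only the 10% figure and the up/pending distinction come from the thread.

// Placeholder fields; the real NCManager bookkeeping differs.
private final java.util.List<String> upNodes = new java.util.ArrayList<String>();
private final java.util.List<String> pendingNodes = new java.util.ArrayList<String>();
private final java.util.Random random = new java.util.Random();

String pickUpNeighborForGossipReply() {
    // ~10% of the time, advertise a pending node so newcomers get exercised.
    if (!pendingNodes.isEmpty() && random.nextDouble() < 0.10) {
        return pendingNodes.get(random.nextInt(pendingNodes.size()));
    }
    if (upNodes.isEmpty()) {
        return null;
    }
    return upNodes.get(random.nextInt(upNodes.size()));
}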
From: Eric Chan-T. <dch...@cs...> - 2008-10-30 15:32:08
|
Jonathan, >From what I saw, eventually, you learn about all the nodes in the network. Every 10 seconds, Pyxida picks a random "Gossip" node from its list and sends a request. The Gossip node will reply back with an "upNeighbor", that is, a "Gossip" node that is "up" or with 10% probability, a node that is "pending". Since naturally, all the nodes bootstrap off the same set of nodes, every node will eventually know of every other node. Pyxida might not be rotating through everybody continuously (although eventually, since it knows everybody, it will have added everybody to its Neighbors list at some point), but could be removing a node that is "alive". So, if MAX_NEIGHBORS = 32, let's call them A_1, A_2, ..., A_32. Pyxida picks a random Gossip node B, sends it a request and B replies back. Pyxida will add B to its Neighbors list, see that size of Neighbors > MAX_NEIGHBORS, kick A_1 out. Then from those 32 neighbors A_2, ...., A_32, B, it will calculate the force, and apply it to the current system coordinates. Thus, if every 10 seconds, Pyxida picks a different set of nodes C, D, E, etc... it would keep kicking nodes out of the neighbors list. It's the "kicking A_1 out" part that seemed a bit confusing. Churn is a good reason but you don't check that A_1 might still be alive or maybe A_1 was just added. I have been printing the neighbors list every 10 minutes, and they are completely different between those 10 minutes. One possible problem might be that I am choosing the value of MAX_NEIGHBORS to be too small. Originally it was set to 512 (which was meant for Azureus). Since I am using Planetlab where there are about 400-500 "stable" nodes, I changed the value to 32. But I don't think that should be a problem since log(n) = log(512) = 9. Thanks, -Eric Jonathan Ledlie wrote: > > Eric, > > Glad to hear it. Let Peter and me know if you publish anything using > Pyxida and we can put it on the web page. > > "Gossip" nodes are for learning about the network. "Neighbors" are for > constructing the coordinate. > > So, IIRC, the gossip process learns about nodes that may or may not be > applied to the neighbor list. That is, we may eventually learn about a > large portion of the network, via gossiping, but should only be using > MAX_NEIGHBORS at any given time for coordinate computation. The reason > we don't want to use a fixed set of neighbors for coordinate computation > is because nodes may come and go from the system. However, it should > not be the case that we just rotate through everybody, because then, as > you note, we'd be computing against everyone's coordinate/latency, and > this is counter to the point of network coordinates. With this gradual > transience as a desirable quality in mind, I am pretty sure we maintain > neighbors in a fairly fixed manner. Does this not seem to be the case > to you? > > -jonathan > > On Oct 29, 2008, at 9:20 PM, Eric Chan-Tin wrote: > >> Hi Jonathan and Peter, >> >> First of all, just wanted to say that Pyxida has been working great on >> Planetlab for the past 4 months or so. Getting good data and experiments >> going well. >> >> However, I do have one question/comment. >> >> Every Pyxida node contains a list of Neighbors (in NCClient) and a list >> of Gossip nodes (they are called neighbors in NCManager but I will call >> them gossip for now so as not to confuse with the 'real' neighbors in >> NCClient). Every 10 seconds, Pyxida will pick a random node (call it A) >> from Gossip and send a gossip request (basically asking for rtt, >> coordinate, error). 
But Gossip contains all the nodes in the network. >> The comment in the code said that was to avoid Pyxida becoming lonely >> and no-one to gossip with, which makes sense. >> >> Once A responds, Pyxida will add A to its Neighbors (if Neighbors does >> not contain A) and remove the first element from the Neighbors list. >> This seems a little bit odd. Was there any particular reasoning behind >> this? Because it would seem like you're getting the coordinate from >> every node in the network eventually. >> >> Thanks a lot, >> >> -Eric >> >> Jonathan Ledlie wrote: >>> >>> Your inclination is correct. See forwarded message. >>> >>> -jonathan >>> >>> On Apr 9, 2008, at 1:53 PM, ext Eric Chan-Tin wrote: >>> >>>> Thank you for the clarification. >>>> >>>> I've been reading through your published papers and code and I must >>>> say I learned a lot about how Pyxida works :) >>>> >>>> I still have a quick and easy (hopefully) question. Each Pyxida node >>>> has a small amount of neighbors and learn about other nodes through >>>> other nodes. I am running on about 200 machines, and it seems like all >>>> the Pyxida nodes are adding each other to their upNeighbors set. >>>> NCClient.java has the MAX_NEIGHBORS set to 512. >>>> >>>> I wanted to make sure that I am not misreading the code, but shouldn't >>>> each Pyxida node keep a small number of neighbors in its upNeighbors >>>> set. Else, it will be a lot of updates to send a gossipMessage to all >>>> 200 neighbors, and it also means that a Pyxida node has the whole >>>> network in its upNeighbors set - that seems wrong. I am inclined to >>>> change the value of MAX_NEIGHBORS but wanted to make sure that my >>>> understanding is correct first. >>>> >>>> Thank you, >>>> >>>> Eric >>>> >> >> ------------------------------------------------------------------------- >> This SF.Net email is sponsored by the Moblin Your Move Developer's >> challenge >> Build the coolest Linux based applications with Moblin SDK & win great >> prizes >> Grand prize is a trip for two to an Open Source event anywhere in the >> world >> http://moblin-contest.org/redirect.php?banner_id=100&url=/ >> _______________________________________________ >> Pyxida-users mailing list >> Pyx...@li... >> https://lists.sourceforge.net/lists/listinfo/pyxida-users |
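For readers trying to picture the behaviour described in this message, the add-then-evict pattern amounts to roughly the following; this is a paraphrase for illustration, not the released NCClient source. With MAX_NEIGHBORS = 32 and a gossip request every ~10 seconds, the whole set can turn over in a few minutes, which is consistent with the neighbor lists looking completely different across 10-minute snapshots.

// Paraphrase of the behaviour under discussion, not the actual NCClient code.
protected boolean addNeighbor(RemoteState<T> guy) {
    if (neighbors.contains(guy)) {
        return false;
    }
    neighbors.add(guy);                    // the responding gossip node B joins at the tail
    if (neighbors.size() > MAX_NEIGHBORS) {
        neighbors.remove(0);               // A_1 is evicted with no liveness or age check
    }
    return true;
}
// Rough turnover estimate: at most one new neighbor per gossip round, so
// 32 neighbors * ~10 s/round is on the order of 5-6 minutes for a full rotation.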