From: Jonathan L. <jon...@ee...> - 2008-12-19 01:23:12
Eric,

I'm getting an error (an old favorite):

WA 1229649353501 NetUtil : Could not deserialize java object: java.io.StreamCorruptedException: invalid type code: CC
WA 1229649353501 ObjComm$ReadConnHa: No valid msgObject received from remoteAddr=planet1.scs.stanford.edu:26500. Ignoring.

What Java version are you running on PL?

-jonathan

On Dec 18, 2008, at 4:49 PM, Eric Chan-Tin wrote:

> Do you mean the Pyxida code running on PlanetLab? I killed them after running them overnight. I am currently restarting the experiments. Try again in about 15 minutes and it should be ready.

> Fix1 is on port 28500 (and api.port 28501). Fix2 is on port 26500 (api.port 26501). The node you can contact is "planet1.scs.stanford.edu" or "planetlab01.cs.washington.edu".

> About your other email: yes, I know. The fix was to go through the list again, but I didn't want to implement that since it seems to be working fine. So: go through the list once to calculate weights, and add neighbors into another list (sorted by weight). Then go through that new list. The new list is sorted from higher to lower weight, so older neighbors will be "tested" first. If you want it that way, I can do that. However, I do not think we need to. Say at iteration 0 the first node gets picked; at iteration 1, that same node will have a very small probability of getting picked again.

> -Eric

> On Dec 18 2008, Jonathan Ledlie wrote:

>> Eric,

>> Do you have a PL Pyxida slice that I can connect to to try this out? That is, I'll run it on my machine, but connect to your slice via the right ports.

>> -jonathan

>> On Dec 18, 2008, at 11:35 AM, Eric Chan-Tin wrote:

>>> Sorry for the delay. I was busy with final exams and projects.

>>> I made the changes. They can be found at

>>> http://www-users.cs.umn.edu/~dchantin/NCClient.java
>>> http://www-users.cs.umn.edu/~dchantin/NCManager.java
>>> http://www-users.cs.umn.edu/~dchantin/RemoteState.java

>>> All the changes are enclosed within

>>> //ERIC START
>>> ...
>>> //ERIC END

>>> I compared every reported "neighbors list" of each Pyxida node (about 370 total), counted how many neighbors did not change within 10 minutes, and then averaged over all the nodes.

>>> Original: 43.37% of the neighbors stayed the same on average.

>>> Fix0 (before we found the interesting situation): 74.73%

>>> Fix1 (with the weight probability of picking a neighbor to ping): 95.79%

>>> Fix2 (with the weight probability and the MIN_UPDATE_PING): 98.81%

>>> Also, the reported error, rrl, etc. were much lower in the fixed Pyxida than in the original one (this may not be significant, since the original experiment was performed 1-2 months ago), but, as expected, they were not higher.

>>> You can take a look at the code and test it if you want. If everything looks fine, I'll remove some of the debugging code and other "useless" comments (like //ERIC). Also, let me know which "fix" you prefer. I will then email you the updated code.

>>> Thanks,

>>> -Eric

>>> On Dec 10 2008, Jonathan Ledlie wrote:

>>>> Seems reasonable.

>>>> Both ideas (lastPingTime and a weighted probability based on age) seem to be independent.
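As an illustration of the age-weighted pick behind Eric's Fix1 (a sketch, not Pyxida's actual code): it does roulette-wheel selection over the neighbor list, with each neighbor's weight equal to (currentTime - lastUpdateTime) / 15 minutes, as Eric proposed earlier in the thread. The RemoteState interface below is a minimal stand-in for Pyxida's real RemoteState class.

    import java.util.List;
    import java.util.Random;

    class WeightedNeighborPicker {
        // Minimal stand-in for Pyxida's RemoteState; only the one accessor we need.
        interface RemoteState {
            long getLastUpdateTime(); // ms timestamp of the last successful update
        }

        private static final double FIFTEEN_MINUTES_MS = 15 * 60 * 1000.0;
        private final Random random = new Random();

        // Pick a neighbor with probability proportional to its age, so the
        // longest-unpinged neighbors are likely (but never certain) to be chosen.
        RemoteState getNeighborToPing(List<RemoteState> neighbors, long now) {
            if (neighbors.isEmpty()) return null;
            double[] weights = new double[neighbors.size()];
            double total = 0.0;
            for (int i = 0; i < neighbors.size(); i++) {
                weights[i] = (now - neighbors.get(i).getLastUpdateTime()) / FIFTEEN_MINUTES_MS;
                total += weights[i];
            }
            if (total <= 0.0) // everyone just updated: fall back to a uniform pick
                return neighbors.get(random.nextInt(neighbors.size()));
            double r = random.nextDouble() * total; // roulette-wheel draw
            for (int i = 0; i < neighbors.size(); i++) {
                r -= weights[i];
                if (r <= 0.0) return neighbors.get(i);
            }
            return neighbors.get(neighbors.size() - 1); // guard against rounding
        }
    }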
>>>> Some of the code is more CPU intensive than it should be (e.g., looping through potentially long lists), and that is a slight downside to the function approach vs. a set, but this probably isn't a big deal.

>>>> -jonathan

>>>> On Dec 10, 2008, at 10:40 PM, Eric Chan-Tin wrote:

>>>>> Okay, I think I might have misunderstood you. Your idea of lastPingTime looks interesting.

>>>>> My idea is to introduce a probability function. Right now, it just randomly picks one of the neighbors. What I am thinking is that instead of just randomly picking a neighbor, it could pick a neighbor based on some function. So the neighbor whose lastUpdateTime is oldest would have a higher chance of being picked than a neighbor who was just updated.

>>>>> Does that make more sense?

>>>>> -Eric

>>>>> Jonathan Ledlie wrote:

>>>>>> Hmm. The advantage of lastPingTime is that you learn whether or not you've tried to contact a neighbor and they haven't responded. Then you can use that knowledge to determine whether or not to toss them.

>>>>>> Maybe I'm not understanding your suggestion then.

>>>>>> -jonathan

>>>>>> On Dec 10, 2008, at 12:51 PM, Eric Chan-Tin wrote:

>>>>>>> That's a good idea. Here is a little twist on it.

>>>>>>> Put a "weight" on each neighbor, based on its lastUpdateTime: the larger currentTime - lastUpdateTime is, the higher the weight. Then, in getNeighborToPing, neighbors with higher weight have a higher probability of getting picked than neighbors with lower weight.

>>>>>>> The weight could be ((currentTime - lastUpdateTime) / 15 minutes). It's 15 minutes rather than 10 so as to avoid a probability of 1 when lastUpdateTime dates from the previous performMaintenance. This should also avoid the problem of always pinging the oldest neighbor, since it is about probability (weight).

>>>>>>> It seems a little bit less complicated than having lastPingTime and MIN_UPDATE_TIME.

>>>>>>> -Eric

>>>>>>> On Dec 10 2008, Jonathan Ledlie wrote:

>>>>>>>> This is an interesting situation. Let's think about the goal: keep pinging the same nodes as long as they are alive, and replace a node soon after it dies, keeping the total number of up neighbors at approximately MAX_NEIGHBORS.

>>>>>>>> How about the following:

>>>>>>>> - when you pick a neighbor to ping, you set a time on that event (in a new per-neighbor timestamp LastPingTime)
>>>>>>>> - when you first insert a neighbor, set this timestamp to 0.
>>>>>>>> - when you loop in performMaintenance, remove nodes whose LastPingTime - LastUpdateTime > threshold (like 10 minutes)
>>>>>>>> - when picking a node, pick the guy with the oldest LastPingTime (not sure about this one -- I don't like situations where you pick the absolute *oldest*, because then we could get into a situation where an old guy sits there, waiting to be kicked out, and we keep picking him to be pinged; I'd prefer to pick randomly, or among random older guys).
>>>>>>>> - a simple way around this is to have a self-adjusting threshold: we walk through the nodes, and everyone whose LastPingTime is older than MIN_UPDATE_TIME_TO_PING ago (a relative time) gets added to a set; we pick a guy randomly from the set to ping. If the set is empty, we increase MIN_UPDATE_TIME_TO_PING and wait for the next time the function is called. If the set is too large (relative to a fraction of MAX_NEIGHBORS), we decrease it.

>>>>>>>> We still might get some churn from lost UDP packets. You might also give nodes a second chance to be pinged by including LastPingTime - LastUpdateTime > 1 minute in the set. Note that increases and decreases in MIN_UPDATE_TIME_TO_PING shouldn't be affected by the number of nodes that fall into this category.

>>>>>>>> We still probably need a way for new potential neighbors to be rejected if our current set of neighbors is fine. Because I still want addNeighbor to really add the neighbor, how about putting a check before calling it to see whether we need more neighbors.

>>>>>>>> What thinks?

>>>>>>>> -jonathan

>>>>>>>> On Dec 9, 2008, at 11:37 PM, ext Eric Chan-Tin wrote:

>>>>>>>>> Peter,

>>>>>>>>> Thanks. That fixed it.

>>>>>>>>> However, here is another problem I found while deploying Pyxida and running it for a while.

>>>>>>>>> In performMaintenance(), I loop through every neighbor, and if (neighbor.getLastUpdateTime() < current_time - 10 minutes), I remove that neighbor.

>>>>>>>>> In NCManager, I call getNeighborToPing, which randomly picks one of the neighbors (32 in this case). Pinging is performed every 10 seconds, so in 10 minutes you can ping 60 nodes. However, if a neighbor A is not among those 60 nodes, it will get kicked out in performMaintenance(), although A is still alive (it just hasn't been contacted yet).

>>>>>>>>> There are some possible options.

>>>>>>>>> 1) Increase the timeout period from 10 minutes to ?? minutes. This doesn't quite work because, say, in Azureus the number of neighbors is 512, so the timeout needs to be increased even more... It could be a function of MAX_NEIGHBORS...

>>>>>>>>> 2) When Pyxida replies to a ping request, it updates the neighbor's lastUpdateTime. This does not quite work either, because it assumes a symmetric relationship, which is most likely not the case.

>>>>>>>>> 3) Introduce some more logic into Pyxida so it remembers which nodes it has contacted and does not send, say, more than 2 requests in 10 minutes to the same node. This solution isn't that great either.

>>>>>>>>> Let me know what you guys think. I am currently leaning towards option 1), but making the timeout period a function of MAX_NEIGHBORS, although I haven't quite found the perfect function yet.
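A back-of-the-envelope calculation (an editorial aside, not from the thread) shows why option 1 alone cannot eliminate the churn Eric describes. With 32 neighbors, a uniformly random pick every 10 seconds, and a 10-minute timeout, each window holds 60 pings, and the probability that a given neighbor is never picked in a window is (31/32)^60 ≈ 0.15. So even if every neighbor is alive and no packets are lost, roughly 15% of live neighbors miss the window and get evicted by performMaintenance in every cycle; lengthening the timeout only shrinks this rate, it never zeroes it.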
>>>>>>>>> Here are some preliminary results. I compared the number of neighbors which are the same at each reported time (10 minutes). Before the fix, more than 50% of the neighbors changed within 10 minutes. After the fix, only about 30% of the neighbors changed within 10 minutes. It is more stable, but I think 30% is still quite a big number.

>>>>>>>>> Sorry for the long email :)

>>>>>>>>> -Eric

>>>>>>>>> Peter Pietzuch wrote:

>>>>>>>>>> Hi,

>>>>>>>>>> Eric Chan-Tin wrote:

>>>>>>>>>>> for (RemoteState<T> neighbor : neighbors) { // this is line 1029
>>>>>>>>>>>     if (neighbor.getLastUpdateTime() < (curr_time - MAINTENANCE_PERIOD)) {
>>>>>>>>>>>         neighbors.remove(neighbor);
>>>>>>>>>>>     }
>>>>>>>>>>> }

>>>>>>>>>> IIRC you can't modify a collection while iterating over it using the above for loop. Instead you have to use a proper Iterator<RemoteState<T>> and use its remove method.

>>>>>>>>>> Cheers,
>>>>>>>>>> Peter
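Peter's fix, spelled out as a minimal sketch (not the actual NCClient code): calling neighbors.remove() inside the for-each loop above makes the next step of the hidden iterator throw ConcurrentModificationException on the standard collections, so the removal has to go through an explicit Iterator. The stub interface and parameter names below are illustrative.

    import java.util.Iterator;
    import java.util.List;

    class MaintenanceSketch {
        interface RemoteState { long getLastUpdateTime(); } // minimal stub

        // Remove stale neighbors via the Iterator, the only safe way to
        // delete from the list while walking it.
        static void removeStaleNeighbors(List<RemoteState> neighbors,
                                         long currTime, long maintenancePeriod) {
            Iterator<RemoteState> it = neighbors.iterator();
            while (it.hasNext()) {
                RemoteState neighbor = it.next();
                if (neighbor.getLastUpdateTime() < currTime - maintenancePeriod) {
                    it.remove(); // goes through the iterator, not the list
                }
            }
        }
    }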
>>>>>>>>>>> Jonathan Ledlie wrote:

>>>>>>>>>>>> On Nov 3, 2008, at 3:46 PM, Eric Chan-Tin wrote:

>>>>>>>>>>>>> Jonathan,

>>>>>>>>>>>>> I can insert some code to fix this, and it doesn't look too hard, but I do have some concerns.

>>>>>>>>>>>>>> I don't remember observing this behavior in our PL deployment, but that could have been because we had the ping time set to 1-2 minutes, IIRC.

>>>>>>>>>>>>> The default ping time was set to 10 seconds plus some Gaussian noise, so that's what I used, but changing the ping time to 1 minute or more will result in the same effect; it will just take longer for the "cycle" to occur.

>>>>>>>>>>>>> What I have in mind is the following:

>>>>>>>>>>>>> 1) NCClient.addNeighbor() would be changed to (in pseudocode):

>>>>>>>>>>>>> protected boolean addNeighbor(RemoteState<T> guy) {
>>>>>>>>>>>>>     boolean added = false;
>>>>>>>>>>>>>     // if neighbors.size() >= MAX_NEIGHBORS
>>>>>>>>>>>>>     //     check the last update time of each neighbor (neighbor.getLastUpdateTime())
>>>>>>>>>>>>>     //     if any of them is older than some number TIMEOUT, remove the oldest one, then add "guy" to neighbors
>>>>>>>>>>>>>     //     else don't add "guy" to neighbors
>>>>>>>>>>>>>     // else add "guy" to neighbors
>>>>>>>>>>>>> }

>>>>>>>>>>>> IMO this shouldn't be the behavior of this function, because it is called "addNeighbor." Instead, part of the performMaintenance() function could remove old neighbors.

>>>>>>>>>>>>> 2) In NCManager, getGossipNode() would just pick a random node from localNC.primaryNC.getNeighbors() to send a request to every 10 seconds.

>>>>>>>>>>>>> 3) Pyxida will still keep track of all GossipNodes in the network. It could also (say every 10 minutes) check its NCClient.neighbors list, see if any neighbor is older than TIMEOUT, remove it, and then pick a random GossipNode to send a ping request to.

>>>>>>>>>>>>> The main problem with this scheme is that one Pyxida node would most likely stay with the same set of neighbors all the time. Its coordinates would be greatly affected by those 32 nodes (I am assuming MAX_NEIGHBORS = 32).

>>>>>>>>>>>> Instead of (2), why not have the getNeighborToPing function determine the behavior? If it returns a node, ping it; if it returns null, have the NCManager pick some other node to use. If things are hunky-dory, getNeighborToPing() would return a node (a current member of the neighbor set); if we're short on neighbors, it would return null. The periodic maintenance would have already tossed old nodes, so this would be self-sustaining.

>>>>>>>>>>>> The "problem" you mention is fairly central to network coordinates in general. Sure, the coordinates will be a little better the more neighbors you have, but in a very large system talking to everybody isn't feasible. Peter and I have a graph in one of the workshop papers which shows the effect of increasing the number of neighbors.

>>>>>>>>>>>>> I don't think the network would be partitioned, but it could happen. For example, the bootstrap nodes have each other in their neighbors list. Every new node bootstrapping to the network would contact those bootstrap nodes first and add them to its neighbors. So every Pyxida node could have as its neighbors the "bootstrap nodes", which would effectively make Pyxida a centralized system.

>>>>>>>>>>>> As long as the initial bootstrap list is large enough, I think you are right that partitions won't be a problem.

>>>>>>>>>>>>> One last thing: when Pyxida receives a ping request from a node, it does not update its coordinates. Is there any particular reason for that? Is it because Pyxida will not be able to calculate the RTT?

>>>>>>>>>>>> Yes.

>>>>>>>>>>>>> Also, there is no "gossiping" of other nodes' coordinates, right? That is, if A has B in its neighbors list, it would from time to time tell other nodes about B's coordinates. I realize this has a lot of complications, but the original Vivaldi paper was very fuzzy about this point and the previous one; it seems it is up to the implementors to decide what to do.
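Jonathan's "Instead of (2)" suggestion above can be read as a simple control-flow contract between NCManager and NCClient. The sketch below is editorial, not the real NCManager code: the Node and CoordClient types and the sendPing helper are invented purely to show the contract (non-null means re-measure a current neighbor; null means the neighbor set is short, so recruit from the gossip list).

    import java.util.List;
    import java.util.Random;

    class PingLoopSketch {
        interface Node { String address(); }
        // null from getNeighborToPing signals "short on neighbors".
        interface CoordClient { Node getNeighborToPing(long now); }

        private final Random random = new Random();

        // Would be called every 10 seconds by the manager's timer.
        void pingOnce(CoordClient client, List<Node> gossipNodes) {
            Node target = client.getNeighborToPing(System.currentTimeMillis());
            if (target == null && !gossipNodes.isEmpty()) {
                // Neighbor set needs members: recruit a random gossip node.
                target = gossipNodes.get(random.nextInt(gossipNodes.size()));
            }
            if (target != null) {
                sendPing(target);
            }
        }

        void sendPing(Node n) {
            System.out.println("ping " + n.address()); // placeholder for the UDP ping
        }
    }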
>>>>>>>>>>>> I don't think we are doing this. It could be useful in certain circumstances (e.g., picking potential new neighbors), but we're not, IIRC.

>>>>>>>>>>>>> I will have to think some more about this, as there are a lot of nasty little side effects that could happen, but these are my thoughts.

>>>>>>>>>>>>> Thanks,

>>>>>>>>>>>>> -Eric

>>>>>>>>>>>> Sounds good.

>>>>>>>>>>>> -jonathan

>>>>>>>>>>>>> On Nov 3 2008, Jonathan Ledlie wrote:

>>>>>>>>>>>>>> Eric,

>>>>>>>>>>>>>> Based on a look at the code just now, I see where you are coming from: with these params on PL, there isn't anything stopping the neighbor list from circulating pretty quickly through all nodes. In the Azureus implementation, the guys fed to the alg (fed to processSample) weren't changing much because they were coming from the routing finger tables. I don't remember observing this behavior in our PL deployment, but that could have been because we had the ping time set to 1-2 minutes, IIRC.

>>>>>>>>>>>>>> Do you want to insert some code that will fix this? Basically, we want to add a new neighbor only if our oldest neighbor should be tossed (or if we haven't reached MAX_NEIGHBORS). It's probably best from a clarity standpoint if addNeighbor does continue to actually add the neighbor in all cases. Probably the best thing to do is to have NCClient drop old neighbors internally and have NCManager ask it which nodes should be pinged via getNeighborToPing. So NCClient would continue to have the same behavior on processSample (it would not reject new information), and NCManager and NCClient would cooperate to measure RTTs to current neighbors. We do want to make sure, however, that the gossip set stays large enough that partitions don't occur.

>>>>>>>>>>>>>> Do you want to be added to the SourceForge Pyxida developers to make this change? (Please test it out carefully first.)

>>>>>>>>>>>>>> -jonathan

>>>>>>>>>>>>>> On Oct 30, 2008, at 11:32 AM, ext Eric Chan-Tin wrote:
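For reference, here is one consistent reading of the self-adjusting MIN_UPDATE_TIME_TO_PING idea from Jonathan's Dec 10 message, as an editorial sketch. The constants, the halving/doubling step, and the NeighborStub type are all invented; the adjustment directions are chosen so the candidate set stays usable (an empty set loosens the threshold, an oversized set tightens it), and, per Jonathan's note, second-chance nodes do not influence the adjustment.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;

    class ThresholdPicker {
        static class NeighborStub {
            long lastPingTime;   // 0 until first pinged, per the proposal
            long lastUpdateTime; // last time we heard back from this neighbor
        }

        private long minUpdateTimeToPingMs = 60_000; // invented starting value
        private final Random random = new Random();

        NeighborStub getNeighborToPing(List<NeighborStub> neighbors, long now, int maxNeighbors) {
            List<NeighborStub> candidates = new ArrayList<>();
            int dueCount = 0;
            for (NeighborStub n : neighbors) {
                boolean due = now - n.lastPingTime > minUpdateTimeToPingMs;
                // Second chance for pings possibly lost to UDP drops:
                boolean unanswered = n.lastPingTime - n.lastUpdateTime > 60_000;
                if (due) dueCount++;
                if (due || unanswered) candidates.add(n);
            }
            // Adjust only on the "due" count, not the second-chance nodes.
            if (dueCount == 0) {
                minUpdateTimeToPingMs /= 2;  // loosen; more will qualify next call
            } else if (dueCount > maxNeighbors / 4) {
                minUpdateTimeToPingMs *= 2;  // tighten; too many due at once
            }
            if (candidates.isEmpty()) return null; // wait for the next call
            return candidates.get(random.nextInt(candidates.size()));
        }
    }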
The Gossip node will reply >>>>>>>>>>>>>>> back >>>>>>>>>>>>>>> with an >>>>>>>>>>>>>>> "upNeighbor", that is, a "Gossip" node that is "up" or >>>>>>>>>>>>>>> with 10% >>>>>>>>>>>>>>> probability, a node that is "pending". Since naturally, >>>>>>>>>>>>>>> all >>>>>>>>>>>>>>> the nodes >>>>>>>>>>>>>>> bootstrap off the same set of nodes, every node will >>>>>>>>>>>>>>> eventually know >>>>>>>>>>>>>>> of >>>>>>>>>>>>>>> every other node. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Pyxida might not be rotating through everybody >>>>>>>>>>>>>>> continuously >>>>>>>>>>>>>>> (although >>>>>>>>>>>>>>> eventually, since it knows everybody, it will have added >>>>>>>>>>>>>>> everybody to >>>>>>>>>>>>>>> its Neighbors list at some point), but could be >>>>>>>>>>>>>>> removing a >>>>>>>>>>>>>>> node that >>>>>>>>>>>>>>> is >>>>>>>>>>>>>>> "alive". So, if MAX_NEIGHBORS = 32, let's call them A_1, >>>>>>>>>>>>>>> A_2, >>>>>>>>>>>>>>> ..., >>>>>>>>>>>>>>> A_32. >>>>>>>>>>>>>>> Pyxida picks a random Gossip node B, sends it a request >>>>>>>>>>>>>>> and B >>>>>>>>>>>>>>> replies >>>>>>>>>>>>>>> back. Pyxida will add B to its Neighbors list, see that >>>>>>>>>>>>>>> size of >>>>>>>>>>>>>>> Neighbors > MAX_NEIGHBORS, kick A_1 out. Then from those >>>>>>>>>>>>>>> 32 >>>>>>>>>>>>>>> neighbors >>>>>>>>>>>>>>> A_2, ...., A_32, B, it will calculate the force, and >>>>>>>>>>>>>>> apply >>>>>>>>>>>>>>> it >>>>>>>>>>>>>>> to the >>>>>>>>>>>>>>> current system coordinates. Thus, if every 10 seconds, >>>>>>>>>>>>>>> Pyxida >>>>>>>>>>>>>>> picks a >>>>>>>>>>>>>>> different set of nodes C, D, E, etc... it would keep >>>>>>>>>>>>>>> kicking >>>>>>>>>>>>>>> nodes out >>>>>>>>>>>>>>> of the neighbors list. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> It's the "kicking A_1 out" part that seemed a bit >>>>>>>>>>>>>>> confusing. >>>>>>>>>>>>>>> Churn >>>>>>>>>>>>>>> is a >>>>>>>>>>>>>>> good reason but you don't check that A_1 might still be >>>>>>>>>>>>>>> alive >>>>>>>>>>>>>>> or maybe >>>>>>>>>>>>>>> A_1 was just added. I have been printing the neighbors >>>>>>>>>>>>>>> list >>>>>>>>>>>>>>> every 10 >>>>>>>>>>>>>>> minutes, and they are completely different between those >>>>>>>>>>>>>>> 10 >>>>>>>>>>>>>>> minutes. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> One possible problem might be that I am choosing the >>>>>>>>>>>>>>> value >>>>>>>>>>>>>>> of >>>>>>>>>>>>>>> MAX_NEIGHBORS to be too small. Originally it was set to >>>>>>>>>>>>>>> 512 >>>>>>>>>>>>>>> (which was >>>>>>>>>>>>>>> meant for Azureus). Since I am using Planetlab where >>>>>>>>>>>>>>> there >>>>>>>>>>>>>>> are about >>>>>>>>>>>>>>> 400-500 "stable" nodes, I changed the value to 32. But I >>>>>>>>>>>>>>> don't think >>>>>>>>>>>>>>> that should be a problem since log(n) = log(512) = 9. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -Eric >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Jonathan Ledlie wrote: >>>>>>>>>>>>>>>> Eric, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Glad to hear it. Let Peter and me know if you publish >>>>>>>>>>>>>>>> anything using >>>>>>>>>>>>>>>> Pyxida and we can put it on the web page. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> "Gossip" nodes are for learning about the network. >>>>>>>>>>>>>>>> "Neighbors" are >>>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>> constructing the coordinate. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> So, IIRC, the gossip process learns about nodes that >>>>>>>>>>>>>>>> may or >>>>>>>>>>>>>>>> may not >>>>>>>>>>>>>>>> be >>>>>>>>>>>>>>>> applied to the neighbor list. 
>>>>>>>>>>>>>>>> So, IIRC, the gossip process learns about nodes that may or may not be applied to the neighbor list. That is, we may eventually learn about a large portion of the network via gossiping, but we should only be using MAX_NEIGHBORS at any given time for coordinate computation. The reason we don't want to use a fixed set of neighbors for coordinate computation is that nodes may come and go from the system. However, it should not be the case that we just rotate through everybody, because then, as you note, we'd be computing against everyone's coordinate/latency, and this is counter to the point of network coordinates. With this gradual transience as a desirable quality in mind, I am pretty sure we maintain neighbors in a fairly fixed manner. Does this not seem to be the case to you?

>>>>>>>>>>>>>>>> -jonathan

>>>>>>>>>>>>>>>> On Oct 29, 2008, at 9:20 PM, Eric Chan-Tin wrote:

>>>>>>>>>>>>>>>>> Hi Jonathan and Peter,

>>>>>>>>>>>>>>>>> First of all, just wanted to say that Pyxida has been working great on PlanetLab for the past 4 months or so. I'm getting good data and the experiments are going well.

>>>>>>>>>>>>>>>>> However, I do have one question/comment.

>>>>>>>>>>>>>>>>> Every Pyxida node contains a list of Neighbors (in NCClient) and a list of Gossip nodes (they are called neighbors in NCManager, but I will call them gossip for now so as not to confuse them with the 'real' neighbors in NCClient). Every 10 seconds, Pyxida will pick a random node (call it A) from Gossip and send a gossip request (basically asking for rtt, coordinate, error). But Gossip contains all the nodes in the network. The comment in the code said this was to avoid Pyxida becoming lonely with no one to gossip with, which makes sense.

>>>>>>>>>>>>>>>>> Once A responds, Pyxida will add A to its Neighbors (if Neighbors does not contain A) and remove the first element from the Neighbors list. This seems a little bit odd. Was there any particular reasoning behind this? Because it would seem like you're getting the coordinate from every node in the network eventually.
>>>>>>>>>>>>>>>>> Thanks a lot,

>>>>>>>>>>>>>>>>> -Eric

>>>>>>>>>>>>>>>>> Jonathan Ledlie wrote:

>>>>>>>>>>>>>>>>>> Your inclination is correct. See the forwarded message.

>>>>>>>>>>>>>>>>>> -jonathan

>>>>>>>>>>>>>>>>>> On Apr 9, 2008, at 1:53 PM, ext Eric Chan-Tin wrote:

>>>>>>>>>>>>>>>>>>> Thank you for the clarification.

>>>>>>>>>>>>>>>>>>> I've been reading through your published papers and code, and I must say I learned a lot about how Pyxida works :)

>>>>>>>>>>>>>>>>>>> I still have a quick and (hopefully) easy question. Each Pyxida node has a small number of neighbors and learns about other nodes through them. I am running on about 200 machines, and it seems like all the Pyxida nodes are adding each other to their upNeighbors set. NCClient.java has MAX_NEIGHBORS set to 512.

>>>>>>>>>>>>>>>>>>> I wanted to make sure that I am not misreading the code, but shouldn't each Pyxida node keep a small number of neighbors in its upNeighbors set? Otherwise, it is a lot of updates to send a gossipMessage to all 200 neighbors, and it also means that a Pyxida node has the whole network in its upNeighbors set - that seems wrong. I am inclined to change the value of MAX_NEIGHBORS, but wanted to make sure that my understanding is correct first.

>>>>>>>>>>>>>>>>>>> Thank you,

>>>>>>>>>>>>>>>>>>> Eric