|
From: Julian S. <js...@ac...> - 2002-11-20 22:14:02
|
[I was going to put two topics in here, but decided it's simpler to put
each in its own msg.]

You'll have to forgive me for not quite keeping up with you young 'uns. I
see the stuff on extended bbs but haven't grokked it properly yet.
Besides, I spent half the day in meetings with CPU architects, and if that
isn't enough to put one's head in a spin, I don't know what is.

Jeremy: I've built it with the 47-chained-bb patch and it works.

A question: What's the deal on the LRU mechanism now that chaining is in?
Do chained translations get marked when they get used? Not AFAICS ... but
this is going to present something of a problem when LRUing, because
chained ones will appear not to have been used, and so will be much more
likely to be dumped. Perhaps we can do something clever based on the idea
that if a translation has been chained it must have been used at least
once since the last LRU pass. Not critical, but something we need to have
a plausible story on.

J
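Julian's heuristic above -- a chained translation must have been used at
least once since it was chained -- could be folded into the LRU sweep
roughly as below. This is only a sketch; `TTEntry` and its fields are
invented for illustration and are not Valgrind's actual data structures.

```c
#include <stdbool.h>

/* Hypothetical translation-table entry; field names are illustrative. */
typedef struct {
    unsigned last_used_epoch;  /* set when dispatched through the slow path */
    bool     is_chained;       /* some other block jumps directly here */
} TTEntry;

/* During the LRU sweep, treat a chained entry as if it had been used in
   the current epoch: chained jumps bypass the dispatcher, so they never
   update last_used_epoch themselves. */
static unsigned effective_epoch(const TTEntry *t, unsigned current_epoch)
{
    return t->is_chained ? current_epoch : t->last_used_epoch;
}
```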
|
From: Jeremy F. <je...@go...> - 2002-11-21 00:48:29
|
On Wed, 2002-11-20 at 14:20, Julian Seward wrote:
> A question: What's the deal on the LRU mechanism when
> chaining is going, now? Do chained translations get marked when they
> get used, now? Not afaics ... but this is going to present something
> of a problem when LRUing because chained ones will appear to not have
> been used, and so will be much more likely to be dumped. Perhaps we can
> do something clever based on the idea that if a T has been chained
> it must have been used at least once since the last LRU pass.
I did notice the problem and made a small attempt to address it:
patch_me updates the epoch field in the same way as the dispatch loop.
It doesn't actually help much, but it looks like I paid attention to it.
I was thinking of a scheme like an OS VM scheme: have a loop go through
the TTEs at some low rate, unchaining blocks as it goes; if on the 2nd
pass a block is still unchained and still hasn't updated its LRU entry,
then consider it old.
J
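Jeremy's two-pass scheme might be sketched as follows. Everything here
(`TTEntry`, the flag names, the flat table) is an invented illustration,
not Valgrind's actual code: the first visit unchains a block and clears
its touch bit; a block found still unchained and untouched on a later
visit has genuinely not run and is marked old.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical translation-table entry; fields are illustrative. */
typedef struct {
    bool is_chained;    /* another block jumps directly here */
    bool lru_touched;   /* set by the dispatcher when the block runs */
    bool is_old;        /* eviction candidate */
} TTEntry;

/* One low-rate sweep over the TTEs. Re-chaining or dispatching between
   sweeps counts as use and saves the block from being marked old. */
static void sweep(TTEntry *tt, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        TTEntry *t = &tt[i];
        if (!t->is_chained && !t->lru_touched)
            t->is_old = true;          /* 2nd pass: still idle */
        t->is_chained  = false;        /* unchain the block */
        t->lru_touched = false;        /* clear the use bit */
    }
}
```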
|
|
From: Nicholas N. <nj...@ca...> - 2002-11-21 10:03:28
|
On 20 Nov 2002, Jeremy Fitzhardinge wrote:

> > A question: What's the deal on the LRU mechanism when
> > chaining is going, now?
>
> I did notice the problem and made a small attempt to address it:
> patch_me updates the epoch field in the same way as the dispatch loop.
> It doesn't actually help much, but it looks like I paid attention to it.
>
> I was thinking of a scheme like an OS VM scheme: have a loop go through
> the TTEs at some low rate, unchaining blocks as it goes; if on the 2nd
> pass a block is still unchained and still hasn't updated its LRU entry,
> then consider it old.

It might be worth re-mentioning the approach used by Dynamo, Walkabout,
Strata and seemingly all the other dynamic translators for handling this
situation -- they just flush the code cache.

But they have smaller translations and smaller code caches, and spend less
time on translations. And I think Julian's LRU mechanism is extremely
fast. So maybe just ignore this.

N
|
From: Julian S. <js...@ac...> - 2002-11-21 23:26:07
|
> > > A question: What's the deal on the LRU mechanism when
> > > chaining is going, now?
> >
> > I did notice the problem and made a small attempt to address it:
> > patch_me updates the epoch field in the same way as the dispatch loop.
> > It doesn't actually help much, but it looks like I paid attention to it.
> >
> > I was thinking of a scheme like an OS VM scheme: have a loop go through
> > the TTEs at some low rate, unchaining blocks as it goes; if on the 2nd
> > pass a block is still unchained and still hasn't updated its LRU entry,
> > then consider it old.

Ha, that sounds like it might work well enough. I have sometimes wondered
how OSs implement the pseudo-LRU stuff for VM. Is there a standard
description of this that you know of?

> It might be worth re-mentioning the approach used by Dynamo, Walkabout,
> Strata and seemingly all the other dynamic translators for handling this
> situation -- they just flush the code cache.
>
> But they have smaller translations and smaller code caches, and spend less
> time on translations. And I think Julian's LRU mechanism is extremely
> fast. So maybe just ignore this.

Yes, that is the simplest thing, but I fear it would be very expensive. A
less radical scheme along the same lines is to randomly discard some
fraction of the translations. Then you get into schemes where you randomly
move some fraction of the translations into a "victimisation cache" (in
hardware speak). If they turn out to be needed again soon, they can be
moved from the cache back into the main set of active translations. If the
V-cache itself gets full, _it_ is then flushed. This scheme somewhat
mitigates the situation where a translation is randomly selected for
demotion and then used again soon afterwards. But it's all a bit complex.

J
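Julian's victimisation-cache idea in miniature. Everything below -- the
`Trans` type, the fixed-size array, the flush-on-full policy -- is an
invented illustration of the scheme, not an actual implementation:

```c
#include <stdbool.h>
#include <stddef.h>

#define VCACHE_SIZE 4

typedef struct { int tid; } Trans;   /* stand-in for a translation */

static Trans  vcache[VCACHE_SIZE];
static size_t vcache_used = 0;

/* Demote a translation into the victim cache instead of discarding it
   outright; when the victim cache itself fills, flush the whole thing. */
static void demote(Trans t)
{
    if (vcache_used == VCACHE_SIZE)
        vcache_used = 0;             /* flush the V-cache */
    vcache[vcache_used++] = t;
}

/* If the program needs the translation again soon, promote it back into
   the active set, avoiding a retranslation. Returns true if found. */
static bool promote(int tid, Trans *out)
{
    for (size_t i = 0; i < vcache_used; i++) {
        if (vcache[i].tid == tid) {
            *out = vcache[i];
            vcache[i] = vcache[--vcache_used];  /* remove from V-cache */
            return true;
        }
    }
    return false;
}
```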
|
From: Jeremy F. <je...@go...> - 2002-11-22 02:53:21
|
On Thu, 2002-11-21 at 15:31, Julian Seward wrote:
> Ha, that sounds like it might work well enough. I sometimes wondered
> how OSs implement the pseudo-LRU stuff for VMs. Is there a standard
> description of this, that you know of?
This is called the one-hand clock algorithm. The two-hand variant is
used when the scan over the whole of memory would take too long, and you
want to test before a complete pass has been made. I don't know what
the standard reference is, but I think pretty much all OS books talk
about it (Tanenbaum, for example).
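The one-hand clock Jeremy mentions, in its textbook form over an array of
reference bits -- a generic sketch, not any particular OS's code. The hand
clears reference bits as it advances; the first entry whose bit was
already clear is the victim:

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool referenced;   /* set by the hardware (or dispatcher) on use */
} Page;

/* Advance the hand until a victim is found, clearing reference bits as
   we pass; a page referenced since the last visit gets a second chance. */
static size_t clock_pick_victim(Page *pages, size_t n, size_t *hand)
{
    for (;;) {
        Page  *p   = &pages[*hand];
        size_t idx = *hand;
        *hand = (*hand + 1) % n;
        if (!p->referenced)
            return idx;            /* not used since last visit: evict */
        p->referenced = false;     /* second chance */
    }
}
```

The two-hand variant simply runs a clearing hand some fixed distance ahead
of the testing hand, so a decision can be made before a full pass completes.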
> Yes, that is the simplest thing, but I fear it would be very expensive.
> A less radical similar scheme is to randomly discard some fraction of
> the translations. Then you get into schemes where you randomly move
> some fraction of the translations into a "victimisation cache" (in
> hardware speak). If they turn out to be needed again soon, they can
> be moved from the cache back into the main set of active translations.
> If the V-cache gets full _it_ then is flushed. This scheme has the
> effect of somewhat mitigating the situation where a translation is
> randomly selected for demotion and then used again soon afterwards.
> But it's all a bit complex.
Dynamo doesn't quite dump everything when the cache gets full. It keeps
track of the rate of creation of new basic blocks, and dumps the cache
when this starts getting high, on the grounds that the program is
entering a new working set and the cache is stale anyway. You could
define "too high" in terms of how full the cache is in proportion to the
size you'd like it to be, since you'd obviously like a high cache fill
rate when the cache is empty.
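One possible reading of that trigger, with invented names and arbitrary
thresholds: flush only when the block-creation rate is high relative to
how full the cache already is, so a burst of translation into a
near-empty cache reads as normal warm-up rather than a working-set change.

```c
#include <stdbool.h>

/* Hypothetical flush predicate; base_rate and the 0.05 floor are
   arbitrary tuning constants, not measured values. */
static bool should_flush(double blocks_per_ms, double cache_fill_frac)
{
    const double base_rate = 10.0;  /* tolerated creation rate when full */
    /* Tolerate a proportionally higher rate when the cache is emptier. */
    double allowed = base_rate / (cache_fill_frac + 0.05);
    return blocks_per_ms > allowed;
}
```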
A generational caching scheme sounds good, so long as it doesn't
actually involve physically moving the code.
J
|