From: Paul M. <pa...@sa...> - 2005-06-22 22:27:29
Julian Seward writes:

> Re invalidations, I was going to ask: throwing stuff out of the
> (V's) translation cache is tremendously expensive as it involves
> a linear search of all translations. That means performance will
> suffer badly if the client does a lot of icbis. Is that something
> you noticed to be a problem with 2.4.0-ppc ? The translation
> cache stuff could be redesigned (I suppose) to convert invalidation
> cost from O(N) to O(log N) or O(handwaving-hashtable-cost N)
> kinda thing, but that's significant hassle that I'd rather avoid
> if not necessary.

We get a lot of icbis when doing dynamic linking, because the PLT contains code which gets modified when an entry gets resolved, which is done lazily.

In other words, if a program calls a function that turns out to be in a shared library, the linker will create a PLT entry for it and make the bl instruction jump to the PLT entry. The dynamic linker initially sets up the PLT entry to contain instructions to load a constant into r11 (IIRC) and jump to the resolver. The resolver then works out the actual address of the function and modifies the PLT entry to contain instructions to jump directly to the function. It then has to use dcbst and icbi on the PLT entry to make sure the CPU sees the new instructions.

That is good for us because it gives us an explicit signal to go off and invalidate the translation of the PLT entry, but it does mean that icbis are relatively common.

We are in the process of changing the way that the PLT works because of the security concerns about having memory that is both writable and executable, so eventually the need for efficient icbis will diminish. But for now they need to go reasonably fast.

I'm not sure whether the cost you're talking about is of the same order as the cost we have in 2.x for invalidating a translation, where we basically had to scan all other translations to find any that had chained to the one we're invalidating and unchain them. That loop was (IIRC) consuming about 90% of the several minutes that it took to start up mozilla, until I changed vg_transtab.c to create, for each translation, a linked list of translations that chained to it. With that the unchaining is very fast (because we know precisely which translations to unchain) and the overhead of icbi became negligible.

Paul.
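
To illustrate the linked-list idea described above, here is a minimal sketch in C. It is not the actual vg_transtab.c code: the struct layout, the names (`translation`, `chain_ref`, `chain`, `invalidate`) and the fixed limit of two chainable exits per translation are all assumptions made for the example. Each translation keeps a list of the translations that have chained to it, so invalidating it only touches those, rather than scanning every translation.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_OUT 2   /* hypothetical: at most two chainable exits per translation */

struct translation;

/* One node per incoming chain: records which exit of which translation
   jumps directly into the translation that owns this list. */
struct chain_ref {
    struct translation *from;
    int slot;
    struct chain_ref *next;
};

struct translation {
    unsigned long guest_addr;           /* illustrative only */
    struct translation *out[MAX_OUT];   /* chained targets, NULL = unchained */
    struct chain_ref *incoming;         /* who has chained to us */
};

/* Chain exit 'slot' of 'from' to 'to' and record the back-reference.
   Returns 0 on success, -1 if allocation fails. */
static int chain(struct translation *from, int slot, struct translation *to)
{
    struct chain_ref *r;

    assert(slot >= 0 && slot < MAX_OUT);
    assert(from->out[slot] == NULL);    /* exit must not already be chained */

    r = malloc(sizeof *r);
    if (r == NULL)
        return -1;
    r->from = from;
    r->slot = slot;
    r->next = to->incoming;
    to->incoming = r;
    from->out[slot] = to;
    return 0;
}

/* Remove the back-reference (from, slot) from the incoming list of 'to'. */
static void drop_ref(struct translation *to, struct translation *from, int slot)
{
    struct chain_ref **pp = &to->incoming;

    while (*pp != NULL) {
        if ((*pp)->from == from && (*pp)->slot == slot) {
            struct chain_ref *dead = *pp;
            *pp = dead->next;
            free(dead);
            return;
        }
        pp = &(*pp)->next;
    }
}

/* Invalidate 't': unchain exactly the exits that jump into it, then drop
   the back-references held by its own targets. Returns the number of
   exits that were unchained. */
static int invalidate(struct translation *t)
{
    int n = 0;
    int i;
    struct chain_ref *r = t->incoming;

    while (r != NULL) {
        struct chain_ref *next = r->next;
        r->from->out[r->slot] = NULL;
        free(r);
        n++;
        r = next;
    }
    t->incoming = NULL;

    for (i = 0; i < MAX_OUT; i++) {
        if (t->out[i] != NULL) {
            drop_ref(t->out[i], t, i);
            t->out[i] = NULL;
        }
    }
    return n;
}
```

In this sketch the work done by `invalidate` is proportional to the number of translations that chained to the one being thrown out, plus the length of its targets' incoming lists, instead of the total number of translations. The price is one small list node per chained exit.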