From: Jeremy F. <je...@go...> - 2003-11-21 00:19:05
On Thu, 2003-11-20 at 16:27, Julian Seward wrote:

> > Ideally, we want a cheap way of detecting s-m-c that can be on all the
> > time (then we could get rid of the INVALIDATE_TRANSLATIONS macro).  I
> > have an extremely vague idea about tracking things at the page level,
> > and possibly throwing out all translations that come from code within a
> > page in certain circumstances.  Or something.  Hmm.
>
> Well, there are some options, but none are good.
>
> One is (or might be, depending on Jeremy's views) to mess with page-level
> permissions, so as to remove write permission for any page from which
> we've taken a translation.  Then V takes a page fault whenever writes to
> that page happen; we catch the fault, note the page as dirty and throw
> away translations from it as soon as possible.  So we freeload on the
> host's memory protection hardware and have zero run-time overhead.

Well, taking an exception is fairly expensive, and this scheme won't work
well at all in this case, since there are probably going to be lots of
writes to the same page as the code itself, because it's the stack (which
is something the P4 hates anyway...).

But as I mentioned, we can somewhat special-case this, because we can see
that we're fetching instructions from near %esp, which means that 1)
they're probably dynamically generated, and 2) they have a fairly short
lifespan.  In fact, if we had one, they'd be a good candidate for
interpretation rather than generating code at all.  (A UCode interpreter
might be pretty useful in other places too, and would be interesting to
write.)

Also, in the new memory management I've written as part of FV, I
explicitly keep track of a CODE flag per segment, so we can easily tell
which mapped parts of the address space have code in them.  That alone
would be enough to tell if a segment suddenly grows some code.
> Another is the pure-software approach in very old valgrinds.  I think
> what I had was an array of char indexed by addr>>12 or some such (that
> would be 2^20 bytes on a 32-bit machine); and there was some kind of
> check at each write.  Not cheap, but you could drastically reduce the
> cost by only testing some writes -- for example, writes from the FPU are
> most unlikely to generate code, and writes happening as part of a
> read-op-write x86 insn are also unlikely to.  Also I think I excluded
> writes of sizes > 1 since it doesn't make much sense to write x86 code
> on a word-by-word basis, since it's really a byte stream.

Unless someone memcpys some code onto the stack in word-sized chunks.

Another not-fully-formed idea: if we're translating a piece of code where
we think the code may get overwritten in the near future (ie, stack
thunks), how about we generate some code into the BB itself, to check its
own validity?  If it decides that the translated code is out of date, it
would drop back into the scheduler asking for a re-translation.  The
tricky bit is deciding on a predicate which would allow it to determine
whether it is out of date or not.  Looking at %esp is one (if it's above
the code, then it's dead), and in memcheck, it could look at the V bits
of its own orig_addr.  Hm, but neither of these would work in this
case...  I guess a checksum of the orig code would do it - slow, but for
this case it wouldn't matter much.

> Personally I prefer the portability/system independence of the pure-sw
> approach.  IIRC the optimised version didn't give much overhead, but I
> can't really remember any more, and I don't think I have a copy of the
> code base that still has that stuff in it.

Well, the FV stuff pretty much depends on virtual memory with demand
paging and page protections anyway, so using it to do code invalidates
doesn't require anything more - but I don't think it's all that useful
here anyway.
> I wonder if the frequent %esp changes will give a performance problem
> for stack writes.  Let's see: if code is written into the stack and then
> executed, the first time, we make a correct translation, and we mark
> that stack page as dirty.  The next write to that page gets the overhead
> of discarding translations from that page.  So it looks like, apart from
> the cost of checking every write (subject to the above filtering
> criteria), the cost of supporting s-m-c is proportional to the number of
> translation discards to be done, and the stack doesn't cause special
> problems.

Stack writes already seem to be a problem for maintaining A and V bits in
memcheck and addrcheck anyway, so I think we may end up special-casing
either stack writes or at least stack movement.  But we'll see.

> If it should ever come to pass that the vcpu stuff gets completely
> redesigned, we could translate multiple bbs at once and find where the
> loops are.  It may then be possible to identify writes to memory
> locations in which we can see that a subsequent write happens to the
> same location, before we lose track of the control flow.  In that case
> we know that the first write cannot be generating code (since it's
> overwritten later, and we know all the places where the program will
> execute in between the two writes) and so the s-m-c check for the first
> write is redundant.  So the use of a cleverer translation/analysis
> engine, originally intended to do better liveness analysis, may also
> help here.

Well, my first instinct is that you have to be careful about racing with
other CPUs in this case - but of course there aren't any other CPUs, and
we control thread scheduling, so that's OK.  There's still some risk if
we're sharing memory containing code with another process, and that other
process changes the code, but I think that's really pathological...

	J