From: Jeremy F. <je...@go...> - 2004-03-08 01:44:46
On Fri, 2004-03-05 at 09:10, KJK::Hyperion wrote:
> At 01.14 02/03/2004, Jeremy Fitzhardinge wrote:
> >Well, the thing which Valgrind ideally wants is two complete address
> >spaces: one for the client, and one for Valgrind.
>
> are you 100% sure of what you're saying? if I understand correctly, at
> least the JIT compiler needs to run in the same process as the client.
> Otherwise we'd have an insane amount of inter-process memory copying,
> which doesn't exactly come for free

Well, there is the question of which address space the generated code
should go into.  You're right: it needs quick access to the client
memory for client execution, and quick access to the shadow memory for
the instrumentation.  I guess in principle you could play segment games
for this or something...  I haven't thought about this too deeply.

We definitely need the notion of multiple address spaces, but
Valgrind's precise requirements are somewhat more demanding than the
normal uses of address spaces.  It wants a distinct address space to
keep the client in, and one for the core Valgrind code, but since the
client code doesn't execute directly, we need some union space in which
generated code can reach both the client and Valgrind address spaces
equally easily.

This is doable but awkward in a single linear address space: you
partition the address space at some point, and say everything below is
client and everything above is Valgrind.  This is what we do now, and
as long as client-visible objects don't have high fixed addresses, it
all works.  The main problem is that 4G is a bit of a squeeze.  This
model should work a lot better for 64-bit address spaces, since there's
a lot more room to play with.  (x86-64 is a bit odd, since the
toolchain doesn't support code above 4G, and it's "only" a 48-bit
virtual address space anyway.)
Two linear address spaces (ie, two separate unix processes) would allow
the client the full run of one whole address space, but there's no
clear way for generated code to have efficient access to both address
spaces.  Some kind of segmentation scheme, in which we map two segments
onto two distinct linear address spaces, would work, since generated
code could use a segment prefix on memory operations to distinguish
which address space it wants to access.  Unfortunately, I don't think
this works on x86 (since segmentation is layered on top of paging, all
segments ultimately map onto the same underlying 4G paged address
space).  PPC's notion of segments is somewhat different, but I don't
think it does the right thing either.  In other words, if someone
wanted to make a CPU and OS optimised for running Valgrind, it would
look somewhat different from either the Unix model or the Windows
model, I think.

> this sounds like good news, finally. Is there an updated technical
> overview? last time I looked, it said Valgrind didn't do
> multithreading

I think that overview was obsolete even then.  Multithreading has been
in there a long time.  But it needs a lot more updating, since things
changed quite a bit in 2.1.0 and 2.1.1 (when it appears).  2.1.2 will
rearrange the source tree and probably include at least the FreeBSD
port, but I don't think there'll be much deep architectural change.

> >Does this mean that if we translate this code into the instrumented
> >code cache, then things will care because the EIP isn't within the
> >kernel/user/gdi.dll?
>
> no, this shouldn't be a problem, it only hurts full virtualization.
> The kernel-mode windowing and graphics code does call back to
> user-mode, but it only does so through a well-defined entry point -
> one of the entry points Valgrind will have to catch anyway not to
> lose control

I don't really understand what the problem with FV is.
If kernel32.dll, user32.dll and gdi32.dll need to be loaded once at a
fixed address, there's still no reason why the client and Valgrind
couldn't share the same copy.  All the client's uses of code in those
.dlls will be virtualized, of course.  Hm, I guess if those libraries
hold state, we need to make sure the client's and Valgrind's versions
of that state are distinct.  If this is the case, it doesn't seem like
FV itself is the problem - it's the more general problem of how to
multiplex one "process" state/context between two somewhat independent
separate programs.

> Apropos, some details for the other Windows guys:
> - the entry points (exported by ntdll.dll) are:
>   - KiRaiseUserExceptionDispatcher
>   - KiUserApcDispatcher
>   - KiUserCallbackDispatcher
>   - KiUserExceptionDispatcher
>   KiUserApcDispatcher will always be hit at least once per thread, as
>   an APC is queued to all threads to run the initialization code. We
>   won't need to do special handling of any of them, though. We'll
>   just switch to the JIT upon entering one of them. The first thread
>   entering the JIT will initialize it, the others will spin until
>   initialization is done. We could detect new threads by checking the
>   flat address of their TEB against an internal table, or by
>   allocating a TLS slot and checking for its value (NULL -> new
>   thread)

What's an APC?  Async procedure call?  What does that mean?

> - the entry points above aren't enough. Some control transfers happen
>   outside of them - luckily there aren't many. I've counted three:
>   ZwContinue, ZwSetContextThread and ZwVdmControl. The first two are
>   easy, the third is a mystery. I know it causes control transfers to
>   and from V86 virtual machines, but how it does that is not known -
>   luckily, only NTVDM uses it

I guess this would be an elaboration of the games we play currently
with signals?  That's the only place in Unix where the kernel
asynchronously changes the process context.

> - catching system calls is a mess.
> Hooking system calls directly in kernel mode, as dangerous as it is,
> is the best way for several reasons. I don't like how that strace for
> Windows does it, though. To signal a call I'd raise an exception:
> it's semantically correct, so it will work well with existing (and
> future) code

Is there some distinct instruction or class of instructions used for
calling into the kernel?  int?  lcall through some special call gate?
Any of those we can identify at translation time and do the right
thing.

J