|
From: Adun R. <adu...@ml...> - 2004-08-13 11:05:32
|
On Fri, 13 Aug 2004 02:24:53 +0200, "KJK::Hyperion" <no...@li...> said:
> At 16.50 11/08/2004, you wrote:
> >Indeed. APC is a main concern of ours [we'll need to be able to regain
> >control].
>
> no, APCs are easy - that's what I meant with "almost everything Valgrind
> has to struggle with on Linux, on Windows is super-easy". APCs re-enter
> user mode from ntdll!KiUserApcDispatcher, i.e. a single point, as opposed
> to signals on Linux. You can even catch the first instruction executed in
> any thread, that way (all threads begin execution inside an APC), thus you
> can profile the DllMains and even the internal startup code. The real
> problem in regaining control is on inter-process calls to SetThreadContext:
> the local ones you can intercept easily, but what about the ones coming
> from other processes? Hopefully, it isn't an operation that is performed
> very often

When we run the process, we can hide it, but from whom? As I see it, the client process will either notify some already running process [could be a 3rd-party watchdog process that exists on the system and waits for our client process to wake up, yeah, right], or create another process [remote, but we can try to run it debugged too, making the call local] or thread [which, since we'll debug all threads, is a local call, problem solved] that will later change a client process's thread context.

Do you suggest hooking the APC set-up?

> >>Windows debugger support lacks at least "entering system call" and
> >>"leaving system call" notifications.
> >Indeed, but since we generate the code, a special treatment could be made
> >to the SYSENTER instruction that is used.
>
> To be precise: old Windows versions used int 0x2E, Windows XP and later use
> an unspecified instruction located at a fixed memory address (in a zone of
> virtual memory shared between all processes, that can't be written to nor
> freed), which happens to be sysenter. Pretty easy to catch. Valgrind even
> already supports it (Linux system calls had a similar evolution)

I ignored the int 0x2E, since there's not a lot of difference for our goals.

> That said, I have investigated a way to raise an exception every time a
> system call is entered or left. It involves a kernel-mode driver, though,
> and some non-portable trickery
>
> >>And the JIT engine (running in the client's context) should be separated
> >>from the rest of Valgrind
> >This is a change to the linux design of valgrind, but we can do a complete
> >separation between the client and valgrind (both will run in different
> >processes, where the valgrind process will control the client, debugger like).
>
> I don't think it can be done - how do you validate/profile each
> instruction? Sure, you could put the thread in single-step mode and use
> ReadProcessMemory, but won't that be a little too heavy? On a second
> thought... we control all memory allocations by hooking the system calls,
> so we could do a little caching. Hmmm, it could even work

This is the general idea. We can also recompile the entire client process code [thus adding the Valgrind validating/profiling features], copy it into the client process address space, and run it from there [after we hook the IAT and change the SYSENTER/INT 0x2E calls in the recompiled code into a set of instructions that will assure our control].

> >Not being able to relocate dll's doesn't seem like a problem to me, can
> >you [again] elaborate?
>
> You can't implement something like Full Virtualization on Windows. FV
> consists of running Valgrind and the client in the same process, each with
> its own instances of the C runtime and other libraries. This allows you to
> use any library and any language in the profiling tools, rather than just C
> with the subset of the runtime Valgrind used to provide. You can't do this
> on Windows because you can't load two instances of the base Windows DLLs.
> But, if the debugger idea works, we won't need to worry about it

Ok. [Won't a multi-threaded version of the CRT do the trick?]

Regards,
Rauch Adun.
--
Adun R.
adu...@ml...
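The "little caching" idea quoted in this message could look roughly like the sketch below. It is only an illustration, not code from the thread: `RemotePageCache`, `read_remote` and `invalidate` are hypothetical names, `read_remote` stands in for a ReadProcessMemory-style call, and `invalidate` is assumed to be driven by whatever hooks observe the client's memory-related system calls.

```python
PAGE_SIZE = 4096  # x86 page size


class RemotePageCache:
    """Cache whole pages fetched through a slow cross-process read."""

    def __init__(self, read_remote):
        # read_remote(address, size) -> bytes is a hypothetical stand-in
        # for a ReadProcessMemory-style call into the client process.
        self._read_remote = read_remote
        self._pages = {}
        self.remote_reads = 0  # number of cross-process reads performed

    def read(self, address, size):
        """Return `size` bytes at `address`, fetching each page at most once."""
        out = bytearray()
        while size > 0:
            base = address - (address % PAGE_SIZE)
            page = self._pages.get(base)
            if page is None:
                page = self._read_remote(base, PAGE_SIZE)
                self._pages[base] = page
                self.remote_reads += 1
            offset = address - base
            chunk = page[offset:offset + size]
            if not chunk:
                raise ValueError("short read from client at 0x%x" % address)
            out += chunk
            address += len(chunk)
            size -= len(chunk)
        return bytes(out)

    def invalidate(self, address, size):
        """Drop cached pages overlapping [address, address + size)."""
        if size <= 0:
            return
        first = address - (address % PAGE_SIZE)
        last = (address + size - 1) // PAGE_SIZE * PAGE_SIZE
        for base in range(first, last + PAGE_SIZE, PAGE_SIZE):
            self._pages.pop(base, None)
```

Repeated reads of the same page then cost one cross-process call instead of many; the cache is only correct if every write the client can make to a cached page is followed by an `invalidate`, which is the hard part the sketch leaves out.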
|
From: KJK::Hyperion <no...@li...> - 2004-08-16 00:29:20
|
(hope this gets to the list, my SMTP has been blacklisted somewhere)

At 13.05 13/08/2004, Adun R. wrote:
>When we run the process, we can hide it, but from whom?

we can only hope it never happens (it could make us completely lose control of the client). Or page-guard everything even vaguely looking like executable memory

>Do you suggest to hook the APC set-up?

they cause uncontrolled and unpredictable jumps, totally out of the JIT's control. Since they're easy to catch (single dispatcher function at a known location), sure I do. Other such "abnormal" re-entry points to consider are KiRaiseUserExceptionDispatcher (lets a kernel-mode driver call RtlRaiseStatus in a user-mode thread), KiUserCallbackDispatcher (kernel-to-user callback, mainly used to call user-defined callbacks - e.g. custom menu rendering - while the thread is spinning - in kernel mode - in the modal menu loop) and KiUserExceptionDispatcher (dispatches a user-mode exception raised by the current thread through VEH and/or SEH)

>This is the general idea. We can also recompile the entire client process
>code [thus adding the Valgrind validating/profiling features], copy it
>into the client process address space, and run it from there

hmmm, this is a lot to take in. Now ideas just keep popping up in my head. We could also run all of the client's code in Valgrind's process and flush the shadow state (registers, virtual memory) into the actual process/threads at certain points (for example on system calls). This will also bring us a lot closer to the current Valgrind design, e.g. the Valgrind-managed RR scheduler will become possible (will this stay? the latest discussions seem to suggest it isn't)

I'm still a bit worried about the performance hit of repeated inter-process memory copies, but maybe, by hooking the appropriate system calls, we can emulate the allocation of private virtual memory with the mapping of unnamed shared memory: Valgrind could then access a lot (still not all) of the client process's memory without a single context switch

>[after we hook the IAT and the change SYSENTER/INT 0X2E calls in the
>recompiled code into a set of instructions that we'll assure our control]

IAT hooking isn't necessary. We're going to hook at a much lower level

>Ok. [won't a multi-threaded version of the crt will do the trick?]

nope, it will still share global state with the client (if the client uses it), and anyway I can't imagine people will accept a Valgrind you can't write GUI tools for. In general, FV is not an option on Windows. Most subsystems involve state kept in sync between user-mode and kernel-mode components - one kernel process equals one process equals one main executable
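The "flush the shadow state at certain points" idea from this message can be illustrated with a small model. Everything here is an assumption made for the sketch: `ShadowState` is a hypothetical name, `set_context` and `write_memory` stand in for SetThreadContext- and WriteProcessMemory-style calls, and coalescing adjacent bytes into one write is just one possible way to reduce the number of inter-process copies.

```python
class ShadowState:
    """Registers and pending memory writes kept in the Valgrind process."""

    def __init__(self, registers):
        self.registers = dict(registers)
        self._pending = {}  # address -> byte not yet copied to the client

    def write_byte(self, address, value):
        """Record a store performed by emulated client code."""
        self._pending[address] = value & 0xFF

    def _runs(self):
        """Yield (start, data) for each run of contiguous pending bytes."""
        start, run = None, bytearray()
        for address in sorted(self._pending):
            if start is not None and address == start + len(run):
                run.append(self._pending[address])
            else:
                if start is not None:
                    yield start, bytes(run)
                start, run = address, bytearray([self._pending[address]])
        if start is not None:
            yield start, bytes(run)

    def flush(self, set_context, write_memory):
        """Push the shadow state to the real thread, e.g. before a system call.

        set_context(registers) and write_memory(address, data) are
        hypothetical stand-ins for the cross-process calls.
        Returns the number of memory writes issued.
        """
        writes = 0
        for start, data in self._runs():
            write_memory(start, data)
            writes += 1
        set_context(dict(self.registers))
        self._pending.clear()
        return writes
```

In this model three stores to addresses 0x10, 0x11 and 0x20 are flushed as two writes rather than three.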
|
From: Adun R. <adu...@ml...> - 2004-08-16 17:45:43
|
Hold your horses; it seems like you're planning to run at least part of your code in ring 0. This isn't really what I had in mind. With regard to the thoughts you and I raised, why do you think this is needed?

Regards,
Rauch Adun.
--
Adun R.
adu...@ml...
|
From: KJK::Hyperion <no...@li...> - 2004-08-19 00:52:54
|
At 19.45 16/08/2004, Adun R. wrote:
>Hold your horses; it seems like you're planning to run atleast part of
>your code in ring 0.

nope. If you mean the KiXxx functions, despite what the name may suggest, they are user-mode functions. They're probably called like that because only the kernel calls them. No, what a session of Valgrind for Windows would look like is:

- run the client as a debuggee
- place breakpoints on the KiXxx callbacks
- initialize the shadow state (virtual memory, etc.) with the client's current state
- execute the client's code in the Valgrind process, leaving the client's threads suspended until they perform a system call
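The last step of the session outlined above (emulate in the Valgrind process, hand over to the suspended client thread only for system calls) can be modelled with a toy loop. This is a sketch under heavy assumptions: the tuple-based instruction format, `run_until_syscall`, `session` and `perform_syscall` are all invented for illustration, and real x86 decoding, breakpoints and debugger events are left out entirely.

```python
def run_until_syscall(program, pc, registers):
    """Emulate toy instructions on the shadow registers.

    Stops at the next ("syscall", number) entry and returns (pc, number),
    or (pc, None) when the program ends.
    """
    while pc < len(program):
        op, *args = program[pc]
        if op == "syscall":
            return pc, args[0]
        if op == "mov":
            reg, imm = args
            registers[reg] = imm & 0xFFFFFFFF
        elif op == "add":
            reg, imm = args
            registers[reg] = (registers[reg] + imm) & 0xFFFFFFFF
        else:
            raise ValueError("unknown toy instruction: %r" % (op,))
        pc += 1
    return pc, None


def session(program, registers, perform_syscall):
    """Run the toy program, delegating each system call to the client side.

    perform_syscall(number, registers) is a hypothetical stand-in for:
    flush the shadow state, resume the suspended client thread for one
    system call, suspend it again and return the call's result.
    """
    pc = 0
    trace = []
    while True:
        pc, number = run_until_syscall(program, pc, registers)
        if number is None:
            return trace
        registers["eax"] = perform_syscall(number, dict(registers))
        trace.append(number)
        pc += 1  # continue emulating after the system call
```

The point of the split is that the client thread never executes ordinary instructions itself; it only wakes up for the system calls recorded in the returned trace.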
|
From: Adun R. <adu...@ml...> - 2004-08-27 08:47:39
|
On Wed, 25 Aug 2004 04:07:22 +0200, "KJK::Hyperion" <no...@li...> said:
> At 16.33 20/08/2004, you wrote:
> >I see no difference between changing the IAT or placing Breakpoints
> >inside the code of those functions.
>
> well, since we control the whole memory space and flow of control, we don't
> need to hook any function (we can just stop execution when the emulated
> flow of control reaches certain points or instructions, like the system
> call sequences), except the KiXxx callbacks. For those we need a
> breakpoint, because we can't change their address and because they make us
> lose control of the program's flow, somewhat like signals on Linux

We mean the same.

> >I fail to see why not run the entire code in the client's process.
>
> because 1) we need full control on the program's flow, 2) it drastically
> reduces the amount of context switches between client and Valgrind and 3)
> it allows us to fully take advantage of the inter-process manipulation
> features of Windows, which Valgrind would like to enjoy on Linux but
> cannot. I'm still interested to hear about your idea of how Valgrind on
> Windows would work, though - could you describe a "typical" session? maybe
> an example will help you clarify your ideas

I don't want context switches at all, nor inter-process manipulation, though I would like to hear more from you, as this seems like a good solution as well.

The Valgrind process starts the client process in suspended mode. The next step is to recompile the client process's code, so that we can implant special code at instructions such as sysenter or int 2Eh [this was already said by you and by me]. We will also capture the setting of the KiXxx callbacks and change that code to point to our own code. We will also add the entire profiling mechanism (Valgrind's features) into the code [we will also change the IAT of the client process, so that, for example, malloc will point to our code]. After all this is done, we take the entire compiled code, copy it into the altered client process, and start execution from there using SetThreadContext. The Valgrind process is irrelevant from this point onward.

--
Adun R.
adu...@ml...
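As a rough illustration of the "implant special code at sysenter or int 2Eh" step, the sketch below locates candidate system-call sites in a buffer of x86 code. The opcode bytes are the standard encodings (`0F 34` for SYSENTER, `CD 2E` for INT 0x2E); `find_syscall_sites` is a hypothetical helper, and the plain byte scan is a deliberate simplification, since the same bytes can also occur inside operands or data. A real recompiler would decode instruction by instruction.

```python
SYSENTER = b"\x0f\x34"  # x86 encoding of SYSENTER
INT_2E = b"\xcd\x2e"    # x86 encoding of INT 0x2E


def find_syscall_sites(code, base=0):
    """Return sorted (address, mnemonic) pairs for candidate syscall sites.

    Naive byte search: matches are only candidates, because the patterns
    may also appear inside immediates or data.
    """
    sites = []
    for pattern, name in ((SYSENTER, "sysenter"), (INT_2E, "int 0x2e")):
        start = 0
        while True:
            index = code.find(pattern, start)
            if index == -1:
                break
            sites.append((base + index, name))
            start = index + 1
    return sorted(sites)
```

Each reported address is where the recompiled code would branch to replacement instructions instead of entering the kernel directly.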