|
From: mindcrime <min...@in...> - 2004-02-28 17:27:08
|
Hi guys. My name is Rajesh and I am a student of Computer Science in
India. I want to write a Valgrind-like utility for Windows. I know I
sound silly, but how do I start? I am sorry if I am bothering you guys.

Yours,
Rajesh
|
From: Chris J. <ch...@at...> - 2004-03-01 19:42:38
|
> Hi guys.
> My name is Rajesh and I am a student of Computer Sci in India. I want
> to write a valgrind like utility for Windows. I know I sound silly but
> How do I start? I am sorry if I am bothering u guys

I think your main work will be in re-writing vg_mylibc.c and
vg_syscalls.c. vg_mylibc.c is straightforward to re-write, but for
vg_syscalls.c you need to know all the Windows syscalls, what parameters
they take, etc. Not all of these are documented, but you can typically
work out most of the parameters by disassembling the Nt* preamble for a
Zw* system call.

There are two main system call tables, one for the standard Zw* calls
and another for GDI, etc. calls. I have no idea what system calls are
contained in the second table, but I have seen some calls documented
somewhere.

I am willing to help you if you want. I recommend you buy Windows
NT/2000 Native API Reference by Gary Nebbett.

Chris
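One trick implicit in Chris's advice: each stdcall Zw/Nt stub ends in a
`retn <n>` instruction, and `n` gives away the total argument size. A toy
decoder sketching the idea (the helper name is invented, and the scan is
deliberately naive - a C2 byte inside an immediate would fool it; a real
tool would disassemble properly):

```c
#include <stdint.h>

/* A Zw/Nt stub ends in 'retn imm16' (opcode C2). The imm16 is the
 * number of argument bytes the stdcall syscall pops, so imm16 / 4 is
 * the parameter count. Scan a small byte window for the C2 opcode and
 * return the parameter count, or -1 if no retn is found. */
static int stub_param_count(const uint8_t *code, unsigned len)
{
    for (unsigned i = 0; i + 2 < len; i++)
        if (code[i] == 0xC2) {          /* retn imm16 */
            unsigned argbytes = code[i + 1] | (code[i + 2] << 8);
            return (int)(argbytes / 4); /* 4 bytes per 32-bit argument */
        }
    return -1;                          /* no retn found in the window */
}
```

This only recovers the argument *count*, of course - the types still have
to be worked out by hand or from Nebbett's book.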
|
From: KJK::Hyperion <no...@li...> - 2004-03-01 23:39:28
|
At 20.37 01/03/2004, Chris January wrote:
>>My name is Rajesh and I am a student of Computer Sci in India. I want to
>>write a valgrind like utility for Windows. I know I sound silly but How
>>do I start? I am sorry if I am bothering u guys
>I think your main work will be in re-writing vg_mylibc.c and vg_syscalls.c.
>vg_mylibc.c is ok to re-write but for vg_syscalls.c you need to know all
>the Windows syscalls, what parameters they take, etc.
I've been thinking about Valgrind on Windows for a long time, and here's
the major stumbling blocks I've found:
- absolutely, completely forget full virtualization, i.e. loading the
target program in the same process as Valgrind. kernel32.dll, user32.dll
and gdi32.dll can only be loaded *once per process* and *at their default
base address*, and it's very impractical to write Valgrind tools without
using Win32 functions. In detail:
- kernel32.dll:
- once per process because once loaded it connects through LRPC to
the Win32 subsystem (winsrv.dll, running in the context of CSRSS). The
connection needs to be done because the subsystem won't accept calls from
unknown clients, and subsystem calls include the whole console API -
definitely something we want to support. And the connection can only be
done once per process because the process and thread ids are sent to the
subsystem at each call, and the subsystem not only keeps track of them
internally, but requires them to be valid process and thread ids, since it
will open them to perform operations. All of this is microkernel-era crap
that today is otherwise completely irrelevant, but we're stuck with it
- at the default base address to make CreateRemoteThread work.
CreateRemoteThread is used, among others, to signal processes attached to a
console that a control event (for example a Control-C) has been received
- user32.dll, gdi32.dll:
- once per process for reasons similar to kernel32.dll
- at the default base address for reasons you really don't want to
know. It has to do with the fact that the windowing system was ported from
Windows 95, where not only user32.dll and gdi32.dll were actually loaded at
the same address in all processes, but they were also loaded in *shared memory*
- liberal use of shared memory. It isn't uncommon for several processes
to share memory even with kernel mode code
- inter-process modifications. Nicholas told me this is going to be one
of the worst issues, as Valgrind currently assumes that next to no
interaction between unrelated processes is possible. On the other hand,
this kind of interaction is almost commonplace in Windows. Most can be
checked, as they happen at well-defined times (like passing buffers outside
the port section in LRPC calls), while others (like handle duplication) are
asynchronous and unpredictable. Some (like thread context changes) could
outright break Valgrind, although they're hopefully rarely used
- I/O control operations. These are intrinsically problematic, and the
fact that many common IOCTLs (like the AFD IOCTLs, the default
implementation of TCP/IP for Winsock) aren't documented doesn't help a bit
>I have no idea what system calls are contained in the second table but i
>have seen some calls documented somewhere.
a crude listing of NtUserXxx and NtGdiXxx system calls can be obtained from
the debugging symbols of win32k.sys. Note that some NtUserXxx functions are
multiplexers, actually implementing several different calls each, and that
some others have unusual effects on the calling process (like mapping
several shared memory views) that can confuse Valgrind. Finally, an awful
lot of user32 functions aren't system calls, but instead read directly from memory
shared with the kernel-mode windowing system (window and menu handles
aren't really handles, but decorated offsets into a table stored in said
shared memory). A strange example of such calls is GetWindowRect, which,
reading 64 bits at once, should be prone to race conditions on 32 bit
machines - and it indeed *is* vulnerable! One wonders how such a f***ed-up
windowing system can work so well in practice
>I am willing to help you if you want.
count me in too, even if I think that nothing short of a full fork of
Valgrind will do
|
|
From: Jeremy F. <je...@go...> - 2004-03-02 00:20:38
|
On Mon, 2004-03-01 at 15:36, KJK::Hyperion wrote:
> I've been thinking about Valgrind on Windows for a long time, and here's
> the major stumbling blocks I've found:
> - absolutely, completely forget full virtualization, i.e. loading the
> target program in the same process as Valgrind. kernel32.dll, user32.dll
> and gdi32.dll can only be loaded *once per process* and *at their default
> base address*, and it's very impractical to write Valgrind tools without
> using Win32 functions.

Well, the thing which Valgrind ideally wants is two complete address
spaces: one for the client, and one for Valgrind. The idea is that
Valgrind should be able to control all the activity in the client
address space, control execution, inject generated code, etc. We can't
get that with the Unix memory model, but maybe we can use the (otherwise
very painful) cross-address-space features of Windows to get this
effect.

> - kernel32.dll:
> [...]
> - user32.dll, gdi32.dll:
> - once per process for reasons similar to kernel32.dll
> - at the default base address for reasons you really don't want to
> know. It has to do with the fact that the windowing system was ported
> from Windows 95, where not only user32.dll and gdi32.dll were actually
> loaded at the same address in all processes, but they were also loaded
> in *shared memory*

Does this mean that if we translate this code into the instrumented code
cache, things will care because the EIP isn't within
kernel32/user32/gdi32.dll?

Also, is this code running in Ring 3, or at a privileged level?

> >I am willing to help you if you want.
>
> count me in too, even if I think that nothing short of a full fork of
> Valgrind will do

I get that feeling too.

J
|
From: KJK::Hyperion <no...@li...> - 2004-03-05 20:18:24
|
At 01.14 02/03/2004, Jeremy Fitzhardinge wrote:
>Well, the thing which Valgrind ideally wants is two complete address
>spaces: one for the client, and one for Valgrind.
are you 100% sure of what you're saying? if I understand correctly, at
least the JIT compiler needs to run in the same process as the client.
Otherwise we'd have an insane amount of inter-process memory copying, which
doesn't exactly come for free
>The idea is that the Valgrind should be able to control all the activity
>in the client address space, control execution, inject generated code,
>etc. We can't get that with the Unix memory model, but maybe we can use
>the (otherwise very painful) cross-address space features of Windows to
>get this effect.
this sounds like good news, finally. Is there an updated technical
overview? last time I looked, it said Valgrind didn't do multithreading
>Does this mean that if we translate this code into the instrumented code
>cache, then things will care because the EIP isn't within the
>kernel/user/gdi.dll?
no, this shouldn't be a problem; it would only hurt full virtualization. The
kernel-mode windowing and graphics code does call back to user-mode, but it
only does so through a well-defined entry point - one of the entry points
Valgrind will have to catch anyway not to lose control
Apropos, some details for the other Windows guys:
- the entry points (exported by ntdll.dll) are:
- KiRaiseUserExceptionDispatcher
- KiUserApcDispatcher
- KiUserCallbackDispatcher
- KiUserExceptionDispatcher
KiUserApcDispatcher will always be hit at least once per thread, as an
APC is queued to all threads to run the initialization code. We won't need
to do special handling of any of them, though. We'll just switch to the JIT
upon entering one of them. The first thread entering the JIT will
initialize it, the others will spin until initialization is done. We could
detect new threads by checking the flat address of their TEB against an
internal table, or by allocating a TLS slot and checking for its value
(NULL -> new thread)
- the entry points above aren't enough. Some control transfers happen
outside of them - luckily there aren't many. I've counted three:
ZwContinue, ZwSetContextThread and ZwVdmControl. The first two are easy;
the third is a mystery. I know it causes control transfers to and from V86
virtual machines, but how it does that is not known - luckily, only
NTVDM uses it
- catching system calls is a mess. Hooking system calls directly in
kernel mode, as dangerous as it is, is the best way for several reasons. I
don't like how that strace for Windows does it, though. To signal a call
I'd raise an exception: it's semantically correct, so it will work well
with existing (and future) code
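The thread-detection idea above (a TLS slot whose value is NULL for any
thread the tool hasn't seen yet) can be sketched portably with C11
thread-local storage; the Win32 version would use
TlsAlloc/TlsGetValue/TlsSetValue instead, and the function name here is
made up for illustration:

```c
#include <stdbool.h>

/* One flag per thread, implicitly zero (the "NULL slot") for every
 * newly created thread - the portable analogue of an empty TLS slot. */
static _Thread_local bool vg_thread_seen = false;

/* Called on every entry to a dispatcher (e.g. KiUserApcDispatcher).
 * Returns true exactly once per thread: the first time that thread
 * enters instrumented code, i.e. when it must be registered with the
 * JIT. */
static bool vg_is_new_thread(void)
{
    if (vg_thread_seen)
        return false;
    vg_thread_seen = true;
    return true;
}
```

Since KiUserApcDispatcher is guaranteed to be hit once per thread, a hook
there plus this check would catch every thread at birth.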
>Also, is this code running in Ring 3, or a privileged level?
in normal processes, all code runs at ring 3. Well, winsrv.dll (in the
CSRSS process) has a couple of functions (in user-mode memory) that run in
kernel-mode, in kernel-mode threads created by win32k.sys, but it's the
only instance I know of. Well, the PsCreateSystemThread function *is*
documented in the driver writing documentation, and it *could* be used to
create kernel-mode threads in any process, but I'd be surprised if anyone
used it on a process that isn't System. And even in that case it isn't
something Valgrind can possibly control, and it won't be commonplace
|
|
From: Jeremy F. <je...@go...> - 2004-03-08 01:44:46
|
On Fri, 2004-03-05 at 09:10, KJK::Hyperion wrote:
> are you 100% sure of what you're saying? if I understand correctly, at
> least the JIT compiler needs to run in the same process as the client.
> Otherwise we'd have an insane amount of inter-process memory copying,
> which doesn't exactly come for free

Well, there is the question of which address space the generated code
should go into. You're right, it needs quick access to the client memory
for client execution, and quick access to the shadow memory for the
instrumentation. I guess in principle you could play segment games for
this or something... I haven't thought about this too deeply.

We definitely need the notion of multiple address spaces, but Valgrind's
precise requirements are somewhat more demanding than the normal uses of
address spaces. It wants a distinct address space to keep the client in,
and one for the core Valgrind code, but since the client code doesn't
execute directly, we need some union space in which generated code can
get to both the client and Valgrind address spaces equally easily.

This is doable but awkward in a single linear address space: you just
partition the address space at some point, and say everything below is
client, and everything above is Valgrind - this is what we do now, and
so long as client-visible objects don't have high fixed addresses, it
all works. The main problem is that 4G is a bit of a squeeze. This model
should work a lot better for 64-bit address spaces, since there's a lot
more room to play with. (x86-64 is a bit odd, since the toolchain
doesn't support code above 4G, and is "only" a 48-bit virtual address
space anyway.)

Two linear address spaces (ie, two separate unix processes) would allow
the client to have full run of one whole address space, but there's no
clear way in which generated code could have efficient access to both
address spaces. Some kind of segmentation scheme, in which we map two
segments to two distinct linear address spaces, would work, since client
code could use a segment prefix on memory operations to distinguish
which address space it wants to access. Unfortunately, I don't think
this works on x86 (since segmentation is layered on top of paging, so
all segments ultimately map onto the same underlying 4G paged address
space). PPC's notion of segments is somewhat different, but I don't
think it does the right thing either. In other words, if someone wanted
to make a CPU and OS optimised for running Valgrind, it would look
somewhat different from the Unix model or the Windows model, I think.

> this sounds like good news, finally. Is there an updated technical
> overview? last time I looked, it said Valgrind didn't do multithreading

I think that overview was obsolete even then. Multithreading has been in
there a long time. But it needs a lot more updating, since things
changed quite a bit in 2.1.0 and 2.1.1 (when it appears). 2.1.2 will
rearrange the source tree and probably include at least the FreeBSD
port, but I don't think there'll be much deep architectural change.

> no, this shouldn't be a problem; it would only hurt full
> virtualization. The kernel-mode windowing and graphics code does call
> back to user-mode, but it only does so through a well-defined entry
> point - one of the entry points Valgrind will have to catch anyway not
> to lose control

I don't really understand what the problem with FV is. If kernel32.dll,
user32.dll and gdi32.dll need to be loaded once at a fixed address,
there's still no reason why the client and Valgrind couldn't share the
same copy. All the client uses of code in those .dlls will be
virtualized, of course. Hm, I guess if those libraries are holding
state, we need to make sure the client version and Valgrind versions are
distinct. If this is the case, it doesn't seem like FV itself is the
problem - it's the more general problem of how to multiplex one
"process" state/context between two somewhat independent separate
programs.

> KiUserApcDispatcher will always be hit at least once per thread, as an
> APC is queued to all threads to run the initialization code. [...]

What's an APC? Async procedure call? What does that mean?

> - the entry points above aren't enough. Some control transfers happen
> outside of them - luckily there aren't many. I've counted three:
> ZwContinue, ZwSetContextThread and ZwVdmControl. [...]

I guess this would be an elaboration of the games we play currently with
signals? That's the only place in Unix where the kernel asynchronously
changes the process context.

> - catching system calls is a mess. Hooking system calls directly in
> kernel mode, as dangerous as it is, is the best way for several
> reasons. I don't like how that strace for Windows does it, though. To
> signal a call I'd raise an exception: it's semantically correct, so it
> will work well with existing (and future) code

Is there some distinct instruction or class of instructions used for
calling into the kernel? int? lcall through some special call gate? Any
of those we can identify at translation time and do the right thing.

J
|
From: KJK::Hyperion <no...@li...> - 2004-03-19 20:51:14
|
At 02.36 08/03/2004, Jeremy Fitzhardinge wrote:
>Two linear address spaces (ie, two separate unix processes) would allow
>the client to have full run of one whole address space, but there's no
>clear way in which generated code could have efficient access to both
>address spaces.

couldn't the memory for the data be shared between the two processes?
and offsets used in place of pointers?

>I think that overview was obsolete even then. Multithreading has been
>in there a long time.

but to me it looks like it isn't real multithreading. From the papers
I've read, it looks like it's emulated. Would it be problematic to make
Valgrind truly multithreaded?

>If this is the case, it doesn't seem like FV itself is the problem -
>it's the more general problem of how to multiplex one "process"
>state/context between two somewhat independent separate programs.

maybe you're right. Maybe as little as a separate thread for Valgrind
will do. You'd lose the ability to instrument initialization code,
though, and Valgrind would be much more complicated, having to track
several hundred library calls *possibly intermixed with system calls*
(several system DLLs, services and tools make use of them), rather than
just about a hundred system calls.

>What's an APC? Async procedure call? What does that mean?

they are a mechanism akin to real-time (queued) signals, albeit of a
much lower level (you queue APCs by specifying a routine address
directly) and a lot less useful (the user-mode ones only fire at
specific points - and they will never fire under typical conditions -
and the need to specify a routine address makes them unusable
inter-process). Kernel-mode APCs are much better, since they're never
held while the target thread runs in user mode, and I've already coded a
proof-of-concept that implements asynchronous signals through them.

They are useful in many obscure instances, though, like executing code
in the context of a newly created thread and in some specific
asynchronous I/O scenarios. They're important for us because creating a
user-mode thread intrinsically queues an APC to it, fired as soon as the
thread is resumed. We therefore know that every thread begins its
lifecycle inside ntdll!KiUserApcDispatcher.

Like I said, no special handling of the callback routines is needed.
They just end up calling user code or longjmp-like system calls. A
possible exception is ntdll!KiUserCallbackDispatcher, because it's
pretty odd: it's a kernel-to-user call, terminated with a special system
call that restores the kernel-mode context previous to the call. You'd
think this could cause all sorts of weird re-entrancy issues if you
called other system calls in such a callback, but it appears to work
perfectly. Callbacks at work in the real world: menus in Windows are
modal and entirely implemented in kernel mode, so how can some
applications draw their own menus? Simple: their custom drawing routines
are called with a callback. To give you an idea of how robust they are,
consider that there's an Explorer extension that draws the preview of
picture files in their context menu.

>I guess this would be an elaboration of the games we play currently
>with signals?

yes, except a lot easier :-) the only annoying part is that there isn't
a system call to compare object handles (i.e. to know if they refer to
the same object), so you need to look up the thread ids from the handles
to know if the program is trying to change the context of a thread under
Valgrind's control.

>Is there some distinct instruction or class of instructions used for
>calling into the kernel? int? lcall through some special call gate?
>Any of those we can identify at translation time and do the right
>thing.

it was easy until Windows 2000. There was a number of well-known
software interrupts that called the kernel (0x2E for system calls, 0x2D
for debugger output/input, etc.), and everything was easy and
predictable. System call thunks looked like this:

    mov eax, <system call number>
    mov edx, esp
    int 2Eh
    retn <size of parameters>

In Windows XP and later, much to our collective amazement, they became:

    mov eax, <system call number>
    mov edx, <very high, fixed address>
    call edx
    retn <size of parameters>

The "very high" address is an unbelievable 0x7FFE0300, straight in the
middle of that no man's land between the kernel memory base and the
user-mode memory probe address: not considered valid user-mode memory by
most system calls (because it's above the probe address), but not
kernel-mode memory yet (because it's below the kernel memory base).
Single-stepping it with a debugger reveals the mystery, and it turns out
to be a quite smart trick. It looks like the actual system call thunking
code (sysenter, it turns out) is generated by the kernel at boot time
and written there to shield user-mode code from CPU subtleties. It also
makes DLLs containing system call thunks binary-compatible with WOW64
platforms (64-bit systems with x86 emulation) - on those platforms the
thunk likely calls a 64-bit routine that marshals the parameters and
finally calls the kernel.

Anyway, the kernel-user shared memory area has a semi-documented layout
that never changed since 1993, and the debugger even displays a symbolic
name (forgot which) for the thunk routine, so we can expect a certain
stability in this field. I wouldn't be surprised if the address stayed
stable and fixed for another ten years, so "call 7FFE0300h" is a pretty
good bet for "system call".
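The translation-time check Jeremy asked about could then be little more
than matching the XP-era thunk byte pattern. A minimal sketch (the helper
name is invented, and a real JIT would recognize this during proper
instruction decoding rather than by raw byte comparison):

```c
#include <stdint.h>
#include <stdbool.h>

/* XP-era system call thunk:
 *   B8 xx xx xx xx    mov eax, <syscall number>
 *   BA 00 03 FE 7F    mov edx, 7FFE0300h
 *   FF D2             call edx
 * If 'code' matches this pattern, store the syscall number in *sysno
 * and return true. */
static bool match_xp_syscall_thunk(const uint8_t *code, uint32_t *sysno)
{
    static const uint8_t tail[] = {
        0xBA, 0x00, 0x03, 0xFE, 0x7F,   /* mov edx, 7FFE0300h */
        0xFF, 0xD2                      /* call edx */
    };
    if (code[0] != 0xB8)                /* mov eax, imm32 */
        return false;
    for (unsigned i = 0; i < sizeof tail; i++)
        if (code[5 + i] != tail[i])
            return false;
    /* the syscall number is the little-endian imm32 after B8 */
    *sysno = code[1] | (code[2] << 8) | (code[3] << 16)
           | ((uint32_t)code[4] << 24);
    return true;
}
```

The 0x2E/0x2D software interrupts on older releases are even easier, since
`int imm8` is a fixed two-byte instruction the translator sees directly.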
|
From: Jeremy F. <je...@go...> - 2004-03-21 23:42:03
|
Quoting "KJK::Hyperion" <no...@li...>:
> couldn't the memory for the data be shared between the two processes?
> and offsets used in place of pointers?

Well, the data is the big thing, so you'd still have the problem of
fitting everything into the one address space. memcheck uses 9 bits of
shadow for every 8 bits of client memory, so if they're both in the same
address space, you're always going to be able to use less than half the
available address space for your program.

> but to me it looks like it isn't real multithreading. From the papers
> I've read, it looks like it's emulated. Would it be problematic to
> make Valgrind truly multithreaded?

That would be very hard work, since every instruction could potentially
be concurrently modifying some structure which is being used by another
thread. It is really multithreaded as far as the client is concerned;
the main problem is that it will only ever use 1 CPU on an SMP system.

> >I guess this would be an elaboration of the games we play currently
> >with signals?
>
> yes, except a lot easier :-)

Yes, I think I can see how they can be handled.

>     mov eax, <system call number>
>     mov edx, <very high, fixed address>
>     call edx
>     retn <size of parameters>
>
> The "very high" address is an unbelievable 0x7FFE0300, straight in the
> middle of

Hm, that isn't all that high. Does that mean a process has less than 2G
of available address space under XP?

> It looks like the actual system call thunking code (sysenter, it turns
> out) is generated by the kernel at boot time and written there to
> shield user-mode code from CPU subtleties.

Linux does the same thing these days; the syscall entrypoint is at
0xffffd000 or so, and it contains whatever is the most efficient way of
doing a syscall on this CPU.

J
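The arithmetic behind "less than half" is easy to check: with 9 bits of
shadow per 8 bits of client memory, a client of C bytes needs
C + 9C/8 = 17C/8 bytes in total, so C can be at most 8/17 of the address
space. A quick worked check (the function name is illustrative):

```c
#include <stdint.h>

/* With memcheck's ratio of 9 shadow bits per 8 client bits, a client
 * of C bytes needs C + (9*C)/8 bytes in total. The largest client that
 * fits in 'total' bytes of address space is therefore total * 8/17. */
static uint64_t max_client_bytes(uint64_t total)
{
    return (total * 8) / 17;
}
```

For a 2G user address space this gives roughly 0.94G for the client - in
line with the "less than 1G for the client's own use" figure that comes
up later in the thread.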
|
From: KJK::Hyperion <no...@li...> - 2004-03-22 20:21:51
|
At 00.41 22/03/2004, Jeremy Fitzhardinge wrote:
>Well, the data is the big thing, so you'd still have the problem of
>fitting everything into the one address space. memcheck uses 9 bits of
>shadow for every 8 bits of client memory, so if they're both in the
>same address space, you're always going to be able to use less than
>half the available address space for your program.

I'm not too worried about memory usage (well, there's the issue of
placing Valgrind data so that it doesn't conflict with certain
non-relocatable system DLLs... but there should be plenty of room in the
middle). Separation of address spaces is more a matter of "playing by
the rules".

Anyway, how do tools register with the JIT engine so they are called at
certain points? Because the issue now is whether they can store most
data in Valgrind's process and only require small "registration data" in
the client, or not. Ideally, all tools (except maybe memcheck) should
run in Valgrind's process and be called through some form of RPC by the
JIT (running in the client), so their execution won't interfere with the
client.

Apropos, I've downloaded some CVS release, but I have a hard time
understanding much of it. Basically, all I've understood is that
Valgrind has its own scheduler. The rest looks pretty obscure. What do
you think would be the best way to get started on Valgrind internals?

On an unrelated topic: does the core code depend on GCCisms? I've only
seen surprisingly little inline assembler, some
expressions-with-statements, noreturn functions and functions with
register parameters - all of which have some equivalent in Windows
compilers - and playing games with symbol names, which has no effect on
Win32. Is there much more?

>It is really multithreaded as far as the client is concerned;

*this* is what I'm not sure about. I've read the latest Microsoft SQL
Server has its own scheduler, and I've read a pretty detailed
description of it on the weblog of some Microsoft guy. It looks like it
can work only for very specific operations, like only for file I/O -
SQL Server can afford it because, like all database servers, it's
largely self-contained, but most applications aren't.

And don't forget the pervasive inter-process interactions. Pressing
Control-C in a console, asynchronous signals being absent, always
creates a thread in each process attached to the console. You
right-click on the console window title and select the "Properties"
menu? A thread is created in the oldest process in order of attachment
to display the property pages. And there's no way you can accurately
duplicate the scheduler in user mode - some scheduler object may come
from outside the process, or even from a driver, and only grant access
for being waited on. My general impression is that Windows doesn't like
cheaters (or at least loves to make their lives miserable).

Anyway, what about holding a mutex that is only yielded before system
calls (and after some quantum expires, to prevent starvation)? Isn't
that more or less the behavior that Valgrind simulates?

>Hm, that isn't all that high. Does that mean a process has less than
>2G of available address space under XP?

maybe I'm confusing addresses. You know, all those hexadecimal digits...
anyway </me fetches calculator>, the highest user-mode address is
reported here (Windows 2000) as being 0x7FFEFFFF, meaning 64 KB are
unavailable. The shared read-only data begins at 0x7FFE0000, and
includes the tick counter, some information about the kernel and of
course the system call thunk. Not sure where the probe address is, or
whether its semantics are what I believe they are (probably not).
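The "one big lock yielded around system calls" idea at the end of this
message can be sketched with a portable pthread mutex; all names here are
invented for illustration, not Valgrind's actual scheduler API. Each
client thread holds the lock while executing translated code and drops it
only around blocking syscalls, so execution is serialized exactly as
described:

```c
#include <pthread.h>

/* One global lock serializes execution of translated client code:
 * threads run one at a time, yielding only around blocking system
 * calls (and when their quantum expires, to prevent starvation). */
static pthread_mutex_t vg_big_lock = PTHREAD_MUTEX_INITIALIZER;

static void vg_run_begin(void) { pthread_mutex_lock(&vg_big_lock); }
static void vg_run_end(void)   { pthread_mutex_unlock(&vg_big_lock); }

/* Wrap a blocking syscall: release the lock so other client threads
 * can make progress while this one is blocked in the kernel, then
 * re-acquire it before resuming translated code. */
static long vg_do_syscall(long (*syscall_fn)(void *), void *arg)
{
    vg_run_end();
    long r = syscall_fn(arg);
    vg_run_begin();
    return r;
}
```

This is essentially the behaviour Valgrind's internal scheduler gives the
client: real kernel threads, but at most one running translated code at a
time.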
|
From: Chris J. <ch...@at...> - 2004-03-01 19:57:14
|
> Hi guys.
> My name is Rajesh and I am a student of Computer Sci in India. I want
> to write a valgrind like utility for Windows. I know I sound silly but
> How do I start? I am sorry if I am bothering u guys

Further to my previous post, you might find this website useful:

http://razor.bindview.com/tools/desc/strace_readme.html

Chris
|
From: Nicholas N. <nj...@ca...> - 2004-03-02 09:28:09
|
On Mon, 1 Mar 2004, Jeremy Fitzhardinge wrote:
> > I think that nothing short of a full fork of Valgrind will do
>
> I get that feeling too.

Same here. Not only does Windows have rather, er, interesting
architectural features, you would be fighting its closedness the whole
way. Sounds like a nightmare.

N
|
From: KJK::Hyperion <no...@li...> - 2004-03-03 21:52:50
|
At 10.23 02/03/2004, Nicholas Nethercote wrote:
> >>I think that nothing short of a full fork of Valgrind will do
> >I get that feeling too.
>Same here. Not only does Windows have rather, er, interesting
>architectural features,

Actually, the kernel is pretty straightforward - Linux 2.6 comes pretty
close to it in features and architecture. The problem is that the kernel
is too far from the applications. Example: the socket functions are
implemented by a user-mode multiplexer, and it's only the *default*
implementation that processes them in kernel mode; even then, they get
there as IOCTLs. This makes it very difficult to validate the use an
application is making of BSD sockets.

>you would be fighting its closedness the whole way. Sounds like a
>nightmare.

sounds like real fun, to me :-) but I'm involved in ReactOS
(<http://reactos.com/>), so I may be biased
|
From: Jeremy F. <je...@go...> - 2004-03-23 07:16:21
|
Quoting "KJK::Hyperion" <no...@li...>:

> I'm not too worried about memory usage (well, there's the issue of
> placing Valgrind data so that it doesn't conflict with certain
> non-relocatable system DLLs... but there should be plenty of room in
> the middle). Separation of address spaces is more a matter of "playing
> by the rules".

Yes, but there is the issue of simply running out of address space. The
numbers you mention below suggest that there's less than 2G of address
space for applications under Windows, which means that if the client is
sharing the address space with shadow data, there is less than 1G for
the client's own use.

> Anyway, how do tools register with the JIT engine so they are called
> at certain points?

Um, well, they get to instrument the code as it goes through the JIT.
There are also special callbacks for things like allocations, but the
majority is done with instrumentation.

> because the issue now is whether they can store most data in
> Valgrind's process and only require small "registration data" in the
> client, or not. Ideally, all tools (except maybe memcheck) should run
> in Valgrind's process and be called through some form of RPC by the
> JIT (running in the client), so their execution won't interfere with
> the client

Why do you say that? Memcheck, addrcheck, cachegrind, and helgrind use
shadow memory a lot (at least every memory access), and making access
to the shadow any slower would have enormous performance effects.

> Apropos, I've downloaded some CVS release, but I have a hard time
> understanding much of it. Basically, all I've understood is that
> Valgrind has its own scheduler. The rest looks pretty obscure. What do
> you think would be the best way to get started on Valgrind internals?

Julian's internals document is still a reasonable start for the overall
design, though many of the details have changed. Using --trace-* options
will give you some idea about what's going on inside. It isn't wildly
complex, but there are a lot of details.

> On an unrelated topic: does the core code depend on GCCisms? I've only
> seen surprisingly little inline assembler, some statement expressions,
> noreturn functions and functions with register parameters - all of
> which have some equivalent in Windows compilers - and playing games
> with symbol names, which has no effect on Win32. Is there much more?

Local functions with lexically-scoped variables are probably the most
unportable gcc extension.

> > It is really multithreaded as far as the client is concerned;
>
> *this* is what I'm not sure about. I've read the latest Microsoft SQL
> Server has its own scheduler, and I've read a pretty detailed
> description of it on the weblog of some Microsoft guy. It looks like
> it can work only for very specific operations, like only for file I/O
> - SQL Server can afford it because, like all database servers, it's
> largely self-contained, but most applications aren't

Well, for each application-level thread, Valgrind creates a kernel
thread in order to deal with blocking syscalls. It's just that the
application code itself doesn't run in that thread. In other words,
Valgrind looks like a multi-threaded program to the kernel, even if it
does simple time-slicing within one thread for the client application
threading.

> > Hm, that isn't all that high. Does that mean a process has less than
> > 2G of available address space under XP?
>
> maybe I'm confusing addresses. You know, all those hexadecimal
> digits... anyway </me fetches calculator>, the highest user-mode
> address is reported here (Windows 2000) as being 0x7FFEFFFF, meaning
> 64 Kb are unavailable. The shared read-only data begins at 0x7FFE0000,
> and includes the tick counter, some information about the kernel and
> of course the system call thunk. Not sure where the probe address is
> at, and if its semantics are what I believe they are (probably not)

Well, that's only about 2G. Typically under Linux, the client address
space is from 0-3G (though it can be different for different kernel
configurations).

[ sorry about the formatting - nasty webmail ]

J
|
|
From: Nicholas N. <nj...@ca...> - 2004-03-23 09:05:46
|
On Mon, 22 Mar 2004, Jeremy Fitzhardinge wrote:

> > Anyway, how do tools register with the JIT engine so they are called
> > at certain points?
>
> Um, well, they get to instrument the code as it goes through the JIT.
> There are also special callbacks for things like allocations, but the
> majority is done with instrumentation.

Tools don't need to "register" as such; by choosing the right names for
the appropriate functions (eg. the instrumentation function) they get
called at the right times. At least, that's how it used to work; recent
changes may have affected this, but the basic idea is the same.

> Why do you say that? Memcheck, addrcheck, cachegrind, and helgrind use
> shadow memory a lot (at least every memory access), and making access
> to the shadow any slower would have enormous performance effects.

(Cachegrind doesn't use shadow memory.)

> Julian's internals document is still a reasonable start for the
> overall design, though many of the details have changed. Using
> --trace-* options will give you some idea about what's going on
> inside. It isn't wildly complex, but there are a lot of details.

You could also look at
http://www.cl.cam.ac.uk/~njn25/pubs/valgrind2003.ps.gz, which is a bit
more recent than the internals document, and is mostly still up-to-date.
Also, look at the example skins: "Lackey", and the one in the example/
directory.

N
|
|
From: KJK::Hyperion <no...@li...> - 2004-03-23 18:04:14
Attachments:
map.png
|
At 08.16 23/03/2004, Jeremy Fitzhardinge wrote:

> Yes, but there is the issue of simply running out of address space.
> The numbers you mention below suggest that there's less than 2G of
> address space for applications under Windows, which means that if the
> client is sharing the address space with shadow data, there is less
> than 1G for the client's own use.

Most applications I've seen (including heavyweights like Opera with
dozens of tabs and huge link/tab history files) never require more than
half of the address space. I've verified this experimentally with a
small program I've written that plots the virtual memory map of a given
process. In general, the highest portion of the address space is taken
by system DLLs and system data such as the PEB and TEBs, the portion
slightly below it by other DLLs, and the lowest by nearly everything
else (heaps, stacks, mapped shared memory, the main executable, etc.)

They say a picture tells a thousand words, so I've attached a sample
output. It shows the memory usage of the aforementioned instance of
Opera, 1 pixel per memory page. The colors: black is free memory,
yellow is DLL-mapped memory (the yellow bar at the top is the main
executable), green is anonymous virtual memory (dark if reserved but
not committed) and red is mapped memory (dark if mapped from a file).
Addresses increase from top to bottom and from left to right. The
lowest address is 0x00010000 and the highest 0x7FFEFFFF.

Anyway, note the *large* black space in the middle. Now, consider that
this is an anomaly. The second largest virtual memory space on this
machine (the instance of Eudora I'm typing this e-mail in) has *way*
more than half of the address space free, in a nice contiguous block in
the middle.

Other things to consider are that 1) all DLLs are relocatable, so those
yellow bars you see could easily be moved up or down should the
necessity arise; 2) reserved anonymous memory (dark green) can, for all
practical purposes, be considered free (is the shadow memory sparse?);
and 3) to 'grind *really* memory-consuming applications you can always
boot with the /3GB kernel switch (it does what it sounds like it does).

> > Anyway, how do tools register with the JIT engine so they are
> > called at certain points?
>
> Um, well, they get to instrument the code as it goes through the JIT.

So they are linked statically?

> There's also special callbacks for things like allocations, but the
> majority is done with instrumentation.

Hmmm. I'll have to get back to this when I have some more time.

> Julian's internals document is still a reasonable start for the
> overall design, though many of the details have changed. Using
> --trace-* options will give you some idea about what's going on
> inside. It isn't wildly complex, but there are a lot of details.

Cool. I'll try as soon as I can.

> Well, for each application level thread, Valgrind creates a kernel
> thread in order to deal with blocking syscalls.

Perfect.

> Well, that's only about 2G. Typically under linux, the client address
> space is from 0-3G (though it can be different for different kernel
> configurations).

It's not the default for Windows, but it's supported. It has a problem
in that, even when enabled, a certain flag must be set in the main
executable for the address space to really be 3GB, but I know a way to
work around that.
|