From: Greg P. <gp...@us...> - 2004-12-18 10:39:41
Jeremy Fitzhardinge writes:

> In Unix, the distinction we make between fast and slow syscalls is
> basically a performance optimisation. If you're doing something quick,
> like gettimeofday, getpid, or opens of a regular file, then there's no
> need to go through the process of rescheduling the thread, but you
> clearly have to do that for anything blocking to prevent a deadlock.
> In the Mach case, does the same distinction exist, or could any
> message take an indefinite amount of time to complete?

For some calls it should be safe to expect them to complete in a
reasonable time. The trap that implements pthread_self() on some
systems is an obvious example.

> > Memory sharing is pervasive at the application level. The window
> > server in particular uses it heavily.
>
> Hm, that's going to be a bit tricky. We'll just have to assume that
> all shared memory is always defined.

Yes, the contents of memory mapped in from another process must be
presumed defined. That's not unlike mmapping a file: there's no
guarantee that particular parts of the file are initialized, but
Valgrind has to assume they are.

On Linux, what is the initial state of newly mmapped MAP_ANON memory?
Is it random, or is it zero-filled? Mach's vm_allocate() guarantees
zero-filled memory, so Valgrind will also have to assume that it is
initialized, unless it's owned by malloc or some other intermediary
that Valgrind knows about or controls.

> > 1. Executables are generally compiled non-relocatable starting at
> >    0x1000. This can't be changed.
>
> What form would Valgrind take?

Valgrind would probably be a shared library, loaded into the process by
the dynamic linker as instructed by an environment variable (not unlike
LD_PRELOAD), and with an initialization function called by the dynamic
linker. Alternatively, if that initialization function is called too
late, we make a copy of libc.dylib that links to libvalgrind, and load
that libc into the process using another environment variable.
Valgrind's initializer will then be called before libc initializes.

> > 2. The highest part of memory (0xfffe0000+) contains some user code
> >    from the C library and the Objective-C runtime. If necessary,
> >    these could be avoided in Valgrind's codegen. Also present is
> >    the shared pasteboard; I don't know if it can be moved.
>
> Does this range also include library static data, or is it purely
> read-only? Can Valgrind also make use of this library code without
> stomping on the client state?

There is no library data in this region. The code is only a few
routines that are implemented differently on different CPUs, like an
AltiVec-enhanced memcpy() or dual-processor-aware locking primitives.
I'm not sure Valgrind would gain much by calling them itself, though it
should be safe to do so for some of them. The shared pasteboard is a
read/write region shared by most applications.

> Actually, it sounds to me like you could maybe use a dual address
> space model. Have Valgrind live in one process, and make it control
> the target process by injecting mappings into it and setting the CPU
> state. I guess you'd still need to put the shadow memory into the
> target address space (so that the instrumentation code can get to
> it), which means that the available address space is still
> constrained.

That's certainly possible under Mach, but I don't think we'd gain much
for the IPC cost. I assume the shadow memory is the largest part of
Valgrind's memory footprint, the most likely to be clobbered by a
misbehaving process, and the most expensive to try to store remotely.
Valgrind's own code could certainly live in another process, but that
probably doesn't occupy much space.

You could go whole hog and map only small parts of the shadow memory
into the target process, "faulting" different regions in as necessary.
That would preserve address space at the cost of implementation
complexity and time.

> It sounds like the address space is going to get pretty crowded.
> Is 64-bit MacOS an interesting target yet?

Not yet, though it may be by the time a 32-bit port is running. 64-bit
address space support is promised for Mac OS X 10.4, which is expected
sometime in the first half of 2005. However, that support will be
initially limited to the lowest-level system libraries. A typical
large application should have at least 3.0 GB of available address
space at main(), with about 1.5 GB max contiguous.

-- 
Greg Parker     gp...@us...     gp...@se...