From: Nicholas M. <nm...@gm...> - 2006-07-04 03:36:40
On Mon, 03 Jul 2006 18:59:24 -0700, Ian Romanick wrote:
> Here is my first pass at run-time dispatch generation code for x86-64.
> It's not particularly well tested yet (or I'd just commit it now). I
> wrote a test program that called fill_in_entrypoint with a few different
> parameter signature strings, and I verified that the generated code
> looked correct.
>
> After some discussion on IRC, I've been thinking about how we might make
> run-time dispatch generation work on "hardened" systems. Those systems
> don't allow a memory region to be both writable and executable. We can
> use mprotect to adjust the protections. There are a couple subtle
> problems that need to be overcome to make this really work:
>
> 1. mprotect works at page granularity.
>
> 2. Once a page is marked (PROT_READ|PROT_EXEC) we can *never* make it
> PROT_WRITE again. Multithreaded applications are the problem here.
> Imagine one thread jumping to a dispatch function right when another
> thread makes the page containing that dispatch function PROT_WRITE.
>
> 3. Since glXGetProcAddress returns a pointer to the dispatch stub,
> memory must be allocated at that time.
>
> 4. On x86-64 and PowerPC, the dispatch functions cannot be fully created
> until the driver asks for them to be created. x86 has the same problem
> currently, but that will be changed soon.
>
> Normally when an application calls glXGetProcAddress a dummy stub is
> created. In the current implementation, a dispatch offset is not
> assigned at this time. The existing code expects that the driver will
> later ask for the function to be added and will provide a dispatch
> offset. This is actually a bug, and it prevents drivers that support,
> for example, APPLE_vertex_array_object from working with versions of
> libGL that don't. Once I commit a fix for that, libGL could assign a
> dispatch offset when glXGetProcAddress is called.
>
> This would allow libGL to create a fully functional dispatch stub on
> x86.
> In fact, libGL could create an entire page of dispatch stubs the
> first time glXGetProcAddress is called. This can be done on x86 because
> the dispatch function is independent of the parameter signature of the
> function being dispatched.
>
> x86-64 and PowerPC do not share this feature. Since these platforms
> pass all parameters in registers, the dispatch function for glBegin is
> different from the dispatch function for glTexImage2D. When
> glXGetProcAddress is called it is impossible to know what the parameter
> signature, and thus the contents of the dispatch stub, should be. This
> prevents those platforms from being able to create a page of dispatch
> stubs at a time.
>
> I guess we could do a single dispatch function per page, but since the
> dispatch functions are on the order of 128 bytes, that seems awfully
> wasteful.
>
> Thoughts?

The GLX_USE_TLS case on AMD64 doesn't require any function calls to get
the dispatch pointer (although, you seem to be making one anyway for some
reason), so there isn't any need to generate register save/restore code
(and, therefore, no need to know the function signature ahead of time).
Even with the function call, you know that the TLS _x86_64_get_dispatch
doesn't clobber any parameter registers, so the save/restore still isn't
necessary.

For the general PTHREADS case, you might even be able to get away with
the assumption that calling pthread_getspecific won't clobber any of the
XMM registers, although you should probably talk to the libc people about
that first. (Actually, considering the variation in libcs and the
possibility that somebody is doing memcpy via the FPU or might start in
the future without telling you, this is probably a stupid idea.)

Also for the PTHREADS case, you could attempt to make the page
PROT_WRITE|PROT_EXEC, check to see if the syscall fails, and then either
generate a page of stubs that pessimistically save all registers or
allocate exact stubs on demand.
However, I don't think there's going to be much demand for a pure
PTHREADS implementation on AMD64 -- everything sensible is likely to
have TLS. (And I'm pretty sure that the OSs with the restrictive security
policies all definitely have TLS.)

Some of this might also apply to PowerPC, but I'm not really familiar
with that architecture.
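
For what it's worth, the probe-and-fall-back idea for the PTHREADS case
could look roughly like the sketch below. It assumes POSIX mmap and
mprotect. The names stub_page_alloc, stub_page_seal, struct stub_page and
the stub_strategy enum are made up for illustration and are not Mesa
code. It only picks a strategy; generating the stubs themselves is left
out.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical names, for illustration only. */
enum stub_strategy {
    STUBS_WRITE_IN_PLACE,  /* W|X allowed: exact stubs can be written on demand */
    STUBS_PREGENERATED     /* W|X refused: fill the page with pessimistic
                            * save-all-registers stubs, then seal it */
};

struct stub_page {
    void *base;
    size_t size;
    enum stub_strategy strategy;
};

/* Allocate one page for dispatch stubs and probe whether the system
 * lets it be writable and executable at the same time.
 * Returns 0 on success, -1 if no memory could be mapped. */
int stub_page_alloc(struct stub_page *p)
{
    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0)
        return -1;

    /* Start out read/write only; this is allowed everywhere. */
    void *mem = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    p->base = mem;
    p->size = (size_t)page_size;

    /* The probe: does the syscall fail when asking for W|X? */
    if (mprotect(mem, p->size, PROT_READ | PROT_WRITE | PROT_EXEC) == 0)
        p->strategy = STUBS_WRITE_IN_PLACE;
    else
        p->strategy = STUBS_PREGENERATED;  /* page is still read/write */

    return 0;
}

/* One-way transition to read/execute once the page has been filled.
 * The page must not be made writable again afterwards.
 * Returns the mprotect result (0 on success, -1 on failure). */
int stub_page_seal(struct stub_page *p)
{
    assert(p->base != NULL && p->size > 0);
    return mprotect(p->base, p->size, PROT_READ | PROT_EXEC);
}
```

In the STUBS_PREGENERATED case the caller would fill the whole page
before calling stub_page_seal, and should still check its return value
rather than assume the transition is permitted.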