From: Gwenole B. <gb....@fr...> - 2006-10-26 22:05:51
On Thursday, 26 Oct 2006, at 21:06 Europe/Paris, Sebastian Biallas wrote:

> Thanks, this works. But I guess it's linux-only.

I believe any other reasonable x86_64 OS implements a similar feature.

>> or build with -fPIE + normal mmap() and use RIP addressing.
>
> Hmm, how does this help? Should I relocate all other code once I know
> where my translation cache is?

With PIE (Position Independent Executable), the code + data sections are relocated above 32-bit, possibly randomized. "Normal mmap()" was a little vague. Here are a few ideas to increase the likelihood of having the resulting area next to the relocated .text (so that branches to non-JIT code can fit into 32-bit offsets).

a) Set the mmap() start arg so that it rounds up &main to the next 128 MB segment, for example. That is (void *)((((uintptr_t)&main) + MB_128) & -MB_128) with const uintptr_t MB_128 = 128 * 1024 * 1024. Note, however, that nowadays (with kernel 2.6) brk space can grow at will, so the typical limit of 32 MB for brk (used for small allocations < 128 KB, by default) is gone. So, 128 MB assumes you know your global use of malloc() fits in that and you don't have memory leaks. Otherwise, it might overlap your mmap()'ed region.

b) Allocate your translation cache as uint8 translation_cache[translation_cache_size] __attribute__((aligned(4096))). And then mprotect(+PROT_EXEC) that region on startup. Although POSIX doesn't guarantee it, this works on Linux. That solution is, IMHO, more predictable.

With PIE, data is relocated above 32-bit too. So, you can use RIP-relative addressing to access global data of your program from the JIT generated code. This generally encodes one byte shorter than using the SIB byte to get an absolute 32-bit address, which you can't use anyway if you relocated your program above 32-bit.
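To make options a) and b) concrete, here is a rough sketch, not tested against pearpc itself. The cache size, the function names and the fits_rel32() helper are all made up for illustration; only the rounding expression and the aligned array + mprotect() idea come from the text above. Keep in mind that the mmap() start argument is only a hint, so the kernel may place the mapping elsewhere, and that a hardened system may refuse writable + executable mappings.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical size: the post does not say how large the cache is. */
#define TRANSLATION_CACHE_SIZE (4u * 1024 * 1024)

/* Option b): page-aligned static buffer, so it is linked (and relocated)
 * together with the rest of the program image. */
static uint8_t translation_cache[TRANSLATION_CACHE_SIZE]
    __attribute__((aligned(4096)));

/* Option b): add PROT_EXEC to the static buffer at startup.
 * Returns 0 on success, -1 on failure (see errno). */
int make_cache_executable(void)
{
    assert(((uintptr_t)translation_cache & 4095) == 0);
    return mprotect(translation_cache, sizeof translation_cache,
                    PROT_READ | PROT_WRITE | PROT_EXEC);
}

/* Option a): the rounding expression from the post. An address that is
 * already aligned still moves up by a full 128 MB. */
uintptr_t next_128mb_boundary(uintptr_t addr)
{
    const uintptr_t MB_128 = 128u * 1024 * 1024;
    return (addr + MB_128) & -MB_128;
}

/* Option a): ask for a mapping at the 128 MB boundary following `anchor`
 * (for example (uintptr_t)&main). The address is a hint only.
 * Returns NULL on failure. */
void *map_cache_near(uintptr_t anchor, size_t size)
{
    void *hint = (void *)next_128mb_boundary(anchor);
    void *p = mmap(hint, size, PROT_READ | PROT_WRITE | PROT_EXEC,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Check whether a branch can reach `to` with a signed 32-bit offset.
 * `from` is the address of the byte following the branch instruction. */
int fits_rel32(uintptr_t from, uintptr_t to)
{
    int64_t d = (int64_t)(to - from);
    return d >= INT32_MIN && d <= INT32_MAX;
}
```

Since the start argument is only a hint, it seems prudent to call fits_rel32() on whatever map_cache_near() returns before emitting rel32 branches into non-JIT code.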
Now that you have your pearpc executable relocated above 32-bit, you have up to 2^32 - 4 KB (IIRC) full address space left to implement hwmmu correctly. The following approach may work: shm_open() + ftruncate() the fd to the desired Mac RAM size. Then, mmap 4K pages at the desired offset from this fd + MAP_FIXED at a specific address which corresponds to the Mac-side VA. You can have a normal mmap() of the whole fd to get the physical Mac RAM.
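The shm_open() idea could look roughly like the sketch below. All function names are hypothetical, and it assumes a host with 4 KB pages. One deliberate difference from the description: instead of mapping straight at the Mac-side VA in the low 4 GB, it first reserves a PROT_NONE window and maps pages at window base + guest VA, so that MAP_FIXED cannot clobber an unrelated host mapping. On older glibc you may need to link with -lrt for shm_open().

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define GUEST_PAGE_SIZE 4096u /* assumes the host page size is also 4 KB */

/* Create a shared memory object of `size` bytes to back guest RAM.
 * The name is unlinked at once, so only the fd keeps it alive.
 * Returns the fd, or -1 on failure. */
int guest_ram_create(const char *name, size_t size)
{
    int fd = shm_open(name, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;
    shm_unlink(name);
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Linear view of the whole object: the guest's physical RAM. */
void *guest_ram_map_physical(int fd, size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Reserve an inaccessible window that stands in for the guest's
 * virtual address space. */
void *guest_va_reserve(size_t size)
{
    void *p = mmap(NULL, size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Map one guest page so that guest_va aliases physical offset guest_pa.
 * Both values must be multiples of GUEST_PAGE_SIZE.
 * Returns 0 on success, -1 on failure. */
int guest_map_page(void *va_base, int fd, uint32_t guest_va, uint32_t guest_pa)
{
    void *want = (uint8_t *)va_base + guest_va;
    void *got = mmap(want, GUEST_PAGE_SIZE, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_FIXED, fd, (off_t)guest_pa);
    return got == want ? 0 : -1;
}
```

A write through the virtual window should then be visible through the physical view at the matching offset, since both are MAP_SHARED mappings of the same fd.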