From: Mikael P. <mi...@cs...> - 2005-03-17 12:13:26
I consider the control data API issues to be mostly resolved, but the
mmap()ed 'struct perfctr_cpu_state' appears to cause some dispute.
From user-space it basically looks like this on x86/amd64/ppc32:

    struct perfctr_cpu_state {
        unsigned int cstatus;
        struct {    /* k1 is opaque in the user ABI */
            unsigned int id;
            int isuspend_cpu;
        } k1;
        /* The two tsc fields must be inlined. Placing them in a
           sub-struct causes unwanted internal padding on x86-64. */
        unsigned int tsc_start;
        unsigned long long tsc_sum;
        struct {
            unsigned int map;
            unsigned int start;
            unsigned long long sum;
        } pmc[];    /* the size is not part of the user ABI */
    };

A few comments on this layout:

- The cstatus and k1 fields are accessed by the low-level drivers at
  context switches and sampling points. They need to be at the start
  of the state, for cache line conservation reasons.
- Ditto tsc_start/tsc_sum.
- The pmc[] array is of variable length. All counter operations (at
  suspends, resumes, and sample taking) need the "map" field, and most
  need one or both of "start" and "sum". Hence the need to keep them
  together.

Now, one complaint is that the start values are 32-bit, and certainly
in 10 years' time or so, that won't suffice. The problem is that
extending them to 64 bits causes the number of cache lines accessed to
go up considerably on 32-bit machines, and that's something I want to
avoid. Consider:

    struct {
        unsigned int map;
        unsigned long long start;
        unsigned long long sum;
    } pmc[];    /* the size is not part of the user ABI */

Due to alignment, this is now 24 bytes per entry (4 bytes wasted).
Moving the map value out of the struct limits it to 16 bytes (OK), but
then we have the problem of _locating_ the per-counter map[] array,
since these arrays don't have a fixed size (in user-space).

If you can solve that w/o causing undue overheads, then I'd say the ABI
should switch to 64-bit start values as well.

/Mikael
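For anyone who wants to check the per-entry sizes discussed above, here is a minimal sketch. The struct names (pmc_narrow, pmc_wide, pmc_split) and the two helper functions are made up for illustration and are not part of perfctr. The 24-byte figure assumes an ABI that aligns unsigned long long to 8 bytes inside structs, as x86-64 does; an ABI that aligns it to 4 bytes would give 20 bytes with no padding, so only the sizes that do not depend on that are asserted at compile time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; these mirror the pmc[] entry layouts from the mail. */

/* Current entry: 32-bit start value. */
struct pmc_narrow {
    unsigned int map;
    unsigned int start;
    unsigned long long sum;
};

/* Entry with start widened to 64 bits. */
struct pmc_wide {
    unsigned int map;
    unsigned long long start;
    unsigned long long sum;
};

/* Entry with map moved out into a separate per-counter array. */
struct pmc_split {
    unsigned long long start;
    unsigned long long sum;
};

/* These two sizes hold on ABIs with 4-byte int and 8-byte long long. */
static_assert(sizeof(struct pmc_narrow) == 16, "narrow entry is 16 bytes");
static_assert(sizeof(struct pmc_split) == 16, "split entry is 16 bytes");

/* Padding the compiler adds to the wide entry (4 where long long is
   8-byte aligned, 0 where it is 4-byte aligned). */
static size_t pmc_wide_padding(void)
{
    return sizeof(struct pmc_wide)
           - (sizeof(unsigned int) + 2 * sizeof(unsigned long long));
}

/* Total bytes of per-counter state for n counters of a given entry size. */
static size_t pmc_array_bytes(size_t entry_size, size_t n)
{
    return entry_size * n;
}
```

With four counters, pmc_array_bytes(sizeof(struct pmc_narrow), 4) is 64 bytes, while the wide layout needs 96 bytes where each entry is 24 bytes, which is where the extra cache line traffic comes from.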