From: Niall D. <ndo...@bl...> - 2013-04-10 20:02:31
Attachments:
smime.p7s
Dear Valgrind Devs,

I've finished porting the patch I mentioned earlier to Linux valgrind, so I now just have the unit tests to fix up before I can start the process of getting Legal to authorize its release to you guys for inclusion into trunk. Given that Legal will take a month or more as it's GPL, can I quickly detail what I've done so you can tell me now what to change? Please bear in mind that any change after the patch is released requires another round of Legal, and therefore another month or more, so getting it right first time would be really great.

The patch is fairly minimal. At the start of cachegrind/callgrind, if the cache configuration parameters were not supplied, it looks for a file called 'cpucacheconfig.xml'. If that is present, it loads it. If it is not present, it calls a forthcoming MIT licensed open source BlackBerry library called GenericCPUCacheConfig, which does some timing tests to figure out min and max IPS, cache line size, cache configurations etc. and outputs the results into cpucacheconfig.xml. If the architecture specific CPUID routines return anything useful, they override any output from GenericCPUCacheConfig.

The only difference thereafter with this patch is that the following additional information appears in the preambles of cachegrind and callgrind outputs:

Sample cachegrind output:

    desc: I1 cache: 32768 B, 64 B, 8-way associative, 153 picosec
    desc: D1 cache: 32768 B, 64 B, 8-way associative, 153 picosec
    desc: LL cache: 6291456 B, 64 B, 12-way associative, 1112 picosec
    desc: main memory: 1427 picosec
    desc: max instruct cost:302 picosec
    desc: min instruct cost:101 picosec
    cmd: ls
    events: Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw
    ...

Sample callgrind output:

    version: 1
    creator: callgrind-3.8.1
    pid: 39879
    cmd: ls
    part: 1
    desc: I1 cache: 32768 B, 64 B, 8-way associative, 153 picosec
    desc: D1 cache: 32768 B, 64 B, 8-way associative, 153 picosec
    desc: LL cache: 6291456 B, 64 B, 12-way associative, 1112 picosec
    desc: main memory: 1427 picosec
    desc: max instruct cost:302 picosec
    desc: min instruct cost:101 picosec
    desc: Timerange: Basic block 0 - 233441
    desc: Trigger: Program termination
    positions: line
    events: Ir
    summary: 1057611
    ...

And for this patch, that is quite literally it for now. Thoughts?

Niall
Re: [Valgrind-developers] Details about forthcoming cachegrind/callgrind patch adding hardware costs
From: Josef W. <Jos...@gm...> - 2013-04-10 21:16:00
On 10.04.2013 22:02, Niall Douglas wrote:
> The patch is fairly minimal. At the start of cachegrind/callgrind if the
> cache configuration parameters were not supplied, it looks for a file called
> 'cpucacheconfig.xml' if one was not supplied. If that is present, it loads
> it. If it is not present, it calls

I think this behaviour should be guarded by a command line flag: e.g. on x86, the timing tests should not run when no cache parameter file is present, as the cache parameters will be supplied via CPUID anyway.

> a forthcoming MIT licensed open source
> BlackBerry library called GenericCPUCacheConfig which does some timing tests
> to figure out min and max IPS, cache line size, cache configurations etc.
> which it outputs into cpucacheconfig.xml. If the architecture specific CPUID
> routines return anything useful, they override any output from
> GenericCPUCacheConfig. The only difference thereafter with this patch is the
> following additional information appears in the preambles of cachegrind and
> callgrind outputs:
>
> Sample cachegrind output:
>
> desc: I1 cache: 32768 B, 64 B, 8-way associative, 153 picosec
> desc: D1 cache: 32768 B, 64 B, 8-way associative, 153 picosec
> desc: LL cache: 6291456 B, 64 B, 12-way associative, 1112 picosec
> desc: main memory: 1427 picosec
> desc: max instruct cost:302 picosec
> desc: min instruct cost:101 picosec

What are these times? It would be nice to be more descriptive here (or, better, in the manual).

Putting all this information behind "desc:" lines is fine for the *_annotate scripts and KCachegrind, as this information is just forwarded to the user without any attempt to parse it. But this also means that tools cannot rely on it never changing, or on it having a specific format...

Josef