From: Josef W. <Jos...@gm...> - 2013-04-10 21:16:00

On 10.04.2013 22:02, Niall Douglas wrote:
> The patch is fairly minimal. At the start of cachegrind/callgrind if the
> cache configuration parameters were not supplied, it looks for a file called
> 'cpucacheconfig.xml' if one was not supplied. If that is present, it loads
> it. If it is not present, it calls

I think this behaviour should be guarded by a command line flag: e.g. on x86, the timing tests should not run when no cache parameter file is present, as it will be supplied via CPUID.

> a forthcoming MIT licensed open source
> BlackBerry library called GenericCPUCacheConfig which does some timing tests
> to figure out min and max IPS, cache line size, cache configurations etc.
> which it outputs into cpucacheconfig.xml. If the architecture specific CPUID
> routines return anything useful, they override any output from
> GenericCPUCacheConfig. The only difference thereafter with this patch is the
> following additional information appears in the preambles of cachegrind and
> callgrind outputs:
>
> Sample cachegrind output:
>
> desc: I1 cache: 32768 B, 64 B, 8-way associative, 153 picosec
> desc: D1 cache: 32768 B, 64 B, 8-way associative, 153 picosec
> desc: LL cache: 6291456 B, 64 B, 12-way associative, 1112 picosec
> desc: main memory: 1427 picosec
> desc: max instruct cost:302 picosec
> desc: min instruct cost:101 picosec

What are these times? It would be nice to be more descriptive here (or better in the manual).

Putting all this information behind "desc:" lines is fine for the *_annotate scripts and KCachegrind, as this information is just forwarded to the user without trying to parse it. But this also means that tools can not rely on it never to change, or to have a specific format...

Josef

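Editorial sketch: the point about "desc:" lines being forwarded rather than parsed can be illustrated as follows. This is a minimal sketch, not code from the *_annotate scripts or KCachegrind; the function name `desc_lines` is hypothetical, and the only assumption about the file is that description lines start with the literal prefix `desc:`.

```python
def desc_lines(profile_text):
    """Return the free-form text of every 'desc:' line, unparsed.

    The text after the prefix is treated as opaque: it is shown to the
    user as-is, so nothing here depends on its internal format.
    """
    prefix = "desc:"
    return [
        line[len(prefix):].strip()
        for line in profile_text.splitlines()
        if line.startswith(prefix)
    ]


if __name__ == "__main__":
    sample = (
        "desc: I1 cache: 32768 B, 64 B, 8-way associative, 153 picosec\n"
        "desc: main memory: 1427 picosec\n"
        "cmd: ls\n"
        "events: Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw\n"
    )
    for text in desc_lines(sample):
        print(text)
```

A consumer written this way keeps working if the wording of the description lines changes, which is the flip side of the caveat above: nothing can depend on the numbers inside them.
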
From: Niall D. <ndo...@bl...> - 2013-04-10 20:15:58

> > That's exactly the intent. Cachegrind/Callgrind output would simply
> > include the host's cache and memory latencies prepended as comments;
> > if in XML form, it appears as an additional XML stanza. That way it
> > doesn't break any tooling which relies on output to not change.
>
> Cachegrind/Callgrind do not support XML output for profile data (at least for
> now). But it should be quite easy to define a sensible XML format for the data
> (not saying that we want that - I do not see any benefit).

I've since realized that, so I took a different approach to modifying the output (see other post for samples).

> Both the *_annotate scripts and KCachegrind support arbitrarily named event
> types. And as you can see at section 3.1.7.2 on
>
> http://valgrind.org/docs/manual/cl-format.html
>
> the callgrind format allows to specify formulas for derived event types.
> I think this is everything you need: Add a line with e.g.
>
> "event: CycleEst = Ir + 100 DLmr + 100 DLmw"
>
> where 100 is your cycle penalty for LL cache misses.
> Only {K,Q}Cachegrind currently support such lines, but it should not be too
> complex to add that to the *_annotate scripts.

That's *very* useful. You just saved me a raft of code. Thank you.

> Actually, also for Intel, main memory accesses are slow, similar to the numbers
> you quote for the mobile chips.
>
> However, miss penalties are quite different for random and stream access,
> where hardware prefetchers kick in.

GenericCPUConfigDetect mutates the cache line fetch using a bit reverser of the last byte indexing cache lines to try and "randomly" straddle 4 KB pages in order to try and persuade Intel's prefetchers to stop being so damn clever. So the results I quoted earlier are supposedly a worst-case latency.

But I completely agree, under the bonnet PC memory is no different to mobile memory, albeit usually clocked a bit faster. Intel's prefetchers are really impressive. I just bought a Cortex A15 Chromebook, and I look forward to finding out how its much improved prefetchers perform.

> >> Why does this microbenchmark measurement have to be part of the tool?
> >
> > right (mea culpa, I should have read the docs more closely). I have no
> > issue with having cachegrind/callgrind refuse to run without a known
> > good cache config BTW, but that does seem a bit overkill.
>
> Not so sure about that. You said yourself that you wasted much time.

Right now I have it printing a warning saying one should really examine cpucacheconfig.xml for accuracy. But maybe you're right and it should print its configured cache parameters every program run. That way I wouldn't have lost a week.

> Micro-benchmarks often are very sensitive on what else is going on in the
> systems. If the system is loaded, results may be way off.
> It seems better to run that benchmark at a time controlled by the user.

Agreed. As I mentioned in my other post, it loads from a supplied file where possible. I would expect it to be rare that people don't supply their own file, except on Intel where CPUID is useful in user mode code.

Niall

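Editorial sketch: the derived event quoted above, CycleEst = Ir + 100 DLmr + 100 DLmw, is plain arithmetic over the per-event counts. The snippet below evaluates it outside any Valgrind tool. The function name `cycle_estimate` is hypothetical, the Ir value is the summary figure from the sample callgrind output elsewhere in this thread, and the DLmr and DLmw counts are invented purely for illustration.

```python
def cycle_estimate(counts, ll_miss_penalty=100):
    """Evaluate CycleEst = Ir + penalty * DLmr + penalty * DLmw.

    counts maps event names to totals; ll_miss_penalty is the assumed
    cycle cost of a last-level cache miss (100 in the quoted example).
    """
    return (
        counts["Ir"]
        + ll_miss_penalty * counts["DLmr"]
        + ll_miss_penalty * counts["DLmw"]
    )


if __name__ == "__main__":
    # Ir is taken from the sample output; DLmr and DLmw are made-up values.
    sample = {"Ir": 1057611, "DLmr": 2000, "DLmw": 500}
    print(cycle_estimate(sample))
```

Swapping the penalty for a measured main-memory latency expressed in cycles would be one way to feed host-specific numbers into the same formula, though the thread does not settle on that.
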
From: Niall D. <ndo...@bl...> - 2013-04-10 20:02:47

I've just finished the port of my overall patch to Linux valgrind, so please find attached a patch implementing VG_(read_nanosecond_timer). It looks like your bug tracker is for bugs and not feature patches, so please do mention if you want this patch to go elsewhere.

I'll detail the overall patch to cachegrind in a separate post.

Niall

> -----Original Message-----
> From: Julian Seward [mailto:js...@ac...]
> Sent: 05 April 2013 11:45
> To: Niall Douglas
> Cc: val...@li...
> Subject: Re: [Valgrind-developers] Any objection if I add
> VG_(read_nanosecond_timer) as well as VG_(read_millisecond_timer)?
>
>
> > The problem, at present, is that VG_(read_millisecond_timer) is the
> > only timing routine I can see. The generic cache configuration
> > detection routine is far more accurate if it is given microsecond or better
> > accurate timing.
> > So would it be okay if I add VG_(read_nanosecond_timer) returning a ULong?
>
> How do you plan to implement VG_(read_nanosecond_timer) ?
>
> J

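Editorial sketch: one way to see why timer resolution matters for the detection routine is a simple quantization model. If a timer can be off by up to one tick, timing N back-to-back operations has a relative error of roughly tick / (N * op_time). The helper below, with the hypothetical name `min_iterations`, computes the smallest N that keeps this error under a target. The model itself is an assumption made for illustration and is not taken from the patch; the 1427 ps figure is the main-memory latency from the sample output in this thread.

```python
def min_iterations(tick_ps, op_ps, max_error_percent):
    """Smallest N with tick / (N * op) <= max_error_percent / 100.

    All times are integers in picoseconds so the arithmetic stays exact.
    """
    numerator = tick_ps * 100
    denominator = op_ps * max_error_percent
    # Integer ceiling division without floating point.
    return -(-numerator // denominator)


if __name__ == "__main__":
    op_ps = 1427  # main-memory latency from the sample output
    for name, tick_ps in (("millisecond", 10**9), ("nanosecond", 10**3)):
        n = min_iterations(tick_ps, op_ps, max_error_percent=1)
        print(f"{name} timer: at least {n} accesses per measurement")
```

Under this model a millisecond timer needs tens of millions of accesses per measurement to reach 1% error, which lengthens the benchmark and gives system noise more time to interfere.
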
From: Niall D. <ndo...@bl...> - 2013-04-10 20:02:31

Dear Valgrind Devs,

I've finished porting the patch I mentioned earlier to Linux valgrind, so I now just have the unit tests to fix up and I can start the process of getting Legal to authorize release to you guys for inclusion into trunk. Given that Legal will take a month or more as it's GPL, can I quickly detail what I've done so you can tell me now what to change? Please bear in mind that changes after patch release require another round of Legal and therefore another month or more, so getting it right first time would be really great.

The patch is fairly minimal. At the start of cachegrind/callgrind if the cache configuration parameters were not supplied, it looks for a file called 'cpucacheconfig.xml' if one was not supplied. If that is present, it loads it. If it is not present, it calls a forthcoming MIT licensed open source BlackBerry library called GenericCPUCacheConfig which does some timing tests to figure out min and max IPS, cache line size, cache configurations etc. which it outputs into cpucacheconfig.xml. If the architecture specific CPUID routines return anything useful, they override any output from GenericCPUCacheConfig. The only difference thereafter with this patch is the following additional information appears in the preambles of cachegrind and callgrind outputs:

Sample cachegrind output:

desc: I1 cache: 32768 B, 64 B, 8-way associative, 153 picosec
desc: D1 cache: 32768 B, 64 B, 8-way associative, 153 picosec
desc: LL cache: 6291456 B, 64 B, 12-way associative, 1112 picosec
desc: main memory: 1427 picosec
desc: max instruct cost:302 picosec
desc: min instruct cost:101 picosec
cmd: ls
events: Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw
...

Sample callgrind output:

version: 1
creator: callgrind-3.8.1
pid: 39879
cmd: ls
part: 1

desc: I1 cache: 32768 B, 64 B, 8-way associative, 153 picosec
desc: D1 cache: 32768 B, 64 B, 8-way associative, 153 picosec
desc: LL cache: 6291456 B, 64 B, 12-way associative, 1112 picosec
desc: main memory: 1427 picosec
desc: max instruct cost:302 picosec
desc: min instruct cost:101 picosec
desc: Timerange: Basic block 0 - 233441
desc: Trigger: Program termination

positions: line
events: Ir
summary: 1057611
...

And for this patch, that is quite literally it for now. Thoughts?

Niall

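Editorial sketch: the lookup order described in this post can be written down as a small decision function. This is not the patch's code. The names `resolve_cache_config` and `CacheConfig`, the dictionary keys, and the choice to apply the CPUID override only to benchmark output (the post says CPUID overrides "any output from GenericCPUCacheConfig") are all assumptions made for illustration.

```python
from typing import Callable, Dict, Optional

CacheConfig = Dict[str, int]  # hypothetical shape, e.g. {"D1_size": 32768}


def resolve_cache_config(
    cli_params: Optional[CacheConfig],
    file_config: Optional[CacheConfig],
    cpuid_config: Optional[CacheConfig],
    run_benchmark: Callable[[], CacheConfig],
) -> CacheConfig:
    """Pick a cache configuration in the order the post describes."""
    # 1. Parameters supplied by the user win outright.
    if cli_params:
        return dict(cli_params)
    # 2. Otherwise use cpucacheconfig.xml if it was found and loaded.
    if file_config:
        return dict(file_config)
    # 3. Otherwise fall back to the timing benchmark.
    config = dict(run_benchmark())
    # 4. Any useful CPUID values override the measured ones.
    if cpuid_config:
        config.update(cpuid_config)
    return config


if __name__ == "__main__":
    measured = lambda: {"D1_size": 16384, "D1_line": 32}  # stand-in benchmark
    print(resolve_cache_config(None, None, {"D1_line": 64}, measured))
```

Passing the benchmark as a callable means it only runs when neither user parameters nor a config file are available, which fits the concern raised in the replies that micro-benchmarks are sensitive to system load.
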