From: <sv...@va...> - 2010-10-06 22:46:44
Author: njn
Date: 2010-10-06 23:46:31 +0100 (Wed, 06 Oct 2010)
New Revision: 11404
Log:
Change Cachegrind/Callgrind to talk about the LL (last-level) cache instead
of the L2 cache. This is to accommodate machines with three levels of
cache. We still only simulate two levels, the first and the last.
Modified:
trunk/NEWS
trunk/cachegrind/cg-arm.c
trunk/cachegrind/cg-ppc32.c
trunk/cachegrind/cg-ppc64.c
trunk/cachegrind/cg-x86-amd64.c
trunk/cachegrind/cg_arch.h
trunk/cachegrind/cg_main.c
trunk/cachegrind/cg_sim.c
trunk/cachegrind/docs/cg-manual.xml
trunk/cachegrind/tests/chdir.stderr.exp
trunk/cachegrind/tests/dlclose.stderr.exp
trunk/cachegrind/tests/filter_stderr
trunk/cachegrind/tests/notpower2.stderr.exp
trunk/cachegrind/tests/notpower2.vgtest
trunk/cachegrind/tests/wrap5.stderr.exp
trunk/cachegrind/tests/x86/fpu-28-108.stderr.exp
trunk/callgrind/docs/cl-format.xml
trunk/callgrind/docs/cl-manual.xml
trunk/callgrind/sim.c
trunk/callgrind/tests/filter_stderr
trunk/callgrind/tests/notpower2-hwpref.stderr.exp
trunk/callgrind/tests/notpower2-hwpref.vgtest
trunk/callgrind/tests/notpower2-use.stderr.exp
trunk/callgrind/tests/notpower2-use.vgtest
trunk/callgrind/tests/notpower2-wb.stderr.exp
trunk/callgrind/tests/notpower2-wb.vgtest
trunk/callgrind/tests/notpower2.stderr.exp
trunk/callgrind/tests/notpower2.vgtest
trunk/callgrind/tests/simwork-both.stderr.exp
trunk/callgrind/tests/simwork-cache.stderr.exp
trunk/callgrind/tests/simwork1.stderr.exp
trunk/callgrind/tests/simwork2.stderr.exp
trunk/callgrind/tests/simwork3.stderr.exp
trunk/callgrind/tests/threads-use.stderr.exp
Modified: trunk/NEWS
===================================================================
--- trunk/NEWS 2010-10-06 22:45:18 UTC (rev 11403)
+++ trunk/NEWS 2010-10-06 22:46:31 UTC (rev 11404)
@@ -16,6 +16,20 @@
--threshold option has changed; this is unlikely to affect many people, if
you do use it please see the user manual for details.
+- Callgrind now can do branch prediction simulation, similar to Cachegrind.
+ In addition, it optionally can count the number of executed global bus events.
+ Both can be used for a better approximation of a "Cycle Estimation" as
+ derived event (you need to update the event formula in KCachegrind yourself).
+
+- Cachegrind and Callgrind now refer to the LL (last-level) cache rather
+ than the L2 cache. This is to accommodate machines with three levels of
+ caches -- if Cachegrind/Callgrind auto-detects the cache configuration of
+ such a machine it will run the simulation as if the L2 cache isn't
+ present. This means the results are less likely to match the true result
+ for the machine, but Cachegrind/Callgrind's results are already only
+ approximate, and should not be considered authoritative. The results are
+ still useful for giving a general idea about a program's locality.
+
- Massif has a new option, --pages-as-heap, which is disabled by default.
When enabled, instead of tracking allocations at the level of heap blocks
(as allocated with malloc/new/new[]), it instead tracks memory allocations
@@ -24,11 +38,6 @@
harder than the heap-level output, but this option is useful if you want
to account for every byte of memory used by a program.
-- Callgrind now can do branch prediction simulation, similar to Cachegrind.
- In addition, it optionally can count the number of executed global bus events.
- Both can be used for a better approximation of a "Cycle Estimation" as
- derived event (you need to update the event formula in KCachegrind yourself).
-
- Added new memcheck command-line option --show-possibly-lost.
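The NEWS entry above notes that after enabling branch simulation you must update the "Cycle Estimation" derived-event formula in KCachegrind yourself. One commonly cited form of that formula, rewritten for the renamed LL events, looks like the fragment below; the weights (10 and 100) are rough rules of thumb, not measurements, and the exact formula is whatever you configure in KCachegrind:

```
CEst = Ir + 10 Bm + 10 L1m + 100 LLm
```

Here Ir is instructions executed, Bm mispredicted branches, L1m first-level misses, and LLm last-level misses.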
Modified: trunk/cachegrind/cg-arm.c
===================================================================
--- trunk/cachegrind/cg-arm.c 2010-10-06 22:45:18 UTC (rev 11403)
+++ trunk/cachegrind/cg-arm.c 2010-10-06 22:46:31 UTC (rev 11404)
@@ -37,13 +37,13 @@
#include "cg_arch.h"
-void VG_(configure_caches)(cache_t* I1c, cache_t* D1c, cache_t* L2c,
+void VG_(configure_caches)(cache_t* I1c, cache_t* D1c, cache_t* LLc,
Bool all_caches_clo_defined)
{
// Set caches to default (for Cortex-A8 ?)
*I1c = (cache_t) { 16384, 4, 64 };
*D1c = (cache_t) { 16384, 4, 64 };
- *L2c = (cache_t) { 262144, 8, 64 };
+ *LLc = (cache_t) { 262144, 8, 64 };
if (!all_caches_clo_defined) {
VG_(message)(Vg_DebugMsg,
Modified: trunk/cachegrind/cg-ppc32.c
===================================================================
--- trunk/cachegrind/cg-ppc32.c 2010-10-06 22:45:18 UTC (rev 11403)
+++ trunk/cachegrind/cg-ppc32.c 2010-10-06 22:46:31 UTC (rev 11404)
@@ -37,13 +37,13 @@
#include "cg_arch.h"
-void VG_(configure_caches)(cache_t* I1c, cache_t* D1c, cache_t* L2c,
+void VG_(configure_caches)(cache_t* I1c, cache_t* D1c, cache_t* LLc,
Bool all_caches_clo_defined)
{
// Set caches to default.
*I1c = (cache_t) { 65536, 2, 64 };
*D1c = (cache_t) { 65536, 2, 64 };
- *L2c = (cache_t) { 262144, 8, 64 };
+ *LLc = (cache_t) { 262144, 8, 64 };
// Warn if config not completely specified from cmd line. Note that
// this message is slightly different from the one we give on x86/AMD64
Modified: trunk/cachegrind/cg-ppc64.c
===================================================================
--- trunk/cachegrind/cg-ppc64.c 2010-10-06 22:45:18 UTC (rev 11403)
+++ trunk/cachegrind/cg-ppc64.c 2010-10-06 22:46:31 UTC (rev 11404)
@@ -37,13 +37,13 @@
#include "cg_arch.h"
-void VG_(configure_caches)(cache_t* I1c, cache_t* D1c, cache_t* L2c,
+void VG_(configure_caches)(cache_t* I1c, cache_t* D1c, cache_t* LLc,
Bool all_caches_clo_defined)
{
// Set caches to default.
*I1c = (cache_t) { 65536, 2, 64 };
*D1c = (cache_t) { 65536, 2, 64 };
- *L2c = (cache_t) { 262144, 8, 64 };
+ *LLc = (cache_t) { 262144, 8, 64 };
// Warn if config not completely specified from cmd line. Note that
// this message is slightly different from the one we give on x86/AMD64
Modified: trunk/cachegrind/cg-x86-amd64.c
===================================================================
--- trunk/cachegrind/cg-x86-amd64.c 2010-10-06 22:45:18 UTC (rev 11403)
+++ trunk/cachegrind/cg-x86-amd64.c 2010-10-06 22:46:31 UTC (rev 11404)
@@ -54,9 +54,12 @@
* array of pre-defined configurations for various parts of the memory
* hierarchy.
* According to Intel Processor Identification, App Note 485.
+ *
+ * If a L3 cache is found, then data for it rather than the L2
+ * is returned via *LLc.
*/
static
-Int Intel_cache_info(Int level, cache_t* I1c, cache_t* D1c, cache_t* L2c)
+Int Intel_cache_info(Int level, cache_t* I1c, cache_t* D1c, cache_t* LLc)
{
Int cpuid1_eax;
Int cpuid1_ignore;
@@ -65,6 +68,14 @@
UChar info[16];
Int i, trials;
Bool L2_found = False;
+ /* If we see L3 cache info, copy it into L3c. Then, at the end,
+ copy it into *LLc. Hence if a L3 cache is specified, *LLc will
+ eventually contain a description of it rather than the L2 cache.
+ The use of the L3c intermediary makes this process independent
+ of the order in which the cache specifications appear in
+ info[]. */
+ Bool L3_found = False;
+ cache_t L3c = { 0, 0, 0 };
if (level < 2) {
VG_(dmsg)("warning: CPUID level < 2 for Intel processor (%d)\n", level);
@@ -121,18 +132,39 @@
case 0x90: case 0x96: case 0x9b:
VG_(tool_panic)("IA-64 cache detected?!");
- case 0x22: case 0x23: case 0x25: case 0x29:
- case 0x46: case 0x47: case 0x4a: case 0x4b: case 0x4c: case 0x4d:
- case 0xe2: case 0xe3: case 0xe4: case 0xea: case 0xeb: case 0xec:
- VG_(dmsg)("warning: L3 cache detected but ignored\n");
- break;
+ /* L3 cache info. */
+ case 0x22: L3c = (cache_t) { 512, 4, 64 }; L3_found = True; break;
+ case 0x23: L3c = (cache_t) { 1024, 8, 64 }; L3_found = True; break;
+ case 0x25: L3c = (cache_t) { 2048, 8, 64 }; L3_found = True; break;
+ case 0x29: L3c = (cache_t) { 4096, 8, 64 }; L3_found = True; break;
+ case 0x46: L3c = (cache_t) { 4096, 4, 64 }; L3_found = True; break;
+ case 0x47: L3c = (cache_t) { 8192, 8, 64 }; L3_found = True; break;
+ case 0x4a: L3c = (cache_t) { 6144, 12, 64 }; L3_found = True; break;
+ case 0x4b: L3c = (cache_t) { 8192, 16, 64 }; L3_found = True; break;
+ case 0x4c: L3c = (cache_t) { 12288, 12, 64 }; L3_found = True; break;
+ case 0x4d: L3c = (cache_t) { 16384, 16, 64 }; L3_found = True; break;
+ case 0xd0: L3c = (cache_t) { 512, 4, 64 }; L3_found = True; break;
+ case 0xd1: L3c = (cache_t) { 1024, 4, 64 }; L3_found = True; break;
+ case 0xd2: L3c = (cache_t) { 2048, 4, 64 }; L3_found = True; break;
+ case 0xd6: L3c = (cache_t) { 1024, 8, 64 }; L3_found = True; break;
+ case 0xd7: L3c = (cache_t) { 2048, 8, 64 }; L3_found = True; break;
+ case 0xd8: L3c = (cache_t) { 4096, 8, 64 }; L3_found = True; break;
+ case 0xdc: L3c = (cache_t) { 1536, 12, 64 }; L3_found = True; break;
+ case 0xdd: L3c = (cache_t) { 3072, 12, 64 }; L3_found = True; break;
+ case 0xde: L3c = (cache_t) { 6144, 12, 64 }; L3_found = True; break;
+ case 0xe2: L3c = (cache_t) { 2048, 16, 64 }; L3_found = True; break;
+ case 0xe3: L3c = (cache_t) { 4096, 16, 64 }; L3_found = True; break;
+ case 0xe4: L3c = (cache_t) { 8192, 16, 64 }; L3_found = True; break;
+ case 0xea: L3c = (cache_t) { 12288, 24, 64 }; L3_found = True; break;
+ case 0xeb: L3c = (cache_t) { 18432, 24, 64 }; L3_found = True; break;
+ case 0xec: L3c = (cache_t) { 24576, 24, 64 }; L3_found = True; break;
/* Described as "MLC" in Intel documentation */
- case 0x21: *L2c = (cache_t) { 256, 8, 64 }; L2_found = True; break;
+ case 0x21: *LLc = (cache_t) { 256, 8, 64 }; L2_found = True; break;
/* These are sectored, whatever that means */
- case 0x39: *L2c = (cache_t) { 128, 4, 64 }; L2_found = True; break;
- case 0x3c: *L2c = (cache_t) { 256, 4, 64 }; L2_found = True; break;
+ case 0x39: *LLc = (cache_t) { 128, 4, 64 }; L2_found = True; break;
+ case 0x3c: *LLc = (cache_t) { 256, 4, 64 }; L2_found = True; break;
/* If a P6 core, this means "no L2 cache".
If a P4 core, this means "no L3 cache".
@@ -141,20 +173,21 @@
case 0x40:
break;
- case 0x41: *L2c = (cache_t) { 128, 4, 32 }; L2_found = True; break;
- case 0x42: *L2c = (cache_t) { 256, 4, 32 }; L2_found = True; break;
- case 0x43: *L2c = (cache_t) { 512, 4, 32 }; L2_found = True; break;
- case 0x44: *L2c = (cache_t) { 1024, 4, 32 }; L2_found = True; break;
- case 0x45: *L2c = (cache_t) { 2048, 4, 32 }; L2_found = True; break;
- case 0x48: *L2c = (cache_t) { 3072,12, 64 }; L2_found = True; break;
+ case 0x41: *LLc = (cache_t) { 128, 4, 32 }; L2_found = True; break;
+ case 0x42: *LLc = (cache_t) { 256, 4, 32 }; L2_found = True; break;
+ case 0x43: *LLc = (cache_t) { 512, 4, 32 }; L2_found = True; break;
+ case 0x44: *LLc = (cache_t) { 1024, 4, 32 }; L2_found = True; break;
+ case 0x45: *LLc = (cache_t) { 2048, 4, 32 }; L2_found = True; break;
+ case 0x48: *LLc = (cache_t) { 3072, 12, 64 }; L2_found = True; break;
+ case 0x4e: *LLc = (cache_t) { 6144, 24, 64 }; L2_found = True; break;
case 0x49:
- if ((family == 15) && (model == 6))
- /* On Xeon MP (family F, model 6), this is for L3 */
- VG_(dmsg)("warning: L3 cache detected but ignored\n");
- else
- *L2c = (cache_t) { 4096, 16, 64 }; L2_found = True;
- break;
- case 0x4e: *L2c = (cache_t) { 6144, 24, 64 }; L2_found = True; break;
+ if (family == 15 && model == 6) {
+ /* On Xeon MP (family F, model 6), this is for L3 */
+ L3c = (cache_t) { 4096, 16, 64 }; L3_found = True;
+ } else {
+ *LLc = (cache_t) { 4096, 16, 64 }; L2_found = True;
+ }
+ break;
/* These are sectored, whatever that means */
case 0x60: *D1c = (cache_t) { 16, 8, 64 }; break; /* sectored */
@@ -181,27 +214,25 @@
break;
/* not sectored, whatever that might mean */
- case 0x78: *L2c = (cache_t) { 1024, 4, 64 }; L2_found = True; break;
+ case 0x78: *LLc = (cache_t) { 1024, 4, 64 }; L2_found = True; break;
/* These are sectored, whatever that means */
- case 0x79: *L2c = (cache_t) { 128, 8, 64 }; L2_found = True; break;
- case 0x7a: *L2c = (cache_t) { 256, 8, 64 }; L2_found = True; break;
- case 0x7b: *L2c = (cache_t) { 512, 8, 64 }; L2_found = True; break;
- case 0x7c: *L2c = (cache_t) { 1024, 8, 64 }; L2_found = True; break;
- case 0x7d: *L2c = (cache_t) { 2048, 8, 64 }; L2_found = True; break;
- case 0x7e: *L2c = (cache_t) { 256, 8, 128 }; L2_found = True; break;
+ case 0x79: *LLc = (cache_t) { 128, 8, 64 }; L2_found = True; break;
+ case 0x7a: *LLc = (cache_t) { 256, 8, 64 }; L2_found = True; break;
+ case 0x7b: *LLc = (cache_t) { 512, 8, 64 }; L2_found = True; break;
+ case 0x7c: *LLc = (cache_t) { 1024, 8, 64 }; L2_found = True; break;
+ case 0x7d: *LLc = (cache_t) { 2048, 8, 64 }; L2_found = True; break;
+ case 0x7e: *LLc = (cache_t) { 256, 8, 128 }; L2_found = True; break;
+ case 0x7f: *LLc = (cache_t) { 512, 2, 64 }; L2_found = True; break;
+ case 0x80: *LLc = (cache_t) { 512, 8, 64 }; L2_found = True; break;
+ case 0x81: *LLc = (cache_t) { 128, 8, 32 }; L2_found = True; break;
+ case 0x82: *LLc = (cache_t) { 256, 8, 32 }; L2_found = True; break;
+ case 0x83: *LLc = (cache_t) { 512, 8, 32 }; L2_found = True; break;
+ case 0x84: *LLc = (cache_t) { 1024, 8, 32 }; L2_found = True; break;
+ case 0x85: *LLc = (cache_t) { 2048, 8, 32 }; L2_found = True; break;
+ case 0x86: *LLc = (cache_t) { 512, 4, 64 }; L2_found = True; break;
+ case 0x87: *LLc = (cache_t) { 1024, 8, 64 }; L2_found = True; break;
- case 0x7f: *L2c = (cache_t) { 512, 2, 64 }; L2_found = True; break;
- case 0x80: *L2c = (cache_t) { 512, 8, 64 }; L2_found = True; break;
-
- case 0x81: *L2c = (cache_t) { 128, 8, 32 }; L2_found = True; break;
- case 0x82: *L2c = (cache_t) { 256, 8, 32 }; L2_found = True; break;
- case 0x83: *L2c = (cache_t) { 512, 8, 32 }; L2_found = True; break;
- case 0x84: *L2c = (cache_t) { 1024, 8, 32 }; L2_found = True; break;
- case 0x85: *L2c = (cache_t) { 2048, 8, 32 }; L2_found = True; break;
- case 0x86: *L2c = (cache_t) { 512, 4, 64 }; L2_found = True; break;
- case 0x87: *L2c = (cache_t) { 1024, 8, 64 }; L2_found = True; break;
-
/* Ignore prefetch information */
case 0xf0: case 0xf1:
break;
@@ -213,8 +244,15 @@
}
}
+ /* If we found a L3 cache, throw away the L2 data and use the L3's instead. */
+ if (L3_found) {
+ VG_(dmsg)("warning: L3 cache found, using its data for the LL simulation.\n");
+ *LLc = L3c;
+ L2_found = True;
+ }
+
if (!L2_found)
- VG_(dmsg)("warning: L2 cache not installed, ignore L2 results.\n");
+ VG_(dmsg)("warning: L2 cache not installed, ignore LL results.\n");
return 0;
}
@@ -241,14 +279,37 @@
* 0x630) have a bug and misreport their L2 size as 1KB (it's really 64KB),
* so we detect that.
*
- * Returns 0 on success, non-zero on failure.
+ * Returns 0 on success, non-zero on failure. As with the Intel code
+ * above, if a L3 cache is found, then data for it rather than the L2
+ * is returned via *LLc.
*/
+
+/* A small helper */
+static Int decode_AMD_cache_L2_L3_assoc ( Int bits_15_12 )
+{
+ /* Decode a L2/L3 associativity indication. It is encoded
+ differently from the I1/D1 associativity. Returns 1
+ (direct-map) as a safe but suboptimal result for unknown
+ encodings. */
+ switch (bits_15_12 & 0xF) {
+ case 1: return 1; case 2: return 2;
+ case 4: return 4; case 6: return 8;
+ case 8: return 16; case 0xA: return 32;
+ case 0xB: return 48; case 0xC: return 64;
+ case 0xD: return 96; case 0xE: return 128;
+ case 0xF: /* fully associative */
+ case 0: /* L2/L3 cache or TLB is disabled */
+ default:
+ return 1;
+ }
+}
+
static
-Int AMD_cache_info(cache_t* I1c, cache_t* D1c, cache_t* L2c)
+Int AMD_cache_info(cache_t* I1c, cache_t* D1c, cache_t* LLc)
{
UInt ext_level;
UInt dummy, model;
- UInt I1i, D1i, L2i;
+ UInt I1i, D1i, L2i, L3i;
VG_(cpuid)(0x80000000, &ext_level, &dummy, &dummy, &dummy);
@@ -259,7 +320,7 @@
}
VG_(cpuid)(0x80000005, &dummy, &dummy, &D1i, &I1i);
- VG_(cpuid)(0x80000006, &dummy, &dummy, &L2i, &dummy);
+ VG_(cpuid)(0x80000006, &dummy, &dummy, &L2i, &L3i);
VG_(cpuid)(0x1, &model, &dummy, &dummy, &dummy);
@@ -277,15 +338,26 @@
I1c->assoc = (I1i >> 16) & 0xff;
I1c->line_size = (I1i >> 0) & 0xff;
- L2c->size = (L2i >> 16) & 0xffff; /* Nb: different bits used for L2 */
- L2c->assoc = (L2i >> 12) & 0xf;
- L2c->line_size = (L2i >> 0) & 0xff;
+ LLc->size = (L2i >> 16) & 0xffff; /* Nb: different bits used for L2 */
+ LLc->assoc = decode_AMD_cache_L2_L3_assoc((L2i >> 12) & 0xf);
+ LLc->line_size = (L2i >> 0) & 0xff;
+ if (((L3i >> 18) & 0x3fff) > 0) {
+ /* There's an L3 cache. Replace *LLc contents with this info. */
+ /* NB: the test in the if is "if L3 size > 0 ". I don't know if
+ this is the right way to test presence-vs-absence of L3. I
+ can't see any guidance on this in the AMD documentation. */
+ LLc->size = ((L3i >> 18) & 0x3fff) * 512;
+ LLc->assoc = decode_AMD_cache_L2_L3_assoc((L3i >> 12) & 0xf);
+ LLc->line_size = (L3i >> 0) & 0xff;
+ VG_(dmsg)("warning: L3 cache found, using its data for the L2 simulation.\n");
+ }
+
return 0;
}
static
-Int get_caches_from_CPUID(cache_t* I1c, cache_t* D1c, cache_t* L2c)
+Int get_caches_from_CPUID(cache_t* I1c, cache_t* D1c, cache_t* LLc)
{
Int level, ret;
Char vendor_id[13];
@@ -306,10 +378,10 @@
/* Only handling Intel and AMD chips... no Cyrix, Transmeta, etc */
if (0 == VG_(strcmp)(vendor_id, "GenuineIntel")) {
- ret = Intel_cache_info(level, I1c, D1c, L2c);
+ ret = Intel_cache_info(level, I1c, D1c, LLc);
} else if (0 == VG_(strcmp)(vendor_id, "AuthenticAMD")) {
- ret = AMD_cache_info(I1c, D1c, L2c);
+ ret = AMD_cache_info(I1c, D1c, LLc);
} else if (0 == VG_(strcmp)(vendor_id, "CentaurHauls")) {
/* Total kludge. Pretend to be a VIA Nehemiah. */
@@ -319,9 +391,9 @@
I1c->size = 64;
I1c->assoc = 4;
I1c->line_size = 16;
- L2c->size = 64;
- L2c->assoc = 16;
- L2c->line_size = 16;
+ LLc->size = 64;
+ LLc->assoc = 16;
+ LLc->line_size = 16;
ret = 0;
} else {
@@ -332,13 +404,13 @@
/* Successful! Convert sizes from KB to bytes */
I1c->size *= 1024;
D1c->size *= 1024;
- L2c->size *= 1024;
+ LLc->size *= 1024;
return ret;
}
-void VG_(configure_caches)(cache_t* I1c, cache_t* D1c, cache_t* L2c,
+void VG_(configure_caches)(cache_t* I1c, cache_t* D1c, cache_t* LLc,
Bool all_caches_clo_defined)
{
Int res;
@@ -346,10 +418,10 @@
// Set caches to default.
*I1c = (cache_t) { 65536, 2, 64 };
*D1c = (cache_t) { 65536, 2, 64 };
- *L2c = (cache_t) { 262144, 8, 64 };
+ *LLc = (cache_t) { 262144, 8, 64 };
// Then replace with any info we can get from CPUID.
- res = get_caches_from_CPUID(I1c, D1c, L2c);
+ res = get_caches_from_CPUID(I1c, D1c, LLc);
// Warn if CPUID failed and config not completely specified from cmd line.
if (res != 0 && !all_caches_clo_defined) {
Modified: trunk/cachegrind/cg_arch.h
===================================================================
--- trunk/cachegrind/cg_arch.h 2010-10-06 22:45:18 UTC (rev 11403)
+++ trunk/cachegrind/cg_arch.h 2010-10-06 22:46:31 UTC (rev 11404)
@@ -33,14 +33,14 @@
// For cache simulation
typedef struct {
- int size; // bytes
- int assoc;
- int line_size; // bytes
+ Int size; // bytes
+ Int assoc;
+ Int line_size; // bytes
} cache_t;
-// Gives the configuration of I1, D1 and L2 caches. They get overridden
+// Gives the configuration of I1, D1 and LL caches. They get overridden
// by any cache configurations specified on the command line.
-void VG_(configure_caches)(cache_t* I1c, cache_t* D1c, cache_t* L2c,
+void VG_(configure_caches)(cache_t* I1c, cache_t* D1c, cache_t* LLc,
Bool all_caches_clo_defined);
#endif // __CG_ARCH_H
Modified: trunk/cachegrind/cg_main.c
===================================================================
--- trunk/cachegrind/cg_main.c 2010-10-06 22:45:18 UTC (rev 11403)
+++ trunk/cachegrind/cg_main.c 2010-10-06 22:46:31 UTC (rev 11404)
@@ -77,7 +77,7 @@
struct {
ULong a; /* total # memory accesses of this kind */
ULong m1; /* misses in the first level cache */
- ULong m2; /* misses in the second level cache */
+ ULong mL; /* misses in the second level cache */
}
CacheCC;
@@ -268,13 +268,13 @@
lineCC->loc.line = loc.line;
lineCC->Ir.a = 0;
lineCC->Ir.m1 = 0;
- lineCC->Ir.m2 = 0;
+ lineCC->Ir.mL = 0;
lineCC->Dr.a = 0;
lineCC->Dr.m1 = 0;
- lineCC->Dr.m2 = 0;
+ lineCC->Dr.mL = 0;
lineCC->Dw.a = 0;
lineCC->Dw.m1 = 0;
- lineCC->Dw.m2 = 0;
+ lineCC->Dw.mL = 0;
lineCC->Bc.b = 0;
lineCC->Bc.mp = 0;
lineCC->Bi.b = 0;
@@ -319,7 +319,7 @@
//VG_(printf)("1I_0D : CCaddr=0x%010lx, iaddr=0x%010lx, isize=%lu\n",
// n, n->instr_addr, n->instr_len);
cachesim_I1_doref(n->instr_addr, n->instr_len,
- &n->parent->Ir.m1, &n->parent->Ir.m2);
+ &n->parent->Ir.m1, &n->parent->Ir.mL);
n->parent->Ir.a++;
}
@@ -331,10 +331,10 @@
// n, n->instr_addr, n->instr_len,
// n2, n2->instr_addr, n2->instr_len);
cachesim_I1_doref(n->instr_addr, n->instr_len,
- &n->parent->Ir.m1, &n->parent->Ir.m2);
+ &n->parent->Ir.m1, &n->parent->Ir.mL);
n->parent->Ir.a++;
cachesim_I1_doref(n2->instr_addr, n2->instr_len,
- &n2->parent->Ir.m1, &n2->parent->Ir.m2);
+ &n2->parent->Ir.m1, &n2->parent->Ir.mL);
n2->parent->Ir.a++;
}
@@ -348,13 +348,13 @@
// n2, n2->instr_addr, n2->instr_len,
// n3, n3->instr_addr, n3->instr_len);
cachesim_I1_doref(n->instr_addr, n->instr_len,
- &n->parent->Ir.m1, &n->parent->Ir.m2);
+ &n->parent->Ir.m1, &n->parent->Ir.mL);
n->parent->Ir.a++;
cachesim_I1_doref(n2->instr_addr, n2->instr_len,
- &n2->parent->Ir.m1, &n2->parent->Ir.m2);
+ &n2->parent->Ir.m1, &n2->parent->Ir.mL);
n2->parent->Ir.a++;
cachesim_I1_doref(n3->instr_addr, n3->instr_len,
- &n3->parent->Ir.m1, &n3->parent->Ir.m2);
+ &n3->parent->Ir.m1, &n3->parent->Ir.mL);
n3->parent->Ir.a++;
}
@@ -365,11 +365,11 @@
// " daddr=0x%010lx, dsize=%lu\n",
// n, n->instr_addr, n->instr_len, data_addr, data_size);
cachesim_I1_doref(n->instr_addr, n->instr_len,
- &n->parent->Ir.m1, &n->parent->Ir.m2);
+ &n->parent->Ir.m1, &n->parent->Ir.mL);
n->parent->Ir.a++;
cachesim_D1_doref(data_addr, data_size,
- &n->parent->Dr.m1, &n->parent->Dr.m2);
+ &n->parent->Dr.m1, &n->parent->Dr.mL);
n->parent->Dr.a++;
}
@@ -380,11 +380,11 @@
// " daddr=0x%010lx, dsize=%lu\n",
// n, n->instr_addr, n->instr_len, data_addr, data_size);
cachesim_I1_doref(n->instr_addr, n->instr_len,
- &n->parent->Ir.m1, &n->parent->Ir.m2);
+ &n->parent->Ir.m1, &n->parent->Ir.mL);
n->parent->Ir.a++;
cachesim_D1_doref(data_addr, data_size,
- &n->parent->Dw.m1, &n->parent->Dw.m2);
+ &n->parent->Dw.m1, &n->parent->Dw.mL);
n->parent->Dw.a++;
}
@@ -394,7 +394,7 @@
//VG_(printf)("0I_1Dr: CCaddr=0x%010lx, daddr=0x%010lx, dsize=%lu\n",
// n, data_addr, data_size);
cachesim_D1_doref(data_addr, data_size,
- &n->parent->Dr.m1, &n->parent->Dr.m2);
+ &n->parent->Dr.m1, &n->parent->Dr.mL);
n->parent->Dr.a++;
}
@@ -404,7 +404,7 @@
//VG_(printf)("0I_1Dw: CCaddr=0x%010lx, daddr=0x%010lx, dsize=%lu\n",
// n, data_addr, data_size);
cachesim_D1_doref(data_addr, data_size,
- &n->parent->Dw.m1, &n->parent->Dw.m2);
+ &n->parent->Dw.m1, &n->parent->Dw.mL);
n->parent->Dw.a++;
}
@@ -1234,7 +1234,7 @@
static cache_t clo_I1_cache = UNDEFINED_CACHE;
static cache_t clo_D1_cache = UNDEFINED_CACHE;
-static cache_t clo_L2_cache = UNDEFINED_CACHE;
+static cache_t clo_LL_cache = UNDEFINED_CACHE;
// Checks cache config is ok. Returns NULL if ok, or a pointer to an error
// string otherwise.
@@ -1273,7 +1273,7 @@
}
static
-void configure_caches(cache_t* I1c, cache_t* D1c, cache_t* L2c)
+void configure_caches(cache_t* I1c, cache_t* D1c, cache_t* LLc)
{
#define DEFINED(L) (-1 != L.size || -1 != L.assoc || -1 != L.line_size)
@@ -1283,22 +1283,22 @@
Bool all_caches_clo_defined =
(DEFINED(clo_I1_cache) &&
DEFINED(clo_D1_cache) &&
- DEFINED(clo_L2_cache));
+ DEFINED(clo_LL_cache));
// Set the cache config (using auto-detection, if supported by the
// architecture).
- VG_(configure_caches)( I1c, D1c, L2c, all_caches_clo_defined );
+ VG_(configure_caches)( I1c, D1c, LLc, all_caches_clo_defined );
// Check the default/auto-detected values.
checkRes = check_cache(I1c); tl_assert(!checkRes);
checkRes = check_cache(D1c); tl_assert(!checkRes);
- checkRes = check_cache(L2c); tl_assert(!checkRes);
+ checkRes = check_cache(LLc); tl_assert(!checkRes);
// Then replace with any defined on the command line. (Already checked in
// parse_cache_opt().)
if (DEFINED(clo_I1_cache)) { *I1c = clo_I1_cache; }
if (DEFINED(clo_D1_cache)) { *D1c = clo_D1_cache; }
- if (DEFINED(clo_L2_cache)) { *L2c = clo_L2_cache; }
+ if (DEFINED(clo_LL_cache)) { *LLc = clo_LL_cache; }
if (VG_(clo_verbosity) >= 2) {
VG_(umsg)("Cache configuration used:\n");
@@ -1306,8 +1306,8 @@
I1c->size, I1c->assoc, I1c->line_size);
VG_(umsg)(" D1: %dB, %d-way, %dB lines\n",
D1c->size, D1c->assoc, D1c->line_size);
- VG_(umsg)(" L2: %dB, %d-way, %dB lines\n",
- L2c->size, L2c->assoc, L2c->line_size);
+ VG_(umsg)(" LL: %dB, %d-way, %dB lines\n",
+ LLc->size, LLc->assoc, LLc->line_size);
}
#undef CMD_LINE_DEFINED
}
@@ -1354,12 +1354,12 @@
VG_(free)(cachegrind_out_file);
}
- // "desc:" lines (giving I1/D1/L2 cache configuration). The spaces after
+ // "desc:" lines (giving I1/D1/LL cache configuration). The spaces after
// the 2nd colon makes cg_annotate's output look nicer.
VG_(sprintf)(buf, "desc: I1 cache: %s\n"
"desc: D1 cache: %s\n"
- "desc: L2 cache: %s\n",
- I1.desc_line, D1.desc_line, L2.desc_line);
+ "desc: LL cache: %s\n",
+ I1.desc_line, D1.desc_line, LL.desc_line);
VG_(write)(fd, (void*)buf, VG_(strlen)(buf));
// "cmd:" line
@@ -1379,11 +1379,11 @@
}
// "events:" line
if (clo_cache_sim && clo_branch_sim) {
- VG_(sprintf)(buf, "\nevents: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw "
+ VG_(sprintf)(buf, "\nevents: Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw "
"Bc Bcm Bi Bim\n");
}
else if (clo_cache_sim && !clo_branch_sim) {
- VG_(sprintf)(buf, "\nevents: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw "
+ VG_(sprintf)(buf, "\nevents: Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw "
"\n");
}
else if (!clo_cache_sim && clo_branch_sim) {
@@ -1430,9 +1430,9 @@
" %llu %llu %llu"
" %llu %llu %llu %llu\n",
lineCC->loc.line,
- lineCC->Ir.a, lineCC->Ir.m1, lineCC->Ir.m2,
- lineCC->Dr.a, lineCC->Dr.m1, lineCC->Dr.m2,
- lineCC->Dw.a, lineCC->Dw.m1, lineCC->Dw.m2,
+ lineCC->Ir.a, lineCC->Ir.m1, lineCC->Ir.mL,
+ lineCC->Dr.a, lineCC->Dr.m1, lineCC->Dr.mL,
+ lineCC->Dw.a, lineCC->Dw.m1, lineCC->Dw.mL,
lineCC->Bc.b, lineCC->Bc.mp,
lineCC->Bi.b, lineCC->Bi.mp);
}
@@ -1441,9 +1441,9 @@
" %llu %llu %llu"
" %llu %llu %llu\n",
lineCC->loc.line,
- lineCC->Ir.a, lineCC->Ir.m1, lineCC->Ir.m2,
- lineCC->Dr.a, lineCC->Dr.m1, lineCC->Dr.m2,
- lineCC->Dw.a, lineCC->Dw.m1, lineCC->Dw.m2);
+ lineCC->Ir.a, lineCC->Ir.m1, lineCC->Ir.mL,
+ lineCC->Dr.a, lineCC->Dr.m1, lineCC->Dr.mL,
+ lineCC->Dw.a, lineCC->Dw.m1, lineCC->Dw.mL);
}
else if (!clo_cache_sim && clo_branch_sim) {
VG_(sprintf)(buf, "%u %llu"
@@ -1464,13 +1464,13 @@
// Update summary stats
Ir_total.a += lineCC->Ir.a;
Ir_total.m1 += lineCC->Ir.m1;
- Ir_total.m2 += lineCC->Ir.m2;
+ Ir_total.mL += lineCC->Ir.mL;
Dr_total.a += lineCC->Dr.a;
Dr_total.m1 += lineCC->Dr.m1;
- Dr_total.m2 += lineCC->Dr.m2;
+ Dr_total.mL += lineCC->Dr.mL;
Dw_total.a += lineCC->Dw.a;
Dw_total.m1 += lineCC->Dw.m1;
- Dw_total.m2 += lineCC->Dw.m2;
+ Dw_total.mL += lineCC->Dw.mL;
Bc_total.b += lineCC->Bc.b;
Bc_total.mp += lineCC->Bc.mp;
Bi_total.b += lineCC->Bi.b;
@@ -1487,9 +1487,9 @@
" %llu %llu %llu"
" %llu %llu %llu"
" %llu %llu %llu %llu\n",
- Ir_total.a, Ir_total.m1, Ir_total.m2,
- Dr_total.a, Dr_total.m1, Dr_total.m2,
- Dw_total.a, Dw_total.m1, Dw_total.m2,
+ Ir_total.a, Ir_total.m1, Ir_total.mL,
+ Dr_total.a, Dr_total.m1, Dr_total.mL,
+ Dw_total.a, Dw_total.m1, Dw_total.mL,
Bc_total.b, Bc_total.mp,
Bi_total.b, Bi_total.mp);
}
@@ -1498,9 +1498,9 @@
" %llu %llu %llu"
" %llu %llu %llu"
" %llu %llu %llu\n",
- Ir_total.a, Ir_total.m1, Ir_total.m2,
- Dr_total.a, Dr_total.m1, Dr_total.m2,
- Dw_total.a, Dw_total.m1, Dw_total.m2);
+ Ir_total.a, Ir_total.m1, Ir_total.mL,
+ Dr_total.a, Dr_total.m1, Dr_total.mL,
+ Dw_total.a, Dw_total.m1, Dw_total.mL);
}
else if (!clo_cache_sim && clo_branch_sim) {
VG_(sprintf)(buf, "summary:"
@@ -1537,8 +1537,8 @@
CacheCC D_total;
BranchCC B_total;
- ULong L2_total_m, L2_total_mr, L2_total_mw,
- L2_total, L2_total_r, L2_total_w;
+ ULong LL_total_m, LL_total_mr, LL_total_mw,
+ LL_total, LL_total_r, LL_total_w;
Int l1, l2, l3;
fprint_CC_table_and_calc_totals();
@@ -1565,21 +1565,21 @@
miss numbers */
if (clo_cache_sim) {
VG_(umsg)(fmt, "I1 misses: ", Ir_total.m1);
- VG_(umsg)(fmt, "L2i misses: ", Ir_total.m2);
+ VG_(umsg)(fmt, "LLi misses: ", Ir_total.mL);
if (0 == Ir_total.a) Ir_total.a = 1;
VG_(percentify)(Ir_total.m1, Ir_total.a, 2, l1+1, buf1);
VG_(umsg)("I1 miss rate: %s\n", buf1);
- VG_(percentify)(Ir_total.m2, Ir_total.a, 2, l1+1, buf1);
- VG_(umsg)("L2i miss rate: %s\n", buf1);
+ VG_(percentify)(Ir_total.mL, Ir_total.a, 2, l1+1, buf1);
+ VG_(umsg)("LLi miss rate: %s\n", buf1);
VG_(umsg)("\n");
/* D cache results. Use the D_refs.rd and D_refs.wr values to
* determine the width of columns 2 & 3. */
D_total.a = Dr_total.a + Dw_total.a;
D_total.m1 = Dr_total.m1 + Dw_total.m1;
- D_total.m2 = Dr_total.m2 + Dw_total.m2;
+ D_total.mL = Dr_total.mL + Dw_total.mL;
/* Make format string, getting width right for numbers */
VG_(sprintf)(fmt, "%%s %%,%dllu (%%,%dllu rd + %%,%dllu wr)\n",
@@ -1589,8 +1589,8 @@
D_total.a, Dr_total.a, Dw_total.a);
VG_(umsg)(fmt, "D1 misses: ",
D_total.m1, Dr_total.m1, Dw_total.m1);
- VG_(umsg)(fmt, "L2d misses: ",
- D_total.m2, Dr_total.m2, Dw_total.m2);
+ VG_(umsg)(fmt, "LLd misses: ",
+ D_total.mL, Dr_total.mL, Dw_total.mL);
if (0 == D_total.a) D_total.a = 1;
if (0 == Dr_total.a) Dr_total.a = 1;
@@ -1600,30 +1600,30 @@
VG_(percentify)(Dw_total.m1, Dw_total.a, 1, l3+1, buf3);
VG_(umsg)("D1 miss rate: %s (%s + %s )\n", buf1, buf2,buf3);
- VG_(percentify)( D_total.m2, D_total.a, 1, l1+1, buf1);
- VG_(percentify)(Dr_total.m2, Dr_total.a, 1, l2+1, buf2);
- VG_(percentify)(Dw_total.m2, Dw_total.a, 1, l3+1, buf3);
- VG_(umsg)("L2d miss rate: %s (%s + %s )\n", buf1, buf2,buf3);
+ VG_(percentify)( D_total.mL, D_total.a, 1, l1+1, buf1);
+ VG_(percentify)(Dr_total.mL, Dr_total.a, 1, l2+1, buf2);
+ VG_(percentify)(Dw_total.mL, Dw_total.a, 1, l3+1, buf3);
+ VG_(umsg)("LLd miss rate: %s (%s + %s )\n", buf1, buf2,buf3);
VG_(umsg)("\n");
- /* L2 overall results */
+ /* LL overall results */
- L2_total = Dr_total.m1 + Dw_total.m1 + Ir_total.m1;
- L2_total_r = Dr_total.m1 + Ir_total.m1;
- L2_total_w = Dw_total.m1;
- VG_(umsg)(fmt, "L2 refs: ",
- L2_total, L2_total_r, L2_total_w);
+ LL_total = Dr_total.m1 + Dw_total.m1 + Ir_total.m1;
+ LL_total_r = Dr_total.m1 + Ir_total.m1;
+ LL_total_w = Dw_total.m1;
+ VG_(umsg)(fmt, "LL refs: ",
+ LL_total, LL_total_r, LL_total_w);
- L2_total_m = Dr_total.m2 + Dw_total.m2 + Ir_total.m2;
- L2_total_mr = Dr_total.m2 + Ir_total.m2;
- L2_total_mw = Dw_total.m2;
- VG_(umsg)(fmt, "L2 misses: ",
- L2_total_m, L2_total_mr, L2_total_mw);
+ LL_total_m = Dr_total.mL + Dw_total.mL + Ir_total.mL;
+ LL_total_mr = Dr_total.mL + Ir_total.mL;
+ LL_total_mw = Dw_total.mL;
+ VG_(umsg)(fmt, "LL misses: ",
+ LL_total_m, LL_total_mr, LL_total_mw);
- VG_(percentify)(L2_total_m, (Ir_total.a + D_total.a), 1, l1+1, buf1);
- VG_(percentify)(L2_total_mr, (Ir_total.a + Dr_total.a), 1, l2+1, buf2);
- VG_(percentify)(L2_total_mw, Dw_total.a, 1, l3+1, buf3);
- VG_(umsg)("L2 miss rate: %s (%s + %s )\n", buf1, buf2,buf3);
+ VG_(percentify)(LL_total_m, (Ir_total.a + D_total.a), 1, l1+1, buf1);
+ VG_(percentify)(LL_total_mr, (Ir_total.a + Dr_total.a), 1, l2+1, buf2);
+ VG_(percentify)(LL_total_mw, Dw_total.a, 1, l3+1, buf3);
+ VG_(umsg)("LL miss rate: %s (%s + %s )\n", buf1, buf2,buf3);
}
/* If branch profiling is enabled, show branch overall results. */
@@ -1760,8 +1760,9 @@
parse_cache_opt(&clo_I1_cache, arg, tmp_str);
else if VG_STR_CLO(arg, "--D1", tmp_str)
parse_cache_opt(&clo_D1_cache, arg, tmp_str);
- else if VG_STR_CLO(arg, "--L2", tmp_str)
- parse_cache_opt(&clo_L2_cache, arg, tmp_str);
+ else if (VG_STR_CLO(arg, "--L2", tmp_str) || // for backwards compatibility
+ VG_STR_CLO(arg, "--LL", tmp_str))
+ parse_cache_opt(&clo_LL_cache, arg, tmp_str);
else if VG_STR_CLO( arg, "--cachegrind-out-file", clo_cachegrind_out_file) {}
else if VG_BOOL_CLO(arg, "--cache-sim", clo_cache_sim) {}
@@ -1777,7 +1778,7 @@
VG_(printf)(
" --I1=<size>,<assoc>,<line_size> set I1 cache manually\n"
" --D1=<size>,<assoc>,<line_size> set D1 cache manually\n"
-" --L2=<size>,<assoc>,<line_size> set L2 cache manually\n"
+" --LL=<size>,<assoc>,<line_size> set LL cache manually\n"
" --cache-sim=yes|no [yes] collect cache stats?\n"
" --branch-sim=yes|no [no] collect branch prediction stats?\n"
" --cachegrind-out-file=<file> output file name [cachegrind.out.%%p]\n"
@@ -1819,7 +1820,7 @@
static void cg_post_clo_init(void)
{
- cache_t I1c, D1c, L2c;
+ cache_t I1c, D1c, LLc;
CC_table =
VG_(OSetGen_Create)(offsetof(LineCC, loc),
@@ -1837,11 +1838,11 @@
VG_(malloc), "cg.main.cpci.3",
VG_(free));
- configure_caches(&I1c, &D1c, &L2c);
+ configure_caches(&I1c, &D1c, &LLc);
cachesim_I1_initcache(I1c);
cachesim_D1_initcache(D1c);
- cachesim_L2_initcache(L2c);
+ cachesim_LL_initcache(LLc);
}
VG_DETERMINE_INTERFACE_VERSION(cg_pre_clo_init)
Modified: trunk/cachegrind/cg_sim.c
===================================================================
--- trunk/cachegrind/cg_sim.c 2010-10-06 22:45:18 UTC (rev 11403)
+++ trunk/cachegrind/cg_sim.c 2010-10-06 22:46:31 UTC (rev 11404)
@@ -96,7 +96,7 @@
/* bigger than its usual limit. Inlining gains around 5--10% speedup. */ \
__attribute__((always_inline)) \
static __inline__ \
-void cachesim_##L##_doref(Addr a, UChar size, ULong* m1, ULong *m2) \
+void cachesim_##L##_doref(Addr a, UChar size, ULong* m1, ULong *mL) \
{ \
UInt set1 = ( a >> L.line_size_bits) & (L.sets_min_1); \
UInt set2 = ((a+size-1) >> L.line_size_bits) & (L.sets_min_1); \
@@ -188,9 +188,9 @@
return; \
}
-CACHESIM(L2, (*m2)++ );
-CACHESIM(I1, { (*m1)++; cachesim_L2_doref(a, size, m1, m2); } );
-CACHESIM(D1, { (*m1)++; cachesim_L2_doref(a, size, m1, m2); } );
+CACHESIM(LL, (*mL)++ );
+CACHESIM(I1, { (*m1)++; cachesim_LL_doref(a, size, m1, mL); } );
+CACHESIM(D1, { (*m1)++; cachesim_LL_doref(a, size, m1, mL); } );
/*--------------------------------------------------------------------*/
/*--- end cg_sim.c ---*/
Modified: trunk/cachegrind/docs/cg-manual.xml
===================================================================
--- trunk/cachegrind/docs/cg-manual.xml 2010-10-06 22:45:18 UTC (rev 11403)
+++ trunk/cachegrind/docs/cg-manual.xml 2010-10-06 22:46:31 UTC (rev 11404)
@@ -16,33 +16,45 @@
<para>Cachegrind simulates how your program interacts with a machine's cache
hierarchy and (optionally) branch predictor. It simulates a machine with
-independent first level instruction and data caches (I1 and D1), backed by a
-unified second level cache (L2). This configuration is used by almost all
-modern machines.</para>
+independent first-level instruction and data caches (I1 and D1), backed by a
+unified second-level cache (L2). This exactly matches the configuration of
+many modern machines.</para>
+<para>However, some modern machines have three levels of cache. For these
+machines (in the cases where Cachegrind can auto-detect the cache
+configuration) Cachegrind simulates the first-level and third-level caches.
+The reason for this choice is that the L3 cache has the most influence on
+runtime, as it masks accesses to main memory. Furthermore, the L1 caches
+often have low associativity, so simulating them can detect cases where the
+code interacts badly with this cache (eg. traversing a matrix column-wise
+with the row length being a power of 2).</para>
+
+<para>Therefore, Cachegrind always refers to the I1, D1 and LL (last-level)
+caches.</para>
+
<para>
-It gathers the following statistics (abbreviations used for each statistic
+Cachegrind gathers the following statistics (abbreviations used for each statistic
is given in parentheses):</para>
<itemizedlist>
<listitem>
<para>I cache reads (<computeroutput>Ir</computeroutput>,
which equals the number of instructions executed),
I1 cache read misses (<computeroutput>I1mr</computeroutput>) and
- L2 cache instruction read misses (<computeroutput>I1mr</computeroutput>).
+ LL cache instruction read misses (<computeroutput>ILmr</computeroutput>).
</para>
</listitem>
<listitem>
<para>D cache reads (<computeroutput>Dr</computeroutput>, which
equals the number of memory reads),
D1 cache read misses (<computeroutput>D1mr</computeroutput>), and
- L2 cache data read misses (<computeroutput>D2mr</computeroutput>).
+ LL cache data read misses (<computeroutput>DLmr</computeroutput>).
</para>
</listitem>
<listitem>
<para>D cache writes (<computeroutput>Dw</computeroutput>, which equals
the number of memory writes),
D1 cache write misses (<computeroutput>D1mw</computeroutput>), and
- L2 cache data write misses (<computeroutput>D2mw</computeroutput>).
+ LL cache data write misses (<computeroutput>DLmw</computeroutput>).
</para>
</listitem>
<listitem>
@@ -59,10 +71,10 @@
<para>Note that D1 total accesses is given by
<computeroutput>D1mr</computeroutput> +
-<computeroutput>D1mw</computeroutput>, and that L2 total
-accesses is given by <computeroutput>I2mr</computeroutput> +
-<computeroutput>D2mr</computeroutput> +
-<computeroutput>D2mw</computeroutput>.
+<computeroutput>D1mw</computeroutput>, and that LL total
+accesses is given by <computeroutput>ILmr</computeroutput> +
+<computeroutput>DLmr</computeroutput> +
+<computeroutput>DLmw</computeroutput>.
</para>
<para>These statistics are presented for the entire program and for each
@@ -70,7 +82,7 @@
the program with the counts that were caused directly by it.</para>
<para>On a modern machine, an L1 miss will typically cost
-around 10 cycles, an L2 miss can cost as much as 200
+around 10 cycles, an LL miss can cost as much as 200
cycles, and a mispredicted branch costs in the region of 10
to 30 cycles. Detailed cache and branch profiling can be very useful
for understanding how your program interacts with the machine and thus how
@@ -118,24 +130,24 @@
<programlisting><![CDATA[
==31751== I refs: 27,742,716
==31751== I1 misses: 276
-==31751== L2i misses: 275
+==31751== LLi misses: 275
==31751== I1 miss rate: 0.0%
-==31751== L2i miss rate: 0.0%
+==31751== LLi miss rate: 0.0%
==31751==
==31751== D refs: 15,430,290 (10,955,517 rd + 4,474,773 wr)
==31751== D1 misses: 41,185 ( 21,905 rd + 19,280 wr)
-==31751== L2d misses: 23,085 ( 3,987 rd + 19,098 wr)
+==31751== LLd misses: 23,085 ( 3,987 rd + 19,098 wr)
==31751== D1 miss rate: 0.2% ( 0.1% + 0.4%)
-==31751== L2d miss rate: 0.1% ( 0.0% + 0.4%)
+==31751== LLd miss rate: 0.1% ( 0.0% + 0.4%)
==31751==
-==31751== L2 misses: 23,360 ( 4,262 rd + 19,098 wr)
-==31751== L2 miss rate: 0.0% ( 0.0% + 0.4%)]]></programlisting>
+==31751== LL misses: 23,360 ( 4,262 rd + 19,098 wr)
+==31751== LL miss rate: 0.0% ( 0.0% + 0.4%)]]></programlisting>
<para>Cache accesses for instruction fetches are summarised
first, giving the number of fetches made (this is the number of
instructions executed, which can be useful to know in its own
-right), the number of I1 misses, and the number of L2 instruction
-(<computeroutput>L2i</computeroutput>) misses.</para>
+right), the number of I1 misses, and the number of LL instruction
+(<computeroutput>LLi</computeroutput>) misses.</para>
<para>Cache accesses for data follow. The information is similar
to that of the instruction fetches, except that the values are
@@ -144,12 +156,12 @@
<computeroutput>wr</computeroutput> values add up to the row's
total).</para>
-<para>Combined instruction and data figures for the L2 cache
-follow that. Note that the L2 miss rate is computed relative to the total
+<para>Combined instruction and data figures for the LL cache
+follow that. Note that the LL miss rate is computed relative to the total
number of memory accesses, not the number of L1 misses. I.e. it is
-<computeroutput>(I2mr + D2mr + D2mw) / (Ir + Dr + Dw)</computeroutput>
+<computeroutput>(ILmr + DLmr + DLmw) / (Ir + Dr + Dw)</computeroutput>
not
-<computeroutput>(I2mr + D2mr + D2mw) / (I1mr + D1mr + D1mw)</computeroutput>
+<computeroutput>(ILmr + DLmr + DLmw) / (I1mr + D1mr + D1mw)</computeroutput>
</para>
<para>Branch prediction statistics are not collected by default.
@@ -208,11 +220,11 @@
--------------------------------------------------------------------------------
I1 cache: 65536 B, 64 B, 2-way associative
D1 cache: 65536 B, 64 B, 2-way associative
-L2 cache: 262144 B, 64 B, 8-way associative
+LL cache: 262144 B, 64 B, 8-way associative
Command: concord vg_to_ucode.c
-Events recorded: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
-Events shown: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
-Event sort order: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
+Events recorded: Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw
+Events shown: Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw
+Event sort order: Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw
Threshold: 99%
Chosen for annotation:
Auto-annotation: off
@@ -224,7 +236,7 @@
<itemizedlist>
<listitem>
- <para>I1 cache, D1 cache, L2 cache: cache configuration. So
+ <para>I1 cache, D1 cache, LL cache: cache configuration. So
you know the configuration with which these results were
obtained.</para>
</listitem>
@@ -300,7 +312,7 @@
<programlisting><![CDATA[
--------------------------------------------------------------------------------
-Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
+Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw
--------------------------------------------------------------------------------
27,742,716 276 275 10,955,517 21,905 3,987 4,474,773 19,280 19,098 PROGRAM TOTALS]]></programlisting>
@@ -312,7 +324,7 @@
<programlisting><![CDATA[
--------------------------------------------------------------------------------
-Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw file:function
+Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw file:function
--------------------------------------------------------------------------------
8,821,482 5 5 2,242,702 1,621 73 1,794,230 0 0 getc.c:_IO_getc
5,222,023 4 4 2,276,334 16 12 875,959 1 1 concord.c:get_word
@@ -367,7 +379,7 @@
--------------------------------------------------------------------------------
-- User-annotated source: concord.c
--------------------------------------------------------------------------------
-Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
+Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw
. . . . . . . . . void init_hash_table(char *file_name, Word_Node *table[])
3 1 1 . . . 1 0 0 {
@@ -687,7 +699,7 @@
<computeroutput>Events:</computeroutput> lines of all the inputs are
identical, so as to ensure that the addition of costs makes sense.
For example, it would be nonsensical for it to add a number indicating
-D1 read references to a number from a different file indicating L2
+D1 read references to a number from a different file indicating LL
write misses.</para>
<para>
@@ -746,7 +758,7 @@
<computeroutput>Events:</computeroutput> lines of all the inputs are
identical, so as to ensure that the addition of costs makes sense.
For example, it would be nonsensical for it to add a number indicating
-D1 read references to a number from a different file indicating L2
+D1 read references to a number from a different file indicating LL
write misses.</para>
<para>
@@ -810,12 +822,12 @@
</listitem>
</varlistentry>
- <varlistentry id="opt.L2" xreflabel="--L2">
+ <varlistentry id="opt.LL" xreflabel="--LL">
<term>
- <option><![CDATA[--L2=<size>,<associativity>,<line size> ]]></option>
+ <option><![CDATA[--LL=<size>,<associativity>,<line size> ]]></option>
</term>
<listitem>
- <para>Specify the size, associativity and line size of the level 2
+ <para>Specify the size, associativity and line size of the last-level
cache.</para>
</listitem>
</varlistentry>
@@ -903,9 +915,9 @@
order). Default is to use all present in the
<filename>cachegrind.out.<pid></filename> file (and
use the order in the file). Useful if you want to concentrate on, for
- example, I cache misses (<option>--show=I1mr,I2mr</option>), or data
- read misses (<option>--show=D1mr,D2mr</option>), or L2 data misses
- (<option>--show=D2mr,D2mw</option>). Best used in conjunction with
+ example, I cache misses (<option>--show=I1mr,ILmr</option>), or data
+ read misses (<option>--show=D1mr,DLmr</option>), or LL data misses
+ (<option>--show=DLmr,DLmw</option>). Best used in conjunction with
<option>--sort</option>.</para>
</listitem>
</varlistentry>
@@ -935,9 +947,9 @@
events by appending any events for the
<option>--sort</option> option with a colon
and a number (no spaces, though). E.g. if you want to see
- each function that covers more than 1% of L2 read misses or 1% of L2
+ each function that covers more than 1% of LL read misses or 1% of LL
write misses, use this option:</para>
- <para><option>--sort=D2mr:1,D2mw:1</option></para>
+ <para><option>--sort=DLmr:1,DLmw:1</option></para>
</listitem>
</varlistentry>
@@ -1059,13 +1071,13 @@
bottlenecks.</para>
<para>
-After that, we have found that L2 misses are typically a much bigger source
+After that, we have found that LL misses are typically a much bigger source
of slow-downs than L1 misses. So it's worth looking for any snippets of
-code with high <computeroutput>D2mr</computeroutput> or
-<computeroutput>D2mw</computeroutput> counts. (You can use
-<option>--show=D2mr
---sort=D2mr</option> with cg_annotate to focus just on
-<literal>D2mr</literal> counts, for example.) If you find any, it's still
+code with high <computeroutput>DLmr</computeroutput> or
+<computeroutput>DLmw</computeroutput> counts. (You can use
+<option>--show=DLmr
+--sort=DLmr</option> with cg_annotate to focus just on
+<literal>DLmr</literal> counts, for example.) If you find any, it's still
not always easy to work out how to improve things. You need to have a
reasonable understanding of how caches work, the principles of locality, and
your program's data access patterns. Improving things may require
@@ -1153,12 +1165,12 @@
</listitem>
<listitem>
- <para>Inclusive L2 cache: the L2 cache typically replicates all
+ <para>Inclusive LL cache: the LL cache typically replicates all
the entries of the L1 caches, because fetching into L1 involves
- fetching into L2 first (this does not guarantee strict inclusiveness,
- as lines evicted from L2 still could reside in L1). This is
+ fetching into LL first (this does not guarantee strict inclusiveness,
+ as lines evicted from LL still could reside in L1). This is
standard on Pentium chips, but AMD Opterons, Athlons and Durons
- use an exclusive L2 cache that only holds
+ use an exclusive LL cache that only holds
blocks evicted from L1. Ditto most modern VIA CPUs.</para>
</listitem>
@@ -1172,10 +1184,10 @@
Cachegrind will fall back to using a default configuration (that
of a model 3/4 Athlon). Cachegrind will tell you if this
happens. You can manually specify one, two or all three levels
-(I1/D1/L2) of the cache from the command line using the
+(I1/D1/LL) of the cache from the command line using the
<option>--I1</option>,
<option>--D1</option> and
-<option>--L2</option> options.
+<option>--LL</option> options.
For cache parameters to be valid for simulation, the number
of sets (with associativity being the number of cache lines in
each set) has to be a power of two.</para>
@@ -1186,7 +1198,7 @@
need to specify it with the
<option>--I1</option>,
<option>--D1</option> and
-<option>--L2</option> options.</para>
+<option>--LL</option> options.</para>
<para>Other noteworthy behaviour:</para>
Modified: trunk/cachegrind/tests/chdir.stderr.exp
===================================================================
--- trunk/cachegrind/tests/chdir.stderr.exp 2010-10-06 22:45:18 UTC (rev 11403)
+++ trunk/cachegrind/tests/chdir.stderr.exp 2010-10-06 22:46:31 UTC (rev 11404)
@@ -2,16 +2,16 @@
I refs:
I1 misses:
-L2i misses:
+LLi misses:
I1 miss rate:
-L2i miss rate:
+LLi miss rate:
D refs:
D1 misses:
-L2d misses:
+LLd misses:
D1 miss rate:
-L2d miss rate:
+LLd miss rate:
-L2 refs:
-L2 misses:
-L2 miss rate:
+LL refs:
+LL misses:
+LL miss rate:
Modified: trunk/cachegrind/tests/dlclose.stderr.exp
===================================================================
--- trunk/cachegrind/tests/dlclose.stderr.exp 2010-10-06 22:45:18 UTC (rev 11403)
+++ trunk/cachegrind/tests/dlclose.stderr.exp 2010-10-06 22:46:31 UTC (rev 11404)
@@ -2,16 +2,16 @@
I refs:
I1 misses:
-L2i misses:
+LLi misses:
I1 miss rate:
-L2i miss rate:
+LLi miss rate:
D refs:
D1 misses:
-L2d misses:
+LLd misses:
D1 miss rate:
-L2d miss rate:
+LLd miss rate:
-L2 refs:
-L2 misses:
-L2 miss rate:
+LL refs:
+LL misses:
+LL miss rate:
Modified: trunk/cachegrind/tests/filter_stderr
===================================================================
--- trunk/cachegrind/tests/filter_stderr 2010-10-06 22:45:18 UTC (rev 11403)
+++ trunk/cachegrind/tests/filter_stderr 2010-10-06 22:46:31 UTC (rev 11404)
@@ -7,11 +7,11 @@
# Remove "Cachegrind, ..." line and the following copyright line.
sed "/^Cachegrind, a cache and branch-prediction profiler/ , /./ d" |
-# Remove numbers from I/D/L2 "refs:" lines
-perl -p -e 's/((I|D|L2) *refs:)[ 0-9,()+rdw]*$/\1/' |
+# Remove numbers from I/D/LL "refs:" lines
+perl -p -e 's/((I|D|LL) *refs:)[ 0-9,()+rdw]*$/\1/' |
-# Remove numbers from I1/D1/L2/L2i/L2d "misses:" and "miss rates:" lines
-perl -p -e 's/((I1|D1|L2|L2i|L2d) *(misses|miss rate):)[ 0-9,()+rdw%\.]*$/\1/' |
+# Remove numbers from I1/D1/LL/LLi/LLd "misses:" and "miss rates:" lines
+perl -p -e 's/((I1|D1|LL|LLi|LLd) *(misses|miss rate):)[ 0-9,()+rdw%\.]*$/\1/' |
# Remove CPUID warnings lines for P4s and other machines
sed "/warning: Pentium 4 with 12 KB micro-op instruction trace cache/d" |
Modified: trunk/cachegrind/tests/notpower2.stderr.exp
===================================================================
--- trunk/cachegrind/tests/notpower2.stderr.exp 2010-10-06 22:45:18 UTC (rev 11403)
+++ trunk/cachegrind/tests/notpower2.stderr.exp 2010-10-06 22:46:31 UTC (rev 11404)
@@ -2,16 +2,16 @@
I refs:
I1 misses:
-L2i misses:
+LLi misses:
I1 miss rate:
-L2i miss rate:
+LLi miss rate:
D refs:
D1 misses:
-L2d misses:
+LLd misses:
D1 miss rate:
-L2d miss rate:
+LLd miss rate:
-L2 refs:
-L2 misses:
-L2 miss rate:
+LL refs:
+LL misses:
+LL miss rate:
Modified: trunk/cachegrind/tests/notpower2.vgtest
===================================================================
--- trunk/cachegrind/tests/notpower2.vgtest 2010-10-06 22:45:18 UTC (rev 11403)
+++ trunk/cachegrind/tests/notpower2.vgtest 2010-10-06 22:46:31 UTC (rev 11404)
@@ -1,3 +1,3 @@
prog: ../../tests/true
-vgopts: --I1=32768,8,64 --D1=24576,6,64 --L2=3145728,12,64
+vgopts: --I1=32768,8,64 --D1=24576,6,64 --LL=3145728,12,64
cleanup: rm cachegrind.out.*
Modified: trunk/cachegrind/tests/wrap5.stderr.exp
===================================================================
--- trunk/cachegrind/tests/wrap5.stderr.exp 2010-10-06 22:45:18 UTC (rev 11403)
+++ trunk/cachegrind/tests/wrap5.stderr.exp 2010-10-06 22:46:31 UTC (rev 11404)
@@ -2,16 +2,16 @@
I refs:
I1 misses:
-L2i misses:
+LLi misses:
I1 miss rate:
-L2i miss rate:
+LLi miss rate:
D refs:
D1 misses:
-L2d misses:
+LLd misses:
D1 miss rate:
-L2d miss rate:
+LLd miss rate:
-L2 refs:
-L2 misses:
-L2 miss rate:
+LL refs:
+LL misses:
+LL miss rate:
Modified: trunk/cachegrind/tests/x86/fpu-28-108.stderr.exp
===================================================================
--- trunk/cachegrind/tests/x86/fpu-28-108.stderr.exp 2010-10-06 22:45:18 UTC (rev 11403)
+++ trunk/cachegrind/tests/x86/fpu-28-108.stderr.exp 2010-10-06 22:46:31 UTC (rev 11404)
@@ -2,16 +2,16 @@
I refs:
I1 misses:
-L2i misses:
+LLi misses:
I1 miss rate:
-L2i miss rate:
+LLi miss rate:
D refs:
D1 misses:
-L2d misses:
+LLd misses:
D1 miss rate:
-L2d miss rate:
+LLd miss rate:
-L2 refs:
-L2 misses:
-L2 miss rate:
+LL refs:
+LL misses:
+LL miss rate:
Modified: trunk/callgrind/docs/cl-format.xml
===================================================================
--- trunk/callgrind/docs/cl-format.xml 2010-10-06 22:45:18 UTC (rev 11403)
+++ trunk/callgrind/docs/cl-format.xml 2010-10-06 22:46:31 UTC (rev 11404)
@@ -414,7 +414,7 @@
<para>This specifies various information for this dump. For some
types, the semantic is defined, but any description type is allowed.
Unknown types should be ignored.</para>
- <para>There are the types "I1 cache", "D1 cache", "L2 cache", which
+ <para>There are the types "I1 cache", "D1 cache", "LL cache", which
specify parameters used for the cache simulator. These are the only
types originally used by Cachegrind. Additionally, Callgrind uses
the following types: "Timerange" gives a rough range of the basic
@@ -457,7 +457,7 @@
<para><command>I1mr</command>: Instruction Level 1 read cache miss</para>
</listitem>
<listitem>
- <para><command>I2mr</command>: Instruction Level 2 read cache miss</para>
+ <para><command>ILmr</command>: Instruction last-level read cache miss</para>
</listitem>
<listitem>
<para>...</para>
Modified: trunk/callgrind/docs/cl-manual.xml
===================================================================
--- trunk/callgrind/docs/cl-manual.xml 2010-10-06 22:45:18 UTC (rev 11403)
+++ trunk/callgrind/docs/cl-manual.xml 2010-10-06 22:46:31 UTC (rev 11404)
@@ -933,9 +933,9 @@
<para>Specify if you want to do full cache simulation. By default,
only instruction read accesses will be counted ("Ir").
With cache simulation, further event counters are enabled:
- Cache misses on instruction reads ("I1mr"/"I2mr"),
- data read accesses ("Dr") and related cache misses ("D1mr"/"D2mr"),
- data write accesses ("Dw") and related cache misses ("D1mw"/"D2mw").
+ Cache misses on instruction reads ("I1mr"/"ILmr"),
+ data read accesses ("Dr") and related cache misses ("D1mr"/"DLmr"),
+ data write accesses ("Dw") and related cache misses ("D1mw"/"DLmw").
For more information, see <xref link...
[truncated message content]
From: <sv...@va...> - 2010-10-06 22:45:31
Author: sewardj
Date: 2010-10-06 23:45:18 +0100 (Wed, 06 Oct 2010)
New Revision: 11403
Log:
The amd64-linux unwinder rejects stacks of smaller than 512 bytes as
bogus, and produces essentially useless traces from them. With
gcc-4.4 and later, some valid thread stacks really are smaller than
this. Hence change the limit down to 256 bytes. Investigated by
Evgeniy Stepanov, eug...@gm....
See bug 243270 comment 21.
Modified:
trunk/coregrind/m_stacktrace.c
Modified: trunk/coregrind/m_stacktrace.c
===================================================================
--- trunk/coregrind/m_stacktrace.c 2010-10-06 22:07:06 UTC (rev 11402)
+++ trunk/coregrind/m_stacktrace.c 2010-10-06 22:45:18 UTC (rev 11403)
@@ -264,7 +264,7 @@
// On Darwin, this kicks in for pthread-related stack traces, so they're
// only 1 entry long which is wrong.
# if !defined(VGO_darwin)
- if (fp_min + 512 >= fp_max) {
+ if (fp_min + 256 >= fp_max) {
/* If the stack limits look bogus, don't poke around ... but
don't bomb out either. */
if (sps) sps[0] = uregs.xsp;
From: Vince W. <vi...@cs...> - 2010-10-06 22:26:49
I have a question about bug https://bugs.kde.org/show_bug.cgi?id=211499. This involves creating a --vex-native-cpuid=yes flag so that native cpuid info is reported.

My new job is working for the PAPI perf counter group, and this fix is needed for Valgrind to do anything useful with various perf counter tools (otherwise VEX reports a CPU different from the actual one running, so the perf_events-related syscalls send invalid counter settings for the wrong CPU type and thus none of the tools work).

Is there any possibility of a patch like this getting in? If so, I can work on cleaning it up so it applies to current SVN.

Vince
From: Vince W. <vi...@cs...> - 2010-10-06 22:22:53
On Fri, 1 Oct 2010, Eric Pouech wrote:

> for the sake of record, some of those fixes are also needed to run
> valgrind on Wine on amd64
> to list a few: move from/to seg, push/pop seg, fxrstor
> those which are not listed here: push eflags, iret

As Julian said, could you file bugs for those? Then I can link them to the "amd64 missing instructions".

> I have dirty work for all of those (wine/amd64 does work on
> valgind/amd64), that also needs some cleanup
> not sure I'll have time to do it before vg3.6

Maybe you can post those even if messy; they would be handy to have even if not completely finished.

Vince
From: Vince W. <vi...@cs...> - 2010-10-06 22:21:17
On Fri, 1 Oct 2010, Julian Seward wrote:

> > I'm hoping the integer fixes might be ready before the 3.6 code freeze.
> > The floating point ones are more of a long-term project.
>
> (ambiguous) You mean, you are hoping to finalise a package of tests +
> implementations for missing integer instructions, for before the freeze?

Yes, though as I dig deeper into some of the missing instructions (especially the ones that are segment-related... what a mess) it looks unlikely I'll be able to get there. I'm working on it, but unfortunately this work is only tangentially related to my actual job.

> Can you file a meta-bug for this, which uses the depends-on field to
> reference the other bugs for the individual instructions, that you listed
> in your message?

Hmmm, I tried that (as you probably saw) with bug https://bugs.kde.org/show_bug.cgi?id=253451, but my bugzilla skills are lacking and I somehow couldn't figure out how to get the Depends-on attribute to let me add dependencies.

Vince
From: <sv...@va...> - 2010-10-06 22:07:19
Author: sewardj
Date: 2010-10-06 23:07:06 +0100 (Wed, 06 Oct 2010)
New Revision: 11402
Log:
amd64-linux: add suitable CFI annotations so that unwinding through
the CALL_FN_*_* macros works more reliably. This is all very fiddly
and is described in a large comment in valgrind.h. Fixes #243270.
(Evgeniy Stepanov, eug...@gm...)
Modified:
trunk/configure.in
trunk/include/valgrind.h
Modified: trunk/configure.in
===================================================================
--- trunk/configure.in 2010-10-06 16:13:17 UTC (rev 11401)
+++ trunk/configure.in 2010-10-06 22:07:06 UTC (rev 11402)
@@ -1888,6 +1888,29 @@
AM_CONDITIONAL([HAVE_BUILTIN_ATOMIC], [test x$ac_have_builtin_atomic = xyes])
+# Check for __builtin_frame_address() support
+AC_MSG_CHECKING([if gcc supports __builtin_frame_address])
+
+AC_TRY_COMPILE(
+[
+], [
+ __builtin_frame_address(0);
+ return 0;
+],
+[
+ac_have_builtin_frame_address=yes
+AC_MSG_RESULT([yes])
+AC_DEFINE([HAVE_BUILTIN_FRAME_ADDRESS], 1,
+ [Define to 1 if your compiler supports __builtin_frame_address.])
+], [
+ac_have_builtin_frame_address=no
+AC_MSG_RESULT([no])
+])
+CFLAGS=$safe_CFLAGS
+
+AM_CONDITIONAL(HAVE_BUILTIN_FRAME_ADDRESS,
+ test x$ac_have_builtin_frame_address = xyes)
+
#----------------------------------------------------------------------------
# Ok. We're done checking.
#----------------------------------------------------------------------------
Modified: trunk/include/valgrind.h
===================================================================
--- trunk/include/valgrind.h 2010-10-06 16:13:17 UTC (rev 11401)
+++ trunk/include/valgrind.h 2010-10-06 22:07:06 UTC (rev 11402)
@@ -1183,6 +1183,63 @@
#define __CALLER_SAVED_REGS /*"rax",*/ "rcx", "rdx", "rsi", \
"rdi", "r8", "r9", "r10", "r11"
+/* This is all pretty complex. It's so as to make stack unwinding
+ work reliably. See bug 243270. The basic problem is the sub and
+ add of 128 of %rsp in all of the following macros. If gcc believes
+ the CFA is in %rsp, then unwinding may fail, because what's at the
+ CFA is not what gcc "expected" when it constructs the CFIs for the
+ places where the macros are instantiated.
+
+ But we can't just add a CFI annotation to increase the CFA offset
+ by 128, to match the sub of 128 from %rsp, because we don't know
+ whether gcc has chosen %rsp as the CFA at that point, or whether it
+ has chosen some other register (eg, %rbp). In the latter case,
+ adding a CFI annotation to change the CFA offset is simply wrong.
+
+ So the solution is to get hold of the CFA using
+ __builtin_frame_address(0), put it in a known register, and add a
+ CFI annotation to say what the register is. We choose %rbp for
+ this (perhaps perversely), because:
+
+ (1) %rbp is already subject to unwinding. If a new register was
+ chosen then the unwinder would have to unwind it in all stack
+ traces, which is expensive, and
+
+ (2) %rbp is already subject to precise exception updates in the
+ JIT. If a new register was chosen, we'd have to have precise
+ exceptions for it too, which reduces performance of the
+ generated code.
+
+ However .. one extra complication. We can't just whack the result
+ of __builtin_frame_address(0) into %rbp and then add %rbp to the
+ list of trashed registers at the end of the inline assembly
+ fragments; gcc won't allow %rbp to appear in that list. Hence
+ instead we need to stash %rbp in %r15 for the duration of the asm,
+ and say that %r15 is trashed instead. gcc seems happy to go with
+ that.
+
+ Oh .. and this all needs to be conditionalised so that it is
+ unchanged from before this commit, when compiled with older gccs
+ that don't support __builtin_frame_address.
+*/
+#if HAVE_BUILTIN_FRAME_ADDRESS
+# define __FRAME_POINTER \
+ ,"r"(__builtin_frame_address(0))
+# define VALGRIND_CFI_PROLOGUE \
+ ".cfi_remember_state\n\t" \
+ "movq %%rbp, %%r15\n\t" \
+ "movq %0, %%rbp\n\t" \
+ ".cfi_def_cfa rbp, 0\n\t"
+# define VALGRIND_CFI_EPILOGUE \
+ "movq %%r15, %%rbp\n\t" \
+ ".cfi_restore_state\n\t"
+#else
+# define __FRAME_POINTER
+# define VALGRIND_CFI_PROLOGUE
+# define VALGRIND_CFI_EPILOGUE
+#endif
+
+
/* These CALL_FN_ macros assume that on amd64-linux, sizeof(unsigned
long) == 8. */
@@ -1214,13 +1271,15 @@
volatile unsigned long _res; \
_argvec[0] = (unsigned long)_orig.nraddr; \
__asm__ volatile( \
+ VALGRIND_CFI_PROLOGUE \
"subq $128,%%rsp\n\t" \
"movq (%%rax), %%rax\n\t" /* target->%rax */ \
VALGRIND_CALL_NOREDIR_RAX \
"addq $128,%%rsp\n\t" \
+ VALGRIND_CFI_EPILOGUE \
: /*out*/ "=a" (_res) \
- : /*in*/ "a" (&_argvec[0]) \
- : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS \
+ : /*in*/ "a" (&_argvec[0]) __FRAME_POINTER \
+ : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS, "r15" \
); \
lval = (__typeof__(lval)) _res; \
} while (0)
@@ -1233,14 +1292,16 @@
_argvec[0] = (unsigned long)_orig.nraddr; \
_argvec[1] = (unsigned long)(arg1); \
__asm__ volatile( \
+ VALGRIND_CFI_PROLOGUE \
"subq $128,%%rsp\n\t" \
"movq 8(%%rax), %%rdi\n\t" \
"movq (%%rax), %%rax\n\t" /* target->%rax */ \
VALGRIND_CALL_NOREDIR_RAX \
"addq $128,%%rsp\n\t" \
+ VALGRIND_CFI_EPILOGUE \
: /*out*/ "=a" (_res) \
- : /*in*/ "a" (&_argvec[0]) \
- : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS \
+ : /*in*/ "a" (&_argvec[0]) __FRAME_POINTER \
+ : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS, "r15" \
); \
lval = (__typeof__(lval)) _res; \
} while (0)
@@ -1254,15 +1315,17 @@
_argvec[1] = (unsigned long)(arg1); \
_argvec[2] = (unsigned long)(arg2); \
__asm__ volatile( \
+ VALGRIND_CFI_PROLOGUE \
"subq $128,%%rsp\n\t" \
"movq 16(%%rax), %%rsi\n\t" \
"movq 8(%%rax), %%rdi\n\t" \
"movq (%%rax), %%rax\n\t" /* target->%rax */ \
VALGRIND_CALL_NOREDIR_RAX \
"addq $128,%%rsp\n\t" \
+ VALGRIND_CFI_EPILOGUE \
: /*out*/ "=a" (_res) \
- : /*in*/ "a" (&_argvec[0]) \
- : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS \
+ : /*in*/ "a" (&_argvec[0]) __FRAME_POINTER \
+ : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS, "r15" \
); \
lval = (__typeof__(lval)) _res; \
} while (0)
@@ -1277,6 +1340,7 @@
_argvec[2] = (unsigned long)(arg2); \
_argvec[3] = (unsigned long)(arg3); \
__asm__ volatile( \
+ VALGRIND_CFI_PROLOGUE \
"subq $128,%%rsp\n\t" \
"movq 24(%%rax), %%rdx\n\t" \
"movq 16(%%rax), %%rsi\n\t" \
@@ -1284,9 +1348,10 @@
"movq (%%rax), %%rax\n\t" /* target->%rax */ \
VALGRIND_CALL_NOREDIR_RAX \
"addq $128,%%rsp\n\t" \
+ VALGRIND_CFI_EPILOGUE \
: /*out*/ "=a" (_res) \
- : /*in*/ "a" (&_argvec[0]) \
- : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS \
+ : /*in*/ "a" (&_argvec[0]) __FRAME_POINTER \
+ : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS, "r15" \
); \
lval = (__typeof__(lval)) _res; \
} while (0)
@@ -1302,6 +1367,7 @@
_argvec[3] = (unsigned long)(arg3); \
_argvec[4] = (unsigned long)(arg4); \
__asm__ volatile( \
+ VALGRIND_CFI_PROLOGUE \
"subq $128,%%rsp\n\t" \
"movq 32(%%rax), %%rcx\n\t" \
"movq 24(%%rax), %%rdx\n\t" \
@@ -1310,9 +1376,10 @@
"movq (%%rax), %%rax\n\t" /* target->%rax */ \
VALGRIND_CALL_NOREDIR_RAX \
"addq $128,%%rsp\n\t" \
+ VALGRIND_CFI_EPILOGUE \
: /*out*/ "=a" (_res) \
- : /*in*/ "a" (&_argvec[0]) \
- : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS \
+ : /*in*/ "a" (&_argvec[0]) __FRAME_POINTER \
+ : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS, "r15" \
); \
lval = (__typeof__(lval)) _res; \
} while (0)
@@ -1329,6 +1396,7 @@
_argvec[4] = (unsigned long)(arg4); \
_argvec[5] = (unsigned long)(arg5); \
__asm__ volatile( \
+ VALGRIND_CFI_PROLOGUE \
"subq $128,%%rsp\n\t" \
"movq 40(%%rax), %%r8\n\t" \
"movq 32(%%rax), %%rcx\n\t" \
@@ -1338,9 +1406,10 @@
"movq (%%rax), %%rax\n\t" /* target->%rax */ \
VALGRIND_CALL_NOREDIR_RAX \
"addq $128,%%rsp\n\t" \
+ VALGRIND_CFI_EPILOGUE \
: /*out*/ "=a" (_res) \
- : /*in*/ "a" (&_argvec[0]) \
- : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS \
+ : /*in*/ "a" (&_argvec[0]) __FRAME_POINTER \
+ : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS, "r15" \
); \
lval = (__typeof__(lval)) _res; \
} while (0)
@@ -1358,6 +1427,7 @@
_argvec[5] = (unsigned long)(arg5); \
_argvec[6] = (unsigned long)(arg6); \
__asm__ volatile( \
+ VALGRIND_CFI_PROLOGUE \
"subq $128,%%rsp\n\t" \
"movq 48(%%rax), %%r9\n\t" \
"movq 40(%%rax), %%r8\n\t" \
@@ -1368,9 +1438,10 @@
"movq (%%rax), %%rax\n\t" /* target->%rax */ \
VALGRIND_CALL_NOREDIR_RAX \
"addq $128,%%rsp\n\t" \
+ VALGRIND_CFI_EPILOGUE \
: /*out*/ "=a" (_res) \
- : /*in*/ "a" (&_argvec[0]) \
- : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS \
+ : /*in*/ "a" (&_argvec[0]) __FRAME_POINTER \
+ : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS, "r15" \
); \
lval = (__typeof__(lval)) _res; \
} while (0)
@@ -1390,6 +1461,7 @@
_argvec[6] = (unsigned long)(arg6); \
_argvec[7] = (unsigned long)(arg7); \
__asm__ volatile( \
+ VALGRIND_CFI_PROLOGUE \
"subq $128,%%rsp\n\t" \
"pushq 56(%%rax)\n\t" \
"movq 48(%%rax), %%r9\n\t" \
@@ -1402,9 +1474,10 @@
VALGRIND_CALL_NOREDIR_RAX \
"addq $8, %%rsp\n" \
"addq $128,%%rsp\n\t" \
+ VALGRIND_CFI_EPILOGUE \
: /*out*/ "=a" (_res) \
- : /*in*/ "a" (&_argvec[0]) \
- : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS \
+ : /*in*/ "a" (&_argvec[0]) __FRAME_POINTER \
+ : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS, "r15" \
); \
lval = (__typeof__(lval)) _res; \
} while (0)
@@ -1425,6 +1498,7 @@
_argvec[7] = (unsigned long)(arg7); \
_argvec[8] = (unsigned long)(arg8); \
__asm__ volatile( \
+ VALGRIND_CFI_PROLOGUE \
"subq $128,%%rsp\n\t" \
"pushq 64(%%rax)\n\t" \
"pushq 56(%%rax)\n\t" \
@@ -1438,9 +1512,10 @@
VALGRIND_CALL_NOREDIR_RAX \
"addq $16, %%rsp\n" \
"addq $128,%%rsp\n\t" \
+ VALGRIND_CFI_EPILOGUE \
: /*out*/ "=a" (_res) \
- : /*in*/ "a" (&_argvec[0]) \
- : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS \
+ : /*in*/ "a" (&_argvec[0]) __FRAME_POINTER \
+ : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS, "r15" \
); \
lval = (__typeof__(lval)) _res; \
} while (0)
@@ -1462,6 +1537,7 @@
_argvec[8] = (unsigned long)(arg8); \
_argvec[9] = (unsigned long)(arg9); \
__asm__ volatile( \
+ VALGRIND_CFI_PROLOGUE \
"subq $128,%%rsp\n\t" \
"pushq 72(%%rax)\n\t" \
"pushq 64(%%rax)\n\t" \
@@ -1476,9 +1552,10 @@
VALGRIND_CALL_NOREDIR_RAX \
"addq $24, %%rsp\n" \
"addq $128,%%rsp\n\t" \
+ VALGRIND_CFI_EPILOGUE \
: /*out*/ "=a" (_res) \
- : /*in*/ "a" (&_argvec[0]) \
- : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS \
+ : /*in*/ "a" (&_argvec[0]) __FRAME_POINTER \
+ : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS, "r15" \
); \
lval = (__typeof__(lval)) _res; \
} while (0)
@@ -1501,6 +1578,7 @@
_argvec[9] = (unsigned long)(arg9); \
_argvec[10] = (unsigned long)(arg10); \
__asm__ volatile( \
+ VALGRIND_CFI_PROLOGUE \
"subq $128,%%rsp\n\t" \
"pushq 80(%%rax)\n\t" \
"pushq 72(%%rax)\n\t" \
@@ -1516,9 +1594,10 @@
VALGRIND_CALL_NOREDIR_RAX \
"addq $32, %%rsp\n" \
"addq $128,%%rsp\n\t" \
+ VALGRIND_CFI_EPILOGUE \
: /*out*/ "=a" (_res) \
- : /*in*/ "a" (&_argvec[0]) \
- : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS \
+ : /*in*/ "a" (&_argvec[0]) __FRAME_POINTER \
+ : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS, "r15" \
); \
lval = (__typeof__(lval)) _res; \
} while (0)
@@ -1542,6 +1621,7 @@
_argvec[10] = (unsigned long)(arg10); \
_argvec[11] = (unsigned long)(arg11); \
__asm__ volatile( \
+ VALGRIND_CFI_PROLOGUE \
"subq $128,%%rsp\n\t" \
"pushq 88(%%rax)\n\t" \
"pushq 80(%%rax)\n\t" \
@@ -1558,9 +1638,10 @@
VALGRIND_CALL_NOREDIR_RAX \
"addq $40, %%rsp\n" \
"addq $128,%%rsp\n\t" \
+ VALGRIND_CFI_EPILOGUE \
: /*out*/ "=a" (_res) \
- : /*in*/ "a" (&_argvec[0]) \
- : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS \
+ : /*in*/ "a" (&_argvec[0]) __FRAME_POINTER \
+ : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS, "r15" \
); \
lval = (__typeof__(lval)) _res; \
} while (0)
@@ -1585,6 +1666,7 @@
_argvec[11] = (unsigned long)(arg11); \
_argvec[12] = (unsigned long)(arg12); \
__asm__ volatile( \
+ VALGRIND_CFI_PROLOGUE \
"subq $128,%%rsp\n\t" \
"pushq 96(%%rax)\n\t" \
"pushq 88(%%rax)\n\t" \
@@ -1602,9 +1684,10 @@
VALGRIND_CALL_NOREDIR_RAX \
"addq $48, %%rsp\n" \
"addq $128,%%rsp\n\t" \
+ VALGRIND_CFI_EPILOGUE \
: /*out*/ "=a" (_res) \
- : /*in*/ "a" (&_argvec[0]) \
- : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS \
+ : /*in*/ "a" (&_argvec[0]) __FRAME_POINTER \
+ : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS, "r15" \
); \
lval = (__typeof__(lval)) _res; \
} while (0)
|
|
From: <sv...@va...> - 2010-10-06 20:47:36
|
Author: sewardj
Date: 2010-10-06 21:47:22 +0100 (Wed, 06 Oct 2010)
New Revision: 2062
Log:
NEON front end: fix bugs in VMIN, VZIP, VRSHL.
(Dmitry Zhurikhin, zh...@is...), no bug number.
Modified:
trunk/priv/guest_arm_toIR.c
Modified: trunk/priv/guest_arm_toIR.c
===================================================================
--- trunk/priv/guest_arm_toIR.c 2010-10-06 20:34:53 UTC (rev 2061)
+++ trunk/priv/guest_arm_toIR.c 2010-10-06 20:47:22 UTC (rev 2062)
@@ -3883,7 +3883,7 @@
binop(op,
mkexpr(arg_m),
unop(Iop_64to8,
- binop(op_sub,
+ binop(op_add,
mkexpr(arg_n),
mkexpr(imm_val)))),
binop(Q ? Iop_AndV128 : Iop_And64,
@@ -4125,9 +4125,9 @@
}
} else {
switch (size) {
- case 0: op = Q ? Iop_Min8Sx16 : Iop_Min8Sx8; break;
- case 1: op = Q ? Iop_Min16Sx8 : Iop_Min16Sx4; break;
- case 2: op = Q ? Iop_Min32Sx4 : Iop_Min32Sx2; break;
+ case 0: op = Q ? Iop_Min8Ux16 : Iop_Min8Ux8; break;
+ case 1: op = Q ? Iop_Min16Ux8 : Iop_Min16Ux4; break;
+ case 2: op = Q ? Iop_Min32Ux4 : Iop_Min32Ux2; break;
case 3: return False;
default: vassert(0);
}
@@ -7286,7 +7286,7 @@
}
switch (size) {
case 0:
- op_lo = Q ? Iop_InterleaveLO8x16 : Iop_InterleaveHI8x8;
+ op_lo = Q ? Iop_InterleaveHI8x16 : Iop_InterleaveHI8x8;
op_hi = Q ? Iop_InterleaveLO8x16 : Iop_InterleaveLO8x8;
break;
case 1:
|
|
From: <sv...@va...> - 2010-10-06 20:35:04
|
Author: sewardj
Date: 2010-10-06 21:34:53 +0100 (Wed, 06 Oct 2010)
New Revision: 2061
Log:
Fix some enum type confusion in host_arm_defs.[ch].
(Dmitry Zhurikhin, zh...@is...), no bug number.
Modified:
trunk/priv/host_arm_defs.c
trunk/priv/host_arm_defs.h
Modified: trunk/priv/host_arm_defs.c
===================================================================
--- trunk/priv/host_arm_defs.c 2010-10-05 22:29:49 UTC (rev 2060)
+++ trunk/priv/host_arm_defs.c 2010-10-06 20:34:53 UTC (rev 2061)
@@ -783,10 +783,8 @@
case ARMneon_VQRDMULH: return "vqrdmulh";
case ARMneon_VQDMULL: return "vqdmull";
case ARMneon_VTBL: return "vtbl";
- case ARMneon_SETELEM: return "vmov";
- case ARMneon_VABSFP: return "vabsfp";
- case ARMneon_VRSQRTEFP: return "vrsqrtefp";
- case ARMneon_VRSQRTE: return "vrsqrte";
+ case ARMneon_VRECPS: return "vrecps";
+ case ARMneon_VRSQRTS: return "vrecps";
/* ... */
default: vpanic("showARMNeonBinOp");
}
@@ -802,7 +800,6 @@
case ARMneon_VSUB:
case ARMneon_VEXT:
case ARMneon_VMUL:
- case ARMneon_SETELEM:
case ARMneon_VPADD:
case ARMneon_VTBL:
case ARMneon_VCEQ:
@@ -817,7 +814,6 @@
case ARMneon_VMULLU:
case ARMneon_VPMINU:
case ARMneon_VPMAXU:
- case ARMneon_VRSQRTE:
return ".u";
case ARMneon_VRHADDS:
case ARMneon_VMINS:
@@ -843,13 +839,13 @@
case ARMneon_VMULFP:
case ARMneon_VMINF:
case ARMneon_VMAXF:
- case ARMneon_VABSFP:
- case ARMneon_VRSQRTEFP:
case ARMneon_VPMINF:
case ARMneon_VPMAXF:
case ARMneon_VCGTF:
case ARMneon_VCGEF:
case ARMneon_VCEQF:
+ case ARMneon_VRECPS:
+ case ARMneon_VRSQRTS:
return ".f";
/* ... */
default: vpanic("showARMNeonBinOpDataType");
@@ -891,10 +887,11 @@
case ARMneon_VCVTF16toF32: return "vcvt";
case ARMneon_VRECIP: return "vrecip";
case ARMneon_VRECIPF: return "vrecipf";
- case ARMneon_VRECPS: return "vrecps";
case ARMneon_VNEGF: return "vneg";
- case ARMneon_VRSQRTS: return "vrecps";
case ARMneon_ABS: return "vabs";
+ case ARMneon_VABSFP: return "vabsfp";
+ case ARMneon_VRSQRTEFP: return "vrsqrtefp";
+ case ARMneon_VRSQRTE: return "vrsqrte";
/* ... */
default: vpanic("showARMNeonUnOp");
}
@@ -918,6 +915,7 @@
case ARMneon_COPYQNUU:
case ARMneon_VQSHLNUU:
case ARMneon_VRECIP:
+ case ARMneon_VRSQRTE:
return ".u";
case ARMneon_CLS:
case ARMneon_CLZ:
@@ -930,9 +928,9 @@
case ARMneon_ABS:
return ".s";
case ARMneon_VRECIPF:
- case ARMneon_VRECPS:
case ARMneon_VNEGF:
- case ARMneon_VRSQRTS:
+ case ARMneon_VABSFP:
+ case ARMneon_VRSQRTEFP:
return ".f";
case ARMneon_VCVTFtoU: return ".u32.f32";
case ARMneon_VCVTFtoS: return ".s32.f32";
@@ -3305,7 +3303,7 @@
UInt insn;
UInt opc, opc1, opc2;
switch (i->ARMin.NUnaryS.op) {
- case ARMneon_VDUP:
+ case ARMneon_VDUP:
if (i->ARMin.NUnaryS.size >= 16)
goto bad;
if (i->ARMin.NUnaryS.dst->tag != ARMNRS_Reg)
@@ -3326,7 +3324,7 @@
(i->ARMin.NUnaryS.size & 0xf), regD,
X1100, BITS4(0,Q,M,0), regM);
*p++ = insn;
- goto done;
+ goto done;
case ARMneon_SETELEM:
regD = Q ? (qregNo(i->ARMin.NUnaryS.dst->reg) << 1) :
dregNo(i->ARMin.NUnaryS.dst->reg);
@@ -3665,6 +3663,7 @@
insn = XXXXXXXX(0xF, X0011, BITS4(1,D,1,1), X1001, regD, X0111,
BITS4(1,Q,M,0), regM);
break;
+
default:
goto bad;
}
Modified: trunk/priv/host_arm_defs.h
===================================================================
--- trunk/priv/host_arm_defs.h 2010-10-05 22:29:49 UTC (rev 2060)
+++ trunk/priv/host_arm_defs.h 2010-10-06 20:34:53 UTC (rev 2061)
@@ -463,14 +463,8 @@
ARMneon_VPMAXF,
ARMneon_VTBL,
ARMneon_VQDMULL,
- ARMneon_VDUP,
- ARMneon_VRECIP,
ARMneon_VRECPS,
- ARMneon_VRECIPF,
ARMneon_VRSQRTS,
- ARMneon_VABSFP,
- ARMneon_VRSQRTEFP,
- ARMneon_VRSQRTE
/* ... */
}
ARMNeonBinOp;
@@ -520,6 +514,11 @@
ARMneon_REV64,
ARMneon_ABS,
ARMneon_VNEGF,
+ ARMneon_VRECIP,
+ ARMneon_VRECIPF,
+ ARMneon_VABSFP,
+ ARMneon_VRSQRTEFP,
+ ARMneon_VRSQRTE
/* ... */
}
ARMNeonUnOp;
@@ -528,7 +527,8 @@
enum {
ARMneon_SETELEM=200,
ARMneon_GETELEMU,
- ARMneon_GETELEMS
+ ARMneon_GETELEMS,
+ ARMneon_VDUP,
}
ARMNeonUnOpS;
@@ -861,7 +861,7 @@
ARMAModeN *amode;
} NLdStD;
struct {
- ARMNeonUnOp op;
+ ARMNeonUnOpS op;
ARMNRS* dst;
ARMNRS* src;
UInt size;
|
|
From: <sv...@va...> - 2010-10-06 16:13:26
|
Author: bart
Date: 2010-10-06 17:13:17 +0100 (Wed, 06 Oct 2010)
New Revision: 11401
Log:
ppc/jm-insns.c: Use proper integer types.
Modified:
trunk/none/tests/ppc32/jm-insns.c
Modified: trunk/none/tests/ppc32/jm-insns.c
===================================================================
--- trunk/none/tests/ppc32/jm-insns.c 2010-10-06 15:55:59 UTC (rev 11400)
+++ trunk/none/tests/ppc32/jm-insns.c 2010-10-06 16:13:17 UTC (rev 11401)
@@ -169,6 +169,8 @@
#include "tests/sys_mman.h"
#include "tests/malloc.h" // memalign16
+#define STATIC_ASSERT(e) sizeof(struct { int:-!(e); })
+
/* Something of the same size as void*, so can be safely be coerced
* to/from a pointer type. Also same size as the host's gp registers.
* According to the AltiVec section of the GCC manual, the syntax does
@@ -176,16 +178,25 @@
* with the vector keyword, so typedefs uint[32|64]_t are #undef'ed here
* and redefined using #define.
*/
-#ifndef __powerpc64__
#undef uint32_t
+#undef uint64_t
#define uint32_t unsigned int
+#ifndef __powerpc64__
+#define uint64_t unsigned long long
+#else
+#define uint64_t unsigned long
+#endif /* __powerpc64__ */
+
+#ifndef __powerpc64__
typedef uint32_t HWord_t;
#else
-#undef uint64_t
-#define uint64_t unsigned long
typedef uint64_t HWord_t;
-#endif // #ifndef __powerpc64__
+#endif /* __powerpc64__ */
+enum {
+ compile_time_test1 = STATIC_ASSERT(sizeof(uint32_t) == 4),
+ compile_time_test2 = STATIC_ASSERT(sizeof(uint64_t) == 8),
+};
#define ALLCR "cr0","cr1","cr2","cr3","cr4","cr5","cr6","cr7"
|
|
From: <sv...@va...> - 2010-10-06 15:56:09
|
Author: sewardj
Date: 2010-10-06 16:55:59 +0100 (Wed, 06 Oct 2010)
New Revision: 11400
Log:
Define VKI_SHMLBA for all supported Linux targets, thereby unbreaking
the breakage created by r11399. Part of #222545.
Modified:
trunk/include/vki/vki-amd64-linux.h
trunk/include/vki/vki-arm-linux.h
trunk/include/vki/vki-ppc32-linux.h
trunk/include/vki/vki-ppc64-linux.h
trunk/include/vki/vki-x86-linux.h
Modified: trunk/include/vki/vki-amd64-linux.h
===================================================================
--- trunk/include/vki/vki-amd64-linux.h 2010-10-06 15:24:39 UTC (rev 11399)
+++ trunk/include/vki/vki-amd64-linux.h 2010-10-06 15:55:59 UTC (rev 11400)
@@ -63,6 +63,12 @@
#define VKI_MAX_PAGE_SIZE VKI_PAGE_SIZE
//----------------------------------------------------------------------
+// From linux-2.6.35.4/arch/x86/include/asm/shmparam.h
+//----------------------------------------------------------------------
+
+#define VKI_SHMLBA VKI_PAGE_SIZE
+
+//----------------------------------------------------------------------
// From linux-2.6.9/include/asm-x86_64/signal.h
//----------------------------------------------------------------------
Modified: trunk/include/vki/vki-arm-linux.h
===================================================================
--- trunk/include/vki/vki-arm-linux.h 2010-10-06 15:24:39 UTC (rev 11399)
+++ trunk/include/vki/vki-arm-linux.h 2010-10-06 15:55:59 UTC (rev 11400)
@@ -64,6 +64,12 @@
#define VKI_MAX_PAGE_SIZE VKI_PAGE_SIZE
//----------------------------------------------------------------------
+// From linux-2.6.35.4/arch/arm/include/asm/shmparam.h
+//----------------------------------------------------------------------
+
+#define VKI_SHMLBA (4 * VKI_PAGE_SIZE)
+
+//----------------------------------------------------------------------
// From linux-2.6.8.1/include/asm-i386/signal.h
//----------------------------------------------------------------------
@@ -731,9 +737,6 @@
#define VKI_SHMGET 23
#define VKI_SHMCTL 24
-#define VKI_SHMLBA (4 * VKI_PAGE_SIZE)
-
-
//----------------------------------------------------------------------
// From linux-2.6.8.1/include/asm-i386/shmbuf.h
//----------------------------------------------------------------------
Modified: trunk/include/vki/vki-ppc32-linux.h
===================================================================
--- trunk/include/vki/vki-ppc32-linux.h 2010-10-06 15:24:39 UTC (rev 11399)
+++ trunk/include/vki/vki-ppc32-linux.h 2010-10-06 15:55:59 UTC (rev 11400)
@@ -69,6 +69,12 @@
#define VKI_MAX_PAGE_SIZE (1UL << VKI_MAX_PAGE_SHIFT)
//----------------------------------------------------------------------
+// From linux-2.6.35.4/arch/powerpc/include/asm/shmparam.h
+//----------------------------------------------------------------------
+
+#define VKI_SHMLBA VKI_PAGE_SIZE
+
+//----------------------------------------------------------------------
// From linux-2.6.9/include/asm-ppc/signal.h
//----------------------------------------------------------------------
Modified: trunk/include/vki/vki-ppc64-linux.h
===================================================================
--- trunk/include/vki/vki-ppc64-linux.h 2010-10-06 15:24:39 UTC (rev 11399)
+++ trunk/include/vki/vki-ppc64-linux.h 2010-10-06 15:55:59 UTC (rev 11400)
@@ -70,6 +70,12 @@
#define VKI_MAX_PAGE_SIZE (1UL << VKI_MAX_PAGE_SHIFT)
//----------------------------------------------------------------------
+// From linux-2.6.35.4/arch/powerpc/include/asm/shmparam.h
+//----------------------------------------------------------------------
+
+#define VKI_SHMLBA VKI_PAGE_SIZE
+
+//----------------------------------------------------------------------
// From linux-2.6.13/include/asm-ppc64/signal.h
//----------------------------------------------------------------------
Modified: trunk/include/vki/vki-x86-linux.h
===================================================================
--- trunk/include/vki/vki-x86-linux.h 2010-10-06 15:24:39 UTC (rev 11399)
+++ trunk/include/vki/vki-x86-linux.h 2010-10-06 15:55:59 UTC (rev 11400)
@@ -64,6 +64,12 @@
#define VKI_MAX_PAGE_SIZE VKI_PAGE_SIZE
//----------------------------------------------------------------------
+// From linux-2.6.35.4/arch/x86/include/asm/shmparam.h
+//----------------------------------------------------------------------
+
+#define VKI_SHMLBA VKI_PAGE_SIZE
+
+//----------------------------------------------------------------------
// From linux-2.6.8.1/include/asm-i386/signal.h
//----------------------------------------------------------------------
|
|
From: <sv...@va...> - 2010-10-06 15:24:47
|
Author: sewardj
Date: 2010-10-06 16:24:39 +0100 (Wed, 06 Oct 2010)
New Revision: 11399
Log:
Make client sys_shmat work properly on arm-linux by taking into
account rounding requirements to SHMLBA. Modified version of a patch
by Kirill Batuzov, bat...@is.... This fixes the main bug in
#222545. Temporarily breaks the build on all other platforms though.
Modified:
trunk/coregrind/m_syswrap/syswrap-arm-linux.c
trunk/coregrind/m_syswrap/syswrap-generic.c
trunk/include/vki/vki-arm-linux.h
trunk/include/vki/vki-linux.h
Modified: trunk/coregrind/m_syswrap/syswrap-arm-linux.c
===================================================================
--- trunk/coregrind/m_syswrap/syswrap-arm-linux.c 2010-10-06 12:59:44 UTC (rev 11398)
+++ trunk/coregrind/m_syswrap/syswrap-arm-linux.c 2010-10-06 15:24:39 UTC (rev 11399)
@@ -831,6 +831,12 @@
PRINT("wrap_sys_shmat ( %ld, %#lx, %ld )",ARG1,ARG2,ARG3);
PRE_REG_READ3(long, "shmat",
int, shmid, const void *, shmaddr, int, shmflg);
+ /* Round the attach address down to an VKI_SHMLBA boundary if the
+ client requested rounding. See #222545. This is necessary only
+ on arm-linux because VKI_SHMLBA is 4 * VKI_PAGE size; on all
+ other linux targets it is the same as the page size. */
+ if (ARG3 & VKI_SHM_RND)
+ ARG2 = VG_ROUNDDN(ARG2, VKI_SHMLBA);
arg2tmp = ML_(generic_PRE_sys_shmat)(tid, ARG1,ARG2,ARG3);
if (arg2tmp == 0)
SET_STATUS_Failure( VKI_EINVAL );
Modified: trunk/coregrind/m_syswrap/syswrap-generic.c
===================================================================
--- trunk/coregrind/m_syswrap/syswrap-generic.c 2010-10-06 12:59:44 UTC (rev 11398)
+++ trunk/coregrind/m_syswrap/syswrap-generic.c 2010-10-06 15:24:39 UTC (rev 11399)
@@ -1741,9 +1741,26 @@
UWord tmp;
Bool ok;
if (arg1 == 0) {
+ /* arm-linux only: work around the fact that
+ VG_(am_get_advisory_client_simple) produces something that is
+ VKI_PAGE_SIZE aligned, whereas what we want is something
+ VKI_SHMLBA aligned, and VKI_SHMLBA >= VKI_PAGE_SIZE. Hence
+ increase the request size by VKI_SHMLBA - VKI_PAGE_SIZE and
+ then round the result up to the next VKI_SHMLBA boundary.
+ See bug 222545 comment 15. So far, arm-linux is the only
+ platform where this is known to be necessary. */
+ vg_assert(VKI_SHMLBA >= VKI_PAGE_SIZE);
+ if (VKI_SHMLBA > VKI_PAGE_SIZE) {
+ segmentSize += VKI_SHMLBA - VKI_PAGE_SIZE;
+ }
tmp = VG_(am_get_advisory_client_simple)(0, segmentSize, &ok);
- if (ok)
- arg1 = tmp;
+ if (ok) {
+ if (VKI_SHMLBA > VKI_PAGE_SIZE) {
+ arg1 = VG_ROUNDUP(tmp, VKI_SHMLBA);
+ } else {
+ arg1 = tmp;
+ }
+ }
}
else if (!ML_(valid_client_addr)(arg1, segmentSize, tid, "shmat"))
arg1 = 0;
Modified: trunk/include/vki/vki-arm-linux.h
===================================================================
--- trunk/include/vki/vki-arm-linux.h 2010-10-06 12:59:44 UTC (rev 11398)
+++ trunk/include/vki/vki-arm-linux.h 2010-10-06 15:24:39 UTC (rev 11399)
@@ -731,7 +731,9 @@
#define VKI_SHMGET 23
#define VKI_SHMCTL 24
+#define VKI_SHMLBA (4 * VKI_PAGE_SIZE)
+
//----------------------------------------------------------------------
// From linux-2.6.8.1/include/asm-i386/shmbuf.h
//----------------------------------------------------------------------
Modified: trunk/include/vki/vki-linux.h
===================================================================
--- trunk/include/vki/vki-linux.h 2010-10-06 12:59:44 UTC (rev 11398)
+++ trunk/include/vki/vki-linux.h 2010-10-06 15:24:39 UTC (rev 11399)
@@ -1451,6 +1451,7 @@
};
#define VKI_SHM_RDONLY 010000 /* read-only access */
+#define VKI_SHM_RND 020000 /* round attach address to SHMLBA boundary */
#define VKI_SHM_STAT 13
#define VKI_SHM_INFO 14
|
|
From: <sv...@va...> - 2010-10-06 12:59:52
|
Author: sewardj
Date: 2010-10-06 13:59:44 +0100 (Wed, 06 Oct 2010)
New Revision: 11398
Log:
get_shm_size(): pass VKI_IPC_64 to our shmctl call if it is available,
except on amd64-linux. This fixes a secondary problem discussed
in bug 222545. (Kirill Batuzov, bat...@is...)
Modified:
trunk/coregrind/m_syswrap/syswrap-generic.c
Modified: trunk/coregrind/m_syswrap/syswrap-generic.c
===================================================================
--- trunk/coregrind/m_syswrap/syswrap-generic.c 2010-10-06 11:38:01 UTC (rev 11397)
+++ trunk/coregrind/m_syswrap/syswrap-generic.c 2010-10-06 12:59:44 UTC (rev 11398)
@@ -1709,11 +1709,18 @@
#ifdef __NR_shmctl
# ifdef VKI_IPC_64
struct vki_shmid64_ds buf;
- SysRes __res = VG_(do_syscall3)(__NR_shmctl, shmid, VKI_IPC_STAT, (UWord)&buf);
-# else
+# ifdef VGP_amd64_linux
+ /* See bug 222545 comment 7 */
+ SysRes __res = VG_(do_syscall3)(__NR_shmctl, shmid,
+ VKI_IPC_STAT, (UWord)&buf);
+# else
+ SysRes __res = VG_(do_syscall3)(__NR_shmctl, shmid,
+ VKI_IPC_STAT|VKI_IPC_64, (UWord)&buf);
+# endif
+# else /* !def VKI_IPC_64 */
struct vki_shmid_ds buf;
SysRes __res = VG_(do_syscall3)(__NR_shmctl, shmid, VKI_IPC_STAT, (UWord)&buf);
-# endif
+# endif /* def VKI_IPC_64 */
#else
struct vki_shmid_ds buf;
SysRes __res = VG_(do_syscall5)(__NR_ipc, 24 /* IPCOP_shmctl */, shmid,
|
|
From: <sv...@va...> - 2010-10-06 11:38:11
|
Author: sewardj
Date: 2010-10-06 12:38:01 +0100 (Wed, 06 Oct 2010)
New Revision: 11397
Log:
When opening an mmaped file to see if it's an ELF file that we should
read debuginfo from, use VKI_O_LARGEFILE, so as to ensure the open
succeeds for large files on 32-bit systems. Fixes #234064.
Modified:
trunk/coregrind/m_debuginfo/debuginfo.c
Modified: trunk/coregrind/m_debuginfo/debuginfo.c
===================================================================
--- trunk/coregrind/m_debuginfo/debuginfo.c 2010-10-06 11:25:29 UTC (rev 11396)
+++ trunk/coregrind/m_debuginfo/debuginfo.c 2010-10-06 11:38:01 UTC (rev 11397)
@@ -727,7 +727,7 @@
/* Peer at the first few bytes of the file, to see if it is an ELF */
/* object file. Ignore the file if we do not have read permission. */
VG_(memset)(buf1k, 0, sizeof(buf1k));
- fd = VG_(open)( filename, VKI_O_RDONLY, 0 );
+ fd = VG_(open)( filename, VKI_O_RDONLY|VKI_O_LARGEFILE, 0 );
if (sr_isError(fd)) {
if (sr_Err(fd) != VKI_EACCES) {
DebugInfo fake_di;
|
|
From: <sv...@va...> - 2010-10-06 11:25:42
|
Author: sewardj
Date: 2010-10-06 12:25:29 +0100 (Wed, 06 Oct 2010)
New Revision: 11396
Log:
Handle mq_* syscalls. Fixes #243884. (David Fenger, dkf...@gm...)
Modified:
trunk/exp-ptrcheck/h_main.c
Modified: trunk/exp-ptrcheck/h_main.c
===================================================================
--- trunk/exp-ptrcheck/h_main.c 2010-10-04 20:55:21 UTC (rev 11395)
+++ trunk/exp-ptrcheck/h_main.c 2010-10-06 11:25:29 UTC (rev 11396)
@@ -2483,6 +2483,14 @@
ADD(0, __NR_wait4);
ADD(0, __NR_write);
ADD(0, __NR_writev);
+# if defined(__NR_mq_open)
+ ADD(0, __NR_mq_open);
+ ADD(0, __NR_mq_unlink);
+ ADD(0, __NR_mq_timedsend);
+ ADD(0, __NR_mq_timedreceive);
+ ADD(0, __NR_mq_notify);
+ ADD(0, __NR_mq_getsetattr);
+# endif
/* Whereas the following need special treatment */
# if defined(__NR_arch_prctl)
|
|
From: Julian S. <js...@ac...> - 2010-10-06 11:06:10
|
On Wednesday, October 06, 2010, Christian Borntraeger wrote:
> The following ltp testcases trigger a VALGRIND INTERNAL ERROR by having
> wrong system call input
[...]
> Do you need a bugzilla? Is one bugzilla for all system calls ok?
Excellent work, but please .. put it in bugzilla, so it can be tracked
properly. Patches that go only to the mailing list tend to get forgotten
about.
J
|
|
From: Christian B. <bor...@de...> - 2010-10-06 10:26:03
|
On 06.10.2010 12:14, Christian Borntraeger wrote:
> The following ltp testcases trigger a VALGRIND INTERNAL ERROR by having
> wrong system call input:
>
> bind01
> connect01
> io_submit01
> recvmsg01
> rt_sigaction02
> rt_sigprocmask02
> sendto01
> setrlimit02
>
> Here is a patch that uses ML_(valid_client_addr) to check memory that is
> inspected by our syscall wrappers. Please review and apply if appropriate.
>
> Do you need a bugzilla? Is one bugzilla for all system calls ok?
>
> Credits for finding the bug go to Stefan Wild from our test department.
>
> Christian
Sorry, I attached an unrefreshed patch, here is a new one.
|
|
From: Christian B. <bor...@de...> - 2010-10-06 10:15:01
|
The following ltp testcases trigger a VALGRIND INTERNAL ERROR by having
wrong system call input:

bind01
connect01
io_submit01
recvmsg01
rt_sigaction02
rt_sigprocmask02
sendto01
setrlimit02

Here is a patch that uses ML_(valid_client_addr) to check memory that is
inspected by our syscall wrappers. Please review and apply if appropriate.

Do you need a bugzilla? Is one bugzilla for all system calls ok?

Credits for finding the bug go to Stefan Wild from our test department.

Christian
|