|
From: Christian B. <bor...@de...> - 2015-04-21 02:20:54
Attachments:
diffs.txt
|
valgrind revision: 15120
VEX revision: 3138
C compiler: gcc (SUSE Linux) 4.3.4 [gcc-4_3-branch revision 152973]
GDB: GNU gdb (GDB) SUSE (7.5.1-0.7.29)
Assembler: GNU assembler (GNU Binutils; SUSE Linux Enterprise 11) 2.23.1
C library: GNU C Library stable release version 2.11.3 (20110527)
uname -mrs: Linux 3.0.101-0.47.52-default s390x
Vendor version: Welcome to SUSE Linux Enterprise Server 11 SP3 (s390x) - Kernel %r (%t).
Nightly build on sless390 ( SUSE Linux Enterprise Server 11 SP3 gcc 4.3.4 on z196 (s390x) )
Started at 2015-04-21 03:45:01 CEST
Ended at 2015-04-21 04:20:38 CEST
Results unchanged from 24 hours ago
Checking out valgrind source tree ... done
Configuring valgrind ... done
Building valgrind ... done
Running regression tests ... failed
Regression test results follow
== 673 tests, 5 stderr failures, 1 stdout failure, 0 stderrB failures, 0 stdoutB failures, 0 post failures ==
memcheck/tests/memcmptest (stderr)
memcheck/tests/origin5-bz2 (stderr)
none/tests/bigcode (stdout)
none/tests/bigcode (stderr)
helgrind/tests/pth_cond_destroy_busy (stderr)
helgrind/tests/tc20_verifywrap (stderr)
--tools=none,memcheck,callgrind,helgrind,cachegrind,drd,massif --reps=3 --vg=../valgrind-new --vg=../valgrind-old
-- Running tests in perf ----------------------------------------------
-- bigcode1 --
bigcode1 valgrind-new:0.22s no:
*** Command returned non-zero (2816)
*** See perf.{cmd,stdout,stderr} to determine what went wrong.
real 0m2.907s
user 0m2.848s
sys 0m0.033s
|
|
From: Christian B. <bor...@de...> - 2015-04-21 09:09:38
|
Am 21.04.2015 um 04:20 schrieb Christian Borntraeger:
> none/tests/bigcode (stdout)
> none/tests/bigcode (stderr)
[...]
> --tools=none,memcheck,callgrind,helgrind,cachegrind,drd,massif --reps=3 --vg=../valgrind-new --vg=../valgrind-old
> -- Running tests in perf ----------------------------------------------
> -- bigcode1 --
> bigcode1 valgrind-new:0.22s no:
> *** Command returned non-zero (2816)
> *** See perf.{cmd,stdout,stderr} to determine what went wrong.
I looked into these failures. The problem is that the gcc on SLES11 compiles for z900, which
only has instructions with small immediates. So gcc decides that f should load some values
from the literal pool, which it addresses with a PC-relative instruction:
00000000800007b4 <f>:
800007b4: e3 d0 f0 68 00 24 stg %r13,104(%r15)
800007ba: c0 d0 00 00 02 3f larl %r13,80000c38 <------ relative load (2*0x23f bytes)
800007c0: b9 04 00 02 lgr %r0,%r2
800007c4: b9 04 00 23 lgr %r2,%r3
800007c8: 18 10 lr %r1,%r0
800007ca: 54 10 d0 04 n %r1,4(%r13)
800007ce: 12 11 ltr %r1,%r1
800007d0: a7 a4 00 08 jhe 800007e0 <f+0x2c>
800007d4: a7 1a ff ff ahi %r1,-1
[...]
This of course fails miserably as soon as the function is copied and the literal
pool is not (wrong values even without valgrind, crashes with valgrind).
There are several ways of fixing this:
(a) Using -march=z9-109 or later avoids the literal pool (this system introduced extended
immediate values). This will happen anyway on recent distros (z9 was introduced 10 years ago,
so most distros have a default compiler option for z9 or later).
(b) Further increase the func size.
I guess the recent tilegx fix r15095 addresses the same issue; the code itself cannot be that big.
ZhiGang, can you double-check whether your fix was actually needed for a literal pool value and not for code?
In our case the literal pool lies beyond 1024 bytes, see above: 7ba ---> c38 is 1150 bytes.
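To make the arithmetic explicit, here is a minimal sketch that recomputes the larl target from the listing above and checks whether it falls inside the copied region. The helper names larl_target and inside_copy are made up for illustration and are not part of bigcode.c; the only facts used are the addresses in the disassembly and that the larl immediate counts halfwords.

```c
#include <assert.h>
#include <stdint.h>

/* Target of an s390x LARL: the signed 32-bit immediate counts halfwords
   (2-byte units) relative to the address of the LARL instruction itself.
   Hypothetical helper, for illustration only. */
static uint64_t larl_target(uint64_t insn_addr, int32_t imm_halfwords)
{
    assert((insn_addr & 1) == 0);   /* s390x instructions are 2-byte aligned */
    return insn_addr + 2 * (int64_t)imm_halfwords;
}

/* Nonzero if addr lies within the first fn_size bytes starting at fn_start,
   i.e. within the part of f() that gets copied. Hypothetical helper. */
static int inside_copy(uint64_t fn_start, uint64_t fn_size, uint64_t addr)
{
    return addr >= fn_start && addr < fn_start + fn_size;
}
```

With the values from the listing, larl_target(0x800007ba, 0x23f) gives 0x80000c38, which is 1156 bytes past the start of f at 0x800007b4. inside_copy(0x800007b4, 1024, 0x80000c38) is therefore false, while the same call with 1280 is true.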
Now this patch fixes the issue:
Index: perf/bigcode.c
===================================================================
--- perf/bigcode.c (revision 15120)
+++ perf/bigcode.c (working copy)
@@ -20,7 +20,8 @@
#endif
#include "tests/sys_mman.h"
-#define FN_SIZE 1024 // Must be big enough to hold the compiled f()
+#define FN_SIZE 1280 // Must be big enough to hold the compiled f()
+ // and any literal pool that might be used
#define N_LOOPS 20000 // Should be divisible by four
#define RATIO 4 // Ratio of code sizes between the two modes
So I am tempted to check in the above patch.
Now: this actually shows that the mmap with PROT_WRITE | PROT_EXEC is wrong. Strictly
speaking we __need__ PROT_READ as well, because the copied code reads its literal pool
as data. It just does not matter in practice, as PROT_EXEC implies PROT_READ at the page table level.
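As a rough sketch of what a strictly correct mapping could look like, the snippet below requests all three protections and checks that data written into the buffer can be read back. This is not the actual call in perf/bigcode.c; alloc_code_buffer and check_code_buffer are hypothetical names, and the 4096-byte length is an arbitrary choice. Note that some hardened systems refuse writable and executable mappings, so the failure case is reported separately.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: map an anonymous buffer that can be written (copying
   code into it), executed (calling the copy) and read (literal pool loads).
   Returns NULL if the kernel refuses the mapping. */
static void *alloc_code_buffer(size_t len)
{
    assert(len > 0);             /* mmap rejects zero-length mappings */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE | PROT_EXEC,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Returns 1 if bytes written to the buffer read back unchanged, 0 on a
   mismatch, and -1 if the mapping could not be created at all. */
static int check_code_buffer(void)
{
    static const unsigned char pattern[4] = { 0xde, 0xad, 0xbe, 0xef };
    unsigned char *buf = alloc_code_buffer(4096);
    if (buf == NULL)
        return -1;
    memcpy(buf, pattern, sizeof pattern);
    int ok = memcmp(buf, pattern, sizeof pattern) == 0;
    munmap(buf, 4096);
    return ok;
}
```

The only difference from a PROT_WRITE | PROT_EXEC request is the extra PROT_READ flag, which states the data access explicitly instead of relying on the page table behaviour.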
Opinions?
Christian
|