From: Julian S. <js...@ac...> - 2012-02-20 23:20:40
|
> Any idea if the patch below is the way to go ?
> Or if there is something which I have not understood/did wrong ?

Your analysis + patch sound plausible, is the best I can say. You are in
unexplored territory -- I don't think anybody has been down this road before.

J
|
|
From: Julian S. <js...@ac...> - 2012-02-20 23:05:29
|
I've been slowly putting together an implementation of translation chaining. This allows translations to be patched, so that guest conditional and unconditional branches to addresses known at JIT time are converted into jumps from one translation to the next, with no need to return to the dispatcher each time. This gets rid of the cache miss and branch mispredict caused by each trip through the dispatcher. Branches to addresses known only at runtime of course still need to be looked up, but those are relatively uncommon.

Getting this to work has proven a swamp of complexity, considering it needs to work for all architectures, and all the stuff that the dispatcher needs to do -- event checks, dealing with no-redirect translations -- needs to be JITted into the code. Deleting translations also becomes more complex, since any others that jump to the one to be deleted first need to be un-chained.

First performance numbers are below, on amd64-linux, relative to trunk, for tools none and memcheck. The speedups are good for "none", especially on integer code with a lot of short blocks and hence many branches (bz2, tinycc). Speedups are smaller (even in absolute terms, eg, bz2) for memcheck, which is a bit disappointing. Makes me think that the performance of memcheck for large programs is ultimately determined by the performance of the memory system, and removing instructions doesn't have much effect.

Currently this is very incomplete -- deletion of translations is not handled yet, and there are many rough edges to tidy up. When I have something that's functionally complete for amd64-linux I'll post a patch.

J

$ /usr/bin/perl perf/vg_perf --reps=5 --tools=none,memcheck --vg=/home/sewardj/VgTRUNK/tchain --vg=/home/sewardj/VgTRUNK/trunk perf/
-- Running tests in perf ----------------------------------------------
-- bigcode1 --
bigcode1 tchain :0.11s no: 1.8s (16.6x, -----) me: 3.6s (32.4x, -----)
bigcode1 trunk :0.11s no: 2.1s (19.3x,-15.8%) me: 3.9s (35.0x, -8.1%)
-- bigcode2 --
bigcode2 tchain :0.11s no: 4.3s (39.0x, -----) me: 8.9s (81.4x, -----)
bigcode2 trunk :0.11s no: 4.5s (40.9x, -4.9%) me: 9.2s (83.7x, -2.9%)
-- bz2 --
bz2 tchain :0.63s no: 2.0s ( 3.1x, -----) me: 6.6s (10.4x, -----)
bz2 trunk :0.63s no: 2.7s ( 4.2x,-35.0%) me: 6.8s (10.8x, -3.3%)
-- fbench --
fbench tchain :0.24s no: 1.1s ( 4.5x, -----) me: 3.9s (16.2x, -----)
fbench trunk :0.24s no: 1.3s ( 5.3x,-17.6%) me: 4.2s (17.5x, -7.7%)
-- ffbench --
ffbench tchain :0.21s no: 0.9s ( 4.4x, -----) me: 3.0s (14.5x, -----)
ffbench trunk :0.21s no: 0.9s ( 4.4x, 0.0%) me: 3.1s (14.6x, -1.0%)
-- heap --
heap tchain :0.09s no: 0.6s ( 7.1x, -----) me: 5.7s (63.4x, -----)
heap trunk :0.09s no: 0.8s ( 9.0x,-26.6%) me: 5.5s (61.3x, 3.3%)
-- heap_pdb4 --
heap_pdb4 tchain :0.11s no: 0.8s ( 6.8x, -----) me: 9.3s (84.5x, -----)
heap_pdb4 trunk :0.11s no: 1.0s ( 9.1x,-33.3%) me: 9.9s (89.6x, -6.1%)
-- many-loss-records --
many-loss-records tchain :0.02s no: 0.2s (11.5x, -----) me: 1.4s (71.0x, -----)
many-loss-records trunk :0.02s no: 0.2s (12.0x, -4.3%) me: 1.3s (67.0x, 5.6%)
-- many-xpts --
many-xpts tchain :0.03s no: 0.3s ( 9.3x, -----) me: 1.8s (61.0x, -----)
many-xpts trunk :0.03s no: 0.3s (11.0x,-17.9%) me: 1.8s (60.7x, 0.5%)
-- sarp --
sarp tchain :0.03s no: 0.2s ( 7.3x, -----) me: 2.4s (80.0x, -----)
sarp trunk :0.03s no: 0.2s ( 7.3x, 0.0%) me: 2.5s (83.7x, -4.6%)
-- tinycc --
tinycc tchain :0.16s no: 1.5s ( 9.2x, -----) me: 9.7s (60.5x, -----)
tinycc trunk :0.16s no: 2.2s (14.1x,-53.1%) me:10.2s (64.1x, -5.9%)
-- Finished tests in perf ----------------------------------------------
== 11 programs, 44 timings =================
|
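To make the chaining idea in the message above concrete, here is a toy C model. It is only a sketch and not Valgrind code: the `Translation` struct, the table, and every function name are hypothetical, and "patching" is modelled by setting a pointer rather than rewriting machine code. It shows the two points the message makes: once an exit is chained, the dispatcher is no longer visited, and a translation must be un-chained from its predecessors before it is deleted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TRANSLATIONS 8

typedef struct Translation Translation;
struct Translation {
    uint64_t     guest_addr;   /* guest address this translation starts at */
    uint64_t     target_addr;  /* branch target, known at JIT time */
    Translation *chained_to;   /* NULL until the exit has been "patched" */
    int          in_use;
};

static Translation   table[MAX_TRANSLATIONS];
static unsigned long dispatcher_trips;

/* Slow path: a trip through the dispatcher to find the next translation. */
static Translation *dispatcher_lookup(uint64_t guest_addr)
{
    dispatcher_trips++;
    for (size_t i = 0; i < MAX_TRANSLATIONS; i++)
        if (table[i].in_use && table[i].guest_addr == guest_addr)
            return &table[i];
    return NULL;
}

/* Follow the exit of t. With chaining enabled, the first lookup is
   remembered so later transfers skip the dispatcher entirely. */
static Translation *next_translation(Translation *t, int enable_chaining)
{
    assert(t != NULL);
    if (t->chained_to != NULL)
        return t->chained_to;
    Translation *next = dispatcher_lookup(t->target_addr);
    if (enable_chaining && next != NULL)
        t->chained_to = next;
    return next;
}

/* Deleting a translation: first un-chain every exit that jumps to it. */
static void delete_translation(Translation *victim)
{
    for (size_t i = 0; i < MAX_TRANSLATIONS; i++)
        if (table[i].chained_to == victim)
            table[i].chained_to = NULL;
    victim->chained_to = NULL;
    victim->in_use = 0;
}

/* Perform `steps` block-to-block transfers; return the dispatcher trips. */
static unsigned long run(Translation *t, int steps, int enable_chaining)
{
    dispatcher_trips = 0;
    for (int i = 0; i < steps && t != NULL; i++)
        t = next_translation(t, enable_chaining);
    return dispatcher_trips;
}
```

With two translations that branch to each other, `run` over 100 transfers visits the dispatcher 100 times without chaining and only twice with it (once per exit, when the chain is first established).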
|
From: Philippe W. <phi...@sk...> - 2012-02-20 20:24:55
|
On Mon, 2012-02-20 at 19:43 +0800, unbutun wrote:
> Now, we wrote a dynamic lib with a set of malloc and free, and we
> build the uclibc and our lib in a elf.
> The malloc in uclibc is weak symbol, our is strong symbol, so the elf
> could run with no error
> but when we use valgrind, we found that valgrind is worked with
> nothing report
>
> The dynamic lib which we wrote is to do something we need and uclibc
> could not provide

I assume that your library contains implementations of malloc/free/...
which have the same behaviour as the "standard" malloc/free/...

To have Valgrind properly tracking the memory malloc-ed/free-d by your lib,
you must tell that these functions have to be replaced for e.g. memcheck.
For this, you must modify the file
coregrind/m_replacemalloc/vg_replace_malloc.c

For example, for malloc, you currently have:
ALLOC_or_NULL(VG_Z_LIBSTDCXX_SONAME, malloc, malloc);
ALLOC_or_NULL(VG_Z_LIBC_SONAME, malloc, malloc);

I believe if you add a line
ALLOC_or_NULL(VG_Z_YOURLIB_SONAME, malloc, malloc);
where VG_Z_YOURLIB_SONAME is to be defined similarly to others in
include/pub_tool_redir.h, then this should work.
(of course, you must do that for all the "malloc" related functions
you have implemented in your library).

Alternatively, if you add a line
ALLOC_or_NULL(NONE, malloc, malloc);
then malloc will be replaced whatever the lib (static or dynamic) where
they are (re-)defined.

Philippe
|
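The existing soname macros in include/pub_tool_redir.h are written in Valgrind's "Z-encoding", in which characters that cannot appear in a C identifier are escaped with a leading `Z`. The sketch below is a small stand-alone encoder for working out what a new `VG_Z_YOURLIB_SONAME` could look like. The escape table reflects my reading of the comment in that header (`Za` for `*`, `Zd` for `.`, `Zp` for `+`, and so on) and should be checked against your Valgrind version; the helper name `z_encode` and the library name `libmyalloc.so` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Z-encode `soname` into `out` (capacity `cap` bytes, including the NUL).
   Returns 0 on success, -1 if `out` is too small. */
static int z_encode(const char *soname, char *out, size_t cap)
{
    assert(soname != NULL && out != NULL);
    if (cap == 0)
        return -1;

    size_t n = 0;
    for (const char *p = soname; *p != '\0'; p++) {
        char esc = 0;
        switch (*p) {
            case '*': esc = 'a'; break;
            case '+': esc = 'p'; break;
            case ':': esc = 'c'; break;
            case '.': esc = 'd'; break;
            case '_': esc = 'u'; break;
            case '-': esc = 'h'; break;
            case ' ': esc = 's'; break;
            case '@': esc = 'A'; break;
            case 'Z': esc = 'Z'; break;
            default:  break;
        }
        size_t need = esc ? 2 : 1;
        if (n + need + 1 > cap)        /* keep room for the terminator */
            return -1;
        if (esc) {
            out[n++] = 'Z';
            out[n++] = esc;
        } else {
            out[n++] = *p;
        }
    }
    out[n] = '\0';
    return 0;
}
```

Under these assumptions, the pattern `libmyalloc.so*` encodes to `libmyallocZdsoZa`, so the new definition would read something like `#define VG_Z_YOURLIB_SONAME libmyallocZdsoZa`, by analogy with the existing libc entry.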
|
From: Patrick J. L. <lop...@gm...> - 2012-02-20 20:24:34
|
I have filed the following bug:
https://bugs.kde.org/show_bug.cgi?id=294523
The executive summary is that the current behavior of
"--partial-loads-ok=yes" needlessly creates false negatives; i.e., it
causes actual errors to be missed. I believe a simple change to its
behavior can eliminate all false negatives, while still suppressing
almost all of the false positives that inspired the addition of this
option in the first place.
The issue is this. When loading bytes from unaddressable memory,
Memcheck emits an error but marks the bytes read as _valid_. The
rationale for this is to avoid a cascade of errors; after all, once
the invalid memory access is flagged, the user has all the information
they need.
But lots of optimized code relies on the following property: An
aligned load cannot fault unless all of its bytes fault. So it is
common (especially in vectorized code, but put that aside for now) to
load a chunk of data from an aligned address that only partially
overlaps an allocated region. This is perfectly fine unless your
optimized code relies on the (unknown) bytes that were read from
outside the allocated region.
So Memcheck has an option "--partial-loads-ok=yes" designed to
suppress the error when (a) the load is aligned and (b) one or more of
the bytes are addressable. The problem is that it still marks all
bytes read as "defined". This means that even if your optimized code
erroneously depends on the data loaded from the unallocated region, no
error will be issued.
The solution is simple: When --partial-loads-ok=yes and the error is
being suppressed, mark the bytes read from inaccessible memory as
_undefined_. This will result in zero false negatives, since any use
of the data from the inaccessible memory will still emit an error, and
any use of data read from addressable memory is not an error. But
this will still massively reduce false positives, depending on how
conservatively the validity bits are propagated.
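The difference between the current and the proposed behaviour can be written down as a small decision function. This is only a model of the policy described in this message, not Memcheck code: `LoadOutcome`, `model_load8` and the mask layout are hypothetical, the load is fixed at 8 bytes, and every addressable byte is assumed to hold defined data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool    error_reported;  /* an addressing error is emitted for the load */
    uint8_t undefined_mask;  /* bit i set: byte i of the result is undefined */
} LoadOutcome;

/* addressable_mask: bit i set means byte i of the 8-byte load is addressable.
   proposed_fix selects the behaviour suggested in this message. */
static LoadOutcome model_load8(uint8_t addressable_mask, bool aligned,
                               bool partial_loads_ok, bool proposed_fix)
{
    LoadOutcome r = { false, 0 };

    if (addressable_mask == 0xFF)
        return r;                       /* fully addressable: nothing to do */

    bool suppress = partial_loads_ok && aligned && addressable_mask != 0;
    if (!suppress) {
        /* Error is reported once; bytes are marked valid to avoid a cascade. */
        r.error_reported = true;
        return r;
    }

    /* Error suppressed. Today all bytes come back "defined"; with the
       proposed fix the unaddressable bytes come back undefined. */
    if (proposed_fix)
        r.undefined_mask = (uint8_t)~addressable_mask;
    return r;
}

/* Example: the upper four bytes of an aligned load lie past the allocation. */
static void example(void)
{
    LoadOutcome now   = model_load8(0x0F, true, true, false);
    LoadOutcome fixed = model_load8(0x0F, true, true, true);
    assert(!now.error_reported   && now.undefined_mask   == 0x00);
    assert(!fixed.error_reported && fixed.undefined_mask == 0xF0);
}
```

In the model, the only case that changes is the suppressed one: the load itself stays silent, but a later use of the out-of-bounds bytes would be caught as a use of undefined data.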
I attached a test case to my bug demonstrating the kind of optimized
code that I believe is extremely common and the kind of false negative
that --partial-loads-ok creates. I also attached a patch implementing
my proposed fix.
I would appreciate any feedback. Thanks!
- Pat
|
|
From: <sv...@va...> - 2012-02-20 15:38:01
|
Author: sewardj
Date: 2012-02-20 15:33:24 +0000 (Mon, 20 Feb 2012)
New Revision: 12393
Log:
ARM/Thumb only: fix a bug in which stack unwinding halts in some
functions that do FP arithmetic. This is due to the Dwarf3 CFI
mentioning Dwarf registers above N_CFI_REGS, in particular FP
registers, which have values of about 80. This fixes the problem
by increasing N_CFI_REGS to a level that covers all known registers.
(n-i-bz)
Modified:
trunk/coregrind/m_debuginfo/readdwarf.c
Modified: trunk/coregrind/m_debuginfo/readdwarf.c
===================================================================
--- trunk/coregrind/m_debuginfo/readdwarf.c 2012-02-20 15:03:02 UTC (rev 12392)
+++ trunk/coregrind/m_debuginfo/readdwarf.c 2012-02-20 15:33:24 UTC (rev 12393)
@@ -1846,9 +1846,14 @@
# error "Unknown platform"
#endif
-/* the number of regs we are prepared to unwind */
+/* The number of regs we are prepared to unwind. The number for
+ arm-linux (320) seems ludicrously high, but the ARM IHI 0040A page
+ 7 (DWARF for the ARM Architecture) specifies that values up to 320
+ might exist, for Neon/VFP-v3. */
#if defined(VGP_ppc32_linux) || defined(VGP_ppc64_linux)
# define N_CFI_REGS 72
+#elif defined(VGP_arm_linux)
+# define N_CFI_REGS 320
#else
# define N_CFI_REGS 20
#endif
|
|
From: <sv...@va...> - 2012-02-20 15:07:38
|
Author: florian
Date: 2012-02-20 15:03:02 +0000 (Mon, 20 Feb 2012)
New Revision: 12392
Log:
With the change in VEX r2258 the guest IA will now also be accessed
in the low word only. Adjust code accordingly.
Modified:
trunk/memcheck/mc_machine.c
Modified: trunk/memcheck/mc_machine.c
===================================================================
--- trunk/memcheck/mc_machine.c 2012-02-17 15:13:55 UTC (rev 12391)
+++ trunk/memcheck/mc_machine.c 2012-02-20 15:03:02 UTC (rev 12392)
@@ -811,6 +811,7 @@
if (o == GOF(IP_AT_SYSCALL)) return -1;
if (o == GOF(fpc)) return -1;
if (o == GOF(IA)) return -1;
+ if (o == GOF(IA) + 4) return -1;
if (o == GOF(SYSNO)) return -1;
VG_(printf)("MC_(get_otrack_shadow_offset)(s390x)(off=%d,sz=%d)\n",
offset,szB);
|
|
From: <sv...@va...> - 2012-02-20 15:05:49
|
Author: florian
Date: 2012-02-20 15:01:14 +0000 (Mon, 20 Feb 2012)
New Revision: 2258
Log:
Improve code generation on s390x for assignment of constant
values to guest registers. Motivated by the observation that
piecing together a 64-bit value requires 4 insns on z900 and 2 insns
on newer models. Specifically:
(1) Assigning 0 can be done by using XC
(2) Assigning a value that differs by a small amount from the
value previously assigned can be done using AGSI
(Happens a lot for guest IA updates).
(3) If the new value differs from the previous one only
in the lower word it is sufficient to assign the lower word.
(4) If the new value equals the old value the assignment is redundant
and can be eliminated. This happens surprisingly often.
This buys us somewhere between 5% and 11.8% of insns (as measured
on the perf bucket).
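The four cases in the log can be summarised as one classification step, roughly following the order of checks in the host_s390_isel.c hunk of this commit. The sketch below is stand-alone C, not VEX code: `AssignKind`, `classify` and the `host_has_agsi` parameter are hypothetical names, and the exception for guest registers that need precise memory exceptions is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    ASSIGN_REDUNDANT,  /* (4) new value equals the tracked old value */
    ASSIGN_ZERO_XC,    /* (1) assign 0 using XC */
    ASSIGN_ADD_AGSI,   /* (2) small delta from the old value, use AGSI */
    ASSIGN_LOW_WORD,   /* (3) only the low 32 bits differ */
    ASSIGN_FULL        /* no special case: full 64-bit assignment */
} AssignKind;

/* True if diff, read as a two's complement 64-bit value, lies in -128..127.
   Written with unsigned comparisons only, so no signed shifts are needed. */
static bool fits_signed_8bit(uint64_t diff)
{
    return diff <= 127u || diff >= (uint64_t)-128;
}

/* old_valid says whether old_value is a known constant for this register. */
static AssignKind classify(bool old_valid, uint64_t old_value,
                           uint64_t new_value, bool host_has_agsi)
{
    if (old_valid && new_value == old_value)
        return ASSIGN_REDUNDANT;
    if (new_value == 0)
        return ASSIGN_ZERO_XC;
    if (!old_valid)
        return ASSIGN_FULL;
    if (host_has_agsi && fits_signed_8bit(new_value - old_value))
        return ASSIGN_ADD_AGSI;
    if ((old_value >> 32) == (new_value >> 32))
        return ASSIGN_LOW_WORD;
    return ASSIGN_FULL;
}

/* Example: stepping the guest IA forward by a 4-byte instruction. */
static void example(void)
{
    assert(classify(true, 0x1000, 0x1004, true) == ASSIGN_ADD_AGSI);
}
```

A typical guest IA update moves forward by the length of one instruction, which is why the small-delta case comes up so often.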
Modified:
trunk/auxprogs/genoffsets.c
trunk/priv/guest_s390_defs.h
trunk/priv/guest_s390_toIR.c
trunk/priv/host_s390_defs.c
trunk/priv/host_s390_defs.h
trunk/priv/host_s390_isel.c
Modified: trunk/auxprogs/genoffsets.c
===================================================================
--- trunk/auxprogs/genoffsets.c 2012-02-17 15:07:09 UTC (rev 2257)
+++ trunk/auxprogs/genoffsets.c 2012-02-20 15:01:14 UTC (rev 2258)
@@ -169,6 +169,10 @@
GENOFFSET(S390X,s390x,SYSNO);
GENOFFSET(S390X,s390x,IP_AT_SYSCALL);
GENOFFSET(S390X,s390x,fpc);
+ GENOFFSET(S390X,s390x,CC_OP);
+ GENOFFSET(S390X,s390x,CC_DEP1);
+ GENOFFSET(S390X,s390x,CC_DEP2);
+ GENOFFSET(S390X,s390x,CC_NDEP);
}
/*--------------------------------------------------------------------*/
Modified: trunk/priv/guest_s390_defs.h
===================================================================
--- trunk/priv/guest_s390_defs.h 2012-02-17 15:07:09 UTC (rev 2257)
+++ trunk/priv/guest_s390_defs.h 2012-02-20 15:01:14 UTC (rev 2258)
@@ -37,6 +37,7 @@
#include "libvex_ir.h" // IRSB (needed by bb_to_IR.h)
#include "libvex.h" // VexArch (needed by bb_to_IR.h)
#include "guest_generic_bb_to_IR.h" // DisResult
+#include "libvex_guest_s390x.h" // VexGuestS390XState
/* Convert one s390 insn to IR. See the type DisOneInstrFn in
Modified: trunk/priv/guest_s390_toIR.c
===================================================================
--- trunk/priv/guest_s390_toIR.c 2012-02-17 15:07:09 UTC (rev 2257)
+++ trunk/priv/guest_s390_toIR.c 2012-02-20 15:01:14 UTC (rev 2258)
@@ -13593,7 +13593,7 @@
DisResult
disInstr_S390(IRSB *irsb_IN,
- Bool put_IP,
+ Bool put_IP __attribute__((unused)),
Bool (*resteerOkFn)(void *, Addr64),
Bool resteerCisOk,
void *callback_opaque,
@@ -13616,10 +13616,9 @@
resteer_fn = resteerOkFn;
resteer_data = callback_opaque;
- /* We may be asked to update the guest IA before going further. */
- if (put_IP)
- addStmtToIRSB(irsb, IRStmt_Put(S390X_GUEST_OFFSET(guest_IA),
- mkaddr_expr(guest_IA_curr_instr)));
+ /* Always update the guest IA. See comment in s390_isel_stmt for Ist_Put. */
+ addStmtToIRSB(irsb, IRStmt_Put(S390X_GUEST_OFFSET(guest_IA),
+ mkaddr_expr(guest_IA_curr_instr)));
return disInstr_S390_WRK(guest_code + delta);
}
Modified: trunk/priv/host_s390_defs.c
===================================================================
--- trunk/priv/host_s390_defs.c 2012-02-17 15:07:09 UTC (rev 2257)
+++ trunk/priv/host_s390_defs.c 2012-02-20 15:01:14 UTC (rev 2258)
@@ -714,6 +714,8 @@
break;
case S390_INSN_MFENCE:
+ case S390_INSN_GZERO:
+ case S390_INSN_GADD:
break;
default:
@@ -917,6 +919,8 @@
break;
case S390_INSN_MFENCE:
+ case S390_INSN_GZERO:
+ case S390_INSN_GADD:
break;
default:
@@ -1115,6 +1119,35 @@
}
+static UChar *
+emit_SIY(UChar *p, ULong op, UChar i2, UChar b1, UShort dl1, UChar dh1)
+{
+ ULong the_insn = op;
+
+ the_insn |= ((ULong)i2) << 32;
+ the_insn |= ((ULong)b1) << 28;
+ the_insn |= ((ULong)dl1) << 16;
+ the_insn |= ((ULong)dh1) << 8;
+
+ return emit_6bytes(p, the_insn);
+}
+
+
+static UChar *
+emit_SSa(UChar *p, ULong op, UChar l, UChar b1, UShort d1, UChar b2, UShort d2)
+{
+ ULong the_insn = op;
+
+ the_insn |= ((ULong)l) << 32;
+ the_insn |= ((ULong)b1) << 28;
+ the_insn |= ((ULong)d1) << 16;
+ the_insn |= ((ULong)b2) << 12;
+ the_insn |= ((ULong)d2) << 0;
+
+ return emit_6bytes(p, the_insn);
+}
+
+
/*------------------------------------------------------------*/
/*--- Functions to emit particular instructions ---*/
/*------------------------------------------------------------*/
@@ -1240,6 +1273,18 @@
static UChar *
+s390_emit_AGSI(UChar *p, UChar i2, UChar b1, UShort dl1, UChar dh1)
+{
+ vassert(s390_host_has_gie);
+
+ if (UNLIKELY(vex_traceflags & VEX_TRACE_ASM))
+ s390_disasm(ENC3(MNM, INT, SDXB), "agsi", (Int)(Char)i2, dh1, dl1, 0, b1);
+
+ return emit_SIY(p, 0xeb000000007aULL, i2, b1, dl1, dh1);
+}
+
+
+static UChar *
s390_emit_NR(UChar *p, UChar r1, UChar r2)
{
if (UNLIKELY(vex_traceflags & VEX_TRACE_ASM))
@@ -1688,6 +1733,16 @@
static UChar *
+s390_emit_XC(UChar *p, UInt l, UChar b1, UShort d1, UChar b2, UShort d2)
+{
+ if (UNLIKELY(vex_traceflags & VEX_TRACE_ASM))
+ s390_disasm(ENC3(MNM, UDLB, UDXB), "xc", d1, l, b1, d2, 0, b2);
+
+ return emit_SSa(p, 0xd70000000000ULL, l, b1, d1, b2, d2);
+}
+
+
+static UChar *
s390_emit_FLOGR(UChar *p, UChar r1, UChar r2)
{
vassert(s390_host_has_eimm);
@@ -4406,6 +4461,34 @@
}
+s390_insn *
+s390_insn_gzero(UChar size, UInt offset)
+{
+ s390_insn *insn = LibVEX_Alloc(sizeof(s390_insn));
+
+ insn->tag = S390_INSN_GZERO;
+ insn->size = size;
+ insn->variant.gzero.offset = offset;
+
+ return insn;
+}
+
+
+s390_insn *
+s390_insn_gadd(UChar size, UInt offset, UChar delta, ULong value)
+{
+ s390_insn *insn = LibVEX_Alloc(sizeof(s390_insn));
+
+ insn->tag = S390_INSN_GADD;
+ insn->size = size;
+ insn->variant.gadd.offset = offset;
+ insn->variant.gadd.delta = delta;
+ insn->variant.gadd.value = value;
+
+ return insn;
+}
+
+
/*---------------------------------------------------------------*/
/*--- Debug print ---*/
/*---------------------------------------------------------------*/
@@ -4477,6 +4560,10 @@
s390_amode_as_string(va_arg(args, s390_amode *)));
continue;
+ case 'G': /* %G = guest state @ offset */
+ p += vex_sprintf(p, "guest[%d]", va_arg(args, UInt));
+ continue;
+
case 'C': /* %C = condition code */
p += vex_sprintf(p, "%s", s390_cc_as_string(va_arg(args, s390_cc_t)));
continue;
@@ -4821,6 +4908,17 @@
s390_sprintf(buf, "%M", "v-mfence");
return buf; /* avoid printing "size = ..." which is meaningless */
+ case S390_INSN_GZERO:
+ s390_sprintf(buf, "%M %G", "v-gzero", insn->variant.gzero.offset);
+ break;
+
+ case S390_INSN_GADD:
+ s390_sprintf(buf, "%M %G += %I (= %I)", "v-gadd",
+ insn->variant.gadd.offset,
+ (Long)(Char)insn->variant.gadd.delta,
+ insn->variant.gadd.value);
+ break;
+
default: goto fail;
}
@@ -7042,6 +7140,24 @@
}
+static UChar *
+s390_insn_gzero_emit(UChar *buf, const s390_insn *insn)
+{
+ return s390_emit_XC(buf, insn->size - 1,
+ S390_REGNO_GUEST_STATE_POINTER, insn->variant.gzero.offset,
+ S390_REGNO_GUEST_STATE_POINTER, insn->variant.gzero.offset);
+}
+
+
+static UChar *
+s390_insn_gadd_emit(UChar *buf, const s390_insn *insn)
+{
+ return s390_emit_AGSI(buf, insn->variant.gadd.delta,
+ S390_REGNO_GUEST_STATE_POINTER,
+ DISP20(insn->variant.gadd.offset));
+}
+
+
Int
emit_S390Instr(UChar *buf, Int nbuf, s390_insn *insn, Bool mode64,
void *dispatch_unassisted, void *dispatch_assisted)
@@ -7159,6 +7275,14 @@
end = s390_insn_mfence_emit(buf, insn);
break;
+ case S390_INSN_GZERO:
+ end = s390_insn_gzero_emit(buf, insn);
+ break;
+
+ case S390_INSN_GADD:
+ end = s390_insn_gadd_emit(buf, insn);
+ break;
+
default:
vpanic("s390_insn_emit");
}
Modified: trunk/priv/host_s390_defs.h
===================================================================
--- trunk/priv/host_s390_defs.h 2012-02-17 15:07:09 UTC (rev 2257)
+++ trunk/priv/host_s390_defs.h 2012-02-20 15:01:14 UTC (rev 2258)
@@ -142,7 +142,9 @@
S390_INSN_BFP128_COMPARE,
S390_INSN_BFP128_CONVERT_TO,
S390_INSN_BFP128_CONVERT_FROM,
- S390_INSN_MFENCE
+ S390_INSN_MFENCE,
+ S390_INSN_GZERO, /* Assign zero to a guest register */
+ S390_INSN_GADD /* Add a value to a guest register */
} s390_insn_tag;
@@ -397,6 +399,14 @@
HReg op2_hi; /* right operand; high part */
HReg op2_lo; /* right operand; low part */
} bfp128_compare;
+ struct {
+ UInt offset;
+ } gzero;
+ struct {
+ UInt offset;
+ UChar delta;
+ ULong value; /* for debugging only */
+ } gadd;
} variant;
} s390_insn;
@@ -447,6 +457,8 @@
HReg dst, HReg op_hi, HReg op_lo,
s390_round_t);
s390_insn *s390_insn_mfence(void);
+s390_insn *s390_insn_gzero(UChar size, UInt offset);
+s390_insn *s390_insn_gadd(UChar size, UInt offset, UChar delta, ULong value);
UInt s390_insn_emit(UChar *buf, Int nbuf, const s390_insn *insn,
void *dispatch);
Modified: trunk/priv/host_s390_isel.c
===================================================================
--- trunk/priv/host_s390_isel.c 2012-02-17 15:07:09 UTC (rev 2257)
+++ trunk/priv/host_s390_isel.c 2012-02-20 15:01:14 UTC (rev 2258)
@@ -34,10 +34,11 @@
#include "libvex_ir.h"
#include "libvex.h"
#include "libvex_s390x_common.h"
+#include "libvex_guest_offsets.h"
-#include "ir_match.h"
#include "main_util.h"
#include "main_globals.h"
+#include "guest_s390_defs.h" /* guest_s390x_state_requires_precise_mem_exns */
#include "host_generic_regs.h"
#include "host_s390_defs.h"
@@ -68,8 +69,27 @@
- The host subarchitecture we are selecting insns for.
This is set at the start and does not change.
+
+ - A flag to indicate whether the guest IA has been assigned to.
+
+ - Values of certain guest registers which are often assigned constants.
*/
+/* Symbolic names for guest registers whose value we're tracking */
+enum {
+ GUEST_IA,
+ GUEST_CC_OP,
+ GUEST_CC_DEP1,
+ GUEST_CC_DEP2,
+ GUEST_CC_NDEP,
+ GUEST_SYSNO,
+ GUEST_UNKNOWN /* must be the last entry */
+};
+
+/* Number of registers we're tracking. */
+#define NUM_TRACKED_REGS GUEST_UNKNOWN
+
+
typedef struct {
IRTypeEnv *type_env;
@@ -79,10 +99,12 @@
HInstrArray *code;
+ ULong old_value[NUM_TRACKED_REGS];
UInt vreg_ctr;
UInt hwcaps;
-
+ Bool first_IA_assignment;
+ Bool old_value_valid[NUM_TRACKED_REGS];
} ISelEnv;
@@ -96,6 +118,33 @@
static void s390_isel_float128_expr(HReg *, HReg *, ISelEnv *, IRExpr *);
+static Int
+get_guest_reg(Int offset)
+{
+ switch (offset) {
+ case OFFSET_s390x_IA: return GUEST_IA;
+ case OFFSET_s390x_CC_OP: return GUEST_CC_OP;
+ case OFFSET_s390x_CC_DEP1: return GUEST_CC_DEP1;
+ case OFFSET_s390x_CC_DEP2: return GUEST_CC_DEP2;
+ case OFFSET_s390x_CC_NDEP: return GUEST_CC_NDEP;
+ case OFFSET_s390x_SYSNO: return GUEST_SYSNO;
+
+ /* Also make sure there is never a partial write to one of
+ these registers. That would complicate matters. */
+ case OFFSET_s390x_IA+1 ... OFFSET_s390x_IA+7:
+ case OFFSET_s390x_CC_OP+1 ... OFFSET_s390x_CC_OP+7:
+ case OFFSET_s390x_CC_DEP1+1 ... OFFSET_s390x_CC_DEP1+7:
+ case OFFSET_s390x_CC_DEP2+1 ... OFFSET_s390x_CC_DEP2+7:
+ case OFFSET_s390x_CC_NDEP+1 ... OFFSET_s390x_CC_NDEP+7:
+ vassert("partial update of this guest state register is not allowed");
+ break;
+
+ default: break;
+ }
+
+ return GUEST_UNKNOWN;
+}
+
/* Add an instruction */
static void
addInstr(ISelEnv *env, s390_insn *insn)
@@ -203,6 +252,16 @@
}
+static __inline__ Bool
+ulong_fits_signed_8bit(ULong val)
+{
+ Long v = val & 0xFFu;
+
+ v = (v << 56) >> 56; /* sign extend */
+
+ return val == (ULong)v;
+}
+
/* EXPR is an expression that is used as an address. Return an s390_amode
for it. */
static s390_amode *
@@ -2139,7 +2198,98 @@
IRType tyd = typeOfIRExpr(env->type_env, stmt->Ist.Put.data);
HReg src;
s390_amode *am;
+ ULong new_value, old_value, difference;
+ /* Detect updates to certain guest registers. We track the contents
+ of those registers as long as they contain constants. If the new
+ constant is either zero or in the 8-bit neighbourhood of the
+ current value we can use a memory-to-memory insn to do the update. */
+
+ Int offset = stmt->Ist.Put.offset;
+
+ /* Check necessary conditions:
+ (1) must be one of the registers we care about
+ (2) assigned value must be a constant */
+ Int guest_reg = get_guest_reg(offset);
+
+ if (guest_reg == GUEST_UNKNOWN) goto not_special;
+
+ if (guest_reg == GUEST_IA) {
+ /* If this is the first assignment to the IA reg, don't special case
+ it. We need to do a full 8-byte assignment here. The reason is
+ that in case of a redirected translation the guest IA does not
+ contain the redirected-to address. Instead it contains the
+ redirected-from address and those can be far apart. So in order to
+ do incremental updates of the IA in the future we need to get the
+ initial address of the super block correct. */
+ if (env->first_IA_assignment) {
+ env->first_IA_assignment = False;
+ goto not_special;
+ }
+ }
+
+ if (stmt->Ist.Put.data->tag != Iex_Const) {
+ /* Invalidate guest register contents */
+ env->old_value_valid[guest_reg] = False;
+ goto not_special;
+ }
+
+ /* OK. Necessary conditions are satisfied. */
+
+ /* Get the old value and update it */
+ vassert(tyd == Ity_I64);
+
+ old_value = env->old_value[guest_reg];
+ new_value = stmt->Ist.Put.data->Iex.Const.con->Ico.U64;
+ env->old_value[guest_reg] = new_value;
+
+ Bool old_value_is_valid = env->old_value_valid[guest_reg];
+ env->old_value_valid[guest_reg] = True;
+
+ /* If the register already contains the new value, there is nothing
+ to do here. Unless the guest register requires precise memory
+ exceptions. */
+ if (old_value_is_valid && new_value == old_value) {
+ if (! guest_s390x_state_requires_precise_mem_exns(offset, offset + 8)) {
+ return;
+ }
+ }
+
+ /* guest register = 0 */
+ if (new_value == 0) {
+ addInstr(env, s390_insn_gzero(sizeofIRType(tyd), offset));
+ return;
+ }
+
+ if (old_value_is_valid == False) goto not_special;
+
+ /* If the new value is in the neighbourhood of the old value
+ we can use a memory-to-memory insn */
+ difference = new_value - old_value;
+
+ if (s390_host_has_gie && ulong_fits_signed_8bit(difference)) {
+ addInstr(env, s390_insn_gadd(sizeofIRType(tyd), offset,
+ (difference & 0xFF), new_value));
+ return;
+ }
+
+ /* If the high word is the same it is sufficient to load the low word.
+ Use R0 as a scratch reg. */
+ if ((old_value >> 32) == (new_value >> 32)) {
+ HReg r0 = make_gpr(env, 0);
+ HReg gsp = make_gpr(env, S390_REGNO_GUEST_STATE_POINTER);
+ s390_amode *gam;
+
+ gam = s390_amode_b12(offset + 4, gsp);
+ addInstr(env, s390_insn_load_immediate(4, r0,
+ new_value & 0xFFFFFFFF));
+ addInstr(env, s390_insn_store(4, gam, r0));
+ return;
+ }
+
+ /* No special case applies... fall through */
+
+ not_special:
am = s390_amode_for_guest_state(stmt->Ist.Put.offset);
switch (tyd) {
@@ -2230,7 +2380,18 @@
IRType retty;
IRDirty* d = stmt->Ist.Dirty.details;
Bool passBBP;
+ Int i;
+ /* Invalidate tracked values of those guest state registers that are
+ modified by this helper. */
+ for (i = 0; i < d->nFxState; ++i) {
+ if ((d->fxState[i].fx == Ifx_Write || d->fxState[i].fx == Ifx_Modify)) {
+ Int guest_reg = get_guest_reg(d->fxState[i].offset);
+ if (guest_reg != GUEST_UNKNOWN)
+ env->old_value_valid[guest_reg] = False;
+ }
+ }
+
if (d->nFxState == 0)
vassert(!d->needsBBP);
@@ -2372,6 +2533,13 @@
/* Copy BB's type env. */
env->type_env = bb->tyenv;
+ /* Set up data structures for tracking guest register values. */
+ env->first_IA_assignment = True;
+ for (i = 0; i < NUM_TRACKED_REGS; ++i) {
+ env->old_value[i] = 0; /* just something to have a defined value */
+ env->old_value_valid[i] = False;
+ }
+
/* Make up an IRTemp -> virtual HReg mapping. This doesn't
change as we go along. For some reason types_used has Int type -- but
it should be unsigned. Internally we use an unsigned type; so we
|
|
From: unbutun <un...@so...> - 2012-02-20 11:43:41
|
Hi, guys

This is Gavin in China, first, forgive my poor English please

Now, we wrote a dynamic lib with a set of malloc and free, and we
build the uclibc and our lib in a elf.
The malloc in uclibc is weak symbol, our is strong symbol, so the elf
could run with no error
but when we use valgrind, we found that valgrind is worked with
nothing report

The dynamic lib which we wrote is to do something we need and uclibc
could not provide

My boss tells me the problem must be resolved, could you give me some
advice of how to fix that (add some code in valgrind or do some other
thing to fix it) ?

Any advice would be much appreciated, thanks in advance

Regards,
Gavin
|