|
From: Christian B. <bor...@de...> - 2016-06-08 09:40:26
|
Ok to push this change?
Index: cachegrind/cg_main.c
===================================================================
--- cachegrind/cg_main.c (revision 15851)
+++ cachegrind/cg_main.c (working copy)
@@ -1845,6 +1851,12 @@
       VG_(umsg)(" but it is not. Exiting now.\n");
       VG_(exit)(1);
    }
+   if (clo_branch_sim && (VG_(clo_vex_control).guest_chase_thresh != 0)) {
+      VG_(message)(Vg_UserMsg,
+                   "branch simulation only works with --vex-guest-chase-thresh=0\n"
+                   "=> resetting it back to 0\n");
+      VG_(clo_vex_control).guest_chase_thresh = 0; // cannot be overridden.
+   }

    cachesim_initcaches(I1c, D1c, LLc);
 }
|
|
From: Josef W. <Jos...@gm...> - 2016-06-17 13:03:53
|
Not sure this is needed, and it makes cachegrind slower (it is done
for callgrind, which wants to track calls).
Chasing of BBs in VEX is only done on unconditional branches, isn't it?
I do not think these are relevant for branch simulation?
I haven't actually checked either of these assumptions, so I may be wrong.
Josef
On 08.06.2016 at 11:10, Christian Borntraeger wrote:
> Ok to push this change?
>
> Index: cachegrind/cg_main.c
> ===================================================================
> --- cachegrind/cg_main.c (revision 15851)
> +++ cachegrind/cg_main.c (working copy)
> @@ -1845,6 +1851,12 @@
> VG_(umsg)(" but it is not. Exiting now.\n");
> VG_(exit)(1);
> }
> + if (clo_branch_sim && (VG_(clo_vex_control).guest_chase_thresh != 0)) {
> + VG_(message)(Vg_UserMsg,
> + "branch simulation only works with --vex-guest-chase-thresh=0\n"
> + "=> resetting it back to 0\n");
> + VG_(clo_vex_control).guest_chase_thresh = 0; // cannot be overriden.
> + }
>
> cachesim_initcaches(I1c, D1c, LLc);
> }
|
|
From: Christian B. <bor...@de...> - 2016-06-17 13:22:16
|
On 06/17/2016 03:01 PM, Josef Weidendorfer wrote:
> Not sure this is needed, and it makes cachegrind slower (it is done
> for callgrind, which wants to track calls).
>
> Chasing of BBs in VEX is only done on unconditional branches, isn't it?
> I do not think these are relevant for branch simulation?
>
> I actually didn't check both of these my assumptions, so I may be wrong.
I can certainly say that for s390 almost all function calls are lost due
to chaining, so I needed this to get sane results on s390.
>
> Josef
>> _______________________________________________
>> Valgrind-developers mailing list
>> Val...@li...
>> https://lists.sourceforge.net/lists/listinfo/valgrind-developers
>>
>
|
|
From: Josef W. <Jos...@gm...> - 2016-06-17 14:02:45
|
On 17.06.2016 at 15:22, Christian Borntraeger wrote:
> On 06/17/2016 03:01 PM, Josef Weidendorfer wrote:
>> Chasing of BBs in VEX is only done on unconditional branches, isn't it?
Ah, I was wrong about this.
An IRSB of course has multiple conditional side exits when chasing
multiple BBs.
But... this seems to be handled correctly in cg_main.c. Before every
IRSB side exit (case Ist_Exit), the branch prediction simulation is called.
Still. Most calls should be unconditional, and then the branch predictor
is not called anyway, so there should be no difference between doing chasing
or not.
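To make the argument above concrete, here is a minimal sketch in C, assuming a toy model in which the conditional-branch predictor is consulted exactly once per side exit. ToyStmtKind and toy_count_cond_events are hypothetical names used for illustration only; they are not VEX or cachegrind identifiers, and a real IRSB carries far more information.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified statement kinds; not the real VEX IRStmt tags. */
typedef enum { STMT_IMARK, STMT_PUT, STMT_EXIT } ToyStmtKind;

/* Count how often a conditional-branch predictor would be consulted,
   assuming it is called exactly once per conditional side exit. */
static size_t toy_count_cond_events(const ToyStmtKind *stmts, size_t n)
{
   size_t events = 0;
   assert(stmts != NULL || n == 0);
   for (size_t i = 0; i < n; i++) {
      if (stmts[i] == STMT_EXIT)
         events++;   /* one predictor call per side exit */
   }
   return events;
}
```

In this model a caller block that ends in an unconditional call contributes no event, so counting caller and callee separately gives the same total as counting the single chased superblock.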
>> I do not think these are relevant for branch simulation?
>>
>> I actually didn't check both of these my assumptions, so I may be wrong.
>
> I can certainly say that for s390 almost all function calls are lost due
> to chaining, so I needed this to get sane results on s390.
For such a lost branch, can you check if this is a conditional or indirect
branch? I wonder why the branch predictor is not called if chasing is done.
Josef
|
|
From: Christian B. <bor...@de...> - 2016-06-20 19:42:01
|
On 06/17/2016 04:02 PM, Josef Weidendorfer wrote:
> On 17.06.2016 at 15:22, Christian Borntraeger wrote:
>> On 06/17/2016 03:01 PM, Josef Weidendorfer wrote:
>>> Chasing of BBs in VEX is only done on unconditional branches, isn't it?
>
> Ah, this is wrong.
> An IRSB of course has multiple conditional side exits when chasing multiple BBs.
> But... this seems to be handled correctly in cg_main.c. Before every
> IRSB side exit (case Ist_Exit), the branch prediction simulation is called.
>
> Still. Most calls should be unconditional, and then the branch predictor
> is not called anyway, so there should be no difference between doing chasing
> or not.
>
>>> I do not think these are relevant for branch simulation?
>>>
>>> I actually didn't check both of these my assumptions, so I may be wrong.
>>
>> I can certainly say that for s390 almost all function calls are lost due
>> to chaining, so I needed this to get sane results on s390.
>
> For such a lost branch, can you check if this is a conditional or indirect
> branch? I wonder why the branch predictor is not called if chasing is done.
>
It's about unconditional direct branches (calls), e.g.:
==== SB 37 (evchecks 68) [tid 1] 0x4004e92 _dl_start+1274 /usr/lib64/ld-2.22.so+0x4e92
------------------------ Front end ------------------------
oi 3244(%r7),32
------ IMark(0x4004E92, 4, 0) ------
t0 = Add64(0xCAC:I64,GET:I64(248))
t1 = LDbe:I8(t0)
t2 = Or8(t1,0x20:I8)
PUT(352) = 0x0:I64
PUT(360) = 8Uto64(t2)
PUT(368) = 0x0:I64
PUT(376) = 0x0:I64
STbe(t0) = t2
PUT(336) = 0x4004E96:I64
larl %r2,.+142074
------ IMark(0x4004E96, 6, 0) ------
PUT(208) = 0x4027990:I64
PUT(336) = 0x4004E9C:I64
brasl %r14,.+29460
------ IMark(0x4004E9C, 6, 0) ------
PUT(304) = 0x4004EA2:I64
PUT(336) = 0x400C1B0:I64
<--- This branch becomes a nop IR-wise due to resteering.
ltg %r1,664(%r2)
------ IMark(0x400C1B0, 6, 0) ------
t4 = 0x298:I64
t3 = Add64(Add64(t4,GET:I64(208)),0x0:I64)
t5 = LDbe:I64(t3)
PUT(200) = t5
PUT(352) = 0xF:I64
PUT(360) = t5
PUT(368) = 0x0:I64
PUT(376) = 0x0:I64
PUT(336) = 0x400C1B6:I64
je .+104
------ IMark(0x400C1B6, 4, 0) ------
t6 = s390_calculate_cond[mcx=0x13]{0x8010ff188}(0x8:I64,GET:I64(352),GET:I64(360),GET:I64(368),GET:I64(376)):I32
if (CmpNE32(t6,0x0:I32)) { PUT(336) = 0x400C21E:I64; exit-Boring }
PUT(336) = 0x400C1BA:I64
PUT(336) = GET:I64(336); exit-Boring
can't show code due to extents > 1
|
|
From: Josef W. <Jos...@gm...> - 2016-06-20 20:14:35
|
On 20.06.2016 at 21:41, Christian Borntraeger wrote:
> On 06/17/2016 04:02 PM, Josef Weidendorfer wrote:
>>
>> For such a lost branch, can you check if this is a conditional or indirect
>> branch? I wonder why the branch predictor is not called if chasing is done.
>>
>
> It's about unconditional direct branches (calls)
> e.g.
>
> ==== SB 37 (evchecks 68) [tid 1] 0x4004e92 _dl_start+1274 /usr/lib64/ld-2.22.so+0x4e92
>
> ...
> brasl %r14,.+29460
> ------ IMark(0x4004E9C, 6, 0) ------
> PUT(304) = 0x4004EA2:I64
> PUT(336) = 0x400C1B0:I64
>
> <--- This branch becomes a nop IR-wise due to resteering.
I still do not understand.
If chasing is switched off, this would become the end of an IRSB.
But even then the branch simulator would not be invoked:
in cachegrind/cg_main.c, addEvent_Bc is only called for Ist_Exit statements,
and addEvent_Bi only for unknown targets. Neither is the case here.
So it should not matter for branch simulation that this branch becomes a nop?
If you get different results with/without chasing here, something must be wrong...
Josef
PS: on real hardware, the branch prediction would probably become active here, as
we may want to predict the control flow change before we can even decode
the "brasl"... but such behavior was never implemented in the branch simulation
in cachegrind.
|
|
From: Christian B. <bor...@de...> - 2016-06-20 20:26:57
|
On 06/20/2016 10:14 PM, Josef Weidendorfer wrote:
> On 20.06.2016 at 21:41, Christian Borntraeger wrote:
>> On 06/17/2016 04:02 PM, Josef Weidendorfer wrote:
>>>
>>> For such a lost branch, can you check if this is a conditional or indirect
>>> branch? I wonder why the branch predictor is not called if chasing is done.
>>>
>>
>> It's about unconditional direct branches (calls)
>> e.g.
>>
>> ==== SB 37 (evchecks 68) [tid 1] 0x4004e92 _dl_start+1274 /usr/lib64/ld-2.22.so+0x4e92
>>
>> ...
>> brasl %r14,.+29460
>> ------ IMark(0x4004E9C, 6, 0) ------
>> PUT(304) = 0x4004EA2:I64
>> PUT(336) = 0x400C1B0:I64
>>
>> <--- This branch becomes a nop IR-wise due to resteering.
>
> I still do not understand.
> If chasing is switched off, this would become the end of an IRSB.
>
> But even then the branch simulator would not be invoked:
> in cachegrind/cg_main.c, addEvent_Bc is only called for Ist_Exit statements,
> and addEvent_Bi only for unknown targets. Neither is the case here.
>
> So it should not matter for branch simulation that this branch becomes a nop?
> If you get different results with/without chasing here, something must be wrong...
Yes, I do see differences in the numbers when changing the chase value to 0.
The differences are not that big, so indeed it might be a bug somewhere else, but
it seems that the branch prediction (do_ind_branch_predict) __IS__ called for the
brasl calls. It just does not decide about mispredict/predict, but the calls are
at least counted, it seems.
Simple test case with 2 branches; both are detected:
# cat test.s
.globl _start
_start:
brasl 14,test # save next address into r14, call test
svc 1 # exit system call
test:
br 14 # jump to r14
# gcc -nostdlib test.s
# valgrind --tool=cachegrind --branch-sim=yes --cache-sim=no ./a.out
==72006== Cachegrind, a cache and branch-prediction profiler
==72006== Copyright (C) 2002-2015, and GNU GPL'd, by Nicholas Nethercote et al.
==72006== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==72006== Command: ./a.out
==72006==
--72006-- warning: L4 cache found, using its data for the LL simulation.
==72006==
==72006== I refs: 3
==72006==
==72006== Branches: 2 (1 cond + 1 ind)
==72006== Mispredicts: 0 (0 cond + 0 ind)
==72006== Mispred rate: 0.0% (0.0% + 0.0% )
> Josef
>
> PS: on real hardware, the branch prediction would probably become active here, as
> we may want to predict the control flow change before we can even decode
> the "brasl"... but such behavior was never implemented in the branch simulation
> in cachegrind.
|
|
From: Josef W. <Jos...@gm...> - 2016-06-20 21:39:19
|
On 20.06.2016 at 22:26, Christian Borntraeger wrote:
> Simple test case with 2 branches; both are detected:
>
> # cat test.s
> .globl _start
> _start:
> brasl 14,test # save next address into r14, call test
> svc 1 # exit system call
>
> test:
> br 14 # jump to r14
What does the VEX IR look like for this nice small example, both with and
without the chase value set to 0?
I would be interested to see the IR before and after
instrumentation, i.e. "--trace-flags=01100000".
Josef
> # gcc -nostdlib test.s
> # valgrind --tool=cachegrind --branch-sim=yes --cache-sim=no ./a.out
> ==72006== Cachegrind, a cache and branch-prediction profiler
> ==72006== Copyright (C) 2002-2015, and GNU GPL'd, by Nicholas Nethercote et al.
> ==72006== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
> ==72006== Command: ./a.out
> ==72006==
> --72006-- warning: L4 cache found, using its data for the LL simulation.
> ==72006==
> ==72006== I refs: 3
> ==72006==
> ==72006== Branches: 2 (1 cond + 1 ind)
> ==72006== Mispredicts: 0 (0 cond + 0 ind)
> ==72006== Mispred rate: 0.0% (0.0% + 0.0% )
>
>> Josef
>>
>> PS: on real hardware, the branch prediction would probably become active here, as
>> we may want to predict the control flow change before we can even decode
>> the "brasl"... but such behavior was never implemented in the branch simulation
>> in cachegrind.
|
|
From: Christian B. <bor...@de...> - 2016-06-23 11:11:30
|
On 06/20/2016 11:39 PM, Josef Weidendorfer wrote:
> On 20.06.2016 at 22:26, Christian Borntraeger wrote:
>> Simple testcase with 2 branches, both are detected:
>>
>> # cat test.s
>> .globl _start
>> _start:
>> brasl 14,test # save next address into r14, call test
>> svc 1 # exit system call
>>
>> test:
>> br 14 # jump to r14
>
> What is the VEX IR both with/without chase value set to 0
> for this nice small example?
Interestingly enough, for this example no difference can be seen
in the counts, but the IR differs.
>
> I would be interested to see the IR before and after
> instrumentation, ie. "--trace-flags=01100000".
With --vex-guest-chase-thresh=0:
==== SB 0 (evchecks 0) [tid 1] 0x800000d4 UNKNOWN_FUNCTION UNKNOWN_OBJECT+0x0
------------------------ Front end ------------------------
brasl %r14,.+8
------ IMark(0x800000D4, 6, 0) ------
PUT(304) = 0x800000DA:I64
PUT(336) = 0x800000DC:I64
PUT(336) = GET:I64(336); exit-Call
GuestBytes 800000D4 6 C0 E5 00 00 00 04 00001654
------------------------ After pre-instr IR optimisation ------------------------
IRSB {
t0:I64
------ IMark(0x800000D4, 6, 0) ------
PUT(304) = 0x800000DA:I64
PUT(336) = 0x800000DC:I64; exit-Call
}
VexExpansionRatio 6 96 160 :10
==== SB 1 (evchecks 1) [tid 1] 0x800000dc UNKNOWN_FUNCTION UNKNOWN_OBJECT+0x0
------------------------ Front end ------------------------
br %r14
------ IMark(0x800000DC, 2, 0) ------
PUT(336) = GET:I64(304)
PUT(336) = GET:I64(336); exit-Return
GuestBytes 800000DC 2 07 FE 000000F0
------------------------ After pre-instr IR optimisation ------------------------
IRSB {
t0:I32 t1:I64 t2:I64
------ IMark(0x800000DC, 2, 0) ------
t1 = GET:I64(304)
PUT(336) = t1; exit-Return
}
VexExpansionRatio 2 82 410 :10
==== SB 2 (evchecks 2) [tid 1] 0x800000da UNKNOWN_FUNCTION UNKNOWN_OBJECT+0x0
------------------------ Front end ------------------------
svc 1
------ IMark(0x800000DA, 2, 0) ------
t0 = 0x1:I64
PUT(344) = t0
PUT(408) = 0x800000DA:I64
PUT(336) = 0x800000DC:I64
PUT(336) = GET:I64(336); exit-Sys_syscall
GuestBytes 800000DA 2 0A 01 00000015
------------------------ After pre-instr IR optimisation ------------------------
IRSB {
t0:I64 t1:I64
------ IMark(0x800000DA, 2, 0) ------
PUT(344) = 0x1:I64
PUT(408) = 0x800000DA:I64
PUT(336) = 0x800000DC:I64; exit-Sys_syscall
}
With the default setting (chasing enabled):
==== SB 0 (evchecks 0) [tid 1] 0x800000d4 UNKNOWN_FUNCTION UNKNOWN_OBJECT+0x0
------------------------ Front end ------------------------
brasl %r14,.+8
------ IMark(0x800000D4, 6, 0) ------
PUT(304) = 0x800000DA:I64
PUT(336) = 0x800000DC:I64
br %r14
------ IMark(0x800000DC, 2, 0) ------
PUT(336) = GET:I64(304)
PUT(336) = GET:I64(336); exit-Return
can't show code due to extents > 1
------------------------ After pre-instr IR optimisation ------------------------
IRSB {
t0:I32 t1:I64 t2:I64
------ IMark(0x800000D4, 6, 0) ------
PUT(304) = 0x800000DA:I64
------ IMark(0x800000DC, 2, 0) ------
PUT(336) = 0x800000DA:I64; exit-Return
}
VexExpansionRatio 8 110 137 :10
==== SB 1 (evchecks 1) [tid 1] 0x800000da UNKNOWN_FUNCTION UNKNOWN_OBJECT+0x0
------------------------ Front end ------------------------
svc 1
------ IMark(0x800000DA, 2, 0) ------
t0 = 0x1:I64
PUT(344) = t0
PUT(408) = 0x800000DA:I64
PUT(336) = 0x800000DC:I64
PUT(336) = GET:I64(336); exit-Sys_syscall
GuestBytes 800000DA 2 0A 01 00000015
------------------------ After pre-instr IR optimisation ------------------------
IRSB {
t0:I64 t1:I64
------ IMark(0x800000DA, 2, 0) ------
PUT(344) = 0x1:I64
PUT(408) = 0x800000DA:I64
PUT(336) = 0x800000DC:I64; exit-Sys_syscall
}
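The rule Josef describes in this thread (indirect-branch events only for unknown targets, and no simulation for returns) can be sketched as a small C predicate. This is a toy model under those assumptions, not the code in cg_main.c: ToyJumpKind, ToyNextKind and toy_records_indirect_event are hypothetical names, and treating syscall exits as "not simulated" is an additional assumption made here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the exit kinds printed in the traces
   (exit-Boring, exit-Call, exit-Return, exit-Sys_syscall). */
typedef enum { TOY_BORING, TOY_CALL, TOY_RETURN, TOY_SYSCALL } ToyJumpKind;

/* Whether the final target of a superblock is a constant or a temporary. */
typedef enum { TOY_NEXT_CONST, TOY_NEXT_TMP } ToyNextKind;

/* Returns true if this toy model would record an indirect-branch event
   for the end of a superblock. */
static bool toy_records_indirect_event(ToyJumpKind jk, ToyNextKind next)
{
   assert(next == TOY_NEXT_CONST || next == TOY_NEXT_TMP);
   /* Assumption: only plain jumps and calls are considered; returns are
      taken to be always predicted correctly, syscalls are ignored. */
   if (jk != TOY_BORING && jk != TOY_CALL)
      return false;
   /* Only a target that is not a known constant counts as indirect. */
   return next == TOY_NEXT_TMP;
}
```

Applied to the traces above, the predicate returns false for every superblock end in both configurations (exit-Call and exit-Sys_syscall with constant targets, exit-Return with either a temporary or a constant target). That is consistent with the counts not changing for this test case, but it does not by itself account for the totals cachegrind printed.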
|
|
From: Josef W. <Jos...@gm...> - 2016-06-23 12:21:57
|
On 23.06.2016 at 13:11, Christian Borntraeger wrote:
> On 06/20/2016 11:39 PM, Josef Weidendorfer wrote:
>> On 20.06.2016 at 22:26, Christian Borntraeger wrote:
>>> Simple test case with 2 branches; both are detected:
>>>
>>> # cat test.s
>>> .globl _start
>>> _start:
>>> brasl 14,test # save next address into r14, call test
>>> svc 1 # exit system call
>>>
>>> test:
>>> br 14 # jump to r14
>>
>> What does the VEX IR look like for this nice small example, both with and
>> without the chase value set to 0?
>
> Interestingly enough, for this example no difference can be seen
> in the counts, but the IR differs.
>
>> I would be interested to see the IR before and after
>> instrumentation, i.e. "--trace-flags=01100000".
>
> with --vex-guest-chase-thresh=0
> ...
> default:
> ...
Hmm. Looks as expected.
It would be interesting to see the instrumentation added by cachegrind
when branch simulation is switched on.
Perhaps off-topic:
> br %r14
> ------ IMark(0x800000DC, 2, 0) ------
> PUT(336) = GET:I64(304)
> PUT(336) = GET:I64(336); exit-Return
How can the translation to VEX see that this is a Return?
Is %r14 expected to hold the return address according to the ABI?
Josef
|
|
From: Christian B. <bor...@de...> - 2016-06-23 12:31:08
|
On 06/23/2016 02:21 PM, Josef Weidendorfer wrote:
> On 23.06.2016 at 13:11, Christian Borntraeger wrote:
>> On 06/20/2016 11:39 PM, Josef Weidendorfer wrote:
>>> On 20.06.2016 at 22:26, Christian Borntraeger wrote:
>>>> Simple test case with 2 branches; both are detected:
>>>>
>>>> # cat test.s
>>>> .globl _start
>>>> _start:
>>>> brasl 14,test # save next address into r14, call test
>>>> svc 1 # exit system call
>>>>
>>>> test:
>>>> br 14 # jump to r14
>>>
>>> What does the VEX IR look like for this nice small example, both with and
>>> without the chase value set to 0?
>>
>> Interestingly enough, for this example no difference can be seen
>> in the counts, but the IR differs.
>>
>>> I would be interested to see the IR before and after
>>> instrumentation, i.e. "--trace-flags=01100000".
>>
>> with --vex-guest-chase-thresh=0
>> ...
>> default:
>> ...
>
> Hmm. Looks as expected.
>
> It would be interesting to see the instrumentation added by cachegrind
> when branch simulation is switched on.
Sure, will do in a week, as I am on my way to vacation ;-)
>
> Perhaps off-topic:
>
>> br %r14
>> ------ IMark(0x800000DC, 2, 0) ------
>> PUT(336) = GET:I64(304)
>> PUT(336) = GET:I64(336); exit-Return
>
> How can the translation to VEX see that this is a Return?
> Is %r14 expected to hold the return address according to the ABI?
It's tricky. There are no explicit return opcodes, but the ABI requires the
return address to be in r14 on function entry.
Of course gcc is free to shuffle things around in that function, so you can
see all kinds of registers used for the return, e.g. a br %r4 at the end of the
function when gcc spilled the r14 content to the stack and loaded the return
address into r4 before jumping back.
|
|
From: Josef W. <Jos...@gm...> - 2016-06-23 17:10:00
|
On 23.06.2016 at 14:30, Christian Borntraeger wrote:
> On 06/23/2016 02:21 PM, Josef Weidendorfer wrote:
>>>
>>> br %r14
>>> ------ IMark(0x800000DC, 2, 0) ------
>>> PUT(336) = GET:I64(304)
>>> PUT(336) = GET:I64(336); exit-Return
>>
>> How can the translation to VEX see that this is a Return?
>> Is %r14 expected to hold the return address according to the ABI?
>
> It's tricky. There are no explicit return opcodes, but the ABI requires the
> return address to be in r14 on function entry.
> Of course gcc is free to shuffle things around in that function, so you can
> see all kinds of registers used for the return, e.g. a br %r4 at the end of the
> function when gcc spilled the r14 content to the stack and loaded the return
> address into r4 before jumping back.
Heh. And cachegrind's branch simulator relies on this information: for returns,
no simulation is done, as it is assumed that branch prediction for returns is
always correct.
On the other hand, a return address stack predictor in hardware would probably
also need to rely on that information. This may be enough motivation for a compiler
to mostly use "br %r14" as the return instruction (?)
This is actually similar to x86: compilers could produce "push/ret" pairs as jumps,
or "call next; next: pop" to find out the PC address, but this is bad as it confuses
the return address stack prediction and slows down code...
Josef
|
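Josef's point about return address stack prediction can be illustrated with a toy predictor in C. This is a sketch under simplifying assumptions (fixed depth, every call pushes, every return pops and compares). ToyRas and its functions are hypothetical and do not correspond to anything in cachegrind, which, as noted in the thread, assumes that returns are always predicted correctly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_RAS_DEPTH 16

typedef struct {
   uint64_t entries[TOY_RAS_DEPTH];
   size_t   top;          /* number of valid entries */
   unsigned mispredicts;  /* returns whose target was not predicted */
} ToyRas;

static void toy_ras_init(ToyRas *ras)
{
   assert(ras != NULL);
   memset(ras, 0, sizeof *ras);
}

/* A call pushes the address of the instruction following the call. */
static void toy_ras_call(ToyRas *ras, uint64_t return_addr)
{
   assert(ras != NULL);
   if (ras->top == TOY_RAS_DEPTH) {
      /* Stack full: discard the oldest entry to make room. */
      memmove(&ras->entries[0], &ras->entries[1],
              (TOY_RAS_DEPTH - 1) * sizeof ras->entries[0]);
      ras->top--;
   }
   ras->entries[ras->top++] = return_addr;
}

/* A return pops the predicted target and compares it with the real one.
   An empty stack counts as a mispredict. */
static void toy_ras_return(ToyRas *ras, uint64_t actual_target)
{
   assert(ras != NULL);
   if (ras->top == 0 || ras->entries[--ras->top] != actual_target)
      ras->mispredicts++;
}
```

A brasl / br %r14 pair like the one in the test case pushes 0x800000DA and pops it again, so the toy predictor is right. A "call next; next: pop" sequence pushes an address that no return ever consumes, so the next real return pops the wrong entry and is counted as a mispredict.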