From: Christian B. <bor...@de...> - 2014-10-09 18:00:22
On 09.10.2014 16:19, Josef Weidendorfer wrote:
> On 09.10.2014 at 10:33, Christian Borntraeger wrote:
>> On 08.10.2014 20:26, Josef Weidendorfer wrote:
>>> On 08.10.2014 at 13:29, Christian Borntraeger wrote:
>>>>>> many-xpts valgrind-new:0.07s no: 0.6s ( 9.0x, -----) [...] ca:374.8s (5353.6x, -----) [...]
>>>>>> many-xpts valgrind-old:0.07s no: 0.6s ( 9.0x, 0.0%) [...] ca:371.9s (5312.4x, 0.8%) [...]
>>>>
>>> Can you send me the callgrind.out result of a Callgrind run of
>>> many-xpts on s390?
>>
>>
>> See attachment.
>
> If I run callgrind on many-xpts here on my laptop, it just calls a1 from a0.
> In your file, there are recursive calls of a0 to a0 with huge call cost.
>
> What does the disassembled code for a0 look like?
>
> Josef
It's pretty straightforward:
stmg saves registers to memory (r15 is the stack pointer)
lmg loads them back
tm?? is test under mask (tests the selected bits)
j* are jumps
brasl is branch relative and save long; here r14 is filled with the address of the next instruction (r14 is the return register)
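To make the brasl note concrete, here is a minimal C sketch of how one of these 6-byte encodings can be decoded by hand. It assumes the RIL layout (opcode bytes, then a signed 32-bit offset counted in halfwords) and 64-bit addressing mode; decode_brasl and brasl_result are hypothetical names made up for this illustration, not part of Valgrind or VEX.

```c
#include <stdint.h>

/* Hypothetical helper type: where the branch goes and what lands in r14. */
typedef struct {
    uint64_t target; /* branch destination */
    uint64_t link;   /* value saved in r14: address of the next instruction */
} brasl_result;

/* Decode a brasl located at addr, given its 6 raw bytes.
 * Assumption: bytes 2..5 hold a big-endian signed 32-bit offset
 * counted in halfwords (2-byte units), relative to addr. */
static brasl_result decode_brasl(uint64_t addr, const uint8_t insn[6])
{
    uint32_t raw = ((uint32_t)insn[2] << 24) | ((uint32_t)insn[3] << 16) |
                   ((uint32_t)insn[4] << 8)  |  (uint32_t)insn[5];
    int64_t halfwords = (int32_t)raw;          /* sign-extend the offset */

    brasl_result r;
    r.target = (uint64_t)((int64_t)addr + halfwords * 2);
    r.link   = addr + 6;                       /* brasl is 6 bytes long */
    return r;
}
```

Applied to the bytes c0 e5 ff ff ff e1 at 8000094a, this gives target 8000090c (a2) and a link address of 80000950, which matches the listing.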
0000000080000938 <a1>:
80000938: eb ef f0 70 00 24 stmg %r14,%r15,112(%r15)
8000093e: a7 fb ff 60 aghi %r15,-160
80000942: a7 21 00 02 tmll %r2,2
80000946: a7 84 00 07 je 80000954 <a1+0x1c>
8000094a: c0 e5 ff ff ff e1 brasl %r14,8000090c <a2>
80000950: a7 f4 00 05 j 8000095a <a1+0x22>
80000954: c0 e5 ff ff ff dc brasl %r14,8000090c <a2>
8000095a: eb ef f1 10 00 04 lmg %r14,%r15,272(%r15)
80000960: 07 fe br %r14
80000962: 07 07 nopr %r7
0000000080000964 <a0>:
80000964: eb ef f0 70 00 24 stmg %r14,%r15,112(%r15)
8000096a: a7 fb ff 60 aghi %r15,-160
8000096e: a7 21 00 01 tmll %r2,1
80000972: a7 84 00 07 je 80000980 <a0+0x1c>
80000976: c0 e5 ff ff ff e1 brasl %r14,80000938 <a1>
8000097c: a7 f4 00 05 j 80000986 <a0+0x22>
80000980: c0 e5 ff ff ff dc brasl %r14,80000938 <a1>
80000986: eb ef f1 10 00 04 lmg %r14,%r15,272(%r15)
8000098c: 07 fe br %r14
8000098e: 07 07 nopr %r7
0000000080000990 <main>:
80000990: eb bf f0 58 00 24 stmg %r11,%r15,88(%r15)
80000996: c0 d0 00 00 00 c1 larl %r13,80000b18 <_IO_stdin_used+0x18>
8000099c: a7 fb ff 60 aghi %r15,-160
800009a0: a7 b8 00 00 lhi %r11,0
800009a4: a5 ce 00 04 llilh %r12,4
800009a8: b9 14 00 2b lgfr %r2,%r11
800009ac: c0 e5 ff ff ff dc brasl %r14,80000964 <a0>
800009b2: a7 ba 00 01 ahi %r11,1
800009b6: a7 c6 ff f9 brct %r12,800009a8 <main+0x18>
800009ba: a7 b8 00 00 lhi %r11,0
800009be: 58 c0 d0 00 l %r12,0(%r13)
800009c2: a7 29 00 ea lghi %r2,234
800009c6: c0 e5 ff ff ff b9 brasl %r14,80000938 <a1>
800009cc: c0 e5 ff ff fd 88 brasl %r14,800004dc <free@plt>
800009d2: a7 29 00 6f lghi %r2,111
800009d6: c0 e5 ff ff ff 9b brasl %r14,8000090c <a2>
800009dc: c0 e5 ff ff fd 80 brasl %r14,800004dc <free@plt>
800009e2: a7 ba 00 01 ahi %r11,1
800009e6: a7 c6 ff ee brct %r12,800009c2 <main+0x32>
800009ea: a7 29 00 00 lghi %r2,0
800009ee: e3 40 f1 10 00 04 lg %r4,272(%r15)
800009f4: eb bf f0 f8 00 04 lmg %r11,%r15,248(%r15)
800009fa: 07 f4 br %r4
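For readers who do not read s390 assembly, the following is a rough C sketch of what the a0 and a1 listings appear to do. It is reconstructed from the disassembly only and is not the actual many-xpts source. The void * return type is an assumption (main passes the result on to free), a2 is a stand-in because its body is not shown, and the calls_* counters are hypothetical additions for illustration.

```c
#include <stddef.h>

/* Hypothetical call counters, added only for this illustration. */
static unsigned long calls_a0, calls_a1, calls_a2;

/* Stand-in for a2: its code is not part of the listing. */
static void *a2(int x)
{
    (void)x;
    calls_a2++;
    return NULL;
}

static void *a1(int x)
{
    calls_a1++;
    /* tmll %r2,2 tests bit 1; both arms are brasl %r14,<a2>. */
    return (x & 2) ? a2(x) : a2(x);
}

static void *a0(int x)
{
    calls_a0++;
    /* tmll %r2,1 tests bit 0; both arms are brasl %r14,<a1>. */
    return (x & 1) ? a1(x) : a1(x);
}
```

In this sketch every call to a0 makes exactly one call to a1 and never calls a0 again, so the code itself contains no a0-to-a0 recursion.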
Let me know if you need the VEX tracing.