From: Eliot M. <mo...@cs...> - 2020-05-09 12:17:16
On 5/9/2020 2:42 AM, Paul FLOYD wrote:
> Hi
>
> I've been looking at some differences that I get when building with clang.
>
> One kind of difference that I see is that Helgrind displays one less
> element in callstacks. For instance with a GCC build I might get
>
> ==34086== ---Thread-Announcement------------------------------------------
> ==34086==
> ==34086== Thread #2 was created
> ==34086==    at 0x4D144BA: thr_new (in /lib/libc.so.7)
> ==34086==    by 0x4C6639C: pthread_create (in /lib/libthr.so.3)
> ==34086==    by 0x4A5098A: pthread_create_WRK (hg_intercepts.c:433)
> ==34086==    by 0x4A5199C: pthread_create (hg_intercepts.c:472)
> ==34086==    by 0x400935: main (tc01_simple_race.c:22)
>
> but the same with a clang build gives
>
> ==37539== ---Thread-Announcement------------------------------------------
> ==37539==
> ==37539== Thread #3 was created
> ==37539==    at 0x491E4BA: thr_new (in /lib/libc.so.7)
> ==37539==    by 0x487039C: pthread_create (in /lib/libthr.so.3)
> ==37539==    by 0x4855B44: pthread_create_WRK (hg_intercepts.c:434)
> ==37539==    by 0x400B7A: main (tc21_pthonce.c:87)
>
> (note there is no pthread_create/hg_intercepts.c line).
>
> I think that the cause of this is the clang codegen in the helgrind
> preload lib.
>
> Here is the GCC codegen, a classic function call
>
>
> 000000000000999f <_vgw00000ZZ_libthrZdsoZa_pthreadZujoin>:
>     999f:  55                      push   %rbp
>     99a0:  48 89 e5                mov    %rsp,%rbp
>     99a3:  e8 a5 c1 ff ff          callq  5b4d
>
>     99a8:  5d                      pop    %rbp
>     99a9:  c3                      retq
>
> Clang optimizes the call and uses a jump
>
> 000000000000a9f0 <_vgw00000ZZ_libthrZdsoZa_pthreadZucreate>:
>     a9f0:  55                      push   %rbp
>     a9f1:  48 89 e5                mov    %rsp,%rbp
>     a9f4:  5d                      pop    %rbp
>     a9f5:  eb 09                   jmp    aa00
>
>     a9f7:  66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
>     a9fe:  00 00
>
> Am I right in assuming it is this 'jmp' rather than 'callq / retq' that
> is causing "VG_(get_StackTrace_wrk)" to see one less element in the
> callstack?
>
> The difference goes away if I force 'pthread_create' to be not optimized.
>
> #define PTH_FUNC(ret_ty, f, args...) \
>    ret_ty I_WRAP_SONAME_FNNAME_ZZ(VG_Z_LIBPTHREAD_SONAME,f)(args); \
>    __attribute__((optnone)) \
>    ret_ty I_WRAP_SONAME_FNNAME_ZZ(VG_Z_LIBPTHREAD_SONAME,f)(args)
>
> (Not that I'm seriously suggesting that).

I believe what you are seeing is called "tail call elimination" - if a
function ends with a call, that call can be optimized to a jump, at
least under some circumstances. This is perfectly legitimate.

Regards - Eliot Moss
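
For illustration, here is a minimal sketch of the pattern in plain C. The names create_worker, create_wrapper and create_wrapper_keeps_frame are hypothetical stand-ins, not the actual Helgrind intercepts. Whether a given compiler turns the first wrapper's call into a jump depends on the compiler and the optimization level; both GCC and clang accept -fno-optimize-sibling-calls to suppress the transformation.

```c
/* Hypothetical stand-in for pthread_create_WRK: does the real work.
 * noinline keeps it a separate function so the call site is visible. */
__attribute__((noinline))
static int create_worker(int arg)
{
    return arg * 2 + 1;
}

/* Hypothetical stand-in for the pthread_create wrapper.
 * The call is in tail position: nothing remains to be done after it
 * except returning its result. An optimizing compiler may therefore
 * pop this function's frame and emit "jmp create_worker" instead of
 * "call create_worker; ret". If it does, the worker returns directly
 * to this wrapper's caller, and a stack unwinder running inside the
 * worker no longer finds a frame for the wrapper. */
int create_wrapper(int arg)
{
    return create_worker(arg);
}

/* Same result plus one, but the call is NOT in tail position: the
 * addition happens after the worker returns, so this function's frame
 * must stay live across the call and will show up in a backtrace. */
int create_wrapper_keeps_frame(int arg)
{
    int ret = create_worker(arg);
    return ret + 1;
}
```

Compiling this with optimization enabled and disassembling the object file (for example with objdump -d) should show whether create_wrapper ends in a jmp or in a call followed by ret, which is the difference visible in the two listings quoted above.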