|
From: Konstantin S. <kon...@gm...> - 2009-02-06 07:31:15
|
Hi Valgrind developers,

I've seen the recent checkins related to stack unwinding.
Are they expected to fix all known issues with stack traces on x86_64?

This is what I see with the fresh trunk:

==27019==
==27019== Use of uninitialised value of size 8
==27019== at 0xBD3214F: something_meaningful
==27019== by 0x7A2C730C750BD1E8: ???
==27019== by 0xD075D9E9E5D0101: ???
==27019== by 0x4EAF8BD8815341D7: ???
==27019== by 0x262956DDE630E67F: ???
==27019== by 0xEBD8C81E497E5F10: ???
==27019== by 0x6C6DCB5AF54DD8FF: ???
==27019== by 0xBF7C9DAE3F6B4DF1: ???
==27019== by 0x7094A51C580E33E: ???
==27019== by 0x691AAF338D45C9C8: ???
==27019== by 0x96D13F92611B33C5: ???
==27019== by 0x1A2633D708F8FA01: ???
==27019==
==27019== Use of uninitialised value of size 8
==27019== at 0xBD3214F: something_meaningful
==27019== by 0x7A2C730C750BD1E8: ???
==27019== by 0xD075D9E9E5D0101: ???
==27019== by 0xF: ???
==27019== by 0xF: ???
==27019== by 0xF: ???
==27019== by 0xF: ???
==27019== by 0xF: ???
==27019== by 0xF: ???
==27019== by 0xF: ???
==27019== by 0xF: ???
==27019== by 0xF: ???

Is there anything I can provide you to help debug this?

Thanks,
--kcc
|
|
From: Julian S. <js...@ac...> - 2009-02-06 10:17:28
|
On Friday 06 February 2009, Konstantin Serebryany wrote:
> Hi Valgrind developers,
>
> I've seen the recent checkins related to stack unwinding.
> Are they expected to fix all known issues with stack traces on x86_64?

Yes.

> ==27019==
> ==27019== Use of uninitialised value of size 8
> ==27019== at 0xBD3214F: something_meaningful

Urr.

> Is there anything I can provide you to help debug this?

First, try out the 3.4 branch:

svn co svn://svn.valgrind.org/valgrind/branches/VALGRIND_3_4_BRANCH branch34

See if it still has the problem.

Secondly, do the usual two things:

1. See if you can make a small test case that shows it. Without that
   it will be close to impossible to fix.

2. If (1) does not work, do a binary search on the svn versions to find
   the commit number that caused the problem.

J
|
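The binary search over svn revisions suggested in step 2 can be sketched as follows. This is only an illustration: the revision numbers are made up, and `is_bad` is a hypothetical callback that in practice would check out the given revision, build it, and run the reproducer. It also assumes that once the problem appears it stays in every later revision.

```python
def first_bad_revision(good, bad, is_bad):
    """Return the lowest revision in (good, bad] for which is_bad() is true.

    Assumes `good` is a known-good revision, `bad` is a known-bad one,
    and every revision after the first bad one is also bad.
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):
            bad = mid    # the breakage is at mid or earlier
        else:
            good = mid   # the breakage is after mid
    return bad


if __name__ == "__main__":
    # Hypothetical stand-in: pretend revision 9042 introduced the problem.
    culprit = first_bad_revision(9000, 9100, lambda rev: rev >= 9042)
    print("first bad revision:", culprit)
```

Each iteration halves the range, so about seven builds are enough to cover a hundred revisions.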
|
From: Konstantin S. <kon...@gm...> - 2009-02-06 11:11:57
|
> svn co svn://svn.valgrind.org/valgrind/branches/VALGRIND_3_4_BRANCH branch34
>
> See if it still has the problem.

Same thing.

>
> Secondly, do the usual two things:
>
> 1. See if you can make a small test case that shows it. Without that
> it will be close to impossible to fix.
>

That's a challenge. :)
Right now I have only one reproducer somewhere inside the openssl's
assembly file:

==11896== Use of uninitialised value of size 8
==11896== at 0xC32B140: bn_mul_mont (x86_64-mont.s:151)
==11896== by 0xEFD8ADCFE9793F71: ???
==11896== by 0x4DC04AA2FB5DAAB0: ???
==11896== by 0xB18F5B34F8340518: ???
==11896== by 0x9629706EA81DAD54: ???
...

Maybe this file
(ftp://ftp.free.fr/.mirrors1/ftp.netbsd.org/NetBSD-current/src/crypto/dist/openssl/crypto/bn/asm/x86_64-mont.pl)
has some valgrind-unfriendly stuff?
Will see if I can get something simpler...

--kcc

> 2. If (1) does not work, do a binary search on the svn versions to find
> the commit number that caused the problem.
>
> J
>
|
|
From: Julian S. <js...@ac...> - 2009-02-06 11:25:41
|
> That's a challenge. :)
> Right now I have only one reproducer somewhere inside the openssl's
> assembly file:
>
> ==11896== Use of uninitialised value of size 8
> ==11896== at 0xC32B140: bn_mul_mont (x86_64-mont.s:151)

Ah, handwritten assembly. A known source of problems. If the authors
did not also write correct unwind information by hand, then unwinding
will have problems.

Next step is to get gdb to stop at that precise instruction and see if
it can unwind the stack. (Maybe simplest to use --db-attach=yes).

J
|
|
From: Tom H. <to...@co...> - 2009-02-06 11:27:42
|
Konstantin Serebryany wrote:

> That's a challenge. :)
> Right now I have only one reproducer somewhere inside the openssl's
> assembly file:
>
> ==11896== Use of uninitialised value of size 8
> ==11896== at 0xC32B140: bn_mul_mont (x86_64-mont.s:151)
> ==11896== by 0xEFD8ADCFE9793F71: ???
> ==11896== by 0x4DC04AA2FB5DAAB0: ???
> ==11896== by 0xB18F5B34F8340518: ???
> ==11896== by 0x9629706EA81DAD54: ???
> ...

That's a hand crafted assembler routine, so unless the author has either
taken the trouble to set up a traditional x86 stack frame by pushing the
frame pointer, or has added DWARF declarations to describe how to unwind the
stack, then valgrind won't be able to unwind out of it.

Can gdb unwind out of that function if you set a break point inside it?

Tom

--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
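To make the frame-pointer case concrete, here is a toy model of walking a chain of traditional stack frames. It is a sketch under stated assumptions, not Valgrind's unwinder: memory is modelled as a plain dict, each frame is assumed to store the caller's saved frame pointer at `[rbp]` and the return address at `[rbp + 8]`, and all addresses are invented.

```python
def unwind_by_frame_pointer(memory, rbp, max_frames=12):
    """Follow saved frame pointers and collect return addresses.

    memory: dict mapping address -> 8-byte value (a toy stand-in for RAM).
    rbp:    frame pointer value at the point where unwinding starts.
    """
    return_addrs = []
    while rbp in memory and rbp + 8 in memory and len(return_addrs) < max_frames:
        return_addrs.append(memory[rbp + 8])  # return address into the caller
        rbp = memory[rbp]                     # step to the caller's frame
    return return_addrs


if __name__ == "__main__":
    # Two well-formed frames; the outermost saved frame pointer is 0.
    stack = {
        0x7000: 0x7020, 0x7008: 0x401111,
        0x7020: 0x0,    0x7028: 0x402222,
    }
    print([hex(a) for a in unwind_by_frame_pointer(stack, 0x7000)])
    # A routine that uses rbp as a scratch register leaves a value that
    # does not point at a frame, so the chain cannot be followed.
    print(unwind_by_frame_pointer(stack, 0xDEADBEEF))
```

In this toy model a bogus frame pointer simply ends the walk. A real unwinder reading real memory may instead pick up whatever bytes happen to be there, which is consistent with the meaningless 64-bit "addresses" in the traces above.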
|
From: Konstantin S. <kon...@gm...> - 2009-02-06 12:09:04
|
On Fri, Feb 6, 2009 at 2:27 PM, Tom Hughes <to...@co...> wrote:
> Konstantin Serebryany wrote:
>
>> That's a challenge. :)
>> Right now I have only one reproducer somewhere inside the openssl's
>> assembly file:
>>
>> ==11896== Use of uninitialised value of size 8
>> ==11896== at 0xC32B140: bn_mul_mont (x86_64-mont.s:151)
>> ==11896== by 0xEFD8ADCFE9793F71: ???
>> ==11896== by 0x4DC04AA2FB5DAAB0: ???
>> ==11896== by 0xB18F5B34F8340518: ???
>> ==11896== by 0x9629706EA81DAD54: ???
>> ...
>
> That's a hand crafted assembler routine, so unless the author has either
> taken the trouble to set up a traditional x86 stack frame by pushing the
> frame pointer, or has added DWARF declarations to describe how to unwind the
> stack, then valgrind won't be able to unwind out of it.
>
> Can gdb unwind out of that function if you set a break point inside it?
If I run the program under gdb and set a break point in that function,
gdb can unwind.
Breakpoint 1, bn_mul_mont () at x86_64-mont.s:7
7 x86_64-mont.s: No such file or directory.
in x86_64-mont.s
Current language: auto; currently asm
(gdb) bt
#0 bn_mul_mont () at x86_64-mont.s:7
#1 0x00002aaab23cabe5 in BN_mod_mul_montgomery (r=0x2aaab692f220,
a=0x2aaab737a718, b=0x2aaab7374770, mont=0x2aaab7374768,
ctx=0x2aaab698fcc8) at bn_mont.c:159
#2 0x00002aaab23c089d in BN_mod_exp_mont (rr=0x2aaab737a718,
a=0x2aaab737a718, p=0x2aaab737a698, m=<value optimized out>,
ctx=0x2aaab698fcc8, in_mont=0x0) at bn_exp.c:434
#3 0x00002aaab23c6431 in BN_BLINDING_create_param (b=0x0, e=<value
optimized out>, m=<value optimized out>, ctx=0x2aaab698fcc8,
bn_mod_exp=0x2aaab23c0630 <BN_mod_exp_mont>, m_ctx=0x0) at
bn_blind.c:352
#4 0x00002aaab23dd66f in RSA_setup_blinding (rsa=0x2aaab69604f8,
in_ctx=0x0) at rsa_lib.c:424
#5 0x00002aaab23dd89e in RSA_blinding_on (rsa=0x2aaab69604f8,
ctx=0x0) at rsa_lib.c:337
....
If I run valgrind with --db-attach=yes, gdb can't unwind.
(gdb) bt
#0 bn_mul_mont () at x86_64-mont.s:151
#1 0xc798652805958912 in ?? ()
#2 0x45729487728cd440 in ?? ()
#3 0x9d5b9d6a21ce321c in ?? ()
#4 0xbc472223dd03bce1 in ?? ()
#5 0xf832d2e8fb669bc2 in ?? ()
#6 0xdcfeae38f9da1b0d in ?? ()
#7 0x87511babdc7fa779 in ?? ()
So, I'll let you know if I find cases w/o hand written assembly.
Thanks for the explanation!
--kcc
>
> Tom
>
> --
> Tom Hughes (to...@co...)
> http://www.compton.nu/
>
|
|
From: Julian S. <js...@ac...> - 2009-02-06 12:13:54
|
> >> ==11896== Use of uninitialised value of size 8
> >> ==11896== at 0xC32B140: bn_mul_mont (x86_64-mont.s:151)
> >> ==11896== by 0xEFD8ADCFE9793F71: ???
> >> ==11896== by 0x4DC04AA2FB5DAAB0: ???
> >> ==11896== by 0xB18F5B34F8340518: ???
> >> ==11896== by 0x9629706EA81DAD54: ???
> >> ...
> >
> > That's a hand crafted assembler routine, so unless the author has either
> > taken the trouble to set up a traditional x86 stack frame by pushing the
> > frame pointer, or has added DWARF declarations to describe how to unwind
> > the stack, then valgrind won't be able to unwind out of it.
> >
> > Can gdb unwind out of that function if you set a break point inside it?
>
> If I run the program under gdb and set a break point in that function,
> gdb can unwind.
>
> Breakpoint 1, bn_mul_mont () at x86_64-mont.s:7
> 7 x86_64-mont.s: No such file or directory.
> in x86_64-mont.s
> Current language: auto; currently asm
> (gdb) bt
> #0 bn_mul_mont () at x86_64-mont.s:7
> #1 0x00002aaab23cabe5 in BN_mod_mul_montgomery (r=0x2aaab692f220,
> a=0x2aaab737a718, b=0x2aaab7374770, mont=0x2aaab7374768,
> ctx=0x2aaab698fcc8) at bn_mont.c:159

You need to check that GDB can unwind from that specific instruction,
not just from somewhere inside the function. Since the page number
will be different, but the page offset will be the same, you need to
get GDB to the instruction whose lowest 12 bits are 0x140 (since
Valgrind reports 0xC32B140) and see if you can unwind from there.

J
|
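The page-offset arithmetic in the message above can be checked with a few lines. This sketch assumes 4 KiB pages (hence 12 offset bits) and that the library is mapped at a page-aligned base in both runs. The gdb-side addresses are hypothetical, since the thread does not give them.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages, so the offset is the low 12 bits


def page_offset(addr):
    """Return the offset of addr within its page."""
    return addr & (PAGE_SIZE - 1)


def may_be_same_instruction(valgrind_addr, gdb_addr):
    """True if both addresses share a page offset.

    Equal offsets are necessary but not sufficient: two different
    instructions on different pages can share an offset too.
    """
    return page_offset(valgrind_addr) == page_offset(gdb_addr)


if __name__ == "__main__":
    print(hex(page_offset(0xC32B140)))  # address reported by Valgrind
    # Hypothetical addresses as gdb might show them in a native run:
    print(may_be_same_instruction(0xC32B140, 0x2AAAB23F5140))
    print(may_be_same_instruction(0xC32B140, 0x2AAAB23F5007))
```

Because a match on the offset alone is ambiguous, it is worth also confirming the source line (x86_64-mont.s:151 in Valgrind's report) before drawing conclusions from the gdb backtrace.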
|
From: Konstantin S. <kon...@gm...> - 2009-02-06 12:58:17
|
> You need to check that GDB can unwind from that specific instruction,
> not just from somewhere inside the function. Since the page number
> will be different, but the page offset will be the same, you need to
> get GDB to the instruction whose lowest 12 bits are 0x140 (since
> Valgrind reports 0xC32B140) and see if you can unwind from there.

Indeed...
If I break at the function in gdb, I can get the stack trace.
But after typing 'n' several times, I lose this ability.
wow!

--kcc
|
|
From: Julian S. <js...@ac...> - 2009-02-06 14:39:36
|
On Friday 06 February 2009, Konstantin Serebryany wrote:
> > You need to check that GDB can unwind from that specific instruction,
> > not just from somewhere inside the function. Since the page number
> > will be different, but the page offset will be the same, you need to
> > get GDB to the instruction whose lowest 12 bits are 0x140 (since
> > Valgrind reports 0xC32B140) and see if you can unwind from there.
>
> Indeed...
> If I break at the function in gdb, I can get the stack trace.
> But after typing 'n' several times, I lose this ability.
> wow!

Yes. It's a common problem for handwritten assembly on x86_64 Linux.

J
|
|
From: Konstantin S. <kon...@gm...> - 2009-02-06 13:08:33
|
Just checking: the mechanism used to get stack traces in exp-ptrcheck
will *not* be confused by such hand-written assembly, right?

--kcc

On Fri, Feb 6, 2009 at 3:58 PM, Konstantin Serebryany <kon...@gm...> wrote:
>> You need to check that GDB can unwind from that specific instruction,
>> not just from somewhere inside the function. Since the page number
>> will be different, but the page offset will be the same, you need to
>> get GDB to the instruction whose lowest 12 bits are 0x140 (since
>> Valgrind reports 0xC32B140) and see if you can unwind from there.
>
> Indeed...
> If I break at the function in gdb, I can get the stack trace.
> But after typing 'n' several times, I lose this ability.
> wow!
>
> --kcc
>
|
|
From: Tom H. <to...@co...> - 2009-02-06 13:45:33
|
Konstantin Serebryany wrote:

> Just checking: the mechanism used to get stack traces in exp-ptrcheck
> will *not* be confused by such hand-written assembly, right?

Of course it will. To start with we only have one piece of code for getting
stack traces, and that is used everywhere. Plus if we had a magic way to get
the stack trace in one place we'd use it in all the other places as well...

Tom

--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: Konstantin S. <kon...@gm...> - 2009-02-06 13:47:36
|
On Fri, Feb 6, 2009 at 4:45 PM, Tom Hughes <to...@co...> wrote:
> Konstantin Serebryany wrote:
>
>> Just checking: the mechanism used to get stack traces in exp-ptrcheck
>> will *not* be confused by such hand-written assembly, right?
>
> Of course it will. To start with we only have one piece of code for getting
> stack traces, and that is used everywhere. Plus if we had a magic way to get
> the stack trace in one place we'd use it in all the other places as well...

exp-ptrcheck has its own stack trace machinery. (right, Julian?)
It does not unwind the stack, instead it tracks each call/return
(roughly speaking).
ThreadSanitizer does the same.

--kcc

>
> Tom
>
> --
> Tom Hughes (to...@co...)
> http://www.compton.nu/
>
|
|
From: Julian S. <js...@ac...> - 2009-02-06 14:20:37
|
On Friday 06 February 2009, Konstantin Serebryany wrote:
> On Fri, Feb 6, 2009 at 4:45 PM, Tom Hughes <to...@co...> wrote:
> > Konstantin Serebryany wrote:
> >> Just checking: the mechanism used to get stack traces in exp-ptrcheck
> >> will *not* be confused by such hand-written assembly, right?
> >
> > Of course it will. To start with we only have one piece of code for
> > getting stack traces, and that is used everywhere. Plus if we had a magic
> > way to get the stack trace in one place we'd use it in all the other
> > places as well...
>
> exp-ptrcheck has its own stack trace machinery. (right, Julian?)
> It does not unwind the stack, instead it tracks each call/return
> (roughly speaking).
> ThreadSanitizer does the same.

You're both right :-)

exp-ptrcheck uses the same scheme as Callgrind has used for years, to
track call and return instructions and thereby create a shadow stack.
I believe this will not be confused by missing CFI data since it does
not use it.

However, exp-ptrcheck also uses the "standard" stack unwinding to
construct error messages, and so that will be confused, yes.

J
|
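The shadow-stack idea described above can be illustrated with a small model. This is a conceptual sketch only and is not taken from Callgrind or exp-ptrcheck: the class, its method names, and the addresses are all hypothetical, and a real tool has to cope with many more cases (tail calls, longjmp, signals).

```python
class ShadowStack:
    """Toy shadow stack maintained from observed call/return events."""

    def __init__(self):
        self._frames = []  # each entry: (callee name, return address)

    def on_call(self, callee, return_addr):
        # Record where the callee is expected to return to.
        self._frames.append((callee, return_addr))

    def on_return(self, target_addr):
        # Pop back to the innermost frame whose recorded return address
        # matches the address actually returned to. If nothing matches,
        # leave the stack unchanged.
        for i in range(len(self._frames) - 1, -1, -1):
            if self._frames[i][1] == target_addr:
                del self._frames[i:]
                return

    def backtrace(self):
        # Innermost frame first, like a debugger backtrace.
        return [name for name, _ in reversed(self._frames)]


if __name__ == "__main__":
    s = ShadowStack()
    s.on_call("BN_mod_exp_mont", 0x1000)
    s.on_call("BN_mod_mul_montgomery", 0x2000)
    s.on_call("bn_mul_mont", 0x3000)
    print(s.backtrace())
    s.on_return(0x3000)
    print(s.backtrace())
```

The trace comes entirely from events seen as the program runs, so it needs neither frame pointers nor CFI at the point where it is queried. The trade-off is that every call and return has to be tracked, whether or not an error is ever reported.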