|
From: Kerrick S. <ma...@ke...> - 2012-10-04 18:19:43
|
Hello, I'm running Valgrind on Mono as it executes a C# program. The C# program in question invokes several native routines in several different shared object files, and it's crashing in one of these native routines. The following is a snippet from Valgrind's output: ==3652== Thread 5: ==3652== Conditional jump or move depends on uninitialised value(s) ==3652== at 0xA55A385: ??? ==3652== by 0xA55A1F7: ??? ==3652== by 0x78C8074: ??? ==3652== by 0x8064415: mono_jit_runtime_invoke (mini.c:5791) ==3652== by 0x81AF4EE: mono_runtime_invoke (object.c:2755) ==3652== by 0x81AF8F3: mono_runtime_delegate_invoke (object.c:3420) ==3652== by 0x8202FEB: start_wrapper (threads.c:790) ==3652== by 0x82305EE: thread_start_routine (wthreads.c:287) ==3652== by 0x416FD5D: clone (clone.S:130) ==3652== ==3652== Invalid read of size 4 ==3652== at 0xA55A543: ??? ==3652== by 0xA55A1F7: ??? ==3652== by 0x78C8074: ??? ==3652== by 0x8064415: mono_jit_runtime_invoke (mini.c:5791) ==3652== by 0x81AF4EE: mono_runtime_invoke (object.c:2755) ==3652== by 0x81AF8F3: mono_runtime_delegate_invoke (object.c:3420) ==3652== by 0x8202FEB: start_wrapper (threads.c:790) ==3652== by 0x82305EE: thread_start_routine (wthreads.c:287) ==3652== by 0x416FD5D: clone (clone.S:130) ==3652== Address 0x0 is not stack'd, malloc'd or (recently) free'd ==3652== ==3652== Thread 5 return signal frame corrupted. Killing process. ==3652== ==3652== Process terminating with default action of signal 11 (SIGSEGV) ==3652== General Protection Fault ==3652== at 0x408F655: ??? (in /lib/libpthread-2.11.3.so) ==3652== by 0xA55A1F7: ??? ==3652== by 0x78C8074: ??? ==3652== by 0x8064415: mono_jit_runtime_invoke (mini.c:5791) ==3652== by 0x81AF4EE: mono_runtime_invoke (object.c:2755) ==3652== by 0x81AF8F3: mono_runtime_delegate_invoke (object.c:3420) ==3652== by 0x8202FEB: start_wrapper (threads.c:790) ==3652== by 0x82305EE: thread_start_routine (wthreads.c:287) ==3652== by 0x416FD5D: clone (clone.S:130) I can't track down the error since the stack trace doesn't indicate which shared object and function it occurs in. According to http://valgrind.org/docs/manual/faq.html#faq.unhelpful, if a shared object is unloaded before the program terminates, ??? entries will appear in the stack trace, so I'm guessing that Mono is dynamically unloading the shared object after the segfault. I'm unsure as to whether this hunch even makes sense, though. Is there anything I can do on either the Valgrind or the Mono side to get more information from the stack trace? Thanks, Kerrick |
|
From: Tom H. <to...@co...> - 2012-10-04 18:40:35
|
On 04/10/12 19:19, Kerrick Staley wrote: > I'm running Valgrind on Mono as it executes a C# program. The C# program > in question invokes several native routines in several different shared > object files, and it's crashing in one of these native routines. The > following is a snippet from Valgrind's output: > > ==3652== Thread 5: > ==3652== Conditional jump or move depends on uninitialised value(s) > ==3652== at 0xA55A385: ??? > ==3652== by 0xA55A1F7: ??? > ==3652== by 0x78C8074: ??? > ==3652== by 0x8064415: mono_jit_runtime_invoke (mini.c:5791) > ==3652== by 0x81AF4EE: mono_runtime_invoke (object.c:2755) > ==3652== by 0x81AF8F3: mono_runtime_delegate_invoke (object.c:3420) > ==3652== by 0x8202FEB: start_wrapper (threads.c:790) > ==3652== by 0x82305EE: thread_start_routine (wthreads.c:287) > ==3652== by 0x416FD5D: clone (clone.S:130) The "mono_jit_runtime_invoke" is a big clue - those will be run time generated code that the JIT has created so they won't be in the symbol table. > ==3652== Invalid read of size 4 > ==3652== at 0xA55A543: ??? > ==3652== by 0xA55A1F7: ??? > ==3652== by 0x78C8074: ??? > ==3652== by 0x8064415: mono_jit_runtime_invoke (mini.c:5791) > ==3652== by 0x81AF4EE: mono_runtime_invoke (object.c:2755) > ==3652== by 0x81AF8F3: mono_runtime_delegate_invoke (object.c:3420) > ==3652== by 0x8202FEB: start_wrapper (threads.c:790) > ==3652== by 0x82305EE: thread_start_routine (wthreads.c:287) > ==3652== by 0x416FD5D: clone (clone.S:130) > ==3652== Address 0x0 is not stack'd, malloc'd or (recently) free'd Something has tried to read through a null pointer. > ==3652== Thread 5 return signal frame corrupted. Killing process. I think this means that something in a signal handler has written to bits of the stack it shouldn't have, corrupting the information that is stored there for use when returning from the signal. Tom -- Tom Hughes (to...@co...) http://compton.nu/ |
|
From: Philippe W. <phi...@sk...> - 2012-10-04 18:50:23
|
On Thu, 2012-10-04 at 13:19 -0500, Kerrick Staley wrote: > I can't track down the error since the stack trace doesn't indicate > which shared object and function it occurs in. > > According to http://valgrind.org/docs/manual/faq.html#faq.unhelpful, > if a shared object is unloaded before the program terminates, ??? > entries will appear in the stack trace, so I'm guessing that Mono is > dynamically unloading the shared object after the segfault. I'm unsure > as to whether this hunch even makes sense, though. Is there anything I > can do on either the Valgrind or the Mono side to get more information > from the stack trace? To my knowledge, ??? can only appear for stacktraces which are produced after the object is unloaded (e.g. for stacktraces for leaks). Your stacktraces are for errors which are reported directly. I suppose that a shared object is not unloaded while it is being executed (i.e. is on the call stack) as this would not behave properly I guess. I know close to 0 about C# but IIUC, C# is typically run in by in a JITted environment. If that is effectively the case, have you given the argument --smc-check=all (or =all-non-file) ? One of these two is for sure mandatory in a JITted env (on x86/amd64 at least). Otherwise, you might always try using gdb/vgdb to connect to the process under Valgrind when the error is raised : gdb might maybe help to see what is going on. Philippe |
|
From: Kerrick S. <ma...@ke...> - 2012-10-09 17:10:45
|
On Thu, Oct 4, 2012 at 1:40 PM, Tom Hughes <to...@co...> wrote: > The "mono_jit_runtime_invoke" is a big clue - those will be run time > generated code that the JIT has created so they won't be in the symbol table. Ah, that makes a lot more sense than my theory. On Thu, Oct 4, 2012 at 1:50 PM, Philippe Waroquiers <phi...@sk...> wrote: > If that is effectively the case, have you given > the argument --smc-check=all (or =all-non-file) ? > One of these two is for sure mandatory in a JITted env (on x86/amd64 > at least). Yes, I'm passing this argument. > Otherwise, you might always try using gdb/vgdb to connect to the process > under Valgrind when the error is raised : gdb might maybe help > to see what is going on. You mean I should use --db-attach=yes (as Greg suggested)? On Thu, Oct 4, 2012 at 5:38 PM, Gregory Junker <gj...@da...> wrote: > Well, for starters, you can look at what line 5791 in mini.c is doing: > > https://github.com/mono/mono/blob/master/mono/mini/mini.c Not everyone runs master :) https://github.com/mono/mono/blob/mono-2-10-9/mono/mini/mini.c#L5791 It looks like the line is calling runtime_invoke, which means it's running JIT code (as others suggested). > One thing you can do with Valgrind is have it break to debugger on errors, > by supplying the --attach-db=yes option. If you still can't get function > names in GDB, you may be able to get module names (the addresses look to be > in different modules than the Mono runtime), and from there you might have > to get your hands dirty with nm or objdump. OK, I'll try seeing what I can turn up with gdb. Initial results aren't promising; when I try the disassemble command at the site of the error, it tells me "No function contains program counter for selected frame". On Fri, Oct 5, 2012 at 1:11 PM, Rodrigo Kumpera <ku...@gm...> wrote: > Mono features a JIT that dynamically generates code and those unknown frames > are precisely that. > Look at the man page to see how to enable valgrind integration to show > actual names instead. Per the manpage, I tried setting MONO_XDEBUG=true and passing --debug=gdb to Mono; neither caused the ???'s to be replaced with useful information. Mono generated an xdb.il and an xdb.s file when I did this, but it doesn't seem like GDB can use these files. I'm going to see if I can get anywhere with the --aot flag (enabling ahead-of-time compilation). I also noticed this blog post describing Valgrind integration for Mono: http://tirania.org/blog/archive/2007/Jun-29.html but it doesn't appear that that patch ever made it into Mono: https://github.com/search?q=repo%3Amono%2Fmono+JitEntry&type=Code - Kerrick |
|
From: Philippe W. <phi...@sk...> - 2012-10-09 21:02:15
|
On Tue, 2012-10-09 at 11:40 -0500, Kerrick Staley wrote: > > Otherwise, you might always try using gdb/vgdb to connect to the process > > under Valgrind when the error is raised : gdb might maybe help > > to see what is going on. > > You mean I should use --db-attach=yes (as Greg suggested)? Since Valgrind 3.7.0, Valgrind contains an embedded gdbserver, to which you connect from gdb using vgdb as a relay application. The advantages of gdb/vgdb compared to --db-attach is that you get all the usual gdb commands (e.g. breakpoints, continue, jump, inferior function calls, ...) + interactive calls of Valgrind functionality (e.g. search for memleaks e.g. when a breakpoint is reached). vgdb also allows to look at a multi-threaded application, allows inferior function calls, etc. You can also start to debug your application under Valgrind from the beginning (so, before an error has been reported). To use it, give argument --vgdb-error=0 to Valgrind, and follow the instructions to connect your gdb using vgdb. Now, not clear this will help you a lot :(. Maybe there is a function in the mono environment to map a "JITted program counter" to a source line ? Then you could call it from gdb/vgdb to translate these ??? addresses to the source lines that was JITed to these instructions. Philippe |
|
From: Kerrick S. <ma...@ke...> - 2012-10-13 03:25:11
|
Miraculously, just as all hope seemed lost, I discovered that ahead-of-time indeed causes a crash, but before the crash a shared object file is written containing the compiled code. The presence of this shared object file apparently causes function call information to appear in stacktraces (when using just-in-time). Here's a question more for the Mono people: before any of these crashes, I get some errors in Valgrind, all of them with the same message: ==27672== Conditional jump or move depends on uninitialised value(s) ==27672== at 0x81D736F: conservatively_pin_objects_from (sgen-gc.c:2036) In GDB, this is the first frame of the corresponding backtrace: #0 0x081d736f in conservatively_pin_objects_from (start=0xbec83a20, end=<value optimized out>, start_nursery=0x4800000, end_nursery=0x4c00000, pin_type=0) at sgen-gc.c:2036 This message is printed 4 times: twice for line 2016, and twice for line 2036 (all at different instruction addresses). All errors are apparently from the same invocation (same arguments). The code is here: https://github.com/mono/mono/blob/mono-2-10-9/mono/metadata/sgen-gc.c#L2016 I've been assuming that this isn't a bug but rather is is the result of an omission in Mono's Valgrind suppression file. I'm not entirely sure, though; could this error ultimately be causing the crash? Also, thanks Phillippe for the info about --vgdb-error=0; I tried it and it worked, but I haven't had a chance to delve into the additional functionality vgdb provides. - Kerrick |