|
From: Joël K. <jo...@we...> - 2013-11-25 22:27:32
|
Hi, I get this nice error message out of valgrind while starting or
using my application and I don't know why.

http://sf.net/p/ags

Please help.

--9862-- memcheck GC: 2827 nodes, 2355 survivors ( 83.3%)
--9862-- memcheck GC: 3997 new table size (stepup)
vex amd64->IR: unhandled instruction bytes: 0xC6 0xF8 0xFD 0xF 0xB7 0x6 0x66 0x85
vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
vex amd64->IR:   PFX.66=0 PFX.F2=0 PFX.F3=0
==9862== valgrind: Unrecognised instruction at address 0x34f3411390.
==9862==    at 0x34F3411390: __lll_trylock_elision (in /usr/lib64/libpthread-2.18.so)
==9862==    by 0x34F340A22B: pthread_mutex_trylock (in /usr/lib64/libpthread-2.18.so)
==9862==    by 0x4A1264: ags_gui_thread_do_gtk_iteration.60959 (ags_gui_thread.c:199)
==9862==    by 0x4A1539: ags_gui_thread_run (ags_gui_thread.c:251)
==9862==    by 0x34F70104C6: ??? (in /usr/lib64/libgobject-2.0.so.0.3800.2)
==9862==    by 0x34F7029748: g_signal_emit_valist (in /usr/lib64/libgobject-2.0.so.0.3800.2)
==9862==    by 0x34F702A3AE: g_signal_emit (in /usr/lib64/libgobject-2.0.so.0.3800.2)
==9862==    by 0x49F625: ags_thread_run (ags_thread.c:1720)
==9862==    by 0x49F99E: ags_thread_loop (ags_thread.c:1643)
==9862==    by 0x34F3407F32: start_thread (in /usr/lib64/libpthread-2.18.so)
==9862==    by 0x34F2CF4EAC: clone (in /usr/lib64/libc-2.18.so)
==9862== Your program just tried to execute an instruction that Valgrind
==9862== did not recognise. There are two possible reasons for this.
==9862== 1. Your program has a bug and erroneously jumped to a non-code
==9862==    location. If you are running Memcheck and you just saw a
==9862==    warning about a bad jump, it's probably your program's fault.
==9862== 2. The instruction is legitimate but Valgrind doesn't handle it,
==9862==    i.e. it's Valgrind's fault. If you think this is the case or
==9862==    you are not sure, please let us know and we'll try to fix it.
==9862== Either way, Valgrind will now raise a SIGILL signal which will
==9862== probably kill your program.
==9862==
==9862== Process terminating with default action of signal 4 (SIGILL)
==9862==  Illegal opcode at address 0x34F3411390
==9862==    at 0x34F3411390: __lll_trylock_elision (in /usr/lib64/libpthread-2.18.so)
==9862==    by 0x34F340A22B: pthread_mutex_trylock (in /usr/lib64/libpthread-2.18.so)
==9862==    by 0x4A1264: ags_gui_thread_do_gtk_iteration.60959 (ags_gui_thread.c:199)
==9862==    by 0x4A1539: ags_gui_thread_run (ags_gui_thread.c:251)
==9862==    by 0x34F70104C6: ??? (in /usr/lib64/libgobject-2.0.so.0.3800.2)
==9862==    by 0x34F7029748: g_signal_emit_valist (in /usr/lib64/libgobject-2.0.so.0.3800.2)
==9862==    by 0x34F702A3AE: g_signal_emit (in /usr/lib64/libgobject-2.0.so.0.3800.2)
==9862==    by 0x49F625: ags_thread_run (ags_thread.c:1720)
==9862==    by 0x49F99E: ags_thread_loop (ags_thread.c:1643)
==9862==    by 0x34F3407F32: start_thread (in /usr/lib64/libpthread-2.18.so)
==9862==    by 0x34F2CF4EAC: clone (in /usr/lib64/libc-2.18.so)
==9862==
==9862== HEAP SUMMARY:
==9862==     in use at exit: 2,241,959 bytes in 29,002 blocks
==9862==   total heap usage: 91,386 allocs, 62,384 frees, 7,205,816 bytes allocated
==9862==
==9862== Searching for pointers to 27,363 not-freed blocks
==9862== Checked 36,580,456 bytes
==9862==
==9862== LEAK SUMMARY:
==9862==    definitely lost: 2,909 bytes in 41 blocks
==9862==    indirectly lost: 14,861 bytes in 633 blocks
==9862==      possibly lost: 66,719 bytes in 921 blocks
==9862==    still reachable: 1,865,142 bytes in 25,768 blocks
==9862==         suppressed: 0 bytes in 0 blocks
==9862== Rerun with --leak-check=full to see details of leaked memory
==9862==
==9862== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)
--9862--
--9862-- used_suppression:      2 glibc-2.5.x-on-SUSE-10.2-(PPC)-2a /usr/lib64/valgrind/default.supp:1286
==9862==
==9862== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)
|
From: Tom H. <to...@co...> - 2013-11-25 22:40:53
|
On 25/11/13 22:06, Joël Krähemann wrote:

> Hi, I get this nice error message out of valgrind while starting or
> using my application and I don't know why.
>
> http://sf.net/p/ags
>
> Please help.

Given the location of the instruction, I'm guessing you have a Haswell
processor and you've built glibc with transactional memory support, so
that it is trying to do lock elision using the transactional memory
instructions.

The transactional memory instructions aren't fully supported in valgrind
yet, so you'll probably have to rebuild glibc without that support for now.

Tom

-- 
Tom Hughes (to...@co...)
http://compton.nu/
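Whether a given machine advertises the transactional memory extensions mentioned in the reply above can be checked from C through CPUID. The following is only a minimal sketch, assuming a GCC or Clang toolchain that ships `<cpuid.h>`; the helper name `cpu_tsx_features` is made up for illustration and is not part of glibc or valgrind.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper macros around the CPUID instruction */
#endif

/*
 * Hypothetical helper (not a glibc or valgrind API).
 * Stores 1 or 0 in *hle and *rtm according to CPUID leaf 7.
 * Returns 0 on x86 builds, -1 when built for another architecture.
 */
int cpu_tsx_features(int *hle, int *rtm)
{
    assert(hle != NULL && rtm != NULL);
    *hle = 0;
    *rtm = 0;

#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    /* Leaf 7 only exists if the highest basic leaf is at least 7. */
    if (__get_cpuid_max(0, NULL) >= 7) {
        __cpuid_count(7, 0, eax, ebx, ecx, edx);
        *hle = (ebx >> 4) & 1;    /* CPUID.(EAX=7,ECX=0):EBX bit 4  = HLE */
        *rtm = (ebx >> 11) & 1;   /* CPUID.(EAX=7,ECX=0):EBX bit 11 = RTM */
    }
    (void)eax; (void)ecx; (void)edx;   /* only EBX is needed here */
    return 0;
#else
    return -1;
#endif
}
```

A caller would declare two `int` variables, pass their addresses, and print the results. This only reports what the CPU advertises; whether the C library actually enables lock elision is a separate decision made inside glibc.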
|
From: Mark W. <mj...@re...> - 2013-11-25 23:19:04
|
On Mon, Nov 25, 2013 at 10:40:40PM +0000, Tom Hughes wrote:
> On 25/11/13 22:06, Joël Krähemann wrote:
>
> > Hi, I get this nice error message out of valgrind while starting or
> > using my application and I don't know why.
> >
> > http://sf.net/p/ags
> >
> > Please help.
>
> Given the location of the instruction I'm guessing you have a Haswell
> processor and you've built glibc with transactional memory support so
> that it is trying to do lock elision using the transactional memory
> instructions.
>
> The transactional memory instructions aren't fully supported in valgrind
> yet, so you'll probably have to rebuild glibc without that support for now.

This is somewhat unfortunate. We do have preliminary support, but we only
implemented xbegin and xtest, and only to indicate that the transaction
always fails immediately. The instruction hit here is xabort, which in the
valgrind case would always just be a NOP, since no transaction is (or can
be) active.

If there isn't a bug report for XABORT not being implemented, please do
file one. We really should add it.

Thanks,

Mark
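The identification of the instruction as xabort can be reproduced from the "unhandled instruction bytes" line in the first message. The sketch below matches a byte sequence against the four RTM opcodes as documented by Intel (XABORT is `C6 F8 ib`, XBEGIN is `C7 F8 rel`, XEND is `0F 01 D5`, XTEST is `0F 01 D6`). The names `tsx_insn`, `decode_tsx` and `reported_bytes` are invented for this example, and prefixes are deliberately ignored, so it is an illustration rather than a real decoder.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of the four RTM instructions. */
enum tsx_insn { TSX_NONE, TSX_XBEGIN, TSX_XEND, TSX_XABORT, TSX_XTEST };

/*
 * Classify the instruction that starts at code[0].
 * Only the plain opcodes are recognised; legacy prefixes are not handled.
 */
enum tsx_insn decode_tsx(const unsigned char *code, size_t len)
{
    assert(code != NULL);

    if (len >= 3 && code[0] == 0xC6 && code[1] == 0xF8)
        return TSX_XABORT;                 /* C6 F8 ib  */
    if (len >= 2 && code[0] == 0xC7 && code[1] == 0xF8)
        return TSX_XBEGIN;                 /* C7 F8 rel */
    if (len >= 3 && code[0] == 0x0F && code[1] == 0x01 && code[2] == 0xD5)
        return TSX_XEND;                   /* 0F 01 D5  */
    if (len >= 3 && code[0] == 0x0F && code[1] == 0x01 && code[2] == 0xD6)
        return TSX_XTEST;                  /* 0F 01 D6  */
    return TSX_NONE;
}

/* The eight bytes printed by valgrind in the original report. */
static const unsigned char reported_bytes[] = {
    0xC6, 0xF8, 0xFD, 0x0F, 0xB7, 0x06, 0x66, 0x85
};
```

Calling `decode_tsx(reported_bytes, sizeof reported_bytes)` yields `TSX_XABORT`, with `0xFD` being the immediate abort code; the remaining bytes belong to the instructions that follow.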
|
From: Joël K. <jo...@we...> - 2013-11-26 08:25:28
|
On Tuesday, 26.11.2013, at 00:18 +0100, Mark Wielaard wrote:
> On Mon, Nov 25, 2013 at 10:40:40PM +0000, Tom Hughes wrote:
> > On 25/11/13 22:06, Joël Krähemann wrote:
> >
> > > Hi, I get this nice error message out of valgrind while starting or
> > > using my application and I don't know why.
> > >
> > > http://sf.net/p/ags
> > >
> > > Please help.
> >
> > Given the location of the instruction I'm guessing you have a Haswell
> > processor and you've built glibc with transactional memory support so
> > that it is trying to do lock elision using the transactional memory
> > instructions.

I'm using g_atomic_int_set, g_atomic_int_get, g_atomic_int_xor and
g_atomic_int_and to implement thread synchronization. Could these
functions be the problem, or are they not working properly?

> > The transactional memory instructions aren't fully supported in valgrind
> > yet, so you'll probably have to rebuild glibc without that support for now.
>
> This is somewhat unfortunate. We do have preliminary support, but only
> implemented xbegin and xtest to indicate the transaction always immediately
> fails. The instruction hit is xabort, which in the valgrind case would
> always just be a NOP since no transaction is (can be) active.
>
> If there isn't a bug report for XABORT not being implemented, please do
> file one. We really should add it.

Without running my application within valgrind I get the following out
of GDB.

***MEMORY-ERROR***: ags[14888]: GSlice: assertion failed: sinfo->n_allocated > 0

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffe37fe700 (LWP 14892)]
0x00000034f2c35c59 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00000034f2c35c59 in raise () from /lib64/libc.so.6
#1  0x00000034f2c37368 in abort () from /lib64/libc.so.6
#2  0x00000034f5c1ab12 in mem_error () from /lib64/libglib-2.0.so.0
#3  0x00000034f5c6453b in slab_allocator_free_chunk () from /lib64/libglib-2.0.so.0
#4  0x00000034f5c647df in magazine_cache_push_magazine () from /lib64/libglib-2.0.so.0
#5  0x00000034f5c1b0ab in thread_memory_magazine2_unload.isra.11 () from /lib64/libglib-2.0.so.0
#6  0x00000034f5c656a8 in g_slice_free1 () from /lib64/libglib-2.0.so.0
#7  0x0000003509632de8 in _gtk_tree_data_list_header_free () from /lib64/libgtk-x11-2.0.so.0
#8  0x00000035094fdc7e in gtk_file_system_model_finalize () from /lib64/libgtk-x11-2.0.so.0
#9  0x00000034f7014fcb in g_object_unref () from /lib64/libgobject-2.0.so.0
#10 0x00000035094e4899 in recent_clear_model () from /lib64/libgtk-x11-2.0.so.0
#11 0x00000035094eb386 in operation_mode_set () from /lib64/libgtk-x11-2.0.so.0
#12 0x00000035094ec8b2 in shortcuts_activate_iter () from /lib64/libgtk-x11-2.0.so.0
#13 0x00000035094ecb33 in shortcuts_selection_changed_cb () from /lib64/libgtk-x11-2.0.so.0
#14 0x00000034f70104c7 in _g_closure_invoke_va () from /lib64/libgobject-2.0.so.0
#15 0x00000034f7029749 in g_signal_emit_valist () from /lib64/libgobject-2.0.so.0
#16 0x00000034f702a3af in g_signal_emit () from /lib64/libgobject-2.0.so.0
#17 0x00000035096582a6 in gtk_tree_view_real_set_cursor () from /lib64/libgtk-x11-2.0.so.0
#18 0x000000350965d175 in gtk_tree_view_button_press () from /lib64/libgtk-x11-2.0.so.0
#19 0x00000035095485bc in _gtk_marshal_BOOLEAN__BOXED () from /lib64/libgtk-x11-2.0.so.0
#20 0x00000034f7010298 in g_closure_invoke () from /lib64/libgobject-2.0.so.0
#21 0x00000034f702211b in signal_emit_unlocked_R () from /lib64/libgobject-2.0.so.0
#22 0x00000034f7029ddd in g_signal_emit_valist () from /lib64/libgobject-2.0.so.0
#23 0x00000034f702a3af in g_signal_emit () from /lib64/libgobject-2.0.so.0
#24 0x0000003509677994 in gtk_widget_event_internal () from /lib64/libgtk-x11-2.0.so.0
#25 0x00000035095467e4 in gtk_propagate_event () from /lib64/libgtk-x11-2.0.so.0
#26 0x0000003509546bdb in gtk_main_do_event () from /lib64/libgtk-x11-2.0.so.0
#27 0x000000350846046c in gdk_event_dispatch () from /lib64/libgdk-x11-2.0.so.0
#28 0x00000034f5c492a6 in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#29 0x00000034f5c49628 in g_main_context_iterate.isra.24 () from /lib64/libglib-2.0.so.0
#30 0x00000034f5c496dc in g_main_context_iteration () from /lib64/libglib-2.0.so.0
#31 0x00000000004a1205 in ags_gui_thread_do_gtk_iteration () at ./src/ags/thread/ags_gui_thread.c:221
#32 0x00000000004a14ba in ags_gui_thread_run () at ./src/ags/thread/ags_gui_thread.c:255
#33 0x00000034f701043f in _g_closure_invoke_va () from /lib64/libgobject-2.0.so.0
#34 0x00000034f7029749 in g_signal_emit_valist () from /lib64/libgobject-2.0.so.0
#35 0x00000034f702a3af in g_signal_emit () from /lib64/libgobject-2.0.so.0
#36 0x000000000049f586 in ags_thread_run (thread=0x7f0000) at ./src/ags/thread/ags_thread.c:1720
#37 0x000000000049f8ff in ags_thread_loop (ptr=<optimized out>) at ./src/ags/thread/ags_thread.c:1643
#38 0x00000034f3407f33 in start_thread () from /lib64/libpthread.so.0
#39 0x00000034f2cf4ead in clone () from /lib64/libc.so.6

Is this abort caused by XABORT? I don't know much about CPU instructions
because I normally just write C and expect that everything just works.

By the way, in which Bugzilla should I file that bug: on valgrind.org or
maybe on fedoraproject.org? I'm using Fedora 20-beta.

Here's the output of lscpu.

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 58
Model name:            Intel(R) Core(TM) i7-3740QM CPU @ 2.70GHz
Stepping:              9
CPU MHz:               2754.000
CPU max MHz:           3700.0000
CPU min MHz:           1200.0000
BogoMIPS:              5387.62
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              6144K
NUMA node0 CPU(s):     0-7

> Thanks,
>
> Mark

regards
Joël
|
From: Tom H. <to...@co...> - 2013-11-26 08:29:25
|
On 26/11/13 08:25, Joël Krähemann wrote:

> I'm using g_atomic_int_set, g_atomic_int_get, g_atomic_int_xor and
> g_atomic_int_and to implement thread synchronization. Could these
> functions be the problem or aren't they working proper?

No, it's not any of those. It's the low level locking in the C library.

> Without running my application within valgrind I get the following out
> of GDB.
>
> ***MEMORY-ERROR***: ags[14888]: GSlice: assertion failed: sinfo->n_allocated > 0

That's just a bug of some sort in your program. Valgrind may well help
you find it, if it was working for you.

> Is this abort caused by XABORT? I don't know much about CPU instructions
> because I normally just do C and I expect that everything just works
> fine.

The valgrind failure is caused by the C library trying to execute an
XABORT instruction.

> By the way in what bugzilla should I file that bug. On valgrind.org or
> maybe on fedoraproject.org? I'm using fedora 20-beta.

Oh, if it's Fedora then the C library must be doing feature detection to
enable the TM support.

It's definitely an upstream bug, so the valgrind tracker is fine. Mark is
the Fedora maintainer for valgrind anyway, so he's aware of this and I'm
guessing he will patch in XABORT support once we have it. Sounds like it
should be simple.

Tom

-- 
Tom Hughes (to...@co...)
http://compton.nu/
|
From: Mark W. <mj...@re...> - 2013-11-26 08:37:16
|
On Tue, 2013-11-26 at 09:25 +0100, Joël Krähemann wrote:
> On Tuesday, 26.11.2013, at 00:18 +0100, Mark Wielaard wrote:
> > On Mon, Nov 25, 2013 at 10:40:40PM +0000, Tom Hughes wrote:
> > > On 25/11/13 22:06, Joël Krähemann wrote:
> > >
> > > > Hi, I get this nice error message out of valgrind while starting or
> > > > using my application and I don't know why.
> > > >
> > > > http://sf.net/p/ags
> > > >
> > > > Please help.
> > >
> > > Given the location of the instruction I'm guessing you have a Haswell
> > > processor and you've built glibc with transactional memory support so
> > > that it is trying to do lock elision using the transactional memory
> > > instructions.
>
> I'm using g_atomic_int_set, g_atomic_int_get, g_atomic_int_xor and
> g_atomic_int_and to implement thread synchronization. Could these
> functions be the problem or aren't they working proper?

I don't think so.

> > > The transactional memory instructions aren't fully supported in valgrind
> > > yet, so you'll probably have to rebuild glibc without that support for now.
> >
> > This is somewhat unfortunate. We do have preliminary support, but only
> > implemented xbegin and xtest to indicate the transaction always immediately
> > fails. The instruction hit is xabort, which in the valgrind case would
> > always just be a NOP since no transaction is (can be) active.
> >
> > If there isn't a bug report for XABORT not being implemented, please do
> > file one. We really should add it.
>
> Without running my application within valgrind I get the following out
> of GDB.
>
> ***MEMORY-ERROR***: ags[14888]: GSlice: assertion failed: sinfo->n_allocated > 0
>
> Program received signal SIGABRT, Aborted.
> [Switching to Thread 0x7fffe37fe700 (LWP 14892)]
> 0x00000034f2c35c59 in raise () from /lib64/libc.so.6
> (gdb) bt
> #0  0x00000034f2c35c59 in raise () from /lib64/libc.so.6
> #1  0x00000034f2c37368 in abort () from /lib64/libc.so.6
> #2  0x00000034f5c1ab12 in mem_error () from /lib64/libglib-2.0.so.0
> #3  0x00000034f5c6453b in slab_allocator_free_chunk () from /lib64/libglib-2.0.so.0
> #4  0x00000034f5c647df in magazine_cache_push_magazine () from /lib64/libglib-2.0.so.0
> #5  0x00000034f5c1b0ab in thread_memory_magazine2_unload.isra.11 () from /lib64/libglib-2.0.so.0
> #6  0x00000034f5c656a8 in g_slice_free1 () from /lib64/libglib-2.0.so.0
> #7  0x0000003509632de8 in _gtk_tree_data_list_header_free () from /lib64/libgtk-x11-2.0.so.0
> #8  0x00000035094fdc7e in gtk_file_system_model_finalize () from /lib64/libgtk-x11-2.0.so.0
> #9  0x00000034f7014fcb in g_object_unref () from /lib64/libgobject-2.0.so.0
> #10 0x00000035094e4899 in recent_clear_model () from /lib64/libgtk-x11-2.0.so.0
> #11 0x00000035094eb386 in operation_mode_set () from /lib64/libgtk-x11-2.0.so.0
> #12 0x00000035094ec8b2 in shortcuts_activate_iter () from /lib64/libgtk-x11-2.0.so.0
> #13 0x00000035094ecb33 in shortcuts_selection_changed_cb () from /lib64/libgtk-x11-2.0.so.0
> #14 0x00000034f70104c7 in _g_closure_invoke_va () from /lib64/libgobject-2.0.so.0
> #15 0x00000034f7029749 in g_signal_emit_valist () from /lib64/libgobject-2.0.so.0
> #16 0x00000034f702a3af in g_signal_emit () from /lib64/libgobject-2.0.so.0
> #17 0x00000035096582a6 in gtk_tree_view_real_set_cursor () from /lib64/libgtk-x11-2.0.so.0
> #18 0x000000350965d175 in gtk_tree_view_button_press () from /lib64/libgtk-x11-2.0.so.0
> #19 0x00000035095485bc in _gtk_marshal_BOOLEAN__BOXED () from /lib64/libgtk-x11-2.0.so.0
> #20 0x00000034f7010298 in g_closure_invoke () from /lib64/libgobject-2.0.so.0
> #21 0x00000034f702211b in signal_emit_unlocked_R () from /lib64/libgobject-2.0.so.0
> #22 0x00000034f7029ddd in g_signal_emit_valist () from /lib64/libgobject-2.0.so.0
> #23 0x00000034f702a3af in g_signal_emit () from /lib64/libgobject-2.0.so.0
> #24 0x0000003509677994 in gtk_widget_event_internal () from /lib64/libgtk-x11-2.0.so.0
> #25 0x00000035095467e4 in gtk_propagate_event () from /lib64/libgtk-x11-2.0.so.0
> #26 0x0000003509546bdb in gtk_main_do_event () from /lib64/libgtk-x11-2.0.so.0
> #27 0x000000350846046c in gdk_event_dispatch () from /lib64/libgdk-x11-2.0.so.0
> #28 0x00000034f5c492a6 in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
> #29 0x00000034f5c49628 in g_main_context_iterate.isra.24 () from /lib64/libglib-2.0.so.0
> #30 0x00000034f5c496dc in g_main_context_iteration () from /lib64/libglib-2.0.so.0
> #31 0x00000000004a1205 in ags_gui_thread_do_gtk_iteration () at ./src/ags/thread/ags_gui_thread.c:221
> #32 0x00000000004a14ba in ags_gui_thread_run () at ./src/ags/thread/ags_gui_thread.c:255
> #33 0x00000034f701043f in _g_closure_invoke_va () from /lib64/libgobject-2.0.so.0
> #34 0x00000034f7029749 in g_signal_emit_valist () from /lib64/libgobject-2.0.so.0
> #35 0x00000034f702a3af in g_signal_emit () from /lib64/libgobject-2.0.so.0
> #36 0x000000000049f586 in ags_thread_run (thread=0x7f0000) at ./src/ags/thread/ags_thread.c:1720
> #37 0x000000000049f8ff in ags_thread_loop (ptr=<optimized out>) at ./src/ags/thread/ags_thread.c:1643
> #38 0x00000034f3407f33 in start_thread () from /lib64/libpthread.so.0
> #39 0x00000034f2cf4ead in clone () from /lib64/libc.so.6
>
> Is this abort caused by XABORT? I don't know much about CPU instructions
> because I normally just do C and I expect that everything just works
> fine.

No, this is a different issue. Valgrind might be able to help pinpoint
it, if not for the xabort bug it currently has.

Basically the problem (on the valgrind side) is that there is one
instruction (xabort) that valgrind doesn't know about. Valgrind emulates
all instructions, so when it encounters one it doesn't know about, it
simply cannot proceed.

> By the way in what bugzilla should I file that bug. On valgrind.org or
> maybe on fedoraproject.org? I'm using fedora 20-beta.

bugs.kde.org would be best, under the valgrind project, component vex
(that is the binary translation unit). I happen to be the valgrind
maintainer for Fedora, so I would also see a bug report there, but I
probably would just forward it upstream in this case anyway. But feel
free to use whichever is easiest for you. If nobody else picks it up
soon, I'll try and hack in xabort support myself and make sure to
backport it to the Fedora package.

Thanks,

Mark
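The behaviour described in this thread (xbegin and xtest handled so that a transaction always fails at once, xabort treated as a NOP once it is known, and any unknown instruction stopping execution) can be modelled with a toy interpreter step. This is purely an illustrative sketch and not valgrind's VEX code: `toy_cpu`, `toy_insn`, `toy_step` and the `xabort_known` switch are all invented here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction kinds for the toy model. */
enum toy_insn { TOY_XBEGIN, TOY_XTEST, TOY_XABORT, TOY_UNKNOWN };

/* Minimal guest state: just what the three instructions touch. */
struct toy_cpu {
    unsigned long pc;   /* address of the next instruction to run */
    unsigned int  eax;  /* receives the abort status after XBEGIN */
    int           zf;   /* zero flag; XTEST sets it outside a transaction */
};

/*
 * Emulate one instruction of length len.
 * Returns 0 if it was handled, -1 if the emulator cannot proceed
 * (the point at which a real tool would report an unhandled instruction).
 */
int toy_step(struct toy_cpu *cpu, enum toy_insn insn, unsigned long len,
             unsigned long fallback_pc, int xabort_known)
{
    assert(cpu != NULL);

    switch (insn) {
    case TOY_XBEGIN:
        /* Pretend the transaction aborted at once: jump to the fallback. */
        cpu->eax = 0;
        cpu->pc = fallback_pc;
        return 0;
    case TOY_XTEST:
        /* Never inside a transaction, so report "not transactional". */
        cpu->zf = 1;
        cpu->pc += len;
        return 0;
    case TOY_XABORT:
        if (!xabort_known)
            return -1;          /* the situation reported in this thread */
        cpu->pc += len;         /* no active transaction: behaves as a NOP */
        return 0;
    default:
        return -1;
    }
}
```

With `xabort_known` set to 0 the step fails and the program counter stays put, which mirrors the report; with it set to 1 the instruction is simply skipped, which is the effect the thread expects from adding xabort support.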
|
From: Mark W. <mj...@re...> - 2013-11-28 09:45:35
|
On Tue, 2013-11-26 at 09:36 +0100, Mark Wielaard wrote:
> Basically the problem (on the valgrind side) is that there is one
> instruction (xabort) that valgrind doesn't know about. Valgrind emulates
> all instructions, so when it encounters one it doesn't know about it
> just cannot proceed.
>
> > By the way in what bugzilla should I file that bug. On valgrind.org or
> > maybe on fedoraproject.org? I'm using fedora 20-beta.
>
> bugs.kde.org would be best, under the valgrind project, component vex
> (that is the binary translation unit). I happen to be the valgrind
> maintainer for fedora, so would also see a bug report there, but I
> probably would just forward it upstream in this case anyway. But feel
> free whatever is easiest for you. If nobody else picks it up soon, I'll
> try and hack in xabort support myself and make sure to backport it to
> the fedora package.

Thanks for filing the bug.
https://bugs.kde.org/show_bug.cgi?id=328100

I have attached a proposed patch:
http://bugsfiles.kde.org/attachment.cgi?id=83780

Some quick local testing seems to indicate it works, but I don't
currently have the hardware and new glibc setup to test it fully.

I did create a scratch Fedora package with the patch included.
http://koji.fedoraproject.org/koji/taskinfo?taskID=6236008
(Note: it is a scratch package, so it is only available for a couple of
days.)

Would you be able to test that to see if it resolves your issue?

Thanks,

Mark