From: Chris J. <jo...@he...> - 2014-08-05 09:35:07
Hi,

I have an application that is causing valgrind to seg. fault, but runs just fine natively. The seg. fault is below. This is using the SVN trunk build of valgrind as of an hour or so ago (revision 14231).

The primary message to me seems to be

vex amd64->IR: unhandled instruction bytes: 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF
vex amd64->IR: REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR: VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
vex amd64->IR: PFX.66=0 PFX.F2=0 PFX.F3=0

http://valgrind.org/docs/manual/faq.html

Seems to suggest this could be one of two things; an error in the application (but then it runs fine outside valgrind) or an error in valgrind. Does anyone have any insights into which one this is here ?

cheers Chris

==23684== Jump to the invalid address stated on the next line
==23684==    at 0x2F769650: ???
==23684==    by 0xFFEFEFFEF: ???
==23684==    by 0x13E72601: ???
==23684==    by 0x1151A3FF: ???
==23684==    by 0x12E89588: clang::Decl::getASTContext() const (in /afs/cern.ch/sw/lcg/releases/ROOT/6.00.02-1c3e2/x86_64-slc6-gcc48-opt/lib/libCling.so)
==23684==    by 0xFFEFF047F: ???
==23684==    by 0xFFEFF0197: ???
==23684==    by 0x212E92D91: ???
==23684==  Address 0x2f769650 is 0 bytes inside a block of size 32 alloc'd
==23684==    at 0x4A07FD5: operator new(unsigned long) (vg_replace_malloc.c:326)
==23684==    by 0x13E725DB: ???
==23684==    by 0x125E3AE5: TClingCallFunc::exec(void*, void*) const (in /afs/cern.ch/sw/lcg/releases/ROOT/6.00.02-1c3e2/x86_64-slc6-gcc48-opt/lib/libCling.so)
==23684==    by 0x125E4E28: TClingCallFunc::exec_with_valref_return(void*, cling::Value*) const (in /afs/cern.ch/sw/lcg/releases/ROOT/6.00.02-1c3e2/x86_64-slc6-gcc48-opt/lib/libCling.so)
==23684==    by 0x125E7852: long TClingCallFunc::ExecT<long>(void*) (in /afs/cern.ch/sw/lcg/releases/ROOT/6.00.02-1c3e2/x86_64-slc6-gcc48-opt/lib/libCling.so)
==23684==    by 0xE98106F: PyROOT::TMethodHolder::CallSafe(void*, bool) (in /afs/cern.ch/sw/lcg/releases/ROOT/6.00.02-1c3e2/x86_64-slc6-gcc48-opt/lib/libPyROOT.so)
==23684==    by 0xE97EFAC: PyROOT::TMethodHolder::Execute(void*, bool) (in /afs/cern.ch/sw/lcg/releases/ROOT/6.00.02-1c3e2/x86_64-slc6-gcc48-opt/lib/libPyROOT.so)
==23684==    by 0xE99B1CF: PyROOT::TConstructorHolder::operator()(PyROOT::ObjectProxy*, _object*, _object*, long, bool) (in /afs/cern.ch/sw/lcg/releases/ROOT/6.00.02-1c3e2/x86_64-slc6-gcc48-opt/lib/libPyROOT.so)
==23684==    by 0xE9738E9: PyROOT::(anonymous namespace)::mp_call(PyROOT::MethodProxy*, _object*, _object*) (in /afs/cern.ch/sw/lcg/releases/ROOT/6.00.02-1c3e2/x86_64-slc6-gcc48-opt/lib/libPyROOT.so)
==23684==    by 0x4C637A2: PyObject_Call (abstract.c:2529)
==23684==    by 0x4CCEC7E: slot_tp_init (typeobject.c:5692)
==23684==    by 0x4CCD7FE: type_call (typeobject.c:745)
==23684==
vex amd64->IR: unhandled instruction bytes: 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF
vex amd64->IR: REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR: VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
vex amd64->IR: PFX.66=0 PFX.F2=0 PFX.F3=0
==23684== Invalid read of size 1
==23684==    at 0x2F76965D: ???
==23684==    by 0xFFEFEFFEF: ???
==23684==    by 0x13E72601: ???
==23684==    by 0x1151A3FF: ???
==23684==    by 0x12E89588: clang::Decl::getASTContext() const (in /afs/cern.ch/sw/lcg/releases/ROOT/6.00.02-1c3e2/x86_64-slc6-gcc48-opt/lib/libCling.so)
==23684==    by 0xFFEFF047F: ???
==23684==    by 0xFFEFF0197: ???
==23684==    by 0x212E92D91: ???
==23684==  Address 0x1151a4 is not stack'd, malloc'd or (recently) free'd
==23684==

*** Break *** segmentation violation
#0  vgModuleLocal_do_syscall_for_client_WRK () at m_syswrap/syscall-amd64-linux.S:147
#1  0x0000000000000008 in ?? ()
#2  0x0000000802c9ddd0 in ?? ()
#3  0x0000000802c9dd90 in ?? ()
#4  0x0000000039c2b950 in vgPlain_threads ()
#5  0x000000000000003d in ?? ()
#6  0x0000000039c2b940 in vgPlain_threads ()
#7  0x000000003a0154c8 in syscallInfo ()
#8  0x00000000000000b8 in ?? ()
#9  0x000000000000003d in ?? ()
#10 0x0000000000000001 in ?? ()
#11 0x000000003a015438 in syscallInfo ()
#12 0x00000000380a4acb in vgPlain_client_syscall ()
#13 0x00000000380a1483 in handle_syscall ()
#14 0x00000000380a2cb7 in vgPlain_scheduler ()
#15 0x00000000380b20ed in run_a_thread_NORETURN () at m_syswrap/syswrap-linux.c:103
#16 0x0000000000000000 in ?? ()
==23684==
==23684== HEAP SUMMARY:
==23684==     in use at exit: 400,342,711 bytes in 324,405 blocks
==23684==   total heap usage: 1,706,725 allocs, 1,382,320 frees, 1,290,641,013 bytes allocated
==23684==
==23684== LEAK SUMMARY:
==23684==    definitely lost: 16,272 bytes in 165 blocks
==23684==    indirectly lost: 74,905 bytes in 446 blocks
==23684==      possibly lost: 2,997,955 bytes in 40,606 blocks
==23684==    still reachable: 381,943,798 bytes in 264,556 blocks
==23684==         suppressed: 15,309,781 bytes in 18,632 blocks
==23684== Rerun with --leak-check=full to see details of leaked memory
==23684==
==23684== For counts of detected and suppressed errors, rerun with: -v
==23684== Use --track-origins=yes to see where uninitialised values come from
==23684== ERROR SUMMARY: 83 errors from 5 contexts (suppressed: 1440 from 192)

p.s. libCling.so is a library based on the Clang 3.5 compiler that provides an interactive compiler environment.
From: Tom H. <to...@co...> - 2014-08-05 10:52:39
On 05/08/14 10:08, Chris Jones wrote:

> I have an application that is causing valgrind to seg. fault, but runs
> just fine natively. The seg. fault is below. This is using the SVN trunk
> build of valgrind as of an hour or so ago (revision 14231).
>
> The primary message to me seems to be
>
> vex amd64->IR: unhandled instruction bytes: 0xFF 0xFF 0xFF 0xFF 0xFF
> 0xFF 0xFF 0xFF
> vex amd64->IR: REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
> vex amd64->IR: VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
> vex amd64->IR: PFX.66=0 PFX.F2=0 PFX.F3=0
>
> http://valgrind.org/docs/manual/faq.html
>
> Seems to suggest this could be one of two things; an error in the
> application (but then it runs fine outside valgrind) or an error in
> valgrind. Does anyone have any insights into which one this is here ?

Firstly that's not a seg fault. It's a valgrind error telling you it has been asked to execute code that it can't make sense of.

That's not really surprising given that I'm pretty sure that a string of 0xff bytes is not a valid x86 instruction.

Fortunately valgrind has already told you exactly where the problem is:

> ==23684== Jump to the invalid address stated on the next line
> ==23684==    at 0x2F769650: ???
> ==23684==    by 0xFFEFEFFEF: ???
> ==23684==    by 0x13E72601: ???
> ==23684==    by 0x1151A3FF: ???
> ==23684==    by 0x12E89588: clang::Decl::getASTContext() const (in
> /afs/cern.ch/sw/lcg/releases/ROOT/6.00.02-1c3e2/x86_64-slc6-gcc48-opt/lib/libCling.so)

So you have jumped to an invalid address at 0x2f769650...

> ==23684==  Address 0x2f769650 is 0 bytes inside a block of size 32 alloc'd
> ==23684==    at 0x4A07FD5: operator new(unsigned long)
> (vg_replace_malloc.c:326)
> ==23684==    by 0x13E725DB: ???
> ==23684==    by 0x125E3AE5: TClingCallFunc::exec(void*, void*) const (in
> /afs/cern.ch/sw/lcg/releases/ROOT/6.00.02-1c3e2/x86_64-slc6-gcc48-opt/lib/libCling.so)

...and that address is in a block of dynamically allocated memory.

Which means that you are presumably dealing with a program that generates code on the fly, and you will need to use --smc-check to tell valgrind where it should expect to find self-modifying code like this.

It defaults to only checking for it on the stack, so you will probably need to change it by using --smc-check=all-non-file so that it will check the heap as well. That will slow things down a bit but it should hopefully solve your problem.

Tom

--
Tom Hughes (to...@co...)
http://compton.nu/
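The reasoning above (a jump target that Memcheck also reports as sitting inside a heap block suggests run-time generated code) can be sketched as a small log check. This is only an illustration: the function name `heap_jump_targets` is hypothetical, and the regular expressions assume the message wording shown in this thread, which other valgrind versions may phrase differently.

```python
import re

# Patterns for the two Memcheck lines discussed above. The wording is taken
# from the log in this thread and is an assumption about the log format.
JUMP_RE = re.compile(r"Jump to the invalid address stated on the next line")
AT_RE = re.compile(r"==\d+==\s+at (0x[0-9A-Fa-f]+):")
BLOCK_RE = re.compile(
    r"Address (0x[0-9A-Fa-f]+) is (\d+) bytes inside a block of size (\d+) alloc'd"
)


def heap_jump_targets(log_text):
    """Return invalid-jump targets that are also reported as inside a heap block."""
    lines = log_text.splitlines()
    targets = set()
    # The jump target is the "at" address on the line after the Jump message.
    for i, line in enumerate(lines[:-1]):
        if JUMP_RE.search(line):
            match = AT_RE.search(lines[i + 1])
            if match:
                targets.add(int(match.group(1), 16))
    # Addresses Memcheck describes as lying inside an allocated block.
    heap_addrs = {int(m.group(1), 16) for m in BLOCK_RE.finditer(log_text)}
    return sorted(targets & heap_addrs)


SAMPLE = """\
==23684== Jump to the invalid address stated on the next line
==23684==    at 0x2F769650: ???
==23684==    by 0xFFEFEFFEF: ???
==23684==  Address 0x2f769650 is 0 bytes inside a block of size 32 alloc'd
==23684==    at 0x4A07FD5: operator new(unsigned long) (vg_replace_malloc.c:326)
"""

hits = heap_jump_targets(SAMPLE)
if hits:
    print("jump into heap block at", [hex(a) for a in hits])
    print("possibly generated code; consider --smc-check=all-non-file")
```

A non-empty result is only a hint that the program executes code it wrote into the heap; it does not prove it.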
From: Chris J. <jo...@he...> - 2014-08-05 11:10:14
Hi,

On 05/08/14 11:52, Tom Hughes wrote:
> On 05/08/14 10:08, Chris Jones wrote:
>
>> I have an application that is causing valgrind to seg. fault, but runs
>> just fine natively. The seg. fault is below. This is using the SVN trunk
>> build of valgrind as of an hour or so ago (revision 14231).
>>
>> The primary message to me seems to be
>>
>> vex amd64->IR: unhandled instruction bytes: 0xFF 0xFF 0xFF 0xFF 0xFF
>> 0xFF 0xFF 0xFF
>> vex amd64->IR: REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
>> vex amd64->IR: VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
>> vex amd64->IR: PFX.66=0 PFX.F2=0 PFX.F3=0
>>
>> http://valgrind.org/docs/manual/faq.html
>>
>> Seems to suggest this could be one of two things; an error in the
>> application (but then it runs fine outside valgrind) or an error in
>> valgrind. Does anyone have any insights into which one this is here ?
>
> Firstly that's not a seg fault.

Yes, my mistake. The seg. fault I was referring to was from a test with valgrind 3.8.1. Moving to 3.9.x removed this, but I erroneously still mentioned it above.

> It's a valgrind error telling you it has
> been asked to execute code that it can't make sense of.

Yep, figured that much.

>
> That's not really surprising given that I'm pretty sure that a string of
> 0xff bytes is not a valid x86 instruction.

Struck me as odd as well.

>
> Fortunately valgrind has already told you exactly where the problem is:
>
>> ==23684== Jump to the invalid address stated on the next line
>> ==23684==    at 0x2F769650: ???
>> ==23684==    by 0xFFEFEFFEF: ???
>> ==23684==    by 0x13E72601: ???
>> ==23684==    by 0x1151A3FF: ???
>> ==23684==    by 0x12E89588: clang::Decl::getASTContext() const (in
>> /afs/cern.ch/sw/lcg/releases/ROOT/6.00.02-1c3e2/x86_64-slc6-gcc48-opt/lib/libCling.so)
>
> So you have jumped to an invalid address at 0x2f769650...
>
>> ==23684==  Address 0x2f769650 is 0 bytes inside a block of size 32 alloc'd
>> ==23684==    at 0x4A07FD5: operator new(unsigned long)
>> (vg_replace_malloc.c:326)
>> ==23684==    by 0x13E725DB: ???
>> ==23684==    by 0x125E3AE5: TClingCallFunc::exec(void*, void*) const (in
>> /afs/cern.ch/sw/lcg/releases/ROOT/6.00.02-1c3e2/x86_64-slc6-gcc48-opt/lib/libCling.so)
>
> ...and that address is in a block of dynamically allocated memory.

Yeah, I guess that was clear, I just didn't focus on them enough.

> Which means that you are presumably dealing with a program that
> generates code on the fly, and you will need to use --smc-check to tell
> valgrind where it should expect to find self-modifying code like this.
>
> It defaults to only checking for it on the stack, so you will probably
> need to change it by using --smc-check=all-non-file so that it will
> check the heap as well. That will slow things down a bit but it should
> hopefully solve your problem.

Yes, this sounds very likely. Cling, as I said, is an interactive compiler environment and as such will definitely be doing this. We will try the option you suggest.

Thanks for the help.

Chris

>
> Tom
>
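For anyone scripting the rerun, here is a minimal sketch of building the command line with the suggested option. The helper `valgrind_argv` and the script name `my_root_script.py` are hypothetical; the only parts taken from the thread and the valgrind manual are the `--smc-check` option and its values.

```python
import shlex

# Values accepted by --smc-check according to the valgrind manual.
SMC_CHECK_VALUES = {"none", "stack", "all", "all-non-file"}


def valgrind_argv(program_argv, smc_check="all-non-file"):
    """Build an argv list that runs program_argv under valgrind."""
    if smc_check not in SMC_CHECK_VALUES:
        raise ValueError(f"unknown --smc-check value: {smc_check!r}")
    return ["valgrind", f"--smc-check={smc_check}", *program_argv]


# Hypothetical program: a Python script that drives ROOT/Cling.
cmd = valgrind_argv(["python", "my_root_script.py"])
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd)` as is; building it as a list avoids shell quoting problems.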