|
From: Ed S. <ES...@fe...> - 2010-06-07 13:27:04
|
Valgrind kills my process because of an unrecognized instruction. I suspect the commercial Pegasus image decoder I am using contains opcodes that may be valid on other processors, such as AMD's, but not on the Intel processor I am currently running on. Or perhaps they encode some opcodes for IP-protection reasons and somehow decode them at run time. I really do not know.

Question: Is there a way to make Valgrind stay out of a shared library?

==1460== valgrind: Unrecognised instruction at address 0x9ceb615.
==1460== Your program just tried to execute an instruction that Valgrind
==1460== did not recognise. There are two possible reasons for this.
==1460== 1. Your program has a bug and erroneously jumped to a non-code
==1460==    location. If you are running Memcheck and you just saw a
==1460==    warning about a bad jump, it's probably your program's fault.
==1460== 2. The instruction is legitimate but Valgrind doesn't handle it,
==1460==    i.e. it's Valgrind's fault. If you think this is the case or
==1460==    you are not sure, please let us know and we'll try to fix it.
==1460== Either way, Valgrind will now raise a SIGILL signal which will
==1460== probably kill your program.
==1460==
==1460== Process terminating with default action of signal 4 (SIGILL)
==1460==  Illegal opcode at address 0x9CEB615
==1460==    at 0x9CEB615: ???
==1460==    by 0x408755A: picosCallPegasusProc (in /usr/lib/fesvideo/libpicl20.so)
==1460==    by 0x408185C: ??? (in /usr/lib/fesvideo/libpicl20.so)
==1460==    by 0x4081A56: ??? (in /usr/lib/fesvideo/libpicl20.so)
==1460==    by 0x4086AF1: threadfn (in /usr/lib/fesvideo/libpicl20.so)
==1460==    by 0x7CF45A: start_thread (in /lib/libpthread-2.5.so)
==1460==    by 0x726C4D: clone (in /lib/libc-2.5.so)

I am trying to debug a memory leak under Red Hat Enterprise Linux 5.2 32-bit in an application that uses a third-party commercial image decoder made by Pegasus.

Thanks in advance for any tips or direction,
-Ed
|
From: John R. <jr...@bi...> - 2010-06-07 14:44:59
|
> Question: Is there a way to make Valgrind stay out of a shared library?

No. Any instruction which accesses data memory must be examined.

> ==1460== valgrind: Unrecognised instruction at address 0x9ceb615.

http://valgrind.org/docs/manual/faq.html#faq.msgdeath

It will help a lot if you specify the byte values (say, 8 of them) in the instruction stream at address 0x9ceb615.

> ==1460==    by 0x726C4D: clone (in /lib/libc-2.5.so)
> Red Hat Enterprise Linux 5.2 32-bit

Remember to specify the version of valgrind when you file the bug report. The software that you did specify is somewhat old. The current version of valgrind is 3.5.0.

--
|
From: Julian S. <js...@ac...> - 2010-06-07 15:14:25
|
> It will help a lot if you specify the byte values (say, 8 of them)
> in the instruction stream at address 0x9ceb615.

Yes. You didn't show the actual bytes it is complaining about, which are present in the failure message. Without those it is impossible to say anything.

> > ==1460==    by 0x726C4D: clone (in /lib/libc-2.5.so)
> > Red Hat Enterprise Linux 5.2 32-bit
>
> Remember to specify the version of valgrind when you file the bug report.
> The software that you did specify is somewhat old. The current version
> of valgrind is 3.5.0.

Try also with --smc-check=all, just in case (unlikely, but possible) the library is generating code on the fly.

J
|
From: Ed S. <ES...@fe...> - 2010-06-07 15:36:00
|
Thank you both for your assistance.

>> It will help a lot if you specify the byte values (say, 8 of them)
>> in the instruction stream at address 0x9ceb615.
>
> Yes. You didn't show the actual bytes it is complaining about, which
> are present in the failure message. Without those it is impossible to
> say anything.

How would I do this? I am new to Linux. The address 0x9ceb615 is apparently where the shared library is loaded. How would I get the opcode bytes?

>>> ==1460==    by 0x726C4D: clone (in /lib/libc-2.5.so)
>>> Red Hat Enterprise Linux 5.2 32-bit
>>
>> Remember to specify the version of valgrind when you file the bug report.
>> The software that you did specify is somewhat old. The current version
>> of valgrind is 3.5.0.

I am using 3.5.0.

> Try also with --smc-check=all, just in case (unlikely, but possible)
> the library is generating code on the fly.

I am trying this. I am still reading through the log file trying to see if it changed anything.

Thanks for your help,
-Ed
|
From: Julian S. <js...@ac...> - 2010-06-07 16:20:03
|
> >> It will help a lot if you specify the byte values (say, 8 of them)
> >> in the instruction stream at address 0x9ceb615.
> >
> > Yes. You didn't show the actual bytes it is complaining about, which
> > are present in the failure message. Without those it is impossible to
> > say anything.
>
> How would I do this?

Look for a line in the crash report like this (approximately):

  vex x86->IR: unhandled bytes: ...

or

  vex amd64->IR: unhandled bytes: ...

> I am trying this. I am still reading through the log file trying to see if
> it changed anything.

The only important change in this case is whether it still crashes or not.

J
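If the log is long, that line can also be pulled out with a few lines of Python. This is a minimal sketch, not a Valgrind tool: the exact wording of the message is an assumption (Julian only gives it approximately), so the pattern accepts both "unhandled bytes:" and "unhandled instruction bytes:", and the sample log line is made up for illustration.

```python
import re

# Assumed message shape, e.g. "vex x86->IR: unhandled instruction bytes: 0xCB 0x55".
# Both wordings are accepted; [ \t] keeps the match on a single log line.
UNHANDLED = re.compile(
    r"vex (\w+)->IR: unhandled (?:instruction )?bytes:[ \t]*((?:0x[0-9A-Fa-f]+[ \t]*)+)"
)


def find_unhandled_bytes(log_text):
    """Return a list of (arch, [byte values]) for each 'unhandled bytes' line found."""
    results = []
    for match in UNHANDLED.finditer(log_text):
        arch = match.group(1)
        # int(..., 16) accepts the "0x" prefix directly.
        values = [int(tok, 16) for tok in match.group(2).split()]
        results.append((arch, values))
    return results


if __name__ == "__main__":
    # Hypothetical log excerpt, constructed for illustration only.
    sample = (
        "==1460== valgrind: Unrecognised instruction at address 0x9ceb615.\n"
        "vex x86->IR: unhandled instruction bytes: 0xCB 0x55 0x8B 0xEC\n"
    )
    for arch, values in find_unhandled_bytes(sample):
        print(arch, " ".join(f"{v:02x}" for v in values))
```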
|
From: John R. <jr...@bi...> - 2010-06-07 16:22:58
|
>>> It will help a lot if you specify the byte values (say, 8 of them)
>>> in the instruction stream at address 0x9ceb615.
> How would I do this?

$ gdb valgrind
(gdb) run arguments-to-valgrind my-app arguments-to-my-app
  [snip]
==1460== valgrind: Unrecognised instruction at address 0x9ceb615.
  [snip]
==1460== Process terminating with default action of signal 4 (SIGILL)
==1460==  Illegal opcode at address 0x9CEB615
==1460==    at 0x9CEB615: ???
  [snip]
(gdb) x/4x 0x9ceb615

--
|
From: Ed S. <ES...@fe...> - 2010-06-07 17:04:12
|
>>>> It will help a lot if you specify the byte values (say, 8 of them)
>>>> in the instruction stream at address 0x9ceb615.
>> How would I do this?
>
> $ gdb valgrind
> (gdb) run arguments-to-valgrind my-app arguments-to-my-app
>   [snip]
> ==1460== valgrind: Unrecognised instruction at address 0x9ceb615.
>   [snip]
> ==1460== Process terminating with default action of signal 4 (SIGILL)
> ==1460==  Illegal opcode at address 0x9CEB615
> ==1460==    at 0x9CEB615: ???
>   [snip]
> (gdb) x/4x 0x9ceb615

Thanks. I had trouble with the above. Hopefully the following is the equivalent using valgrind 3.5.0:

1. ulimit -c unlimited   (I did not have core dumps enabled)
2. valgrind --leak-check=yes ./debugmemoryleak
3. gdb ./debugmemoryleak vgcore.6491
4. (gdb) x/8x 0x9ceb615
5. 0x9ceb615: 0xec8b55cb 0x8b515756 0x15e3104d 0x8b0c758b
6. 0x9ceb625: 0xc18b087d 0xfc02e9c1 0xc88ba5f3 0xf303e183

I suspect the decoder may contain AMD-specific opcodes as well. What tool is used to determine whether these are valid opcodes, perhaps not for 32-bit Intel but for 64-bit, or for 32/64-bit AMD? Or are they just random data that a bad jump landed on?

The app runs without dumping core when run without Valgrind, and its memory usage is steady. The problem I see is that the reported memory usage increases each time I destroy and create new objects, which is why I started investigating with Valgrind.

> The only important change in this case is whether it still crashes or not.

OK. I saw no change: it still raises the signal when it encounters the unrecognized opcode, so I guess the decoder is not generating code dynamically.

-Ed

--
Valgrind-users mailing list
Val...@li...
https://lists.sourceforge.net/lists/listinfo/valgrind-users
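A note on reading that output: "x/8x" prints memory as 32-bit words, and x86 stores words little-endian, so the first byte in memory is the low-order byte of the first word. The minimal Python sketch below (the helper name words_to_bytes is hypothetical, not part of gdb or Valgrind) unpacks the words shown above back into the byte stream the CPU actually fetches:

```python
import struct

# Words exactly as printed by "(gdb) x/8x 0x9ceb615" in the message above.
WORDS = [
    0xEC8B55CB, 0x8B515756, 0x15E3104D, 0x8B0C758B,
    0xC18B087D, 0xFC02E9C1, 0xC88BA5F3, 0xF303E183,
]


def words_to_bytes(words):
    """Pack 32-bit words back into memory order, assuming little-endian (x86) layout."""
    return struct.pack("<%dI" % len(words), *words)


if __name__ == "__main__":
    data = words_to_bytes(WORDS)
    # Show the first 8 bytes in the order they sit in memory.
    print(" ".join(f"{b:02x}" for b in data[:8]))
```

The first byte comes out as 0xcb, and the next three (55 8b ec) are the common 32-bit function prologue "push ebp; mov ebp, esp". Alternatively, "x/16xb 0x9ceb615" asks gdb for individual bytes directly and avoids the byte-order step altogether.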
|
From: John R. <jr...@bi...> - 2010-06-07 17:26:07
|
> 1. ulimit -c unlimited   (I did not have core dumps enabled)
> 2. valgrind --leak-check=yes ./debugmemoryleak
> 3. gdb ./debugmemoryleak vgcore.6491
> 4. (gdb) x/8x 0x9ceb615
> 5. 0x9ceb615: 0xec8b55cb

Assuming that we are looking in the same place as the CPU was: opcode 0xcb (the low-order byte in 0xec8b55cb) is 'lret' (RET far), which expects a 48-bit segmented return address (pushed onto the stack as two 32-bit words). Valgrind assumes that code is written for a flat, non-segmented addressing model; all return addresses are 32 bits only [on a 32-bit machine].

So that library contains (or is generating) code that valgrind won't understand. Ask the purveyor of the library about this.

If you are truly desperate, then get a wizard to patch the code so that '0xcb' becomes '0xc2 0x04 0x00' (RET near $4). This will discard the segment register information, and might work.

--
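The reason RET near $4 can stand in for lret is that both remove the same 8 bytes from the stack: a 32-bit lret pops EIP and then a 32-bit slot whose low 16 bits are CS, while "ret imm16" pops EIP and then adds imm16 to ESP. The toy Python model below tracks only ESP and the popped values; it is not an emulator, and the return address and CS selector are made-up values:

```python
from dataclasses import dataclass, field


@dataclass
class ToyStack:
    """Toy model of a 32-bit x86 stack: maps addresses to 32-bit words."""
    esp: int
    memory: dict = field(default_factory=dict)

    def push32(self, value):
        self.esp -= 4
        self.memory[self.esp] = value & 0xFFFFFFFF

    def pop32(self):
        value = self.memory[self.esp]
        self.esp += 4
        return value


def far_ret(stack):
    """lret (0xcb), 32-bit operand size: pop EIP, then pop a 32-bit slot holding CS."""
    eip = stack.pop32()
    cs = stack.pop32() & 0xFFFF
    return eip, cs


def near_ret_imm(stack, imm16):
    """ret imm16 (0xc2 iw): pop EIP, then discard imm16 further bytes of stack."""
    eip = stack.pop32()
    stack.esp += imm16
    return eip


def make_far_frame():
    """Hypothetical frame left by a far call: CS pushed first, then the return EIP."""
    s = ToyStack(esp=0x1000)
    s.push32(0x0073)      # made-up code-segment selector
    s.push32(0x08048000)  # made-up return address
    return s


if __name__ == "__main__":
    a, b = make_far_frame(), make_far_frame()
    eip, cs = far_ret(a)
    print("lret:  ", hex(eip), hex(cs), hex(a.esp))
    eip = near_ret_imm(b, 4)
    print("ret 4: ", hex(eip), hex(b.esp))
```

Both paths return to the same address and leave ESP at the same value; the only difference is that the near form never reloads CS. That is presumably harmless when the caller's CS is the same flat segment the process already runs in, hence "might work". Note also that the replacement is three bytes long and 0xcb is one, so it cannot simply overwrite 0xcb in place without clobbering the bytes after it, which is part of why this needs a wizard.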
|
From: Ed S. <ES...@fe...> - 2010-06-07 18:22:51
|
On Jun 7, 2010, at 12:25 PM, John Reiser wrote:

>> 1. ulimit -c unlimited   (I did not have core dumps enabled)
>> 2. valgrind --leak-check=yes ./debugmemoryleak
>> 3. gdb ./debugmemoryleak vgcore.6491
>> 4. (gdb) x/8x 0x9ceb615
>> 5. 0x9ceb615: 0xec8b55cb
>
> Assuming that we are looking in the same place as the CPU was:
> opcode 0xcb (the low-order byte in 0xec8b55cb) is 'lret' (RET far),
> which expects a 48-bit segmented return address (pushed onto the
> stack as two 32-bit words). Valgrind assumes that code is written
> for a flat, non-segmented addressing model; all return addresses
> are 32 bits only [on a 32-bit machine].
>
> So that library contains (or is generating) code that valgrind
> won't understand. Ask the purveyor of the library about this.

So, as I understand it, the far-return opcode with its 48-bit segmented address is valid, but Valgrind 3.5.0 does not recognize it and raises SIGILL. I assume it is not worth submitting a bug report, especially for my own personal benefit, as it may never be supported.

> If you are truly desperate, then get a wizard to patch the code
> so that '0xcb' becomes '0xc2 0x04 0x00' (RET near $4). This
> will discard the segment register information, and might work.

Good suggestion, and it might be worth it if it helps me uncover any real memory leaks.

Thanks to you and everyone else for the help,
-Ed