|
From: Maynard J. <may...@us...> - 2010-12-10 17:50:03
|
I've found that Valgrind may segfault in coregrind/m_machine.c:VG_(machine_get_hwcaps) under certain conditions:
- Running on a processor where one of the capability checks will fail (e.g., IBM POWER5, checking for Altivec)
- Valgrind was built with gcc 4.4.4 or later

The generated code is apparently incorrect, but we got no joy when our resident gcc person reported this to other gcc community folk. We were told that these functions are for internal gcc use only -- and also that doing a longjmp out of a signal handler is "undefined" by the POSIX standard. So far, I've only seen problems occur in cases where the longjmp is performed out of a signal handler.

I was able to come up with a patch to eliminate the use of these functions where it was causing problems on my older POWER systems, and I will open a bugzilla and attach that patch. But there are some other places where these functions are used (listed below). Someone familiar with ARM should probably take a look at the issue for that architecture, since those cases also use signal handlers like the problematic ppc64 cases I am fixing. But there are some architecture-independent uses that need to be investigated, as well. My sense is that the cases that do NOT involve signal handlers are OK for now -- based on my testing using gcc 4.4.4.

Examples of __builtin_set[long]jmp with signal handlers
----------------------------------------------------
- coregrind/m_machine.c: ppc64 and arm
- exp-ptrcheck/tests (several users of '#define TTT' which makes use of __builtin_setjmp)
- memcheck/tests (badjump2.c)
- memcheck/mc_leakcheck.c

Examples of __builtin_set[long]jmp without signal handlers
----------------------------------------------------
- coregrind/m_debuginfo/readdwarf.c
- coregrind/m_scheduler/scheduler.c

Regards,
-Maynard
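For readers unfamiliar with the pattern under discussion, here is a minimal sketch of a capability probe that escapes from a signal handler. It is not Valgrind's code: it assumes a hosted POSIX environment and uses the libc sigsetjmp()/siglongjmp() pair instead of the gcc builtins, and the names probe_feature, insn_supported, insn_unsupported and on_sigill are hypothetical. The "unsupported" case simulates the trap with raise(SIGILL) rather than executing a real Altivec instruction.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

/* Jump target shared between the probe and its signal handler. */
static sigjmp_buf probe_env;

/* On SIGILL, abandon the faulting code and resume at sigsetjmp(). */
static void on_sigill(int signo)
{
    (void)signo;
    siglongjmp(probe_env, 1);
}

/* Hypothetical stand-ins for real probe instructions. */
void insn_supported(void)   { /* completes without faulting */ }
void insn_unsupported(void) { raise(SIGILL); /* simulates an illegal-instruction trap */ }

/* Returns 1 if try_insn() completes, 0 if it raises SIGILL. */
int probe_feature(void (*try_insn)(void))
{
    struct sigaction sa, old_sa;
    volatile int have_feature = 1;     /* volatile: read after a jump */

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigill;
    sigemptyset(&sa.sa_mask);
    int rc = sigaction(SIGILL, &sa, &old_sa);
    assert(rc == 0);
    (void)rc;

    /* savemask = 1: the signal mask is restored when we jump back,
       so SIGILL is not left blocked after the handler is abandoned. */
    if (sigsetjmp(probe_env, 1) != 0) {
        have_feature = 0;              /* arrived here via siglongjmp() */
    } else {
        try_insn();                    /* may trap into on_sigill() */
    }

    sigaction(SIGILL, &old_sa, NULL);  /* restore the previous disposition */
    return have_feature;
}
```

Calling probe_feature(insn_unsupported) twice in a row exercises the mask restoration: if SIGILL were left blocked after the first jump, the second probe would not reach the handler.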
|
From: Tom H. <to...@co...> - 2010-12-10 18:09:24
|
On 10/12/10 17:50, Maynard Johnson wrote:

> The generated code is apparently incorrect, but we got no joy when our resident gcc person reported this to other gcc community folk. We were told that these functions are for internal gcc use only -- and also that doing a longjmp out of a signal handler is "undefined" by the POSIX standard. So far, I've only seen problems occur in cases where the longjmp is performed out of a signal handler.

My reading of POSIX is that it is well defined to longjmp from a signal handler so long as it is not a nested signal handler. See here:

http://www.opengroup.org/onlinepubs/009695399/functions/longjmp.html

Specifically the paragraph which says:

"As it bypasses the usual function call and return mechanisms, longjmp() shall execute correctly in contexts of interrupts, signals, and any of their associated functions. However, if longjmp() is invoked from a nested signal handler (that is, from a function invoked as a result of a signal raised during the handling of another signal), the behavior is undefined."

Tom

--
Tom Hughes (to...@co...)
http://compton.nu/
|
From: Dave G. <go...@mc...> - 2010-12-10 19:57:19
|
On Dec 10, 2010, at 12:09 PM CST, Tom Hughes wrote:

> On 10/12/10 17:50, Maynard Johnson wrote:
>
>> The generated code is apparently incorrect, but we got no joy when our resident gcc person reported this to other gcc community folk. We were told that these functions are for internal gcc use only -- and also that doing a longjmp out of a signal handler is "undefined" by the POSIX standard. So far, I've only seen problems occur in cases where the longjmp is performed out of a signal handler.
>
> My reading of POSIX is that it is well defined to longjmp from a signal
> handler so long as it is not a nested signal handler. See here:
>
> http://www.opengroup.org/onlinepubs/009695399/functions/longjmp.html

Tom, I interpret that passage as you do, but Valgrind isn't calling those functions. It's calling the GCC builtins, presumably to avoid calling normal libc routines. Based on what Maynard said, I'm guessing that the gcc versions aren't playing nicely with whatever glibc normally does for signal handling.

Maynard, what does your fix look like?

-Dave
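To make the distinction between the libc functions and the builtins concrete, here is a minimal sketch of how the GCC builtins are called. It assumes GCC or Clang; the names jump_buf, bail_out and run_guarded are hypothetical, and no signal handler is involved. As far as the GCC documentation describes them, the builtins take a five-word buffer rather than a jmp_buf, the second argument of __builtin_longjmp must be 1, and the jump must not be made from the same function that called __builtin_setjmp.

```c
/* GCC/Clang only: __builtin_setjmp and __builtin_longjmp are compiler
   builtins, not libc functions, so no header is needed. */

/* The builtins use a five-word buffer rather than a jmp_buf. */
static void *jump_buf[5];

/* The jump has to come from a different function than the one that
   called __builtin_setjmp(), hence noinline. */
__attribute__((noinline))
static void bail_out(void)
{
    __builtin_longjmp(jump_buf, 1);   /* second argument must be 1 */
}

/* Returns 0 when the guarded work completes normally,
   1 when it bails out through the builtin jump. */
int run_guarded(int should_bail)
{
    if (__builtin_setjmp(jump_buf) != 0)
        return 1;                     /* resumed here after bail_out() */
    if (should_bail)
        bail_out();
    return 0;
}
```

Because the builtins record only a small amount of state in that buffer and know nothing about the signal mask, they are not interchangeable with setjmp()/longjmp() or sigsetjmp()/siglongjmp().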
|
From: Maynard J. <may...@us...> - 2010-12-15 17:24:07
|
Dave Goodell wrote:

> On Dec 10, 2010, at 12:09 PM CST, Tom Hughes wrote:
>
>> On 10/12/10 17:50, Maynard Johnson wrote:
>>
>>> The generated code is apparently incorrect, but we got no joy when our resident gcc person reported this to other gcc community folk. We were told that these functions are for internal gcc use only -- and also that doing a longjmp out of a signal handler is "undefined" by the POSIX standard. So far, I've only seen problems occur in cases where the longjmp is performed out of a signal handler.
>>
>> My reading of POSIX is that it is well defined to longjmp from a signal
>> handler so long as it is not a nested signal handler. See here:
>>
>> http://www.opengroup.org/onlinepubs/009695399/functions/longjmp.html
>
> Tom, I interpret that passage as you do, but Valgrind isn't calling those functions. It's calling the GCC builtins, presumably to avoid calling normal libc routines. Based on what Maynard said, I'm guessing that the gcc versions aren't playing nicely with whatever glibc normally does for signal handling.

The gcc guy I talked to pointed me at http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_152.html, which is a bug report about unclear/conflicting documentation involving [set|long]jmp and signal. The end result of that bug is this statement:

"The C Standard is clear enough as is. The longjmp function shall execute correctly when called from a non-nested signal handler invoked through calls to the raise or abort functions; if longjmp is called from a signal handler invoked by other means, or from a nested signal handler, the behavior is undefined".

My gcc contact referred me to http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf, section "7.14.1.1, The signal function", point #5 as the part of the C standard that is "clear enough". When I asked if this limitation for longjmp also applies to __builtin_longjmp, I was basically told "what part of internal gcc use do you not understand?".

> Maynard, what does your fix look like?

I created a bug ticket for this problem -- https://bugs.kde.org/show_bug.cgi?id=259977 -- and have a patch attached there.

-Maynard

> -Dave
|
From: Tom H. <to...@co...> - 2010-12-15 17:33:02
|
On 15/12/10 17:23, Maynard Johnson wrote:

> The gcc guy I talked to pointed me at http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_152.html, which is a bug report about unclear/conflicting documentation involving [set|long]jmp and signal. The end result of that bug is this statement:
>
> "The C Standard is clear enough as is. The longjmp function shall execute correctly when called from a non-nested signal handler invoked through calls to the raise or abort functions; if longjmp is called from a signal handler invoked by other means, or from a nested signal handler, the behavior is undefined".

POSIX can, and does, go beyond what the C standard requires/allows though, so the C standard is not necessarily the end of it.

Indeed the "CX" marker beside the paragraph I quoted means precisely that the requirement in question is an extension to the ISO C standard.

Jumping out of signal handlers with longjmp is a very common thing to do, because it is one of the few things that is generally considered safe to do in signal handlers, and my understanding had always been that even if the C standard didn't allow it, POSIX did.

Tom

--
Tom Hughes (to...@co...)
http://compton.nu/
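The case that POSIX covers and the C standard text does not can be sketched as follows: the handler is invoked by a signal sent with kill(), not by raise() or abort(), and the code escapes with siglongjmp(). This is an illustration only, assuming a single-threaded POSIX process in which SIGUSR1 is not blocked on entry; the names on_usr1, usr1_is_blocked and catch_usr1_once are hypothetical, and the handler is left installed for brevity.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static sigjmp_buf resume_point;

/* Non-nested handler: leave it by jumping back to sigsetjmp(). */
static void on_usr1(int signo)
{
    (void)signo;
    siglongjmp(resume_point, 1);
}

/* Returns 1 if SIGUSR1 is currently blocked in the caller, else 0. */
int usr1_is_blocked(void)
{
    sigset_t cur;
    sigprocmask(SIG_BLOCK, NULL, &cur);   /* query only, nothing is changed */
    return sigismember(&cur, SIGUSR1);
}

/* Sends SIGUSR1 to this process and escapes from the handler.
   Returns 1 if the handler ran and jumped back, 0 otherwise. */
int catch_usr1_once(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    int rc = sigaction(SIGUSR1, &sa, NULL);
    assert(rc == 0);
    (void)rc;

    /* savemask = 1 so the mask in effect here is restored by siglongjmp();
       otherwise SIGUSR1 would stay blocked after leaving the handler. */
    if (sigsetjmp(resume_point, 1) != 0)
        return 1;

    kill(getpid(), SIGUSR1);              /* delivered via kill(), not raise() */
    return 0;
}
```

After catch_usr1_once() returns 1, usr1_is_blocked() should report 0, which is what allows the same call to succeed a second time.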
|
From: Julian S. <js...@ac...> - 2010-12-11 10:15:15
|
How do you know the generated code is incorrect? Do you have more info on this?

Does the problem happen for ppc32, or does it only appear in 64-bit mode?

J
|
From: Maynard J. <may...@us...> - 2010-12-14 00:42:57
Attachments:
valg-BAD.asm
valg_GOOD.asm
|
Julian Seward wrote:

> How do you know the generated code is incorrect? Do you have more
> info on this?

I've attached two files showing the assembly code for machine_get_hwcaps for the good case (valgrind built with gcc 4.1.2) and the bad case (valgrind built with gcc 4.5.2). I ran this code on a POWER5, so it should fail the VMX check (using 'vor' instruction), where the __builtin_setjmp is at line 789 of coregrind/m_machine.c. You'll notice that in the good case, code is generated to branch to line 790 on failure or to line 797 on success. The generated code in the bad case does nothing like it's supposed to.

> Does the problem happen for ppc32, or does it only appear in
> 64-bit mode?

Both modes.

-Maynard