|
From: Alexander P. <gl...@go...> - 2012-01-25 12:42:25
Hi all,

We're facing some crashes of the Chromium tests under Memcheck on Mac OS; see the log example below.
These are generally flaky (e.g. they happen only when a test is run in the cloud, and it's hard to reproduce the crash in an interactive shell even on the same machine), but they've been occurring for more than a year (http://code.google.com/p/chromium/issues/detail?id=51716 contains some witnesses, but they're of little interest) across wide ranges of Chromium and Valgrind versions.

The Thread 1 stack, together with some other observations, makes me think this is heap corruption.
My question is whether it's possible to detect any wild writes to Memcheck's heap.
I think the best approach would be to mark all the core data structures unaddressable, but that may be hard because VG_(arena_malloc) knows nothing about Memcheck.

==========================================
--42768-- src/xcodebuild/Release/unit_tests:
--42768-- dSYM directory is missing; consider using --dsymutil=yes
--42768-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) - exiting
--42768-- si_code=1; Faulting address: 0x726F7065; sp: 0xf3cbcb24

valgrind: the 'impossible' happened:
Killed by fatal signal
==42768== at 0x3802BB2A: ???
==42768== by 0x3802CCA7: ???
==42768== by 0x6F666561: ???

sched status:
running_tid=1

Thread 1: status = VgTs_Runnable
==42768== at 0xF5698E2: _Znam (vg_replace_malloc.c:360)
==42768== by 0x46C5: _ZN7testing8internal6String16ConstructNonNullEPKcm (in src/xcodebuild/Release/unit_tests)
==42768== by 0x12938: _ZN7testing8internal6StringC2EPKc (in src/xcodebuild/Release/unit_tests)
==42768== by 0x12539: _ZN7testing8internal6StringaSEPKc (in src/xcodebuild/Release/unit_tests)
==42768== by 0x1A80A37: _ZN7testing8internal15UnitTestOptions17FilterMatchesTestERKNS0_6StringES4_ (in src/xcodebuild/Release/unit_tests)
==42768== by 0x1A87C14: _ZN7testing8internal12UnitTestImpl11FilterTestsENS1_18ReactionToShardingE (in src/xcodebuild/Release/unit_tests)
==42768== by 0x1A873EC: _ZN7testing8internal12UnitTestImpl11RunAllTestsEv (in src/xcodebuild/Release/unit_tests)
==42768== by 0x1A8CD7E: _ZN7testing8internal38HandleSehExceptionsInMethodIfSupportedINS0_12UnitTestImplEbEET0_PT_MS4_FS3_vEPKc (in src/xcodebuild/Release/unit_tests)
==42768== by 0x1A89F4A: _ZN7testing8internal35HandleExceptionsInMethodIfSupportedINS0_12UnitTestImplEbEET0_PT_MS4_FS3_vEPKc (in src/xcodebuild/Release/unit_tests)
==42768== by 0x1A87333: _ZN7testing8UnitTest3RunEv (in src/xcodebuild/Release/unit_tests)
==42768== by 0x3C692D9: _ZN4base9TestSuite3RunEv (in src/xcodebuild/Release/unit_tests)
==42768== by 0x3CA6448: _ZN17UnitTestTestSuite3RunEv (in src/xcodebuild/Release/unit_tests)
==42768== by 0x1ADC5C2: main (in src/xcodebuild/Release/unit_tests)

Thread 2: status = VgTs_WaitSys
==42768== at 0xF57D1A2: semaphore_wait_trap (in /usr/lib/libSystem.B.dylib)
==42768== by 0x22E849F: _ZN2v88internal15RuntimeProfiler27WaitForSomeIsolateToEnterJSEv (in src/xcodebuild/Release/unit_tests)
==42768== by 0x22E8567: _ZN2v88internal26RuntimeProfilerRateLimiter18SuspendIfNecessaryEv (in src/xcodebuild/Release/unit_tests)
==42768== by 0x23C6CCA: _ZN2v88internal13SamplerThread3RunEv (in src/xcodebuild/Release/unit_tests)
==42768== by 0x23C652E: _ZN2v88internalL11ThreadEntryEPv (in src/xcodebuild/Release/unit_tests)
==42768== by 0xF5AE054: _pthread_start (in /usr/lib/libSystem.B.dylib)
==42768== by 0xF5ADF11: thread_start (in /usr/lib/libSystem.B.dylib)
==========================================

Thanks in advance,
Alexander Potapenko
Software Engineer
Google Moscow
|
From: Julian S. <js...@ac...> - 2012-01-26 09:10:07
There are a couple of things you can try. First, you can try to see
if your app is doing some kind of out-of-range memory access that
Memcheck can't detect.

* Increase the free block queue size as much as you can, with
  the --freelist-vol= and --freelist-big-blocks= flags. This
  increases Memcheck's ability to detect access-after-free
  errors.

* Increase the default redzone size for client (application) heap
  blocks by changing MC_MALLOC_REDZONE_SZB in memcheck/mc_include.h
  to (e.g.) 128. This will massively increase Memcheck's memory
  consumption, but it will also make it possible to detect overruns
  of up to 128 bytes.

Secondly, you can try to see if there is some heap corruption for
non-application blocks.

* Do the same but for the other arenas, which hold non-application
  blocks. Do this by changing the value 4 (3rd param) in the
  7 back-to-back calls to arena_init in m_mallocfree.c. These
  red zones are checked at block free time.

I just tested this using the patch shown below. Unfortunately it
detects underruns of the test block, but not overruns, for some
reason. Maybe you can figure out why.

J

Index: coregrind/m_main.c
===================================================================
--- coregrind/m_main.c (revision 12354)
+++ coregrind/m_main.c (working copy)
@@ -1547,6 +1547,7 @@
    //--------------------------------------------------------------
    VG_(debugLog)(1, "main", "Starting the dynamic memory manager\n");
    { void* p = VG_(malloc)( "main.vm.1", 12345 );
+     ((UChar*)p)[-1] = 42;
      if (p) VG_(free)( p );
    }
    VG_(debugLog)(1, "main", "Dynamic memory manager is running\n");
|
|
From: Alexander P. <gl...@go...> - 2012-01-26 15:12:23
> There's a couple of things you can try. First you can try to see
> if your app is doing some kind of out of range memory access that
> Memcheck can't detect.
>
> * increase the free block queue size as much as you can, with
>   the --freelist-vol= and --freelist-big-blocks= flags. This
>   increases Memcheck's ability to detect access-after-free
>   errors.
>
> * increase the default redzone size for client (application) heap
>   blocks by changing MC_MALLOC_REDZONE_SZB in memcheck/mc_include.h
>   to (eg) 128. This will massively increase Memcheck's memory
>   consumption, but it will also make it possible to detect overruns
>   of up to 128 bytes.
>
> Secondly you can try to see if there is some heap corruption for
> non-application blocks.
>
> * Do the same but for the other arenas, which hold non-application
>   blocks. Do this by changing the value 4 (3rd param) in the
>   7 back-to-back calls to arena_init in m_mallocfree.c. These
>   red zones are checked at block free time.
>
> I just tested this using the patch shown below. Unfortunately it
> detects underruns of the test block, but not overruns for some
> reason. Maybe you can figure out why.
>
> J

Thanks for your comments!

So far I've tried redzone sizes up to 256 bytes. Some combinations of application/non-application redzones made the crashes disappear; with others the test crashed at a different place (always in the userspace memory allocation).
No memory errors were reported by Memcheck prior to those crashes (I've patched Valgrind to print the suppressed errors as well).
I am now trying to add magic bits to the fields of Block and check them on every access.

Another guess of mine is that the corruption is caused by some memory allocation routines we do not wrap, which interfere with Valgrind's allocations.
|
From: Julian S. <js...@ac...> - 2012-01-27 09:16:45
> Another guess of mine is that the corruption is caused by some memory
> allocation routines we do not wrap, which interfere with Valgrind's
> allocations.

One way you can generally simplify the scenario is to run with
--tool=none and see if it still fails. This removes all Memcheck-style
instrumentation and all interception of malloc and related functions;
it basically runs the program on the framework with no instrumentation.

J