From: Daniel G. <dg...@su...> - 2008-10-25 10:56:37
On Saturday 25 October 2008 12:27:24 Daniel Gollub wrote:
> On Saturday 25 October 2008 03:13:57 Jin Bing Guo wrote:
> > Hi Daniel,
> >
> > On 10/24 15:32PM, Daniel Gollub wrote:
> > > On Friday 24 October 2008 15:15:54 Jiří Paleček wrote:
> > > > On Fri, 24 Oct 2008 14:16:29 +0200, Daniel Gollub <dg...@su...> wrote:
> > > > > I'm looking right now also into the mallocstress testcase.
> > > > > With a more recent kernel (e.g. 2.6.27) I see mallocstress
> > > > > failing on x86_64. (Didn't test other architectures yet.)
> > > > >
> > > > > On which kernel did you test mallocstress?
> > > > > 2.6.27? Or something different?
> > > >
> > > > 2.6.27-rc8, i386. However, I didn't notice until you asked that when
> > > > I tested the patch, the test actually succeeded, which is very weird.
> > > > Before, I got a message "malloc: Cannot allocate memory". I have a
> > > > theory that it is caused by the swapping of the semop() and malloc()
> > > > calls (see the patch). That means that before, a thread first waited
> > > > on the semaphore, and when it got released, other threads might have
> > > > already been stressing the memory, so there wasn't any free memory
> > > > and malloc() of the return variable would fail. If that is really
> > > > the case, the patch doesn't fix it, it only lowers the probability
> > > > of such behaviour. However, making a proper patch should be easy in
> > > > that case.
> > >
> > > On x86_64 I get a slightly different problem:
> > >
> > > x86_64:~/:[1]# ulimit -c unlimited
> > > x86_64:~/:[0]# ./mallocstress
> > > Aborted (core dumped)
> > > x86_64:~/:[134]# uname -i
> > > x86_64
> > > x86_64:~/:[0]# uname -r
> > > 2.6.27.1-2-default
> > > x86_64:~/:[0]# gdb mallocstress core.5217
> > > [[[[[ .. snipped the default amount of threads .... ]]]]]]
> > > [New Thread 5221]
> > > Core was generated by `./mallocstress'.
> > > Program terminated with signal 6, Aborted.
> > > #0 0x00007f641d5f4725 in *__GI_raise (sig=<value optimized out>)
> > >    from /lib64/libc.so.6
> > > (gdb) bt
> > > #0 0x00007f641d5f4725 in *__GI_raise (sig=<value optimized out>)
> > >    from /lib64/libc.so.6
> > > #1 0x00007f641d5f5d13 in *__GI_abort () from /lib64/libc.so.6
> > > #2 0x00007f641d6380b0 in malloc_printerr (action=2,
> > >    str=0x7f641d6e501b "free(): invalid pointer", ptr=0x1461)
> > >    from /lib64/libc.so.6
> > > #3 0x0000000000400e48 in allocate_free (repeat=100, scheme=0)
> > >    at mallocstress.c:233
> > > #4 0x0000000000400f4e in alloc_mem (threadnum=0x7fff25d57fb4)
> > >    at mallocstress.c:281
> > > #5 0x00007f641d925070 in start_thread (arg=<value optimized out>)
> > >    from /lib64/libpthread.so.0
> > > #6 0x00007f641d697a7d in clone () from /lib64/libc.so.6
> > > #7 0x0000000000000000 in ?? ()
> > > (gdb)
> >
> > I also encountered this problem on 2.6.27.1-2-ppc64 (SLES11 Beta3).
> > # uname -a
> > Linux venuslp12 2.6.27.1-2-ppc64 #1 SMP 2008-10-16 20:35:15 +0200 ppc64
> > ppc64 ppc64 GNU/Linux
> > # ./mallocstress
> > Aborted (core dumped)
> > # uname -i
> > ppc64
> > # uname -r
> > 2.6.27.1-2-ppc64
> > # gdb mallocstress ./core.6218
> > [[[[[ .. snipped the default amount of threads .... ]]]]]]
> > [New Thread 6274]
> > [New Thread 6265]
> > [New Thread 6273]
> > [New Thread 6218]
> > [New Thread 6261]
> > [New Thread 6220]
> > [New Thread 6267]
> > [New Thread 6277]
> > Core was generated by `./mallocstress '.
> > Program terminated with signal 6, Aborted.
> > #0 0x00000400001c62a0 in .raise () from /lib64/libc.so.6
> > (gdb)
>
> Thanks for this information!
>
> I tried to bisect this, unfortunately on a different platform from the one
> where I originally found the problem - and realized the issue doesn't
> appear with the same kernel at all on this platform...
>
> How much main memory do you have on your ppc64 testhost?
>
> Not quite sure, but the size of main memory was the first major difference
> I found between the systems I tested - affected host (x86_64): 8GB main
> memory - non-affected: 4GB main memory (two different systems: x86_64 and
> i386).

I just booted the 8GB box, where mallocstress is failing, with mem=4G and
with mem=2G - it's still failing. It's now also failing on the 4GB machine
that I tested successfully before. I'm completely on the wrong track...

I'll try another bisect round on those machines. Maybe it's not only a
kernel thing - maybe glibc is involved here as well.

best regards,
Daniel
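
When comparing hosts by main memory, or after booting with mem=4G or mem=2G as described above, it can help to confirm how much physical memory the running kernel actually sees. The following is only a minimal sketch, not part of mallocstress or LTP: the function names are made up for illustration, and it assumes glibc on Linux, where sysconf() accepts _SC_PHYS_PAGES (an extension, not required by POSIX).

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from mallocstress.c): returns the amount of
 * physical memory visible to the running kernel in bytes, or 0 if the
 * value cannot be determined.
 * Assumption: _SC_PHYS_PAGES is available (glibc/Linux extension).
 */
unsigned long long visible_phys_mem_bytes(void)
{
    long pages = sysconf(_SC_PHYS_PAGES);
    long page_size = sysconf(_SC_PAGESIZE);

    if (pages <= 0 || page_size <= 0)
        return 0;

    return (unsigned long long)pages * (unsigned long long)page_size;
}

/* Hypothetical helper: prints the value in MiB for a quick comparison. */
void print_visible_phys_mem(void)
{
    unsigned long long bytes = visible_phys_mem_bytes();

    printf("visible physical memory: %llu MiB\n",
           bytes / (1024ULL * 1024ULL));
}
```

Calling print_visible_phys_mem() from a small main() before running ./mallocstress would show whether the mem= limit really took effect. The reported value is usually somewhat below the nominal limit, because the kernel reserves part of the memory for itself.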