[Valgrind-users] threads vs main and invalid stack writes

Hi.

Hope someone around here can help.

I'm looking at some strange valgrind complaints that show up when main()
calls alloca() after spawning a couple of pthreads.

I seem to have reduced it to the following program:

#include <pthread.h>
#include <alloca.h>
#include <assert.h>
#include <string.h>

void *
nop(void *nil)
{
    return NULL;
}

void
__yell(void)
{
    char buf[256];
    memset(buf, 0, sizeof(buf)); /* <= */
}

int main(int argc, char **argv)
{
    pthread_t thr[4];
    int i, err;

    for (i = 0; i < sizeof(thr) / sizeof(*thr); i++) {
        err = pthread_create(&thr[i], NULL, nop, NULL);
        assert(!err);
    }

    alloca(4096);
    __yell();

    for (i = 0; i < sizeof(thr) / sizeof(*thr); i++)
        pthread_join(thr[i], NULL);

    return 0;
}

With that, valgrind typically fails as follows:

$ gcc -O0 -g test2.c -lpthread && /var/tmp/valgrind/bin/valgrind --vgdb-error=1 ./a.out
==23755== Memcheck, a memory error detector
==23755== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==23755== Using Valgrind-3.9.0.SVN and LibVEX; rerun with -h for copyright info
==23755== Command: ./a.out
==23755== 
==23755== 
==23755== TO DEBUG THIS PROCESS USING GDB: start GDB like this
==23755==   /path/to/gdb ./a.out
==23755== and then give GDB the following command
==23755==   target remote | /var/tmp/valgrind/lib/valgrind/../../bin/vgdb --pid=23755
==23755== --pid is optional if only one valgrind process is running
==23755== 
==23755== Invalid write of size 8
==23755==    at 0x4006B7: __yell (test2.c:16)
==23755==    by 0x40076C: main (test2.c:30)
==23755==  Address 0xffeffeed0 is on thread 1's stack
==23755== 
==23755== (action on error) vgdb me ... 

(gdb) print &buf
$6 = (char (*)[256]) 0xffeffeed0

(gdb) monitor get_vbits 0xffeffeed0 256
________ ________ ________ ________ ________ ________ ________ ________
________ ________ ________ ________ ________ ________ ________ ________
________ ________ ________ ________ ________ ________ ________ ________
________ ________ ________ ________ ________ ________ ________ ________
________ ________ ________ ________ ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
Address 0xFFEFFEED0 len 256 has 144 bytes unaddressable
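
As a sanity check on that count: get_vbits prints two characters per byte
and shows unaddressable bytes as "__", so the number of unaddressable bytes
is half the number of underscores in the dump. A small sketch of that
arithmetic (count_unaddressable() is just a helper name made up here, not
anything valgrind provides):

```c
#include <stddef.h>

/*
 * Count the bytes that a get_vbits dump marks as unaddressable.
 * Each byte is printed as two characters; "__" means unaddressable,
 * so the byte count is the number of '_' characters divided by two.
 */
size_t count_unaddressable(const char *dump)
{
    size_t underscores = 0;

    for (; *dump != '\0'; dump++) {
        if (*dump == '_')
            underscores++;
    }
    return underscores / 2;
}
```

Fed the dump above (four full rows of underscores plus half of the fifth),
this gives 4 * 32 + 16 = 144, matching what memcheck reports.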

Any ideas? It seems to depend on:

 - Some (small) number of threads being spawned.
 - An alloca() larger than a page.
 - A reasonably sized memset() on the stack after that.
 - It's always the main thread which suffers.
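
To poke at those dependencies one at a time, the reduced program could be
parametrized along these lines. This is only a rough sketch: run_repro(),
touch_stack(), nop_thread() and MAX_THREADS are names made up here, and the
memset target is a variable-length array instead of the fixed 256-byte
buffer, so the frame layout will not match the original exactly.

```c
#include <alloca.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define MAX_THREADS 64 /* arbitrary cap for this sketch */

static void *nop_thread(void *arg)
{
    (void)arg;
    return NULL;
}

/* Zero `len` bytes of fresh stack in a callee frame, like __yell(). */
static int touch_stack(size_t len)
{
    if (len == 0)
        return 0;

    char buf[len];
    memset(buf, 0, sizeof(buf));
    return buf[len - 1]; /* read back so the buffer is actually used */
}

/*
 * Spawn `nthreads` no-op threads, alloca() `alloca_len` bytes in the
 * calling thread, then write `touch_len` bytes of stack below that.
 * Returns 0 if all threads were created, -1 otherwise.
 */
int run_repro(int nthreads, size_t alloca_len, size_t touch_len)
{
    pthread_t thr[MAX_THREADS];
    int i, started = 0;

    if (nthreads < 0 || nthreads > MAX_THREADS)
        return -1;

    for (i = 0; i < nthreads; i++) {
        if (pthread_create(&thr[i], NULL, nop_thread, NULL) != 0)
            break;
        started++;
    }

    if (alloca_len > 0) {
        /* As in the original, the block is allocated but never touched. */
        volatile char *unused = alloca(alloca_len);
        (void)unused;
    }
    touch_stack(touch_len);

    for (i = 0; i < started; i++)
        pthread_join(thr[i], NULL);

    return started == nthreads ? 0 : -1;
}
```

Calling run_repro(4, 4096, 256) from main() and building with -O0 should
come close to the program above. Whether each combination still triggers
the report is untested.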

The real code I'm working with typically fails deep inside the dynamic linker,
most often in eglibc's dl-load.c:open_path, while it builds an alloca'd path
name to find nss code.

I'd add suppressions, but we've seen quite a bit of variance in the
number of code paths affected. 

The valgrind I'm testing with is 3.8.1. I ran into the original dl-load issue
with 3.7-Debian, iirc.

I also tried svn trunk (the 3.9.0.SVN run shown above), but that didn't seem
to solve it.

Thanks + Cheers,
Daniel