From: Wade H. <wad...@ns...> - 2002-11-11 22:11:58
More info on my problems. There appears to be an initialization problem: I can get gmond to crash on my head node as easily as on a slave node.

Often it crashes in lib/inetaddr.c, in g_inetaddr_new() calling strdup(name). The problem seems to be with memory allocation. I compiled with dmalloc and got gmond to fail with an error about recursive mallocs. Something is writing on the malloc memory area, I think.

I can also get gmond to fail if I try to connect to it while it is initializing. This causes a segmentation violation in one thread.

Also, the results of barrier_init are not checked in gmond.c. If the program receives a signal, a system call can be interrupted and barrier_init can fail while the code carries on as if all is OK. I suspect there are other places like this, or there is an initialization problem. Should the initialization code start a thread, then wait for a semaphore before starting the next?

BTW, all my testing is on a dual 2G Xeon box with hyperthreads enabled (appears as 4 CPUs, hence 4 threads could be running simultaneously).

After all this, I finally got gmond to work on my slave node (sometimes):

1) compile with -ldmalloc (add to LIBS line in Makefile)
2) compile with -ggdb and remove the -O options
3) manually copy /usr/lib/dmalloc.o
4) copy the /etc/gmond.conf file to the node
5) bpsh 0 ./gmond

Of course, I have debug set to 100, so I'm getting LOTS of output.....

Any help would be appreciated.

-- 
Wade Hampton
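
For illustration, the "start a thread, then wait for a semaphore before starting the next" idea could look roughly like the sketch below. This is not gmond code: struct startup, worker(), wait_ready(), start_threads_in_order() and MAX_THREADS are hypothetical names, and it assumes POSIX threads with unnamed semaphores (as on Linux). It also shows the two checks raised above: every return value is tested, and an interrupted sem_wait() (EINTR) is retried rather than treated as success.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <string.h>

#define MAX_THREADS 8   /* hypothetical upper bound for this sketch */

/* Hypothetical per-thread startup record; not a gmond structure. */
struct startup {
    sem_t ready;    /* posted by the worker once its setup is done */
    int   init_ok;  /* set by the worker before it posts */
};

/* Hypothetical worker: do per-thread setup, then tell the starter. */
static void *worker(void *arg)
{
    struct startup *s = arg;
    assert(s != NULL);
    s->init_ok = 1;        /* real setup would go here */
    sem_post(&s->ready);   /* the starter may now launch the next thread */
    return NULL;           /* a real thread would enter its main loop */
}

/* sem_wait() fails with EINTR when a signal arrives; retry in that case. */
static int wait_ready(sem_t *sem)
{
    int rc;
    do {
        rc = sem_wait(sem);
    } while (rc == -1 && errno == EINTR);
    return rc;
}

/*
 * Start n workers one at a time, waiting for each to finish its setup
 * before creating the next. Returns the number of workers that reported
 * successful setup, or -1 if n is out of range.
 */
static int start_threads_in_order(int n)
{
    struct startup s[MAX_THREADS];
    pthread_t tid[MAX_THREADS];
    int started = 0, ok = 0;

    if (n < 0 || n > MAX_THREADS)
        return -1;

    for (int i = 0; i < n; i++) {
        s[i].init_ok = 0;
        if (sem_init(&s[i].ready, 0, 0) != 0) {
            perror("sem_init");
            break;
        }
        int rc = pthread_create(&tid[i], NULL, worker, &s[i]);
        if (rc != 0) {   /* pthread_create returns the error number */
            fprintf(stderr, "pthread_create: %s\n", strerror(rc));
            sem_destroy(&s[i].ready);
            break;
        }
        started++;
        if (wait_ready(&s[i].ready) != 0) {
            perror("sem_wait");
            break;
        }
        if (!s[i].init_ok)
            break;
        ok++;
    }

    /* Join before destroying so no worker is still inside sem_post(). */
    for (int i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
        sem_destroy(&s[i].ready);
    }
    return ok;
}
```

A caller would compare the result with the number of threads it asked for, e.g. treat start_threads_in_order(4) != 4 as a failed startup. Build with -pthread.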