From: Guilhem B. <gu...@my...> - 2004-08-07 09:43:17
Hi,
This is an amended version of an email I sent to this list on Jul 27th
about something strange with signals in multi-threaded programs run
under Valgrind 2.1.2. I would really appreciate an answer if you have
time. The test program below follows the POSIX rules for calling
sigwait() in multi-threaded programs, so I believe Valgrind 2.1.2 is
doing something abnormal.
I am using Linux 2.4.22 with LinuxThreads.
The original problem is that when run under Valgrind 2.0.0 (and older),
the MySQL daemon (mysqld) reacts to the TERM signal as it should, by
exiting gracefully; but when run under Valgrind 2.1.2 (and, I believe,
2.1.0), mysqld does not react to TERM at all.
In mysqld, signal catching is left to a single thread, which loops in
sigwait() until it receives a TERM signal.
Note that mysqld, a multi-threaded application, shows up as a single
process under 2.0.0 and as several processes under 2.1.2:
In 2.0.0:
[guilhem@gbichot2 guilhem]$ ps -elf | grep mysqld
0 S guilhem 7444 2687 19 71 0 - 15621 nanosl 00:14 pts/3 00:00:03 /home/mysql_src/mysql-4.0/sql/mysqld --defaults-file=/home/mysql_src/my_master.cnf --user=guilhem --datadir=/m/data/4/1 --server-id=1 --log-bin --language=/home/mysql_src/mysql-4.0/sql/share/english/ --skip-grant-tables --skip-innodb --skip-bdb --debug
versus in 2.1.2:
[guilhem@gbichot2 guilhem]$ ps -elf | grep mysqld
0 S guilhem 800 2687 3 69 0 - 394306 poll 23:25 pts/3 00:00:06 valgrind --tool=memcheck /home/mysql_src/mysql-4.0/sql/mysqld --defaults-file=/home/mysql_src/my_master.cnf --user=guilhem --datadir=/m/data/4/1 --server-id=1 --log-bin --language=/home/mysql_src/mysql-4.0/sql/share/english/ --skip-grant-tables --skip-innodb --skip-bdb --debug
1 S guilhem 801 800 0 69 0 - 394306 pipe_w 23:25 pts/3 00:00:00 valgrind --tool=memcheck /home/mysql_src/mysql-4.0/sql/mysqld --defaults-file=/home/mysql_src/my_master.cnf --user=guilhem --datadir=/m/data/4/1 --server-id=1 --log-bin --language=/home/mysql_src/mysql-4.0/sql/share/english/ --skip-grant-tables --skip-innodb --skip-bdb --debug
1 S guilhem 802 800 0 69 0 - 394306 rt_sig 23:25 pts/3 00:00:00 valgrind --tool=memcheck /home/mysql_src/mysql-4.0/sql/mysqld --defaults-file=/home/mysql_src/my_master.cnf --user=guilhem --datadir=/m/data/4/1 --server-id=1 --log-bin --language=/home/mysql_src/mysql-4.0/sql/share/english/ --skip-grant-tables --skip-innodb --skip-bdb --debug
1 S guilhem 803 800 0 69 0 - 394306 nanosl 23:25 pts/3 00:00:00 valgrind --tool=memcheck /home/mysql_src/mysql-4.0/sql/mysqld --defaults-file=/home/mysql_src/my_master.cnf --user=guilhem --datadir=/m/data/4/1 --server-id=1 --log-bin --language=/home/mysql_src/mysql-4.0/sql/share/english/ --skip-grant-tables --skip-innodb --skip-bdb --debug
1 S guilhem 804 800 0 69 0 - 394306 pipe_w 23:25 pts/3 00:00:00 valgrind --tool=memcheck /home/mysql_src/mysql-4.0/sql/mysqld --defaults-file=/home/mysql_src/my_master.cnf --user=guilhem --datadir=/m/data/4/1 --server-id=1 --log-bin --language=/home/mysql_src/mysql-4.0/sql/share/english/ --skip-grant-tables --skip-innodb --skip-bdb --debug
1 S guilhem 1074 800 0 69 0 - 394306 pipe_w 23:26 pts/3 00:00:00 valgrind --tool=memcheck /home/mysql_src/mysql-4.0/sql/mysqld --defaults-file=/home/mysql_src/my_master.cnf --user=guilhem --datadir=/m/data/4/1 --server-id=1 --log-bin --language=/home/mysql_src/mysql-4.0/sql/share/english/ --skip-grant-tables --skip-innodb --skip-bdb --debug
1 S guilhem 1075 800 0 69 0 - 394306 pipe_w 23:26 pts/3 00:00:00 valgrind --tool=memcheck /home/mysql_src/mysql-4.0/sql/mysqld --defaults-file=/home/mysql_src/my_master.cnf --user=guilhem --datadir=/m/data/4/1 --server-id=1 --log-bin --language=/home/mysql_src/mysql-4.0/sql/share/english/ --skip-grant-tables --skip-innodb --skip-bdb --debug
Single-threaded applications work fine in Valgrind 2.1.2 as far as
signal catching is concerned.
I have written a test program which demonstrates something is strange
(either in Valgrind or in my test program):
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

sigset_t set;

void *sigcatch(void *arg)
{
    int sig;
    printf("SIGCATCH started\n");
    if (pthread_detach(pthread_self()))
        printf("SIGCATCH could not detach\n");
    /* TERM is already blocked here: the mask is inherited from main */
    sigemptyset(&set);
    sigaddset(&set, SIGTERM);   /* SIGTERM is 15 on Linux */
    while (1)
    {
        printf("SIGCATCH sigwait\n");
        sigwait(&set, &sig);    // rt_sig in 'ps'
        printf("SIGCATCH saw signal %d\n", sig);
    }
    return NULL;
}

int main(void)
{
    pthread_t sighandler;
    sigemptyset(&set);
    sigaddset(&set, SIGTERM);
    /* Block TERM before creating any thread, so every thread inherits it */
    if (pthread_sigmask(SIG_BLOCK, &set, NULL))
        printf("MAIN could not sigmask\n");
    if (pthread_create(&sighandler, NULL, sigcatch, NULL))
        printf("MAIN could not create thread sigcatch\n");
    printf("MAIN thread sigcatch created\n");
    sleep(1000);                // nanosl in 'ps'
    return 0;
}
This is closely based on an example from David Butenhof's "Programming
with POSIX Threads".
When run in 2.0.0,
gcc -lpthread -g a.c ;valgrind ./a.out
I see:
[guilhem@gbichot2 guilhem]$ ps -elf | grep a.out
0 S guilhem 7524 732 1 75 0 - 5193 nanosl 00:16 pts/4 00:00:00 ./a.out
and if I send it three TERM signals, I get:
[guilhem@gbichot2 tmp]$ gcc -lpthread -g a.c ;valgrind ./a.out
==7524== Memcheck, a.k.a. Valgrind, a memory error detector for x86-linux.
==7524== Copyright (C) 2002-2003, and GNU GPL'd, by Julian Seward.
==7524== Using valgrind-2.0.0, a program supervision framework for x86-linux.
==7524== Copyright (C) 2000-2003, and GNU GPL'd, by Julian Seward.
==7524== Estimated CPU clock rate is 1662 MHz
==7524== For more details, rerun with: -v
==7524==
MAIN thread sigcatch created
SIGCATCH started
SIGCATCH sigwait
SIGCATCH saw signal 15
SIGCATCH sigwait
SIGCATCH saw signal 15
SIGCATCH sigwait
SIGCATCH saw signal 15
SIGCATCH sigwait
When run in 2.1.2 (--tool=memcheck), I see:
[guilhem@gbichot2 guilhem]$ ps -elf | grep a.out
0 S guilhem 3492 2658 15 72 0 - 388272 poll 22:00 pts/1 00:00:00 valgrind --tool=memcheck ./a.out
1 S guilhem 3493 3492 0 69 0 - 388272 nanosl 22:00 pts/1 00:00:00 valgrind --tool=memcheck ./a.out
1 S guilhem 3494 3492 0 71 0 - 388272 rt_sig 22:00 pts/1 00:00:00 valgrind --tool=memcheck ./a.out
and when I do a kill -TERM on the rt_sig process (which I believe is
the one blocked in sigwait()), or a 'killall valgrind', nothing
happens; the output stays at:
[guilhem@gbichot2 tmp]$ gcc -lpthread -g a.c ;valgrind ./a.out
==3567== Memcheck, a memory error detector for x86-linux.
==3567== Copyright (C) 2002-2004, and GNU GPL'd, by Julian Seward et al.
==3567== Using valgrind-2.1.2, a program supervision framework for x86-linux.
==3567== Copyright (C) 2000-2004, and GNU GPL'd, by Julian Seward et al.
==3567== For more details, rerun with: -v
==3567==
MAIN thread sigcatch created
SIGCATCH started
SIGCATCH sigwait
This looks fishy. The signal seems to be correctly blocked (the
program is not terminated by it), but the sigwait() thread never
receives it.
Thanks for any help you can provide. Maybe I am doing something
wrong, but the MySQL code has not changed for years, and it used to
work under Valgrind 2.0.0 and earlier.
Thank you again for providing Valgrind to us!!
--
__ ___ ___ ____ __
/ |/ /_ __/ __/ __ \/ / Mr. Guilhem Bichot <gu...@my...>
/ /|_/ / // /\ \/ /_/ / /__ MySQL AB, Full-Time Software Developer
/_/ /_/\_, /___/\___\_\___/ Bordeaux, France
<___/ www.mysql.com