From: Ethan Tira-T. <ej...@an...> - 2005-09-21 22:16:05
My project involves control software for a robot. In order to describe
the problems I'm seeing when using valgrind, first allow me to give some
detail on how my code works, and why it does what it does.
----------------- Environment -----------------
We have a body of code written for the Sony Aibo, which we have ported
to Linux in order to run in simulation and, hopefully, eventually to
control Linux-based robots as well.
This software has a realtime process for motor control ("Motion"), and a
non-realtime process which handles higher-level decision making and any
sensor processing that might take an extended time ("Main"). These
two processes communicate through shared memory regions and semaphores.
In addition, there is a simulation process which emulates the system of
the Aibo, so it loads camera and sensor frames from disk (perhaps
network later) and sends them to the "Main" process, just as it would on
the real Aibo. Currently, each of these message queues involves a
thread on the loading side, and another thread on the receiving side
(which I might change to streamline performance, but that's a different
story).
Each of these processes is created via a fork, so the original parent
becomes "Main", its child becomes "Motion", and the grandchild becomes
"Simulator".
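
As a rough illustration of the fork chain described above (a minimal sketch, not the project's actual code): the run_main, run_motion, and run_simulator functions are hypothetical stand-ins for the real process bodies, and the children use _exit() only to keep the sketch free of side effects, whereas the real code presumably lets exit handlers run.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical role entry points; the real ones would run control loops. */
static int run_main(void)      { return 0; }
static int run_motion(void)    { return 0; }
static int run_simulator(void) { return 0; }

/* Wait for one child; return 0 only if it exited normally with status 0. */
static int wait_clean(pid_t pid)
{
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}

/* Original process stays "Main", its child becomes "Motion", and the
 * grandchild becomes "Simulator". Returns 0 in the original process once
 * both descendants have exited cleanly, -1 otherwise. */
int spawn_processes(void)
{
    pid_t motion = fork();
    if (motion < 0)
        return -1;

    if (motion == 0) {                 /* child: "Motion" */
        pid_t sim = fork();
        if (sim < 0)
            _exit(1);
        if (sim == 0)                  /* grandchild: "Simulator" */
            _exit(run_simulator());

        int rc = run_motion();
        if (wait_clean(sim) != 0)
            rc = 1;
        _exit(rc);
    }

    int rc = run_main();               /* parent: "Main" */
    if (wait_clean(motion) != 0)
        return -1;
    return rc;
}
```

Calling spawn_processes() once from the original process is enough; each role reaps only its own direct child.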
During initialization, before any forks occur, atexit() and signal() are
called to register callbacks to intercept shutdown. These callbacks
check to see if any shared memory regions or semaphore sets are still in
use, and unlink them if they are. (Otherwise they persist past process
destruction, leaking system resources -- a bad idea IMHO, but that is
Linux's problem.)
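
The registration step above might be sketched as follows. This is an assumption-heavy illustration: it uses System V shared memory and semaphore sets (the post does not say which IPC flavor is used), and the names cleanup_ipc, install_shutdown_hooks, create_ipc, and ipc_in_use are hypothetical.

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/shm.h>
#include <unistd.h>

/* IDs of the resources still in use; -1 means "nothing to release". */
static int g_shm_id = -1;
static int g_sem_id = -1;

/* Release whatever is still registered. Safe to call more than once. */
void cleanup_ipc(void)
{
    if (g_shm_id != -1) {
        shmctl(g_shm_id, IPC_RMID, NULL);
        g_shm_id = -1;
    }
    if (g_sem_id != -1) {
        semctl(g_sem_id, 0, IPC_RMID);
        g_sem_id = -1;
    }
}

/* Signal path: release resources, then leave without running atexit()
 * handlers again. Assumes shmctl/semctl are acceptable in a handler. */
static void on_signal(int signo)
{
    (void)signo;
    cleanup_ipc();
    _exit(1);
}

/* Meant to be called once during initialization, before any fork(), so
 * that every process inherits the same handlers. */
int install_shutdown_hooks(void)
{
    if (atexit(cleanup_ipc) != 0)
        return -1;
    if (signal(SIGINT, on_signal) == SIG_ERR)
        return -1;
    if (signal(SIGTERM, on_signal) == SIG_ERR)
        return -1;
    return 0;
}

/* Create one shared memory region and one semaphore set. */
int create_ipc(void)
{
    g_shm_id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    g_sem_id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (g_shm_id == -1 || g_sem_id == -1) {
        cleanup_ipc();
        return -1;
    }
    return 0;
}

int ipc_in_use(void)
{
    return g_shm_id != -1 || g_sem_id != -1;
}
```

Because the handlers are inherited across fork(), any process (or, under the behavior described in problem 1, any thread) that reaches the exit path will run cleanup_ipc().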
At shutdown, we first call pthread_cancel on the message queue threads,
and then wait for them to complete with pthread_join. Then once the
message queues are torn down, the individual processes are torn down.
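
The cancel-then-join sequence above can be sketched like this (a minimal sketch; queue_thread is a hypothetical stand-in for a message queue thread, and it simply blocks in sleep(), which is a cancellation point):

```c
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

/* Stand-in for a message queue thread: blocks forever at a
 * cancellation point until someone cancels it. */
static void *queue_thread(void *arg)
{
    (void)arg;
    for (;;)
        sleep(1);
    return NULL; /* not reached */
}

/* Start the thread, cancel it, and wait for it to finish.
 * Returns 0 if the thread reported that it was cancelled. */
int start_and_stop_queue_thread(void)
{
    pthread_t tid;
    void *result = NULL;

    if (pthread_create(&tid, NULL, queue_thread, NULL) != 0)
        return -1;
    if (pthread_cancel(tid) != 0)
        return -1;
    if (pthread_join(tid, &result) != 0)
        return -1;

    /* A cancelled thread's exit value is PTHREAD_CANCELED. */
    return (result == PTHREAD_CANCELED) ? 0 : -1;
}
```

Checking the join result against PTHREAD_CANCELED distinguishes a thread that was cancelled from one that returned on its own.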
----------------- Problems -----------------
1) When running in valgrind, when the threads exit, they appear to be
calling the function registered with atexit(). When running outside of
valgrind, these threads do not call the exit callback. (This causes
trouble for me because when the threads call the callback, there are
still shared memory regions in use, so the callback goes around
unlinking them before we're actually done with them.)
2) pthread_setcancelstate() seems to be ignored, or at least there's
*something* I don't quite understand going on there. My asserts fire
repeatedly, telling me that whenever a thread sets the cancel state, the
previous value returned isn't what it should have been. I use
setcancelstate to turn off thread canceling whenever the thread holds a
mutex, so that it won't be killed while the mutex is held, causing a
deadlock down the road.
3) When I exit, I run into some kind of deadlock condition, from which
my only recourse is to ctrl-C it, which tries to run my signal handler
(similar to the exit handler, attempts to unload shared memory regions
and semaphore sets), but then I get a particularly nasty crash with
seemingly infinite screens of garbage output (that doesn't ever happen
running normally).
- I wouldn't really care that much, but the root problem is that the
abnormal shutdown causes stuff to be leaked which normally wouldn't, and
hides stuff which might've been leaking.
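
For reference, the cancel-state guard described in problem 2 generally follows the pattern below. This is a sketch under assumptions, not the project's code: locked_increment and counter_value are hypothetical names, and the check on the returned previous state mirrors the kind of assert mentioned above.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;
static int g_counter = 0;

/* Disable cancellation, do the work under the mutex, then restore the
 * caller's previous cancel state. Returns 0 on success, -1 if a call
 * fails or the state seen on restore is not the one set on entry. */
int locked_increment(void)
{
    int before = 0;
    int inside = 0;

    if (pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &before) != 0)
        return -1;

    pthread_mutex_lock(&g_lock);
    ++g_counter;
    pthread_mutex_unlock(&g_lock);

    /* Restore whatever state the caller had, so nested guards work. */
    if (pthread_setcancelstate(before, &inside) != 0)
        return -1;

    /* While inside the guard, the state should have been "disabled". */
    return (inside == PTHREAD_CANCEL_DISABLE) ? 0 : -1;
}

int counter_value(void)
{
    pthread_mutex_lock(&g_lock);
    int value = g_counter;
    pthread_mutex_unlock(&g_lock);
    return value;
}
```

Restoring the saved state, rather than unconditionally re-enabling cancellation, keeps the guard correct when one guarded function calls another.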
----------------- Attempted workarounds -----------------
If I comment out my atexit() registration, then the threads seem to exit
well enough without triggering the "emergency shutdown" code, but then I
still run into trouble with deadlock and the ensuing crash with ctrl-C
-- perhaps an issue with the cancel state. But it's hard to see where
my processes are hanging, because I can't figure out how to attach gdb
to the process in order to get a backtrace while it's blocked on a
semaphore.
thanks!
-ethan