From: David C. <dc...@gm...> - 2014-01-26 22:28:58
Thank you very much, Philippe,

The --fair-sched option was set in an attempt to fix this. I had read about interminable FUTEX_WAIT status and I think that was one of the suggestions. Clearly it doesn't make any difference.

I think I've tried 3.9.0, but I will double-check and run that one from now on anyway.

I have tried connecting with gdb and there wasn't much visible. I'll try again, though, and also try vgdb; I was unaware of this tool.

I'm not sure what is getting locked, whether it's Valgrind or our code. We do use threading, but only in a limited way, and I'm pretty sure memcheck is hanging on single-threaded cases. Hopefully the extra logging etc. will reveal something.

I can't easily log onto the machine from here. I'll run the experiments you suggest and report back in a short while.

One thing I didn't mention, which might be important, is that I run valgrind through a Python-driven process pool. I use the multiprocessing module to spawn off a bunch of valgrinds. I don't think it's relevant, as it was working fine for several weeks like this before the hang-ups started.

Best wishes and thanks again,

David.

On Sun, Jan 26, 2014 at 1:07 PM, Philippe Waroquiers <phi...@sk...> wrote:

> On Sun, 2014-01-26 at 02:20 +0000, David Carter wrote:
> > Hi,
> >
> > I've got an issue with memcheck in Valgrind 3.8.1 hanging. I've left
> > processes running for weeks or even months but they don't complete
> > (normally these processes run in a few minutes tops, and they were
> > working fine with memcheck until a while ago).
> >
> > Has anyone seen anything like this before? Here are the details:
> >
> > options:
> >
> > --quiet --track-origins=yes --free-fill=7a
> > --child-silent-after-fork=yes --fair-sched=no --log-file=/path/to/log
> > --suppressions=/path/to/suppression.file
> >
> > strace shows:
> >
> > Process 5223 attached - interrupt to quit
> >
> > read(1027,
> With --fair-sched=no, valgrind uses a pipe to implement a "big lock".
> It is however not clear from what you have shown whether this 1027 is
> the valgrind pipe big lock fd. If yes, then it looks like a bug in
> valgrind, as the above read means a thread wants to acquire the big
> lock to run, but the thread currently holding the lock does not
> release it.
>
> Here are various suggestions:
> 1. When you are in the above blocked state, use gdb+vgdb
>    to connect to your process, and examine the state
>    of your process (e.g. which thread is doing what)
>    (the most likely cause of deadlock/problem is your application, not
>    valgrind, at least when looking at your mail with
>    a "valgrind developer hat on" :).
>
> 2. Upgrade to 3.9.0; there are many bugs solved since 3.8.1
>    (probably not yours, I do not see anything related to deadlock
>    but one never knows).
>
> 3. Run with a lot more traces, e.g.
>    -v -v -v -d -d -d --trace-sched=yes --trace-syscalls=yes
>    --trace-signals=yes
>    and see if there is some suspicious output.
>
> Philippe
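
On the open question of whether fd 1027 is a pipe at all: a minimal sketch of one way to check, assuming Linux with /proc mounted and a caller running as the same user as the blocked process (or as root). The helper names describe_fd and is_pipe are made up for illustration and are not part of Valgrind or any library. A "pipe:" result only shows that the descriptor is a pipe; it does not by itself prove that it is the valgrind big lock.

```python
import os


def describe_fd(pid, fd):
    """Return what /proc/<pid>/fd/<fd> points at (Linux only).

    The kernel reports pipes as 'pipe:[<inode>]', sockets as
    'socket:[<inode>]', and regular files as their path.
    """
    return os.readlink("/proc/%d/fd/%d" % (pid, fd))


def is_pipe(pid, fd):
    """True if the given descriptor of the given process is a pipe."""
    return describe_fd(pid, fd).startswith("pipe:")
```

With the pid and fd from the strace output above, the call would be describe_fd(5223, 1027). It raises FileNotFoundError if the process or descriptor no longer exists, and PermissionError if the caller is not allowed to inspect that process.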