I can't really provide any steps to reproduce this, but twice now the game has hung while I was trying to navigate the flup tube system. (For a savegame, see bug #6636.)
Both times it happened while trying to navigate a tight corner, so there was a lot of banging into walls. I attached GDB to the running process, and apparently it's hanging on a mutex in the mixer. Here is a stack trace:
#0 0xb778d424 in __kernel_vsyscall ()
#1 0xb6c33672 in __lll_lock_wait ()
at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:144
#2 0xb6c2f1c2 in _L_lock_920 ()
from /lib/i386-linux-gnu/i686/cmov/libpthread.so.0
#3 0xb6c2f043 in __GI___pthread_mutex_lock (mutex=0xa5f41b8)
at ../nptl/pthread_mutex_lock.c:114
#4 0xb7056cc4 in pthread_mutex_lock (mutex=0xa5f41b8) at forward.c:192
#5 0xb7713c7f in SDL_mutexP () from /usr/lib/i386-linux-gnu/libSDL-1.2.so.0
#6 0x0936d41b in SdlMutexManager::lockMutex (this=0xa52c240, mutex=0xa5f41b8)
at backends/mutex/sdl/sdl-mutex.cpp:36
#7 0x0935d3df in ModularBackend::lockMutex (this=0xa4f7028, mutex=0xa5f41b8)
at backends/modular-backend.cpp:222
#8 0x094778b0 in Common::StackLock::lock (this=0xbf99d8c0)
at common/mutex.cpp:68
#9 0x0947784a in Common::StackLock::StackLock (this=0xbf99d8c0, mutex=...,
mutexName=0x0) at common/mutex.cpp:57
#10 0x09417d98 in Audio::MixerImpl::isSoundHandleActive (this=0xa6c2a98,
handle=...) at audio/mixer.cpp:452
#11 0x091571e6 in TsAGE::SoundBlasterDriver::updateVoice (this=0xa7ba120,
channel=0) at engines/tsage/sound.cpp:3131
#12 0x09151250 in TsAGE::SoundManager::sfRethinkVoiceTypes ()
at engines/tsage/sound.cpp:1273
#13 0x0914ef90 in TsAGE::SoundManager::sfAddToPlayList (sound=0xa7ec504)
at engines/tsage/sound.cpp:625
#14 0x0914e4a5 in TsAGE::SoundManager::addToPlayList (this=0xa797d40,
sound=0xa7ec504) at engines/tsage/sound.cpp:357
#15 0x091524e0 in TsAGE::Sound::play (this=0xa7ec504, soundNum=276)
at engines/tsage/sound.cpp:1564
#16 0x09154fc8 in TsAGE::ASound::play (this=0xa7ec4fc, soundNum=276, endAction=
0x0, volume=127) at engines/tsage/sound.cpp:2472
#17 0x09134cdc in TsAGE::Ringworld2::Scene3500::dispatch (this=0xa7eaed8)
at engines/tsage/ringworld2/ringworld2_scenes3.cpp:4261
#18 0x08fea2fa in TsAGE::SceneHandler::dispatch (this=0xa73f068)
at engines/tsage/core.cpp:4459
#19 0x09085e1d in TsAGE::Ringworld2::SceneHandlerExt::dispatch (this=0xa73f068)
at engines/tsage/ringworld2/ringworld2_logic.cpp:630
#20 0x08fe988c in TsAGE::GameHandler::execute (this=0xa73f068)
at engines/tsage/core.cpp:4263
#21 0x0914d244 in TsAGE::Game::execute (this=0xa73f058)
at engines/tsage/scenes.cpp:617
#22 0x08fd8ef7 in TsAGE::TSageEngine::run (this=0xa5f3400)
at engines/tsage/tsage.cpp:134
#23 0x08051e5b in runGame (plugin=0xa547e48, system=..., edebuglevels=...)
at base/main.cpp:244
#24 0x08052d9b in scummvm_main (argc=4, argv=0xbf9e3be4) at base/main.cpp:489
#25 0x08050af9 in main (argc=4, argv=0xbf9e3be4)
at backends/platform/sdl/posix/posix-main.cpp:45
One thing that struck me is that while it seems pretty clear that it's the main thread hanging, waiting for a mutex, it also seems reasonable to think that the mutex was locked by another thread.
I noticed that rethinkVoiceTypes() also gets called from sfSoundServer(), which is called from readBuffer(). To me that seems like a possible race condition, though I'm rather too tired to give it much more thought.
I added some debug messages to mutex locking/unlocking (easier than I thought, since we already have the option of naming mutexes) and then deliberately flew into walls until it hung.
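As a rough illustration of that kind of instrumentation, here is a minimal sketch of a named, logging lock guard. It is not the actual ScummVM code: LoggingLock, g_lockTrace and demoLockTrace are hypothetical names, and std::recursive_mutex stands in for the backend mutex.

```cpp
#include <mutex>
#include <string>
#include <vector>

// Hypothetical trace sink. It is not thread-safe; a real build would print
// through the engine's debug output instead of collecting strings.
static std::vector<std::string> g_lockTrace;

// Hypothetical RAII guard that records every lock and unlock under a name.
class LoggingLock {
public:
    LoggingLock(std::recursive_mutex &mutex, const char *name)
        : _mutex(mutex), _name(name) {
        g_lockTrace.push_back("locking " + _name);
        _mutex.lock();
        g_lockTrace.push_back("locked " + _name);
    }
    ~LoggingLock() {
        _mutex.unlock();
        g_lockTrace.push_back("unlocked " + _name);
    }
    LoggingLock(const LoggingLock &) = delete;
    LoggingLock &operator=(const LoggingLock &) = delete;

private:
    std::recursive_mutex &_mutex;
    std::string _name;
};

// Takes the same mutex twice from one thread, the way a callback that calls
// back into the mixer would, and returns the recorded trace.
std::vector<std::string> demoLockTrace() {
    g_lockTrace.clear();
    std::recursive_mutex mixerMutex;
    {
        LoggingLock outer(mixerMutex, "mixCallback");
        LoggingLock inner(mixerMutex, "setChannelVolume");
    }
    return g_lockTrace;
}
```

With a trace like this, a hang shows up as a final "locking ..." entry that never gets its matching "locked ..." entry.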
From the look of it, it locked the mutex from MixerImpl::mixCallback(), which is what you would expect. From there it probably called readBuffer() to get the audio data, and at some point before that terminated it called MixerImpl::setChannelVolume(), which tried to lock the already locked mutex.
That's probably not quite the same spot as in the backtrace above, but the point is that readBuffer() must not be allowed to do anything that waits for the mixer mutex to become unlocked, because that's not going to happen until after readBuffer() finishes. Or at least, that's how I understand it.
So in summary, my understanding is that the problem is that we call the mixer from within readBuffer(), i.e. from within the mixer callback when the mutex is already locked.
To fix the problem, we'd either have to stop doing that, or the mixer would have to provide alternative versions of those functions that are safe to call from within the mixer callback. Though that would probably still require some rewriting in TsAGE because I think this is code that's also called from the main thread.
I am not sure whether I understand you correctly, but: When you call the Mixer from within the Mixer's callback, like in readBuffer, that shouldn't be an issue because we allow recursive locking, i.e. you can lock a mutex multiple times from the same thread.
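To make the recursive-locking point concrete, here is a small sketch using std::recursive_mutex as a stand-in for the mixer mutex. ToyMixer and demoRecursiveLock are hypothetical names for illustration only, not the real Audio::MixerImpl.

```cpp
#include <mutex>

// Hypothetical stand-in for the mixer; not ScummVM's Audio::MixerImpl.
class ToyMixer {
public:
    // A query that takes the mixer mutex, like a call made from readBuffer().
    bool isActive() {
        std::lock_guard<std::recursive_mutex> lock(_mutex); // second lock, same thread
        return _active;
    }

    // Simulates the mixer callback: takes the mutex, then calls back into
    // the mixer while still holding it.
    bool mixCallback() {
        std::lock_guard<std::recursive_mutex> lock(_mutex); // first lock
        return isActive(); // re-entrant call; does not block with a recursive mutex
    }

private:
    std::recursive_mutex _mutex;
    bool _active = true;
};

// Returns true if the nested lock succeeded and the call returned normally.
bool demoRecursiveLock() {
    ToyMixer mixer;
    return mixer.mixCallback();
}
```

The same nesting with a non-recursive mutex would hang or be undefined behaviour, which is why the distinction matters here.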
I think it would be helpful if you could get a backtrace of both the main thread and the audio thread. Probably the audio thread tries to lock a mutex within TsAGE that is already locked by the main thread. For example, the backtrace shows a call to TsAGE::SoundManager::sfRethinkVoiceTypes, which locks a TsAGE-internal mutex. The same mutex is also locked inside TsAGE::AdlibSoundDriver::readBuffer, which is called from the audio thread. So my guess is that this TsAGE mutex is locked by the main thread. Then the audio thread kicks in, holding the Mixer mutex, and waits for the TsAGE mutex to be unlocked. The main thread, meanwhile, reaches TsAGE::SoundBlasterDriver::updateVoice and waits for the Mixer mutex to be unlocked, which won't happen because the audio thread that holds it is itself waiting for the TsAGE mutex.
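The suspected lock-order inversion can be sketched in isolation. This is only a model of the guess above, not the TsAGE code: engineMutex and mixerMutex are hypothetical stand-ins, and std::timed_mutex with a timeout is used so the demonstration reports the conflict instead of hanging.

```cpp
#include <chrono>
#include <future>
#include <mutex>
#include <thread>

struct InversionResult {
    bool mainGotMixer;   // did the "main" thread obtain the mixer mutex?
    bool audioGotEngine; // did the "audio" thread obtain the engine mutex?
};

InversionResult demoLockOrderInversion() {
    std::timed_mutex engineMutex; // stands in for the TsAGE-internal mutex
    std::timed_mutex mixerMutex;  // stands in for the mixer mutex
    std::promise<void> mainHoldsEngine, audioHoldsMixer, mainTried, audioTried;
    std::future<void> mainHoldsEngineF = mainHoldsEngine.get_future();
    std::future<void> audioHoldsMixerF = audioHoldsMixer.get_future();
    std::future<void> mainTriedF = mainTried.get_future();
    std::future<void> audioTriedF = audioTried.get_future();
    InversionResult result{false, false};
    const auto timeout = std::chrono::milliseconds(50);

    // "Main" thread: engine mutex first, then the mixer mutex.
    std::thread mainThread([&] {
        std::lock_guard<std::timed_mutex> engine(engineMutex);
        mainHoldsEngine.set_value();
        audioHoldsMixerF.wait(); // make sure the other side holds its lock
        result.mainGotMixer = mixerMutex.try_lock_for(timeout);
        if (result.mainGotMixer)
            mixerMutex.unlock();
        mainTried.set_value();
        audioTriedF.wait(); // keep holding the engine mutex until both tried
    });

    // "Audio" thread: mixer mutex first, then the engine mutex.
    std::thread audioThread([&] {
        std::lock_guard<std::timed_mutex> mixer(mixerMutex);
        audioHoldsMixer.set_value();
        mainHoldsEngineF.wait();
        result.audioGotEngine = engineMutex.try_lock_for(timeout);
        if (result.audioGotEngine)
            engineMutex.unlock();
        audioTried.set_value();
        mainTriedF.wait(); // keep holding the mixer mutex until both tried
    });

    mainThread.join();
    audioThread.join();
    return result;
}
```

Both attempts time out because each thread holds the mutex the other one wants. With plain blocking locks, the same interleaving would leave both threads waiting forever. Taking the two mutexes in the same order on both threads, or not holding the engine mutex while calling into the mixer, would avoid it.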
I did not know you could lock the mutex recursively like that. Maybe I've been barking up the wrong tree, but I could have sworn that the sound stopped...
I've made two more attempts at reproducing the hang, but so far I've only got what I think is a completely different hang, where sound keeps playing but the mouse cursor is gone. (I've been using arrow keys rather than clicking on-screen buttons, so maybe I was doing that too quickly for it to keep up, or something silly like that.)
Finally got the deadlock again. I think this is the backtrace for all the threads:
I should point out that unlike the other hang I mentioned, in the one where it hangs on a mutex the sound has indeed gone quiet. The mouse cursor is visible, but is not being updated.
(In the other one, the sound is playing and the mouse cursor is gone. Code is being executed, but nothing is happening.)
I think I've correctly removed mutex locks from my sound code that could result in deadlocks in the main thread versus the calls made by readBuffer in the audio processing thread. But it will still need testing.
I don't know if it's any help whatsoever, but here's a savegame with the game having hung in the non-deadlocky way.
It deadlocked again for me:
It turns out that the deadlock can happen in other places as well. I got this backtrace when trying to open a door after getting through the flup tubes:
Here is the backtrace of a deadlock I got when restoring an earlier savegame while I was in the middle of the final (?) confrontation.
Are there any details you can give me to help me replicate the non-deadlocky lock?
For technical details: this lock happens during the signalling sequence of Scene3500::Action1::signal. It has an _actionIndex controlling the switch that handles the states of animating a change of direction. At index 2, it starts a mover to move _tunnelVertCircle to a given position, at the end of which the mover should signal the action to proceed to the next index (3). In the given savegame, it seems the mover has finished but the action hasn't been signaled. So the scene remains in a "move disabled" state, preventing any further movement actions.
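A toy model of that stuck state, for illustration only. ToyAction, ToyMover and sceneStuck are hypothetical and much simpler than the real Scene3500 code; the sketch assumes the action advances by switching on a post-incremented index, and it says nothing about why the signal goes missing.

```cpp
// Hypothetical action with an index-driven switch, loosely modelled on the
// description above. Starts at index 2, the "start the mover" step.
struct ToyAction {
    int actionIndex = 2;
    bool moveDisabled = false;

    void signal() {
        switch (actionIndex++) {
        case 2:
            moveDisabled = true;  // mover started; wait for its end signal
            break;
        case 3:
            moveDisabled = false; // mover finished; movement allowed again
            break;
        default:
            break;
        }
    }
};

// Hypothetical mover that signals an action when it finishes.
struct ToyMover {
    explicit ToyMover(ToyAction *action) : endAction(action) {}

    void finish() {
        finished = true;
        if (endAction)
            endAction->signal(); // skipped if the link to the action is lost
    }

    ToyAction *endAction;
    bool finished = false;
};

// Returns true if the mover is done but the scene still has movement disabled.
bool sceneStuck(bool signalDelivered) {
    ToyAction action;
    action.signal(); // index 2: start the mover
    ToyMover mover(signalDelivered ? &action : nullptr);
    mover.finish();
    return mover.finished && action.moveDisabled;
}
```

When the end signal is delivered, the action reaches index 3 and re-enables movement. When it is not, the mover reports finished while the action sits waiting, which matches the state seen in the savegame.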
I'm afraid not. As I wrote earlier, it seemed to me like it would happen when trying to navigate some tight corner, by which I meant trying to make a turn right after another turn. And since I would often try it at default speed first, rather than slowing down, that meant hitting the arrow keys in rapid succession to first turn one way, then another, usually hitting a wall instead.
But that could have just been my imagination.
I can confirm this bug.
It doesn't only happen when trying to navigate a tight corner. It seems to happen only when you hit a wall or the end of a tube. I also noticed that when you hit a wall or the end of a tube, the crash sound sometimes plays late or not at all.
This bug doesn't happen when you follow a walkthrough and fly straight to the end of the flub tube system. But if you don't follow a walkthrough (like me) and fly around in the flub tube system trying to find the right way, you should run into this bug sooner or later.