Playing through BS2, when Nico talks to the Attendant in the British Museum, causes a interactive lockup.
ScummVM is not frozen and the debug console is available, but BS2 will not respond i.e. this looks like a script bug.
This is likely to be the same or similar issue to bug 2607089.
ScummVM 1.1.0pre48374 (Mar 23 2010 23:16:52)
Features compiled in: Vorbis FLAC MP3 ALSA RGB zLib FluidSynth
built on Linux x86_32 2.6.31 by GCC 4.3.4
Savegame prior to bug
Console Output
Screen capture of lockup with debug overlay information activated
Have attached :
* latest savegame prior to the lockup occurring,
* a capture of the console output which like bug 2607089 shows a lot of "WARNING: No free slot in FX queue!", but I'm not sure that this is related to the issue seen, and
* a screen capture of the lockup. Since the lockup did not affect the engine, I used the debug console to enable debug output information, which is shown.
I think the line "Script waiting for attendant_71 (3029)" is the issue at hand.
Does it always happen? I can't reproduce it at all.
I was probably not clear about the phrase "Playing through BS2".
I have also been unable to replicate this by loading the savegame, but
this occurred when I was playtesting BS2 and had played through almost the
entire game from the start. I had been saving and I think I loaded from a savegame
fairly near the beginning, but otherwise it was a straight play run.
This bug looks like it might be caused by a resource leak of some kind, maybe related to
"WARNING: No free slot in FX queue!" which then causes a script to get hung up, but
replication is likely to require playing through the game from the start with a debugger attached to
try to get the same resource exhaustion...
The strange thing is that I'm pretty sure the FX queue is cleared every time you enter a new room, so I don't know why it would fill up here. The only thing I can think of is that the resource manager (if memory serves me) expires resources when it runs out of memory. I guess it's possible, in some rare case, that it expires something that it shouldn't have, and that this would cause it to forget that a sound has already been added to the queue, or something silly like that. (I don't think there's any safeguard against adding the same sound more than once, and for all I know there may be cases where it's sensible to allow it.)
I've added an "fxq" debugger command to the trunk, which prints the contents of the FX queue. Maybe that will come in use some day.
eriktorbjorn : Thanks for the "fxq" command.
Will try playing through from start and see if I can replicate and see what "fxq" then gives.
Will also give valgrind a try.
Suspect this is going to be a slow one to get replication, but will try.
Screen capture of lockup with debug output overlay Run #2
fxq output for Run #2 - Part #1
fxq output for Run #2 - Part #2
reslist output for Run #2
fxq output after savegame reload i.e. no bug
Have played through again from beginning with :
ScummVM 1.2.0svn48496 (Apr 4 2010 00:27:34)
Features compiled in: Vorbis FLAC MP3 ALSA RGB zLib FluidSynth
And reliably hit the bug.
The "WARNING: No free slot in FX queue!" are the cause. The fxq command shows that
the FX queue has been exahusted of slots as upon entering the British Museum (As opposed
to starting to talk to the attendant), three background effects are meant to be loaded.
For some reason, when you play from the start, these three are loaded repeatedly i.e about
10+ times exhausting the FX queue and causing the script problems afterwards.
Loading a savegame in the British Museum or at Ketch's Landing in the previous scene,
does not trigger this problem.. so replication still requires playing from a new game state.
I have attached copies of the fxq and reslist output with the bug and fxq without.
I'm going to think out loud for a while...
A game resource is whatever game data that's loaded from disk. It could be graphics, sound, scripts, etc.
The engine is full of cases where it does something like "open resource, manipulate resource, close resource". However, closing a resource is not the same as removing it from memory. It simply tells the resource manager that it's done with the resource for now. The resource manager is expected to cache the resource for later use. In fact, some resources are stateful (e.g. script resources contain local variables) which means that if they were thrown away on closing and reloaded on opening, information would be lost. Incidentally, configuring the resource manager to not cache causes "No free slot in FX queue" warnings at the very beginning of the game.
But obviously, resources cannot be cached forever or we'd run out of memory. The resource manager deals with this by keeping track of how much memory is allocated, which resources are open at the moment, and when it exceeds its memory quota it expires what it thinks are the oldest resources that aren't open at the time.
So maybe the resource manager is expiring some stateful resource that belongs to the current room, and this causes things to go haywire. In that case, I can think of two possibilities:
1) There's something wrong with the "expire the oldest resources" logic. However, I can't see anything obviously wrong at the moment.
2) The game leaks resources, i.e. it opens some resource that it should have closed but doesn't. (There are resources that it legitimately should never close, of course. The global variables resource, for instance.) When I tried a random savegame, I had to enter three rooms before it started expiring anything, so normally it shouldn't expire anything museum-related while you're still there. But maybe if you play the game from the beginning, the museum is the point where the leak has accumulated to where it can no longer hold an entire room's resources in what memory remains?
I'll have to play around a bit with the latter assumption. Not right now, though.
After playing through the early parts of the game, I didn't see any leaks in the list of loaded resources. The current implementation doesn't always evict the least recently opened resource first. Actually, it evicts the least recently *closed* resource, but I think that should still be ok because we can assume that resources are used while they're held open. (I believe it's primarily sound effects that are held open for long periods of time, but I think the sounds from the previous room are still thrown out before anything from the current room.)
I'm not sure what to look for next. Perhaps checking if the memory usage is calculated properly. If it's not, and it thinks it's using more memory than it is, it will of course start throwing out resources much faster than before. But I don't see how that could happen...
Have run through to this bug with:
ScummVM 1.2.0svn48496 (Apr 4 2010 00:27:34)
Features compiled in: Vorbis FLAC MP3 ALSA RGB zLib FluidSynth
running under Valgrind.
This has revealed two memory errors, which should be corrected.
The first is :
==8765== Conditional jump or move depends on uninitialised value(s)
==8765== at 0x8138334: Sword2::Logic::fnSetFrame(int*) (function.cpp:406)
==8765== by 0x813A328: Sword2::Logic::fnSpeechProcess(int*) (function.cpp:1356)
==8765== by 0x813C7F7: Sword2::Logic::runScript2(unsigned char*, unsigned char*, unsigned char*) (interpreter.cpp:620)
==8765== by 0x813E59D: Sword2::Logic::processSession() (logic.cpp:130)
==8765== by 0x81247A0: Sword2::Sword2Engine::gameCycle() (sword2.cpp:717)
==8765== by 0x8125E20: Sword2::Sword2Engine::run() (sword2.cpp:536)
==8765== by 0x8056470: runGame(PluginSubclass<MetaEngine> const*, OSystem&, Common::String const&) (main.cpp:217)
==8765== by 0x8056E19: scummvm_main (main.cpp:390)
==8765== by 0x80532C5: main (main.cpp:65)
And occurs in Montfaulcon at the Cafe, during the "Meanwhile" non-interactive cutscene, with Nico.
To replicate, load the attached savegame. Talk to the waiter about Labino etc. which triggers Labino
to arrive. Talk to Labino, especially about the Pot, until the "meanwhile" cutscene is triggered.
Note : Sorry, can't get a closer savegame as these sections during conversations are non-saveable points.
Savegame for replication of 1st valgrind error - "Meanwhile" cutscene
The second is :
==24377== Use of uninitialised value of size 4
==24377== at 0x816FCD4: Sword2::Screen::scaleImageGood(unsigned char*, unsigned short, unsigned short, unsigned short, unsigned char*, unsigned short, unsigned short, unsigned short, unsigned char*) (render.cpp:171)
==24377== by 0x8182170: Sword2::Screen::drawSprite(Sword2::SpriteInfo*) (sprite.cpp:704)
==24377== by 0x817B2EC: Sword2::Screen::processImage(Sword2::BuildUnit*) (screen.cpp:673)
==24377== by 0x817B84E: Sword2::Screen::drawSortFrames(unsigned char*) (screen.cpp:482)
==24377== by 0x817D3EC: Sword2::Screen::buildDisplay() (screen.cpp:315)
==24377== by 0x814B7AB: Sword2::Sword2Engine::run() (sword2.cpp:556)
==24377== by 0x80565CC: runGame(PluginSubclass<MetaEngine> const*, OSystem&, Common::String const&) (main.cpp:217)
==24377== by 0x8056F75: scummvm_main (main.cpp:390)
==24377== by 0x8053421: main (main.cpp:65)
This error repeats 11 times, with lines in render.cpp as follows :
171-3, 175-7, 163-5 and 167-9
And occurs when you enter the Mining Company in Quaramonte.
To replicate, load the attached savegame and enter the stairs to
the mining company.
Savegame for replication of 2nd valgrind error - Mining Company, Quaramonte
Debug Patch for 2nd valgrind error - Mining Company, Quaramonte
I have taken a look at the second valgrind error in the Mining Company in Quaramonte.
I have traced the source of the uninitalised usage to the backPtr accessing outside the screen area,
when y = 161 and rd.top = 278.
My debugging patch for this is attached. If run with valgrind, the output should be aligned.
Note, this will only manifest with the highest image quality setting in the menu i.e. this triggers the use of scaleImageGood, rather than scaleImageFast which does not use backBuf.
eriktorbjorn: I could clamp backBuf accesses, but I'm not sure that is the best solution for this.
Sorry about the delay. I've committed a fix (hopefully) for the scaler Valgrind warning, but I doubt that's the cause of this bug. From what I understand, it was only reading data where it shouldn't, not writing. (Even the bad reads may very well have been within the allocated buffer.)
I haven't looked at the other warning yet.