The TZX of original release of "The Trapdoor" (sha256 bf2530f11286338bc602664d3842dafeec044050315025b2f85f047070054ed4) fails to load if accelerated loading is turned on. It loads successfully with --no-accelerate-loader.
Added to check_loaders as a test case.
Bugs: #292
Bugs: #388
Bugs: #389
Bugs: #390
Bugs: #391
Bugs: #392
Bugs: #393
Bugs: #394
Bugs: #395
Bugs: #397
Wiki: Fuse 1.4.1 Release Plan
the loader used in the game is quite standard rom loader with the same constants and only a few differences to rom loader:
1) different colors of stripes (so some constants are different, but these are not that important)
2) bits loaded to memory are ordered in opposite direction (so if on tape is byte 0x3A then to memory is stored 0x5C)
3) every now and then is executed some routine stored at 0xFED3 - this I expect is to animate eyes while game is loading
points 1 and 2 are not that crucial, but point 3 can change timing of loading quite a lot
by replacing instruction
0xFE68 CD D3 FE CALL 0xFED3
by three NOPs, game is loaded successfully.
how to do it:
enter debugger
set breakpoint on address 0xFE68
close debugger
LOAD ""
when break point is activated replace three bytes, delete breakpoint and close debugger
I've analyzed it little bit more and it seems to me, that there are some edges "lost" when acceleration is on. it seems to me they are "consumed" by standard event being fired, not by accelerator code (maybe the reason is that routine to animate eyes, which takes more than 470 Tstates).
so I tried this change:
with this change game loads successfully.
but the question is what constant to use (300 was just a test, it works also with 800 and also 1700). maybe for that constant should be used length of pulse from that previously removed edge event.
Unfortunately, that change breaks a significant number of other games, particularly Speedlock games (try e.g. Head over Heels, Midnight Resistance, Highway Encounter).
Many thanks for the analysis - I'm not ignoring this but my priority right now is to get a lot of test cases added to check_loaders so that we can know we're not regressing anything if we do make any changes to the loader acceleration code.
OK, I have a pretty detailed understanding of what's happening here; ub880d's analysis is pretty much correct. In both cases, counting "time 0" as the last read from tape before the "update Berk's eyes" routine at 0xfed3 is called:
In the non-accelerated case:
Time 0: PC = 0xfe85, tape sampled, second edge detected. Stored a little later as a zero bit.
Time 241: PC = 0xfe68, the "CALL FED3" mentioned above.
Time 657: PC = 0xfedb, back from the previous CALL.
Time 843: Tape edge occurs.
Time 1186: PC = 0xfe85, tape sampled, first edge detected.
Time 1663: PC = 0xfe85, tape sampled, no edge detected.
Time 1698: Tape edge occurs.
Time 1720: PC = 0xfe85, tape sampled, second edge detected. Store a little later as a zero bit.
In the accelerated case:
Time 0: PC = 0xfe85, tape sampled, second edge accelerated. PC set to 0xfe5d, tape edge scheduled for time 865. Bit stored a little bit later as a zero bit.
Time 52: PC = 0xfe68, the "CALL FED3" mentioned above.
Time 527: PC = 0xfedb, back from the previous CALL.
Time 865: Tape edge occurs.
Time 1043: PC = 0xfe85, tape sampled. At this point, the acceleration code accelerates the "current" edge, which is now the second edge of the next bit as we didn't sample the tape between time 0 and time 865. PC set to 0xfe7a, tape edge scheduled for time 2763 (ie the time at the end of the IN + 1710 as this is now a long edge)
Time 1433: PC = 0xfe85, tape sampled, second edge accelerated. As this was a long edge, this is stored as a one bit.
So two things have gone wrong here: we've missed an edge (which means we end up 8 bytes short at the end of the tape) and we've got the wrong data.
If you want to look at any of this yourself, run Fuse as "
fuse --debugger-command "break 0xfe85 if z80:b' == 1 && (z80:l & 1) && [z80:sp] == 0x5d && tape:microphone == 0" Trapdoor.tzx
. (You'll obviously need thetape:microphone
system variable I added a couple of hours ago...) Breaking those conditions down:Last edit: Philip Kendall 2017-09-23
Given that analysis, the first obvious fix here is to disable acceleration if we ever "miss" an edge - i.e. an edge occurs at the time it was first scheduled, rather than happening from loader.c:do_acceleration(). This turns out to be relatively easy to implement.
The bad news: this doesn't fix Trapdoor, although we get closer - it now loads in the right number of bytes, rather than ending up 8 short as it did before.
The good news: this does fix accelerated loading of Blood Brothers, City Slicker, Games Compendium (by Gremlin) and Zanthrax.
Committed to the "bugs-387-trapdoor-doesnt-load-with-acceleration" branch - I'll keep looking at what's still wrong with Trapdoor.
Hi, I've also tried this:
with this change, Trapdoor is loaded ok.
I've tried also few other games and they were also loaded ok, but not in every load. For example Saboteur, sometimes loading screen is not shown even that game starts and seems that it is playable (but I didn't play it too much, only few rooms). Then I've found that Saboteur uses turbo loader, so maybe there are more edges lost or so... (I've not analyzed it any deeper).
So also this change is not enough.
As I'm reading your timing analysis, I've spotted small inconsistency in it, which I don't clearly understand. 1st, there is written "0xfedb" as return address from Berk's eye animation routine, but it should be "0xfe6b", which is maybe just a typo. But what is more suspicious to me is, that in non-accelerated load, animation routine takes only 657-241 = 416 Tstates but in accelerated load it takes 527-52 = 475 Tstates. I'm attaching also my Tstate counting of animation code (I've counted only CPU Tstates, not any possible contention as it is writing to vram).
maybe that difference of 416 tstates is also caused by some typo (once I got 516 tstates difference for non-accelerated load) because routine (without contention) should take 475T (and with contention it could be more, but not less).
I got quite different Tstates length of the subroutine in one loading. I got also 475T for non-accelerated load and also I got bigger difference than 475 for accelerated load (if subroutine was called when value of tstates was small, routine took 475, but when tstates was something like 33k, subroutine took more) and so I think that is just caused by contention.
But I got 475 for accelerated load more times 'in a row' than for non-accelerated load, which I think is caused by the fact that acceleration "speeds up" tape and skips cpu instructions, but it is not changing tstates (so in accelerated load tstates value is not changed that much between iterations of load than in non-accelerated load). couldn't that be the somehow connected to why accelerated load fails?
OK, after a lot more sitting looking at Z80 code, traces and trying to work out what was going wrong it was finally obvious: the normal tape loading routine sets C as part of the edge-finding algorithm, but the accelerator doesn't do this. The obvious fix here is just to set C... and this fixes Trapdoor.
As an added bonus, it also fixes Driller, Dynamite Dan, Joe Blade II, Kokotoni Wilf, Powerplay and Saboteur without causing any regressions in the test suite (Bomb Jack now fails differently, but it failed before and it still fails now) so I'm calling that a win.
Fixed in commit 7f83be6d.
Unless there are any objections, I intend to merge this in the next couple of days.
And merged.