Hi wonderful people! First of all, thanks for all your fantastic work on Vice!
I think i have found three bugs in how the "g XXXX" command works after a break in the monitor (and in the remote-monitor). Hopefully you will agree that these are issues. I will first introduce the issues, then describe them in more detail, and finally show how to reproduce them with a test C64-program and some monitor-commands. Introduction:
Note that the "g" command, without an address afterwards, does not seem to be affected by any of this. These problems are both in the normal monitor and in the remote-monitor.
These 3 issues means that whenever you're using "g XXXX", the command has a chance to cause a crash/glitch/corruption (depending on the C64-code). They also make it difficult to use the remote-monitor for unit-testing of C64-code.
I have tried to make a simplified testcase reproducing the above errors.
Remember, problems 1 and 2 happens when we break just before the 7-cycle interrupt-BRK, and then do a "g XXXX", so this is what we have to reproduce.
How it works: The idea is to break at the same time a VIC raster-interrupt occurs, as an easy way to see what happens when we break after the last instruction before an interrupt should start up.
More precisely, the C64-code starts a raster-interrupt at rasterline $FE, and the monitor-commands also breaks at rasterline $FE and then executes a "g XXXX".
How to run it: Load the prg into memory (but do not run it) and then execute the monitor-commands. Afterwards, the CPU-history will show the three issues described above. (You will have to run it a few times to get the "correct jitter" for problem#1 and problem#2 - at most one of them will happen on a run)
This is the C64 assembler code (kickass). It's also attached. I've also attached a PRG file, so you don't have to assemble it. Simply load the PRG into VICE.
.const rasterline = $fe
* = $1000 "Code"
start: sei
lda #$35
sta $01
bit $D012 // wait for rasterline $100, before proceeding
bpl *-3
setupIRQRasterInterrupt(rasterline, irq)
break: jmp *
irq: eor #0 // just to do something to make the interrupt-code easy to spot in the CPU-history
inc $d020
inc $d021
inc $d019
endIrq: rti
otherCode: // This is where we "g XXXX" to
inc bugDetect
jmp break
bugDetect: .byte 0 // Should stay 0 until AFTER the first interrupt. If it is 1 inside the first interrupt, we're doomed. (It actually becomes 2)
// Setup a raster-interrupt. Lengthy boilerplate code, which is not relevant for the test, so it's hidden down here to reduce noise. Focus on the code above.
.macro setupIRQRasterInterrupt(rasterline, interruptHandler) {
sei
lda #<interruptHandler
sta $fffe
lda #>interruptHandler
sta $ffff
// select IRQ events (VIC: raster, CIA1: none)
lda #%00000001 // VIC: raster-event
sta $d01a
lda #%00011111 // CIA1: none
sta $dc0d
// select rasterline in for the raster-event ($D012[w] and bit 7 of $D011[w])
.if (rasterline != null) {
lda #<rasterline
sta $d012
lda $d011
.if (rasterline >= $100) {ora #%10000000} else {and #%01111111}
sta $d011
}
// ack any pending VIC and CIA1 events just in case
bit $dc0d
lda #%00001111
sta $d019
//.if (autoEnable) {cli}
cli
}
In my test-code on the PC-side, i send monitor commands by calling some methods of a homemade Vice-class, which hopefully serves as a high-level description to you of what the monitor commands below do: (Each of these methods basically justs to send some commands to the remote-monitor)
@Test
void test() throws Exception {
vice.loadAss("viceBreakBeforeInterruptBug.ass")
.runUntil("start", "break")
.waitRasterline(0xFE)
.printMemory(0xD019, 0xD019) // Just to be 100% sure that bit 7 of $D019 is actually 1 here (= the VIC wants to generate an interrupt). (Since the monitor can break between cycle 63 and cycle 1, it might actually not be - TODO: Hmm... But it is always! Maybe i misunderstand something about where exactly breakpoints breaks between two cycles??)
.runUntil("otherCode", "break")
.then(() -> vice.printCpuHistory());
// assertions removed
}
These are the monitor commands corresponding to the above code: (you should execute this manually in the monitor after having loaded the PRG. Alt+H and copy/paste is fine)
; loadAss("viceBreakBeforeInterruptBug.ass"). You can also just drag'n'drop the PRG into vice and skip this. So i have commented the loading out for now.
delete
;load "C:\...<insert correct path here>...\viceBreakBeforeInterruptBug.prg" 0
; runUntil("start", "break")
break exec 1035
g 1000
delete 1
; waitRasterline(0xFE)
break exec 0000 FFFF if RL == $fe
x
delete 1
; printMemory(0xD019, 0xD019)
m D019 D019
; runUntil("otherCode", "break")
break exec 1035
g 1044
delete 1
; printCpuHistory()
cpuhistory
In all the cases the raster-interrupt occurs at cycle 1 (as raster-interrupts always do) of rasterline $FE, meaning that the interrupt-sequence should start as soon as the instruction executing at cycle 1 is done (unless perhaps if it was a taken branch not crossing page-boundaries - but it is not).
In the test-code, this instruction is always a JMP. Vice also breaks at rasterline $FE. It does this either after the instruction that executed at cycle 1 is done, or between cycle 63 and 1, if the previous instruction ended at cycle 63. (as far as i can tell)
Depending on which cycle of the JMP is executing at cycle 1 of rasterline $FE we get the following cases:



In all the above cases, it can also be seen that the instruction we jumped to with "g XXXX" in executed twice (problem#3). (In the test-code, you can also see this in that "bugDetect" becomes 2 instead of 1, after the INC at "otherCode")
Here is the CPU-history for the 3 different possible runs/cases as text. I've only included the last few lines of the CPU-histories here, since only those are the interesting ones (where the issues happen). I've attached the full CPU-histories.
.C:1035 4C 35 10 JMP $1035 A:0f X:00 Y:0a SP:f3 ..-..... 14011063 RL: 253 CYC: 53 break
.C:1035 4C 35 10 JMP $1035 A:0f X:00 Y:0a SP:f3 ..-..... 14011066 RL: 253 CYC: 56 break
.C:1035 4C 35 10 JMP $1035 A:0f X:00 Y:0a SP:f3 ..-..... 14011069 RL: 253 CYC: 59 break
.C:1035 4C 35 10 JMP $1035 A:0f X:00 Y:0a SP:f3 ..-..... 14011072 RL: 253 CYC: 62 break
BUG .C:1044 EE 4A 10 INC $104A A:0f X:00 Y:0a SP:f3 ..-..... 14011075 RL: 254 CYC: 2 otherCode
BUG .C:1044 EE 4A 10 INC $104A A:0f X:00 Y:0a SP:f3 ..-..... 14011088 RL: 254 CYC: 15 otherCode
.C:1038 49 00 EOR #$00 A:0f X:00 Y:0a SP:f0 ..-..I.. 14011101 RL: 254 CYC: 28 irq
.C:103a EE 20 D0 INC $D020 A:0f X:00 Y:0a SP:f0 ..-..I.. 14011103 RL: 254 CYC: 30
.C:103d EE 21 D0 INC $D021 A:0f X:00 Y:0a SP:f0 N.-..I.. 14011109 RL: 254 CYC: 36
.C:1040 EE 19 D0 INC $D019 A:0f X:00 Y:0a SP:f0 N.-..I.. 14011115 RL: 254 CYC: 42
.C:1043 40 RTI A:0f X:00 Y:0a SP:f0 N.-..I.. 14011121 RL: 254 CYC: 48 endIrq
.C:1047 4C 35 10 JMP $1035 A:0f X:00 Y:0a SP:f3 ..-..... 14011127 RL: 254 CYC: 54
.C:1035 4C 35 10 JMP $1035 A:0f X:00 Y:0a SP:f3 ..-..... 112369686 RL: 253 CYC: 52 break
.C:1035 4C 35 10 JMP $1035 A:0f X:00 Y:0a SP:f3 ..-..... 112369689 RL: 253 CYC: 55 break
.C:1035 4C 35 10 JMP $1035 A:0f X:00 Y:0a SP:f3 ..-..... 112369692 RL: 253 CYC: 58 break
.C:1035 4C 35 10 JMP $1035 A:0f X:00 Y:0a SP:f3 ..-..... 112369695 RL: 253 CYC: 61 break
.C:1044 EE 4A 10 INC $104A A:0f X:00 Y:0a SP:f3 ..-..... 112369698 RL: 254 CYC: 1 otherCode
BUG .C:1044 EE 4A 10 INC $104A A:0f X:00 Y:0a SP:f3 ..-..... 112369711 RL: 254 CYC: 14 otherCode
.C:1038 49 00 EOR #$00 A:0f X:00 Y:0a SP:f0 ..-..I.. 112369724 RL: 254 CYC: 27 irq
.C:103a EE 20 D0 INC $D020 A:0f X:00 Y:0a SP:f0 ..-..I.. 112369726 RL: 254 CYC: 29
.C:103d EE 21 D0 INC $D021 A:0f X:00 Y:0a SP:f0 N.-..I.. 112369732 RL: 254 CYC: 35
.C:1040 EE 19 D0 INC $D019 A:0f X:00 Y:0a SP:f0 N.-..I.. 112369738 RL: 254 CYC: 41
.C:1043 40 RTI A:0f X:00 Y:0a SP:f0 N.-..I.. 112369744 RL: 254 CYC: 47 endIrq
.C:1047 4C 35 10 JMP $1035 A:0f X:00 Y:0a SP:f3 ..-..... 112369750 RL: 254 CYC: 53
.C:1035 4C 35 10 JMP $1035 A:0f X:00 Y:0a SP:f3 ..-..... 431701064 RL: 253 CYC: 54 break
.C:1035 4C 35 10 JMP $1035 A:0f X:00 Y:0a SP:f3 ..-..... 431701067 RL: 253 CYC: 57 break
.C:1035 4C 35 10 JMP $1035 A:0f X:00 Y:0a SP:f3 ..-..... 431701070 RL: 253 CYC: 60 break
.C:1035 4C 35 10 JMP $1035 A:0f X:00 Y:0a SP:f3 ..-..... 431701073 RL: 253 CYC: 63 break
BUG .C:1044 EE 4A 10 INC $104A A:0f X:00 Y:0a SP:f0 ..-..I.. 431701083 RL: 254 CYC: 10 otherCode
BUG .C:1044 EE 4A 10 INC $104A A:0f X:00 Y:0a SP:f0 ..-..I.. 431701089 RL: 254 CYC: 16 otherCode
BUG .C:1047 4C 35 10 JMP $1035 A:0f X:00 Y:0a SP:f0 ..-..I.. 431701095 RL: 254 CYC: 22
(Note: The CPU-history output has been decorated with the labels from the assembly code, as well as which rasterline ("RL") and cycle-of-rasterline ("CYC"). (RL and CYC are calculated from the stopwatch counter))
Update: New observation: Hmm.. Actually for case 1 and 2, the cycle counts in the CPU-history are strange.. For case 2, the first instance of "INC $104A" starts at cycle 1, but the second starts at cycle 14, instead of at cycle 7. And the first interrupt-handler instruction starts at cycle 27.
But an INC $XXXX only takes 6 cycles, not 13 cycles. It is as if the interrupt-BRK is executed after both of the "INC $104A"-instances.. this would fit with the 13 cycles. (perhaps there are also 6 bytes pushed to the stack, instead of 3? Haven't checked yet)
This could be seen as a fourth problem, but if problem 3 is solved, i'ld imagine this one is probably solved too.
I've reproduced all three of these problems (or "opportunities" as some might call them) in Vice 3.8 and 3.7. I've reproduced the last one in Vice 3.2 (the monitor of Vice 3.2 doesn't seem to support breaking at a rasterline, so I didn't test the first two issues in 3.2). I have not tried other versions.
Sorry it became long - hope you'll manage. Let me know if i can help with anything or if you need more info ;)
P.S. Wouldn't it be cool if the CPU-history wrote a line with for example "IRQ" or "NMI" pseudo-instructions for the injected BRK, so you could more easily see in the history when an interrupt started?
Update: Problem 3 doesn't actually seem to always happen. I just tried a "g XXXX" where it did not happen. Then i tried the exact same "g XXXX" again, and it did happen. So not sure about the conditions.
Ok, after reading all this....
That said, this stuff looks a lot like we have to be very careful about what we are looking at, and what conclusions we draw from it. The cpu history feature might fool you (it doesn't appear to always show 100% what we expect, so its likely buggy for itself - see other tickets), so we should ignore this for the time being and try to reproduce the problem with as little monitor interaction as possible (really only a single "g" command would be ideal). That also means to not use the remote monitor - as that might produce it's own dedicated bugs :o)
For a start, i checked what the "g" command actually does - and it really only sets the PC to a new value, as expected. There is no "wait until IRQ is over" or whatever magic. So chances are, it is actually working right :) (see monitor.c line 980)
Thank you for your reply, gpz! I unfortunately do not have much time the next few days, so just a quick answer for now.
I could split this ticket up, however, the first two "problems" are basically the same: Break exactly before an interrupt would start, and do a "g XXXX". What happens in that case can vary between "problem 1" and "problem 2" (and perhaps something else i have not yet encountered).
Since i do not really know what exact condition causes VICE to behave as "problem 1" or as "problem 2", i am not really sure how to split it up into two independent tickets.
I could of course make a new ticket for "problem 3"
No probs, if you like that better ;)
You're completely right - I meant $D011, not $D012, of course. True, it's safer to wait for both plus and minus. This brainfart luckily doesn't seem to affect the outcome :)
Anyway, it was just a quickly made test, to make you able to reproduce it with minimal work (load the PRG, press ALT+H, copy/paste the monitor-commands, done).
Yeah, ok. Shouldn't it give the same result if IO is mapped in at $D000-$DFFF, though? (Anyway, it doesn't matter to the test)
Good point in being careful in drawing conclusions. It's true i might have generalized a bit too much in some of the sentences above, but it is what seems to be happening :) Anyway, i don't pretend to have any idea about what's going on under the hood :)
Yes, using a single "g XXXX", would be interesting, if it's possible. I can try looking at it when i get the time. (It might very well be something with breakpoints that causes the issues)
Ah, ok, good to know there are some problems with the CPU-history. It could of course be a problem with that one. However, i don't think it's only that, since as mentioned for "problem 3", it did seem like the "INC bugDetect" (INC $104A), did actually get executed twice (like in the CPU-history), since the value of "bugDetect" had been increased twice afterwards.
For "problem 1", it also seems like "INC bugDetect" was executed before the interrupt, like the CPU-history says, since if you break inside the interrupt-handler, the value of "bugDetect" has already been increased.
For "problem 2", after having executed the monitor commands i posted above, and you bump into "problem 2", if you try to just continue running the program ("g" or just exit the monitor), you will see that the border+screen is not flashing, indicating that the IRQ-interrupt no longer is running - so it does seem like the CPU-history could at least be somewhat correct here (something has stopped the IRQs).
Are you able to reproduce it, btw? (Who knows, it might be my installation that has a problem)
Thanks for looking into the code for what "g XXXX" does, and thank you for the reference to the source code, btw! :)
Last edit: Algorithmix 2024-04-29
Hi Groepaz
A few questions on how it would be most convenient for you that i split the tickets:
I can create 3 new tickets where the 2 of them refer to each other. (and maybe also refers back to this ticket as parent-ticket, which can then be closed). Ok?
The current test program currently requires you to run it a few times to get the right "jitter", if you want to catch a specific outcome (3 possibilities).
After having split up into individual tickets, is that still ok?
Or do you want me to do new ones that always hit the right jitter corresponding to that ticket? (will be longer code)
Minimal text in each ticket, i assume?
(I can of course do additional sneaky test later and put them in the comments. Like the one you wrote about and maybe one with instructions longer than 3 cycles, etc, etc)
sure. the whole point is to have less text per issue and hopefully only one "thing" to fix :)
It would be a lot better if the program reproduces the exact problem every single time - that leaves no room for misunderstandings. You could steal "stable raster" stuff from some other test program in the repo (und put that code into a separate file).
Please hold the line, don't bother splitting the ticket yet - we might have located something that explains all this, see #2025
Ah, ok, thank you - that sounds great! I will stop for now.
(Just skimmed #2025 - on the surface of it, it sounds to only be the instruction-duplication bug ("problem #3" above), but we'll see. Otherwise, i can post the new test-code and text i've done :))
this is the output i get after applying the patch from #2025
and it looks pretty much the same every time i try.
Last edit: Querino 2024-05-03
Thank you very much for running it on the build from #2525! Really appreciate it! :)
It would be very interesting to get a CPU-history for all the 3 possibilities, to be sure the interrupt-problems are really fixed by the patch from #2025.
I unfortunately don't have the skills to build it myself, so can't really check this myself. :'(
Can you perhaps help again?
I'll just add an extra "r" command to the monitor-commands, so we can see which cycle/rasterline we have breaked at and thereby identify which case we've hit (yes, yes, i should have added that to the original):
The 3 possibilities can now be identified by the "CYC"-output of the "r" command:
Could i ask you to run it again with these new montor commands, until you get all 3 cases? (normaly only takes a few tries)
You don't have to, of course, but it would be cool :)
Last edit: Algorithmix 2024-05-05
i ran it 20 times now. just the monitor commands, i have mo other tools.
Thanks!! That was a lot of times - much more than i expected of you :)
Only 2 of the cases was hit, for some reason, but that's fine (when i've tried, i usually hit them in very few tries, but i guess that's how randomness works :))
Unfortunately, it looks like the patch didn't fix the interrupt-problem ("problem#1" at least), since the "INC $104A" is still executed before the interrupt-BRK, when we break at (=after) CYC 001.
Thanks again!
EDIT: But i think i'll have to re-read up on my interrupt theory to be sure that's actually not how it's supposed to work. I could have been wrong here. (now that it's not delayed as much as before)
Last edit: Algorithmix 2024-05-05
i have comitted a more sane version of that patch in r45151 - please test, not only if the problem is solved, but also if everything related to R and G commands, and entering/exiting the monitor, still works as before :)
Very cool! That was much faster than i expected! You're a genious! :)
It looks like the instruction-duplication is solved, and looks like "problem 1" wasn't actually a real problem (see below).
I will run a lot of my old tests through the remote-monitor soon, to see they still behave the same as before (i use "r" and "g" in those all the time).
Seems like "Problem 2" is still a thing, so i can do an issue on it, as soon as possible.
I was terribly, terribly wrong about problem#1. I was mistakenly convinced that the VIC was timing raster-interrupts such that the 6510 could detect them at cycle 1, but it looks like it does not, and the 6510 can not detect a raster-interrupt until cycle 2 (and therefore does not start the interrupt-BRK until cycle 3 at the earliest). My bad, sorry for the confusion. While it was true that the interrupt became impossibly delayed, that was only because of problem 3 (and 4).
I guess the lesson here is that it's a good idea to wait a some time after a debug session until you write a bug-report :)
Hi @gpz. Did some testing and I think i've found a problem in r45151:
and add a breakpoint to the same address we jump to:
chis 5, then in r45151 we get this:but shouldn't it have breaked immediately without having executed anything?
In the old 3.8:
The old 3.8 from december (without any r-number in the about dialog), has the instruction-duplication bug, but if we work around it, by exiting and reentering the monitor between
break exec c000andg c000, we get this result:So that one actually does execute nothing. (if we don't work around it we get a single
inc $d020)UPDATE: If we do the same workaround in r45151 it actually also executes nothing.
Last edit: Algorithmix 2024-05-05
hm.
this doesn't break at all here in r45151 ? only when using the "workaround".
Yes, you're right - also here. (sorry, i made a writing-mistake initially and wrote "jmp $c003", instead of "jmp $c000", if you were trying to reproduce what i wrote :))
Last edit: Algorithmix 2024-05-05
ah. haha. now i know why i didn't get what the code actually was supposed to do. :)
at least i can confirm now your observations.
please keep testing (hopefully this is the only new problem). Also please test if it behaves differently depending on how the monitor was entered (via a watch/breakpoint, or by UI). eg setting a new breakpoint after another breakpoint triggered and the monitor popped up....
I have already looked briefly and i can see a potential problem with this - but i'd really like to know more details before attempting to fix it :)
edit: in particular please check if "breakpoint at current address" is the only misbehaving one
edit++: we apparently have to be extremely careful with the order of events, and how the test starts... i am playing around with it a bit, and it seems that some slightly different cases work when others do not... really tricky :)
Last edit: gpz 2024-05-06
well, i wish i could help more, but i don't really have the knowledge.
but i see, this one does not break at all in r45151
whereas this one (note the different breakpoint)
indeed breaks properly. as far as i can tell:
Last edit: Querino 2024-05-06
btw, the example qbove:
which does not break. but entering the monitor again and throw another
it DOES break. next
g $c000won't break, anotherg $c000will break again... and so on.Last edit: Querino 2024-05-06
Interesting that it alternates like that. (some flag/state that is used for whether or not we should break immediately before the first instruction if there is a bp there?)
Haven't found anything new. Just some minor points from a bit of experimenting:
Looks like entering the monitor from a breakpoint also triggers it:
So both creating a new breakpoint and jumping to it, and entering the monitor through a breakpoint, seems to consistently reproduce it
Creating some random breakpoint in the "current monitor-session", that we don't jump to, does not seem to cause the problem.
Deleting a breakpoint in the "current monitor-session" does not seem to cause the problem. (Couldn't make enabling/disabling breakpoints do it either).
Creating a new breakpoint in the "current monitor-session", at the same address of the already existing one, does not seem to cause the bug: (maybe because the old one kicks in?)
Can also make "g" or "x" bug. For example:
(we should have stopped at $c003 after "g" (or "x"))
Using "r pc = xxxx, x" instead of "g xxxx" can also cause it
Last edit: Algorithmix 2024-05-06
any news here? :)
i wonder if the current state still is better than what we have before?
we really need to create regression tests (in the form of monitor scripts) for this (for all the different issues mentioned here), and then see what still needs fixing. (and fix the new issues while at it). And i certainly need help with this too :)
So, as a first step to get these things under control i knocked up the beginning of a test bench for the monitor: https://sourceforge.net/p/vice-emu/code/HEAD/tree/testprogs/Monitor/testbench/
Refer to the included readme for details, how to add tests etc. Obviously more tests would be good, so feel free to provide some :)
As for this ticket:
I've got a theory.... it must be bunnies!
(oops, wrong tv series)
So I set a breakpoint in the monitor, and I found that it is called from this part in the macro
DO_INTERRUPT:and the checking for steps, breakpoints and watchpoints happens lower in the same macro at
So my theory was that when you set a breakpoint while in the monitor, this macro is working with the old flags value (
ikis a local copy), so the breakpoint fails to work. (This is probably only true when the breakpoint was the first breakpoint)So I added a few lines to refresh the local copy
and retested. Well not everything, but at least the goonbreak test now actually breaks right away.
However there are differences in SP and flag values. I can't say if those are okay or not.
In other tests there are now at most differences like that as well.
Anyway I made the changes a bit nicer, and applied them to all versions of
DO_INTERRUPT.They are in essence:
IK_*flags from the global version, so we get any flags that have been set additionallyDO_INTERRUPTwith the global flags, and eliminate the local copy that was made in most places.From the above fragments it seems that the same sort of issue was already found and fixed for
IK_RESETin the past. I am generalizing and simplifying the solution. In theory it might be possible that a simple refresh from the global flags isn't 100% perfect. (Maybe we should|=them into the local copy? But I don't expect that.) That's one reason why I'm posting this diff for review.