VICE / Bugs / #2024 3 monitor issues wrt. the "g XXXX"-command and interrupts (can "crash" the C64-code)

Algorithmix - 2024-04-28

Update: Problem 3 doesn't actually seem to always happen. I just tried a "g XXXX" where it did not happen. Then i tried the exact same "g XXXX" again, and it did happen. So not sure about the conditions.

gpz - 2024-04-28

Ok, after reading all this....

please split this into 3 tickets, one for each issue. It will become super messy to deal with one or the other thing when those 3 problems are mixed

please don't hide code in assembler specific macros (preferably don't use macros at all) - that makes reading and understanding the code much easier

bit $d012; bpl *-3 does not actually wait for line $100, but for line $080 - which might not be what you want. to make sure no badlines are interfering, better use something like bit $d011; bpl *-3; bit $d011; bmi *-3 - which will always wait for line $000
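
As a rough illustration of the difference between the two wait loops, here is a toy Python model. It is only a sketch: it assumes a PAL machine with 312 raster lines, ignores CPU cycles and badlines entirely, and the function names are made up for this example. It only tracks on which raster line each loop would fall through.

```python
# Toy model of the two busy-wait loops. It ignores CPU timing and badlines
# and only tracks which raster line each loop exits on.
LINES_PER_FRAME = 312  # assumption: PAL C64, raster lines $000-$137


def d012(line):
    """Low 8 bits of the raster counter, as read from $d012."""
    return line & 0xFF


def d011_bit7(line):
    """Bit 7 of $d011 mirrors bit 8 of the raster counter."""
    return (line >> 8) & 1


def wait_d012_bpl(start):
    """bit $d012 / bpl *-3: loop while bit 7 of $d012 is clear."""
    line = start
    while not (d012(line) & 0x80):
        line = (line + 1) % LINES_PER_FRAME
    return line


def wait_d011_bpl_bmi(start):
    """bit $d011 / bpl *-3 / bit $d011 / bmi *-3."""
    line = start
    while not d011_bit7(line):   # first loop: wait until raster bit 8 is set
        line = (line + 1) % LINES_PER_FRAME
    while d011_bit7(line):       # second loop: wait until it is clear again
        line = (line + 1) % LINES_PER_FRAME
    return line


# Exit lines for every possible starting line.
exits_d012 = {wait_d012_bpl(s) for s in range(LINES_PER_FRAME)}
exits_d011 = {wait_d011_bpl_bmi(s) for s in range(LINES_PER_FRAME)}
print(hex(wait_d012_bpl(0)), len(exits_d012), sorted(exits_d011))
```

In this model the $d012 loop started at line 0 falls through at line $080, and it falls through immediately when started anywhere in $081-$0ff, so its exit line depends on where it was entered. The $d011 pair ends up at line $000 from every starting line.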

do not use the "m" command to look at $d019, use the "io" command instead

That said, this stuff looks a lot like we have to be very careful about what we are looking at, and what conclusions we draw from it. The cpu history feature might fool you (it doesn't appear to always show 100% what we expect, so it's likely buggy itself - see other tickets), so we should ignore this for the time being and try to reproduce the problem with as little monitor interaction as possible (really only a single "g" command would be ideal). That also means to not use the remote monitor - as that might produce its own dedicated bugs :o)

For a start, i checked what the "g" command actually does - and it really only sets the PC to a new value, as expected. There is no "wait until IRQ is over" or whatever magic. So chances are, it is actually working right :) (see monitor.c line 980)

Algorithmix - 2024-04-28

Thank you for your reply, gpz! I unfortunately do not have much time the next few days, so just a quick answer for now.

I could split this ticket up, however, the first two "problems" are basically the same: Break exactly before an interrupt would start, and do a "g XXXX". What happens in that case can vary between "problem 1" and "problem 2" (and perhaps something else i have not yet encountered).

Since i do not really know what exact condition causes VICE to behave as "problem 1" or as "problem 2", i am not really sure how to split it up into two independent tickets.
I could of course make a new ticket for "problem 3"

No probs, if you like that better ;)

You're completely right - I meant $D011, not $D012, of course. True, it's safer to wait for both plus and minus. This brainfart luckily doesn't seem to affect the outcome :)
Anyway, it was just a quickly made test, to make you able to reproduce it with minimal work (load the PRG, press ALT+H, copy/paste the monitor-commands, done).

Yeah, ok. Shouldn't it give the same result if IO is mapped in at $D000-$DFFF, though? (Anyway, it doesn't matter to the test)

Good point in being careful in drawing conclusions. It's true i might have generalized a bit too much in some of the sentences above, but it is what seems to be happening :) Anyway, i don't pretend to have any idea about what's going on under the hood :)

Yes, using a single "g XXXX", would be interesting, if it's possible. I can try looking at it when i get the time. (It might very well be something with breakpoints that causes the issues)

Ah, ok, good to know there are some problems with the CPU-history. It could of course be a problem with that one. However, i don't think it's only that, since as mentioned for "problem 3", it did seem like the "INC bugDetect" (INC $104A), did actually get executed twice (like in the CPU-history), since the value of "bugDetect" had been increased twice afterwards.
For "problem 1", it also seems like "INC bugDetect" was executed before the interrupt, like the CPU-history says, since if you break inside the interrupt-handler, the value of "bugDetect" has already been increased.
For "problem 2", after having executed the monitor commands i posted above, and you bump into "problem 2", if you try to just continue running the program ("g" or just exit the monitor), you will see that the border+screen is not flashing, indicating that the IRQ-interrupt no longer is running - so it does seem like the CPU-history could at least be somewhat correct here (something has stopped the IRQs).

Are you able to reproduce it, btw? (Who knows, it might be my installation that has a problem)

Thanks for looking into the code for what "g XXXX" does, and thank you for the reference to the source code, btw! :)

Last edit: Algorithmix 2024-04-29

Algorithmix - 2024-05-01

Hi Groepaz

A few questions on how it would be most convenient for you that i split the tickets:

I can create 3 new tickets where 2 of them refer to each other (and maybe also refer back to this ticket as parent-ticket, which can then be closed). Ok?

The current test program currently requires you to run it a few times to get the right "jitter", if you want to catch a specific outcome (3 possibilities).
After having split up into individual tickets, is that still ok?
Or do you want me to do new ones that always hit the right jitter corresponding to that ticket? (will be longer code)

Minimal text in each ticket, i assume?

(I can of course do additional sneaky test later and put them in the comments. Like the one you wrote about and maybe one with instructions longer than 3 cycles, etc, etc)

gpz - 2024-05-01

I can create 3 new tickets where 2 of them refer to each other (and maybe also refer back to this ticket as parent-ticket, which can then be closed). Ok?

sure. the whole point is to have less text per issue and hopefully only one "thing" to fix :)

The current test program currently requires you to run it a few times to get the right "jitter", if you want to catch a specific outcome (3 possibilities).
After having split up into individual tickets, is that still ok?
Or do you want me to do new ones that always hit the right jitter corresponding to that ticket? (will be longer code)

It would be a lot better if the program reproduces the exact problem every single time - that leaves no room for misunderstandings. You could steal "stable raster" stuff from some other test program in the repo (and put that code into a separate file).

gpz - 2024-05-02

Please hold the line, don't bother splitting the ticket yet - we might have located something that explains all this, see #2025

Algorithmix - 2024-05-02

Ah, ok, thank you - that sounds great! I will stop for now.

(Just skimmed #2025 - on the surface of it, it sounds like it is only the instruction-duplication bug ("problem #3" above), but we'll see. Otherwise, i can post the new test-code and text i've done :))

this is the output i get after applying the patch from #2025

(C:$e5cf) break exec 1035
BREAK: 1  C:$1035  (Stop on exec)
(C:$e5cf) 
#1 (Stop on  exec 1035)  128/$080,  57/$39
.C:1035  4C 35 10    JMP $1035      - A:0F X:01 Y:16 SP:f2 ..-.....   18366825
(C:$1035) delete 1
(C:$1035) break exec 0000 FFFF if RL == $fe
BREAK: 1  C:$0000-$ffff  (Stop on exec)
Setting checkpoint 1 condition to: RL == $fe
(C:$1035) 
#1 (Stop on  exec 1035)  254/$0fe,   0/$00
.C:1035  4C 35 10    JMP $1035      - A:0F X:01 Y:16 SP:f2 ..-.....   18374706
(C:$1035) delete 1
(C:$1035) m D019 D019
>C:d019  f1                                                   .
(C:$d01a) break exec 1035
BREAK: 1  C:$1035  (Stop on exec)
(C:$d01a) 
#1 (Stop on  exec 1035)  254/$0fe,  42/$2a
.C:1035  4C 35 10    JMP $1035      - A:0F X:01 Y:16 SP:f2 ..-.....   18374748
(C:$1035) delete 1
(C:$1035) chis 20
.C:1035  4C 35 10    JMP $1035      A:0f X:01 Y:16 SP:f2 ..-.....     18374667
.C:1035  4C 35 10    JMP $1035      A:0f X:01 Y:16 SP:f2 ..-.....     18374670
.C:1035  4C 35 10    JMP $1035      A:0f X:01 Y:16 SP:f2 ..-.....     18374673
.C:1035  4C 35 10    JMP $1035      A:0f X:01 Y:16 SP:f2 ..-.....     18374676
.C:1035  4C 35 10    JMP $1035      A:0f X:01 Y:16 SP:f2 ..-.....     18374679
.C:1035  4C 35 10    JMP $1035      A:0f X:01 Y:16 SP:f2 ..-.....     18374682
.C:1035  4C 35 10    JMP $1035      A:0f X:01 Y:16 SP:f2 ..-.....     18374685
.C:1035  4C 35 10    JMP $1035      A:0f X:01 Y:16 SP:f2 ..-.....     18374688
.C:1035  4C 35 10    JMP $1035      A:0f X:01 Y:16 SP:f2 ..-.....     18374691
.C:1035  4C 35 10    JMP $1035      A:0f X:01 Y:16 SP:f2 ..-.....     18374694
.C:1035  4C 35 10    JMP $1035      A:0f X:01 Y:16 SP:f2 ..-.....     18374697
.C:1035  4C 35 10    JMP $1035      A:0f X:01 Y:16 SP:f2 ..-.....     18374700
.C:1035  4C 35 10    JMP $1035      A:0f X:01 Y:16 SP:f2 ..-.....     18374703
.C:1044  EE 4A 10    INC $104A      A:0f X:01 Y:16 SP:f2 ..-.....     18374706
.C:1038  49 00       EOR #$00       A:0f X:01 Y:16 SP:ef ..-..I..     18374719
.C:103a  EE 20 D0    INC $D020      A:0f X:01 Y:16 SP:ef ..-..I..     18374721
.C:103d  EE 21 D0    INC $D021      A:0f X:01 Y:16 SP:ef N.-..I..     18374727
.C:1040  EE 19 D0    INC $D019      A:0f X:01 Y:16 SP:ef N.-..I..     18374733
.C:1043  40          RTI            A:0f X:01 Y:16 SP:ef N.-..I..     18374739
.C:1047  4C 35 10    JMP $1035      A:0f X:01 Y:16 SP:f2 ..-.....     18374745
(C:$1035

and it looks pretty much the same every time i try.

Last edit: Querino 2024-05-03

Algorithmix - 2024-05-05

Thank you very much for running it on the build from #2025! Really appreciate it! :)

It would be very interesting to get a CPU-history for all the 3 possibilities, to be sure the interrupt-problems are really fixed by the patch from #2025.

I unfortunately don't have the skills to build it myself, so can't really check this myself. :'(
Can you perhaps help again?

I'll just add an extra "r" command to the monitor-commands, so we can see which cycle/rasterline we have stopped at and thereby identify which case we've hit (yes, yes, i should have added that to the original):

delete
; runUntil("start", "break")
break exec 1035
g 1000
delete 1
; waitRasterline(0xFE)
break exec 0000 FFFF if RL == $fe
x
delete 1
; printMemory(0xD019, 0xD019)
m D019 D019
; print which cycle we've breaked after
r
; runUntil("otherCode", "break")
break exec 1035
g 1044
delete 1
; printCpuHistory()
cpuhistory 20

The 3 possibilities can now be identified by the "CYC"-output of the "r" command:

(Problem#1): CYC: 001

(ok): CYC: 000

(Problem#2): CYC: 009 (This one may have another CYC-number, if the problem is now fixed. If it is fixed, it will probably be CYC = 002 instead of 009. (Looks like the problem here is that the breakpoint breaks too late))
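
The mapping above can be turned into a small lookup. The following Python sketch is only an illustration: the names `parse_stop`, `classify` and `CASES` are made up here, and it assumes that the second number pair in a checkpoint line (as pasted earlier in this thread) is the cycle within the raster line and matches the CYC value that "r" prints. The value 9 for problem 2 is the one observed so far and may differ on patched builds.

```python
import re

# Assumption: checkpoint lines have the shape seen in this thread, e.g.
#   "#1 (Stop on  exec 1035)  254/$0fe,  42/$2a"
# where the first pair is the raster line and the second pair is the cycle.
STOP_LINE = re.compile(
    r"^#\d+ \(Stop on\s+\w+ [0-9a-fA-F]+\)"
    r"\s+(\d+)/\$[0-9a-fA-F]+,\s+(\d+)/\$[0-9a-fA-F]+"
)

# Cases as listed in the comment; 9 is only the value observed so far.
CASES = {0: "ok", 1: "problem 1", 9: "problem 2"}


def parse_stop(line):
    """Return (raster_line, cycle) from a checkpoint line, or None."""
    match = STOP_LINE.match(line.strip())
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def classify(cycle):
    """Map the cycle at the RL == $fe break to one of the described cases."""
    return CASES.get(cycle, "unknown")


raster, cycle = parse_stop("#1 (Stop on  exec 1035)  254/$0fe,   0/$00")
print(raster, cycle, classify(cycle))
```

Any cycle value that is not in the table is reported as "unknown" rather than guessed, since the thread only documents these three.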

Could i ask you to run it again with these new monitor commands, until you get all 3 cases? (normally only takes a few tries)

You don't have to, of course, but it would be cool :)

Last edit: Algorithmix 2024-05-05

Querino - 2024-05-05

i ran it 20 times now. just the monitor commands, i have no other tools.

tests.zip

- Algorithmix - 2024-05-05
  
  Thanks!! That was a lot of times - much more than i expected of you :)
  Only 2 of the cases were hit, for some reason, but that's fine (when i've tried, i usually hit them in very few tries, but i guess that's how randomness works :))
  
  Unfortunately, it looks like the patch didn't fix the interrupt-problem ("problem#1" at least), since the "INC $104A" is still executed before the interrupt-BRK, when we break at (=after) CYC 001.
  
  Thanks again!
  
  EDIT: But i think i'll have to re-read up on my interrupt theory to be sure that's actually not how it's supposed to work. I could have been wrong here. (now that it's not delayed as much as before)
  
  Last edit: Algorithmix 2024-05-05
  
gpz - 2024-05-05

i have committed a more sane version of that patch in r45151 - please test, not only if the problem is solved, but also if everything related to R and G commands, and entering/exiting the monitor, still works as before :)

- Algorithmix - 2024-05-05
  
  Very cool! That was much faster than i expected! You're a genius! :)
  It looks like the instruction-duplication is solved, and looks like "problem 1" wasn't actually a real problem (see below).
  
  I will run a lot of my old tests through the remote-monitor soon, to see they still behave the same as before (i use "r" and "g" in those all the time).
  
  Seems like "Problem 2" is still a thing, so i can do an issue on it, as soon as possible.
  
  I was terribly, terribly wrong about problem#1. I was mistakenly convinced that the VIC was timing raster-interrupts such that the 6510 could detect them at cycle 1, but it looks like it does not, and the 6510 can not detect a raster-interrupt until cycle 2 (and therefore does not start the interrupt-BRK until cycle 3 at the earliest). My bad, sorry for the confusion. While it was true that the interrupt became impossibly delayed, that was only because of problem 3 (and 4).
  
  I guess the lesson here is that it's a good idea to wait some time after a debug session until you write a bug-report :)
  
- Algorithmix - 2024-05-05
  
  Hi @gpz. Did some testing and I think i've found a problem in r45151:
  
  Start a clean VICE (x64sc)
  
  Go to the monitor
  
  Type:
  
  a c000 inc $d020
  .c003 jmp $c000
  .c006
  
  and add a breakpoint to the same address we jump to:
  
  break exec c000
  g c000
  
  If we type chis 5, then in r45151 we get this:
  
  .C:e5d4  F0 F7       BEQ $E5CD      A:00 X:00 Y:0a SP:f3 ..-...Z.      5336219
  .C:e5cd  A5 C6       LDA $C6        A:00 X:00 Y:0a SP:f3 ..-...Z.      5336222
  .C:e5cf  85 CC       STA $CC        A:00 X:00 Y:0a SP:f3 ..-...Z.      5336225
  .C:c000  EE 20 D0    INC $D020      A:00 X:00 Y:0a SP:f3 ..-...Z.      5336228
  .C:c003  4C 00 C0    JMP $C000      A:00 X:00 Y:0a SP:f3 N.-.....      5336234
  
  but shouldn't it have stopped immediately without having executed anything?
  
  In the old 3.8:
  
  The old 3.8 from December (without any r-number in the about dialog) has the instruction-duplication bug, but if we work around it by exiting and re-entering the monitor between break exec c000 and g c000, we get this result:
  
  .C:e5cf  85 CC       STA $CC        A:00 X:00 Y:0a SP:f3 ..-...Z.      5470106
  .C:e5d1  8D 92 02    STA $0292      A:00 X:00 Y:0a SP:f3 ..-...Z.      5470109
  .C:e5d4  F0 F7       BEQ $E5CD      A:00 X:00 Y:0a SP:f3 ..-...Z.      5470155
  .C:e5cd  A5 C6       LDA $C6        A:00 X:00 Y:0a SP:f3 ..-...Z.      5470158
  .C:e5cf  85 CC       STA $CC        A:00 X:00 Y:0a SP:f3 ..-...Z.      5470161
  
  So that one actually does execute nothing. (if we don't work around it we get a single inc $d020)
  
  UPDATE: If we do the same workaround in r45151 it actually also executes nothing.
  
  Last edit: Algorithmix 2024-05-05
  
  - Querino - 2024-05-05
    
    hm.
    
    a c000
    inc $d020
    jmp $c003

    break exec $c000
    g $c000
    
    this doesn't break at all here in r45151 ? only when using the "workaround".
    
    - Algorithmix - 2024-05-05
      
      Yes, you're right - also here. (sorry, i made a writing-mistake initially and wrote "jmp $c003", instead of "jmp $c000", if you were trying to reproduce what i wrote :))
      
      Last edit: Algorithmix 2024-05-05
      
      - Querino - 2024-05-05
        
        ah. haha. now i know why i didn't get what the code actually was supposed to do. :)
        at least i can confirm now your observations.
        
please keep testing (hopefully this is the only new problem). Also please test if it behaves differently depending on how the monitor was entered (via a watch/breakpoint, or by UI). eg setting a new breakpoint after another breakpoint triggered and the monitor popped up....

I have already looked briefly and i can see a potential problem with this - but i'd really like to know more details before attempting to fix it :)

edit: in particular please check if "breakpoint at current address" is the only misbehaving one

edit++: we apparently have to be extremely careful with the order of events, and how the test starts... i am playing around with it a bit, and it seems that some slightly different cases work when others do not... really tricky :)

Last edit: gpz 2024-05-06

well, i wish i could help more, but i don't really have the knowledge.

but i see, this one does not break at all in r45151

a c000 
nop
nop
nop
nop
nop
rts


break exec $c000
g $c000

whereas this one (note the different breakpoint)

a c000 
nop
nop
nop
nop
nop
rts


break exec $c001
g $c000

indeed breaks properly. as far as i can tell:

(C:$c001) chis 3
.C:e5cf  85 CC       STA $CC        A:00 X:00 Y:0a SP:f3 ..-...Z.      2398026
.C:e5d1  8D 92 02    STA $0292      A:00 X:00 Y:0a SP:f3 ..-...Z.      2398029
.C:c000  EA          NOP            A:00 X:00 Y:0a SP:f3 ..-...Z.      2398033
(C:$c001)

Last edit: Querino 2024-05-06

btw, the example above:

a c000 
nop
nop
nop
nop
nop
rts

delete
break exec $c000

g $c000

which does not break. but entering the monitor again and throwing another

g $c000

it DOES break. next g $c000 won't break, another g $c000 will break again... and so on.

Last edit: Querino 2024-05-06

Interesting that it alternates like that. (some flag/state that is used for whether or not we should break immediately before the first instruction if there is a bp there?)

Haven't found anything new. Just some minor points from a bit of experimenting:

Looks like entering the monitor from a breakpoint also triggers it:

a c000 
inc $d020
jmp $c000

break exec c000
break exec e5cf
x
g c000
chis 5

.C:e5d1  8D 92 02    STA $0292      A:00 X:00 Y:0a SP:f3 ..-...Z.      4124988
.C:e5d4  F0 F7       BEQ $E5CD      A:00 X:00 Y:0a SP:f3 ..-...Z.      4124992
.C:e5cd  A5 C6       LDA $C6        A:00 X:00 Y:0a SP:f3 ..-...Z.      4124995
.C:c000  EE 20 D0    INC $D020      A:00 X:00 Y:0a SP:f3 ..-...Z.      4124998
.C:c003  4C 00 C0    JMP $C000      A:00 X:00 Y:0a SP:f3 N.-.....      4125004

So both creating a new breakpoint and jumping to it, and entering the monitor through a breakpoint, seem to consistently reproduce it

Creating some random breakpoint in the "current monitor-session", that we don't jump to, does not seem to cause the problem.

a c000 
inc $d020
jmp $c000

break exec c000
x
; press ALT+H
break exec 4000
g c000
chis 5

.C:e5d1  8D 92 02    STA $0292      A:00 X:00 Y:0a SP:f3 ..-...Z.      2924318
.C:e5d4  F0 F7       BEQ $E5CD      A:00 X:00 Y:0a SP:f3 ..-...Z.      2924322
.C:e5cd  A5 C6       LDA $C6        A:00 X:00 Y:0a SP:f3 ..-...Z.      2924325
.C:e5cf  85 CC       STA $CC        A:00 X:00 Y:0a SP:f3 ..-...Z.      2924328
.C:e5d1  8D 92 02    STA $0292      A:00 X:00 Y:0a SP:f3 ..-...Z.      2924331

is fine

Deleting a breakpoint in the "current monitor-session" does not seem to cause the problem. (Couldn't make enabling/disabling breakpoints do it either).

Creating a new breakpoint in the "current monitor-session", at the same address of the already existing one, does not seem to cause the bug: (maybe because the old one kicks in?)

a c000 
inc $d020
jmp $c000

break exec c000
x
; press ALT+H
break exec c000
g c000
chis 5

.C:e5d1  8D 92 02    STA $0292      A:00 X:00 Y:0a SP:f3 ..-...Z.      6084966
.C:e5d4  F0 F7       BEQ $E5CD      A:00 X:00 Y:0a SP:f3 ..-...Z.      6084970
.C:e5cd  A5 C6       LDA $C6        A:00 X:00 Y:0a SP:f3 ..-...Z.      6084973
.C:e5cf  85 CC       STA $CC        A:00 X:00 Y:0a SP:f3 ..-...Z.      6084976
.C:e5d1  8D 92 02    STA $0292      A:00 X:00 Y:0a SP:f3 ..-...Z.      6084979

is fine

Can also make "g" or "x" bug. For example:

a c000 
inc $d020
inc $d021
jmp $c003

break exec c000
break exec c003
g c000
chis 5

.C:e5cf  85 CC       STA $CC        A:00 X:00 Y:0a SP:f3 ..-...Z.      3158115
.C:e5d1  8D 92 02    STA $0292      A:00 X:00 Y:0a SP:f3 ..-...Z.      3158118
.C:e5d4  F0 F7       BEQ $E5CD      A:00 X:00 Y:0a SP:f3 ..-...Z.      3158122
.C:e5cd  A5 C6       LDA $C6        A:00 X:00 Y:0a SP:f3 ..-...Z.      3158125
.C:c000  EE 20 D0    INC $D020      A:00 X:00 Y:0a SP:f3 ..-...Z.      3158128

g   ; or "x"
chis 5

.C:e5d4  F0 F7       BEQ $E5CD      A:00 X:00 Y:0a SP:f3 ..-...Z.      3158122
.C:e5cd  A5 C6       LDA $C6        A:00 X:00 Y:0a SP:f3 ..-...Z.      3158125
.C:c000  EE 20 D0    INC $D020      A:00 X:00 Y:0a SP:f3 ..-...Z.      3158128
.C:c003  EE 21 D0    INC $D021      A:00 X:00 Y:0a SP:f3 N.-.....      3158134
.C:c006  4C 03 C0    JMP $C003      A:00 X:00 Y:0a SP:f3 N.-.....      3158140

(we should have stopped at $c003 after "g" (or "x"))

Using "r pc = xxxx, x" instead of "g xxxx" can also cause it

Last edit: Algorithmix 2024-05-06

Querino - 2024-06-09

any news here? :)

i wonder if the current state still is better than what we had before?

gpz - 2024-06-09

we really need to create regression tests (in the form of monitor scripts) for this (for all the different issues mentioned here), and then see what still needs fixing. (and fix the new issues while at it). And i certainly need help with this too :)
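
One possible building block for such regression tests is a small checker for pasted "chis" output. The sketch below is only an illustration under stated assumptions: the helper names `executed_addresses` and `stopped_before` are hypothetical, the line format is taken from the dumps pasted in this thread, and the check is only meaningful when the history does not already contain the breakpoint address from an earlier run. It does not drive VICE itself.

```python
import re

# Assumption: "chis" lines look like the ones pasted in this thread, e.g.
#   ".C:c000  EE 20 D0    INC $D020      A:00 X:00 ...      4124998"
CHIS_LINE = re.compile(r"^\.C:([0-9a-fA-F]{4})\s")


def executed_addresses(chis_text):
    """Return the instruction addresses listed in a chis dump, oldest first."""
    addrs = []
    for line in chis_text.splitlines():
        match = CHIS_LINE.match(line.strip())
        if match:
            addrs.append(int(match.group(1), 16))
    return addrs


def stopped_before(chis_text, breakpoint_addr):
    """True if the instruction at breakpoint_addr never shows up in the dump."""
    return breakpoint_addr not in executed_addresses(chis_text)


# Two excerpts from dumps posted above (r45151).
RAN_PAST_BREAKPOINT = """
.C:e5cd  A5 C6       LDA $C6        A:00 X:00 Y:0a SP:f3 ..-...Z.      4124995
.C:c000  EE 20 D0    INC $D020      A:00 X:00 Y:0a SP:f3 ..-...Z.      4124998
.C:c003  4C 00 C0    JMP $C000      A:00 X:00 Y:0a SP:f3 N.-.....      4125004
"""
STOPPED_AT_BREAKPOINT = """
.C:e5cd  A5 C6       LDA $C6        A:00 X:00 Y:0a SP:f3 ..-...Z.      2924325
.C:e5cf  85 CC       STA $CC        A:00 X:00 Y:0a SP:f3 ..-...Z.      2924328
.C:e5d1  8D 92 02    STA $0292      A:00 X:00 Y:0a SP:f3 ..-...Z.      2924331
"""

print(stopped_before(RAN_PAST_BREAKPOINT, 0xC000))
print(stopped_before(STOPPED_AT_BREAKPOINT, 0xC000))
```

A checker like this only looks at the captured text, so each monitor script would still have to be run by hand or by some harness that saves the monitor output.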
