Menu

#2024 3 monitor issues wrt. the "g XXXX"-command and interrupts (can "crash" the C64-code)

v3.x
open
nobody
None
GTK3
Monitor
2024-06-09
2024-04-28
Algorithmix
No

Hi wonderful people! First of all, thanks for all your fantastic work on Vice!

I think i have found three bugs in how the "g XXXX" command works after a break in the monitor (and in the remote-monitor). Hopefully you will agree that these are issues. I will first introduce the issues, then describe them in more detail, and finally show how to reproduce them with a test C64-program and some monitor-commands. Introduction:

  • The first issue: a "g XXXX" command can cause an interrupt to be delayed in a way which would be impossible on a real C64 (or an emulated C64 without the monitor), in case of unlucky timing.
  • Additionally, depending on the exact timing (it looks like), a "g XXXX" command executed on code outside an interrupt, will sometimes wait until the interrupt have started before going to address $XXXX, causing the interrupt routine to be unintentionally taken over by the code at $XXXX.
  • Thirdly (a bit unrelated to the other two, but let's include it anyway), all "g XXXX" commands seems to always execute the instruction at address $XXXX twice (doesn't depend on interrupts or timing - it always happens).

Note that the "g" command, without an address afterwards, does not seem to be affected by any of this. These problems are both in the normal monitor and in the remote-monitor.

These 3 issues means that whenever you're using "g XXXX", the command has a chance to cause a crash/glitch/corruption (depending on the C64-code). They also make it difficult to use the remote-monitor for unit-testing of C64-code.

Some context I found it when doing unit-tests of C64-code using the remote-monitor, which kept failing at arbitrary times (the C64-code crashed by jumping to arbitrary memory, in the timer-based raster-stabilization code of the interrupts). Often it would usually crash relatively consistently at the same place, and if i inserted for example a NOP in some unrelated code (which was executed earlier), it could either survive or crash somewhere else (i now know, this was due to lucky/unlucky jitter and "problem 1" described below). After almost 2 years of (very rarely) debugging this, I decided to get to the bottom of it this time, and as we all always do when our code isn't working, i've now ended up blaming the compiler. I mean, the monitor. ;)

 

Description of the 3 problems:

  • Problem#1: (bug before interrupt: the interrupts gets "impossibly" delayed by the forced execution of the instruction from $XXXX)
     
    If you are unlucky and have break'ed (using a breakpoint) after the last instruction before an interrupt would fire (i.e. just before the injected 7-cycle interrupt-BRK for an IRQ/NMI), and you then use the "g XXXX" command, then VICE will execute an extra instruction before the interrupt starts (i.e. before the IRQ/NMI-BRK), namely the instruction at address $XXXX.
     
    This can for example lead to crash (jump to arbitrary memory) in the interrupt routine, if it uses timer-based raster-stabilization, since the interrupt will be delayed more cycles than in reality would be possible. (In other situations, I can imagine this could cause graphical glitches, for example, due to VIC-register writes being delayed, etc).
     
    My expected behavior:
     
    I would expect "g XXXX" to just set the PC, and not force an extra instruction through before the interrupt (and thereby cause a situation which is impossible under normal circumstances).
    If the CPU had detected an interrupt during the instruction before the break, i would expect the interrupt-sequence (IRQ/NMI BRK) to happen immediately after the "g XXXX" command.
    Since "g XXXX" had set the PC, the interrupt-BRK would push $XXXX (the new PC) to the stack, and not until the interrupt returns with RTI, the code at $XXXX would execute.
     
  • Problem#2: (bug after interrupt: the interrupt-routine gets replaced by the code at $XXXX)
     
    I just noticed another behavior of the "g XXXX" command, while writing this. It basically happens under the same conditions that problem#1 happens under, but which of the two problems happens, seems to be related to the exact cycle-timing of the break and/or of the interrupt occurrence (see the CPU-history section below).
     
    What happens in this case, is that the 7 cycle interrupt-BRK correctly starts just after the "g XXXX" (no extra instruction is forced, which is good), but after the interrupt-BRK has completed, instead of executing the interrupt-routine (from $FFFE/$FFFF or $FFFA/$FFFB), it will run the code at $XXXX during the interrupt. The interrupt-routine has thus been replaced by the code at $XXXX. This will obviously very likely break the C64-program which is running (the real interrupt-routine will be skipped, and since we thought we were outside the interrupt (in "main-code") when doing the "g XXXX", we will probably not CLI or ack in the code at $XXXX, so interrupts will likely effectively be disabled from this point forward).
     
    My expected behavior:
     
    I would expect that if i break outside of an interrupt, and do a "g XXXX", the code at $XXXX would also be executed outside of an interrupt (i would not expect it to take over the interrupt handler).
     
    (More precisely:
    If i break before the 7-cycle IRQ/NMI-BRK, i would expect the the "g XXXX" to also set the PC before the IRQ/NMI-BRK (never after it).
    Similarly, if i break within an interrupt (before its RTI), i would expect the "g XXXX" to also set the PC before the RTI.
    Otherwise, it is hard to avoid running into crashes on unlucky timing between main-code breakpoints and interrupts)
     
    In other words, i would expect "g XXXX" to set the PC immediately and not delaying it till later. (I realize that this most likely is much more complicated :))
     
  • Problem#3 (the first instruction at $XXXX always gets executed twice (unrelated to interrupts)):
     
    It seems like "g XXXX" always executes the first instruction at $XXXX twice (not just in the two cases mentioned above - it always happens). This of course also happens in relation to problem 1 and 2 above. In combination with "problem#1", the interrupt is delayed until both instructions have been executed.
     
    Easy to reproduce without any test-programs: Just go to the monitor, do a "g XXXX" and go back to the monitor again and do a CPU-history => the instruction at $XXXX is there twice.
     
    (Note that it is not just in the CPU-history it looks like it is executed twice, it is actually executed twice. This can be seen in the example-code below, in that the value of the "bugDetect"-byte, will end up becoming 2 instead of 1, since the "INC bugDetect" is executed twice)
     
    My expected behavior:
     
    I would expect the instruction at $XXXX to be executed only once after a "g XXXX"
     

How to reproduce and a test-program:

I have tried to make a simplified testcase reproducing the above errors.
Remember, problems 1 and 2 happens when we break just before the 7-cycle interrupt-BRK, and then do a "g XXXX", so this is what we have to reproduce.

How it works: The idea is to break at the same time a VIC raster-interrupt occurs, as an easy way to see what happens when we break after the last instruction before an interrupt should start up.
More precisely, the C64-code starts a raster-interrupt at rasterline $FE, and the monitor-commands also breaks at rasterline $FE and then executes a "g XXXX".

How to run it: Load the prg into memory (but do not run it) and then execute the monitor-commands. Afterwards, the CPU-history will show the three issues described above. (You will have to run it a few times to get the "correct jitter" for problem#1 and problem#2 - at most one of them will happen on a run)

THE C64 CODE:

This is the C64 assembler code (kickass). It's also attached. I've also attached a PRG file, so you don't have to assemble it. Simply load the PRG into VICE.

.const rasterline = $fe

        * = $1000 "Code"
start:  sei

        lda #$35
        sta $01

        bit $D012 // wait for rasterline $100, before proceeding
        bpl *-3

        setupIRQRasterInterrupt(rasterline, irq)

break:  jmp *

irq:    eor #0      // just to do something to make the interrupt-code easy to spot in the CPU-history
        inc $d020
        inc $d021
        inc $d019
endIrq: rti

otherCode:         // This is where we "g XXXX" to
        inc bugDetect
        jmp break

bugDetect: .byte 0 // Should stay 0 until AFTER the first interrupt. If it is 1 inside the first interrupt, we're doomed. (It actually becomes 2)
The rest of the C64 assembler code (the setupIRQRasterInterrupt-macro)
// Setup a raster-interrupt. Lengthy boilerplate code, which is not relevant for the test, so it's hidden down here to reduce noise. Focus on the code above.
.macro setupIRQRasterInterrupt(rasterline, interruptHandler) {
    sei
    lda #<interruptHandler
    sta $fffe
    lda #>interruptHandler
    sta $ffff
    // select IRQ events (VIC: raster, CIA1: none)
    lda #%00000001 // VIC:  raster-event
    sta $d01a
    lda #%00011111 // CIA1: none
    sta $dc0d
    // select rasterline in for the raster-event ($D012[w] and bit 7 of $D011[w])
    .if (rasterline != null) {
        lda #<rasterline
        sta $d012
        lda $d011
        .if (rasterline >= $100) {ora #%10000000} else {and #%01111111}
        sta $d011
    }
    // ack any pending VIC and CIA1 events just in case
    bit $dc0d
    lda #%00001111
    sta $d019
    //.if (autoEnable) {cli}
    cli
}

 

THE MONITOR COMMANDS:

In my test-code on the PC-side, i send monitor commands by calling some methods of a homemade Vice-class, which hopefully serves as a high-level description to you of what the monitor commands below do: (Each of these methods basically justs to send some commands to the remote-monitor)

@Test
void test() throws Exception {
    vice.loadAss("viceBreakBeforeInterruptBug.ass")
        .runUntil("start", "break")
        .waitRasterline(0xFE)
        .printMemory(0xD019, 0xD019) // Just to be 100% sure that bit 7 of $D019 is actually 1 here (= the VIC wants to generate an interrupt). (Since the monitor can break between cycle 63 and cycle 1, it might actually not be - TODO: Hmm... But it is always! Maybe i misunderstand something about where exactly breakpoints breaks between two cycles??)
        .runUntil("otherCode", "break")
        .then(() -> vice.printCpuHistory());
        // assertions removed
}

These are the monitor commands corresponding to the above code: (you should execute this manually in the monitor after having loaded the PRG. Alt+H and copy/paste is fine)

; loadAss("viceBreakBeforeInterruptBug.ass"). You can also just drag'n'drop the PRG into vice and skip this. So i have commented the loading out for now.
delete
;load "C:\...<insert correct path here>...\viceBreakBeforeInterruptBug.prg" 0

; runUntil("start", "break")
break exec 1035
g 1000
delete 1

; waitRasterline(0xFE)
break exec 0000 FFFF if RL == $fe
x
delete 1

; printMemory(0xD019, 0xD019)
m D019 D019

; runUntil("otherCode", "break")
break exec 1035
g 1044
delete 1

; printCpuHistory()
cpuhistory

CPU-HISTORY: (THE RESULT)

In all the cases the raster-interrupt occurs at cycle 1 (as raster-interrupts always do) of rasterline $FE, meaning that the interrupt-sequence should start as soon as the instruction executing at cycle 1 is done (unless perhaps if it was a taken branch not crossing page-boundaries - but it is not).

In the test-code, this instruction is always a JMP. Vice also breaks at rasterline $FE. It does this either after the instruction that executed at cycle 1 is done, or between cycle 63 and 1, if the previous instruction ended at cycle 63. (as far as i can tell)

Depending on which cycle of the JMP is executing at cycle 1 of rasterline $FE we get the following cases:

  1. When the last JMP starts at cycle 62 (problem#1): In this case the interrupt is "impossibly delayed" since the instruction at $XXXX is forced through after the JMP and before the interrupt. But since the last cycle of the JMP is at cycle 1, the interrupt-sequence should have started at cycle 2. Instead it is pushed to start at cycle 28, which should be impossible.
    case1image
  2. When the last JMP starts at cycle 61 (ok) : In this case it is actually fine that the instruction at "otherCode" happens before the interrupt, since the previous JMP ended at cycle 63 at the previous raster-line. (If we hadn't done "g XXXX" one more JMP would have been executed before the interrupt). (Well, it's ok, except of course for the fact that the instruction is forced through twice)
    case2image
  3. When the last JMP starts at cycle 63 (problem#2): In this case the interrupt correctly starts after the JMP. However, after the interrupt-sequence, instead of jumping to the irq-handler, it jumps to address $XXXX!
    case3image

In all the above cases, it can also be seen that the instruction we jumped to with "g XXXX" in executed twice (problem#3). (In the test-code, you can also see this in that "bugDetect" becomes 2 instead of 1, after the INC at "otherCode")

Here is the CPU-history for the 3 different possible runs/cases as text. I've only included the last few lines of the CPU-histories here, since only those are the interesting ones (where the issues happen). I've attached the full CPU-histories.

Case 1: (last JMP at cycle 62: problem 1 + problem 3)
    .C:1035  4C 35 10    JMP $1035      A:0f X:00 Y:0a SP:f3 ..-.....     14011063           RL: 253  CYC:  53        break
    .C:1035  4C 35 10    JMP $1035      A:0f X:00 Y:0a SP:f3 ..-.....     14011066           RL: 253  CYC:  56        break
    .C:1035  4C 35 10    JMP $1035      A:0f X:00 Y:0a SP:f3 ..-.....     14011069           RL: 253  CYC:  59        break
    .C:1035  4C 35 10    JMP $1035      A:0f X:00 Y:0a SP:f3 ..-.....     14011072           RL: 253  CYC:  62        break
BUG .C:1044  EE 4A 10    INC $104A      A:0f X:00 Y:0a SP:f3 ..-.....     14011075           RL: 254  CYC:   2        otherCode
BUG .C:1044  EE 4A 10    INC $104A      A:0f X:00 Y:0a SP:f3 ..-.....     14011088           RL: 254  CYC:  15        otherCode
    .C:1038  49 00       EOR #$00       A:0f X:00 Y:0a SP:f0 ..-..I..     14011101           RL: 254  CYC:  28        irq
    .C:103a  EE 20 D0    INC $D020      A:0f X:00 Y:0a SP:f0 ..-..I..     14011103           RL: 254  CYC:  30        
    .C:103d  EE 21 D0    INC $D021      A:0f X:00 Y:0a SP:f0 N.-..I..     14011109           RL: 254  CYC:  36        
    .C:1040  EE 19 D0    INC $D019      A:0f X:00 Y:0a SP:f0 N.-..I..     14011115           RL: 254  CYC:  42        
    .C:1043  40          RTI            A:0f X:00 Y:0a SP:f0 N.-..I..     14011121           RL: 254  CYC:  48        endIrq
    .C:1047  4C 35 10    JMP $1035      A:0f X:00 Y:0a SP:f3 ..-.....     14011127           RL: 254  CYC:  54
Case 2: (last JMP at cycle 61: ok + problem 3)
    .C:1035  4C 35 10    JMP $1035      A:0f X:00 Y:0a SP:f3 ..-.....    112369686           RL: 253  CYC:  52        break
    .C:1035  4C 35 10    JMP $1035      A:0f X:00 Y:0a SP:f3 ..-.....    112369689           RL: 253  CYC:  55        break
    .C:1035  4C 35 10    JMP $1035      A:0f X:00 Y:0a SP:f3 ..-.....    112369692           RL: 253  CYC:  58        break
    .C:1035  4C 35 10    JMP $1035      A:0f X:00 Y:0a SP:f3 ..-.....    112369695           RL: 253  CYC:  61        break
    .C:1044  EE 4A 10    INC $104A      A:0f X:00 Y:0a SP:f3 ..-.....    112369698           RL: 254  CYC:   1        otherCode
BUG .C:1044  EE 4A 10    INC $104A      A:0f X:00 Y:0a SP:f3 ..-.....    112369711           RL: 254  CYC:  14        otherCode
    .C:1038  49 00       EOR #$00       A:0f X:00 Y:0a SP:f0 ..-..I..    112369724           RL: 254  CYC:  27        irq
    .C:103a  EE 20 D0    INC $D020      A:0f X:00 Y:0a SP:f0 ..-..I..    112369726           RL: 254  CYC:  29        
    .C:103d  EE 21 D0    INC $D021      A:0f X:00 Y:0a SP:f0 N.-..I..    112369732           RL: 254  CYC:  35        
    .C:1040  EE 19 D0    INC $D019      A:0f X:00 Y:0a SP:f0 N.-..I..    112369738           RL: 254  CYC:  41        
    .C:1043  40          RTI            A:0f X:00 Y:0a SP:f0 N.-..I..    112369744           RL: 254  CYC:  47        endIrq
    .C:1047  4C 35 10    JMP $1035      A:0f X:00 Y:0a SP:f3 ..-.....    112369750           RL: 254  CYC:  53  
Case 3: (last JMP at cycle 63: problem 2 + problem 3)
    .C:1035  4C 35 10    JMP $1035      A:0f X:00 Y:0a SP:f3 ..-.....    431701064           RL: 253  CYC:  54        break
    .C:1035  4C 35 10    JMP $1035      A:0f X:00 Y:0a SP:f3 ..-.....    431701067           RL: 253  CYC:  57        break
    .C:1035  4C 35 10    JMP $1035      A:0f X:00 Y:0a SP:f3 ..-.....    431701070           RL: 253  CYC:  60        break
    .C:1035  4C 35 10    JMP $1035      A:0f X:00 Y:0a SP:f3 ..-.....    431701073           RL: 253  CYC:  63        break
BUG .C:1044  EE 4A 10    INC $104A      A:0f X:00 Y:0a SP:f0 ..-..I..    431701083           RL: 254  CYC:  10        otherCode
BUG .C:1044  EE 4A 10    INC $104A      A:0f X:00 Y:0a SP:f0 ..-..I..    431701089           RL: 254  CYC:  16        otherCode
BUG .C:1047  4C 35 10    JMP $1035      A:0f X:00 Y:0a SP:f0 ..-..I..    431701095           RL: 254  CYC:  22 

 

(Note: The CPU-history output has been decorated with the labels from the assembly code, as well as which rasterline ("RL") and cycle-of-rasterline ("CYC"). (RL and CYC are calculated from the stopwatch counter))

 

Update: New observation: Hmm.. Actually for case 1 and 2, the cycle counts in the CPU-history are strange.. For case 2, the first instance of "INC $104A" starts at cycle 1, but the second starts at cycle 14, instead of at cycle 7. And the first interrupt-handler instruction starts at cycle 27.
But an INC $XXXX only takes 6 cycles, not 13 cycles. It is as if the interrupt-BRK is executed after both of the "INC $104A"-instances.. this would fit with the 13 cycles. (perhaps there are also 6 bytes pushed to the stack, instead of 3? Haven't checked yet)
This could be seen as a fourth problem, but if problem 3 is solved, i'ld imagine this one is probably solved too.

 

Which VICE versions?

I've reproduced all three of these problems (or "opportunities" as some might call them) in Vice 3.8 and 3.7. I've reproduced the last one in Vice 3.2 (the monitor of Vice 3.2 doesn't seem to support breaking at a rasterline, so I didn't test the first two issues in 3.2). I have not tried other versions.

  • VICE 3.8 version: "GTK3 3.24.39, GLib 2.78.3, Cairo 1.18.0, Pango 1.50.14"
  • OS: Windows 10

Sorry it became long - hope you'll manage. Let me know if i can help with anything or if you need more info ;)

P.S. Wouldn't it be cool if the CPU-history wrote a line with for example "IRQ" or "NMI" pseudo-instructions for the injected BRK, so you could more easily see in the history when an interrupt started?

10 Attachments

Discussion

  • Algorithmix

    Algorithmix - 2024-04-28

    Update: Problem 3 doesn't actually seem to always happen. I just tried a "g XXXX" where it did not happen. Then i tried the exact same "g XXXX" again, and it did happen. So not sure about the conditions.

     
  • gpz

    gpz - 2024-04-28

    Ok, after reading all this....

    • please split this into 3 tickets, one for each issue. It will become super messy to deal with one or the other thing when those 3 problems are mixed
    • please don't hide code in assembler specific macros (preferably don't use macros at all) - that makes reading and understanding the code much easier
    • bit $d012; bpl -3 does not actually wait for line $100, but for line $080 - which might not be what you want. to make sure no badlines are interfering, better use something like bit $d011; bpl -3; bit $d011; bmi *-3 - which will always wait for line $00
    • do not use the "m" command to look at $d019, use the "io" command instead

    That said, this stuff looks a lot like we have to be very careful about what we are looking at, and what conclusions we draw from it. The cpu history feature might fool you (it doesn't appear to always show 100% what we expect, so its likely buggy for itself - see other tickets), so we should ignore this for the time being and try to reproduce the problem with as little monitor interaction as possible (really only a single "g" command would be ideal). That also means to not use the remote monitor - as that might produce it's own dedicated bugs :o)

    For a start, i checked what the "g" command actually does - and it really only sets the PC to a new value, as expected. There is no "wait until IRQ is over" or whatever magic. So chances are, it is actually working right :) (see monitor.c line 980)

     
  • Algorithmix

    Algorithmix - 2024-04-28

    Thank you for your reply, gpz! I unfortunately do not have much time the next few days, so just a quick answer for now.

    • I could split this ticket up, however, the first two "problems" are basically the same: Break exactly before an interrupt would start, and do a "g XXXX". What happens in that case can vary between "problem 1" and "problem 2" (and perhaps something else i have not yet encountered).
       
      Since i do not really know what exact condition causes VICE to behave as "problem 1" or as "problem 2", i am not really sure how to split it up into two independent tickets.
      I could of course make a new ticket for "problem 3"
       

    • No probs, if you like that better ;)
       

    • You're completely right - I meant $D011, not $D012, of course. True, it's safer to wait for both plus and minus. This brainfart luckily doesn't seem to affect the outcome :)
      Anyway, it was just a quickly made test, to make you able to reproduce it with minimal work (load the PRG, press ALT+H, copy/paste the monitor-commands, done).
       

    • Yeah, ok. Shouldn't it give the same result if IO is mapped in at $D000-$DFFF, though? (Anyway, it doesn't matter to the test)

    Good point in being careful in drawing conclusions. It's true i might have generalized a bit too much in some of the sentences above, but it is what seems to be happening :) Anyway, i don't pretend to have any idea about what's going on under the hood :)

    Yes, using a single "g XXXX", would be interesting, if it's possible. I can try looking at it when i get the time. (It might very well be something with breakpoints that causes the issues)

    Ah, ok, good to know there are some problems with the CPU-history. It could of course be a problem with that one. However, i don't think it's only that, since as mentioned for "problem 3", it did seem like the "INC bugDetect" (INC $104A), did actually get executed twice (like in the CPU-history), since the value of "bugDetect" had been increased twice afterwards.
       For "problem 1", it also seems like "INC bugDetect" was executed before the interrupt, like the CPU-history says, since if you break inside the interrupt-handler, the value of "bugDetect" has already been increased.
       For "problem 2", after having executed the monitor commands i posted above, and you bump into "problem 2", if you try to just continue running the program ("g" or just exit the monitor), you will see that the border+screen is not flashing, indicating that the IRQ-interrupt no longer is running - so it does seem like the CPU-history could at least be somewhat correct here (something has stopped the IRQs).

    Are you able to reproduce it, btw? (Who knows, it might be my installation that has a problem)

    Thanks for looking into the code for what "g XXXX" does, and thank you for the reference to the source code, btw! :)

     

    Last edit: Algorithmix 2024-04-29
  • Algorithmix

    Algorithmix - 2024-05-01

    Hi Groepaz

    A few questions on how it would be most convenient for you that i split the tickets:

    I can create 3 new tickets where the 2 of them refer to each other. (and maybe also refers back to this ticket as parent-ticket, which can then be closed). Ok?

    The current test program currently requires you to run it a few times to get the right "jitter", if you want to catch a specific outcome (3 possibilities).
    After having split up into individual tickets, is that still ok?
    Or do you want me to do new ones that always hit the right jitter corresponding to that ticket? (will be longer code)

    Minimal text in each ticket, i assume?

    (I can of course do additional sneaky test later and put them in the comments. Like the one you wrote about and maybe one with instructions longer than 3 cycles, etc, etc)

     
  • gpz

    gpz - 2024-05-01

    I can create 3 new tickets where the 2 of them refer to each other. (and maybe also refers back to this ticket as parent-ticket, which can then be closed). Ok?

    sure. the whole point is to have less text per issue and hopefully only one "thing" to fix :)

    The current test program currently requires you to run it a few times to get the right "jitter", if you want to catch a specific outcome (3 possibilities).
    After having split up into individual tickets, is that still ok?
    Or do you want me to do new ones that always hit the right jitter corresponding to that ticket? (will be longer code)

    It would be a lot better if the program reproduces the exact problem every single time - that leaves no room for misunderstandings. You could steal "stable raster" stuff from some other test program in the repo (und put that code into a separate file).

     
  • gpz

    gpz - 2024-05-02

    Please hold the line, don't bother splitting the ticket yet - we might have located something that explains all this, see #2025

     
  • Algorithmix

    Algorithmix - 2024-05-02

    Ah, ok, thank you - that sounds great! I will stop for now.

    (Just skimmed #2025 - on the surface of it, it sounds to only be the instruction-duplication bug ("problem #3" above), but we'll see. Otherwise, i can post the new test-code and text i've done :))

     
  • Querino

    Querino - 2024-05-03

    this is the output i get after applying the patch from #2025

    (C:$e5cf) break exec 1035
    BREAK: 1  C:$1035  (Stop on exec)
    (C:$e5cf) 
    #1 (Stop on  exec 1035)  128/$080,  57/$39
    .C:1035  4C 35 10    JMP $1035      - A:0F X:01 Y:16 SP:f2 ..-.....   18366825
    (C:$1035) delete 1
    (C:$1035) break exec 0000 FFFF if RL == $fe
    BREAK: 1  C:$0000-$ffff  (Stop on exec)
    Setting checkpoint 1 condition to: RL == $fe
    (C:$1035) 
    #1 (Stop on  exec 1035)  254/$0fe,   0/$00
    .C:1035  4C 35 10    JMP $1035      - A:0F X:01 Y:16 SP:f2 ..-.....   18374706
    (C:$1035) delete 1
    (C:$1035) m D019 D019
    >C:d019  f1                                                   .
    (C:$d01a) break exec 1035
    BREAK: 1  C:$1035  (Stop on exec)
    (C:$d01a) 
    #1 (Stop on  exec 1035)  254/$0fe,  42/$2a
    .C:1035  4C 35 10    JMP $1035      - A:0F X:01 Y:16 SP:f2 ..-.....   18374748
    (C:$1035) delete 1
    (C:$1035) chis 20
    .C:1035  4C 35 10    JMP $1035      A:0f X:01 Y:16 SP:f2 ..-.....     18374667
    .C:1035  4C 35 10    JMP $1035      A:0f X:01 Y:16 SP:f2 ..-.....     18374670
    .C:1035  4C 35 10    JMP $1035      A:0f X:01 Y:16 SP:f2 ..-.....     18374673
    .C:1035  4C 35 10    JMP $1035      A:0f X:01 Y:16 SP:f2 ..-.....     18374676
    .C:1035  4C 35 10    JMP $1035      A:0f X:01 Y:16 SP:f2 ..-.....     18374679
    .C:1035  4C 35 10    JMP $1035      A:0f X:01 Y:16 SP:f2 ..-.....     18374682
    .C:1035  4C 35 10    JMP $1035      A:0f X:01 Y:16 SP:f2 ..-.....     18374685
    .C:1035  4C 35 10    JMP $1035      A:0f X:01 Y:16 SP:f2 ..-.....     18374688
    .C:1035  4C 35 10    JMP $1035      A:0f X:01 Y:16 SP:f2 ..-.....     18374691
    .C:1035  4C 35 10    JMP $1035      A:0f X:01 Y:16 SP:f2 ..-.....     18374694
    .C:1035  4C 35 10    JMP $1035      A:0f X:01 Y:16 SP:f2 ..-.....     18374697
    .C:1035  4C 35 10    JMP $1035      A:0f X:01 Y:16 SP:f2 ..-.....     18374700
    .C:1035  4C 35 10    JMP $1035      A:0f X:01 Y:16 SP:f2 ..-.....     18374703
    .C:1044  EE 4A 10    INC $104A      A:0f X:01 Y:16 SP:f2 ..-.....     18374706
    .C:1038  49 00       EOR #$00       A:0f X:01 Y:16 SP:ef ..-..I..     18374719
    .C:103a  EE 20 D0    INC $D020      A:0f X:01 Y:16 SP:ef ..-..I..     18374721
    .C:103d  EE 21 D0    INC $D021      A:0f X:01 Y:16 SP:ef N.-..I..     18374727
    .C:1040  EE 19 D0    INC $D019      A:0f X:01 Y:16 SP:ef N.-..I..     18374733
    .C:1043  40          RTI            A:0f X:01 Y:16 SP:ef N.-..I..     18374739
    .C:1047  4C 35 10    JMP $1035      A:0f X:01 Y:16 SP:f2 ..-.....     18374745
    (C:$1035
    

    and it looks pretty much the same every time i try.

     

    Last edit: Querino 2024-05-03
    • Algorithmix

      Algorithmix - 2024-05-05

      Thank you very much for running it on the build from #2525! Really appreciate it! :)

      It would be very interesting to get a CPU-history for all the 3 possibilities, to be sure the interrupt-problems are really fixed by the patch from #2025.

      I unfortunately don't have the skills to build it myself, so can't really check this myself. :'(
      Can you perhaps help again?

      I'll just add an extra "r" command to the monitor-commands, so we can see which cycle/rasterline we have breaked at and thereby identify which case we've hit (yes, yes, i should have added that to the original):

      delete
      ; runUntil("start", "break")
      break exec 1035
      g 1000
      delete 1
      ; waitRasterline(0xFE)
      break exec 0000 FFFF if RL == $fe
      x
      delete 1
      ; printMemory(0xD019, 0xD019)
      m D019 D019
      ; print which cycle we've breaked after
      r
      ; runUntil("otherCode", "break")
      break exec 1035
      g 1044
      delete 1
      ; printCpuHistory()
      cpuhistory 20
      

      The 3 possibilities can now be identified by the "CYC"-output of the "r" command:

      1. (Problem#1): CYC: 001
      2. (ok):                     CYC: 000
      3. (Problem#2): CYC: 009 (This one may have another CYC-number, if the problem is now fixed. If it is fixed, it will probably be CYC = 002 instead of 009. (Looks like the problem here is that the breakpoint breaks too late))

      Could i ask you to run it again with these new montor commands, until you get all 3 cases? (normaly only takes a few tries)

      You don't have to, of course, but it would be cool :)

       

      Last edit: Algorithmix 2024-05-05
  • Querino

    Querino - 2024-05-05

    i ran it 20 times now. just the monitor commands, i have mo other tools.

     
    ❤️
    1
    • Algorithmix

      Algorithmix - 2024-05-05

      Thanks!! That was a lot of times - much more than i expected of you :)
      Only 2 of the cases was hit, for some reason, but that's fine (when i've tried, i usually hit them in very few tries, but i guess that's how randomness works :))

      Unfortunately, it looks like the patch didn't fix the interrupt-problem ("problem#1" at least), since the "INC $104A" is still executed before the interrupt-BRK, when we break at (=after) CYC 001.

      Thanks again!

      EDIT: But i think i'll have to re-read up on my interrupt theory to be sure that's actually not how it's supposed to work. I could have been wrong here. (now that it's not delayed as much as before)

       

      Last edit: Algorithmix 2024-05-05
  • gpz

    gpz - 2024-05-05

    i have comitted a more sane version of that patch in r45151 - please test, not only if the problem is solved, but also if everything related to R and G commands, and entering/exiting the monitor, still works as before :)

     
    ❤️
    1
    • Algorithmix

      Algorithmix - 2024-05-05

      Very cool! That was much faster than i expected! You're a genious! :)
      It looks like the instruction-duplication is solved, and looks like "problem 1" wasn't actually a real problem (see below).

      I will run a lot of my old tests through the remote-monitor soon, to see they still behave the same as before (i use "r" and "g" in those all the time).

      Seems like "Problem 2" is still a thing, so i can do an issue on it, as soon as possible.

       


      I was terribly, terribly wrong about problem#1. I was mistakenly convinced that the VIC was timing raster-interrupts such that the 6510 could detect them at cycle 1, but it looks like it does not, and the 6510 can not detect a raster-interrupt until cycle 2 (and therefore does not start the interrupt-BRK until cycle 3 at the earliest). My bad, sorry for the confusion. While it was true that the interrupt became impossibly delayed, that was only because of problem 3 (and 4).

      I guess the lesson here is that it's a good idea to wait a some time after a debug session until you write a bug-report :)

       
    • Algorithmix

      Algorithmix - 2024-05-05

      Hi @gpz. Did some testing and I think i've found a problem in r45151:

      • Start a clean VICE (x64sc)
      • Go to the monitor
      • Type:
      a c000 inc $d020
      .c003  jmp $c000
      .c006  
      

      and add a breakpoint to the same address we jump to:

      break exec c000
      g c000
      
      • If we type chis 5, then in r45151 we get this:
      .C:e5d4  F0 F7       BEQ $E5CD      A:00 X:00 Y:0a SP:f3 ..-...Z.      5336219
      .C:e5cd  A5 C6       LDA $C6        A:00 X:00 Y:0a SP:f3 ..-...Z.      5336222
      .C:e5cf  85 CC       STA $CC        A:00 X:00 Y:0a SP:f3 ..-...Z.      5336225
      .C:c000  EE 20 D0    INC $D020      A:00 X:00 Y:0a SP:f3 ..-...Z.      5336228
      .C:c003  4C 00 C0    JMP $C000      A:00 X:00 Y:0a SP:f3 N.-.....      5336234
      

      but shouldn't it have breaked immediately without having executed anything?

       

      In the old 3.8:

      The old 3.8 from december (without any r-number in the about dialog), has the instruction-duplication bug, but if we work around it, by exiting and reentering the monitor between break exec c000and g c000, we get this result:

      .C:e5cf  85 CC       STA $CC        A:00 X:00 Y:0a SP:f3 ..-...Z.      5470106
      .C:e5d1  8D 92 02    STA $0292      A:00 X:00 Y:0a SP:f3 ..-...Z.      5470109
      .C:e5d4  F0 F7       BEQ $E5CD      A:00 X:00 Y:0a SP:f3 ..-...Z.      5470155
      .C:e5cd  A5 C6       LDA $C6        A:00 X:00 Y:0a SP:f3 ..-...Z.      5470158
      .C:e5cf  85 CC       STA $CC        A:00 X:00 Y:0a SP:f3 ..-...Z.      5470161
      

      So that one actually does execute nothing. (if we don't work around it we get a single inc $d020)

      UPDATE: If we do the same workaround in r45151 it actually also executes nothing.

       

      Last edit: Algorithmix 2024-05-05
      • Querino

        Querino - 2024-05-05

        hm.

        a c000 
        inc $d020
        jmp $c003
        
        
        break exec $c000
        g $c000
        

        this doesn't break at all here in r45151 ? only when using the "workaround".

         
        • Algorithmix

          Algorithmix - 2024-05-05

          Yes, you're right - also here. (sorry, i made a writing-mistake initially and wrote "jmp $c003", instead of "jmp $c000", if you were trying to reproduce what i wrote :))

           

          Last edit: Algorithmix 2024-05-05
          • Querino

            Querino - 2024-05-05

            ah. haha. now i know why i didn't get what the code actually was supposed to do. :)
            at least i can confirm now your observations.

             
  • gpz

    gpz - 2024-05-06

    please keep testing (hopefully this is the only new problem). Also please test if it behaves differently depending on how the monitor was entered (via a watch/breakpoint, or by UI). eg setting a new breakpoint after another breakpoint triggered and the monitor popped up....

    I have already looked briefly and i can see a potential problem with this - but i'd really like to know more details before attempting to fix it :)

    edit: in particular please check if "breakpoint at current address" is the only misbehaving one

    edit++: we apparently have to be extremely careful with the order of events, and how the test starts... i am playing around with it a bit, and it seems that some slightly different cases work when others do not... really tricky :)

     

    Last edit: gpz 2024-05-06
    • Querino

      Querino - 2024-05-06

      well, i wish i could help more, but i don't really have the knowledge.

      but i see, this one does not break at all in r45151

      a c000 
      nop
      nop
      nop
      nop
      nop
      rts
      
      
      break exec $c000
      g $c000
      

      whereas this one (note the different breakpoint)

      a c000 
      nop
      nop
      nop
      nop
      nop
      rts
      
      
      break exec $c001
      g $c000
      

      indeed breaks properly. as far as i can tell:

      (C:$c001) chis 3
      .C:e5cf  85 CC       STA $CC        A:00 X:00 Y:0a SP:f3 ..-...Z.      2398026
      .C:e5d1  8D 92 02    STA $0292      A:00 X:00 Y:0a SP:f3 ..-...Z.      2398029
      .C:c000  EA          NOP            A:00 X:00 Y:0a SP:f3 ..-...Z.      2398033
      (C:$c001)
      
       

      Last edit: Querino 2024-05-06
    • Querino

      Querino - 2024-05-06

      btw, the example qbove:

      a c000 
      nop
      nop
      nop
      nop
      nop
      rts
      
      delete
      break exec $c000
      
      g $c000
      

      which does not break. but entering the monitor again and throw another

      g $c000
      

      it DOES break. next g $c000 won't break, another g $c000 will break again... and so on.

       

      Last edit: Querino 2024-05-06
      • Algorithmix

        Algorithmix - 2024-05-06

        Interesting that it alternates like that. (some flag/state that is used for whether or not we should break immediately before the first instruction if there is a bp there?)

        Haven't found anything new. Just some minor points from a bit of experimenting:

        Looks like entering the monitor from a breakpoint also triggers it:

        a c000 
        inc $d020
        jmp $c000
        
        break exec c000
        break exec e5cf
        x
        g c000
        chis 5
        
        .C:e5d1  8D 92 02    STA $0292      A:00 X:00 Y:0a SP:f3 ..-...Z.      4124988
        .C:e5d4  F0 F7       BEQ $E5CD      A:00 X:00 Y:0a SP:f3 ..-...Z.      4124992
        .C:e5cd  A5 C6       LDA $C6        A:00 X:00 Y:0a SP:f3 ..-...Z.      4124995
        .C:c000  EE 20 D0    INC $D020      A:00 X:00 Y:0a SP:f3 ..-...Z.      4124998
        .C:c003  4C 00 C0    JMP $C000      A:00 X:00 Y:0a SP:f3 N.-.....      4125004
        

        So both creating a new breakpoint and jumping to it, and entering the monitor through a breakpoint, seems to consistently reproduce it

        Creating some random breakpoint in the "current monitor-session", that we don't jump to, does not seem to cause the problem.

        a c000 
        inc $d020
        jmp $c000
        
        break exec c000
        x
        ; press ALT+H
        break exec 4000
        g c000
        chis 5
        
        .C:e5d1  8D 92 02    STA $0292      A:00 X:00 Y:0a SP:f3 ..-...Z.      2924318
        .C:e5d4  F0 F7       BEQ $E5CD      A:00 X:00 Y:0a SP:f3 ..-...Z.      2924322
        .C:e5cd  A5 C6       LDA $C6        A:00 X:00 Y:0a SP:f3 ..-...Z.      2924325
        .C:e5cf  85 CC       STA $CC        A:00 X:00 Y:0a SP:f3 ..-...Z.      2924328
        .C:e5d1  8D 92 02    STA $0292      A:00 X:00 Y:0a SP:f3 ..-...Z.      2924331
        
        is fine
        

        Deleting a breakpoint in the "current monitor-session" does not seem to cause the problem. (Couldn't make enabling/disabling breakpoints do it either).

        Creating a new breakpoint in the "current monitor-session", at the same address of the already existing one, does not seem to cause the bug: (maybe because the old one kicks in?)

        a c000 
        inc $d020
        jmp $c000
        
        break exec c000
        x
        ; press ALT+H
        break exec c000
        g c000
        chis 5
        
        .C:e5d1  8D 92 02    STA $0292      A:00 X:00 Y:0a SP:f3 ..-...Z.      6084966
        .C:e5d4  F0 F7       BEQ $E5CD      A:00 X:00 Y:0a SP:f3 ..-...Z.      6084970
        .C:e5cd  A5 C6       LDA $C6        A:00 X:00 Y:0a SP:f3 ..-...Z.      6084973
        .C:e5cf  85 CC       STA $CC        A:00 X:00 Y:0a SP:f3 ..-...Z.      6084976
        .C:e5d1  8D 92 02    STA $0292      A:00 X:00 Y:0a SP:f3 ..-...Z.      6084979
        
        is fine
        

        Can also make "g" or "x" bug. For example:

        a c000 
        inc $d020
        inc $d021
        jmp $c003
        
        break exec c000
        break exec c003
        g c000
        chis 5
        
        .C:e5cf  85 CC       STA $CC        A:00 X:00 Y:0a SP:f3 ..-...Z.      3158115
        .C:e5d1  8D 92 02    STA $0292      A:00 X:00 Y:0a SP:f3 ..-...Z.      3158118
        .C:e5d4  F0 F7       BEQ $E5CD      A:00 X:00 Y:0a SP:f3 ..-...Z.      3158122
        .C:e5cd  A5 C6       LDA $C6        A:00 X:00 Y:0a SP:f3 ..-...Z.      3158125
        .C:c000  EE 20 D0    INC $D020      A:00 X:00 Y:0a SP:f3 ..-...Z.      3158128
        
        g   ; or "x"
        chis 5
        
        .C:e5d4  F0 F7       BEQ $E5CD      A:00 X:00 Y:0a SP:f3 ..-...Z.      3158122
        .C:e5cd  A5 C6       LDA $C6        A:00 X:00 Y:0a SP:f3 ..-...Z.      3158125
        .C:c000  EE 20 D0    INC $D020      A:00 X:00 Y:0a SP:f3 ..-...Z.      3158128
        .C:c003  EE 21 D0    INC $D021      A:00 X:00 Y:0a SP:f3 N.-.....      3158134
        .C:c006  4C 03 C0    JMP $C003      A:00 X:00 Y:0a SP:f3 N.-.....      3158140
        

        (we should have stopped at $c003 after "g" (or "x"))

        Using "r pc = xxxx, x" instead of "g xxxx" can also cause it

         

        Last edit: Algorithmix 2024-05-06
  • Querino

    Querino - 2024-06-09

    any news here? :)

    i wonder if the current state still is better than what we have before?

     
  • gpz

    gpz - 2024-06-09

    we really need to create regression tests (in the form of monitor scripts) for this (for all the different issues mentioned here), and then see what still needs fixing. (and fix the new issues while at it). And i certainly need help with this too :)

     

Log in to post a comment.