#118 DEBUG - Scrolling or a history command showing the previous output or register states.

Status: open

Owner: nobody

Labels: DEBUG (27)

Priority: 5

Updated: 2023-12-12

Created: 2023-07-08

Creator: Oliver

Private: No

Could you implement a scrolling or history function in DEBUG from FreeDOS that shows the previous stepping output or the previous register states?

Discussion

Oliver - 2023-07-08

@C.Masloch

Discussion about LDEBUG and these two new feature requests.

Basically they are:
2)
Scrolling support for the output of an LDEBUG session, so that you can scroll some pages back and see the output of previous debugging steps.
But I don't know if that is possible to implement in DOS, which is why I also have suggestion 3:

3)
A history function in LDEBUG that shows the previous register states of previous steps.
Here I could imagine two possibilities, either you only save the register states in the history that have changed from the last n steps and reconstruct the rest for the output starting from the current register status.
Or you save all register states of the last n steps completely.
The former takes up more space in the code segment, the latter in the data segment. And you said that you still have some free space left in the data segment, so this could be used for such a helpful function.
The history is only about showing the register states of the last n steps; it is not meant to be a rollback of the program to a previous execution state.
If it is also possible to store the output of the debugged program itself, this would be even better. But I don't know if that is possible.
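
The first of the two storage options above (saving only the registers that changed and reconstructing older states from the current one) could be sketched roughly as follows. This is only an illustration in Python: the class name `DeltaHistory`, the register list, and the record layout are assumptions made for the example and say nothing about how LDEBUG stores anything internally.

```python
# Hypothetical sketch: names and layout are illustrative, not LDEBUG internals.
REGS = ("AX", "BX", "CX", "DX", "SP", "BP", "SI", "DI",
        "DS", "ES", "SS", "CS", "IP", "FL")


class DeltaHistory:
    """Keeps the current registers plus, per step, only the old values that changed."""

    def __init__(self, initial, max_steps):
        self.current = dict(initial)
        self.max_steps = max_steps
        self.undo = []  # undo[-1] belongs to the most recent step

    def record_step(self, new_state):
        # Remember the previous value of every register this step modified.
        delta = {r: self.current[r] for r in REGS if new_state[r] != self.current[r]}
        self.undo.append(delta)
        if len(self.undo) > self.max_steps:
            self.undo.pop(0)  # the oldest step is lost
        self.current = dict(new_state)

    def state_before(self, steps_back):
        """Reconstruct the registers as they were steps_back steps ago (0 = current)."""
        steps_back = min(steps_back, len(self.undo))  # clamp to what is stored
        state = dict(self.current)
        # Walk backwards from the newest step, restoring the old values.
        for delta in reversed(self.undo[len(self.undo) - steps_back:]):
            state.update(delta)
        return state
```

The second option (full snapshots) would simply append a complete copy of the register set per step; it needs no reconstruction loop but stores every register every time.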

The output of the register state history should be page by page, but also selective.
As a command name I would suggest dh for "dump history". Alternatively, history or sh for "show history" would also be possible.

Example:

- dh
... ; all last n register states are shown counting from current register status which is step 0 back to step n.
- dh 5
... ; Counting from the current status, the previous register status of step 5 backwards is displayed.
AX=0000 BX=0000 ; etc....
DS=1F8B ES=1F8B ; etc....
1F8B:0100 C3 retn
- dh 7-10
... ; Counting from the current status, the previous register status of steps 7 to 10 backwards are displayed.
- dh 7, 10
... ; Counting from the current status, the previous register status of step 7 and 10 is shown.
- E. C. Masloch - 2023-07-09
  
  3)
  A history function in LDEBUG that shows the previous register states of previous steps.
  Here I could imagine two possibilities, either you only save the register states in the history that have changed from the last n steps and reconstruct the rest for the output starting from the current register status.
  Or you save all register states of the last n steps completely.
  The former takes up more space in the code segment, the latter in the data segment. And you said that you have still some free space left in the data segment, so this could be used for such a helpful function.
  The history is only about showing the previous register states of the last n steps, so it is not meant to be a rollback of the program to a previous program execution state.
  
  I implemented the RH mode and corresponding RH command.
  
  If it is also possible to store the output of the debugged program itself, this would be even better. But i don't know if that is possible.
  
  Possible yes, at least for plain text output, but you would have to hook int 21h and/or int 10h. I think debugging with a serial terminal or with video screen swapping are alternatives to this though.
  
  The output of the register state history should be page by page, but also selective.
  As a command name I would suggest dh as dump history. As an alternative history or sh for show history could also be possible
  
  I chose RH for Register (dump) History. You have to enable RH mode using install rh. Then any subsequent R, RE, or T/P/G command output is stored in the auxiliary buffer, which is currently a fixed size of about 8 KiB. (Can be made larger, but for now only with a build time option. Considering a startup time option for a larger size.) Unlike your suggestion this stores the text displayed rather than the register values in binary.
  
  The benefit is that other messages like permanent breakpoint hits and the disassembly, including jumping and memory access notices, are also preserved. And it was fairly easy to implement, re-using a lot of the silent tracing buffering code.
  
  The big disadvantage is the memory use. At 8 KiB we can store about forty steps at a time; steps older than that are lost. Each dump (using default RE buffer content) comes in at about 200 bytes. Storing 8086 registers (excluding the 386 parts) would be about 8 times smaller, but that's without the disassembly.
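
To make the capacity figure concrete, here is a minimal Python sketch of a fixed-size text buffer that drops the oldest dumps when it fills up. The class `TextDumpBuffer` and its eviction strategy are assumptions for illustration only; the thread does not describe how the auxiliary buffer is actually managed, and both the 8 KiB size and the 200-byte dump size are the approximate figures quoted above.

```python
from collections import deque


class TextDumpBuffer:
    """Hypothetical model of a fixed-size buffer holding text dumps, oldest first."""

    def __init__(self, capacity=8 * 1024):
        self.capacity = capacity
        self.used = 0
        self.dumps = deque()  # oldest dump on the left

    def store(self, text):
        size = len(text)
        if size > self.capacity:
            return False  # a single dump larger than the buffer cannot be kept
        # Evict the oldest dumps until the new one fits.
        while self.used + size > self.capacity:
            self.used -= len(self.dumps.popleft())
        self.dumps.append(text)
        self.used += size
        return True
```

With dumps of exactly 200 bytes, such a buffer holds 40 of them (40 x 200 = 8000 bytes; a 41st would exceed 8192), which matches the "about forty steps" estimate.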
  
  Anyway, the RH command has three modes:
  
  Just RH: Display entire saved contents.
  
  RH number: Display one dump from saved content. With 0 the most recent dump is displayed, 1 is the second-most recent dump, etc. If too high then the oldest dump is shown.
  
  RH number number: First number is start, same as for single-parameter form. Second number is how many dumps to display at most.
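
The three forms above can be modelled in a few lines of Python. This is a sketch of the described selection rules only, not of the debugger's code: the function name `rh_select` is made up, the dumps are assumed to be kept oldest first, and the two-number form is assumed to display dumps from the start step toward the most recent one, as the replies further down in this thread explain. Counts below 1 are deliberately not modelled.

```python
def rh_select(dumps, start=None, count=None):
    """Return the dumps an RH-style command would show, in display order.

    dumps is ordered oldest first; step 0 is the most recent dump.
    """
    if not dumps:
        return []
    if start is None:
        return list(dumps)                # plain RH: everything that was saved
    if count is None:
        count = 1                         # RH number: exactly one dump
    if count < 1:
        raise ValueError("counts below 1 are not modelled in this sketch")
    start = min(start, len(dumps) - 1)    # too high: fall back to the oldest dump
    first = len(dumps) - 1 - start        # list index of the start step
    return list(dumps[first:first + count])
```

For a history of six steps, `rh_select(dumps, 1, 3)` yields only steps 1 and 0, because there are just two dumps from step 1 onward.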
  
  The output of RH is paged by default. DCO3 options for the silent dump can be modified to disable paging:
  
  0100 T/TP/P: modify paging for silent dump
  0200 T/TP/P: if 0100 set: turn paging on, else off
  
  So you want r dco3 = dco3 clr 200 or 100 to disable paging for RH.
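
In terms of plain bit arithmetic, the two flags and the command above amount to the following. This is a Python sketch; the constant and function names are invented for the example, and it assumes the expression is evaluated as (dco3 clr 200) or 100.

```python
PAGING_MODIFY = 0x0100  # T/TP/P: modify paging for silent dump
PAGING_ON = 0x0200      # only meaningful if 0x0100 is set: paging on, else off


def disable_paging(dco3):
    """Clear the 'paging on' bit and set the 'modify paging' bit."""
    return (dco3 & ~PAGING_ON) | PAGING_MODIFY


def paging_enabled(dco3, default=True):
    """Effective paging state; the default applies while 0x0100 is clear."""
    if dco3 & PAGING_MODIFY:
        return bool(dco3 & PAGING_ON)
    return default
```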
  
  To use the RH command you should enable RH mode using install rh. (This command will clear the RH buffers when enabling RH mode.) However, this disables commands that require the auxiliary buffer, like RN, RM, re.replace, DIL, and S. Boot loaded mode Y script file reading also requires the auxiliary buffer. The Q command will disable RH mode in order to operate using the auxiliary buffer.
  
  - Oliver - 2023-07-10
    
    I implemented the RH mode and corresponding RH command.
    
    This is great news. I have never seen feature requests implemented as quickly as yours. My highest respect and a big thank you.
    
    Possible yes, at least for plain text output, but you would have to hook int 21h and/or int 10h. I think debugging with a serial terminal or with video screen swapping are alternatives to this though.
    
    OK, it doesn't matter.
    
    The benefit is that other messages like permanent breakpoint hits and the disassembly, including jumping and memory access notices, are also preserved. And it was fairly easy to implement, re-using a lot of the silent tracing buffering code.
    
    Sounds good.
    
    At 8 KiB we can store about forty steps at a time, steps older than that are lost.
    
    40 steps is more than enough. I assumed 20, which should be sufficient in most cases. But 40 is even better.
    
    RH number number: First number is start, same as for single-parameter form. Second number is how many dumps to display at most.
    
    I noticed that if you start with a number smaller than that of an older step, only one step is printed.
    For example RH 0 3. This prints only step 0.
    
    Also there seems to be a bug with the ordering.
    See the screenshots. There I ran:
    
    - rh 3 ; prints step 3 which is mov ah 09
    - rh 1 3 ; prints only step 0 and 1, but not 2 (first int 21) and 3 (mov ah 09)
    - rh 1 2 ; prints step 0 and 1, but not 1 (mov ah, 4C) and 2 (first int 21)
    - rh ; prints all steps, except the first one.
    
    Anyway, the RH command has three modes:
    
    Is it possible to add a fourth mode, where an enumeration of certain steps is given and a separator is used?
    Example:
    
    RH 3, 5, 8, 10 ; prints steps 10, 8, 5, 3
    
    This could be handy if steps 9, 7, 6, and 4 are not important and thus should not be shown.
    You could print them individually of course, but this will always take one line for the input of the next RH number command.
    
    To use the RH command you should enable RH mode using install rh. (This command will clear the RH buffers when enabling RH mode.) However, this disables commands that require the auxiliary buffer, like RN, RM, re.replace, DIL, and S.
    
    I could live with not being able to use RM; it's quite unlikely that I will debug DOS programs that use MMX registers. I am unsure about RN and DIL.
    
    But no S is definitely a loss.
    How much work would it be to write a unified register output function that takes a struct of register values in binary form (in C terms, integers and pointers), together with the other information you mentioned (permanent breakpoint hits, the disassembly including jumping and memory access notices) and a step number, converts them to ASCII text, and outputs them?
    
    Then you could use this function for all the commands t, p, r and rh, and you would save a lot of storage space in the auxiliary buffer, so that S, RN, RM, RH and DIL would be possible again at the same time.
    
    It also seems that the S search has a regression (INSTALL RH is not used here). Its hex and ASCII output doesn't match that of the D dump. Basically, the hex numbers and ASCII text of the search string are not printed but skipped, which makes the output start at the wrong memory address.
    I made a screenshot of this, where it is easily visible.
    
    The Q command will disable RH mode in order to operate using the auxiliary buffer.
    
    Did you mean Q RH or QRH or Q H?
    Using these leads to an error message and only Q exits LDEBUG.
    
    complete_listing_u.png
    
    regression.png
    
    rh_complete_program.png
    
    rh_output_error.png
    
    rh_output_error_step_0.png
    
    - E. C. Masloch - 2023-07-10
      
      40 steps is more than enough. I assumed 20, which should be sufficient in most cases. But 40 is even better.
      
      The number may be a little lower if you have many memory references, or if you use the register change highlighting as you do.
      
      I noticed, that if you start with a smaller number than an older step only one step is printed.
      For example RH 0 3. This prints only step 0.
      
      Yes, this is intended.
      
      Also there seems to be a bug with the ordering.
      See the screenshots. There i did a:
      
      I don't think there is any bug here. The steps are:
      
      5 (oldest) = mov ds, ax
      4 = mov dx, 0
      3 = mov ah, 09
      2 = int 21
      1 = mov ah, 4C
      0 (most recent) = int 21
      
      rh 3
      ; prints step 3 which is mov ah 09
      
      Correct, and as intended.
      
      rh 1 3
      ; prints only step 0 and 1, but not 2 (first int 21) and 3 (mov ah 09)
      
      Also correct. The start step is 1, and then up to 3 steps starting from it are displayed. As there are only 2, steps 1 then 0 are displayed.
      
      rh 1 2
      ; prints step 0 and 1, but not 1 (mov ah, 4C) and 2 (first int 21)
      
      Your description is wrong, this does display step 1 (mov ah, 4C) and also step 0 the second int 21. The display in your screenshot is exactly what I intended.
      
      (Writing of which, on Linux dosemu2 you can copy text from the dosemu2 graphical window when pressing Shift it seems.)
      
      rh
      ; prints all steps, except the first one.
      
      If you want to include a dump from before the first trace, run install rh then a plain r command before t commands.
      
      Is it possible to add a fourth mode, where an enumeration of certain steps is given and a separator is used.
      Example:
      
      RH 3, 5, 8, 10
      ; prints steps 10, 8, 5, 3
      
      This could be handy if steps 9, 7, 6, and 4 are not important and thus should not be shown.
      You could print them individually of course, but this will always take one line for the input of the next RH number command.
      
      As a workaround you could use rc.replace @rh 3; @rh 5; @rh 8; @rh 10 then run rc. (No, wait, you cannot use rc.replace while RH mode is active. Hmm.) However, I can probably squeeze an RH IN number, number, ... syntax into the code segment later.
      
      But no S is definitely a loss.
      
      I prepared a small patch today https://hg.pushbx.org/ecm/ldebug/rev/14a4c72ffbab which allows using S while in RH mode. It uses the (new) WHILE buffer, which means S called from T/TP/P with a WHILE condition with RH mode or silent tracing enabled will fail. However, outside use in the RE buffer the S command will always work now.
      
      How much work would it be to write a unified register output function that takes a struct of register values as binary values (in C language expression integers and pointers), as well as the other information you said, like permanent breakpoint hits, the disassembly, including jumping and memory access notices, and a step number and converts them to ASCII text and outputs them?
      So you could use this function for all commands t,p, r and rh and you could save a lot of storage space in the auxiliary buffer, so that S, RN, RM, RH and DIL would be possible again at the same time.
      
      I am considering a different form of compression but either way will probably eat a lot of code space size.
      
      It also seems that S search does have a regression (INSTALL RH is not used here). Its HEX and ASCII output doesn't match that from D dump. Basically the Hex numbers and ASCII text of the search string is not printed and ignored, which leads to having start the output at the wrong memory address.
      I made a screenshot of this, there it is easily visible.
      
      This is intended. It is expected that you already know the search string, so the S dump displays the 16 bytes after the end of the search pattern match. This is documented in
      https://pushbx.org/ecm/doc/ldebug.htm#cmds And this is not a regression: Even if you consider it a bug, it always worked like this so it didn't regress at any point. (The term "regression" to me means something that used to work no longer working the same way.)
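
The behaviour described here (report the address of the match, then dump the 16 bytes that follow the end of the pattern) can be illustrated with a short Python sketch. The function names and the exact line formatting are assumptions for the example; only the rule "dump starts at match offset plus pattern length" is taken from the explanation above.

```python
def search_with_trailing_dump(memory, pattern, dump_len=16):
    """Return (match_offset, bytes following the pattern) for every match."""
    results = []
    pos = memory.find(pattern)
    while pos != -1:
        end = pos + len(pattern)          # the dump begins right after the match
        results.append((pos, memory[end:end + dump_len]))
        pos = memory.find(pattern, pos + 1)
    return results


def hex_line(data):
    """Format up to 16 bytes as hex with a dash between the two halves."""
    parts = [f"{b:02X}" for b in data]
    left, right = parts[:8], parts[8:]
    return " ".join(left) + ("-" + " ".join(right) if right else "")
```

Fed with the bytes from the example in this thread ("Hallo Welt!" at offset 0110), the match is reported at 0110 while the dumped bytes are those stored from 0115 onward.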
      
      The Q command will disable RH mode in order to operate using the auxiliary buffer.
      
      Did you mean Q RH or QRH or Q H?
      Using these leads to an error message and only Q exits LDEBUG.
      
      No, to disable RH mode without quitting the debugger just use uninstall rh. I meant that a plain Q command asks for the auxiliary buffer, which is in use if RH mode is enabled. Instead of failing the Q command with an error message (which annoyed me), Q will now disable RH mode first before attempting to carry out the Q command's intended task which is to quit the debugger. (If the Q command fails then RH mode stays disabled.)
      
      I also made a small change today so that the QA command will no longer need the auxiliary buffer nor disable RH mode. https://hg.pushbx.org/ecm/ldebug/rev/5e51cb3e6dc7
      
      - Oliver - 2023-07-10
        
        rh 1 3 ; prints only step 0 and 1, but not 2 (first int 21) and 3 (mov ah 09)
        
        Also correct. The start step is 1, and then up to 3 steps starting from it are displayed. As there are only 2, steps 1 then 0 are displayed.
        
        That's confusing. The last, most recent step is 0 in the history, counting backwards from there. Why not just print step 1 to step 3:
        3 = mov ah, 09
        2 = int 21
        1 = mov ah, 4C
        
        and not:
        1 = mov ah, 4C
        0 (most recent) = int 21
        
        From a user perspective the question from the program to the user should be:
        
        "What steps do you want?"
        
        And the user answers:
        
        "Steps 1 to 3 (which means 3 is included so a <= 3, not < 3) "
        
        And then the program should print steps 1 to 3, not 0 and 1.
        The latter is quite confusing and complicates more than required.
        
        rh 1 2 ; prints step 0 and 1, but not 1 (mov ah, 4C) and 2 (first int 21)
        
        Your description is wrong, this does display step 1 (mov ah, 4C) and also step 0 the second int 21. The display in your screenshot is exactly what I intended.
        
        No, with first int 21 I meant the first int 21 in the program execution.
        According to your above list, this would be:
        2 = int 21 where AX = 09E7, after step 3 with (mov ah, 09)
        
        And when entering rh 1 2 I expected:
        2 = int 21 where AX = 09E7
        1 = mov ah, 4C
        
        Not:
        1 = mov ah, 4C
        0 (most recent) = int 21 where AX = 4C24
        
        (Writing of which, on Linux dosemu2 you can copy text from the dosemu2 graphical window when pressing Shift it seems.)
        
        That's good to know. Thank you for the information.
        
        If you want to include a dump from before the first trace, run install rh then a plain r command before t commands.
        
        Ah, I understand.
        Suggestion:
        Would it be possible to run r silently under the hood, without printing any output, right after install rh is entered? That way the history would already contain the first r dump until it gets overwritten after n steps.
        
        However, I can probably squeeze an RH IN number, number, ... syntax into the code segment later.
        
        Sounds okay, it's better than not having that mode available.
        
        I prepared a small patch today https://hg.pushbx.org/ecm/ldebug/rev/14a4c72ffbab which allows to use S while in RH mode. It uses the (new) WHILE buffer, which means S called from T/TP/P with a WHILE condition with RH mode or silent tracing enabled will fail. However, outside use in the RE buffer the S command will always work now.
        
        That sounds better. I will need to test it tomorrow when the new binary is compiled and available.
        
        I am considering a different form of compression but either way will probably eat a lot of code space size.
        
        If you ask me, if feature after feature is added, it's sometimes a good idea to do a refactoring. That would also be a good opportunity to add the error code and error messages thing.
        I would use a uniform output for R, RH and what else goes with it and work internally with the data in binary form, as described, so I would not save it as ASCII. I would only convert to ASCII at the end of the output. So you could work internally with binary data without much overhead.
        I could imagine that such a model would also be better for later expansions, whatever that may be.
        
        A simple line of R that looks like:
        
        AX=2031 BX=0000 CX=0004 DX=0000 SP=0100 BP=0000 SI=0000 DI=000A
        DS=20EE ES=20EE SS=20F0 CS=20E6 IP=001C 0_ D_ I_ S_ Z_ A1 P_ C_
        20E6:001C 743F jz 005D not jumping
        
        Seems to require 64 Bytes for the first and second line and 80 Bytes, if tabs are not used, for the third line. That's 208 Bytes if stored as ASCII/Codepage characters.
        
        If every register is stored in binary form it's only:
        13 registers x 2 Bytes + 2 Bytes for the flags register + 6 Bytes for the instruction (instructions are 1 to 6 Bytes long) = 34 Bytes for these 3 lines on an 8086, and the rest could be reconstructed from this information during the output.
        This would leave additional space for the FPU registers, the 2 new segment registers of the 386, 2 bytes wider flag register and its 4 byte wide extended registers, and the MMX register of the Pentium MMX.
        And with smart, i.e. conditional storage, you could omit the MMX and FPU registers if not needed.
        
        Code size would increase a little bit, but not much, because the output function is required anyway and a more flexible one will not add that much.
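
The size estimate above can be checked with a small Python sketch that packs one step into a fixed binary record. The layout (13 registers, one flags word, 6 opcode bytes, little-endian) follows the count given above; the names `RECORD` and `pack_step` and the flags value used in any example call are hypothetical.

```python
import struct

# Register order is an assumption for this sketch.
REGS = ("AX", "BX", "CX", "DX", "SP", "BP", "SI", "DI",
        "DS", "ES", "SS", "CS", "IP")

# 13 x 16-bit registers, 1 x 16-bit flags, 6 opcode bytes, no padding.
RECORD = struct.Struct("<13HH6s")

TEXT_BYTES = 64 + 64 + 80   # the three ASCII lines as estimated above
BUFFER_BYTES = 8 * 1024     # approximate auxiliary buffer size from this thread


def pack_step(regs, flags, opcode):
    """Pack one step; shorter opcodes are padded with zero bytes by struct."""
    if len(opcode) > 6:
        raise ValueError("this sketch stores at most 6 opcode bytes")
    return RECORD.pack(*(regs[name] for name in REGS), flags, opcode)
```

Under these assumptions a record is 34 bytes, so an 8 KiB buffer would hold 240 records, compared with 39 dumps of 208 bytes each. Zero padding makes the opcode length ambiguous, so a real format would need to store or recompute it.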
        
        This is intended. It is expected that you already know the search string, so the S dump displays the 16 bytes after the end of the search pattern match. This is documented in..
        
        I understand. OK, then it is my fault. I thought the search pattern is also shown, so that the user knows that this search string really is in the shown memory range.
        
        And this is not a regression: Even if you consider it a bug, it always worked like this so it didn't regress at any point. (The term "regression" to me means something that used to work no longer working the same way.)
        
        You are correct, but my assumption was that it was different before. And here I was wrong. But that's the reason why I used the term regression.
        
        But the address value or the output is still wrong when you compare the output of the D dump with the output of the S search.
        The output of the search command gives the impression that the word "Welt" starts at address 20D6:0111.
        
        -S DS:0,FFFF "Hallo"
        20D6:0110 20 57 65 6C 74 21 0D 0A-24 4D 89 81 53 C9 41 B8  Welt!..$M..S.A.
                  0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
        
        But the dump command shows that it actually starts at 20D6:0116, because the W is where the 6h is.
        
        -D DS:110,120
        20D6:0110 48 61 6C 6C 6F 20 57 65-6C 74 21 0D 0A 24 4D 89 Hallo Welt!..$M.
                  0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
        
        So to fix that, either the offset address in the output of the search command must be adjusted or the output from hex value 0 to 5 must be filled with blanks, because W starts at 6h.
        If you want to omit the search pattern in the output, I would consider this a correct output, as one is used to from Dump.
        
        -S DS:0,FFFF "Hallo"
        20D6:0110                20 57 65-6C 74 21 0D 0A 24 4D 89       Welt!..$M.
                  0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
        
        No, to disable RH mode without quitting the debugger just use uninstall rh
        
        I understand.
        
        I also made a small change today so that the QA command will no longer need the auxiliary buffer nor disable RH mode. https://hg.pushbx.org/ecm/ldebug/rev/5e51cb3e6dc7
        
        Sounds good.
        
        Last edit: Oliver 2023-07-10
        
        
        E. C. Masloch - 2023-07-11
        
        That's confusing. The last step, most recent is 0 in the history, counting backwards from there, why not just printing step 1 to step 3:
        
        You can use rh in from 1 length 3 for that. I'm not convinced to change the default behaviour because it was very simple to implement based on the silent trace buffering.
        
        Suggestion:
        Is it possible to do a call of r without printing an output under the hood after entering install rh, that way the history is already fed with the first r until it gets overwritten after n steps.
        
        I would not want that always, so this would need an additional option. Furthermore, it would require more handling to do a silent R command dump. The workaround of needing to run R yourself is not too bad I think.
        
        That sounds better. I will need to test it tomorrow when the new binary is compiled and available.
        
        I also modified rc.replace today in the same way to allow using it in RH mode.
        
        If you ask me, if feature after feature is added, it's sometime a good idea to do a refactoring. That would also be a good opportunity to add the error code and error messages thing.
        
        It's important to me to work in small steps and commit changesets often rather than prepare large far-reaching changes. This is why the error messages will probably be handled by first adding the new functions, then incrementally changing users to make use of it.
        
        I would use a uniform output for R, RH and what else goes with it and work internally with the data in binary form, as described, so i would not save it as ASCII. I would only convert to ASCII at the end of the output. So you could work internally with binary data without much overhead.
        I could imagine that such a model would also be better for later expansions, whatever that may be.
        
        A simple line of R that looks like:
        
        AX=2031 BX=0000 CX=0004 DX=0000 SP=0100 BP=0000 SI=0000 DI=000A
        DS=20EE ES=20EE SS=20F0 CS=20E6 IP=001C 0_ D_ I_ S_ Z_ A1 P_ C_
        20E6:001C 743F jz 005D not jumping
        
        Seems to require 64 Bytes for the first and second line and 80 Bytes, if tabs are not used, for the third line. That's 208 Bytes if stored as ASCII/Codepage characters.
        
        If every register is stored in binary form it's only:
        13 registers x 2 Bytes + 2 Bytes for the Flag registers, 6 Bytes for the instruction (includes 1 Byte to 6 Byte long instruction) = 34 Bytes for these 3 line and a 8086 and the rest could be reconstructed from this information during the output.
        This would leave additional space for the FPU registers, the 2 new segment registers of the 386, 2 bytes wider flag register and its 4 byte wide extended registers, and the MMX register of the Pentium MMX.
        And with smart, i.e. conditional storage, you could omit the MMX and FPU registers if not needed.
        
        Code size would increase a little but, but not much, because the output function is required anyway and a more flexible one will not add that much.
        
        Actually, this will require two different forms of encoding the registers, one into the intermediate binary format and another from that to the text output. This would be a considerably large change. RH mode and the silent buffer would be the only users of this.
        
        You are correct, but my assumption was, that it was different before. And here i was wrong. But that's the reason why i used the term regression.
        
        Yes, it would be accurate then.
        
        But the address value or the output is still wrong when you compare the output of the D dump with the output of the S search .
        The output of the search command gives the impression that the word "Welt" starts at address 20D6:0111.
        
        This is not ideal, but does work as intended. A 16-bit segment search could add a small hint like +NNNN where the number would be the length of the search string. However, this would not fit for 32-bit segments, or I'd have to shorten the dump to fit this in the line.
        
        But the dump command shows, that it is actually starting at 20D6:0116, because the W is where the 6h is.
        
        Hint: If you use r dco2 or= 333 the debugger will draw headers and trailers with the offsets in D/DB/DW/DD commands.
        
        So to fix that, either the offset address in the output of the search command must be adjusted or the output from hex value 0 to 5 must be filled with blanks, because W starts at 6h.
        If you want omit the search patter in the output, i would consider this a correct output, as one is used to from Dump.
        
        -S DS:0,FFFF "Hallo"
        
        20D6:0110                20 57 65-6C 74 21 0D 0A 24 4D 89       Welt!..$M.
                  0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
        
        This would lessen the use of the data dump the longer the search pattern gets, with a completely blank dump eventually for 16-byte or longer patterns. So I won't do this.
        
        (By the way, you can omit the DS: prefix for a search range as ds is already the default.)
        
        
        Oliver - 2023-07-21
        
        Sorry if it took a little longer to answer that. I just needed a break from assembly programming the last few days.
        
        You can use rh in from 1 length 3 for that. I'm not convinced to change the default behaviour because it was very simple to implement based on the silent trace buffering.
        
        rh in from 1 length 3
        
        works, but
        
        rh 1 3
        
        is still confusing, because the last step seems to be counted as step 1, not step 0 as we programmers are used to and as rh with only one number works.
        Thus step 0 would be the next step, which hasn't been taken yet.
        
        I wrote a new assembly program whose code section looks like this.
        Basically it increments AX until AX reaches 5. It saves me some work when writing the output here, because I only have to write the AX register:
        
        START:
        MOV AX, 0
        INC AX
        INC AX
        INC AX
        INC AX
        INC AX
        MOV AH, 4Ch ; return to DOS
        INT 21h
        END START
        
        When I load it in LDEBUG and trace it until AX reaches the value 3, I am at step 3, counting from 0:
        
        -t ; step 0, = rh 3
        AX=0000 ...
        -t ; step 1, = rh 2
        AX=0001 ...
        -t ; step 2, = rh 1
        AX=0002 ...
        -t ; step 3, = rh 0
        AX=0003 ...
        -
        
        Then I try different rh commands.
        Here the last step is step 3 (rh 0) as expected.
        
        -rh 0
        AX=0003...
        -
        
        And as you said, rh 0 1 should give step 3 (the last step) counting from 0 backwards only 1 step.
        So this is correct too:
        
        -rh 0 1
        AX=0003...
        -
        
        But rh 0 2 should now give us two register outputs, rh 0 and rh 1, but it doesn't.
        Only rh 0 is printed:
        
        -rh 0 2
        AX=0003...
        -
        
        It's confusing, because I expected two steps to be printed.
        
        The command rh 1 2 outputs what I expected from rh 0 2:
        
        -rh 1 2
        AX=0002...
        AX=0003...
        -
        
        But according to the definition, that rh 0 is the last step, there should be no AX=0003 in the output. Instead, there should be:
        
        AX=0001...
        AX=0002...
        
        Which corresponds to individual
        
        -rh 2
        AX=0001
        
        and
        
        -rh 1
        AX=0002
        
        And rh 1 3 seems to output the same as the above rh 1 2, only 2 register outputs. Three were expected.
        
        -rh 1 3
        AX=0002...
        AX=0003...
        -
        
        It gets even more confusing with rh 2 3, now I get 3 steps in the output:
        
        -rh 2 3
        AX=0001
        AX=0002
        AX=0003
        -
        
        The world seems to look good again if we do the input in reverse. Oldest step (largest number) first, newest step last:
        
        -rh 1 0
        AX=0002
        AX=0003
        -
        
        So a rh 2 1 should give us
        
        AX=0001
        AX=0002
        
        Does it? No, we get:
        
        -rh 2 1
        AX=0001
        -
        
        Which corresponds to:
        
        -rh 2
        AX=0001
        -
        
        So we just learned rh 2 1 gives us the output of only 1 step, so rh 2 0 should give us 2 steps, right?
        No:
        
        -rh 2 0
        AX=0001...
        AX=0002...
        AX=0003...
        
        This is correct if we expected the output of rh 2, rh 1 and rh 0, but then the above rh 2 1 should work the same way and give us rh 2 and rh 1.
        
        Now I try to print several steps at once, but this time I use a comma so that they need not be contiguous. So this time no range, only individual steps:
        
        -rh 4,3
        AX=0000...
        AX=0001...
        -
        
        Looks good. As expected.
        
        What about:
        
        -rh 4, 2 AX=0000 -
        
        Only rh 4 is printed, no rh 2.
        
        And several unrelated steps:
        
        -rh 4, 2, 0
        ^ Error
        
        So this seems not to work this way.
        
        rh in from 0 to 3 looks better, it gives at least what i have expected:
        
        -rh in from 0 to 3
        AX=0000
        AX=0001
        AX=0002
        AX=0003
        -

        So i wonder, can this be done in reverse order?

        -rh in from 3 to 0
        ^Error

        Sadly no.

        Above i tried rh in from 0 to 3, now i wanted to see what rh in from 0 length 3 will give me:

        -rh in from 0 length 3
        AX=0001
        AX=0002
        AX=0003
        
        Okay, this is correct, if we assume that length n is the number of steps and not a range.
        
        Personally i think i will stick with
        rh in from 0 to 3
        because that's what is not confusing and will be the range feature i will very likely need the most time.
        
        But it would be more desirable if rh 0 3 equaled rh in from 0 to 3.
        
        I also modified rc.replace today in the same way to allow using it in RH mode.
        
        Thank you.
        
        It's important to me to work in small steps and commit changesets often rather than prepare large far-reaching changes. This is why the error messages will probably be handled by first adding the new functions, then incrementally changing users to make use of it.
        
        Okay.
        
        Actually, this will require two different forms of encoding the registers, one into the intermediate binary format and another from that to the text output. This would be a considerably large change. RH mode and the silent buffer would be the only users of this.
        
        Why do you need two encodings for this when the text output could be created on the fly from the information encoded in the intermediate binary format?
        
        You only need a text output function that creates the text output from the information in intermediate binary format. And this output function could be used for R, RH and what else goes with it and you would have a silent buffer too, because the encoded intermediate binary format is only internal data, not text output on the screen.
        The text output is done by the text output function that takes the encoded intermediate binary format data as parameter.
        
        This is not ideal, but does work as intended. A 16-bit segment search could add a small hint like +NNNN where the number would be the length of the search string. However, this would not fit for 32-bit segments, or I'd have to shorten the dump to fit this in the line.
        
        and
        
        This would lessen the use of the data dump the longer the search pattern gets, with a completely blank dump eventually for 16-byte or longer patterns. So I won't do this.
        
        It would lessen the use of the data dump in some cases, but it would be correct and consistent to dump D. And in cases of an empty dump, this could be supplemented with another line.
        But i accept it if you don't want to do it. In this case your suggestion of a +NNNN hint would be the best option, i think. At least for 16-bit real mode programs.
        
        Hint: If you use r dco2 or= 333 the debugger will draw headers and trailers with the offsets in D/DB/DW/DD commands.
        
        Thank you very much for that hint.
        I thought about taking this option into my default configuration file, but since then two additional lines always have to be outputted when this option is enabled, I changed my mind.
        
        Couldn't this function be made part of dump d so it's available on a case-by-case basis when you need it? So that you just have to add a parameter with d, and then d automatically displays this help?
        
        The top parameter for outputting non-ASCII characters already exists in d dump, so how about another short parameter word for this help?
        Something like ht for header and trailer.
        
        -d top, ht 10,20 ; dumps memory with header and trailer help and including non-ASCII characters
        -d ht 10,20 ; dumps memory with header and trailer help
        -

        I tried to find a more meaningful term for the ht parameter, help would be good too, but the argument against help is that it might be necessary for a built-in online help. And then that would be rather confusing if used somewhere else in a different context.
        But you could also simply use the word human, as is known from various Unix command line programs (du -h, free -h etc.) for an output that is easy for humans to understand or read.
        
        -d human 10, 20 ; dumps memory with header and trailer help
        
        (By the way, you can omit the DS: prefix for a search range as ds is already the default.)
        
        I know, thanks anyway. I used DS to be more precise.
        
        Last edit: Oliver 2023-07-21
        
        E. C. Masloch - 2023-07-21
        
        but
        
        rh 1 3
        
        is still confusing, because the last step seems to be counted as step 1, not step 0 like we programmers are used to and how rh with only one number given works.
        Thus step 0 would be the next step, which isn't taken so far.
        
        No, the two-parameter form specifies the first parameter (start of dump) exactly the same as the one-parameter form. rh 1 3 asks to start the dump at step 1 (the second-most recent step) and go on for 3 steps (counting down towards the present moment, from 1 to 0 to ...). It doesn't care that there are only 2 steps to show, that's on you as the user.
        
        When i load it in LDEBUG and trace it until AX reaches value 3, i get at step 3 counting from 0:
        
        -t ; step 0, = rh 3
        AX=0000 ...
        -t ; step 1, = rh 2
        AX=0001 ...
        -t ; step 2, = rh 1
        AX=0002 ...
        -t ; step 3, = rh 0
        AX=0003 ...
        -
        
        Then i try different rh commands.
        Here the last step is step 3 (rh 0) as expected.
        
        -rh 0
        AX=0003...
        -
        
        And as you said, rh 0 1 should give step 3 (the last step) counting from 0 backwards only 1 step.
        So this is correct too:
        
        -rh 0 1
        AX=0003...
        -
        
        But rh 0 2, should give us now two register outputs, rh 0 and rh 1, but it doesn't.
        Only rh 0 is printed:
        
        -rh 0 2
        AX=0003...
        -
        
        Same user error as above. You're specifying to start at step rh 0 and then display 2 steps. But there is only the one if you count towards the present moment.
        
        It's confusing, because i expected that 2 steps are printed.
        
        The command rh 1 2 outputs, what i expected from rh 0 2:
        
        -rh 1 2
        AX=0002...
        AX=0003...
        -
        
        But according to the definition, that rh 0 is the last step, there should be no AX=0003 in the output. Instead, there should be:
        
        AX=0001...
        AX=0002...
        
        Which corresponds to individual
        
        -rh 2
        AX=0001
        
        and
        
        -rh 1
        AX=0002
        
        rh 1 2 starts dumping at step rh 1 and then counts up to two steps towards the present moment. So rh 1 2 is like rh 1 then rh 0.
        
        And rh 1 3 seems to output the same as the above rh 1 2, only 2 register outputs. Expected were 3.
        
        -rh 1 3
        AX=0002...
        AX=0003...
        -
        
        Same user error. This is like rh 1 then rh 0 then a no-op as the counter doesn't ever go negative.
        
        It gets even more confusing with rh 2 3, now i get 3 steps in the output:
        
        -rh 2 3
        AX=0001
        AX=0002
        AX=0003
        -
        
        Start from step rh 2 then do count down to rh 1 and rh 0. Exactly as I intended.
        
        The world seems to look good again if we do the input in reverse. Oldest step (largest number) first, newest step last:
        
        -rh 1 0
        AX=0002
        AX=0003
        -
        
        0 as the second parameter is (now) special. It will display every step, starting at the step specified by the first parameter, down to step rh 0.
        
        So a rh 2 1 should give us
        
        AX=0001
        AX=0002
        
        Does it? No, we get:
        
        -rh 2 1
        AX=0001
        -
        
        Which corresponds to:
        
        -rh 2
        AX=0001
        -
        
        This is correct. rh 2 1 means start with step rh 2 and display exactly one step. This is the same as rh 2.
        
        So we just learned rh 2 1 gives us the output of only 1 step, so rh 2 0 should give us 2 steps, right?
        No:
        
        -rh 2 0
        AX=0001...
        AX=0002...
        AX=0003...
        
        This is correct, if we expected the output of rh 2, rh 1 and rh 0, but then the above rh 2 1 should work the same way and give us rh 2 and rh 1.
        
        As I mentioned 0 as the second parameter is special, it means all subsequent steps are displayed. As I wrote, rh 2 1 really is the same as rh 2.
        
        Now i try to print several steps at once, but this time i use a comma to have them not coherently. So this time no range, only individual steps:
        
        -rh 4,3
        AX=0000...
        AX=0001...
        -
        
        Looks good. As expected.
        
        This means start at step rh 4 (nonexistent) and dump up to 3 steps. It is the same as rh 4 then rh 3 then rh 2. The rh 4 command newly will not output anything because a step that old doesn't exist.
        
        Do note that in the two-parameter RH command form, the comma is completely optional. It has no effect. rh 4,3 is the same as rh 4 3.
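
The counting rules described in this reply can be modelled in a few lines. The following is a minimal Python sketch, not lDebug's actual code; the function name rh_range and the list-based history (index 0 is the most recent step, like rh 0) are assumptions made for illustration.

```python
def rh_range(history, start, count=1):
    """Model of 'rh start count' (hypothetical helper, not lDebug code).

    history[0] is the most recent step (rh 0), history[1] the one before it.
    The dump starts at `start` and walks towards the present moment.
    """
    if count == 0:
        # A count of 0 is special: show everything from `start` down to rh 0.
        count = start + 1
    shown = []
    index = start
    while count > 0 and index >= 0:  # the step number never goes negative
        if index < len(history):     # steps that old do not exist: no output
            shown.append(history[index])
        index -= 1
        count -= 1
    return shown


if __name__ == "__main__":
    history = ["AX=0003", "AX=0002", "AX=0001", "AX=0000"]  # rh 0 .. rh 3
    for args in [(0, 2), (1, 3), (2, 1), (2, 0), (4, 3), (4, 2)]:
        print("rh", *args, "->", rh_range(history, *args))
```

With the four-step history from the AX example, rh_range(history, 4, 3) yields the entries for AX=0000 and AX=0001, which matches the rh 4,3 output quoted above.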
        
        By the way, if you run re.replace @r ax . before the t trace commands then the debugger will output only the AX value for each trace step. That should help making examples. Reset this state using re.replace @r.
        
        What about:
        
        -rh 4, 2
        AX=0000
        -
        
        Only rh 4 is printed, no rh 2.
        
        No, this is step rh 3. The step rh 4 doesn't exist so it produces no output.
        
        And several unrelated steps:
        
        -rh 4, 2, 0
        ^ Error
        
        So this seems not to work this way.
        
        There is no three-parameter form of the command. If you want to pass a list use rh in 4,2,0 (the comma is needed here).
        
        So i wonder, can this be done in reverse order?
        
        rh in from 3 to 0
        ^Error
        
        Sadly no.
        
        This would require changes to the match range parsing.
        
        Above i tried rh in from 0 to 3, now i wanted to see what rh in from 0 length 3 will give me:
        
        -rh in from 0 length 3
        AX=0001
        AX=0002
        AX=0003
        
        Okay, this is correct, if we assume that length n is the number of steps and not a range.
        
        Personally i think i will stick with
        rh in from 0 to 3
        because that's what is not confusing and will be the range feature i will very likely need the most time.
        
        But it would be more desirable if rh 0 3 equaled rh in from 0 to 3.
        
        Actually, this will require two different forms of encoding the registers, one into the intermediate binary format and another from that to the text output. This would be a considerably large change. RH mode and the silent buffer would be the only users of this.
        
        Why do you need two encodings for this when the text output could be created on the fly from the information encoded in the intermediate binary format?
        
        By "encoding" I didn't mean the data layout of the data here. I meant the program logic that "encodes" the data. The two encodings I referred to are the two different handlers needed for your scheme: One that encodes from the debugger variables to the binary/compressed form, and another that decodes the binary form and encodes this data in text form.
        
        The text output is done by the text output function that takes the encoded intermediate binary format data as parameter.
        
        This would be the second encoding in my notation. Creating this binary form would be the first encoding.
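
To make the "two encodings" terminology concrete, here is a rough Python sketch of the two handlers. It is only an illustration: the register subset, the 16-byte record layout, and the function names are assumptions and do not describe lDebug's internals.

```python
import struct

# Hypothetical subset of registers kept per history entry.
REGISTERS = ("AX", "BX", "CX", "DX", "SP", "BP", "SI", "DI")
RECORD = struct.Struct("<8H")  # eight little-endian 16-bit words


def encode_binary(regs):
    """First encoding: debugger variables -> compact binary record."""
    return RECORD.pack(*(regs[name] for name in REGISTERS))


def encode_text(record):
    """Second encoding: decode the binary record and render it as text."""
    values = RECORD.unpack(record)
    return " ".join(f"{name}={value:04X}" for name, value in zip(REGISTERS, values))


if __name__ == "__main__":
    regs = dict.fromkeys(REGISTERS, 0)
    regs["AX"] = 3
    record = encode_binary(regs)
    print(len(record), "bytes:", encode_text(record))
```

Both handlers have to exist in this scheme, which is the extra program logic meant above, even though only one data layout is involved.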
        
        But i accept it if you don't want to do it. In this case your suggestion of a +NNNN hint would be the best option, i think. At least for 16-bit real mode programs.
        
        I should get to that some time soon.
        
        Couldn't this function be made part of dump d so it's available on a case-by-case basis when you need it? So that you just have to add a parameter with d, and then d automatically displays this help?
        
        I don't think I will add that.
        
        Oliver - 2023-07-22
        
        Thank you, now i understand the working of rh i k.
        
        BTW, there is an issue when using g followed by rh. rh only shows steps that were done by t or p, but not by g.
        And when doing some steps followed by L and then some steps again, rh shows the steps before and after the L.
        If I understood that correctly, L loads the initial state of the program and, as I understand it, should also flush rh. But that is not the case, the history of the old steps is kept.
        
        There is no three-parameter form of the command. If you want to pass a list use rh in 4,2,0 (the comma is needed here).
        
        Thank you for the hint.
        
        By "encoding" I didn't mean the data layout of the data here. I meant the program logic that "encodes" the data.
        
        I understand. Thank you for your clarification.
        
        I should get to that some time soon.
        
        Sounds good.
        
        Couldn't this function be made part of dump d so it's available on a case-by-case basis when you need it? So that you just have to add a parameter with d, and then d automatically displays this help?
        
        I don't think I will add that.
        
        That's sad to hear. It would be a useful feature.
        
        E. C. Masloch - 2023-07-22
        
        BTW, there is an issue when using g followed by rh. rh only shows steps that were done by t or p, but not by g.
        
        That would be quite the bug, but I cannot reproduce it. Please list an entire session showing this problem.
        
        And when doing some steps followed by L and then some steps again, rh shows the steps before and after the L.
        If I understood that correctly, L loads the initial state of the program and, as I understand it, should also flush rh. But that is not the case, the history of the old steps is kept.
        
        Yes. You can uninstall rh then install rh to discard the earlier entries explicitly.
        
        E. C. Masloch - 2023-07-22
        
        Session not showing the G/RH bug:
        
        E:\>ldebug lddebugu.com
        &; Welcome to lDebug!
        -install rh
        Register dump history enabled.
        -u
        20A9:0140 8CC8 mov ax, cs
        20A9:0142 31DB xor bx, bx
        20A9:0144 055D1A add ax, 1A5D
        20A9:0147 50 push ax
        20A9:0148 53 push bx
        20A9:0149 CB retf
        20A9:014A 26807F0200 cmp byte [es:bx+02], 00
        20A9:014F 7414 jz 0165
        20A9:0151 26C747030001 mov word [es:bx+03], 0100
        20A9:0157 26807F020E cmp byte [es:bx+02], 0E
        20A9:015C 7406 jz 0164
        20A9:015E 26C747030381 mov word [es:bx+03], 8103
        -g 148
        AX=3B06 BX=0000 CX=C630 DX=0000 SP=07FE BP=0000 SI=0000 DI=0000
        DS=20A9 ES=20A9 SS=4784 CS=20A9 IP=0148 NV UP EI PL NZ AC PE NC
        20A9:0148 53 push bx
        -rh
        AX=3B06 BX=0000 CX=C630 DX=0000 SP=07FE BP=0000 SI=0000 DI=0000
        DS=20A9 ES=20A9 SS=4784 CS=20A9 IP=0148 NV UP EI PL NZ AC PE NC
        20A9:0148 53 push bx
        -re.replace @r ax .
        -g 149
        AX 3B06
        -rh
        AX=3B06 BX=0000 CX=C630 DX=0000 SP=07FE BP=0000 SI=0000 DI=0000
        DS=20A9 ES=20A9 SS=4784 CS=20A9 IP=0148 NV UP EI PL NZ AC PE NC
        20A9:0148 53 push bx
        AX 3B06
        -rh 0
        AX 3B06
        -rh 1
        AX=3B06 BX=0000 CX=C630 DX=0000 SP=07FE BP=0000 SI=0000 DI=0000
        DS=20A9 ES=20A9 SS=4784 CS=20A9 IP=0148 NV UP EI PL NZ AC PE NC
        20A9:0148 53 push bx
        -
        
        Oliver - 2023-07-22
        
        I made a screenshot but it's not a bug.
        I think it's because of the way how rh works. When i wrote my last comment, i have forgotten that rh only logs the output. Thus there can be of course no steps logged made by g, except the output Program terminated normally (xxxx).
        So it's not a bug, it's just a result of how rh works.
        
        In order to change this, internal logging of the steps would be necessary. But we've already discussed that. And with such an internal logging of the steps, for example, the execution of a debug session with g until the next breakpoint appears, would certainly slow down the execution if every step were logged internally. So that wouldn't be good either.

        Yes. You can uninstall rh then install rh to discard the earlier entries explicitly.
        
        Thank you, that workaround will be useful.
        
        Are there cases where it might be useful to keep the old outputs from before the L command? If not, then you could automatically initiate this uninstall rh and install rh with the L command.
        
        g_and_rh.png
        
        Oliver - 2023-07-24
        
        I have a new idea. If you decide one day to integrate internal logging, the disadvantage of a g command slowed down by logging could be circumvented by offering two g commands: one g command that works normally as before, without logging, and another command, gg (two g's), that uses internal logging.
        
        E. C. Masloch - 2023-07-26
        
        I think it's because of the way how rh works. When i wrote my last comment, i have forgotten that rh only logs the output. Thus there can be of course no steps logged made by g, except the output Program terminated normally (xxxx).
        So it's not a bug, it's just a result of how rh works.
        
        Correct. If you happen upon a breakpoint (temporary (gg), permanent (bb) hit or pass, or not managed by the debugger at all; that is "unexpected") then the RE output from the G command is also captured into the RH buffer. But no other output is ever written by G.
        
        In order to change this, internal logging of the steps would be necessary. But we've already discussed that. And with such an internal logging of the steps, for example, the execution of a debug session with g until the next breakpoint appears, would certainly slow down the execution if every step were logged internally. So that wouldn't be good either.
        
        This is actually what the silent buffer was originally created for. You can use tp ffffff silent 1 (the Fs are just to provide a very large repetition count) and the debugger will stay silent, record the last X steps into the silent/RH buffer, and then once the control flow returns to the user, the debugger will show only the very last step from the buffer. (Omit the number to show the full buffer contents. Omit the silent keyword clause entirely to display every step as it occurs.)
        
        Afterwards, you can use the RH commands like usual to inspect the last X steps still saved in the buffer.
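
The behaviour of the silent buffer described here resembles a bounded ring buffer. The Python sketch below is only an analogy under stated assumptions: the capacity is counted in steps rather than bytes, and the function name and parameters are invented for illustration, not taken from lDebug.

```python
from collections import deque


def trace_silently(step_outputs, capacity, show_last=None):
    """Record every step, keep only the newest `capacity` entries.

    show_last mirrors the number after the 'silent' keyword: None stands for
    an omitted number (show the full buffer), N shows only the last N entries.
    """
    buffer = deque(maxlen=capacity)  # older entries fall out automatically
    for text in step_outputs:
        buffer.append(text)          # nothing is displayed while tracing
    kept = list(buffer)
    shown = kept if show_last is None else kept[-show_last:]
    return kept, shown


if __name__ == "__main__":
    steps = [f"AX={n:04X}" for n in range(10)]
    kept, shown = trace_silently(steps, capacity=4, show_last=1)
    print("shown after tracing:", shown)
    print("still available to rh:", kept)
```

The entries left in the buffer after the run play the role of what RH can still display afterwards.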
        
        The execution is indeed much slower if you dump registers and disassemble for every step. Even just the overhead of the debugger's tracing (if you disable the register dump and disassembly) is enough to make this hundreds if not thousands of times slower than using the G command.
        
        Are there cases where it might be useful to keep the old outputs from before the L command? If not, then you could automatically initiate this uninstall rh and install rh with the L command.
        
        For questions like this I like to apply a small heuristic: Which choice can emulate the other one completely? If L did this, I would have to either add an option for L not to do this, or I would be unable to issue an L command without resetting the RH buffer. If L continues not to reset the buffer, you can work-around this by explicitly issuing these commands. So I won't change this.
        
        Oliver - 2023-07-27
        
        I triggered a bug in the new version 2023-07-24.
        Instead of my HELLO.EXE file for testing, i accidentally loaded my HELLO.ASM file.
        Then i entered g.
        This resulted in the following output:
        
        -g
        Invalid opcode
        AX=FFDE BX=0000 CX=9FFF DX=2000 SP=FFFF BP=FFF9 SI=FFFE DI=FFFF
        DS=1FFF ES=1FFF SS=1FFF CS=03AD IP=441B O_ D_ I_ S1 Z_ A_ P_ C_
        03AD:441B 2E8F06BC0D pop word [cs:0DBC] CS:0DBCS=3082
        -

        Then i entered l

        -l
        Register dump history disabled.
        -g 10
        ^ Error
        -q
        -Q
        -^C
        -

        From this state on, i can no longer exit with q. I can print the help with ? and use commands like rh and u, but the latter two seem to produce garbage and errors at this state.
        Only a reboot of my FreeDOS VM helped.
        
        Gladly this bug is reproducible.
        I added my hello world asm file to this bug report so you can test it.
        
        I also tried to see, what happens, when i load my HELLO.EXE executable file.
        And there seems to be some issue with the character output after reloading the program with l:
        
        C:\TEMP>ldebug hello.exe
        &; Startup configuration file "LDEBUG.SLD" loaded
        Register dump history enabled.
        lDebug (2023-07-24)
        -g
        Hallo Welt!
        Program terminated normally (0024)
        -l
        -g 10
        ╠allo Welt!
        Program terminated normally (0024)
        -q
        C:\TEMP>

        Here q quit works, but something changed the H to a ╠ with the l load command. I also added the HELLO.EXE file, so that you can reproduce it.
        
        I will now test your new features, this will take a little bit.
        
        By the way, I have a new suggestion.
        Before loading a file into ldebug, how about checking the file extension, whether it's a .EXE, .COM, .ROM or .BIN file?
        And if it's something else, the user should be asked if he really wants to load that file. This can prevent accidental loading of non-executable files, like I did.

        So sth. like this:

        C:\TEMP>ldebug hello.asm
        File has the extension *.asm, do you really want to load that file (y/n)? n
        C:\TEMP>

        C:\TEMP>ldebug hello.img
        File has the extension *.img, do you really want to load that file (y/n)? y
        -

        And for EXE, COM, ROM and BIN files, everything remains the same as the original behavior.

        C:\TEMP>ldebug hello.exe
        -q
        C:\TEMP>ldebug hello.com
        -q
        C:\TEMP>ldebug hello.rom
        -q
        C:\TEMP>ldebug hello.bin
        -q
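
The suggested check could look roughly like the following Python sketch. It only illustrates the proposal above; lDebug has no such prompt as far as this thread shows, and the function name and the set of accepted extensions are taken from the suggestion, not from any existing code.

```python
import os

# Extensions that would load without a question, per the suggestion above.
KNOWN_EXTENSIONS = {".exe", ".com", ".rom", ".bin"}


def needs_confirmation(filename):
    """Return True if the user should be asked before loading `filename`."""
    _, extension = os.path.splitext(filename)
    return extension.lower() not in KNOWN_EXTENSIONS


if __name__ == "__main__":
    for name in ("HELLO.EXE", "hello.com", "hello.asm", "hello.img"):
        action = "ask (y/n)" if needs_confirmation(name) else "load directly"
        print(f"{name}: {action}")
```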
        
        HELLO.ASM
        
        HELLO.EXE
        
        Oliver - 2023-07-27
        
        BTW, this ╠ character seems to be moved along the string when the G go command points behind the last program address.
        
        -u
        2041:0000 B84220 mov ax, 2042
        ...
        2041:000C B44C mov ah, 4C
        2041:000E CD21 int 21
        ...
        -g 0e
        Hallo Welt!
        AX=4C24...
        -t
        Program terminated normally (0024)
        -g 10
        ╠allo Welt!
        Program terminated normally (0024)
        -l
        -g 11
        H╠llo Welt!
        Program terminated normally (0024)
        -l
        -g 12
        Ha╠lo Welt!
        Program terminated normally (0024)
        -l
        -g 19
        Hallo Wel╠!
        Program terminated normally (0024)
        
        E. C. Masloch - 2023-07-27
        
        This codepoint is what 0CCh, the int3 breakpoint instruction, looks like when you display it to your terminal. By running g 10 you happen to replace the "H" byte of your text by the 0CCh. This is not a bug, it is expected that your data can be corrupted by placing a temporary breakpoint into it.
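
The effect can be reproduced in a small model. The Python sketch below assumes the layout visible in the disassembly earlier in the thread (code at segment 2041h, data segment 2042h loaded by mov ax, 2042) and assumes the message is stored as "Hallo Welt!" followed by CR, LF and the "$" terminator; the function names are made up for illustration and the model ignores everything else the debugger does.

```python
INT3 = 0xCC  # opcode byte of the int3 breakpoint instruction


def linear(segment, offset):
    """Real-mode linear address of segment:offset."""
    return ((segment << 4) + offset) & 0xFFFFF


def dos_print(memory, address):
    """Model of DOS function 09h: take bytes up to the '$' terminator."""
    end = memory.index(b"$", address)
    return memory[address:end].decode("cp437")


def output_with_breakpoint(bp_offset, cs=0x2041, ds=0x2042):
    """Text printed while a temporary breakpoint sits at CS:bp_offset.

    Only meaningful for offsets that hit the message text itself
    (10h to 1Ah here); overwriting the '$' is not modelled.
    """
    base = linear(cs, 0)
    memory = bytearray(0x40)                      # tiny model of the image
    text_at = linear(ds, 0) - base                # DS:0000 equals CS:0010
    message = b"Hallo Welt!\r\n$"
    memory[text_at:text_at + len(message)] = message
    memory[linear(cs, bp_offset) - base] = INT3   # debugger writes the 0CCh
    return dos_print(memory, text_at)


if __name__ == "__main__":
    for offset in (0x10, 0x11, 0x12, 0x19):
        print(f"g {offset:X}:", output_with_breakpoint(offset).rstrip())
```

Because 2041:0010 and 2042:0000 are the same linear address, g 10 lands exactly on the "H", and each higher offset moves the 0CCh one character further into the string.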
        
        Oliver - 2023-07-27
        
        Hm interesting, the next instruction at offset 0010 after the int 21 instruction to exit to dos is a dec ax, so i assume it will never be executed with g 10.
        
        -u
        ...
        2041:000C B44C mov ah, 4C
        2041:000E CD21 int 21
        2041:0010 48 dec ax
        ...
        -
        
        Otherwise shouldn't g 10 then cause AX to be decremented by 1?
        
        If i try to simulate it manually:
        
        -g 0E ; run the last instruction before int 21
        Hallo Welt!
        AX=....
        2041:000E CD21 int 21
        -p
        Program terminated normally (0024)

        Let's see where i am

        -r
        AX=0000 BX=0000 CX=0000...
        ....IP=0100
        1FFF:0100 C3 retn

        It seems to be, that IP has changed pointing to 100h, which is something different than this dec AX. I assume it's probably a return point to some DOS routine or ldebug.

        -u 100
        1FFF:0100 C3 retn
        1FFF:0101 41 inc cx
        1FFF:0102 53 push bx
        ....
        -

        And if i do this last step:

        -t
        AX=0000 BX=0000 CX=0000 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
        DS=1FFF ES=1FFF SS=1FFF CS=1FFF IP=0000 O_ D_ I1 Z1 A_ P1 C_
        1FFF:0000 CD20 int 20
        -
        
        All registers seem to be set to 0, including IP.
        
        My Hello World String is also unchanged and not touched.
        
        -S DS:0,FFFF "Hallo Welt!"
        1FFF:0430 +0B 0D 0A 24 0A ....
        0001 matches
        -
        
        So there seems to also be no write access to DS's memory.
        And also no output to stdout. And if that were the case, wouldn't the character have to be at the end of the string?
        
        I also tried to see, what happens when i do a traced execution:
        
        -TP 10
        ...
        Hallo Welt!
        ...
        2041:000E CD21 int 21
        Program terminated normally (0024)
        -
        
        rh also ends at the Program terminated normally message.
        
        Why does g 10 execute another command at all? Shouldn't it end when int 21h ran?
        
        E. C. Masloch - 2023-07-29
        
        It seems to be, that IP has changed pointing to 100h, which is something different than this dec AX. I assume it's probably a return point to some DOS routine or ldebug.
        
        Calling the DOS terminate process function (21.4C) will indeed return control flow to the parent process, which is the debugger. The debugger will then re-create an empty process.
        
        All registers seem to be set to 0, including IP.
        
        An empty process created by the debugger, as well as a process created from loading a flat-format .COM file, will have a word with the value zero on the stack. (Usually at word [SS:FFFEh].) If the program runs a retn instruction with this stack, it will jump to PSP:0, which holds an int 20h instruction which is another DOS terminate process call. IP and SP happen to be zero then, the other zeroes are from the process initialisation.
        
        So there seems to also be no write access to DS's memory.
        
        The PSP, the retn instruction, and the stack are re-initialised. The other data that may happen to be in the process segment are not overwritten. (But your DS may change from the prior process segment.)
        
        And also no output to stdout. And if that were the case, wouldn't the character have to be at the end of the string?
        
        I don't understand. What output do you expect?
        
        I also tried to see, what happens when i do a traced execution:
        
        TP 10 and G 10 are very different. One specifies to do up to 16 steps, unless interrupted by an unexpected fault. (This includes the process termination.) The other specifies to run code with a temporary breakpoint at address CS:0010h.
        
        Why does g 10 execute another command at all? Shouldn't it end when int 21h ran?
        
        Don't understand this either. Please detail exactly what command you used, what it did, and what you expected instead.
        
        Oliver - 2023-07-29
        
        An empty process created by the debugger, as well as a process created from loading a flat-format .COM file, will have a word with the value zero on the stack. (Usually at word [SS:FFFEh].) If the program runs a retn instruction with this stack, it will jump to PSP:0, which holds an int 20h instruction which is another DOS terminate process call. IP and SP happen to be zero then, the other zeroes are from the process initialisation.
        
        That's good to know. Thank you for your answer.
        
        The PSP, the retn instruction, and the stack are re-initialised. The other data that may happen to be in the process segment are not overwritten. (But your DS may change from the prior process segment.)
        
        Thanks.
        
        I don't understand. What output do you expect?
        
        Well, i was wondering about this '╠' character.
        Depending on the steps g has to take, it is written over the Hello World! in stdout.
        The string "Hello World!" is not changed in memory, so it can't be that. Also, the string is printed out before "int 21" is reached. Therefore, that character can only be printed after it, and in that case it overwrites the string instead of printing it at the cursor position after the string.
        Normally, however, I would assume here that a character is always output at the end of the cursor, unless an explicit cursor position has been selected or the cursor has been reset or changed. So after "Hello World!". But that doesn't seem to be the case here.
        It looks like the cursor does a carriage return and then, depending on the steps in g n , moves the cursor one character at a time to the right and only then prints that character.
        
        TP 10 and G 10 are very different. One specifies to do up to 16 steps, unless interrupted by an unexpected fault. (This includes the process termination.) The other specifies to run code with a temporary breakpoint at address CS:0010h.
        
        Ah, now i understand.
        Now i counted the steps g 10 has to take by listing U.
        I counted 7 steps + 1 outside the program after the termination ended to emulate that g 10. So this equals tp 8.
        
        With tp 8 i get at the same position as g 10, but this character '╠' is printed nowhere like it is done with g 10. Strange.
        
        Don't understand this either. Please detail exactly what command you used, what it did, and what you expected instead.
        
        Well, i assumed that when a program terminates before reaching the breakpoint of g, the g command stops without going on to that breakpoint.
        But that seems not to be the case. G just continues with its mission of printing ╠ characters until the additional steps taken correspond to the breakpoint value.
        
        The int 21 instruction to terminate the program and return to dos/ldebug is at IP = 0E and requires two bytes like expected.
        Thus the next command and where the breakpoint is set would be at IP = 10, but this is obviously never reached because of the retn, but G still continues.
        And between G 10 and G 19 it prints this special character over the Hallo Welt! output.
        
        And depending on the first digit, the cursor is moved accordingly.
        It looks like the following: (Before every g command, i quit ldebug and restarted it with hello.exe loaded.)
        
        -g 0e
        Hallo Welt!...
        ; ends before int 21 like it should
        -p
        ...
        ; normal termination without a '╠' printed
        -g 0f
        Hallo Welt
        Invalid opcode
        AX=0005...
        ...IP=0053...
        2041:0053 6345D7 arpl [di-29],ax DS:D660=0000
        ; interesting where this have us taken. Strange new worlds. :)
        -g 10
        ╠allo Welt!...
        -g 11
        H╠llo Welt!...
        -g 12
        Ha╠lo Welt!...
        -g 13
        Hal╠o Welt!...
        -g 14
        Hall╠ Welt!...
        -g 15
        Hallo╠Welt!...
        -g 16
        Hallo ╠elt!...
        -g 17
        Hallo W╠lt!...
        -g 18
        Hallo We╠t!...
        -g 19
        Hallo Wel╠!...
        -g 1A
        Hallo Welt╠...
        -g 1B
        Hallo Welt!╠...
        -g 1C
        ╠allo Welt!...
        ; overrun, starts at position of H again
        -g 1D
        Hallo Welt!
        ╠
        .... ; prints the character in a new line and after that in the next line a lot of garbage
        -g 1E
        Hallo Welt!...
        ; everything looks fine from here.
        -g 1F
        Hallo Welt!...
        -g 20
        Hallo Welt!...
        
        Interestingly, whatever value n has in G n, as long as n is greater than 0E, g always stops execution before the retn instruction. This means that if i run r, retn is the next instruction.
        But, this ╠ character is still printed.
        
        If i count those ╠ steps, they correspond to the length of the "Hallo Welt!" string plus its CR, LF and "$" characters.
        In the asm code, the string is this:
        
        .DATA
        STRING DB "Hallo Welt!", 13, 10, "$"
        
        Thus 14 bytes. And this ╠ character is printed 14 times between g 10 and g 1D.
        
        -h 1D 10
        002D 000D
        -h D
        000D decimal: 13
        
        There is the relation. The length of the string determines the maximum steps this ╠ character takes in stdout.
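
The arithmetic above can be checked with a short sketch. The `h_command` helper is hypothetical: it only mimics the sum and difference that the H command prints, and the 16-bit wraparound is an assumption based on the four-digit output shown. The difference 0Dh counts the gaps between offsets 10h and 1Dh, so the number of distinct breakpoint positions is one more than that, which matches the 14 bytes of the string.

```python
def h_command(a, b):
    """Hypothetical stand-in for the H command: returns (sum, difference).

    Assumes 16-bit wraparound, matching the four hex digits in the output above.
    """
    return (a + b) & 0xFFFF, (a - b) & 0xFFFF

string_bytes = b"Hallo Welt!\r\n$"      # the DB line from the assembly source

total, diff = h_command(0x1D, 0x10)
positions = diff + 1                      # offsets 10h..1Dh inclusive

print("{:04X} {:04X}".format(total, diff))
print(positions, len(string_bytes))
```

Both numbers on the last line should agree: every byte of the string, including CR, LF and the "$" terminator, is one possible place for the ╠ to appear.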
        
        My point is, g 10 to g 1D should just end after the program termination and not trying to print a ╠ character.
        
        E. C. Masloch - 2023-07-30
        
        Well, i was wondering about this '╠' character.
        Depending on the steps g has to take, it is written over the Hello World! in stdout.
        
        The parameter (number) that you give to G is not "the steps G has to take". The parameter for G is completely different from the parameter for T/TP/P.
        
        https://pushbx.org/ecm/doc/ldebug.htm#cmdg :
        
        The G command allows specifying breakpoints, which are either segmented addresses (86M or PM addresses depending on DebugX's mode) or linear addresses prefixed by an "@ " or "@(", similar to how the BP command allows a breakpoint specification. G breakpoints are identified by their position in the command line, as the 1st, 2nd, 3rd, etc. By default, 16 G breakpoints are supported.
        
        https://pushbx.org/ecm/doc/ldebug.htm#cmdp :
        
        a count may be specified, which causes the command to execute as many P steps as the count indicates.
        
        The fact that you specify a single number to G does not make this a "step count" like the (single) P parameter. It is simply parsed as a single breakpoint specification, writing a temporary breakpoint at the specified segmented address, parsed with a default segment of the current CS.
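
As a rough illustration of the difference, the sketch below parses the same text "10" in both ways. It is not lDebug's parser: the function names are made up for this example, and real breakpoint specifications also accept expressions and "@" linear addresses, which are ignored here. The CS value 2E6Dh is only borrowed from a dump elsewhere in this thread.

```python
def parse_tp_count(arg):
    """TP/T/P style: the number is a hexadecimal step count."""
    return int(arg, 16)

def parse_g_breakpoint(arg, current_cs):
    """G style: the number is an address; the segment defaults to the current CS."""
    if ":" in arg:
        segment, offset = arg.split(":")
        return int(segment, 16), int(offset, 16)
    return current_cs, int(arg, 16)

def linear_86m(segment, offset):
    """Real-mode (86M) linear address of a segmented address."""
    return (segment << 4) + offset

steps = parse_tp_count("10")
breakpoint_at = parse_g_breakpoint("10", current_cs=0x2E6D)

print("TP 10: up to", steps, "steps")
print("G 10: breakpoint at {:04X}:{:04X} (linear {:05X})".format(
    breakpoint_at[0], breakpoint_at[1], linear_86m(*breakpoint_at)))
```

The same two characters therefore mean "16 steps" to TP and "the byte at CS:0010h" to G.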
        
        The string "Hello World!" is not changed in memory, so it can't be that. Also, the string is printed out before "int 21" is reached. Therefore, that character can only be printed after it, and in that case it overwrites the string instead of printing it at the cursor position after the string.
        
        This is wrong. The "G 10" command works like the following sequence. It saves the original contents of the byte [cs:10] then overwrites it with a 0CCh (int3) opcode.
        
        r v0 = byte [cs:10]
        r byte [cs:10] = CC
        r v1 = cs
        r v2 = 10
        g
        r byte [v1:v2] = v0
        
        If you enter the first two commands into the debugger you can observe that the memory is changed during this sequence:
        
        Welcome to dosemu2!
        Build 2.0pre9-dev-20230728-1370-g988effd35
        lDebug (2023-07-30)
        -d cs:10 l 10
        2E6D:0010 48 61 6C 6C 6F 20 57 65-6C 74 21 0D 0A 24 00 00 Hallo Welt!..$..
        -r v0 = byte [cs:10]
        -r byte [cs:10] = CC
        -d cs:10 l 10
        2E6D:0010 CC 61 6C 6C 6F 20 57 65-6C 74 21 0D 0A 24 00 00 .allo Welt!..$..
        -g
        ╠allo Welt!
        Program terminated normally (0024)
        -
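
The save, patch, run, restore sequence can be modelled in a few lines. This is a minimal sketch and not lDebug's code: the helper names are invented, the first 16 bytes of the image are placeholder zeros standing in for the code section, and the int 21h function 09h call is reduced to "return everything up to the next `$`". Byte CCh is decoded as code page 437, where it is the ╠ box-drawing character.

```python
INT3 = 0xCC  # one-byte breakpoint opcode; shown as '╠' in code page 437

def dos_print_string(memory, offset):
    """Mimic int 21h function 09h: the bytes up to, not including, '$'."""
    end = memory.index(b"$", offset)
    return bytes(memory[offset:end]).decode("cp437")

def go_with_breakpoint(memory, bp_offset, string_offset):
    saved = memory[bp_offset]                         # r v0 = byte [cs:10]
    memory[bp_offset] = INT3                          # r byte [cs:10] = CC
    shown = dos_print_string(memory, string_offset)   # g: debuggee prints
    memory[bp_offset] = saved                         # r byte [v1:v2] = v0
    return shown

# Offsets 00h..0Fh: placeholder for the code section (assumed contents).
# Offset 10h onwards: the string, as in the dump above.
memory = bytearray(0x10) + bytearray(b"Hallo Welt!\r\n$")

print(repr(go_with_breakpoint(memory, 0x10, 0x10)))
print(repr(go_with_breakpoint(memory, 0x12, 0x10)))
print(bytes(memory[0x10:0x1B]))   # the original bytes are back after each run
```

The string is only corrupted while the debuggee runs, which is why a dump taken before or after the G command shows the unmodified text.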
        
        Normally, however, I would assume here that a character is always output at the end of the cursor, unless an explicit cursor position has been selected or the cursor has been reset or changed. So after "Hello World!". But that doesn't seem to be the case here.
        It looks like the cursor does a carriage return and then, depending on the steps in g n , moves the cursor one character at a time to the right and only then prints that character.
        
        This would be right if the memory indeed wasn't changed during the int 21h service 09h call, but as it is, all of this is irrelevant.
        
        Ah, now i understand.
        Now i counted the steps g 10 has to take by listing U.
        I counted 7 steps + 1 outside the program after the termination ended to emulate that g 10. So this equals tp 8.
        
        With tp 8 i end up at the same position as with g 10, but this '╠' character is not printed anywhere, unlike with g 10. Strange.
        
        Not strange at all. TP commands trace/proceed past single instructions, so the only breakpoints they write are ones such as those placed after int instructions. These will all be in the code section of your program, except for the breakpoint after the final int 21h (the one running function 4Ch). But that final breakpoint, while it does overwrite the "H" of your message, will only be set while executing the final int call, not while the function 09h call occurs earlier.
        
        Well, i assumed that when a program terminates before reaching the breakpoint of g, the g command stops without going on to that breakpoint.
        But that seems not to be the case. G just continues with its mission of printing ╠ characters until the additional steps taken correspond to the breakpoint value.
        
        No. As explained before, the temporary breakpoint you are writing is in the data part of your program. After DOS returns to the debuggee process's PRA (Parent Return Address), that is to the debugger, the G command will finally restore its breakpoints if possible (if they haven't been overwritten yet). Then the G command detects that no pass or non-pass bb point was executed, but rather that the PRA was entered, so it will display the message about "Program terminated normally" and return to the debugger command line. (The cmd3 command loop of the debugger will re-create an empty process when it detects that the current process has terminated.)
        
        The int 21 instruction to terminate the program and return to dos/ldebug is at IP = 0E and requires two bytes like expected.
        Thus the next command and where the breakpoint is set would be at IP = 10, but this is obviously never reached because of the retn, but G still continues.
        And between G 10 and G 19 it prints this special character over the Hallo Welt! output.
        
        This happens because your example's data section can be addressed from your example's code segment (behind the end of your code section). If something else, such as the bottom of the stack section, was at this address then you wouldn't get the same result.
        
        And depending on the first digit, the cursor is moved accordingly.
        It looks like the following: (Before every g command, i quit ldebug and restarted it with hello.exe loaded.)
        
        If it doesn't crash you can reload the program by using the no-parameter "L" command. (Or in the case of loading eg "HELLO.ASM", you can use the "QA" command then the "L" command. I noticed that the behaviour of the debugger differs for this case as it will not re-initialise the CS:IP registers then. This is a holdover from MSDebug.)
        
        ~~~
        -g 0f
        Hallo Welt
        Invalid opcode
        AX=0005...
        ...IP=0053...
        2041:0053 6345D7 arpl [di-29],ax DS:D660=0000
        ; interesting where this have us taken. Strange new worlds. :)
        ~~~
        
        This is because you overwrote the second byte of the int 21h (CDh 21h) instruction with the CCh byte (int3 single-byte instruction). So you just changed int 21h to int 0CCh which presumably crashes sooner or later.
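
A small sketch of that byte-level effect, assuming the layout from the listings in this thread (the int 21h instruction at offset 0Eh; the bytes before it are placeholder zeros). The `decode_int` helper is hypothetical and only understands the two-byte `int imm8` encoding.

```python
def decode_int(code, offset):
    """Decode a two-byte 'int imm8' instruction (opcode CDh) at the given offset."""
    if code[offset] != 0xCD:
        raise ValueError("not an int imm8 instruction")
    return "int {:02X}".format(code[offset + 1])

# Placeholder bytes for offsets 00h..0Dh, then CD 21 at offset 0Eh.
code = bytearray(0x0E) + bytearray([0xCD, 0x21])

before = decode_int(code, 0x0E)
code[0x0F] = 0xCC                 # the temporary breakpoint written by "g 0f"
after = decode_int(code, 0x0E)

print(before, "->", after)
```

Because the breakpoint byte lands in the middle of an instruction, it is never executed as an int3; it only changes the interrupt number.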
        
        ~~~
        -g 1D
        Hallo Welt!
        ╠
        .... ; prints the character in a new line and after that in the next line a lot of garbage
        ~~~
        
        In this case you overwrote the dollar-sign (U+0024) terminator with the 0CCh byte, so after the CR LF linebreak you get that character and then garbage afterwards, as DOS continues to display things until it happens to encounter a 24h byte.
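
The runaway output can be reproduced with a toy model. Everything here is illustrative: `dos_print_string` is an invented stand-in for int 21h function 09h, and the bytes after the string (`xyz$`) are made up to represent whatever happens to follow in memory.

```python
def dos_print_string(memory, offset):
    """Mimic int 21h function 09h: the bytes up to, not including, '$'."""
    end = memory.index(b"$", offset)
    return bytes(memory[offset:end]).decode("cp437")

message = b"Hallo Welt!\r\n$"     # occupies offsets 10h..1Dh
trailing = b"xyz$"                # invented stand-in for later memory contents
memory = bytearray(0x10) + bytearray(message) + bytearray(trailing)

intact = dos_print_string(memory, 0x10)
memory[0x1D] = 0xCC               # "g 1D" replaces the '$' terminator
runaway = dos_print_string(memory, 0x10)

print(repr(intact))
print(repr(runaway))
```

How much garbage appears on a real system depends entirely on where the next 24h byte happens to be.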
        
        Interestingly, whatever value n has in G n, as long as n is greater than 0E, g always stops execution before the retn instruction. This means that if i run r, retn is the next instruction.
        But, this ╠ character is still printed.
        
        To be exact, G stops execution once the program control flow enters the debugger's PRA it set up for the debuggee process. The command line loop of the debugger then detects that the process terminated and subsequently initialises an empty process with the retn instruction. This may or may not happen to be at the same segment address as your program's process was.
        
        My point is, g 10 to g 1D should just end after the program termination and not trying to print a ╠ character.
        
        As explained, there is no bug here. You happen to be accessing your data section through your program's code segment, and that corrupts your string data. The G command does not run anything after the program has terminated. (Except for the creation of a new empty process, but that's in the debugger's cmd3 command loop.)
        
        Oliver - 2023-08-03
        
        The parameter (number) that you give to G is not "the steps G has to take". The parameter for G is completely different from the parameter for T/TP/P.
        
        Yes, these are addresses, but with steps i meant complete instruction steps.
        I also said:
        
        Now i counted the steps g 10 has to take by listing U.
        I counted 7 steps + 1 outside the program after the termination ended to emulate that g 10. So this equals tp 8.
        
        So i took that into account, that these are actual addresses.
        I probably worded it a bit poorly.
        
        The fact that you specify a single number to G does not make this a "step count" like the (single) P parameter. It is simply parsed as a single breakpoint specification, writing a temporary breakpoint at the specified segmented address, parsed with a default segment of the current CS.
        
        I agree.
        
        This is wrong. The "G 10" command works like the following sequence. It saves the original contents of the byte [cs:10] then overwrites it with a 0CCh (int3) opcode.
        
        r v0 = byte [cs:10]
        r byte [cs:10] = CC
        r v1 = cs
        r v2 = 10
        g
        r byte [v1:v2] = v0
        
        If you enter the first two commands into the debugger you can observe that the memory is changed during this sequence:
        
        Thank you for the clarification.
        Now i understand it.
        
        Suggestion:
        How about an additional column when outputting the assembler listing with the u command?
        There is still free space on the right side, and this new last column could display the opcode bytes from the second column as ASCII characters.
        In cases where the code and data segments are the same, this would immediately show where the data section starts and whether a readable string is present.
        
        Example:
        
        -u C
        2041:000C B44C      mov ah, 4C         .L
        2041:000E CD21      int 21             .!
        2041:0010 48        dec ax             H
        2041:0011 61        popa               a
        2041:0012 6C        insb               l
        2041:0013 6C        insb               l
        2041:0014 6F        outsw              o
        2041:0015 205765    and [bx+65], dl     We
        2041:0018 6C        insb               l
        2041:0019 7421      jz 003C            t!
        ...
        -
        
        In this u listing it is clearly visible that the data area begins at offset 0010 with an 'H' character, and read vertically it spells the string "Hallo Welt!".
        In the code section it would also let you see directly which ASCII character is copied into or from a register.
        Example:
        
        1234:0000 B433 mov ah, 33 .3
        
        It might be only a nice to have feature, but if the code size allows it and doesn't consume too much RAM, it might still be useful.
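
To make the suggestion concrete, here is a minimal sketch of how such a column could be derived from the opcode bytes. It is only a mock-up of the proposed output, not anything lDebug implements: `ascii_column` and `format_line` are invented names, the disassembly text is passed in as a plain string, and non-printable bytes are shown as "." in the same way the D command's text column does.

```python
def ascii_column(opcode_bytes):
    """Render bytes as printable ASCII, using '.' for everything else."""
    return "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in opcode_bytes)

def format_line(segment, offset, opcode_bytes, text):
    """One listing line: address, opcode bytes in hex, disassembly, ASCII column."""
    hexdump = opcode_bytes.hex().upper()
    return "{:04X}:{:04X} {:<10}{:<19}{}".format(
        segment, offset, hexdump, text, ascii_column(opcode_bytes))

# Lines taken from the example listing above.
listing = [
    (0x000C, bytes.fromhex("B44C"), "mov ah, 4C"),
    (0x000E, bytes.fromhex("CD21"), "int 21"),
    (0x0010, bytes.fromhex("48"), "dec ax"),
    (0x0015, bytes.fromhex("205765"), "and [bx+65], dl"),
]

for offset, raw, text in listing:
    print(format_line(0x2041, offset, raw, text))
```

The extra cost per line is one pass over at most a handful of opcode bytes, so the main question would be code size rather than speed.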
        
        Not strange at all. TP commands trace/proceed past single instructions, so the only breakpoints they write are ones such as those placed after int instructions. These will all be in the code section of your program, except for the breakpoint after the final int 21h (the one running function 4Ch). But that final breakpoint, while it does overwrite the "H" of your message, will only be set while executing the final int call, not while the function 09h call occurs earlier.
        
        and
        
        No. As explained before, the temporary breakpoint you are writing is in the data part of your program. After DOS returns to the debuggee process's PRA (Parent Return Address), that is to the debugger, the G command will finally restore its breakpoints if possible (if they haven't been overwritten yet). Then the G command detects that no pass or non-pass bb point was executed, but rather that the PRA was entered, so it will display the message about "Program terminated normally" and return to the debugger command line. (The cmd3 command loop of the debugger will re-create an empty process when it detects that the current process has terminated.)
        
        I understand. Thank you for clarification.
        
        This happens because your example's data section can be addressed from your example's code segment (behind the end of your code section). If something else, such as the bottom of the stack section, was at this address then you wouldn't get the same result.
        
        and
        
        In this case you overwrote the dollar-sign (U+0024) terminator with the 0CCh byte, so after the CR LF linebreak you get that character and then garbage afterwards, as DOS continues to display things until it happens to encounter a 24h byte.
        
        I understand and i agree.
        
        To be exact, G stops execution once the program control flow enters the debugger's PRA it set up for the debuggee process. The command line loop of the debugger then detects that the process terminated and subsequently initialises an empty process with the retn instruction. This may or may not happen to be at the same segment address as your program's process was.
        
        and
        
        As explained, there is no bug here. You happen to be accessing your data section through your program's code segment, and that corrupts your string data. The G command does not run anything after the program has terminated. (Except for the creation of a new empty process, but that's in the debugger's cmd3 command loop.)
        
        Thank you for your clarification.
        
        E. C. Masloch - 2023-07-29
        
        The hello.asm file does crash eventually for me if I use T commands, but QA, L, and Q commands still work at that point. If I do use the G command it crashes my dosemu2 machine, either directly (-E "ldebug c:\bin\lddebug.com hello.asm") or similarly to your case I get an "Invalid opcode" fault and then the debugger doesn't work correctly any longer (-E "ldebug.com hello.asm").
        
        However, this is unlikely to be due to a debugger bug. When the machine crashes, it can corrupt parts of its own process, of the DOS, or of the debugger.
        
        Oliver - 2023-07-29
        
        However, this is unlikely to be due to a debugger bug. When the machine crashes, it can corrupt parts of its own process, of the DOS, or of the debugger.
        
        I agree that if random data is loaded as program code, strange things can happen when that data is executed.
        But that's why i recommended checking the extension of the file being loaded, to inform the user and prevent such accidents by asking again whether they are sure they want to load data as program code.

        The crash of dosemu2 shows that dosemu2 works differently from the QEMU emulator i use. In QEMU, loading this data in ldebug allows it to do random stuff inside the memory of the emulated machine without crashing the VM.
        
        E. C. Masloch - 2023-07-30
        
        But that's why i recommended checking the extension of the file being loaded, to inform the user and prevent such accidents by asking again whether they are sure they want to load data as program code.
        
        I understand where you're coming from, but I think I'll just chalk this up to user error. You were able to figure out the cause of your problem by yourself, after all. I could add an option for this but I would not want to include this code in the default build, as the lCDebugX build is very close to 65_536 bytes of the code segment being filled, already. So if you wanted this option either you would have to build the debugger yourself or I would have to add a special build with options like this one enabled. (Trivially possible, but a bit more work for me. And would have to decide what to call this build.)
        
        The crash of dosemu2 shows that dosemu2 works differently from the QEMU emulator i use. In QEMU, loading this data in ldebug allows it to do random stuff inside the memory of the emulated machine without crashing the VM.
        
        Yes, dosemu2 is not as robust as qemu in this regard I would guess.
        