#118 DEBUG - Scrolling or a history command showing the previous output or register states.

open
nobody
DEBUG (27)
5
2023-12-12
2023-07-08
Oliver
No

Could you implement a scrolling or history function in DEBUG from FreeDOS that shows the previous stepping output or the previous register states?

Discussion

  • Oliver

    Oliver - 2023-07-08

    @C.Masloch

    Discussion about LDEBUG and these two new feature requests.

    Basically they are:
    2)
    Scrolling support for the output of a LDEBUG session. So that you can scroll some pages back and see the output of previous debugging steps.
    But I don't know whether that is possible to implement in DOS, which is why I also have suggestion 3:

    3)
    A history function in LDEBUG that shows the previous register states of previous steps.
    Here I could imagine two possibilities, either you only save the register states in the history that have changed from the last n steps and reconstruct the rest for the output starting from the current register status.
    Or you save all register states of the last n steps completely.
    The former takes up more space in the code segment, the latter in the data segment. And you said that you have still some free space left in the data segment, so this could be used for such a helpful function.
    The history is only about showing the previous register states of the last n steps, so it is not meant to be a rollback of the program to a previous program execution state.
    If it is also possible to store the output of the debugged program itself, this would be even better. But i don't know if that is possible.

    The output of the register state history should be page by page, but also selective.
    As a command name I would suggest dh for "dump history". Alternatively, history or sh for "show history" would also be possible.

    Example:

    - dh
    ... ; all last n register states are shown counting from current register status which is step 0 back to step n.
    - dh 5
    ... ; Counting from the current status, the previous register status of step 5 backwards is displayed.
    AX=0000 BX=0000 ; etc....
    DS=1F8B ES=1F8B ; etc....
    1F8B:0100   C3          retn
    - dh 7-10
    ... ; Counting from the current status, the previous register status of steps 7 to 10 backwards are displayed.
    - dh 7, 10
    ... ; Counting from the current status, the previous register status of step 7 and 10 is shown.
    
     
    • E. C. Masloch

      E. C. Masloch - 2023-07-09

      3)
      A history function in LDEBUG that shows the previous register states of previous steps.
      Here I could imagine two possibilities, either you only save the register states in the history that have changed from the last n steps and reconstruct the rest for the output starting from the current register status.
      Or you save all register states of the last n steps completely.
      The former takes up more space in the code segment, the latter in the data segment. And you said that you have still some free space left in the data segment, so this could be used for such a helpful function.
      The history is only about showing the previous register states of the last n steps, so it is not meant to be a rollback of the program to a previous program execution state.

      I implemented the RH mode and corresponding RH command.

      If it is also possible to store the output of the debugged program itself, this would be even better. But i don't know if that is possible.

      Possible yes, at least for plain text output, but you would have to hook int 21h and/or int 10h. I think debugging with a serial terminal or with video screen swapping are alternatives to this though.

      The output of the register state history should be page by page, but also selective.
      As a command name I would suggest dh as dump history. As an alternative history or sh for show history could also be possible

      I chose RH for Register (dump) History. You have to enable RH mode using install rh. Then any subsequent R, RE, or T/P/G command output is stored in the auxiliary buffer, which is currently a fixed size of about 8 KiB. (Can be made larger, but for now only with a build time option. Considering a startup time option for a larger size.) Unlike your suggestion this stores the text displayed rather than the register values in binary.

      The benefit is that other messages like permanent breakpoint hits and the disassembly, including jumping and memory access notices, are also preserved. And it was fairly easy to implement, re-using a lot of the silent tracing buffering code.

      The big disadvantage is the memory use. At 8 KiB we can store about forty steps at a time, steps older than that are lost. Each dump (using default RE buffer content) comes in at about 200 bytes. Storing 8086 registers (excluding the 386 parts) would be about 8 times smaller, but that's without the disassembly.
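The capacity figure above can be illustrated with a small Python sketch. It is not LDEBUG's code: the `DumpHistory` class and its eviction loop are hypothetical, and the only numbers taken from the post are the 8 KiB buffer size and the roughly 200 bytes per dump.

```python
from collections import deque


class DumpHistory:
    """Hypothetical fixed-capacity store of text dumps; oldest dumps go first."""

    def __init__(self, capacity_bytes=8192):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.dumps = deque()  # leftmost = oldest, rightmost = most recent

    def add(self, text):
        size = len(text.encode("ascii"))
        if size > self.capacity_bytes:
            return  # a dump larger than the whole buffer cannot be kept
        # Evict the oldest dumps until the new one fits.
        while self.used_bytes + size > self.capacity_bytes:
            self.used_bytes -= len(self.dumps.popleft().encode("ascii"))
        self.dumps.append(text)
        self.used_bytes += size


history = DumpHistory()
for step in range(100):
    history.add(f"{step:04d}".ljust(200))  # stand-in for a ~200-byte dump

kept = len(history.dumps)                # how many dumps survive
oldest_kept = int(history.dumps[0][:4])  # step number of the oldest survivor
```

With 200-byte dumps, 40 of them fit into 8192 bytes, which matches the "about forty steps" estimate; longer dumps reduce that number.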

      Anyway, the RH command has three modes:

      • Just RH: Display entire saved contents.
      • RH number: Display one dump from saved content. With 0 the most recent dump is displayed, 1 is the second-most recent dump, etc. If too high then the oldest dump is shown.
      • RH number number: First number is start, same as for single-parameter form. Second number is how many dumps to display at most.
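The three modes can be modelled in a few lines of Python. This is only a sketch of the behaviour as described in this post, not the debugger's implementation: `select_dumps` is a hypothetical name, index 0 stands for the most recent dump, and the oldest-first display order is an assumption based on the outputs quoted in this thread.

```python
def select_dumps(saved, start=None, count=None):
    """Return the dumps an RH command would show, in display order.

    saved: list of dumps where index 0 is the most recent one.
    start: first RH parameter (None for a plain RH).
    count: second RH parameter, the maximum number of dumps to show.
    """
    if not saved:
        return []
    if start is None:
        # Plain RH: the entire saved contents, oldest first.
        return list(reversed(saved))
    # "If too high then the oldest dump is shown."
    start = min(start, len(saved) - 1)
    if count is None:
        count = 1  # single-parameter form shows exactly one dump
    # Walk from the start step towards step 0, showing at most count dumps.
    last = max(start - count + 1, 0)
    return [saved[i] for i in range(start, last - 1, -1)]


# Six traced steps, most recent first.
saved = ["int 21", "mov ah, 4C", "int 21", "mov ah, 09", "mov dx, 0", "mov ds, ax"]
one = select_dumps(saved, 3)            # RH 3
two = select_dumps(saved, 1, 3)         # RH 1 3
too_high = select_dumps(saved, 99)      # RH with a number that is too high
```

Note that in this model the second number counts towards the most recent dump, so `RH 1 3` can show at most steps 1 and 0. A second parameter of 0 is not modelled here.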

      The output of RH is paged by default. DCO3 options for the silent dump can be modified to disable paging:

      0100 T/TP/P: modify paging for silent dump
      0200 T/TP/P: if 0100 set: turn paging on, else off

      So you want r dco3 = dco3 clr 200 or 100 to disable paging for RH.
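For readers less used to the debugger's expression syntax, here is the same bit manipulation as a Python sketch. The two flag values are taken from the option list above; the constant and function names are made up for illustration, and the sketch assumes the expression is read as "clear 200, then set 100".

```python
PAGING_MODIFY = 0x0100  # "modify paging for silent dump"
PAGING_ON = 0x0200      # "if 0100 set: turn paging on, else off"


def disable_rh_paging(dco3):
    """Clear flag 0200 and set flag 0100, i.e. force paging off."""
    return (dco3 & ~PAGING_ON) | PAGING_MODIFY


def paging_mode(dco3):
    """Describe what the two flags request."""
    if not dco3 & PAGING_MODIFY:
        return "default"  # paging behaviour is left unmodified
    return "on" if dco3 & PAGING_ON else "off"


new_value = disable_rh_paging(0x0200)
```

All other bits of DCO3 are left untouched by `disable_rh_paging`.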

      To use the RH command you should enable RH mode using install rh. (This command will clear the RH buffers when enabling RH mode.) However, this disables commands that require the auxiliary buffer, like RN, RM, re.replace, DIL, and S. Boot loaded mode Y script file reading also requires the auxiliary buffer. The Q command will disable RH mode in order to operate using the auxiliary buffer.

       
      • Oliver

        Oliver - 2023-07-10

        I implemented the RH mode and corresponding RH command.

        This is great news. I have never seen feature requests implemented as quickly as these. My highest respect and a big thank you.

        Possible yes, at least for plain text output, but you would have to hook int 21h and/or int 10h. I think debugging with a serial terminal or with video screen swapping are alternatives to this though.

        OK, it doesn't matter.

        The benefit is that other messages like permanent breakpoint hits and the disassembly, including jumping and memory access notices, are also preserved. And it was fairly easy to implement, re-using a lot of the silent tracing buffering code.

        Sounds good.

        At 8 KiB we can store about forty steps at a time, steps older than that are lost.

        40 steps is more than enough. I assumed 20, which should be sufficient in most cases. But 40 is even better.

        RH number number: First number is start, same as for single-parameter form. Second number is how many dumps to display at most.

        I noticed that if you start with a number smaller than that of an older step, only one step is printed.
        For example RH 0 3. This prints only step 0.

        Also there seems to be a bug with the ordering.
        See the screenshots. There i did a:

        - rh 3
        ; prints step 3 which is mov ah 09
        - rh 1 3
        ; prints only step 0 and 1, but not 2 (first int 21) and 3 (mov ah 09)
        - rh 1 2
        ; prints step 0 and 1, but not 1 (mov ah, 4C) and 2 (first int 21)
        - rh
        ; prints all steps, except the first one.
        

        Anyway, the RH command has three modes:

        Is it possible to add a fourth mode, where an enumeration of certain steps is given and a separator is used.
        Example:

        RH 3, 5, 8, 10
        ; prints steps 10, 8, 5, 3
        

        This could be handy if steps 9, 7, 6, and 4 are not important and thus should not be shown.
        You could print them individually of course, but this will always take one line for the input of the next RH number command.

        To use the RH command you should enable RH mode using install rh. (This command will clear the RH buffers when enabling RH mode.) However, this disables commands that require the auxiliary buffer, like RN, RM, re.replace, DIL, and S.

        I could live with not being able to use RM; it's quite unlikely that I will debug DOS programs that use MMX registers. I am unsure about RN and DIL.

        But no S is definitely a loss.
        How much work would it be to write a unified register output function that takes a struct of register values as binary values (in C language expression integers and pointers), as well as the other information you said, like permanent breakpoint hits, the disassembly, including jumping and memory access notices, and a step number and converts them to ASCII text and outputs them?

        So you could use this function for all the commands t, p, r and rh, and you could save a lot of storage space in the auxiliary buffer, so that S, RN, RM, RH and DIL would be possible again at the same time.

        It also seems that the S search command has a regression (INSTALL RH is not used here). Its hex and ASCII output doesn't match that of the D dump command. Basically, the hex bytes and ASCII text of the search string itself are left out of the output, so the dump appears to start at the wrong memory address.
        I made a screenshot of this, there it is easily visible.

        The Q command will disable RH mode in order to operate using the auxiliary buffer.

        Did you mean Q RH or QRH or Q H?
        Using these leads to an error message and only Q exits LDEBUG.

         
        • E. C. Masloch

          E. C. Masloch - 2023-07-10

          40 steps is more than enough. I assumed 20, which should be sufficient in most cases. But 40 is even better.

          The number may be a little lower if you have many memory references, or if you use the register change highlighting as you do.

          I noticed, that if you start with a smaller number than an older step only one step is printed.
          For example RH 0 3. This prints only step 0.

          Yes, this is intended.

          Also there seems to be a bug with the ordering.
          See the screenshots. There i did a:

          I don't think there is any bug here. The steps are:

          5 (oldest) = mov ds, ax
          4 = mov dx, 0
          3 = mov ah, 09
          2 = int 21
          1 = mov ah, 4C
          0 (most recent) = int 21

          • rh 3
            ; prints step 3 which is mov ah 09

          Correct, and as intended.

          • rh 1 3
            ; prints only step 0 and 1, but not 2 (first int 21) and 3 (mov ah 09)

          Also correct. The start step is 1, and then up to 3 steps starting from it are displayed. As there are only 2, steps 1 then 0 are displayed.

          • rh 1 2
            ; prints step 0 and 1, but not 1 (mov ah, 4C) and 2 (first int 21)

          Your description is wrong: this does display step 1 (mov ah, 4C) and also step 0, the second int 21. The display in your screenshot is exactly what I intended.

          (Speaking of which, on Linux it seems you can copy text from the dosemu2 graphical window while pressing Shift.)

          • rh
            ; prints all steps, except the first one.

          If you want to include a dump from before the first trace, run install rh then a plain r command before t commands.

          Is it possible to add a fourth mode, where an enumeration of certain steps is given and a separator is used.
          Example:

          RH 3, 5, 8, 10
          ; prints steps 10, 8, 5, 3

          This could be handy if steps 9, 7, 6, and 4 are not important and thus should not be shown.
          You could print them individually of course, but this will always take one line for the input of the next RH number command.

          As a workaround you could use rc.replace @rh 3; @rh 5; @rh 8; @rh 10 then run rc. (No, wait, you cannot use rc.replace while RH mode is active. Hmm.) However, I can probably squeeze an RH IN number, number, ... syntax into the code segment later.

          But no S is definitely a loss.

          I prepared a small patch today https://hg.pushbx.org/ecm/ldebug/rev/14a4c72ffbab which allows to use S while in RH mode. It uses the (new) WHILE buffer, which means S called from T/TP/P with a WHILE condition with RH mode or silent tracing enabled will fail. However, outside use in the RE buffer the S command will always work now.

          How much work would it be to write a unified register output function that takes a struct of register values as binary values (in C language expression integers and pointers), as well as the other information you said, like permanent breakpoint hits, the disassembly, including jumping and memory access notices, and a step number and converts them to ASCII text and outputs them?
          So you could use this function for all the commands t, p, r and rh, and you could save a lot of storage space in the auxiliary buffer, so that S, RN, RM, RH and DIL would be possible again at the same time.

          I am considering a different form of compression but either way will probably eat a lot of code space size.

          It also seems that S search does have a regression (INSTALL RH is not used here). Its HEX and ASCII output doesn't match that from D dump. Basically the Hex numbers and ASCII text of the search string is not printed and ignored, which leads to having start the output at the wrong memory address.
          I made a screenshot of this, there it is easily visible.

          This is intended. It is expected that you already know the search string, so the S dump displays the 16 bytes after the end of the search pattern match. This is documented at https://pushbx.org/ecm/doc/ldebug.htm#cmds
          And this is not a regression: Even if you consider it a bug, it always worked like this so it didn't regress at any point. (The term "regression" to me means something that used to work no longer working the same way.)

          The Q command will disable RH mode in order to operate using the auxiliary buffer.

          Did you mean Q RH or QRH or Q H?
          Using these leads to an error message and only Q exits LDEBUG.

          No, to disable RH mode without quitting the debugger just use uninstall rh. I meant that a plain Q command asks for the auxiliary buffer, which is in use if RH mode is enabled. Instead of failing the Q command with an error message (which annoyed me), Q will now disable RH mode first before attempting to carry out the Q command's intended task which is to quit the debugger. (If the Q command fails then RH mode stays disabled.)

          I also made a small change today so that the QA command will no longer need the auxiliary buffer nor disable RH mode. https://hg.pushbx.org/ecm/ldebug/rev/5e51cb3e6dc7

           
          • Oliver

            Oliver - 2023-07-10
             rh 1 3
             ; prints only step 0 and 1, but not 2 (first int 21) and 3 (mov ah 09)
            

            Also correct. The start step is 1, and then up to 3 steps starting from it are displayed. As there are only 2, steps 1 then 0 are displayed.

            That's confusing. The last, most recent step is 0 in the history, counting backwards from there, so why not just print step 1 to step 3:
            3 = mov ah, 09
            2 = int 21
            1 = mov ah, 4C

            and not:
            1 = mov ah, 4C
            0 (most recent) = int 21

            From a user perspective the question from the program to the user should be:

            "What steps do you want?"

            And the user answers:

            "Steps 1 to 3 (which means 3 is included so a <= 3, not < 3) "

            And then the program should print steps 1 to 3, not 0 and 1.
            The latter is quite confusing and complicates more than required.

                    rh 1 2
                    ; prints step 0 and 1, but not 1 (mov ah, 4C) and 2 (first int 21)
            

            Your description is wrong, this does display step 1 (mov ah, 4C) and also step 0 the second int 21. The display in your screenshot is exactly what I intended.

            No, with "first int 21" I meant the first int 21 in the program execution.
            According to your above list, this would be:
            2 = int 21 where AX = 09E7, after step 3 with (mov ah, 09)

            And when entering rh 1 2 i expected:
            2 = int 21 where AX = 09E7
            1 = mov ah, 4C

            Not:
            1 = mov ah, 4C
            0 (most recent) = int 21 where AX = 4C24

            (Writing of which, on Linux dosemu2 you can copy text from the dosemu2 graphical window when pressing Shift it seems.)

            That's good to know. Thank you for the information.

            If you want to include a dump from before the first trace, run install rh then a plain r command before t commands.

            Ah, i understand.
            Suggestion:
            Is it possible to do a call of r without printing an output under the hood after entering install rh, that way the history is already fed with the first r until it gets overwritten after n steps.

            However, I can probably squeeze an RH IN number, number, ... syntax into the code segment later.

            Sounds okay, it's better than not having that mode available.

            I prepared a small patch today https://hg.pushbx.org/ecm/ldebug/rev/14a4c72ffbab which allows to use S while in RH mode. It uses the (new) WHILE buffer, which means S called from T/TP/P with a WHILE condition with RH mode or silent tracing enabled will fail. However, outside use in the RE buffer the S command will always work now.

            That sounds better. I will need to test it tomorrow when the new binary is compiled and available.

            I am considering a different form of compression but either way will probably eat a lot of code space size.

            If you ask me, when feature after feature is added, it's sometimes a good idea to do a refactoring. That would also be a good opportunity to add the error code and error messages thing.
            I would use a uniform output for R, RH and what else goes with it and work internally with the data in binary form, as described, so i would not save it as ASCII. I would only convert to ASCII at the end of the output. So you could work internally with binary data without much overhead.
            I could imagine that such a model would also be better for later expansions, whatever that may be.

            A simple R output that looks like:

            AX=2031 BX=0000 CX=0004 DX=0000 SP=0100 BP=0000 SI=0000 DI=000A
            DS=20EE ES=20EE SS=20F0 CS=20E6 IP=001C 0_ D_ I_ S_ Z_ A1 P_ C_
            20E6:001C 743F           jz       005D                     not jumping
            

            Seems to require 64 Bytes for the first and second line and 80 Bytes, if tabs are not used, for the third line. That's 208 Bytes if stored as ASCII/Codepage characters.

            If every register is stored in binary form it's only:
            13 registers x 2 Bytes + 2 Bytes for the flags register + 6 Bytes for the instruction (room for an instruction 1 to 6 Bytes long) = 34 Bytes for these 3 lines on an 8086, and the rest could be reconstructed from this information during the output.
            This would leave additional space for the FPU registers, the 2 new segment registers of the 386, 2 bytes wider flag register and its 4 byte wide extended registers, and the MMX register of the Pentium MMX.
            And with smart, i.e. conditional storage, you could omit the MMX and FPU registers if not needed.

            Code size would increase a little bit, but not much, because the output function is required anyway and a more flexible one will not add that much.
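To make the size estimate concrete, here is a Python sketch of such a binary record. It only models the layout counted above (13 word registers, one flags word, 6 instruction bytes); the field order, the names `RECORD`, `pack_step` and `unpack_step`, and the flags value in the example are assumptions for illustration and say nothing about how LDEBUG stores anything.

```python
import struct

# Register order as printed by the R command in the example above.
REGISTERS = ("AX", "BX", "CX", "DX", "SP", "BP", "SI", "DI",
             "DS", "ES", "SS", "CS", "IP")

# Little-endian: 13 register words, 1 flags word, 6 raw instruction bytes.
RECORD = struct.Struct("<13HH6s")


def pack_step(regs, flags, code):
    """Pack one step into a fixed-size binary record."""
    values = [regs[name] for name in REGISTERS]
    # Instruction bytes are cut or zero-padded to exactly 6 bytes.
    return RECORD.pack(*values, flags, code[:6].ljust(6, b"\x00"))


def unpack_step(blob):
    """Inverse of pack_step: returns (registers, flags, instruction bytes)."""
    fields = RECORD.unpack(blob)
    return dict(zip(REGISTERS, fields[:13])), fields[13], fields[14]


regs = dict(AX=0x2031, BX=0, CX=4, DX=0, SP=0x0100, BP=0, SI=0, DI=0x000A,
            DS=0x20EE, ES=0x20EE, SS=0x20F0, CS=0x20E6, IP=0x001C)
blob = pack_step(regs, 0x0010, bytes([0x74, 0x3F]))  # flags value is illustrative

records_per_8k = 8192 // RECORD.size  # binary records per 8 KiB
text_dumps_per_8k = 8192 // 208       # 208-byte text dumps per 8 KiB
```

Under these assumptions one record is 34 bytes, so an 8 KiB buffer would hold 240 records instead of 39 text dumps of 208 bytes. The disassembly text and notices such as "not jumping" would have to be regenerated at display time.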

            This is intended. It is expected that you already know the search string, so the S dump displays the 16 bytes after the end of the search pattern match. This is documented in..

            I understand. Ok then it is my fault. I thought the search pattern is also shown, so that the user knows, that this search string is really in the shown memory range.

            And this is not a regression: Even if you consider it a bug, it always worked like this so it didn't regress at any point. (The term "regression" to me means something that used to work no longer working the same way.)

            You are correct, but my assumption was, that it was different before. And here i was wrong. But that's the reason why i used the term regression.

            But the address value or the output is still wrong when you compare the output of the D dump with the output of the S search.
            The output of the search command gives the impression that the word "Welt" starts at address 20D6:0111.

            -S DS:0,FFFF "Hallo"
            20D6:0110  20 57 65 6C 74 21 0D 0A-24 4D 89 81 53 C9 41 B8  Welt!..$M..S.A.
                       0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F     
            

            But the dump command shows, that it is actually starting at 20D6:0116, because the W is where the 6h is.

            -D DS:110,120
            20D6:0110  48 61 6C 6C 6F 20 57 65-6C 74 21 0D 0A 24 4D 89  Hallo Welt!..$M. 
                       0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F 
            

            So to fix that, either the offset address in the output of the search command must be adjusted or the output from hex value 0 to 5 must be filled with blanks, because W starts at 6h.
            If you want to omit the search pattern in the output, I would consider this the correct output, as one is used to from Dump.

            -S DS:0,FFFF "Hallo"
            20D6:0110                 20 57 65-6C 74 21 0D 0A 24 4D 89        Welt!..$M. 
                       0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F     
            

            No, to disable RH mode without quitting the debugger just use uninstall rh

            I understand.

            I also made a small change today so that the QA command will no longer need the auxiliary buffer nor disable RH mode. https://hg.pushbx.org/ecm/ldebug/rev/5e51cb3e6dc7

            Sounds good.

             

            Last edit: Oliver 2023-07-10
            • E. C. Masloch

              E. C. Masloch - 2023-07-11

              That's confusing. The last step, most recent is 0 in the history, counting backwards from there, why not just printing step 1 to step 3:

              You can use rh in from 1 length 3 for that. I'm not convinced to change the default behaviour because it was very simple to implement based on the silent trace buffering.

              Suggestion:
              Is it possible to do a call of r without printing an output under the hood after entering install rh, that way the history is already fed with the first r until it gets overwritten after n steps.

              I would not want that always, so this would need an additional option. Furthermore, it would require more handling to do a silent R command dump. The workaround of needing to run R yourself is not too bad I think.

              That sounds better. I will need to test it tomorrow when the new binary is compiled and available.

              I also modified rc.replace today in the same way to allow using it in RH mode.

              If you ask me, if feature after feature is added, it's sometime a good idea to do a refactoring. That would also be a good opportunity to add the error code and error messages thing.

              It's important to me to work in small steps and commit changesets often rather than prepare large far-reaching changes. This is why the error messages will probably be handled by first adding the new functions, then incrementally changing users to make use of it.

              I would use a uniform output for R, RH and what else goes with it and work internally with the data in binary form, as described, so i would not save it as ASCII. I would only convert to ASCII at the end of the output. So you could work internally with binary data without much overhead.
              I could imagine that such a model would also be better for later expansions, whatever that may be.

              A simple R output that looks like:

              AX=2031 BX=0000 CX=0004 DX=0000 SP=0100 BP=0000 SI=0000 DI=000A
              DS=20EE ES=20EE SS=20F0 CS=20E6 IP=001C 0_ D_ I_ S_ Z_ A1 P_ C_
              20E6:001C 743F           jz       005D                     not jumping

              Seems to require 64 Bytes for the first and second line and 80 Bytes, if tabs are not used, for the third line. That's 208 Bytes if stored as ASCII/Codepage characters.

              If every register is stored in binary form it's only:
              13 registers x 2 Bytes + 2 Bytes for the Flag registers, 6 Bytes for the instruction (includes 1 Byte to 6 Byte long instruction) = 34 Bytes for these 3 line and a 8086 and the rest could be reconstructed from this information during the output.
              This would leave additional space for the FPU registers, the 2 new segment registers of the 386, 2 bytes wider flag register and its 4 byte wide extended registers, and the MMX register of the Pentium MMX.
              And with smart, i.e. conditional storage, you could omit the MMX and FPU registers if not needed.

              Code size would increase a little bit, but not much, because the output function is required anyway and a more flexible one will not add that much.

              Actually, this will require two different forms of encoding the registers, one into the intermediate binary format and another from that to the text output. This would be a considerably large change. RH mode and the silent buffer would be the only users of this.

              You are correct, but my assumption was, that it was different before. And here i was wrong. But that's the reason why i used the term regression.

              Yes, it would be accurate then.

              But the address value or the output is still wrong when you compare the output of the D dump with the output of the S search .
              The output of the search command gives the impression that the word "Welt" starts at address 20D6:0111.

              This is not ideal, but does work as intended. A 16-bit segment search could add a small hint like +NNNN where the number would be the length of the search string. However, this would not fit for 32-bit segments, or I'd have to shorten the dump to fit this in the line.
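The relationship between the address shown and the bytes dumped can be sketched in Python as follows. This is an illustration of the behaviour described in this thread, not the debugger's code: `search_dump` is a hypothetical helper, and the memory contents are reconstructed from the D dump quoted earlier.

```python
def search_dump(memory, pattern, dump_len=16):
    """Yield (match_offset, data_offset, data) for each match of pattern.

    match_offset is the address printed for the match, data_offset is where
    the dumped bytes really start (directly after the pattern).
    """
    pos = memory.find(pattern)
    while pos != -1:
        data_offset = pos + len(pattern)
        yield pos, data_offset, memory[data_offset:data_offset + dump_len]
        pos = memory.find(pattern, pos + 1)


# Rebuild the relevant part of the data segment: "Hallo Welt!" at offset 0110h.
memory = bytearray(0x200)
text = b"Hallo Welt!\r\n$"
memory[0x110:0x110 + len(text)] = text

results = list(search_dump(bytes(memory), b"Hallo"))
match_offset, data_offset, data = results[0]
```

Here the match is reported at 0110h while the dumped bytes start at 0115h (the blank before "Welt"), so a hint such as the +NNNN suggested above would read +0005 for this pattern.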

              But the dump command shows, that it is actually starting at 20D6:0116, because the W is where the 6h is.

              Hint: If you use r dco2 or= 333 the debugger will draw headers and trailers with the offsets in D/DB/DW/DD commands.

              So to fix that, either the offset address in the output of the search command must be adjusted or the output from hex value 0 to 5 must be filled with blanks, because W starts at 6h.
              If you want to omit the search pattern in the output, I would consider this the correct output, as one is used to from Dump.

              -S DS:0,FFFF "Hallo"

              20D6:0110                 20 57 65-6C 74 21 0D 0A 24 4D 89        Welt!..$M. 
                         0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F     
              

              This would lessen the use of the data dump the longer the search pattern gets, with a completely blank dump eventually for 16-byte or longer patterns. So I won't do this.

              (By the way, you can omit the DS: prefix for a search range as ds is already the default.)

               
              • Oliver

                Oliver - 2023-07-21

                Sorry if it took a little longer to answer that. I just needed a break from assembly programming the last few days.

                You can use rh in from 1 length 3 for that. I'm not convinced to change the default behaviour because it was very simple to implement based on the silent trace buffering.

                rh in from 1 length 3
                

                works, but

                rh 1 3
                

                is still confusing, because the last step seems to be counted as step 1, not step 0 like we programmers are used to and how rh with only one number given works.
                Thus step 0 would be the next step, which isn't taken so far.

                I wrote a new assembly program whose code section looks like this.
                Basically it increments AX until AX reaches 5. It saves me some work when writing the output here, because I only have to write the AX register:

                START: MOV AX, 0
                       INC AX
                       INC AX
                       INC AX
                       INC AX
                       INC AX
                
                       MOV AH, 4Ch ; return to DOS
                       INT 21h
                END START
                

                When I load it in LDEBUG and trace it until AX reaches the value 3, I get the following, ending at step 3 counting from 0:

                -t   ; step 0, = rh 3
                AX=0000 ...
                -t   ; step 1, = rh 2
                AX=0001 ...
                -t   ; step 2, = rh 1 
                AX=0002 ...
                -t   ; step 3, = rh 0
                AX=0003 ...
                -
                

                Then i try different rh commands.
                Here the last step is step 3 (rh 0) as expected.

                -rh 0
                AX=0003...
                -
                

                And as you said, rh 0 1 should give step 3 (the last step) counting from 0 backwards only 1 step.
                So this is correct too:

                -rh 0 1
                AX=0003...
                -
                

                But rh 0 2, should give us now two register outputs, rh 0 and rh 1, but it doesn't.
                Only rh 0 is printed:

                -rh 0 2
                AX=0003...
                -
                

                It's confusing, because i expected that 2 steps are printed.

                The command rh 1 2 outputs, what i expected from rh 0 2:

                -rh 1 2
                AX=0002...
                AX=0003...
                -
                

                But according to the definition, that rh 0 is the last step, there should be no AX=0003 in the output. Instead, there should be:

                AX=0001...
                AX=0002...
                

                Which corresponds to individual

                -rh 2
                AX=0001
                

                and

                -rh 1
                AX=0002
                

                And rh 1 3 seems to output the same as the above rh 1 2, only 2 register outputs. Three were expected.

                -rh 1 3
                AX=0002...
                AX=0003...
                -
                

                It gets even more confusing with rh 2 3; now I get 3 steps in the output:

                -rh 2 3
                AX=0001
                AX=0002
                AX=0003
                -
                

                The world seems to look good again if we do the input in reverse. Oldest step (largest number) first, newest step last:

                -rh 1 0 
                AX=0002
                AX=0003
                -
                

                So a rh 2 1 should give us

                AX=0001
                AX=0002
                

                Does it? No, we get:

                -rh 2 1
                AX=0001
                -
                

                Which corresponds to:

                -rh 2
                AX=0001
                -
                

                So we just learned rh 2 1 gives us the output of only 1 step, so rh 2 0 should give us 2 steps, right?
                No:

                -rh 2 0
                AX=0001...
                AX=0002...
                AX=0003...
                

                This is correct, if we expected the output of rh 2, rh 1 and rh 0, but then the above rh 2 1 should work the same way and give us rh 2 and rh 1.


                Now i try to print several steps at once, but this time i use a comma so that they are not contiguous. So this time no range, only individual steps:

                -rh 4,3
                AX=0000...
                AX=0001...
                -
                

                Looks good. As expected.

                What about:

                -rh 4, 2
                AX=0000
                -
                

                Only rh 4 is printed, no rh 2.

                And several unrelated steps:

                -rh 4, 2, 0
                        ^ Error
                

                So this seems not to work this way.


                rh in from 0 to 3 looks better, it gives at least what i have expected:

                -rh in from 0 to 3
                AX=0000
                AX=0001
                AX=0002
                AX=0003
                -
                

                So i wonder, can this be done in reverse order?

                - rh in from 3 to 0
                                   ^Error
                

                Sadly no.


                Above i tried rh in from 0 to 3, now i wanted to see what rh in from 0 length 3 will give me:

                -rh in from 0 length 3
                AX=0001
                AX=0002
                AX=0003
                

                Okay, this is correct, if we assume that length n is the number of steps and not a range.

                Personally i think i will stick with
                rh in from 0 to 3
                because that's what is not confusing and will be the range feature i will very likely need most of the time.

                But it would be more desirable if rh 0 3 equaled rh in from 0 to 3.


                I also modified rc.replace today in the same way to allow using it in RH mode.

                Thank you.


                It's important to me to work in small steps and commit changesets often rather than prepare large far-reaching changes. This is why the error messages will probably be handled by first adding the new functions, then incrementally changing users to make use of it.

                Okay.

                Actually, this will require two different forms of encoding the registers, one into the intermediate binary format and another from that to the text output. This would be a considerably large change. RH mode and the silent buffer would be the only users of this.

                Why do you need two encodings for this when the text output could be created on the fly from the information encoded in the intermediate binary format?

                You only need a text output function that creates the text output from the information in intermediate binary format. And this output function could be used for R, RH and what else goes with it and you would have a silent buffer too, because the encoded intermediate binary format is only internal data, not text output on the screen.
                The text output is done by the text output function that takes the encoded intermediate binary format data as parameter.


                This is not ideal, but does work as intended. A 16-bit segment search could add a small hint like +NNNN where the number would be the length of the search string. However, this would not fit for 32-bit segments, or I'd have to shorten the dump to fit this in the line.

                and

                This would lessen the use of the data dump the longer the search pattern gets, with a completely blank dump eventually for 16-byte or longer patterns. So I won't do this.

                It would lessen the use of the data dump in some cases, but it would be correct and consistent to dump D. And in cases of an empty dump, this could be supplemented with another line.
                But i accept it if you don't want to do it. In this case your suggestion of a +NNNN hint would be the best option, i think. At least for 16-bit real mode programs.


                Hint: If you use r dco2 or= 333 the debugger will draw headers and trailers with the offsets in D/DB/DW/DD commands.

                Thank you very much for that hint.
                I thought about taking this option into my default configuration file, but since then two additional lines always have to be outputted when this option is enabled, I changed my mind.

                Couldn't this function be made part of dump d so it's available on a case-by-case basis when you need it? So that you just have to add a parameter with d, and then d automatically displays this help?

                The top parameter for outputting non-ASCII characters already exists in d dump, so how about another short parameter word for this help?
                Something like ht for header and trailer.

                -d top, ht 10,20
                ; dumps memory with header and trailer help and including non-ASCII characters
                -d ht 10,20
                ; dumps memory with header and trailer help
                -
                

                I tried to find a more meaningful term for the ht parameter, help would be good too, but the argument against help is that it might be necessary for a built-in online help. And then that would be rather confusing if used somewhere else in a different context.
                But you could also simply use the word human, as is known from various Unix command line programs (du -h, free -h etc.) for an output that is easy for humans to understand or read.

                -d human 10, 20
                ; dumps memory with header and trailer help
                

                (By the way, you can omit the DS: prefix for a search range as ds is already the default.)

                I know, thanks anyway. I used DS to be more precise.

                 

                Last edit: Oliver 2023-07-21
                • E. C. Masloch

                  E. C. Masloch - 2023-07-21

                  but

                  rh 1 3

                  is still confusing, because the last step seems to be counted as step 1, not step 0 like we programmers are used to and how rh with only one number given works.
                  Thus step 0 would be the next step, which isn't taken so far.

                  No, the two-parameter form specifies the first parameter (start of dump) exactly the same as the one-parameter form. rh 1 3 asks to start the dump at step 1 (the second-most recent step) and go on for 3 steps (counting down towards the present moment, from 1 to 0 to ...). It doesn't care that there are only 2 steps to show, that's on you as the user.

                  When i load it in LDEBUG and trace it until AX reaches value 3, i get at step 3 counting from 0:

                  -t ; step 0, = rh 3
                  AX=0000 ...
                  -t ; step 1, = rh 2
                  AX=0001 ...
                  -t ; step 2, = rh 1
                  AX=0002 ...
                  -t ; step 3, = rh 0
                  AX=0003 ...
                  -

                  Then i try different rh commands.
                  Here the last step is step 3 (rh 0) as expected.

                  -rh 0
                  AX=0003...
                  -

                  And as you said, rh 0 1 should give step 3 (the last step) counting from 0 backwards only 1 step.
                  So this is correct too:

                  -rh 0 1
                  AX=0003...
                  -

                  But rh 0 2, should give us now two register outputs, rh 0 and rh 1, but it doesn't.
                  Only rh 0 is printed:

                  -rh 0 2
                  AX=0003...
                  -

                  Same user error as above. You're specifying to start at step rh 0 and then display 2 steps. But there is only the one if you count towards the present moment.

                  It's confusing, because i expected that 2 steps are printed.

                  The command rh 1 2 outputs, what i expected from rh 0 2:

                  -rh 1 2
                  AX=0002...
                  AX=0003...
                  -

                  But according to the definition, that rh 0 is the last step, there should be no AX=0003 in the output. Instead, there should be:

                  AX=0001...
                  AX=0002...

                  Which corresponds to individual

                  -rh 2
                  AX=0001

                  and

                  -rh 1
                  AX=0002

                  rh 1 2 starts dumping at step rh 1 and then counts up to two steps towards the present moment. So rh 1 2 is like rh 1 then rh 0.

                  And rh 1 3 seems to output the same as the above rh 1 2, only 2 register outputs. Expected were 3.

                  -rh 1 3
                  AX=0002...
                  AX=0003...
                  -

                  Same user error. This is like rh 1 then rh 0 then a no-op as the counter doesn't ever go negative.

                  It gets even more confusing with rh 2 3, now i get 3 steps in the output:

                  -rh 2 3
                  AX=0001
                  AX=0002
                  AX=0003
                  -

                  Start from step rh 2 then do count down to rh 1 and rh 0. Exactly as I intended.

                  The world seems to look good again if we do the input in reverse. Oldest step (largest number) first, newest step last:

                  -rh 1 0
                  AX=0002
                  AX=0003
                  -

                  0 as the second parameter is (now) special. It will display every step, starting at the step specified by the first parameter, down to step rh 0.

                  So a rh 2 1 should give us

                  AX=0001
                  AX=0002

                  Does it? No, we get:

                  -rh 2 1
                  AX=0001
                  -

                  Which corresponds to:

                  -rh 2
                  AX=0001
                  -

                  This is correct. rh 2 1 means start with step rh 2 and display exactly one step. This is the same as rh 2.

                  So we just learned rh 2 1 gives us the output of only 1 step, so rh 2 0 should give us 2 steps, right?
                  No:

                  -rh 2 0
                  AX=0001...
                  AX=0002...
                  AX=0003...

                  This is correct, if we expected the output of rh 2, rh 1 and rh 0, but then the above rh 2 1 should work the same way and give us rh 2 and rh 1.

                  As I mentioned 0 as the second parameter is special, it means all subsequent steps are displayed. As I wrote, rh 2 1 really is the same as rh 2.

                  Now i try to print several steps at once, but this time i use a comma so that they are not contiguous. So this time no range, only individual steps:

                  -rh 4,3
                  AX=0000...
                  AX=0001...
                  -

                  Looks good. As expected.

                  This means start at step rh 4 (nonexistent) and dump up to 3 steps. It is the same as rh 4 then rh 3 then rh 2. The rh 4 command newly will not output anything because a step that old doesn't exist.

                  Do note that in the two-parameter RH command form, the comma is completely optional. It has no effect. rh 4,3 is the same as rh 4 3.
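The two-parameter rules described in this reply can be summarised in a short sketch. It only models the behaviour as explained here and is not lDebug's code; the names `rh_indices` and `show` and the `stored` parameter (how many steps the buffer currently holds) are made up for illustration.

```python
def rh_indices(start, count=1, stored=4):
    """Model of `rh start count`: list the step indices that get displayed.

    Index 0 is the most recent step; larger indices are older steps.
    `stored` is a hypothetical parameter: how many steps the buffer holds.
    """
    if count == 0:
        # A count of 0 is special: show everything from `start` down to step 0.
        last = 0
    else:
        # Otherwise show at most `count` steps; the counter never goes below 0.
        last = max(start - count + 1, 0)
    # Steps older than what is stored produce no output.
    return [i for i in range(start, last - 1, -1) if i < stored]


# The four traced steps from the example: rh 3 is AX=0000, rh 0 is AX=0003.
ax_at = {3: 0x0000, 2: 0x0001, 1: 0x0002, 0: 0x0003}


def show(start, count=1):
    """Render the AX value of every step selected by `rh start count`."""
    return ["AX=%04X" % ax_at[i] for i in rh_indices(start, count)]
```

With this model, `show(1, 3)` yields the same two lines as `show(1, 2)`, and `show(4, 2)` yields only the line for step rh 3, which matches the sessions quoted above.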

                  By the way, if you run re.replace @r ax . before the t trace commands then the debugger will output only the AX value for each trace step. That should help making examples. Reset this state using re.replace @r.

                  What about:

                  -rh 4, 2
                  AX=0000
                  -

                  Only rh 4 is printed, no rh 2.

                  No, this is step rh 3. The step rh 4 doesn't exist so it produces no output.

                  And several unrelated steps:

                  -rh 4, 2, 0
                  ^ Error

                  So this seems not to work this way.

                  There is no three-parameter form of the command. If you want to pass a list use rh in 4,2,0 (the comma is needed here).

                  So i wonder, can this be done in reverse order?

                  - rh in from 3 to 0
                                     ^Error

                  Sadly no.

                  This would require changes to the match range parsing.

                  Above i tried rh in from 0 to 3, now i wanted to see what rh in from 0 length 3 will give me:

                  -rh in from 0 length 3
                  AX=0001
                  AX=0002
                  AX=0003

                  Okay, this is correct, if we assume that length n is the number of steps and not a range.

                  Personally i think i will stick with
                  rh in from 0 to 3
                  because that's what is not confusing and will be the range feature i will very likely need most of the time.

                  But it would be more desirable if rh 0 3 equaled rh in from 0 to 3.

                  Actually, this will require two different forms of encoding the registers, one into the intermediate binary format and another from that to the text output. This would be a considerably large change. RH mode and the silent buffer would be the only users of this.

                  Why do you need two encodings for this when the text output could be created on the fly from the information encoded in the intermediate binary format?

                  By "encoding" I didn't mean the data layout of the data here. I meant the program logic that "encodes" the data. The two encodings I referred to are the two different handlers needed for your scheme: One that encodes from the debugger variables to the binary/compressed form, and another that decodes the binary form and encodes this data in text form.

                  The text output is done by the text output function that takes the encoded intermediate binary format data as parameter.

                  This would be the second encoding in my notation. Creating this binary form would be the first encoding.
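To make the two handlers concrete, here is a minimal sketch under loud assumptions: the binary layout (eight little-endian 16-bit words) and the function names `encode_binary` and `encode_text` are invented for illustration and say nothing about how lDebug would actually store a step.

```python
import struct

# Register order used by the first line of the R dump.
REG_NAMES = ("AX", "BX", "CX", "DX", "SP", "BP", "SI", "DI")


def encode_binary(regs):
    """First handler: debugger variables -> compact binary record.

    The layout (eight little-endian 16-bit words) is hypothetical.
    """
    return struct.pack("<8H", *(regs[name] for name in REG_NAMES))


def encode_text(record):
    """Second handler: decode the binary record and render dump text."""
    values = struct.unpack("<8H", record)
    return " ".join("%s=%04X" % (n, v) for n, v in zip(REG_NAMES, values))
```

Both functions have to be written and maintained, even though only the binary record is stored; that is the extra program logic meant by "two encodings".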

                  But i accept it if you don't want to do it. In this case your suggestion of a +NNNN hint would be the best option, i think. At least for 16-bit real mode programs.

                  I should get to that some time soon.

                  Couldn't this function be made part of dump d so it's available on a case-by-case basis when you need it? So that you just have to add a parameter with d, and then d automatically displays this help?

                  I don't think I will add that.

                   
                  • Oliver

                    Oliver - 2023-07-22

                    Thank you, now i understand the working of rh i k.

                    BTW, there is an issue when using g followed by rh. rh only shows steps that were done by t or p, but not by g.
                    And when doing some steps followed by L and then some steps again, rh shows the steps before and after the L.
                    If I understood that correctly, L loads the initial state of the program and, as I understand it, should also flush rh. But that is not the case, the history of the old steps is kept.

                    There is no three-parameter form of the command. If you want to pass a list use rh in 4,2,0 (the comma is needed here).

                    Thank you for the hint.

                    By "encoding" I didn't mean the data layout of the data here. I meant the program logic that "encodes" the data.

                    I understand. Thank you for your clarification.

                    I should get to that some time soon.

                    Sounds good.

                    Couldn't this function be made part of dump d so it's available on a case-by-case basis when you need it? So that you just have to add a parameter with d, and then d automatically displays this help?

                    I don't think I will add that.

                    That's sad to hear. It would be a useful feature.

                     
                    • E. C. Masloch

                      E. C. Masloch - 2023-07-22

                      BTW, there is an issue when using g followed by rh. rh only shows steps that were done by t or p, but not by g.

                      That would be quite the bug, but I cannot reproduce it. Please list an entire session showing this problem.

                      And when doing some steps followed by L and then some steps again, rh shows the steps before and after the L.
                      If I understood that correctly, L loads the initial state of the program and, as I understand it, should also flush rh. But that is not the case, the history of the old steps is kept.

                      Yes. You can uninstall rh then install rh to discard the earlier entries explicitly.

                       
                      • E. C. Masloch

                        E. C. Masloch - 2023-07-22

                        Session not showing the G/RH bug:

                        E:\>ldebug lddebugu.com
                        &; Welcome to lDebug!
                        -install rh
                        Register dump history enabled.
                        -u
                        20A9:0140 8CC8              mov     ax, cs
                        20A9:0142 31DB              xor     bx, bx
                        20A9:0144 055D1A            add     ax, 1A5D
                        20A9:0147 50                push    ax
                        20A9:0148 53                push    bx
                        20A9:0149 CB                retf
                        20A9:014A 26807F0200        cmp     byte [es:bx+02], 00
                        20A9:014F 7414              jz      0165
                        20A9:0151 26C747030001      mov     word [es:bx+03], 0100
                        20A9:0157 26807F020E        cmp     byte [es:bx+02], 0E
                        20A9:015C 7406              jz      0164
                        20A9:015E 26C747030381      mov     word [es:bx+03], 8103
                        -g 148
                        AX=3B06 BX=0000 CX=C630 DX=0000 SP=07FE BP=0000 SI=0000 DI=0000
                        DS=20A9 ES=20A9 SS=4784 CS=20A9 IP=0148 NV UP EI PL NZ AC PE NC
                        20A9:0148 53                push    bx
                        -rh
                        AX=3B06 BX=0000 CX=C630 DX=0000 SP=07FE BP=0000 SI=0000 DI=0000
                        DS=20A9 ES=20A9 SS=4784 CS=20A9 IP=0148 NV UP EI PL NZ AC PE NC
                        20A9:0148 53                push    bx
                        -re.replace @r ax .
                        -g 149
                        AX 3B06
                        -rh
                        AX=3B06 BX=0000 CX=C630 DX=0000 SP=07FE BP=0000 SI=0000 DI=0000
                        DS=20A9 ES=20A9 SS=4784 CS=20A9 IP=0148 NV UP EI PL NZ AC PE NC
                        20A9:0148 53                push    bx
                        AX 3B06
                        -rh 0
                        AX 3B06
                        -rh 1
                        AX=3B06 BX=0000 CX=C630 DX=0000 SP=07FE BP=0000 SI=0000 DI=0000
                        DS=20A9 ES=20A9 SS=4784 CS=20A9 IP=0148 NV UP EI PL NZ AC PE NC
                        20A9:0148 53                push    bx
                        -
                        
                         
                      • Oliver

                        Oliver - 2023-07-22

                        I made a screenshot but it's not a bug.
                        I think it's because of the way how rh works. When i wrote my last comment, i have forgotten that rh only logs the output. Thus there can be of course no steps logged made by g, except the output Program terminated normally (xxxx).
                        So it's not a bug, it's just a result of how rh works.

                        In order to change this, internal logging of the steps would be necessary. But we've already discussed that. And with such an internal logging of the steps, for example, the execution of a debug session with g until the next breakpoint appears, would certainly slow down the execution if every step were logged internally. So that wouldn't be good either.

                        Yes. You can uninstall rh then install rh to discard the earlier entries explicitly.

                        Thank you, that workaround will be useful.

                        Are there cases where it might be useful to keep the old outputs from before the L command? If not, then you could automatically initiate this uninstall rh and install rh with the L command.

                         
                        • Oliver

                          Oliver - 2023-07-24

                          I have a new idea. If you decide one day to integrate internal logging, this disadvantage of a too slow logging g command could be circumvented by just offering two g commands. One g command that works normally as before without logging and another command started with gg, thus two g, that uses internal logging.

                           
                        • E. C. Masloch

                          E. C. Masloch - 2023-07-26

                          I think it's because of the way how rh works. When i wrote my last comment, i have forgotten that rh only logs the output. Thus there can be of course no steps logged made by g, except the output Program terminated normally (xxxx).
                          So it's not a bug, it's just a result of how rh works.

                          Correct. If you happen upon a breakpoint (temporary (gg), permanent (bb) hit or pass, or not managed by the debugger at all; that is "unexpected") then the RE output from the G command is also captured into the RH buffer. But no other output is ever written by G.

                          In order to change this, internal logging of the steps would be necessary. But we've already discussed that. And with such an internal logging of the steps, for example, the execution of a debug session with g until the next breakpoint appears, would certainly slow down the execution if every step were logged internally. So that wouldn't be good either.

                          This is actually what the silent buffer was originally created for. You can use tp ffffff silent 1 (the Fs are just to provide a very large repetition count) and the debugger will stay silent, record the last X steps into the silent/RH buffer, and then once the control flow returns to the user, the debugger will show only the very last step from the buffer. (Omit the number to show the full buffer contents. Omit the silent keyword clause entirely to display every step as it occurs.)

                          Afterwards, you can use the RH commands like usual to inspect the last X steps still saved in the buffer.

                          The execution is indeed much slower if you dump registers and disassemble for every step. Even just the overhead of the debugger's tracing (if you disable the register dump and disassembly) is enough to make this hundreds if not thousands of times slower than using the G command.
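The effect of the silent buffer described above can be modelled as a bounded buffer that only keeps the newest entries. This is just a sketch of the idea, not the debugger's implementation: `trace_silently`, `keep` and `show_last` are hypothetical names, and the buffer is counted in steps here, which is an assumption made to keep the example short.

```python
from collections import deque


def trace_silently(steps, keep, show_last=1):
    """Record only the newest `keep` step outputs, then pick what to display.

    `steps` is any iterable of per-step dump texts. `show_last=None` models
    omitting the number, that is, showing the full buffer contents.
    """
    buffer = deque(maxlen=keep)      # older entries fall out automatically
    for output in steps:
        buffer.append(output)        # nothing is displayed while tracing
    if show_last is None:
        shown = list(buffer)
    else:
        shown = list(buffer)[-show_last:]
    return shown, buffer
```

After the run, the returned buffer still holds the last `keep` steps, which corresponds to inspecting them afterwards with the RH commands.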

                          Are there cases where it might be useful to keep the old outputs from before the L command? If not, then you could automatically initiate this uninstall rh and install rh with the L command.

                          For questions like this I like to apply a small heuristic: Which choice can emulate the other one completely? If L did this, I would have to either add an option for L not to do this, or I would be unable to issue an L command without resetting the RH buffer. If L continues not to reset the buffer, you can work-around this by explicitly issuing these commands. So I won't change this.

                           
                          • Oliver

                            Oliver - 2023-07-27

                            I triggered a bug in the new version 2023-07-24.
                            Instead of my HELLO.EXE file for testing, i accidentally loaded my HELLO.ASM file.
                            Then i entered g.
                            This resulted in the following output:

                            -g
                            Invalid opcode
                            AX=FFDE BX=0000 CX=9FFF DX=2000 SP=FFFF BP=FFF9 SI=FFFE DI=FFFF
                            DS=1FFF ES=1FFF SS=1FFF CS=03AD IP=441B O_ D_ I_ S1 Z_ A_ P_ C_
                            03AD:441B 2E8F06BC0D        pop     word [cs:0DBC]                 CS:0DBCS=3082
                            -
                            

                            Then i entered l

                            -l
                            Register dump history disabled.
                            -g 10
                            ^ Error
                            -q
                            -Q
                            -^C
                            -
                            

                            From this state on, i can no longer exit with q. I can print the help with ? and use commands like rh and u, but the latter two seem to produce garbage and errors at this state.
                            Only a reboot of my FreeDOS VM helped.

                            Gladly this bug is reproducible.
                            I added my hello world asm file to this bug report so you can test it.

                            I also tried to see, what happens, when i load my HELLO.EXE executable file.
                            And there seems to be some issue with the character output after reloading the program with l:

                            C:\TEMP>ldebug hello.exe
                            &; Startup configuration file "LDEBUG.SLD" loaded
                            Register dump history enabled.
                            lDebug (2023-07-24)
                            -g
                            Hallo Welt!
                            
                            Program terminated normally (0024)
                            -l
                            -g 10
                            allo Welt!
                            
                            Program terminated normally (0024)
                            -q
                            C:\TEMP>
                            

                            Here q quit works, but something changed the H to a with the l load command. I also added the HELLO.EXE file, so that you can reproduce it.

                            I will now test your new features, this will take a little bit.

                            By the way, I have a new suggestion.
                            Before loading a file into ldebug, how about checking the file extension, whether it's a .EXE, .COM, .ROM or .BIN file?
                            And if it's something else, the user should be asked if he really wants to load that file. This can prevent accidental loading of non-executable files, like I did.

                            So sth. like this:

                            C:\TEMP>ldebug hello.asm
                            File has the extension *.asm, do you really want to load that file (y/n)? n
                            C:\TEMP>
                            
                            C:\TEMP>ldebug hello.img
                            File has the extension *.img, do you really want to load that file (y/n)? y
                            -
                            

                            And for EXE, COM, ROM and BIN files, everything remains the same as the original behavior.

                            C:\TEMP>ldebug hello.exe
                            -q
                            C:\TEMP>ldebug hello.com
                            -q
                            C:\TEMP>ldebug hello.rom
                            -q
                            C:\TEMP>ldebug hello.bin
                            -q
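The proposed check could look roughly like the following sketch. It is not lDebug code; `KNOWN_EXTENSIONS`, `needs_confirmation` and `should_load` are hypothetical names, and the prompt text is taken from the example above.

```python
import os

# Extensions that would load without a question, as suggested above.
KNOWN_EXTENSIONS = {".exe", ".com", ".rom", ".bin"}


def needs_confirmation(filename):
    """Return True if the extension is not one of the known program types."""
    ext = os.path.splitext(filename)[1].lower()
    return ext not in KNOWN_EXTENSIONS


def should_load(filename, ask=input):
    """Ask the user before loading a file with an unexpected extension.

    `ask` is injectable so the prompt can be replaced in tests.
    """
    if not needs_confirmation(filename):
        return True
    ext = os.path.splitext(filename)[1].lower()
    answer = ask("File has the extension *%s, do you really want to load "
                 "that file (y/n)? " % ext)
    return answer.strip().lower() == "y"
```

Files with one of the four known extensions skip the prompt entirely, so the existing behavior stays unchanged for them.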
                            
                             
                            • Oliver

                              Oliver - 2023-07-27

                              BTW, this character seems to be moved along the string when the G go command points behind the last program address.

                              -u
                              2041:0000 B84220        mov     ax, 2042
                              ...
                              2041:000C B44C          mov     ah, 4C
                              2041:000E CD21          int     21
                              ...
                              -g 0e
                              Hallo Welt!
                              AX=4C24...
                              -t
                              
                              Program terminated normally (0024)
                              -g 10
                              ╠allo Welt!
                              
                              Program terminated normally (0024)
                              -l
                              -g 11
                              H╠llo Welt!
                              
                              Program terminated normally (0024)
                              -l
                              -g 12
                              Ha╠lo Welt!
                              
                              Program terminated normally (0024)
                              -l
                              -g 19
                              Hallo Wel╠!
                              
                              Program terminated normally (0024)
                              
                               
                              • E. C. Masloch

                                E. C. Masloch - 2023-07-27

                                This codepoint is what 0CCh, the int3 breakpoint instruction, looks like when you display it to your terminal. By running g 10 you happen to replace the "H" byte of your text by the 0CCh. This is not a bug, it is expected that your data can be corrupted by placing a temporary breakpoint into it.
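The effect can be reproduced outside the debugger with a small model. It assumes that the text starts at offset 10h directly after the code, which is inferred from the g 10, g 11 and g 19 outputs above; the code bytes are placeholders and the function names are made up for illustration.

```python
INT3 = 0xCC  # opcode of the int3 breakpoint instruction


def make_memory():
    """Hypothetical memory image: 16 placeholder code bytes, then the text."""
    code = bytes(0x10)  # stands in for the real instructions at 0000..000F
    return bytearray(code + b"Hallo Welt!\r\n$")


def run_with_temp_breakpoint(memory, bp_offset, text_offset=0x10):
    """Model `g bp_offset` for a program that prints a '$'-terminated string.

    The program exits before reaching the breakpoint, so the only visible
    effect is the breakpoint byte sitting inside the data while it is printed.
    """
    saved = memory[bp_offset]
    memory[bp_offset] = INT3                  # temporary breakpoint is written
    end = memory.index(b"$", text_offset)     # DOS function 09h stops at '$'
    printed = bytes(memory[text_offset:end]).decode("cp437")
    memory[bp_offset] = saved                 # original byte is restored
    return printed
```

In code page 437 the byte 0CCh is displayed as ╠, so `run_with_temp_breakpoint(make_memory(), 0x11)` gives the "H╠llo Welt!" line seen in the session.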

                                 
                                • Oliver

                                  Oliver - 2023-07-27

                                  Hm interesting, the next instruction at offset 0010 after the int 21 instruction to exit to dos is a dec ax, so i assume it will never be executed with g 10.

                                  -u
                                  ...
                                  2041:000C B44C    mov ah, 4C
                                  2041:000E CD21    int 21
                                  2041:0010 48      dec ax
                                  ...
                                  -
                                  

                                  Otherwise shouldn't g 10 then cause AX to be decremented by 1?

                                  If i try to simulate it manually:

                                  -g 0E   ; run the last instruction before int 21
                                  Hallo Welt!
                                  AX=....
                                  2041:000E CD21       int 21
                                  -p
                                  
                                  Program terminated normally (0024)
                                  

                                  Let's see where i am

                                  -r 
                                  AX=0000 BX=0000 CX=0000...
                                  ....IP=0100
                                  1FFF:0100 C3        retn
                                  

                                  It seems to be, that IP has changed pointing to 100h, which is something different than this dec AX. I assume it's probably a return point to some DOS routine or ldebug.

                                  -u 100
                                  1FFF:0100 C3         retn
                                  1FFF:0101 41         inc cx
                                  1FFF:0102 53         push bx
                                  ....
                                  -
                                  

                                  And if i do this last step:

                                  -t
                                  AX=0000 BX=0000 CX=0000 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
                                  DS=1FFF ES=1FFF SS=1FFF CS=1FFF IP=0000 O_ D_ I1 Z1 A_ P1 C_
                                  1FFF:0000 CD20        int    20
                                  -
                                  

                                  All registers seem to be set to 0, including IP.

                                  My Hello World String is also unchanged and not touched.

                                  -S DS:0,FFFF "Hallo Welt!"
                                  1FFF:0430 +0B 0D 0A 24 0A ....
                                  0001 matches
                                  -
                                  

                                  So there seems to also be no write access to DS's memory.
                                  And also no output to stdout. And if that were the case, wouldn't the character have to be at the end of the string?

                                  I also tried to see, what happens when i do a traced execution:

                                  -TP 10
                                  ...
                                  Hallo Welt!
                                  ...
                                  2041:000E CD21           int 21
                                  
                                  Program terminated normally (0024)
                                  -
                                  

                                  rh also ends at the Program terminated normally message.

                                  Why does g 10 execute another command at all? Shouldn't it end when int 21h ran?

                                   
                                  • E. C. Masloch

                                    E. C. Masloch - 2023-07-29

                                    It seems to be, that IP has changed pointing to 100h, which is something different than this dec AX. I assume it's probably a return point to some DOS routine or ldebug.

                                    Calling the DOS terminate process function (21.4C) will indeed return control flow to the parent process, which is the debugger. The debugger will then re-create an empty process.

                                    All registers seem to be set to 0, including IP.

                                    An empty process created by the debugger, as well as a process created from loading a flat-format .COM file, will have a word with the value zero on the stack. (Usually at word [SS:FFFEh].) If the program runs a retn instruction with this stack, it will jump to PSP:0, which holds an int 20h instruction which is another DOS terminate process call. IP and SP happen to be zero then, the other zeroes are from the process initialisation.
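
                                    The effect of that retn can be sketched with a small Python model. This is only an illustration of the stack arithmetic, not debugger code; the function name retn and the zero-filled 64 KiB bytearray standing in for the stack segment are assumptions made for the example.

```python
# Hypothetical model of a near return (retn) in 16-bit real mode.
def retn(stack_segment: bytearray, sp: int) -> tuple[int, int]:
    """Pop a word from SS:SP; return (new IP, new SP), both wrapped to 16 bits."""
    low = stack_segment[sp]
    high = stack_segment[(sp + 1) & 0xFFFF]
    ip = low | (high << 8)              # little-endian word
    return ip, (sp + 2) & 0xFFFF        # SP wraps around from FFFEh to 0000h

ss = bytearray(0x10000)                 # zero-filled, so word [SS:FFFEh] is 0
ip, sp = retn(ss, 0xFFFE)
print(f"IP={ip:04X} SP={sp:04X}")       # prints IP=0000 SP=0000
```

                                    With IP=0000 the next instruction fetched is the int 20h at PSP:0, which matches the register dump shown after the t command.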

                                    So there seems to also be no write access to DS's memory.

                                    The PSP, the retn instruction, and the stack are re-initialised. The other data that may happen to be in the process segment are not overwritten. (But your DS may change from the prior process segment.)

                                    And also no output to stdout. And if that were the case, wouldn't the character have to be at the end of the string?

                                    I don't understand. What output do you expect?

                                    I also tried to see, what happens when i do a traced execution:

                                    TP 10 and G 10 are very different. One specifies to do up to 16 steps, unless interrupted by an unexpected fault. (This includes the process termination.) The other specifies to run code with a temporary breakpoint at address CS:0010h.

                                    Why does g 10 execute another command at all? Shouldn't it end when int 21h ran?

                                    Don't understand this either. Please detail exactly what command you used, what it did, and what you expected instead.

                                     
                                    • Oliver

                                      Oliver - 2023-07-29

                                      An empty process created by the debugger, as well as a process created from loading a flat-format .COM file, will have a word with the value zero on the stack. (Usually at word [SS:FFFEh].) If the program runs a retn instruction with this stack, it will jump to PSP:0, which holds an int 20h instruction which is another DOS terminate process call. IP and SP happen to be zero then, the other zeroes are from the process initialisation.

                                      That's good to know. Thank you for your answer.

                                      The PSP, the retn instruction, and the stack are re-initialised. The other data that may happen to be in the process segment are not overwritten. (But your DS may change from the prior process segment.)

                                      Thanks.

                                      I don't understand. What output do you expect?

                                      Well, i was wondering about this '╠' character.
                                      Depending on the steps g has to take, it is written over the Hello World! in stdout.
                                      The string "Hello World!" is not changed in memory, so it can't be that. Also, the string is printed out before "int 21" is reached. Therefore, that character can only be printed after it, and in that case it overwrites the string instead of printing it at the cursor position after the string.
                                      Normally, however, I would assume here that a character is always output at the end of the cursor, unless an explicit cursor position has been selected or the cursor has been reset or changed. So after "Hello World!". But that doesn't seem to be the case here.
                                      It looks like the cursor does a carriage return and then, depending on the steps in g n , moves the cursor one character at a time to the right and only then prints that character.

                                      TP 10 and G 10 are very different. One specifies to do up to 16 steps, unless interrupted by an unexpected fault. (This includes the process termination.) The other specifies to run code with a temporary breakpoint at address CS:0010h.

                                      Ah, now i understand.
                                      Now i counted the steps g 10 has to take by listing U.
                                      I counted 7 steps + 1 outside the program after the termination ended to emulate that g 10. So this equals tp 8.

                                      With tp 8 i get at the same position as g 10, but this character '╠' is printed nowhere like it is done with g 10. Strange.

                                      Don't understand this either. Please detail exactly what command you used, what it did, and what you expected instead.

                                      Well i assumed or expected when a program terminates before reaching the breakpoint of g, then the g command stops before going to that breakpoint.
                                      But that seems to be not the case. G just continues with its mission printing characters, until the additional steps taken correspond to the breakpoint value.

                                      The int 21 instruction to terminate the program and return to dos/ldebug is at IP = 0E and requires two bytes like expected.
                                      Thus the next command and where the breakpoint is set would be at IP = 10, but this is obviously never reached because of the retn, but G still continues.
                                      And between G 10 and G 19 it prints this special character over the Hallo Welt! output.

                                      And depending on the first digit, the cursor is moved accordingly.
                                      It looks like the following: (Before every g command, i quit ldebug and restarted it with hello.exe loaded.)

                                      -g 0e
                                      Hallo Welt!... ; ends before int 21 like it should
                                      -p
                                      ...  ; normal termination without a '╠' printed
                                      -g 0f
                                      Hallo Welt
                                      Invalid opcode
                                      AX=0005...
                                      ...IP=0053...
                                      2041:0053 6345D7    arpl [di-29],ax     DS:D660=0000
                                      ; interesting where this has taken us. Strange new worlds. :)
                                      -g 10
                                      ╠allo Welt!...
                                      -g 11
                                      H╠llo Welt!...
                                      -g 12
                                      Ha╠lo Welt!...
                                      -g 13
                                      Hal╠o Welt!...
                                      -g 14
                                      Hall╠ Welt!...
                                      -g 15
                                      Hallo╠Welt!...
                                      -g 16
                                      Hallo ╠elt!...
                                      -g 17
                                      Hallo W╠lt!...
                                      -g 18
                                      Hallo We╠t!...
                                      -g 19
                                      Hallo Wel╠!...
                                      -g 1A
                                      Hallo Welt╠...
                                      -g 1B
                                      Hallo Welt!...
                                      -g 1C
                                      ╠allo Welt!... ; overrun, starts at the position of H again
                                      -g 1D
                                      Hallo Welt!
                                      
                                      .... ; prints the character in a new line and after that in the next line a lot of garbage
                                      -g 1E
                                      Hallo Welt!... ; everything looks fine from here.
                                      -g 1F
                                      Hallo Welt!... 
                                      -g 20
                                      Hallo Welt!... 
                                      

                                      Interestingly, whatever value n has in G n, as long as n is greater than 0E, g always stops execution before the retn instruction. This means, if i run r, retn is the next instruction.
                                      But, this ╠ character is still printed.

                                      If i count those ╠ steps, they correspond to the length of the "Hallo Welt!" string + its CR and LF and "$" character.
                                      In the asm code, the string is this:

                                      .DATA
                                      STRING DB "Hallo Welt!", 13, 10, "$"
                                      

                                      Thus 14 bytes. And this ╠ character is printed 14 times between g 10 and g 1D.

                                      -h 1D 10
                                      002D 000D
                                      -h D
                                      000D  decimal: 13
                                      

                                      There is the relation. The length of the string determines the maximum steps this ╠ character takes in stdout.
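
                                      The same arithmetic as a small Python sketch. The helper h is hypothetical and only mimics the sum/difference output of the -h command shown above; it is not how the debugger implements it.

```python
def h(a: int, b: int) -> tuple[str, str]:
    """Sum and difference of two values as 16-bit hex words (hypothetical helper)."""
    return f"{(a + b) & 0xFFFF:04X}", f"{(a - b) & 0xFFFF:04X}"

print(h(0x1D, 0x10))                # ('002D', '000D')
print(0x1D - 0x10 + 1)              # 14 offsets when both ends are counted
print(len(b"Hallo Welt!\r\n$"))     # 14 bytes in the string
```

                                      The difference 000D is 13, and counting both g 10 and g 1D gives the 14 positions.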

                                      My point is, g 10 to g 1D should just end after the program termination and not try to print a ╠ character.

                                       
                                      • E. C. Masloch

                                        E. C. Masloch - 2023-07-30

                                        Well, i was wondering about this '╠' character.
                                        Depending on the steps g has to take, it is written over the Hello World! in stdout.

                                        The parameter (number) that you give to G is not "the steps G has to take". The parameter for G is completely different from the parameter for T/TP/P.

                                        https://pushbx.org/ecm/doc/ldebug.htm#cmdg :

                                        The G command allows specifying breakpoints, which are either segmented addresses (86M or PM addresses depending on DebugX's mode) or linear addresses prefixed by an "@ " or "@(", similar to how the BP command allows a breakpoint specification. G breakpoints are identified by their position in the command line, as the 1st, 2nd, 3rd, etc. By default, 16 G breakpoints are supported.

                                        https://pushbx.org/ecm/doc/ldebug.htm#cmdp :

                                        a count may be specified, which causes the command to execute as many P steps as the count indicates.

                                        The fact that you specify a single number to G does not make this a "step count" like the (single) P parameter. It is simply parsed as a single breakpoint specification, writing a temporary breakpoint at the specified segmented address, parsed with a default segment of the current CS.

                                        The string "Hello World!" is not changed in memory, so it can't be that. Also, the string is printed out before "int 21" is reached. Therefore, that character can only be printed after it, and in that case it overwrites the string instead of printing it at the cursor position after the string.

                                        This is wrong. The "G 10" command works like the following sequence. It saves the original contents of the byte [cs:10] then overwrites it with a 0CCh (int3) opcode.

                                        r v0 = byte [cs:10]
                                        r byte [cs:10] = CC
                                        r v1 = cs
                                        r v2 = 10
                                        g
                                        r byte [v1:v2] = v0
                                        

                                        If you enter the first two commands into the debugger you can observe that the memory is changed during this sequence:

                                        Welcome to dosemu2!
                                            Build 2.0pre9-dev-20230728-1370-g988effd35
                                        lDebug (2023-07-30)
                                        -d cs:10 l 10
                                        2E6D:0010  48 61 6C 6C 6F 20 57 65-6C 74 21 0D 0A 24 00 00 Hallo Welt!..$..
                                        -r v0 = byte [cs:10]
                                        -r byte [cs:10] = CC
                                        -d cs:10 l 10
                                        2E6D:0010  CC 61 6C 6C 6F 20 57 65-6C 74 21 0D 0A 24 00 00 .allo Welt!..$..
                                        -g 
                                        ╠allo Welt!
                                        
                                        Program terminated normally (0024)
                                        -
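
                                        The same sequence can be modelled outside the debugger. The following Python sketch is an assumption-laden illustration, not lDebug source: memory is a plain bytearray, dos_print_string is a hypothetical stand-in for int 21h function 09h, and the text is decoded as code page 437, where byte 0CCh is the ╠ glyph.

```python
# Hypothetical model of a temporary breakpoint landing in string data.
BREAKPOINT = 0xCC  # int3 opcode

def dos_print_string(memory: bytearray, offset: int) -> str:
    """Text that function 09h would display: everything up to the next '$'."""
    end = memory.index(b"$", offset)
    return memory[offset:end].decode("cp437")

def run_with_temporary_breakpoint(memory: bytearray, bp: int, text: int) -> str:
    saved = memory[bp]                        # r v0 = byte [cs:10]
    memory[bp] = BREAKPOINT                   # r byte [cs:10] = CC
    shown = dos_print_string(memory, text)    # g (program prints, then terminates)
    memory[bp] = saved                        # r byte [v1:v2] = v0
    return shown

# 10h filler bytes stand in for the code; the string starts at offset 10h.
memory = bytearray(0x10) + bytearray(b"Hallo Welt!\r\n$")
shown = run_with_temporary_breakpoint(memory, 0x10, 0x10)
# shown == "╠allo Welt!\r\n", and memory[0x10] is 'H' again afterwards
```

                                        The string is only corrupted while the breakpoint is in place, which is why a later search of memory finds it intact.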
                                        

                                        Normally, however, I would assume here that a character is always output at the end of the cursor, unless an explicit cursor position has been selected or the cursor has been reset or changed. So after "Hello World!". But that doesn't seem to be the case here.
                                        It looks like the cursor does a carriage return and then, depending on the steps in g n , moves the cursor one character at a time to the right and only then prints that character.

                                        This would be right if the memory indeed wasn't changed during the int 21h service 09h call, but as it is all of this is irrelevant.

                                        Ah, now i understand.
                                        Now i counted the steps g 10 has to take by listing U.
                                        I counted 7 steps + 1 outside the program after the termination ended to emulate that g 10. So this equals tp 8.

                                        With tp 8 i get at the same position as g 10, but this character '╠' is printed nowhere like it is done with g 10. Strange.

                                        Not strange at all. TP commands will trace/proceed past single instructions, so the only breakpoints it writes will be like after int instructions or the like. These will all be in the code section of your program, except for the breakpoint after the final int 21h (the one running function 4Ch). But that final breakpoint, while it does overwrite the "H" of your message, will only be set while executing the final int call, not while the function 09h call occurs earlier.

                                        Well i assumed or expected when a program terminates before reaching the breakpoint of g, then the g command stops before going to that breakpoint.
                                        But that seems to be not the case. G just continues with its mission printing ╠characters, until the additional steps taken correspond to the breakpoint value.

                                        No. As explained before, the temporary breakpoint you are writing is in the data part of your program. After DOS returns to the debuggee process's PRA (Parent Return Address), that is to the debugger, the G command will finally restore its breakpoints if possible (if they haven't been overwritten yet). Then the G command detects that no pass or non-pass bb point was executed, but rather that the PRA was entered, so it will display the message about "Program terminated normally" and return to the debugger command line. (The cmd3 command loop of the debugger will re-create an empty process when it detects that the current process has terminated.)

                                        The int 21 instruction to terminate the program and return to dos/ldebug is at IP = 0E and requires two bytes like expected.
                                        Thus the next command and where the breakpoint is set would be at IP = 10, but this is obviously never reached because of the retn, but G still continues.
                                        And between G 10 and G 19 it prints this special character over the Hallo Welt! output.

                                        This happens because your example's data section can be addressed from your example's code segment (behind the end of your code section). If something else, such as the bottom of the stack section, was at this address then you wouldn't get the same result.

                                        And depending on the first digit, the cursor is moved accordingly.
                                        It looks like the following: (Before every g command, i quit ldebug and restarted it with hello.exe loaded.)

                                        If it doesn't crash you can reload the program by using the no-parameter "L" command. (Or in the case of loading eg "HELLO.ASM", you can use the "QA" command then the "L" command. I noticed that the behaviour of the debugger differs for this case as it will not re-initialise the CS:IP registers then. This is a holdover from MSDebug.)

                                        ~~~
                                        -g 0f
                                        Hallo Welt
                                        Invalid opcode
                                        AX=0005...
                                        ...IP=0053...
                                        2041:0053 6345D7 arpl [di-29],ax DS:D660=0000
                                        ; interesting where this has taken us. Strange new worlds. :)
                                        ~~~

                                        This is because you overwrote the second byte of the int 21h (CDh 21h) instruction with the CCh byte (int3 single-byte instruction). So you just changed int 21h to int 0CCh which presumably crashes sooner or later.
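
                                        A minimal Python sketch of that byte patch. The decode function is hypothetical and only knows the two opcodes relevant here; it is meant to show how the instruction at offset 0E changes, nothing more.

```python
# Hypothetical mini-decoder: handles only int3 (CC) and int imm8 (CD ib).
def decode(code: bytes, offset: int) -> str:
    opcode = code[offset]
    if opcode == 0xCC:
        return "int3"
    if opcode == 0xCD:
        return f"int {code[offset + 1]:02X}"
    return f"db {opcode:02X}"

base = 0x0C                                  # offset of the first byte below
code = bytearray.fromhex("B44C CD21")        # mov ah, 4C / int 21
before = decode(code, 0x0E - base)
code[0x0F - base] = 0xCC                     # the byte that "g 0f" writes
after = decode(code, 0x0E - base)
print(before, "->", after)                   # prints: int 21 -> int CC
```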

                                        ~~~
                                        -g 1D
                                        Hallo Welt!

                                        .... ; prints the character in a new line and after that in the next line a lot of garbage
                                        ~~~

                                        In this case you overwrote the dollar-sign (U+0024) terminator with the 0CCh so after the CR LF linebreak you get that point and then garbage afterwards as DOS continues to display things until it randomly encounters a 24h byte.
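
                                        As a rough Python illustration of why the output runs on: the helper dos_print_length is hypothetical, and the bytes placed after the string are made up for the example (real memory contents will differ).

```python
def dos_print_length(memory: bytes, offset: int) -> int:
    """Bytes that function 09h would display from offset (hypothetical model)."""
    return memory.index(0x24, offset) - offset   # scan for the next '$' (24h)

text = bytearray(b"Hallo Welt!\r\n$")
trailing = bytes([0x00, 0x13, 0x37, 0x24])   # made-up bytes; the 24h is arbitrary
memory = text + trailing

print(dos_print_length(memory, 0))   # 13: stops at the real terminator
memory[13] = 0xCC                    # "g 1D" puts the breakpoint on the '$'
print(dos_print_length(memory, 0))   # 17: continues into the following bytes
```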

                                        Interestingly, whatever value n has in G n, as long as n is greater than 0E, g always stops execution before the retn instruction. This means, if i run r, retn is the next instruction.
                                        But, this ╠ character is still printed.

                                        To be exact, G stops execution once the program control flow enters the debugger's PRA it set up for the debuggee process. The command line loop of the debugger then detects that the process terminated and subsequently initialises an empty process with the retn instruction. This may or may not happen to be at the same segment address as your program's process was.

                                        My point is, g 10 to g 1D should just end after the program termination and not try to print a ╠ character.

                                        As explained, there is no bug here. You're happening to access your data section using your program's code segment, and that corrupts your string data. The G command does not run anything after the program terminated. (Except for the creation of a new empty process, but that's in the debugger's cmd3 command loop.)

                                         
                                        • Oliver

                                          Oliver - 2023-08-03

                                          The parameter (number) that you give to G is not "the steps G has to take". The parameter for G is completely different from the parameter for T/TP/P.

                                          Yes, these are addresses, but with steps i meant complete instruction steps.
                                          I also said:

                                          Now i counted the steps g 10 has to take by listing U.
                                          I counted 7 steps + 1 outside the program after the termination ended to emulate that g 10. So this equals tp 8.

                                          So i took that into account, that these are actual addresses.
                                          I probably worded it a bit poorly.

                                          The fact that you specify a single number to G does not make this a "step count" like the (single) P parameter. It is simply parsed as a single breakpoint specification, writing a temporary breakpoint at the specified segmented address, parsed with a default segment of the current CS.

                                          I agree.

                                          This is wrong. The "G 10" command works like the following sequence. It saves the original contents of the byte [cs:10] then overwrites it with a 0CCh (int3) opcode.

                                          r v0 = byte [cs:10]
                                          r byte [cs:10] = CC
                                          r v1 = cs
                                          r v2 = 10
                                          g
                                          r byte [v1:v2] = v0

                                          If you enter the first two commands into the debugger you can observe that the memory is changed during this sequence:

                                          Thank you for the clarification.
                                          Now i understand it.

                                          Suggestion:
                                          How about an additional column when outputting the assembler listing with the u command?
                                          There is still free space on the right side, and you could display the opcode bytes from the second column as ASCII characters in this new last column.
                                          In cases where the code and data segments are the same, this would immediately show where the data section starts and whether a readable string is present.

                                          Example:

                                          -u C
                                          2041:000C B44C              mov     ah, 4C            .L
                                          2041:000E CD21              int     21                .!
                                          2041:0010 48                dec     ax                H
                                          2041:0011 61                popa                      a
                                          2041:0012 6C                insb                      l
                                          2041:0013 6C                insb                      l
                                          2041:0014 6F                outsw                     o
                                          2041:0015 205765            and     [bx+65], dl        We
                                          2041:0018 6C                insb                      l
                                          2041:0019 7421              jz      003C              t!
                                          ...
                                          -
                                          

                                          In this u listing it is clearly visible that the data area begins at offset 0010 with an 'H' character; read vertically, it is the string "Hallo Welt!".
                                          And in the code section it is also usable to directly see, what kind of ASCII character is copied into or from a register.
                                          Example:

                                          1234:0000 B433            mov     ah, 33            .3
                                          

                                          It might be only a nice to have feature, but if the code size allows it and doesn't consume too much RAM, it might still be useful.
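
                                          To make the idea concrete, here is a minimal Python sketch of how such a column could be derived from the opcode bytes. The function name ascii_column and the choice of '.' for non-printable bytes are assumptions for illustration; this is not a proposal for the actual implementation in the debugger.

```python
def ascii_column(opcode_hex: str) -> str:
    """Render opcode bytes as ASCII, with '.' for anything outside 20h..7Eh."""
    raw = bytes.fromhex(opcode_hex)
    return "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in raw)

# Opcode bytes taken from the example listing above.
for opcode in ("B44C", "CD21", "48", "205765", "7421"):
    print(f"{opcode:<8}{ascii_column(opcode)}")
```

                                          For the rows above this yields .L, .!, H, " We" and t!, the same characters as in the example listing.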

                                          Not strange at all. TP commands will trace/proceed past single instructions, so the only breakpoints it writes will be like after int instructions or the like. These will all be in the code section of your program, except for the breakpoint after the final int 21h (the one running function 4Ch). But that final breakpoint, while it does overwrite the "H" of your message, will only be set while executing the final int call, not while the function 09h call occurs earlier.

                                          and

                                          No. As explained before, the temporary breakpoint you are writing is in the data part of your program. After DOS returns to the debuggee process's PRA (Parent Return Address), that is to the debugger, the G command will finally restore its breakpoints if possible (if they haven't been overwritten yet). Then the G command detects that no pass or non-pass bb point was executed, but rather that the PRA was entered, so it will display the message about "Program terminated normally" and return to the debugger command line. (The cmd3 command loop of the debugger will re-create an empty process when it detects that the current process has terminated.)

                                          I understand. Thank you for clarification.

                                          This happens because your example's data section can be addressed from your example's code segment (behind the end of your code section). If something else, such as the bottom of the stack section, was at this address then you wouldn't get the same result.

                                          and

                                          In this case you overwrote the dollar-sign (U+0024) terminator with the 0CCh so after the CR LF linebreak you get that point and then garbage afterwards as DOS continues to display things until it randomly encounters a 24h byte.

                                          I understand and i agree.

                                          To be exact, G stops execution once the program control flow enters the debugger's PRA it set up for the debuggee process. The command line loop of the debugger then detects that the process terminated and subsequently initialises an empty process with the retn instruction. This may or may not happen to be at the same segment address as your program's process was.

                                          and

                                          As explained, there is no bug here. You're happening to access your data section using your program's code segment, and that corrupts your string data. The G command does not run anything after the program terminated. (Except for the creation of a new empty process, but that's in the debugger's cmd3 command loop.)

                                          Thank you for your clarification.

                                           
                            • E. C. Masloch

                              E. C. Masloch - 2023-07-29

                              The hello.asm file does crash eventually for me if I use T commands, but QA, L, and Q commands still work at that point. If I do use the G command it crashes my dosemu2 machine, either directly (-E "ldebug c:\bin\lddebug.com hello.asm") or similarly to your case I get an "Invalid opcode" fault and then the debugger doesn't work correctly any longer (-E "ldebug.com hello.asm").

                              However, this is unlikely to be due to a debugger bug. When the machine crashes, it can corrupt parts of its own process, of the DOS, or of the debugger.

                               
                              • Oliver

                                Oliver - 2023-07-29

                                However, this is unlikely to be due to a debugger bug. When the machine crashes, it can corrupt parts of its own process, of the DOS, or of the debugger.

                                I agree that if random data is loaded as program code, strange things can happen when that data is executed.
                                But that's why I recommended checking the extension of the file being loaded, to inform the user and prevent such accidents by asking again whether they are sure they want to load data as program code.

                                The crash of dosemu2 shows that dosemu2 works differently from QEMU, the emulator I use. In QEMU, loading this data in ldebug lets it do random things inside the memory of the emulated machine without crashing the VM.

                                 
                                • E. C. Masloch

                                  E. C. Masloch - 2023-07-30

                                  But that's why I recommended checking the extension of the file being loaded, to inform the user and prevent such accidents by asking again whether they are sure they want to load data as program code.

                                  I understand where you're coming from, but I think I'll just chalk this up to user error. You were able to figure out the cause of your problem by yourself, after all. I could add an option for this but I would not want to include this code in the default build, as the lCDebugX build is very close to 65_536 bytes of the code segment being filled, already. So if you wanted this option either you would have to build the debugger yourself or I would have to add a special build with options like this one enabled. (Trivially possible, but a bit more work for me. And would have to decide what to call this build.)

                                  The crash of dosemu2 shows that dosemu2 works differently from QEMU, the emulator I use. In QEMU, loading this data in ldebug lets it do random things inside the memory of the emulated machine without crashing the VM.

                                  Yes, dosemu2 is not as robust as qemu in this regard I would guess.
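For illustration, the extension check suggested in this exchange could look roughly like the sketch below. It is not part of lDebug, and it is written in Python rather than the debugger's assembly. The set of extensions treated as executable and the function name `needs_confirmation` are assumptions made for this example only.

```python
import os

# Assumption: only these extensions are treated as program files.
EXECUTABLE_EXTENSIONS = {".com", ".exe"}


def needs_confirmation(filename: str) -> bool:
    """Return True if the file does not look like a program, so that a
    loader could ask the user to confirm before treating it as code."""
    extension = os.path.splitext(filename)[1].lower()
    return extension not in EXECUTABLE_EXTENSIONS


for name in ("hello.asm", "HELLO.COM"):
    if needs_confirmation(name):
        print(f"{name}: not a known program extension, ask before loading")
    else:
        print(f"{name}: load without asking")
```

A check like this only looks at the file name, so it would be a convenience prompt rather than a safeguard: a renamed data file would still be loaded as code.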

                                   