Discussion about LDEBUG and these two new feature requests.
Basically they are:
2)
Scrolling support for the output of a LDEBUG session. So that you can scroll some pages back and see the output of previous debugging steps.
But I don't know whether that is possible to implement in DOS, which is why I also have suggestion 3:
3)
A history function in LDEBUG that shows the previous register states of previous steps.
Here I could imagine two possibilities, either you only save the register states in the history that have changed from the last n steps and reconstruct the rest for the output starting from the current register status.
Or you save all register states of the last n steps completely.
The former takes up more space in the code segment, the latter in the data segment. And you said that you have still some free space left in the data segment, so this could be used for such a helpful function.
The history is only about showing the previous register states of the last n steps, so it is not meant to be a rollback of the program to a previous program execution state.
If it is also possible to store the output of the debugged program itself, this would be even better. But i don't know if that is possible.
The output of the register state history should be page by page, but also selective.
As a command name I would suggest dh for dump history. Alternatively, history or sh (show history) would also be possible.
3)
A history function in LDEBUG that shows the previous register states of previous steps.
Here I could imagine two possibilities, either you only save the register states in the history that have changed from the last n steps and reconstruct the rest for the output starting from the current register status.
Or you save all register states of the last n steps completely.
The former takes up more space in the code segment, the latter in the data segment. And you said that you have still some free space left in the data segment, so this could be used for such a helpful function.
The history is only about showing the previous register states of the last n steps, so it is not meant to be a rollback of the program to a previous program execution state.
I implemented the RH mode and corresponding RH command.
If it is also possible to store the output of the debugged program itself, this would be even better. But i don't know if that is possible.
Possible yes, at least for plain text output, but you would have to hook int 21h and/or int 10h. I think debugging with a serial terminal or with video screen swapping are alternatives to this though.
The output of the register state history should be page by page, but also selective.
As a command name I would suggest dh for dump history. Alternatively, history or sh (show history) would also be possible.
I chose RH for Register (dump) History. You have to enable RH mode using install rh. Then any subsequent R, RE, or T/P/G command output is stored in the auxiliary buffer, which is currently a fixed size of about 8 KiB. (Can be made larger, but for now only with a build time option. Considering a startup time option for a larger size.) Unlike your suggestion this stores the text displayed rather than the register values in binary.
The benefit is that other messages like permanent breakpoint hits and the disassembly, including jumping and memory access notices, are also preserved. And it was fairly easy to implement, re-using a lot of the silent tracing buffering code.
The big disadvantage is the memory use. At 8 KiB we can store about forty steps at a time, steps older than that are lost. Each dump (using default RE buffer content) comes in at about 200 bytes. Storing 8086 registers (excluding the 386 parts) would be about 8 times smaller, but that's without the disassembly.
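As a rough illustration of that trade-off (a Python sketch, not lDebug's actual assembly implementation), the buffer can be modelled as a byte-budgeted queue that drops the oldest dumps once the text no longer fits. The 8192-byte capacity and the 200-byte dump size are the approximate figures quoted above; the class and method names are made up for this example.

```python
from collections import deque


class TextHistory:
    """Hypothetical model of a fixed-size buffer holding text dumps."""

    def __init__(self, capacity=8192):
        self.capacity = capacity
        self.used = 0
        self.dumps = deque()  # oldest dump on the left

    def store(self, text):
        data = text.encode("ascii")
        if len(data) > self.capacity:
            return  # a single dump larger than the buffer cannot be kept
        self.dumps.append(data)
        self.used += len(data)
        # Evict the oldest dumps until everything fits again.
        while self.used > self.capacity:
            self.used -= len(self.dumps.popleft())


history = TextHistory()
for step in range(50):
    # Each simulated dump is padded to about 200 bytes of text.
    history.store(f"{step:04d}".ljust(200))

retained = len(history.dumps)
oldest = history.dumps[0][:4].decode("ascii")
```

With these assumed sizes, 40 dumps are retained and the first ten of the fifty simulated steps are gone, which matches the "about forty steps" estimate.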
Anyway, the RH command has three modes:
Just RH: Display entire saved contents.
RH number: Display one dump from saved content. With 0 the most recent dump is displayed, 1 is the second-most recent dump, etc. If too high then the oldest dump is shown.
RH number number: First number is start, same as for single-parameter form. Second number is how many dumps to display at most.
The output of RH is paged by default. DCO3 options for the silent dump can be modified to disable paging:
0100 T/TP/P: modify paging for silent dump
0200 T/TP/P: if 0100 set: turn paging on, else off
So you want r dco3 = dco3 clr 200 or 100 to disable paging for RH.
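Read as bit operations, that expression clears bit 0200 and sets bit 0100. The following Python sketch models this under the assumption that the two flag lines above mean "0100 overrides the default, 0200 then selects on or off"; the constant and function names are invented for the illustration and are not lDebug identifiers.

```python
# Hypothetical names for the two DCO3 flags listed above (values in hex).
SILENT_PAGING_MODIFY = 0x0100  # modify paging for silent dump
SILENT_PAGING_ON = 0x0200      # only consulted when 0x0100 is set


def disable_rh_paging(dco3):
    """Model of: r dco3 = dco3 clr 200 or 100"""
    return (dco3 & ~SILENT_PAGING_ON) | SILENT_PAGING_MODIFY


def rh_paging_enabled(dco3, default=True):
    """Assumed reading of the flags: paging stays at its default
    unless the modify bit is set."""
    if not dco3 & SILENT_PAGING_MODIFY:
        return default
    return bool(dco3 & SILENT_PAGING_ON)
```

For example, `disable_rh_paging(0x0300)` yields `0x0100`, for which `rh_paging_enabled` returns `False`.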
To use the RH command you should enable RH mode using install rh. (This command will clear the RH buffers when enabling RH mode.) However, this disables commands that require the auxiliary buffer, like RN, RM, re.replace, DIL, and S. Boot loaded mode Y script file reading also requires the auxiliary buffer. The Q command will disable RH mode in order to operate using the auxiliary buffer.
I implemented the RH mode and corresponding RH command.
This is great news. I have never seen feature requests implemented as quickly as yours. My highest respect and a big thank you.
Possible yes, at least for plain text output, but you would have to hook int 21h and/or int 10h. I think debugging with a serial terminal or with video screen swapping are alternatives to this though.
OK, it doesn't matter.
The benefit is that other messages like permanent breakpoint hits and the disassembly, including jumping and memory access notices, are also preserved. And it was fairly easy to implement, re-using a lot of the silent tracing buffering code.
Sounds good.
At 8 KiB we can store about forty steps at a time, steps older than that are lost.
40 steps is more than enough. I assumed 20, which should be sufficient in most cases. But 40 is even better.
RH number number: First number is start, same as for single-parameter form. Second number is how many dumps to display at most.
I noticed that if the start number is smaller than the number of an older step, only one step is printed.
For example, RH 0 3 prints only step 0.
Also there seems to be a bug with the ordering.
See the screenshots. There I did:
- rh 3
; prints step 3 which is mov ah 09
- rh 1 3
; prints only step 0 and 1, but not 2 (first int 21) and 3 (mov ah 09)
- rh 1 2
; prints step 0 and 1, but not 1 (mov ah, 4C) and 2 (first int 21)
- rh
; prints all steps, except the first one.
Anyway, the RH command has three modes:
Is it possible to add a fourth mode, where an enumeration of certain steps is given and a separator is used?
Example:
RH 3, 5, 8, 10
; prints steps 10, 8, 5, 3
This could be handy if steps 9, 7, 6, and 4 are not important and thus should not be shown.
You could print them individually of course, but this will always take one line for the input of the next RH number command.
To use the RH command you should enable RH mode using install rh. (This command will clear the RH buffers when enabling RH mode.) However, this disables commands that require the auxiliary buffer, like RN, RM, re.replace, DIL, and S.
I could live with not being able to use RM; it's quite unlikely that I will debug DOS programs that use MMX registers. I am unsure about RN and DIL.
But no S is definitely a loss.
How much work would it be to write a unified register output function that takes a struct of register values as binary values (in C language expression integers and pointers), as well as the other information you said, like permanent breakpoint hits, the disassembly, including jumping and memory access notices, and a step number and converts them to ASCII text and outputs them?
So you could use this function for all of the t, p, r and rh commands, and you could save a lot of storage space in the auxiliary buffer, so that S, RN, RM, RH and DIL would be possible again at the same time.
It also seems that the S search command has a regression (INSTALL RH is not used here). Its hex and ASCII output doesn't match that of the D dump command. Basically, the hex numbers and ASCII text of the search string are skipped rather than printed, which makes the output appear to start at the wrong memory address.
I made a screenshot of this, there it is easily visible.
The Q command will disable RH mode in order to operate using the auxiliary buffer.
Did you mean Q RH or QRH or Q H?
Using these leads to an error message and only Q exits LDEBUG.
rh 1 3
; prints only step 0 and 1, but not 2 (first int 21) and 3 (mov ah 09)
Also correct. The start step is 1, and then up to 3 steps starting from it are displayed. As there are only 2, steps 1 then 0 are displayed.
rh 1 2
; prints step 0 and 1, but not 1 (mov ah, 4C) and 2 (first int 21)
Your description is wrong, this does display step 1 (mov ah, 4C) and also step 0 the second int 21. The display in your screenshot is exactly what I intended.
(Writing of which, on Linux dosemu2 you can copy text from the dosemu2 graphical window when pressing Shift it seems.)
rh
; prints all steps, except the first one.
If you want to include a dump from before the first trace, run install rh then a plain r command before t commands.
Is it possible to add a fourth mode, where an enumeration of certain steps is given and a separator is used?
Example:
RH 3, 5, 8, 10
; prints steps 10, 8, 5, 3
This could be handy if steps 9, 7, 6, and 4 are not important and thus should not be shown.
You could print them individually of course, but this will always take one line for the input of the next RH number command.
As a workaround you could use rc.replace @rh 3; @rh 5; @rh 8; @rh 10 then run rc. (No, wait, you cannot use rc.replace while RH mode is active. Hmm.) However, I can probably squeeze an RH IN number, number, ... syntax into the code segment later.
But no S is definitely a loss.
I prepared a small patch today https://hg.pushbx.org/ecm/ldebug/rev/14a4c72ffbab which allows to use S while in RH mode. It uses the (new) WHILE buffer, which means S called from T/TP/P with a WHILE condition with RH mode or silent tracing enabled will fail. However, outside use in the RE buffer the S command will always work now.
How much work would it be to write a unified register output function that takes a struct of register values as binary values (in C language expression integers and pointers), as well as the other information you said, like permanent breakpoint hits, the disassembly, including jumping and memory access notices, and a step number and converts them to ASCII text and outputs them?
So you could use this function for all of the t, p, r and rh commands, and you could save a lot of storage space in the auxiliary buffer, so that S, RN, RM, RH and DIL would be possible again at the same time.
I am considering a different form of compression but either way will probably eat a lot of code space size.
It also seems that the S search command has a regression (INSTALL RH is not used here). Its hex and ASCII output doesn't match that of the D dump command. Basically, the hex numbers and ASCII text of the search string are skipped rather than printed, which makes the output appear to start at the wrong memory address.
I made a screenshot of this, there it is easily visible.
This is intended. It is expected that you already know the search string, so the S dump displays the 16 bytes after the end of the search pattern match. This is documented in https://pushbx.org/ecm/doc/ldebug.htm#cmds And this is not a regression: Even if you consider it a bug, it always worked like this so it didn't regress at any point. (The term "regression" to me means something that used to work no longer working the same way.)
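To make the described behaviour concrete, here is a minimal Python sketch (an illustration only, not lDebug's code): each match is reported with the address where the pattern starts, while the dumped bytes are the 16 bytes that follow the end of the match. The function names are hypothetical, and the formatting is simplified (no dash between the eighth and ninth byte). The sample bytes are taken from the dumps posted in this thread.

```python
def search_dump(memory, pattern, base=0, dump_len=16):
    """Return (match_offset, bytes following the match) for each match."""
    results = []
    pos = memory.find(pattern)
    while pos != -1:
        end = pos + len(pattern)
        results.append((base + pos, memory[end:end + dump_len]))
        pos = memory.find(pattern, pos + 1)
    return results


def format_line(segment, offset, data):
    """Render one simplified dump line: address, hex bytes, printable text."""
    hex_part = " ".join(f"{b:02X}" for b in data)
    text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in data)
    return f"{segment:04X}:{offset:04X}  {hex_part}  {text}"


# Bytes at DS:0110 as shown by the D command in this thread.
memory = b"Hallo Welt!\r\n$M\x89\x81S\xc9A\xb8"
pattern = b"Hallo"

matches = search_dump(memory, pattern, base=0x0110)
offset, data = matches[0]
line = format_line(0x20D6, offset, data)
first_dumped = offset + len(pattern)  # address of the first byte shown
```

Here the line is labelled with offset 0110 (the match), but the first displayed byte actually lives at offset 0115, which is the source of the confusion when comparing against D.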
The Q command will disable RH mode in order to operate using the auxiliary buffer.
Did you mean Q RH or QRH or Q H?
Using these leads to an error message and only Q exits LDEBUG.
No, to disable RH mode without quitting the debugger just use uninstall rh. I meant that a plain Q command asks for the auxiliary buffer, which is in use if RH mode is enabled. Instead of failing the Q command with an error message (which annoyed me), Q will now disable RH mode first before attempting to carry out the Q command's intended task which is to quit the debugger. (If the Q command fails then RH mode stays disabled.)
rh 1 3
; prints only step 0 and 1, but not 2 (first int 21) and 3 (mov ah 09)
Also correct. The start step is 1, and then up to 3 steps starting from it are displayed. As there are only 2, steps 1 then 0 are displayed.
That's confusing. The last, most recent step is 0 in the history, counting backwards from there. Why not just print step 1 to step 3:
3 = mov ah, 09
2 = int 21
1 = mov ah, 4C
and not:
1 = mov ah, 4C
0 (most recent) = int 21
From a user perspective the question from the program to the user should be:
"What steps do you want?"
And the user answers:
"Steps 1 to 3 (which means 3 is included so a <= 3, not < 3) "
And then the program should print steps 1 to 3, not 0 and 1.
The latter is quite confusing and complicates more than required.
rh 1 2
; prints step 0 and 1, but not 1 (mov ah, 4C) and 2 (first int 21)
Your description is wrong, this does display step 1 (mov ah, 4C) and also step 0 the second int 21. The display in your screenshot is exactly what I intended.
No, by the first int 21 I meant the first int 21 in the program's execution.
According to your above list, this would be:
2 = int 21 where AX = 09E7, after step 3 with (mov ah, 09)
And when entering rh 1 2 i expected:
2 = int 21 where AX = 09E7
1 = mov ah, 4C
(Writing of which, on Linux dosemu2 you can copy text from the dosemu2 graphical window when pressing Shift it seems.)
That's good to know. Thank you for the information.
If you want to include a dump from before the first trace, run install rh then a plain r command before t commands.
Ah, i understand.
Suggestion:
Is it possible to run r silently under the hood after entering install rh, so that the history is already seeded with the first r dump until it gets overwritten after n steps?
However, I can probably squeeze an RH IN number, number, ... syntax into the code segment later.
Sounds okay, it's better than not having that mode available.
I prepared a small patch today https://hg.pushbx.org/ecm/ldebug/rev/14a4c72ffbab which allows to use S while in RH mode. It uses the (new) WHILE buffer, which means S called from T/TP/P with a WHILE condition with RH mode or silent tracing enabled will fail. However, outside use in the RE buffer the S command will always work now.
That sounds better. I will need to test it tomorrow when the new binary is compiled and available.
I am considering a different form of compression but either way will probably eat a lot of code space size.
If you ask me, when feature after feature is added, it's sometimes a good idea to do a refactoring. That would also be a good opportunity to add the error code and error messages thing.
I would use a uniform output for R, RH and what else goes with it and work internally with the data in binary form, as described, so i would not save it as ASCII. I would only convert to ASCII at the end of the output. So you could work internally with binary data without much overhead.
I could imagine that such a model would also be better for later expansions, whatever that may be.
Seems to require 64 Bytes for the first and second line and 80 Bytes, if tabs are not used, for the third line. That's 208 Bytes if stored as ASCII/Codepage characters.
If every register is stored in binary form it's only:
13 registers x 2 Bytes + 2 Bytes for the flags register + 6 Bytes for the instruction (covering instructions 1 to 6 Bytes long) = 34 Bytes for these 3 lines on an 8086, and the rest could be reconstructed from this information during output.
This would leave additional space for the FPU registers, the 2 new segment registers of the 386, 2 bytes wider flag register and its 4 byte wide extended registers, and the MMX register of the Pentium MMX.
And with smart, i.e. conditional storage, you could omit the MMX and FPU registers if not needed.
Code size would increase a little bit, but not much, because the output function is required anyway and a more flexible one will not add that much.
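The arithmetic above can be checked with a small Python sketch. It is only an illustration of the proposed record size, not a format that lDebug uses: the register order, the field layout, and the names `REGISTERS`, `RECORD`, `pack_step` and `unpack_step` are assumptions made for this example.

```python
import struct

# Assumed order of the thirteen 16-bit registers shown by the R command.
REGISTERS = ("AX", "BX", "CX", "DX", "SP", "BP", "SI", "DI",
             "DS", "ES", "SS", "CS", "IP")

# 13 registers + flags as little-endian words, then 6 instruction bytes.
RECORD = struct.Struct("<13H H 6s")


def pack_step(regs, flags, code):
    """Pack one trace step; code is truncated or zero-padded to 6 bytes."""
    return RECORD.pack(*(regs[name] for name in REGISTERS), flags, code)


def unpack_step(record):
    values = RECORD.unpack(record)
    regs = dict(zip(REGISTERS, values[:13]))
    return regs, values[13], values[14]


regs = {name: 0 for name in REGISTERS}
regs.update(AX=0x09E7, CS=0x20D6, IP=0x0102)
record = pack_step(regs, 0x0202, b"\xCD\x21")  # int 21

text_bytes = 64 + 64 + 80                 # the three text lines counted above
binary_steps = 8192 // RECORD.size        # steps fitting in 8 KiB as binary
text_steps = 8192 // text_bytes           # steps fitting in 8 KiB as text
```

Under these assumptions one record is 34 bytes, so an 8 KiB buffer would hold 240 register-only steps instead of 39 text dumps of 208 bytes, though without any of the additional messages the text form preserves.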
This is intended. It is expected that you already know the search string, so the S dump displays the 16 bytes after the end of the search pattern match. This is documented in..
I understand. Ok then it is my fault. I thought the search pattern is also shown, so that the user knows, that this search string is really in the shown memory range.
And this is not a regression: Even if you consider it a bug, it always worked like this so it didn't regress at any point. (The term "regression" to me means something that used to work no longer working the same way.)
You are correct, but my assumption was, that it was different before. And here i was wrong. But that's the reason why i used the term regression.
But the address value or the output is still wrong when you compare the output of the D dump with the output of the S search.
The output of the search command gives the impression that the word "Welt" starts at address 20D6:0111.
-S DS:0,FFFF "Hallo"
20D6:0110  20 57 65 6C 74 21 0D 0A-24 4D 89 81 53 C9 41 B8   Welt!..$M..S.A.
           0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
But the dump command shows, that it is actually starting at 20D6:0116, because the W is where the 6h is.
-D DS:110,120
20D6:0110  48 61 6C 6C 6F 20 57 65-6C 74 21 0D 0A 24 4D 89  Hallo Welt!..$M.
           0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
So to fix that, either the offset address in the output of the search command must be adjusted or the output from hex value 0 to 5 must be filled with blanks, because W starts at 6h.
If you want to omit the search pattern in the output, I would consider this a correct output, as one is used to from Dump.
-S DS:0,FFFF "Hallo"
20D6:0110                 20 57 65-6C 74 21 0D 0A 24 4D 89        Welt!..$M.
           0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
No, to disable RH mode without quitting the debugger just use uninstall rh
That's confusing. The last, most recent step is 0 in the history, counting backwards from there. Why not just print step 1 to step 3:
You can use rh in from 1 length 3 for that. I'm not convinced to change the default behaviour because it was very simple to implement based on the silent trace buffering.
Suggestion:
Is it possible to run r silently under the hood after entering install rh, so that the history is already seeded with the first r dump until it gets overwritten after n steps?
I would not want that always, so this would need an additional option. Furthermore, it would require more handling to do a silent R command dump. The workaround of needing to run R yourself is not too bad I think.
That sounds better. I will need to test it tomorrow when the new binary is compiled and available.
I also modified rc.replace today in the same way to allow using it in RH mode.
If you ask me, when feature after feature is added, it's sometimes a good idea to do a refactoring. That would also be a good opportunity to add the error code and error messages thing.
It's important to me to work in small steps and commit changesets often rather than prepare large far-reaching changes. This is why the error messages will probably be handled by first adding the new functions, then incrementally changing users to make use of it.
I would use a uniform output for R, RH and what else goes with it and work internally with the data in binary form, as described, so i would not save it as ASCII. I would only convert to ASCII at the end of the output. So you could work internally with binary data without much overhead.
I could imagine that such a model would also be better for later expansions, whatever that may be.
Seems to require 64 Bytes for the first and second line and 80 Bytes, if tabs are not used, for the third line. That's 208 Bytes if stored as ASCII/Codepage characters.
If every register is stored in binary form it's only:
13 registers x 2 Bytes + 2 Bytes for the flags register + 6 Bytes for the instruction (covering instructions 1 to 6 Bytes long) = 34 Bytes for these 3 lines on an 8086, and the rest could be reconstructed from this information during output.
This would leave additional space for the FPU registers, the 2 new segment registers of the 386, 2 bytes wider flag register and its 4 byte wide extended registers, and the MMX register of the Pentium MMX.
And with smart, i.e. conditional storage, you could omit the MMX and FPU registers if not needed.
Code size would increase a little bit, but not much, because the output function is required anyway and a more flexible one will not add that much.
Actually, this will require two different forms of encoding the registers, one into the intermediate binary format and another from that to the text output. This would be a considerably large change. RH mode and the silent buffer would be the only users of this.
You are correct, but my assumption was, that it was different before. And here i was wrong. But that's the reason why i used the term regression.
Yes, it would be accurate then.
But the address value or the output is still wrong when you compare the output of the D dump with the output of the S search.
The output of the search command gives the impression that the word "Welt" starts at address 20D6:0111.
This is not ideal, but does work as intended. A 16-bit segment search could add a small hint like +NNNN where the number would be the length of the search string. However, this would not fit for 32-bit segments, or I'd have to shorten the dump to fit this in the line.
But the dump command shows, that it is actually starting at 20D6:0116, because the W is where the 6h is.
Hint: If you use r dco2 or= 333 the debugger will draw headers and trailers with the offsets in D/DB/DW/DD commands.
So to fix that, either the offset address in the output of the search command must be adjusted or the output from hex value 0 to 5 must be filled with blanks, because W starts at 6h.
If you want to omit the search pattern in the output, I would consider this a correct output, as one is used to from Dump.
This would lessen the use of the data dump the longer the search pattern gets, with a completely blank dump eventually for 16-byte or longer patterns. So I won't do this.
(By the way, you can omit the DS: prefix for a search range as ds is already the default.)
Sorry if it took a little longer to answer that. I just needed a break from assembly programming the last few days.
You can use rh in from 1 length 3 for that. I'm not convinced to change the default behaviour because it was very simple to implement based on the silent trace buffering.
rh in from 1 length 3
works, but
rh 1 3
is still confusing, because the last step seems to be counted as step 1, not step 0 like we programmers are used to and how rh with only one number given works.
Thus step 0 would be the next step, which isn't taken so far.
I wrote a new assembly program which looks like this in the code section.
Basically it increments AX until AX reaches 5. It saves me some work to write the output here because I only have to write the AX register.
Then i try different rh commands.
Here the last step is step 3 (rh 0) as expected.
-rh 0
AX=0003...
-
And as you said, rh 0 1 should give step 3 (the last step) counting from 0 backwards only 1 step.
So this is correct too:
-rh 0 1
AX=0003...
-
But rh 0 2, should give us now two register outputs, rh 0 and rh 1, but it doesn't.
Only rh 0 is printed:
-rh 0 2
AX=0003...
-
It's confusing, because i expected that 2 steps are printed.
The command rh 1 2 outputs, what i expected from rh 0 2:
-rh 1 2
AX=0002...
AX=0003...
-
But according to the definition, that rh 0 is the last step, there should be no AX=0003 in the output. Instead, there should be:
AX=0001...
AX=0002...
Which corresponds to individual
-rh 2
AX=0001
and
-rh 1
AX=0002
And rh 1 3 seems to output the same as the above rh 1 2, only 2 register outputs. Expected were 3.
-rh 1 3
AX=0002...
AX=0003...
-
It gets even more confusing with rh 2 3, now I get 3 steps in the output:
-rh 2 3
AX=0001
AX=0002
AX=0003
-
The world seems to look good again if we do the input in reverse. Oldest step (largest number) first, newest step last:
-rh 1 0
AX=0002
AX=0003
-
So a rh 2 1 should give us
AX=0001
AX=0002
Does it? No, we get:
-rh 2 1
AX=0001
-
Which corresponds to:
-rh 2
AX=0001
-
So we just learned rh 2 1 gives us the output of only 1 step, so rh 2 0 should give us 2 steps, right?
No:
-rh 2 0
AX=0001...
AX=0002...
AX=0003...
This is correct, if we expected the output of rh 2, rh 1 and rh 0, but then the above rh 2 1 should work the same way and give us rh 2 and rh 1.
Now I try to print several steps at once, but this time I use a comma so that they need not be contiguous. So this time no range, only individual steps:
-rh 4,3
AX=0000...
AX=0001...
-
Looks good. As expected.
What about:
-rh 4, 2
AX=0000
-
Only rh 4 is printed, no rh 2.
And several unrelated steps:
-rh 4, 2, 0
^ Error
So this seems not to work this way.
rh in from 0 to 3 looks better, it gives at least what i have expected:
-rh in from 0 to 3
AX=0000
AX=0001
AX=0002
AX=0003
-
So i wonder, can this be done in reverse order?
- rh in from 3 to 0
^Error
Sadly no.
Above I tried rh in from 0 to 3, now I wanted to see what rh in from 0 length 3 will give me:
-rh in from 0 length 3
AX=0001
AX=0002
AX=0003
Okay, this is correct, if we assume that length n is the number of steps and not a range.
Personally I think I will stick with rh in from 0 to 3, because it is not confusing and is the range feature I will very likely need most of the time.
But it would be more desirable if rh 0 3 equaled rh in from 0 to 3.
I also modified rc.replace today in the same way to allow using it in RH mode.
Thank you.
It's important to me to work in small steps and commit changesets often rather than prepare large far-reaching changes. This is why the error messages will probably be handled by first adding the new functions, then incrementally changing users to make use of it.
Okay.
Actually, this will require two different forms of encoding the registers, one into the intermediate binary format and another from that to the text output. This would be a considerably large change. RH mode and the silent buffer would be the only users of this.
Why do you need two encodings for this when the text output could be created on the fly from the information encoded in the intermediate binary format?
You only need a text output function that creates the text output from the information in intermediate binary format. And this output function could be used for R, RH and what else goes with it and you would have a silent buffer too, because the encoded intermediate binary format is only internal data, not text output on the screen.
The text output is done by the text output function that takes the encoded intermediate binary format data as parameter.
This is not ideal, but does work as intended. A 16-bit segment search could add a small hint like +NNNN where the number would be the length of the search string. However, this would not fit for 32-bit segments, or I'd have to shorten the dump to fit this in the line.
and
This would lessen the use of the data dump the longer the search pattern gets, with a completely blank dump eventually for 16-byte or longer patterns. So I won't do this.
It would lessen the use of the data dump in some cases, but it would be correct and consistent to dump D. And in cases of an empty dump, this could be supplemented with another line.
But I accept it if you don't want to do it. In this case your suggestion of a +NNNN hint would be the best option, I think. At least for 16-bit real mode programs.
Hint: If you use r dco2 or= 333 the debugger will draw headers and trailers with the offsets in D/DB/DW/DD commands.
Thank you very much for that hint.
I thought about putting this option into my default configuration file, but since two additional lines are then always output when this option is enabled, I changed my mind.
Couldn't this function be made part of the d dump command, so that it's available on a case-by-case basis when you need it? So that you just have to add a parameter to d, and then d automatically displays this help?
The top parameter for outputting non-ASCII characters already exists in d dump, so how about another short parameter word for this help?
Something like ht for header and trailer.
-d top, ht 10,20
; dumps memory with header and trailer help and including non-ASCII characters
-d ht 10,20
; dumps memory with header and trailer help
-
I tried to find a more meaningful term for the ht parameter. help would be good too, but the argument against help is that it might be needed for a built-in online help. And then it would be rather confusing if used somewhere else in a different context.
But you could also simply use the word human, as is known from various Unix command line programs (du -h, free -h etc.) for an output that is easy for humans to understand or read.
-d human 10, 20
; dumps memory with header and trailer help
(By the way, you can omit the DS: prefix for a search range as ds is already the default.)
I know, thanks anyway. I used DS to be more precise.
Last edit: Oliver 2023-07-21
is still confusing, because the last step seems to be counted as step 1, not step 0 like we programmers are used to and how rh with only one number given works.
Thus step 0 would be the next step, which isn't taken so far.
No, the two-parameter form specifies the first parameter (start of dump) exactly the same as the one-parameter form. rh 1 3 asks to start the dump at step 1 (the second-most recent step) and go on for 3 steps (counting down towards the present moment, from 1 to 0 to ...). It doesn't care that there are only 2 steps to show, that's on you as the user.
When I load it in LDEBUG and trace it until AX reaches value 3, I get at step 3 counting from 0:
Then i try different rh commands.
Here the last step is step 3 (rh 0) as expected.
-rh 0
AX=0003...
-
And as you said, rh 0 1 should give step 3 (the last step) counting from 0 backwards only 1 step.
So this is correct too:
-rh 0 1
AX=0003...
-
But rh 0 2, should give us now two register outputs, rh 0 and rh 1, but it doesn't.
Only rh 0 is printed:
-rh 0 2
AX=0003...
-
Same user error as above. You're specifying to start at step rh 0 and then display 2 steps. But there is only the one if you count towards the present moment.
It's confusing, because i expected that 2 steps are printed.
The command rh 1 2 outputs, what i expected from rh 0 2:
-rh 1 2
AX=0002...
AX=0003...
-
But according to the definition, that rh 0 is the last step, there should be no AX=0003 in the output. Instead, there should be:
AX=0001...
AX=0002...
Which corresponds to individual
-rh 2
AX=0001
and
-rh 1
AX=0002
rh 1 2 starts dumping at step rh 1 and then counts up to two steps towards the present moment. So rh 1 2 is like rh 1 then rh 0.
And rh 1 3 seems to output the same as the above rh 1 2, only 2 register outputs. Expected were 3.
-rh 1 3
AX=0002...
AX=0003...
-
Same user error. This is like rh 1 then rh 0 then a no-op as the counter doesn't ever go negative.
It gets even more confusing with rh 2 3, now I get 3 steps in the output:
-rh 2 3
AX=0001
AX=0002
AX=0003
-
Start from step rh 2 then do count down to rh 1 and rh 0. Exactly as I intended.
The world seems to look good again if we do the input in reverse. Oldest step (largest number) first, newest step last:
-rh 1 0
AX=0002
AX=0003
-
0 as the second parameter is (now) special. It will display every step, starting at the step specified by the first parameter, down to step rh 0.
So a rh 2 1 should give us
AX=0001
AX=0002
Does it? No, we get:
-rh 2 1
AX=0001
-
Which corresponds to:
-rh 2
AX=0001
-
This is correct. rh 2 1 means start with step rh 2 and display exactly one step. This is the same as rh 2.
So we just learned rh 2 1 gives us the output of only 1 step, so rh 2 0 should give us 2 steps, right?
No:
-rh 2 0
AX=0001...
AX=0002...
AX=0003...
This is correct, if we expected the output of rh 2, rh 1 and rh 0, but then the above rh 2 1 should work the same way and give us rh 2 and rh 1.
As I mentioned 0 as the second parameter is special, it means all subsequent steps are displayed. As I wrote, rh 2 1 really is the same as rh 2.
Now i try to print several steps at once, but this time i use a comma so that they are not contiguous. So this time no range, only individual steps:
-rh 4,3
AX=0000...
AX=0001...
-
Looks good. As expected.
This means start at step rh 4 (nonexistent) and dump up to 3 steps. It is the same as rh 4 then rh 3 then rh 2. The rh 4 command newly will not output anything because a step that old doesn't exist.
Do note that in the two-parameter RH command form, the comma is completely optional. It has no effect. rh 4,3 is the same as rh 4 3.
By the way, if you run re.replace @r ax . before the t trace commands then the debugger will output only the AX value for each trace step. That should help making examples. Reset this state using re.replace @r.
What about:
-rh 4, 2
AX=0000
-
Only rh 4 is printed, no rh 2.
No, this is step rh 3. The step rh 4 doesn't exist so it produces no output.
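The two-parameter rule explained in the replies above can be modelled in a few lines of Python. This is only a sketch of the behaviour as described in this thread, not the debugger's actual code; the function name `rh_steps` and the `oldest` parameter are made up for illustration. Steps are numbered backwards, with step 0 being the most recent one.

```python
def rh_steps(start, count=1, oldest=3):
    """Return the step numbers that `rh start count` would display.

    Assumptions taken from the explanations in this thread:
    - step 0 is the most recent step, higher numbers are older
    - output starts at `start` and counts down towards step 0
    - `count` is the maximum number of steps; 0 means "down to step 0"
    - steps older than `oldest` do not exist and produce no output
    """
    if count == 0:
        count = start + 1                 # everything from `start` down to 0
    last = max(start - count + 1, 0)      # the counter never goes negative
    return [s for s in range(start, last - 1, -1) if s <= oldest]


# With four recorded steps (3 = oldest, 0 = newest):
print(rh_steps(0, 2))   # [0]
print(rh_steps(1, 2))   # [1, 0]
print(rh_steps(1, 3))   # [1, 0]
print(rh_steps(2, 3))   # [2, 1, 0]
print(rh_steps(2, 1))   # [2]
print(rh_steps(2, 0))   # [2, 1, 0]
print(rh_steps(4, 3))   # [3, 2]
print(rh_steps(4, 2))   # [3]
```

In the session above, step 3 corresponds to AX=0000, step 2 to AX=0001, step 1 to AX=0002 and step 0 to AX=0003, so each list matches the output shown for the corresponding command.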
And several unrelated steps:
-rh 4, 2, 0
^ Error
So this seems not to work this way.
There is no three-parameter form of the command. If you want to pass a list use rh in 4,2,0 (the comma is needed here).
So i wonder, can this be done in reverse order?
rh in from 3 to 0
^Error
Sadly no.
This would require changes to the match range parsing.
Above i tried rh in from 0 to 3, now i wanted to see what rh in from 0 length 3 will give me:
-rh in from 0 length 3
AX=0001
AX=0002
AX=0003
Okay, this is correct, if we assume that length n is the number of steps and not a range.
Personally i think i will stick with
rh in from 0 to 3
because that is the form that is not confusing, and it is the range feature i will very likely need most of the time.
But it would be more desirable if rh 0 3 equaled rh in from 0 to 3.
Actually, this will require two different forms of encoding the registers, one into the intermediate binary format and another from that to the text output. This would be a considerably large change. RH mode and the silent buffer would be the only users of this.
Why do you need two encodings for this when the text output could be created on the fly from the information encoded in the intermediate binary format?
By "encoding" I didn't mean the data layout of the data here. I meant the program logic that "encodes" the data. The two encodings I referred to are the two different handlers needed for your scheme: One that encodes from the debugger variables to the binary/compressed form, and another that decodes the binary form and encodes this data in text form.
The text output is done by the text output function that takes the encoded intermediate binary format data as parameter.
This would be the second encoding in my notation. Creating this binary form would be the first encoding.
But i accept it if you don't want to do it. In this case your suggestion of a +NNNN hint would be the best option, i think. At least for 16-bit real mode programs.
I should get to that some time soon.
Couldn't this function be made part of dump d so it's available on a case-by-case basis when you need it? So that you just have to add a parameter with d, and then d automatically displays this help?
I don't think I will add that.
Thank you, now i understand the working of rh i k.
BTW, there is an issue when using g followed by rh. rh only shows steps that were done by t or p, but not by g.
And when doing some steps followed by L and then some steps again, rh shows the steps before and after the L.
If I understood that correctly, L loads the initial state of the program and, as I understand it, should also flush rh. But that is not the case, the history of the old steps is kept.
There is no three-parameter form of the command. If you want to pass a list use rh in 4,2,0 (the comma is needed here).
Thank you for the hint.
By "encoding" I didn't mean the data layout of the data here. I meant the program logic that "encodes" the data.
I understand. Thank you for your clarification.
I should get to that some time soon.
Sounds good.
Couldn't this function be made part of dump d so it's available on a case-by-case basis when you need it? So that you just have to add a parameter with d, and then d automatically displays this help?
I don't think I will add that.
That's sad to hear. It would be a useful feature.
BTW, there is an issue when using g followed by rh. rh only shows steps that were done by t or p, but not by g.
That would be quite the bug, but I cannot reproduce it. Please list an entire session showing this problem.
And when doing some steps followed by L and then some steps again, rh shows the steps before and after the L.
If I understood that correctly, L loads the initial state of the program and, as I understand it, should also flush rh. But that is not the case, the history of the old steps is kept.
Yes. You can uninstall rh then install rh to discard the earlier entries explicitly.
I made a screenshot but it's not a bug.
I think it's because of the way rh works. When i wrote my last comment, i had forgotten that rh only logs the output. Thus there can of course be no logged steps made by g, except the output Program terminated normally (xxxx).
So it's not a bug, it's just a result of how rh works.
In order to change this, internal logging of the steps would be necessary. But we've already discussed that. And with such internal logging, running a debug session with g until the next breakpoint is hit, for example, would certainly be slowed down if every step were logged internally. So that wouldn't be good either.
Yes. You can uninstall rh then install rh to discard the earlier entries explicitly.
Thank you, that workaround will be useful.
Are there cases where it might be useful to keep the old outputs from before the L command? If not, then you could automatically initiate this uninstall rh and install rh with the L command.
I have a new idea. If you decide one day to integrate internal logging, the disadvantage of a too slow logging g command could be circumvented by simply offering two g commands: one g command that works normally as before without logging, and another command started with gg, thus two g, that uses internal logging.
I think it's because of the way rh works. When i wrote my last comment, i had forgotten that rh only logs the output. Thus there can of course be no logged steps made by g, except the output Program terminated normally (xxxx).
So it's not a bug, it's just a result of how rh works.
Correct. If you happen upon a breakpoint (temporary (gg), permanent (bb) hit or pass, or not managed by the debugger at all; that is "unexpected") then the RE output from the G command is also captured into the RH buffer. But no other output is ever written by G.
In order to change this, internal logging of the steps would be necessary. But we've already discussed that. And with such internal logging, running a debug session with g until the next breakpoint is hit, for example, would certainly be slowed down if every step were logged internally. So that wouldn't be good either.
This is actually what the silent buffer was originally created for. You can use tp ffffff silent 1 (the Fs are just to provide a very large repetition count) and the debugger will stay silent, record the last X steps into the silent/RH buffer, and then once the control flow returns to the user, the debugger will show only the very last step from the buffer. (Omit the number to show the full buffer contents. Omit the silent keyword clause entirely to display every step as it occurs.)
Afterwards, you can use the RH commands like usual to inspect the last X steps still saved in the buffer.
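The idea of staying silent while keeping only the last X steps can be pictured as a ring buffer. The following Python sketch is only an illustration under that assumption; it says nothing about how the debugger actually lays out its silent/RH buffer, and the names `trace_silent`, `capacity` and `show_last` are hypothetical.

```python
from collections import deque


def trace_silent(steps, capacity, show_last=1):
    """Model of a silent trace that keeps only the newest `capacity` dumps.

    `steps` is any iterable of already formatted step dumps (stand-ins for
    the register dump text). Returns the buffer contents and the lines that
    are shown once control returns to the user. `show_last=None` models
    omitting the number, which shows the full buffer contents.
    """
    buffer = deque(maxlen=capacity)      # older entries fall out automatically
    for dump in steps:
        buffer.append(dump)              # nothing is displayed while tracing
    kept = list(buffer)
    shown = kept if show_last is None else kept[-show_last:]
    return kept, shown


dumps = [f"AX={n:04X}" for n in range(10)]
kept, shown = trace_silent(dumps, capacity=4, show_last=1)
print(kept)    # ['AX=0006', 'AX=0007', 'AX=0008', 'AX=0009']
print(shown)   # ['AX=0009']
```

The entries left in `kept` are what a later rh command could still inspect in this model.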
The execution is indeed much slower if you dump registers and disassemble for every step. Even just the overhead of the debugger's tracing (if you disable the register dump and disassembly) is enough to make this hundreds if not thousands of times slower than using the G command.
Are there cases where it might be useful to keep the old outputs from before the L command? If not, then you could automatically initiate this uninstall rh and install rh with the L command.
For questions like this I like to apply a small heuristic: Which choice can emulate the other one completely? If L did this, I would have to either add an option for L not to do this, or I would be unable to issue an L command without resetting the RH buffer. If L continues not to reset the buffer, you can work-around this by explicitly issuing these commands. So I won't change this.
I triggered a bug in the new version 2023-07-24.
Instead of my HELLO.EXE file for testing, i accidentally loaded my HELLO.ASM file.
Then i entered g.
This resulted in the following output:
From this state on, i can no longer exit with q. I can print the help with ? and use commands like rh and u, but the latter two seem to produce garbage and errors in this state.
Only a reboot of my FreeDOS VM helped.
Fortunately this bug is reproducible.
I added my hello world asm file to this bug report so you can test it.
I also tried to see, what happens, when i load my HELLO.EXE executable file.
And there seems to be some issue with the character output after reloading the program with l:
Here q quit works, but something changed the H to a ╠ with the l load command. I also added the HELLO.EXE file, so that you can reproduce it.
I will now test your new features, this will take a little bit.
By the way, I have a new suggestion.
Before loading a file into ldebug, how about checking the file extension, whether it's a .EXE, .COM, .ROM or .BIN file?
And if it's something else, the user should be asked if he really wants to load that file. This can prevent accidental loading of non-executable files, like I did.
This codepoint is what 0CCh, the int3 breakpoint instruction, looks like when you display it to your terminal. By running g 10 you happen to replace the "H" byte of your text by the 0CCh. This is not a bug, it is expected that your data can be corrupted by placing a temporary breakpoint into it.
Hm interesting, the next instruction at offset 0010 after the int 21 instruction to exit to dos is a dec ax, so i assume it will never be executed with g 10.
-u
...
2041:000C B44C mov ah, 4C
2041:000E CD21 int 21
2041:0010 48 dec ax
...
-
Otherwise shouldn't g 10 then cause AX to be decremented by 1?
If i try to simulate it manually:
-g 0E ; run the last instruction before int 21
Hallo Welt!
AX=....
2041:000E CD21 int 21
-p
Program terminated normally (0024)
It seems to be, that IP has changed pointing to 100h, which is something different than this dec AX. I assume it's probably a return point to some DOS routine or ldebug.
So there seems to also be no write access to DS's memory.
And also no output to stdout. And if that were the case, wouldn't the character have to be at the end of the string?
I also tried to see, what happens when i do a traced execution:
-TP 10
...
Hallo Welt!
...
2041:000E CD21 int 21
Program terminated normally (0024)
-
rh also ends at the Program terminated normally message.
Why does g 10 execute another command at all? Shouldn't it end when int 21h ran?
It seems to be, that IP has changed pointing to 100h, which is something different than this dec AX. I assume it's probably a return point to some DOS routine or ldebug.
Calling the DOS terminate process function (21.4C) will indeed return control flow to the parent process, which is the debugger. The debugger will then re-create an empty process.
All registers seem to be set to 0, including IP.
An empty process created by the debugger, as well as a process created from loading a flat-format .COM file, will have a word with the value zero on the stack. (Usually at word [SS:FFFEh].) If the program runs a retn instruction with this stack, it will jump to PSP:0, which holds an int 20h instruction which is another DOS terminate process call. IP and SP happen to be zero then, the other zeroes are from the process initialisation.
So there seems to also be no write access to DS's memory.
The PSP, the retn instruction, and the stack are re-initialised. The other data that may happen to be in the process segment are not overwritten. (But your DS may change from the prior process segment.)
And also no output to stdout. And if that were the case, wouldn't the character have to be at the end of the string?
I don't understand. What output do you expect?
I also tried to see, what happens when i do a traced execution:
TP 10 and G 10 are very different. One specifies to do up to 16 steps, unless interrupted by an unexpected fault. (This includes the process termination.) The other specifies to run code with a temporary breakpoint at address CS:0010h.
Why does g 10 execute another command at all? Shouldn't it end when int 21h ran?
Don't understand this either. Please detail exactly what command you used, what it did, and what you expected instead.
An empty process created by the debugger, as well as a process created from loading a flat-format .COM file, will have a word with the value zero on the stack. (Usually at word [SS:FFFEh].) If the program runs a retn instruction with this stack, it will jump to PSP:0, which holds an int 20h instruction which is another DOS terminate process call. IP and SP happen to be zero then, the other zeroes are from the process initialisation.
That's good to know. Thank you for your answer.
The PSP, the retn instruction, and the stack are re-initialised. The other data that may happen to be in the process segment are not overwritten. (But your DS may change from the prior process segment.)
Thanks.
I don't understand. What output do you expect?
Well, i was wondering about this '╠' character.
Depending on the steps g has to take, it is written over the Hello World! in stdout.
The string "Hello World!" is not changed in memory, so it can't be that. Also, the string is printed out before "int 21" is reached. Therefore, that character can only be printed after it, and in that case it overwrites the string instead of printing it at the cursor position after the string.
Normally, however, I would assume here that a character is always output at the end of the cursor, unless an explicit cursor position has been selected or the cursor has been reset or changed. So after "Hello World!". But that doesn't seem to be the case here.
It looks like the cursor does a carriage return and then, depending on the steps in g n , moves the cursor one character at a time to the right and only then prints that character.
TP 10 and G 10 are very different. One specifies to do up to 16 steps, unless interrupted by an unexpected fault. (This includes the process termination.) The other specifies to run code with a temporary breakpoint at address CS:0010h.
Ah, now i understand.
Now i counted the steps g 10 has to take by listing U.
I counted 7 steps + 1 outside the program after the termination ended to emulate that g 10. So this equals tp 8.
With tp 8 i get at the same position as g 10, but this character '╠' is printed nowhere like it is done with g 10. Strange.
Don't understand this either. Please detail exactly what command you used, what it did, and what you expected instead.
Well, i assumed or expected that when a program terminates before reaching the breakpoint of g, the g command stops before going to that breakpoint.
But that seems not to be the case. G just continues with its mission printing ╠ characters, until the additional steps taken correspond to the breakpoint value.
The int 21 instruction to terminate the program and return to dos/ldebug is at IP = 0E and requires two bytes like expected.
Thus the next command and where the breakpoint is set would be at IP = 10, but this is obviously never reached because of the retn, but G still continues.
And between G 10 and G 19 it prints this special character over the Hallo Welt! output.
And depending on the first digit, the cursor is moved accordingly.
It looks like the following: (Before every g command, i quit ldebug and restarted it with hello.exe loaded.)
Interestingly, whatever value n has in G n, as long as n is greater than 0E, g always stops execution before the retn instruction. This means, if i run r, retn is the next instruction.
But, this ╠ character is still printed.
If i count that ╠ steps, they correspond to the length of the "Hallo Welt!" string + its CR and LF and "$" character.
In the asm code, the string is this:
.DATA
STRING DB "Hallo Welt!",13,10,"$"
Thus 14 bytes. And this ╠ character is printed 14 times between g 10 and g 1D.
-h 1D 10
002D 000D
-h D
000D decimal: 13
There is the relation. The length of the string determines the maximum steps this ╠ character takes in stdout.
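The arithmetic behind these numbers can be checked with a short Python sketch. The helper `h` below only imitates what the H output above shows (16-bit sum and difference in hex); it is a hypothetical stand-in written for this example, not the debugger's implementation.

```python
def h(a, b):
    """Imitate the two-operand H command: 16-bit sum and difference in hex."""
    return f"{(a + b) & 0xFFFF:04X}", f"{(a - b) & 0xFFFF:04X}"


first, last = 0x10, 0x1D            # offsets of the "H" and of the "$"
print(h(last, first))               # ('002D', '000D')
print(last - first)                 # 13, the distance between the offsets
print(last - first + 1)             # 14, counting both end offsets
print(len(b"Hallo Welt!\r\n$"))     # 14
```

The difference 000D is 13, but counting both the first and the last offset gives 14 positions, which is the length of the string including CR, LF and the "$" terminator.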
My point is, g 10 to g 1D should just end after the program termination and not try to print a ╠ character.
The G command allows specifying breakpoints, which are either segmented addresses (86M or PM addresses depending on DebugX's mode) or linear addresses prefixed by an "@ " or "@(", similar to how the BP command allows a breakpoint specification. G breakpoints are identified by their position in the command line, as the 1st, 2nd, 3rd, etc. By default, 16 G breakpoints are supported.
a count may be specified, which causes the command to execute as many P steps as the count indicates.
The fact that you specify a single number to G does not make this a "step count" like the (single) P parameter. It is simply parsed as a single breakpoint specification, writing a temporary breakpoint at the specified segmented address, parsed with a default segment of the current CS.
The string "Hello World!" is not changed in memory, so it can't be that. Also, the string is printed out before "int 21" is reached. Therefore, that character can only be printed after it, and in that case it overwrites the string instead of printing it at the cursor position after the string.
This is wrong. The "G 10" command works like the following sequence. It saves the original contents of the byte [cs:10] then overwrites it with a 0CCh (int3) opcode.
r v0 = byte [cs:10]
r byte [cs:10] = CC
r v1 = cs
r v2 = 10
g
r byte [v1:v2] = v0
If you enter the first two commands into the debugger you can observe that the memory is changed during this sequence:
Normally, however, I would assume here that a character is always output at the end of the cursor, unless an explicit cursor position has been selected or the cursor has been reset or changed. So after "Hello World!". But that doesn't seem to be the case here.
It looks like the cursor does a carriage return and then, depending on the steps in g n , moves the cursor one character at a time to the right and only then prints that character.
This would be right if the memory indeed wasn't changed during the int 21h service 09h call, but as it is all of this is irrelevant.
Ah, now i understand.
Now i counted the steps g 10 has to take by listing U.
I counted 7 steps + 1 outside the program after the termination ended to emulate that g 10. So this equals tp 8.
With tp 8 i get at the same position as g 10, but this character '╠' is printed nowhere like it is done with g 10. Strange.
Not strange at all. TP commands will trace/proceed past single instructions, so the only breakpoints it writes will be like after int instructions or the like. These will all be in the code section of your program, except for the breakpoint after the final int 21h (the one running function 4Ch). But that final breakpoint, while it does overwrite the "H" of your message, will only be set while executing the final int call, not while the function 09h call occurs earlier.
Well, i assumed or expected that when a program terminates before reaching the breakpoint of g, the g command stops before going to that breakpoint.
But that seems not to be the case. G just continues with its mission printing ╠ characters, until the additional steps taken correspond to the breakpoint value.
No. As explained before, the temporary breakpoint you are writing is in the data part of your program. After DOS returns to the debuggee process's PRA (Parent Return Address), that is to the debugger, the G command will finally restore its breakpoints if possible (if they haven't been overwritten yet). Then the G command detects that no pass or non-pass bb point was executed, but rather that the PRA was entered, so it will display the message about "Program terminated normally" and return to the debugger command line. (The cmd3 command loop of the debugger will re-create an empty process when it detects that the current process has terminated.)
The int 21 instruction to terminate the program and return to dos/ldebug is at IP = 0E and requires two bytes like expected.
Thus the next command and where the breakpoint is set would be at IP = 10, but this is obviously never reached because of the retn, but G still continues.
And between G 10 and G 19 it prints this special character over the Hallo Welt! output.
This happens because your example's data section can be addressed from your example's code segment (behind the end of your code section). If something else, such as the bottom of the stack section, was at this address then you wouldn't get the same result.
And depending on the first digit, the cursor is moved accordingly.
It looks like the following: (Before every g command, i quit ldebug and restarted it with hello.exe loaded.)
If it doesn't crash you can reload the program by using the no-parameter "L" command. (Or in the case of loading eg "HELLO.ASM", you can use the "QA" command then the "L" command. I noticed that the behaviour of the debugger differs for this case as it will not re-initialise the CS:IP registers then. This is a holdover from MSDebug.)
~~~
-g 0f
Hallo Welt
Invalid opcode
AX=0005...
...IP=0053...
2041:0053 6345D7 arpl [di-29],ax DS:D660=0000
; interesting where this have us taken. Strange new worlds. :)
~~~
This is because you overwrote the second byte of the int 21h (CDh 21h) instruction with the CCh byte (int3 single-byte instruction). So you just changed int 21h to int 0CCh which presumably crashes sooner or later.
~~~
-g 1D
Hallo Welt!
╠
.... ; prints the character in a new line and after that in the next line a lot of garbage
~~~
In this case you overwrote the dollar-sign (U+0024) terminator with the 0CCh so after the CR LF linebreak you get that point and then garbage afterwards as DOS continues to display things until it randomly encounters a 24h byte.
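The effect of a temporary breakpoint landing in the string data can be reproduced with a small Python model. It is a sketch under simplifying assumptions: `image` uses 16 placeholder bytes instead of the real code, `tail` is a made-up stand-in for whatever memory follows the string, and `dos_print_string` only imitates the "display until the $ terminator" behaviour of int 21h function 09h.

```python
def dos_print_string(memory, offset):
    """Imitate int 21h function 09h: output bytes up to the '$' terminator."""
    end = memory.index(b"$", offset)
    return memory[offset:end].decode("cp437")


def run_with_temp_breakpoint(image, text_offset, bp_offset):
    """Write 0CCh at bp_offset, 'run' the print call, then restore the byte."""
    memory = bytearray(image)
    saved = memory[bp_offset]          # like: r v0 = byte [cs:10]
    memory[bp_offset] = 0xCC           # like: r byte [cs:10] = CC
    shown = dos_print_string(memory, text_offset)
    memory[bp_offset] = saved          # like: r byte [v1:v2] = v0
    return shown, bytes(memory)


code = bytes(0x10)                     # placeholder for the 16 code bytes
tail = b"??$"                          # hypothetical bytes after the string
image = code + b"Hallo Welt!\r\n$" + tail

shown, after = run_with_temp_breakpoint(image, 0x10, 0x10)
print(repr(shown))                     # '╠allo Welt!\r\n'
print(after == image)                  # True, the byte is restored afterwards

shown, _ = run_with_temp_breakpoint(image, 0x10, 0x1D)
print(repr(shown))                     # 'Hallo Welt!\r\n╠??'
```

In this model the string in memory is intact again after the run, which fits the observation that the text looks unchanged when inspected afterwards even though the displayed output was corrupted.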
Interestingly, whatever value n has in G n, as long as n is greater than 0E, g always stops execution before the retn instruction. This means, if i run r, retn is the next instruction.
But, this ╠ character is still printed.
Exactingly, G stops execution once the program control flow enters the debugger's PRA it set up for the debuggee process. The command line loop of the debugger then detects that the process terminated and subsequently initialises an empty process with the retn instruction. This may or may not happen to be at the same segment address as your program's process was.
My point is, g 10 to g 1D should just end after the program termination and not try to print a ╠ character.
As explained, there is no bug here. You're happening to access your data section using your program's code segment, and that corrupts your string data. The G command does not run anything after the program terminated. (Except for the creation of a new empty process, but that's in the debugger's cmd3 command loop.)
The parameter (number) that you give to G is not "the steps G has to take". The parameter for G is completely different from the parameter for T/TP/P.
Yes, these are addresses, but with steps i meant complete instruction steps.
I also said:
Now i counted the steps g 10 has to take by listing U.
I counted 7 steps + 1 outside the program after the termination ended to emulate that g 10. So this equals tp 8.
So i took that into account, that these are actual addresses.
I probably worded it a bit poorly.
The fact that you specify a single number to G does not make this a "step count" like the (single) P parameter. It is simply parsed as a single breakpoint specification, writing a temporary breakpoint at the specified segmented address, parsed with a default segment of the current CS.
I agree.
This is wrong. The "G 10" command works like the following sequence. It saves the original contents of the byte [cs:10] then overwrites it with a 0CCh (int3) opcode.
r v0 = byte [cs:10]
r byte [cs:10] = CC
r v1 = cs
r v2 = 10
g
r byte [v1:v2] = v0
If you enter the first two commands into the debugger you can observe that the memory is changed during this sequence:
Thank you for the clarification.
Now i understand it.
Suggestion:
How about an additional column when outputting the assembler listing with the u command?
There is still free space on the right side and you could display the opcodes in the second column as ASCII characters in this new last column.
In cases where the code and data segments are the same, this would immediately show where the data section starts and whether a readable string is present.
Example:
-u C
2041:000C B44C mov ah, 4C .L
2041:000E CD21 int 21 .!
2041:0010 48 dec ax H
2041:0011 61 popa a
2041:0012 6C insb l
2041:0013 6C insb l
2041:0014 6F outsw o
2041:0015 205765 and [bx+65], dl We
2041:0018 6C insb l
2041:0019 7421 jz 003C t!
...
-
In this u listing, it is clearly visible, that at offset 0010 begins the data area with a 'H' character and when read vertically it is the string "Hallo Welt!".
And in the code section it is also usable to directly see, what kind of ASCII character is copied into or from a register.
Example:
1234:0000 B433 mov ah, 33 .3
It might be only a nice to have feature, but if the code size allows it and doesn't consume too much RAM, it might still be useful.
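To make the suggestion concrete, here is a Python sketch of how such a column could be derived from the opcode bytes. The column widths and the helper names `ascii_column` and `format_line` are assumptions for this illustration only; bytes outside the printable ASCII range are shown as a dot, as in the example listing above.

```python
def ascii_column(opcode_bytes):
    """Render machine code bytes as printable ASCII, '.' for everything else."""
    return "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in opcode_bytes)


def format_line(address, opcode_hex, disassembly):
    """Append the proposed ASCII column to one (hypothetical) listing line."""
    raw = bytes.fromhex(opcode_hex)
    return f"{address} {opcode_hex:<8} {disassembly:<20} {ascii_column(raw)}"


print(format_line("2041:000C", "B44C", "mov ah, 4C"))
print(format_line("2041:000E", "CD21", "int 21"))
print(format_line("2041:0010", "48", "dec ax"))
print(format_line("2041:0015", "205765", "and [bx+65], dl"))
```

Note that the byte 20h is a printable space, so the last line gets the column " We" with a leading blank.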
Not strange at all. TP commands will trace/proceed past single instructions, so the only breakpoints it writes will be like after int instructions or the like. These will all be in the code section of your program, except for the breakpoint after the final int 21h (the one running function 4Ch). But that final breakpoint, while it does overwrite the "H" of your message, will only be set while executing the final int call, not while the function 09h call occurs earlier.
and
No. As explained before, the temporary breakpoint you are writing is in the data part of your program. After DOS returns to the debuggee process's PRA (Parent Return Address), that is to the debugger, the G command will finally restore its breakpoints if possible (if they haven't been overwritten yet). Then the G command detects that no pass or non-pass bb point was executed, but rather that the PRA was entered, so it will display the message about "Program terminated normally" and return to the debugger command line. (The cmd3 command loop of the debugger will re-create an empty process when it detects that the current process has terminated.)
I understand. Thank you for the clarification.
This happens because your example's data section can be addressed from your example's code segment (behind the end of your code section). If something else, such as the bottom of the stack section, was at this address then you wouldn't get the same result.
and
In this case you overwrote the dollar-sign (U+0024) terminator with the 0CCh so after the CR LF linebreak you get that point and then garbage afterwards as DOS continues to display things until it randomly encounters a 24h byte.
I understand and i agree.
Exactingly, G stops execution once the program control flow enters the debugger's PRA it set up for the debuggee process. The command line loop of the debugger then detects that the process terminated and subsequently initialises an empty process with the retn instruction. This may or may not happen to be at the same segment address as your program's process was.
and
As explained, there is no bug here. You're happening to access your data section using your program's code segment, and that corrupts your string data. The G command does not run anything after the program terminated. (Except for the creation of a new empty process, but that's in the debugger's cmd3 command loop.)
Thank you for your clarification.
The hello.asm file does crash eventually for me if I use T commands, but QA, L, and Q commands still work at that point. If I do use the G command it crashes my dosemu2 machine, either directly (-E "ldebug c:\bin\lddebug.com hello.asm") or similarly to your case I get an "Invalid opcode" fault and then the debugger doesn't work correctly any longer (-E "ldebug.com hello.asm").
However, this is unlikely to be due to a debugger bug. When the machine crashes, it can corrupt parts of its own process, of the DOS, or of the debugger.
However, this is unlikely to be due to a debugger bug. When the machine crashes, it can corrupt parts of its own process, of the DOS, or of the debugger.
I agree that if random data is loaded as program code, strange things can happen when that data is executed as code.
But that's why i recommended checking the extension of the file that is loaded, to inform the user and prevent such accidents by asking again whether he is sure he wants to load data as program code.
The crash of dosemu2 shows that dosemu2 works differently from the QEMU emulator i use. In QEMU, loading this data in ldebug allows it to do random stuff inside the memory of the emulated machine without crashing the VM.
But that's why i recommended checking the extension of the file that is loaded, to inform the user and prevent such accidents by asking again whether he is sure he wants to load data as program code.
I understand where you're coming from, but I think I'll just chalk this up to user error. You were able to figure out the cause of your problem by yourself, after all. I could add an option for this but I would not want to include this code in the default build, as the lCDebugX build is very close to 65_536 bytes of the code segment being filled, already. So if you wanted this option either you would have to build the debugger yourself or I would have to add a special build with options like this one enabled. (Trivially possible, but a bit more work for me. And would have to decide what to call this build.)
The crash of dosemu2 shows that dosemu2 works differently from the emulator QEMU i use. In QEMU, loading this data in ldebug allows doing random stuff inside the memory of the emulated machine, without crashing the VM.
Yes, dosemu2 is not as robust as qemu in this regard I would guess.
@C.Masloch
Discussion about LDEBUG and these two new feature requests.
Basically they are:
2)
Scrolling support for the output of a LDEBUG session. So that you can scroll some pages back and see the output of previous debugging steps.
But i don't know if that is possible to implement in DOS, that's why i also have suggestions 3:
3)
A history function in LDEBUG that shows the previous register states of previous steps.
Here I could imagine two possibilities, either you only save the register states in the history that have changed from the last n steps and reconstruct the rest for the output starting from the current register status.
Or you save all register states of the last n steps completely.
The former takes up more space in the code segment, the latter in the data segment. And you said that you have still some free space left in the data segment, so this could be used for such a helpful function.
The history is only about showing the previous register states of the last n steps, so it is not meant to be a rollback of the program to a previous program execution state.
If it is also possible to store the output of the debugged program itself, this would be even better. But i don't know if that is possible.
The output of the register state history should be page by page, but also selective.
As a command name I would suggest dh as dump history. As an alternative history or sh for show history could also be possible.
Example:
I implemented the RH mode and corresponding RH command.
Possible yes, at least for plain text output, but you would have to hook int 21h and/or int 10h. I think debugging with a serial terminal or with video screen swapping are alternatives to this though.
I chose RH for Register (dump) History. You have to enable RH mode using install rh. Then any subsequent R, RE, or T/P/G command output is stored in the auxiliary buffer, which is currently a fixed size of about 8 KiB. (Can be made larger, but for now only with a build time option. Considering a startup time option for a larger size.) Unlike your suggestion this stores the text displayed rather than the register values in binary.
The benefit is that other messages like permanent breakpoint hits and the disassembly, including jumping and memory access notices, are also preserved. And it was fairly easy to implement, re-using a lot of the silent tracing buffering code.
The big disadvantage is the memory use. At 8 KiB we can store about forty steps at a time, steps older than that are lost. Each dump (using default RE buffer content) comes in at about 200 bytes. Storing 8086 registers (excluding the 386 parts) would be about 8 times smaller, but that's without the disassembly.
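The "about forty steps" figure follows from dividing the buffer size by the dump size. As a rough sketch only (Python, not the debugger's own code; the class name TextHistory is hypothetical, and the exact values 8192 and 200 are assumptions taken from the "about 8 KiB" and "about 200 bytes" estimates above), a byte-budgeted text history that loses the oldest dumps first could be modelled like this:

```python
from collections import deque

BUFFER_BYTES = 8 * 1024   # assumed: "about 8 KiB" auxiliary buffer
DUMP_BYTES = 200          # assumed: "about 200 bytes" per text dump


class TextHistory:
    """Hypothetical model: keeps the newest dumps that fit into a byte budget."""

    def __init__(self, budget=BUFFER_BYTES):
        self.budget = budget
        self.used = 0
        self.dumps = deque()          # oldest on the left, newest on the right

    def append(self, text):
        self.dumps.append(text)
        self.used += len(text)
        # Drop the oldest dumps until the content fits into the budget again.
        while self.used > self.budget:
            self.used -= len(self.dumps.popleft())

    def get(self, step):
        """step 0 is the most recent dump, 1 the second-most recent, etc."""
        return self.dumps[-1 - step]


history = TextHistory()
for n in range(50):
    # Each fake dump is padded to the assumed dump size.
    history.append(f"step {n:02d}".ljust(DUMP_BYTES))

print(len(history.dumps))            # number of dumps still retained
print(history.get(0).strip())        # most recent dump
print(BUFFER_BYTES // DUMP_BYTES)    # 200-byte dumps that fit into 8 KiB
```

With these assumed sizes, the ten oldest of the fifty dumps are gone afterwards, which matches the "steps older than that are lost" behaviour described above.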
Anyway, the RH command has three modes:
RH: Display entire saved contents.
RH number: Display one dump from saved content. With 0 the most recent dump is displayed, 1 is the second-most recent dump, etc. If too high then the oldest dump is shown.
RH number number: First number is start, same as for single-parameter form. Second number is how many dumps to display at most.
The output of RH is paged by default. DCO3 options for the silent dump can be modified to disable paging:
So you want r dco3 = dco3 clr 200 or 100 to disable paging for RH.
To use the RH command you should enable RH mode using install rh. (This command will clear the RH buffers when enabling RH mode.) However, this disables commands that require the auxiliary buffer, like RN, RM, re.replace, DIL, and S. Boot loaded mode Y script file reading also requires the auxiliary buffer. The Q command will disable RH mode in order to operate using the auxiliary buffer.
These are great news. I have never had the case that feature requests are implemented as quickly as yours. My highest respect and a big thank you.
Ok, It doesn't matter.
Sounds good.
40 steps is more than enough. I assumed 20, which should be sufficient in most cases. But 40 is even better.
I noticed, that if you start with a smaller number than an older step only one step is printed.
For example RH 0 3. This prints only step 0.
Also there seems to be a bug with the ordering.
See the screenshots. There i did a:
Is it possible to add a fourth mode, where an enumeration of certain steps is given and a separator is used.
Example:
This could be handy if steps 9, 7, 6, and 4 are not important and thus should not be shown.
You could print them individually of course, but this will always take one line for the input of the next RH number command.
I could live with not being able to use RM, it's quite unlikely that i will debug DOS programs that use MMX registers. I am unsure about RN and DIL.
But no S is definitely a loss.
How much work would it be to write a unified register output function that takes a struct of register values as binary values (in C language expression integers and pointers), as well as the other information you said, like permanent breakpoint hits, the disassembly, including jumping and memory access notices, and a step number and converts them to ASCII text and outputs them?
So you could use this function for all commands t, p, r and rh and you could save a lot of storage space in the auxiliary buffer, so that S, RN, RM, RH and DIL would be possible again at the same time.
It also seems that S search does have a regression (INSTALL RH is not used here). Its HEX and ASCII output doesn't match that from D dump. Basically the Hex numbers and ASCII text of the search string is not printed and ignored, which leads to having start the output at the wrong memory address.
I made a screenshot of this, there it is easily visible.
Did you mean Q RH or QRH or Q H?
Using these leads to an error message and only Q exits LDEBUG.
The number may be a little lower if you have many memory references, or if you use the register change highlighting as you do.
Yes, this is intended.
I don't think there is any bug here. The steps are:
5 (oldest) = mov ds, ax
4 = mov dx, 0
3 = mov ah, 09
2 = int 21
1 = mov ah, 4C
0 (most recent) = int 21
Correct, and as intended.
Also correct. The start step is 1, and then up to 3 steps starting from it are displayed. As there are only 2, steps 1 then 0 are displayed.
Your description is wrong, this does display step 1 (mov ah, 4C) and also step 0 the second int 21. The display in your screenshot is exactly what I intended.
(Writing of which, on Linux dosemu2 you can copy text from the dosemu2 graphical window when pressing Shift it seems.)
If you want to include a dump from before the first trace, run install rh then a plain r command before t commands.
As a workaround you could use rc.replace @rh 3; @rh 5; @rh 8; @rh 10 then run rc. (No, wait, you cannot use rc.replace while RH mode is active. Hmm.) However, I can probably squeeze an RH IN number, number, ... syntax into the code segment later.
I prepared a small patch today https://hg.pushbx.org/ecm/ldebug/rev/14a4c72ffbab which allows to use S while in RH mode. It uses the (new) WHILE buffer, which means S called from T/TP/P with a WHILE condition with RH mode or silent tracing enabled will fail. However, outside use in the RE buffer the S command will always work now.
I am considering a different form of compression but either way will probably eat a lot of code space size.
This is intended. It is expected that you already know the search string, so the S dump displays the 16 bytes after the end of the search pattern match. This is documented in https://pushbx.org/ecm/doc/ldebug.htm#cmds
And this is not a regression: Even if you consider it a bug, it always worked like this so it didn't regress at any point. (The term "regression" to me means something that used to work no longer working the same way.)
No, to disable RH mode without quitting the debugger just use uninstall rh. I meant that a plain Q command asks for the auxiliary buffer, which is in use if RH mode is enabled. Instead of failing the Q command with an error message (which annoyed me), Q will now disable RH mode first before attempting to carry out the Q command's intended task which is to quit the debugger. (If the Q command fails then RH mode stays disabled.)
I also made a small change today so that the QA command will no longer need the auxiliary buffer nor disable RH mode. https://hg.pushbx.org/ecm/ldebug/rev/5e51cb3e6dc7
That's confusing. The last step, most recent is 0 in the history, counting backwards from there, why not just printing step 1 to step 3:
3 = mov ah, 09
2 = int 21
1 = mov ah, 4C
and not:
1 = mov ah, 4C
0 (most recent) = int 21
From a user perspective the question from the program to the user should be:
And the user answers:
And then the program should print steps 1 to 3, not 0 and 1.
The latter is quite confusing and complicates more than required.
No, with first int 21 i meant the first int 21 in the program execution.
According to your above list, this would be:
2 = int 21 where AX = 09E7, after step 3 with (mov ah, 09)
And when entering rh 1 2 i expected:
2 = int 21 where AX = 09E7
1 = mov ah, 4C
Not:
1 = mov ah, 4C
0 (most recent) = int 21 where AX = 4C24
That's good to know. Thank you for the information.
Ah, i understand.
Suggestion:
Is it possible to do a call of r without printing an output under the hood after entering install rh, that way the history is already fed with the first r until it gets overwritten after n steps.
Sounds okay, it's better than not having that mode available.
That sounds better. I will need to test it tomorrow when the new binary is compiled and available.
If you ask me, if feature after feature is added, it's sometime a good idea to do a refactoring. That would also be a good opportunity to add the error code and error messages thing.
I would use a uniform output for R, RH and what else goes with it and work internally with the data in binary form, as described, so i would not save it as ASCII. I would only convert to ASCII at the end of the output. So you could work internally with binary data without much overhead.
I could imagine that such a model would also be better for later expansions, whatever that may be.
A simple line of R that looks like:
Seems to require 64 Bytes each for the first and second line and 80 Bytes, if tabs are not used, for the third line. That's 208 Bytes if stored as ASCII/Codepage characters.
If every register is stored in binary form it's only:
13 registers x 2 Bytes + 2 Bytes for the Flag registers + 6 Bytes for the instruction (covers instructions from 1 Byte to 6 Bytes long) = 34 Bytes for these 3 lines on an 8086, and the rest could be reconstructed from this information during the output.
This would leave additional space for the FPU registers, the 2 new segment registers of the 386, 2 bytes wider flag register and its 4 byte wide extended registers, and the MMX register of the Pentium MMX.
And with smart, i.e. conditional storage, you could omit the MMX and FPU registers if not needed.
Code size would increase a little bit, but not much, because the output function is required anyway and a more flexible one will not add that much.
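To make the 34 Byte figure concrete, here is a minimal sketch in Python (used only as an illustration, LDEBUG itself is written in assembly). The record layout, the register order, and the names pack_step and unpack_step are assumptions made for this example, not something the debugger implements, and the register values are made up:

```python
import struct

# Assumed layout (hypothetical, for illustration only): 13 word registers,
# then the flags word, then 6 instruction bytes padded with zeros.
REGISTERS = ("ax", "bx", "cx", "dx", "sp", "bp", "si", "di",
             "ds", "es", "ss", "cs", "ip")
RECORD = struct.Struct("<13HH6s")   # little-endian, no alignment padding


def pack_step(regs, flags, code):
    """Pack one trace step into the binary record."""
    values = [regs[name] for name in REGISTERS]
    return RECORD.pack(*values, flags, code[:6])


def unpack_step(record):
    """Inverse of pack_step; returns (regs, flags, code)."""
    *values, flags, code = RECORD.unpack(record)
    return dict(zip(REGISTERS, values)), flags, code


regs = dict.fromkeys(REGISTERS, 0)
regs.update(ax=0x09E7, cs=0x20D6, ip=0x000C)    # made-up example values
record = pack_step(regs, 0x0202, b"\xCD\x21")   # CD 21 encodes int 21h

print(RECORD.size)          # bytes per binary record
print(64 + 64 + 80)         # bytes per text dump, per the estimate above
```

Note that this sketch does not store the instruction length, so the text output function would have to disassemble the 6 bytes again to find out where the instruction ends.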
I understand. Ok then it is my fault. I thought the search pattern is also shown, so that the user knows, that this search string is really in the shown memory range.
You are correct, but my assumption was, that it was different before. And here i was wrong. But that's the reason why i used the term regression.
But the address value or the output is still wrong when you compare the output of the D dump with the output of the S search .
The output of the search command gives the impression that the word "Welt" starts at address 20D6:0111.
But the dump command shows, that it is actually starting at 20D6:0116, because the W is where the 6h is.
So to fix that, either the offset address in the output of the search command must be adjusted or the output from hex value 0 to 5 must be filled with blanks, because W starts at 6h.
If you want to omit the search pattern in the output, i would consider this a correct output, as one is used to from Dump.
I understand.
Sounds good.
Last edit: Oliver 2023-07-10
You can use rh in from 1 length 3 for that. I'm not convinced to change the default behaviour because it was very simple to implement based on the silent trace buffering.
I would not want that always, so this would need an additional option. Furthermore, it would require more handling to do a silent R command dump. The workaround of needing to run R yourself is not too bad I think.
I also modified rc.replace today in the same way to allow using it in RH mode.
It's important to me to work in small steps and commit changesets often rather than prepare large far-reaching changes. This is why the error messages will probably be handled by first adding the new functions, then incrementally changing users to make use of it.
Actually, this will require two different forms of encoding the registers, one into the intermediate binary format and another from that to the text output. This would be a considerably large change. RH mode and the silent buffer would be the only users of this.
Yes, it would be accurate then.
This is not ideal, but does work as intended. A 16-bit segment search could add a small hint like +NNNN where the number would be the length of the search string. However, this would not fit for 32-bit segments, or I'd have to shorten the dump to fit this in the line.
Hint: If you use r dco2 or= 333 the debugger will draw headers and trailers with the offsets in D/DB/DW/DD commands.
This would lessen the use of the data dump the longer the search pattern gets, with a completely blank dump eventually for 16-byte or longer patterns. So I won't do this.
(By the way, you can omit the DS: prefix for a search range as ds is already the default.)
Sorry if it took a little longer to answer that. I just needed a break from assembly programming the last few days.
works, but
is still confusing, because the last step seems to be counted as step 1, not step 0 like we programmers are used to and how rh with only one number given works.
Thus step 0 would be the next step, which isn't taken so far.
I wrote a new assembly program which looks like this in the code section.
Basically it increments AX until AX reaches 5. It saves me some work to write the output here because I only have to write the AX register:
When i load it in LDEBUG and trace it until AX reaches value 3, i get at step 3 counting from 0:
Then i try different rh commands.
Here the last step is step 3 (rh 0) as expected.
And as you said, rh 0 1 should give step 3 (the last step) counting from 0 backwards only 1 step.
So this is correct too:
But rh 0 2, should give us now two register outputs, rh 0 and rh 1, but it doesn't.
Only rh 0 is printed:
It's confusing, because i expected that 2 steps are printed.
The command rh 1 2 outputs, what i expected from rh 0 2:
But according to the definition, that rh 0 is the last step, there should be no AX=0003 in the output. Instead, there should be:
Which corresponds to individual
and
And rh 1 3 seems to output the same as the above rh 1 2, only 2 register outputs. Expected were 3.
It gets even more confusing with rh 2 3, now i get 3 steps in the output:
The world seems to look good again if we do the input in reverse. Oldest step (largest number) first, newest step last:
So a rh 2 1 should give us
Does it? No, we get:
Which corresponds to:
So we just learned rh 2 1 gives us the output of only 1 step, so rh 2 0 should give us 2 steps, right?
No:
This is correct, if we expected the output of rh 2, rh 1 and rh 0, but then the above rh 2 1 should work the same way and give us rh 2 and rh 1.
Now i try to print several steps at once, but this time i use a comma to have them not coherently. So this time no range, only individual steps:
Looks good. As expected.
What about:
Only rh 4 is printed, no rh 2.
And several unrelated steps:
So this seems not to work this way.
rh in from 0 to 3 looks better, it gives at least what i have expected:
So i wonder, can this be done in reverse order?
Sadly no.
Above i tried rh in from 0 to 3, now i wanted to see what rh in from 0 length 3 will give me:
Okay, this is correct, if we assume that length n is the number of steps and not a range.
Personally i think i will stick with rh in from 0 to 3 because that's what is not confusing and will be the range feature i will very likely need the most time.
But it would be more desirable if rh 0 3 equaled rh in from 0 to 3.
Thank you.
Okay.
Why do you need two encodings for this when the text output could be created on the fly from the information encoded in the intermediate binary format?
You only need a text output function that creates the text output from the information in intermediate binary format. And this output function could be used for R, RH and what else goes with it and you would have a silent buffer too, because the encoded intermediate binary format is only internal data, not text output on the screen.
The text output is done by the text output function that takes the encoded intermediate binary format data as parameter.
and
It would lessen the use of the data dump in some cases, but it would be correct and consistent to dump D. And in cases of an empty dump, this could be supplemented with another line.
But i accept, when you don't want do it. In this case your suggestion of a +NNNN hint would be the best option, i think. At least for 16 Bit real mode programs.
Thank you very much for that hint.
I thought about taking this option into my default configuration file, but since then two additional lines always have to be outputted when this option is enabled, I changed my mind.
Couldn't this function be made part of dump d so it's available on a case use basis when you need it? So that you just have to add a parameter with d, and then d automatically displays this help?
The top parameter for outputting non-ASCII characters already exists in d dump, so how about another short parameter word for this help?
Something like ht for header and trailer.
I tried to find a more meaningful term for the ht parameter, help would be good too, but the argument against help is that it might be necessary for a built-in online help. And then that would be rather confusing if used somewhere else in a different context.
But you could also simply use the word human, as is known from various Unix command line programs (du -h, free -h etc.) for an output that is easy for humans to understand or read.
I know, thanks anyway. I used DS to be more precise.
Last edit: Oliver 2023-07-21
No, the two-parameter form specifies the first parameter (start of dump) exactly the same as the one-parameter form. rh 1 3 asks to start the dump at step 1 (the second-most recent step) and go on for 3 steps (counting down towards the present moment, from 1 to 0 to ...). It doesn't care that there are only 2 steps to show, that's on you as the user.
Same user error as above. You're specifying to start at step rh 0 and then display 2 steps. But there is only the one if you count towards the present moment.
rh 1 2 starts dumping at step rh 1 and then counts up to two steps towards the present moment. So rh 1 2 is like rh 1 then rh 0.
Same user error. This is like rh 1 then rh 0 then a no-op as the counter doesn't ever go negative.
Start from step rh 2 then do count down to rh 1 and rh 0. Exactly as I intended.
0 as the second parameter is (now) special. It will display every step, starting at the step specified by the first parameter, down to step rh 0.
This is correct. rh 2 1 means start with step rh 2 and display exactly one step. This is the same as rh 2.
As I mentioned 0 as the second parameter is special, it means all subsequent steps are displayed. As I wrote, rh 2 1 really is the same as rh 2.
This means start at step rh 4 (nonexistent) and dump up to 3 steps. It is the same as rh 4 then rh 3 then rh 2. The rh 4 command newly will not output anything because a step that old doesn't exist.
Do note that in the two-parameter RH command form, the comma is completely optional. It has no effect. rh 4,3 is the same as rh 4 3.
By the way, if you run re.replace @r ax . before the t trace commands then the debugger will output only the AX value for each trace step. That should help making examples. Reset this state using re.replace @r.
No, this is step rh 3. The step rh 4 doesn't exist so it produces no output.
There is no three-parameter form of the command. If you want to pass a list use rh in 4,2,0 (the comma is needed here).
This would require changes to the match range parsing.
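The rules for the two-parameter form described in the replies above can be summarised in a small model. This is only a sketch in Python of the behaviour as explained in this thread, not the debugger's actual code; the function name rh_steps and the stored parameter (the number of dumps currently in the buffer) are made up for the illustration:

```python
def rh_steps(start, count, stored):
    """Return the step numbers that 'RH start count' would display.

    Step 0 is the most recent dump. The dump begins at 'start' and counts
    down towards step 0 for at most 'count' steps. A count of 0 means
    "all steps down to step 0". Steps that do not exist produce no output.
    """
    if count == 0:
        last = 0
    else:
        last = max(start - count + 1, 0)   # the counter never goes negative
    return [step for step in range(start, last - 1, -1) if step < stored]


# Four stored dumps (steps 3, 2, 1, 0), as in the AX example above.
for start, count in [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (2, 1), (2, 0), (4, 3)]:
    print(f"rh {start} {count} ->", rh_steps(start, count, stored=4))
```

Under this model rh 0 2 shows only step 0, rh 1 3 shows steps 1 and 0, and rh 4 3 shows steps 3 and 2, which is what the replies above describe.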
By "encoding" I didn't mean the data layout of the data here. I meant the program logic that "encodes" the data. The two encodings I referred to are the two different handlers needed for your scheme: One that encodes from the debugger variables to the binary/compressed form, and another that decodes the binary form and encodes this data in text form.
This would be the second encoding in my notation. Creating this binary form would be the first encoding.
I should get to that some time soon.
I don't think I will add that.
Thank you, now i understand the working of rh i k.
BTW, there is an issue when using g followed by rh. rh only shows steps that were done by t or p, but not by g.
And when doing some steps followed by L and then some steps again, rh shows the steps before and after the L.
If I understood that correctly, L loads the initial state of the program and, as I understand it, should also flush rh. But that is not the case, the history of the old steps is kept.
Thank you for the hint.
I understand. Thank you for your clarification.
Sounds good.
That's sad to hear. It would be a useful feature.
That would be quite the bug, but I cannot reproduce it. Please list an entire session showing this problem.
Yes. You can uninstall rh then install rh to discard the earlier entries explicitly.
Session not showing the G/RH bug:
I made a screenshot but it's not a bug.
I think it's because of the way how rh works. When i wrote my last comment, i have forgotten that rh only logs the output. Thus there can be of course no steps logged made by g, except the output Program terminated normally (xxxx).
So it's not a bug, it's just a result of how rh works.
In order to change this, internal logging of the steps would be necessary. But we've already discussed that. And with such an internal logging of the steps, for example, the execution of a debug session with g until the next breakpoint appears, would certainly slow down the execution if every step were logged internally. So that wouldn't be good either.
Thank you, that workaround will be useful.
Are there cases where it might be useful to keep the old outputs from before the L command? If not, then you could automatically initiate this uninstall rh and install rh with the L command.
I have a new idea. If you decide one day to integrate internal logging this disadvantage of a too slow logging g command could be circumvented, by just offering two g commands. One g command that works normally as before without logging and another command started with gg, thus two g, that uses internal logging.
Correct. If you happen upon a breakpoint (temporary (gg), permanent (bb) hit or pass, or not managed by the debugger at all; that is "unexpected") then the RE output from the G command is also captured into the RH buffer. But no other output is ever written by G.
This is actually what the silent buffer was originally created for. You can use tp ffffff silent 1 (the Fs are just to provide a very large repetition count) and the debugger will stay silent, record the last X steps into the silent/RH buffer, and then once the control flow returns to the user, the debugger will show only the very last step from the buffer. (Omit the number to show the full buffer contents. Omit the silent keyword clause entirely to display every step as it occurs.)
Afterwards, you can use the RH commands like usual to inspect the last X steps still saved in the buffer.
The execution is indeed much slower if you dump registers and disassemble for every step. Even just the overhead of the debugger's tracing (if you disable the register dump and disassembly) is enough to make this hundreds if not thousands of times slower than using the G command.
For questions like this I like to apply a small heuristic: Which choice can emulate the other one completely? If L did this, I would have to either add an option for L not to do this, or I would be unable to issue an L command without resetting the RH buffer. If L continues not to reset the buffer, you can work-around this by explicitly issuing these commands. So I won't change this.
I triggered a bug in the new version 2023-07-24.
Instead of my HELLO.EXE file for testing, i accidentally loaded my HELLO.ASM file.
Then i entered g.
This resulted in the following output:
Then i entered l
From this state on, i can no more exit with q. I can print the help with ? and use commands like rh and u, but the latter two seem to produce garbage and errors at this state.
Only a reboot of my FreeDOS VM helped.
Gladly this bug is reproducible.
I added my hello world asm file to this bug report so you can test it.
I also tried to see, what happens, when i load my HELLO.EXE executable file.
And there seems to be some issue with the character output after reloading the program with l:
Here q quit works, but something changed the H to a ╠ with the l load command. I also added the HELLO.EXE file, so that you can reproduce it.
I will now test your new features, this will take a little bit.
By the way, I have a new suggestion.
Before loading a file into ldebug, how about checking the file extension, whether it's a .EXE, .COM, .ROM or .BIN file?
And if it's something else, the user should be asked if he really wants to load that file. This can prevent accidental loading of non-executable files, like I did.
So sth. like this:
And for EXE, COM, ROM and BIN files, everything remains the same as the original behavior.
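The suggested check is small. A minimal sketch of the idea in Python follows; it only illustrates the proposal and is not something LDEBUG does, and the function name needs_confirmation is made up for this example:

```python
import os

# Extensions that the suggestion treats as loadable without asking.
KNOWN_EXTENSIONS = {".EXE", ".COM", ".ROM", ".BIN"}


def needs_confirmation(filename):
    """Return True if the user should be asked before loading this file."""
    extension = os.path.splitext(filename)[1].upper()
    return extension not in KNOWN_EXTENSIONS


for name in ("HELLO.EXE", "hello.com", "HELLO.ASM", "README"):
    print(name, "->", "ask first" if needs_confirmation(name) else "load")
```

File names without any extension also end up in the "ask first" case here; whether that is desirable would be a separate decision.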
BTW, this ╠ character seems to be moved along the string when the G go command points behind the last program address.
This codepoint is what 0CCh, the int3 breakpoint instruction, looks like when you display it to your terminal. By running g 10 you happen to replace the "H" byte of your text by the 0CCh. This is not a bug, it is expected that your data can be corrupted by placing a temporary breakpoint into it.
Hm interesting, the next instruction at offset 0010 after the int 21 instruction to exit to dos is a dec ax, so i assume it will never be executed with g 10.
Otherwise shouldn't g 10 then cause AX to be decremented by 1?
If i try to simulate it manually:
Let's see where i am
It seems to be, that IP has changed pointing to 100h, which is something different than this dec AX. I assume it's probably a return point to some DOS routine or ldebug.
And if i do this last step:
All registers seem to be set to 0, including IP.
My Hello World String is also unchanged and not touched.
So there seems to also be no write access to DS's memory.
And also no output to stdout. And if that were the case, wouldn't the character have to be at the end of the string?
I also tried to see, what happens when i do a traced execution:
rh also ends at the Program terminated normally message.
Why does g 10 execute another command at all? Shouldn't it end when int 21h ran?
Calling the DOS terminate process function (21.4C) will indeed return control flow to the parent process, which is the debugger. The debugger will then re-create an empty process.
An empty process created by the debugger, as well as a process created from loading a flat-format .COM file, will have a word with the value zero on the stack. (Usually at word [SS:FFFEh].) If the program runs a retn instruction with this stack, it will jump to PSP:0, which holds an int 20h instruction which is another DOS terminate process call. IP and SP happen to be zero then, the other zeroes are from the process initialisation.
The PSP, the retn instruction, and the stack are re-initialised. The other data that may happen to be in the process segment are not overwritten. (But your DS may change from the prior process segment.)
I don't understand. What output do you expect?
TP 10 and G 10 are very different. One specifies to do up to 16 steps, unless interrupted by an unexpected fault. (This includes the process termination.) The other specifies to run code with a temporary breakpoint at address CS:0010h.
Don't understand this either. Please detail exactly what command you used, what it did, and what you expected instead.
That's good to know. Thank you for your answer.
Thanks.
Well, i was wondering about this '╠' character.
Depending on the steps g has to take, it is written over the Hello World! in stdout.
The string "Hello World!" is not changed in memory, so it can't be that. Also, the string is printed out before "int 21" is reached. Therefore, that character can only be printed after it, and in that case it overwrites the string instead of printing it at the cursor position after the string.
Normally, however, I would assume here that a character is always output at the end of the cursor, unless an explicit cursor position has been selected or the cursor has been reset or changed. So after "Hello World!". But that doesn't seem to be the case here.
It looks like the cursor does a carriage return and then, depending on the steps in g n, moves the cursor one character at a time to the right and only then prints that character.
Ah, now i understand.
Now i counted the steps g 10 has to take by listing U.
I counted 7 steps + 1 outside the program after the termination ended to emulate that g 10. So this equals tp 8.
With tp 8 i get at the same position as g 10, but this character '╠' is printed nowhere like it is done with g 10. Strange.
Well i assumed or expected when a program terminates before reaching the breakpoint of g, then the g command stops before going to that breakpoint.
But that seems to be not the case. G just continues with its mission printing ╠ characters, until the additional steps taken correspond to the breakpoint value.
The int 21 instruction to terminate the program and return to dos/ldebug is at IP = 0E and requires two bytes like expected.
Thus the next command and where the breakpoint is set would be at IP = 10, but this is obviously never reached because of the retn, but G still continues.
And between G 10 and G 19 it prints this special character over the Hallo Welt! output.
And depending on the first digit, the cursor is moved accordingly.
It looks like the following: (Before every g command, i quit ldebug and restarted it with hello.exe loaded.)
Interestingly, whatever the value of n in G n has, as long as n is greater than 0E, g stops always execution before the retn instruction. This means, if i run r, retn is the next instruction.
But, this ╠ character is still printed.
If i count that ╠ steps, they correspond to the length of the "Hallo Welt!" string + its CR and LF and "$" character.
In the asm code, the string is this:
Thus 14 bytes. And this ╠ character is printed 14 times between g 10 and g 1D.
There is the relation. The length of the string determines the maximum steps this ╠ character takes in stdout.
My point is, g 10 to g 1D should just end after the program termination and not trying to print a ╠ character.
The parameter (number) that you give to G is not "the steps G has to take". The parameter for G is completely different from the parameter for T/TP/P.
https://pushbx.org/ecm/doc/ldebug.htm#cmdg :
https://pushbx.org/ecm/doc/ldebug.htm#cmdp :
The fact that you specify a single number to G does not make this a "step count" like the (single) P parameter. It is simply parsed as a single breakpoint specification, writing a temporary breakpoint at the specified segmented address, parsed with a default segment of the current CS.
This is wrong. The "G 10" command works like the following sequence. It saves the original contents of the byte [cs:10] then overwrites it with a 0CCh (int3) opcode.
If you enter the first two commands into the debugger you can observe that the memory is changed during this sequence:
This would be right if the memory indeed wasn't changed during the int 21h service 09h call, but as it is all of this is irrelevant.
Not strange at all. TP commands will trace/proceed past single instructions, so the only breakpoints it writes will be like after int instructions or the like. These will all be in the code section of your program, except for the breakpoint after the final int 21h (the one running function 4Ch). But that final breakpoint, while it does overwrite the "H" of your message, will only be set while executing the final int call, not while the function 09h call occurs earlier.
No. As explained before, the temporary breakpoint you are writing is in the data part of your program. After DOS returns to the debuggee process's PRA (Parent Return Address), that is to the debugger, the G command will finally restore its breakpoints if possible (if they haven't been overwritten yet). Then the G command detects that no pass or non-pass bb point was executed, but rather that the PRA was entered, so it will display the message about "Program terminated normally" and return to the debugger command line. (The cmd3 command loop of the debugger will re-create an empty process when it detects that the current process has terminated.)
This happens because your example's data section can be addressed from your example's code segment (behind the end of your code section). If something else, such as the bottom of the stack section, was at this address then you wouldn't get the same result.
If it doesn't crash you can reload the program by using the no-parameter "L" command. (Or in the case of loading eg "HELLO.ASM", you can use the "QA" command then the "L" command. I noticed that the behaviour of the debugger differs for this case as it will not re-initialise the CS:IP registers then. This is a holdover from MSDebug.)
This is because you overwrote the second byte of the int 21h (CDh 21h) instruction with the CCh byte (int3 single-byte instruction). So you just changed int 21h to int 0CCh which presumably crashes sooner or later.
In this case you overwrote the dollar-sign (U+0024) terminator with the 0CCh so after the CR LF linebreak you get that point and then garbage afterwards as DOS continues to display things until it randomly encounters a 24h byte.
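The effect can be reproduced outside of DOS with a small Python model. This is only a sketch under stated assumptions: the memory layout (16 placeholder code bytes followed by the message at offset 10h, then zero filler) and all function names are made up for illustration, and the function 09h stand-in simply stops at the end of the image instead of running on into random memory. Byte 0CCh is shown as '╠' because that is its glyph in code page 437.

```python
# Simplified model of the situation discussed in this thread.
# Assumption: a 16-byte code section followed by the message at offset 10h;
# the code bytes themselves are irrelevant here and are filled with NOPs.
MESSAGE = b"Hallo Welt!\r\n$"          # 11 characters + CR + LF + "$" = 14 bytes
MESSAGE_OFFSET = 0x10
INT3 = 0xCC                             # opcode written for a breakpoint


def build_image() -> bytearray:
    """Return a fresh memory image: placeholder code, the message, then filler."""
    image = bytearray([0x90] * MESSAGE_OFFSET)   # placeholder for the code section
    image += MESSAGE
    image += b"\x00" * 16                        # hypothetical bytes behind the string
    return image


def dos_print_string(image: bytearray, offset: int) -> bytes:
    """Mimic int 21h function 09h: collect bytes from offset up to, not including, '$'."""
    end = image.find(b"$", offset)
    if end == -1:                                # no terminator left in the image
        end = len(image)
    return bytes(image[offset:end])


def run_with_breakpoint(bp_offset: int) -> str:
    """Write an int3 byte at bp_offset, then print the message as DOS would."""
    image = build_image()
    image[bp_offset] = INT3                      # what "G <offset>" does before running
    return dos_print_string(image, MESSAGE_OFFSET).decode("cp437")


# One line per breakpoint offset 10h..1Dh, that is, one per byte of the message.
for bp in range(0x10, 0x1E):
    print(f"g {bp:02X}: {run_with_breakpoint(bp)!r}")
```

In this model the ╠ moves one position to the right for each of the 14 offsets, and at 1D it replaces the "$" so the output runs on past the CR LF.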
Exactingly, G stops execution once the program control flow enters the debugger's PRA it set up for the debuggee process. The command line loop of the debugger then detects that the process terminated and subsequently initialises an empty process with the retn instruction. This may or may not happen to be at the same segment address as your program's process was.
As explained, there is no bug here. You happen to access your data section using your program's code segment, and that corrupts your string data. The G command does not run anything after the program terminated. (Except for the creation of a new empty process, but that's in the debugger's cmd3 command loop.)
Yes, these are addresses, but with steps I meant complete instruction steps.
I also said:
So I took into account that these are actual addresses.
I probably worded it a bit awkwardly.
I agree.
Thank you for the clarification.
Now I understand it.
Suggestion:
How about an additional column when outputting the assembler listing with the u command? There is still free space on the right side, and you could display the opcodes from the second column as ASCII characters in this new last column.
In cases where the code and data segments are the same, this would immediately show where the data section starts and whether a readable string is present.
Example:
In this u listing, it is clearly visible that the data area begins at offset 0010 with an 'H' character, and when read vertically it is the string "Hallo Welt!".
And in the code section it is also useful for directly seeing what kind of ASCII character is copied into or from a register.
Example:
It might only be a nice-to-have feature, but if the code size allows it and it doesn't consume too much RAM, it might still be useful.
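As a rough illustration of the suggested column, here is a minimal Python sketch. The row layout, the function names and the sample rows are all hypothetical and are not actual lDebug output. Bytes outside the printable range 20h to 7Eh are shown as a dot, as is common in hex dumps.

```python
def ascii_column(opcode_bytes: bytes) -> str:
    """Render bytes as ASCII, replacing anything outside 20h..7Eh with a dot."""
    return "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in opcode_bytes)


def format_row(offset: int, opcode_bytes: bytes, text: str) -> str:
    """Format one listing row: offset, hex bytes, disassembly text, ASCII column."""
    hex_bytes = opcode_bytes.hex().upper()
    return f"{offset:04X} {hex_bytes:<12} {text:<24} {ascii_column(opcode_bytes)}"


# Hypothetical sample rows for illustration only.
rows = [
    (0x0100, bytes([0xB0, 0x48]), "mov al, 48"),
    (0x0102, bytes([0xB4, 0x09]), "mov ah, 09"),
    (0x0104, bytes([0xCD, 0x21]), "int 21"),
]
for offset, data, text in rows:
    print(format_row(offset, data, text))
```

With the first sample row, the new last column would read ".H", which makes the immediate value 48h visible as the letter "H".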
and
I understand. Thank you for the clarification.
and
I understand and I agree.
and
Thank you for your clarification.
The hello.asm file does crash eventually for me if I use T commands, but QA, L, and Q commands still work at that point. If I do use the G command it crashes my dosemu2 machine, either directly (-E "ldebug c:\bin\lddebug.com hello.asm") or similarly to your case I get an "Invalid opcode" fault and then the debugger doesn't work correctly any longer (-E "ldebug.com hello.asm").
However, this is unlikely to be due to a debugger bug. When the machine crashes, it can corrupt parts of its own process, of the DOS, or of the debugger.
I agree that if random data is loaded as program code, strange things can happen when that data is executed as code.
But that's why I recommended checking the extension of the file being loaded, to inform the user and prevent such accidents by asking again whether they are sure they want to load data as program code.
The crash of dosemu2 shows that dosemu2 works differently from the QEMU emulator I use. In QEMU, loading this data in ldebug allows doing random stuff inside the memory of the emulated machine without crashing the VM.
I understand where you're coming from, but I think I'll just chalk this up to user error. You were able to figure out the cause of your problem by yourself, after all. I could add an option for this but I would not want to include this code in the default build, as the lCDebugX build is very close to 65_536 bytes of the code segment being filled, already. So if you wanted this option either you would have to build the debugger yourself or I would have to add a special build with options like this one enabled. (Trivially possible, but a bit more work for me. And would have to decide what to call this build.)
Yes, dosemu2 is not as robust as qemu in this regard I would guess.