Hi,
At this point, I think I'm walking the path @visenri traveled couple of years ago to re-organize this part of STM8 peep.c mechanism and now I'm realizing how much effort it took.
I'm trying to keep my changes to a narrow scope as much as possible. I believe argCont function can do a lot of heavy lifting in this context and we can keep it there, but it needs a complete overhaul.
This patch resolves many issues (distinguishing half-registers, falsely matching user-defined variables and other false positives, sensitivity to irregular whitespace ...) by avoiding string search and just focusing on the structure of STM8 assembly code (looking at the commas and parenthesis for example).
I removed exg stand-alone code and delegated that to argCont. And since it can handle all register "read"s anywhere in the instruction, we don't need that special "ld" "ldw" search procedure at the end of stm8MightRead function.
I did a "count all the non-comment lines of all .asm files" in regression test "gen" directory for a before/after test:
Total .asm files: 4,461
Total non-comment lines: 1,417,044
After the patch:
Total non-comment lines: 1,416,912
Looks like detecting half-registers is already improving peephole scans, peephole rule 630x was a little bit handicapped for example. I hope it's not breaking anything.
Summary for 'stm8': 0 failures, 32370 tests, 6032 test cases, 5855120 bytes, 98306309 ticks
Summary for 'stm8-large': 0 failures, 32363 tests, 6032 test cases, 6258569 bytes, 104679695 ticks
Here's the function for reference:
/* Check if reading arg implies reading what. */
static bool argCont(const char *arg, const char *what)
{
char *tmp;
if (!(tmp = strchr (arg, '(')))
tmp = strchr (arg, ',');
if (!tmp || !*tmp)
return false;
arg = tmp + strspn (tmp + 1, " \t") + 1; // Skip blank characters
if (*arg == what[0] && arg[strspn (arg + 1, " \t") + 1] == ')')
return true;
if (tmp = strchr (arg, ','))
arg = tmp + strspn (tmp + 1, " \t") + 1;
if (*arg != what[0])
{
if (tmp = strchr (arg, ','))
{
arg = tmp + strspn (tmp + 1, " \t") + 1;
if (*arg != what[0])
return false;
}
else
return false;
}
// Register's first letter has already matched
if (!what[1]) // Return true if "what" is a full register like "a" or "x"
return true;
arg++;
if (*arg == what[1]) // Opcodes reading half registers
return true;
arg += strspn (arg, " \t");
return (!*arg || *arg == ')'); // Opcodes reading full registers
}
I tried the patch. For me, stm8 and stm8-large regression tests pass, and I see a small reduction code size.
Looks like you intend to make argCont assume a two-operand instruction where argCont is supposed to check if reading the second argument of the line arg implies reading what. That is a bit different from the previous semantics ("Check if reading arg implies reading what"). IMO, such a change in semantics,if intentional, should be reflected in the function name and comment.
The
strchrat the beginning looks like it could go all the way into a comment containing a (, thus skipping over the actual arguments.Thanks, I tried to clarify the new function with better name and comments (I hope). Also wasn't aware of comments here. See if anything else needs to be addressed here (so I can update the patch accordingly):
This looks better, but has limitations:
* It won't check the first argument, if that argument is a register argument (e.g.
add a, (x)oraddw x, _var.I wonder if this would be simpler/cleaner if it gets split into two function:
* A new argCont that checks a single instruction argument for a possible read of what (this function would receive a pointer to the beginnign of the argument it is supposed to check, and maybe one to the end
* A function like your readReg that finds the two arguments, then passes them to argCont.
Last edit: Philipp Klaus Krause 2025-03-06
Thanks. Based on your feedback, I've decided to focus the function on only index registers and give "a" register its own independent lines of code (since this function is used only in two places inside
stm8MightRead). For "a" register, since we're only interested in ld/ldf instruction, the code is much simpler, and I think we can directly inline it withstm8MightReadfunction:As for readReg function, now renamed
readIdxReg, not much has changed:But semantically it's much clearer, the code is slightly simplified before its return and much importantly, it requires much smaller mental map to navigate.
What do you think?
I'm posting the latest diff file here for the reference, so to keep track of the changes throughout the discussion and to make it easier to review fully when needed. Thanks.
Thanks. Sorry for taking so long to look through it.
In
readIdxReg,tmpandcshould be of typeconst char *,notchar *(usingconst char *was always the better style, and since ISO C23, strchr, etc actually returnconst char *when their argument isconst char *).Otherwise,
readIdxReglooks like it should work, except for theexgw,rlwa, andrrwainstructions (I haven't tested it yet).Thanks!
Very happy to see that you see some potential with this.
Regarding handling
exgw,rlwaand others like these, there are special if/then segments that properly handle them (especially the curious case ofcpwthat needs special handling +readIdxReg!). Because of their simple nature, I don't thinkreadIdxRegshould be asked to handle them.My question though, is how much we're going to accommodate user inline hand-crafted assembly and their abnormal whitespaces. SDCC manual strongly recommends against using
--peep-asmso we're not obliged to. InreadIdxRegI tried to accommodate irregular whitespaces. But they're manypl->line[n] == extrachecks in the code that assumes single spaces. We can cover them by a macro like:and replace the line like
by
But I'm not sure.
Regarding hand-written asm:
It is fine to not fully handle it, as long as we always err on the safe side.