This bug was first reported in z88dk's zsdcc, but the z88dk developers have asked me to also report it upstream.
I wrote a C implementation of a solution for an Advent of Code puzzle (Day 19) to run under CP/M (under both tnylpo and on a Spectrum Next).
In an effort to reduce execution time due to costly indexing operations on the stack, since the functions were non-reentrant, I switched the code to use function-static variables. The revised code runs very nicely using Hi-Tech C (and Clang, including checking with its various sanitizers), but sdcc miscompiles the code.
Although, I have confirmed this bug persists when vanilla SDCC is used (both version 4.0.0 and 4.0.7 #12003), it's easier to build a stand-alone executable with z88dk, so here's a session under z88dk:
ubuntu@ubuntu:~/spectrum$ z88dk.zcc +cpm -compiler=sdcc -DAVOID_SDCC_BUG regexp-s.c -o zdrexp-s.com && tnylpo zdrexp-s
Matched (correct)
ubuntu@ubuntu:~/spectrum$ z88dk.zcc +cpm -compiler=sdcc regexp-s.c -o zdrexp-s.com && tnylpo zdrexp-s
tnylpo: HALT instruction executed
Results under vanilla SDCC are identical, building with -mz80 and no other special options.
Here's the code. I've whittled it down, but perhaps not as much as you'd like. Sorry if the algorithm isn't the easiest to fathom.
#include <stdio.h>
#include <stddef.h>
#include <assert.h>
enum { RULES_MAX = 4 };
typedef int* rule_seq;
typedef rule_seq* rule_alts;
int rule_seq_lists[] = { 3, 1, 3, -1,
2, -1,
2, 1, -1 };
rule_seq rule_alts_lists[] = { &rule_seq_lists[0], NULL,
&rule_seq_lists[4], &rule_seq_lists[6], NULL };
rule_alts rule[RULES_MAX] = { &rule_alts_lists[0],
&rule_alts_lists[2],
NULL,
NULL };
char rule_terminal[RULES_MAX] = { 0, 0, 'a', 'b' };
enum { GOOD_POOL_MAX = 64,
BAD_STACK_MAX = 16 };
struct good {
rule_seq seq;
struct good* prev;
} good_pool[GOOD_POOL_MAX+1]; /* +1 to avoid incorrect SDCC warning */
struct bad {
rule_alts alts;
char* old_pos;
struct good* old_good;
} bad_stack[BAD_STACK_MAX+1]; /* +1 to avoid incorrect SDCC warning */
int match(char* pos_arg)
{
static char* pos;
static struct bad* bad_sp;
#ifdef AVOID_SCCZ80_BUG
#define static
#endif
static struct good* good_alloc;
static struct good* good;
static struct good* rgood;
static rule_alts alts;
#ifdef AVOID_SDCC_BUG
rule_seq seq;
#else
static rule_seq seq; /* this line breaks SDCC */
#endif
#ifdef AVOID_SCCZ80_BUG
#undef static
#endif
static char* rpos;
static int ruleno;
pos = pos_arg;
bad_sp = bad_stack;
good_alloc = good_pool;
good = NULL;
rgood = NULL;
alts = NULL;
seq = NULL;
rpos = NULL;
ruleno = 0;
while (1) {
assert(bad_sp < &bad_stack[BAD_STACK_MAX]);
assert(good_alloc < &good_pool[GOOD_POOL_MAX]);
assert(ruleno >= 0 && ruleno < RULES_MAX);
alts = rule[ruleno];
if (alts) {
rgood = good;
rpos = pos;
goto first_alt;
}
assert(rule_terminal[ruleno]);
if (*pos != rule_terminal[ruleno])
goto failure;
++pos;
success:
if (!good) {
if (*pos != '\0')
goto failure;
return 1;
}
seq = good->seq;
good = good->prev;
ruleno = *seq++;
if (*seq >= 0) {
struct good* ngood = good_alloc++;
ngood->seq = seq;
ngood->prev = good;
good = ngood;
}
continue;
failure:
if (bad_sp == bad_stack) {
return 0;
}
--bad_sp;
alts = bad_sp->alts;
rgood = bad_sp->old_good;
rpos = bad_sp->old_pos;
first_alt:
good = good_alloc++;
good->seq = *alts++;
good->prev = rgood;
pos = rpos;
if (*alts) {
bad_sp->alts = alts;
bad_sp->old_good = rgood;
bad_sp->old_pos = pos;
++bad_sp;
}
goto success;
}
}
int main()
{
if (match("baaab")) {
printf("Matched (correct)\n");
} else {
printf("Failed to match (incorrect)\n");
}
return 0;
}
In [r12013], I added a (currently disabled) test for this bug, based on the example code.
It looks like all z80-related ports are affected, and also stm8.
On the other hand, mcs51 and s08 are apparently not affected.
Last edit: Philipp Klaus Krause 2021-01-04
--nolospre is a workaround.
So this is most likely either a bug in lospre or a bug in a later stage triggered by lospre.
Looks like lospre results in miscompilation of this part:
Here, lospre introduces a new iTemp102, that has the value of seq after *seq++.
The new value of seq (after *seq++) is used in the \seq for the if (ok), the seq for ngood->seq = seq (ok), but also in the calculation of ngood->prev = good (not ok I think) and good = ngood (not ok I think).
Lospre applied assignment forwarding to POINTER_SET operations, as if the pointer itself was assigned rather than what it was pointing at.
Fixed in [r12527]