#3617 Something weird about Debian GNU/Linux on aarch64 regression tests

closed-fixed
None
other
5
2023-10-10
2023-07-16
No

As can be seen on the snapshots page ("sdcc-snapshot-aarch64-unknown-freebsd13.1-20230715-14232.tar.bz2"), some of the regression tests are failing for Debian GNU/Linux on aarch64: lots of test-ucz80, and some of test-ez80-z80, test-ucr2k, test-stm8-large and test-pdk15-stack-auto.

I've done an in-tree build on the same machine (a Raspi 4). There I see only a small number of failures for test-ucz80 (all of them float func stuff), and the other tests pass.

Related

Bugs: #3619
Bugs: #3627
Wiki: SDCC 4.4.0 Release

Discussion

  • Philipp Klaus Krause

    And the small number of failures that I do see in the in-tree test are not really reproducible: I looked into a test failure today and tracked it down to the exact wrong line in the generated asm. Then I reran the compiler with the same options, and the result was correct! valgrind doesn't show any reads of uninitialized memory either.
    The failures appeared around the time the next and genconstprop branches were merged to trunk. They only affect the Raspi 4 running Debian GNU/Linux, not the one running FreeBSD.

    I wonder if I am seeing bitflips: Rowhammer research indicates that they are far more common on Raspi4 than on previous Raspis.

    P.S.: I upgraded Debian on the Raspi4 to latest bookworm, rebooted it, ran memtester a bit. No errors found so far. I'll rebuild sdcc and rerun tests. Later maybe also run memtester while doing the regression tests (in case the problem only happens during high system load).

     

    Last edit: Philipp Klaus Krause 2023-07-29
    • Philipp Klaus Krause

      I still see those problems. This time, __fseq for z80n was affected. After a recompile, the code looked fine. This is the diff from the correct version to the incorrect one:

          .zxn
          .module _fseq
      @@ -82,7 +82,7 @@
          ld  h, -7 (ix)
       ;  spillPairReg hl
       ;  spillPairReg hl
      -   ld  c, -6 (ix)
      +   ld  c, #0x00
          ld  b, -5 (ix)
          ld  e, -4 (ix)
          ld  d, -3 (ix)
      @@ -115,7 +115,7 @@
          jr  NZ, 00104$
          ld  c, -8 (ix)
          ld  b, -7 (ix)
      -   ld  l, -6 (ix)
      +   ld  l, #0x00
       ;  spillPairReg hl
       ;  spillPairReg hl
          ld  h, -5 (ix)
      

      As we can see (and this is the same problem that I saw earlier in the asm for other functions for other targets), in some places a #0x00 is used instead of the actual variable. Since the problem appeared after merging the genconstprop branch to trunk, by far the most likely cause is that information from generalized constant propagation is incorrect at the time it is used in code generation, in particular aop->valinfo.knownbitsmask when used in aopIsLitVal.
      I do not know where it goes wrong: it could be during generalized constant propagation analysis, or at any later point up to code generation. valgrind still does not show any reads of uninitialized memory, though, so I wonder why the bug cannot be reproduced reliably.
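
To illustrate the suspected mechanism, here is a minimal sketch of how known-bits information can turn a stack load into a literal load. It is not SDCC's actual code: the struct `valinfo_sketch`, its `knownbits` field, and the functions `byte_is_known_lit` and `format_load` are hypothetical names chosen for illustration; only the idea of a known-bits mask consulted during code generation comes from the description above.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for per-operand value information.
   Field layout is an assumption, not SDCC's real definition. */
struct valinfo_sketch {
    uint64_t knownbitsmask; /* bit i set: the value of bit i is known */
    uint64_t knownbits;     /* the known bit values, valid where the mask is set */
};

/* True if byte `offset` of the operand is entirely known and equals `lit`. */
static bool byte_is_known_lit(const struct valinfo_sketch *v,
                              unsigned offset, uint8_t lit)
{
    uint64_t mask = (v->knownbitsmask >> (offset * 8)) & 0xff;
    uint64_t bits = (v->knownbits >> (offset * 8)) & 0xff;
    return mask == 0xff && bits == lit;
}

/* Writes one load instruction into buf: a literal load if the analysis
   claims the byte is zero, otherwise a load from the stack slot. */
static void format_load(char *buf, size_t n, const char *reg,
                        const struct valinfo_sketch *v,
                        unsigned offset, int stack_off)
{
    if (byte_is_known_lit(v, offset, 0x00))
        snprintf(buf, n, "ld %s, #0x00", reg);
    else
        snprintf(buf, n, "ld %s, %d (ix)", reg, stack_off);
}
```

In this sketch, if the mask wrongly claims that a byte is fully known, `format_load` produces `ld c, #0x00` where `ld c, -6 (ix)` would be needed, which is the kind of difference shown in the diff above.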

       
      • Philipp Klaus Krause

        I've found and fixed some issues that might have caused this; while the failures seem less common now, there still are some.
        Now I'm using SDCC instrumented with sanitizers, and hope that will help. Even if it doesn't fix this bug, I've already found a few more issues in SDCC, so the effort will improve SDCC code quality.

         
  • Philipp Klaus Krause

    Since today's round of UB fixes in [r14254], I haven't seen these test failures again.
    But I had better rerun the tests (and wait for results on the snapshots page) to be more confident.

     

    Related

    Commit: [r14254]

  • Philipp Klaus Krause

    • status: open --> pending-fixed
    • assigned_to: Philipp Klaus Krause
     
  • Philipp Klaus Krause

    • status: pending-fixed --> open
     
  • Philipp Klaus Krause

    I haven't seen failures in three rounds of regression testing on an (in-tree) build of sdcc. But there are still failures on the snapshots page (which does an out-of-tree build, though I have no idea if that makes the difference).

     
  • Philipp Klaus Krause

    • status: open --> closed-fixed
     
  • Philipp Klaus Krause

    Fixed recently: [r14369].

     

    Related

    Commit: [r14369]

