Menu

#964 Optimize Z80 switch statements

None
open
nobody
None
5
2024-11-27
2024-11-21
Under4Mhz
No

When I have switch statement with both contiguous and individual values, the Z80 output will generate asm using or/sub pairs.

When the items is contiguous, can we use a inc a (or dec a) instead of a sub/or?

Of course, I'd prefer my [feature-requests:#963] (split switch statements up), but this is a good improvement if that's not feasible.

sdcc will generate the below for this switch which uses sub and or for contiguous values.

;./switchopt.c:9: switch( in ) {
    ld  a, l
    or  a, a
    or  a, h
    jr  Z, 00101$
    ld  a, l
    dec a
    or  a, h
    jr  Z, 00102$
    ld  a, l
    sub a, #0x02
    or  a, h
    jr  Z, 00103$
    ld  a, l
    sub a, #0x03
    or  a, h
    jr  Z, 00104$
    ld  a, l
    sub a, #0x0c
    or  a, h
    jr  Z, 00105$
    jr  00107$
;./switchopt.c:11: case 0:
00101$:

Can we change it to something like below?
(I assume the code below is probably wrong and won't work, but my point it to use a inc/dec for contiguous numbers instead of using sub with an incrementing fixed number).

    ld  a, l
    or  a, a
    or  a, h
    jr  Z, 00101$
    inc a
    or  a, h
    jr  Z, 00102$
    inc     a
    or  a, h
    jr  Z, 00103$
    inc     a
    or  a, h
    jr  Z, 00104$
    sub a, #0x09
    or  a, h
    jr  Z, 00105$
    jr  00107$
;./switchopt.c:11: case 0:
00101$:

Related

Feature Requests: #963

Discussion

  • Janko Stamenović

    Yes, your code is wrong, and I would not even understand what your intention would be without seeing 963, where you started with the switch which for the variable in being 1 assigns 1 to the int variable out, for 2 assigns 2, (up to 6), for values 12 and 18 assigns them to out, otherwise assigns 99 to out. That example of yours is solved by having an initial default 99, testing for in values 1..6, 12 and 18 and assigning that in to out. IMO the minimal implementation for many 8-bit CPU-s for that specific input code (in 963) would reduce the switch to at least something like:

    unsigned char t = 99;
    if ( ( (unsigned)in >> 8 ) == 0 ) {
        unsigned char t2 = in;
        if ( t2 < 7 && t2 != 0 )
           t = t2;
        else if ( t2 == 12 || t2 == 18 )
            t = t2;
    }
    out = t;
    

    where the worst case (for input 18) is having five 8-bit compares, for the most of the input range one compare, for the values 1..6 three, and for the input 12 four.

    Further, for your example of assigning values based on input values, one could use a simple table, allowing just two compares in the worst case, which could in some cases result in less bytes and generally be the fastest:

          static unsigned char mytable[] = {0,1,2,3,4,5,6,99,99,99,99,99,12,99,99,99,99,99,18};
           out = ( ( (unsigned)in >> 8 )  ) == 0 && (unsigned char)in < 19 ) )
                         ? mytable[(unsigned char)in];
                         : 99;
    

    At the current moment SDCC is made to, in most cases, reflect what the input is, i.e. assumes that if you are aware of some properties of your code, you already wrote something closest to the best way to handle them (e.g. something like these codes here).

    Independently of that, it is a fact that for the current SDCC there are many examples where the compares inside of the switch, or, more generally, a series of if compares, could be made more efficient by default, and you recognized some of them. If I'd try to describe what I saw, SDCC (seen for Z80) in a series of compares happens to be not aware of all the values it should compare or use, treating some compares as independent of the rest of them and using more complex compares than minimally needed.

     

    Last edit: Janko Stamenović 2024-11-21
    • Under4Mhz

      Under4Mhz - 2024-11-22

      For this ticket, I wanted to point out that it seems the Z80 switch code size could be reduced by 8 t-states per case item by removing the sub/or and replacing it with an inc, for cases that only increase by one.

      This seems like a reasonable optimisation that may be easy enough to implement to be worthwhile. I didn't in anyway mean to suggest that sdcc was doing anything incorrect.

      Ideally, I would have sdcc automatically generate your if/else code internally, while the C code is still a switch statement. This is my 963 request and you've given a perfect example of the code I'd like sdcc to generate.

      I apologise, my code was just a contrived example. You've effectively optimised it, but that's not a typical switch usage.

      I'm typically using an enum as a set of labels with an unknown order and an order that can change. I sometimes call other functions inside the switch. I have 80 functions that each use switch statements on the same enum of various values/ranges.

      Using if's with ranges is difficult, since the order of the values will change over time, as I change the external resource the enums represent. Then I will have to go change my 80 functions to reflect that.

      mytable will be difficult to maintain is well, since the enum order can change, and updating such a table will be difficult and error prone. I have an enum with 250 values, which I reorder, add and remove as my needs change. Often I want to call another function, not just return a value.

      It seems to me that the existing Z80 switch code can be reduced by 8 t-states per case item by using inc a, instead of sub/or, when the values are increasing by 1.

      Please forgive my broken example, I assume one of the talented Z80 programmers on your team will be able to see my intent and solve it correctly.

       
      • Janko Stamenović

        If you have 250 values you should definitely not use int but unsigned char for the value for which you do the switch.

        Once you use unsigned char and there are a lot of values for which something different has to be done, the fast implementation of switch has to internally maintain the table anyway and there couldn't be much saving (i.e. more than two bytes per every value will be needed). Only when there are not many values for which something different is done the compares are reasonable to be used. Your idea to use inc is then usable only in some special cases. Finally, if an assembly programmer would like to use inc or dec in a sequence of compares which differ to one, he wouldn't use or something but something like

        dec a
        jr z, Lcase1
        dec a
        jr z, Lcase2
        

        where the range would be first reduced to be 1..n. Maybe I haven't remembered correctly, but I think I've already seen SDCC using the compare like above in some case for which it recognized such compare could be used. Note that even this case uses 3 bytes per compare, and is limited to only a few compares and short jumps (a little code which does something), otherwise, even this approach would be 4 bytes per compare.

         
        • Maarten Brock

          Maarten Brock - 2024-11-23

          You should use a typedef for your enum. This helps to convey your intent, lets the compiler check the types and gives you more optimal code.

          typedef enum {A, B, C=12} mytype;
          static_assert(sizeof(mytype) == sizeof(unsigned char));
          
          int foo(mytype in)
          {
              switch(in)
              {
              case A:
              case B:
                  return 1;
              case C:
                  return 10;
              default:
                  return 0;
              }
          }
          
          ;rfe-964.c:8: int foo(mytype in)
          ; ---------------------------------
          ; Function foo
          ; ---------------------------------
          _foo::
          ;rfe-964.c:10: switch(in)
                  or      a, a
                  jr      Z, 00102$
                  cp      a, #0x01
                  jr      Z, 00102$
                  sub     a, #0x02
                  jr      Z, 00103$
                  jr      00104$
          ;rfe-964.c:13: case B:
          00102$:
          ;rfe-964.c:14: return 1;
                  ld      de, #0x0001
                  ret
          ;rfe-964.c:15: case C:
          00103$:
          ;rfe-964.c:16: return 10;
                  ld      de, #0x000a
                  ret
          ;rfe-964.c:17: default:
          00104$:
          ;rfe-964.c:18: return 0;
                  ld      de, #0x0000
          ;rfe-964.c:19: }
          ;rfe-964.c:20: }
                  ret
          
           
          • Benedikt Freisen

            You should use a typedef for your enum. This helps to convey your intent, lets the compiler check the types and gives you more optimal code.

            Wouldn't enum mytype : unsigned char { ... }; be the most elegant, now that we have it? The implementation is still very fresh, though, and perhaps not bug-free.

             
  • Philipp Klaus Krause

    A little bit of background (both for here and for [feature-requests:#963]):
    When SDCC encounters a switch statement, it makes a decision to either turn it into a single jump table or a kind of if-else chain. This decision is guided by a jump table size limit, and crude cost estimates for various aspects relevant to jump tables and if-else chains (see the jumptableCost member of the port struct). We currently do not have a mechanism to turn it into anything else.

    The if-else chain compares the original value to various bounds. Your proposal of using inc or dec would mean to adjust these bounds to match a modified value. If this is worth it is highly situational and depends on the target architecture. But it would still be a high-level optimization done on iCode.

    My feeling is that this optimization would be quite a lot of effort for relatively little gain, and that we currently have lower-hanging fruit to look into, such as [feature-requests:#941].

    P.S.: See geniCodeJumpTable in src/SDCCicode.c.

     
    👍
    1

    Related

    Feature Requests: #963


    Last edit: Maarten Brock 2024-11-23
    • Under4Mhz

      Under4Mhz - 2024-11-27

      Thank you Philipp.

      I thought that this might be the case, but I thought I would ask.

       

      Last edit: Under4Mhz 2024-11-27

Log in to post a comment.

MongoDB Logo MongoDB