From: SourceForge.net <no...@so...> - 2011-05-06 20:08:11
Feature Requests item #3298447, was opened at 2011-05-06 21:08
Message generated for change (Tracker Item Submitted) made by u6c87
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=350599&aid=3298447&group_id=599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: z80 port
Group: None
Status: Open
Priority: 5
Private: No
Submitted By: Brian Ruthven (u6c87)
Assigned to: Nobody/Anonymous (nobody)
Summary: Optimise certain bitwise operations.

Initial Comment:
The following code:

======== set_res.c ========
#define BIT7 0x80

unsigned char bitfield;

void
set_bit(void)
{
        bitfield |= BIT7;
}

void
res_bit(void)
{
        bitfield &= ~BIT7;
}
======== end set_res.c ========

when compiled with "sdcc -mz80 -c set_res.c" produces the following assembler (skipping the pre/post-amble):

;set_res.c:6: set_bit(void)
; ---------------------------------
; Function set_bit
; ---------------------------------
_set_bit_start::
_set_bit:
;set_res.c:8: bitfield |= BIT7;
        ld      iy,#_bitfield
        ld      a,0 (iy)
        set     7, a
        ld      0 (iy),a
        ret
_set_bit_end::
;set_res.c:12: res_bit(void)
; ---------------------------------
; Function res_bit
; ---------------------------------
_res_bit_start::
_res_bit:
;set_res.c:14: bitfield &= ~BIT7;
        ld      iy,#_bitfield
        ld      a,0 (iy)
        and     a,#0x7F
        ld      0 (iy),a
        ret
_res_bit_end::

There are multiple things I personally think could be done better here (although I accept that registers already in use in more complex examples could get in the way):

1) It's great that something spotted that "|= 0x80" can be optimised to a set operation. However, the load/modify/store could be further optimised to an atomic "set 7, 0(iy)", and even modified to use hl (if available) to further save bytes and execution time.

2) The res_bit function could also be modified to make use of the res instruction. Then, the same as above applies with regard to the register selection. My preference would be hl as it uses fewer bytes, but a direct res 7, 0(iy) could be used as well.

I suspect this should be done at the code generation / register selection stage rather than trying to do something with the peephole optimiser.

Using "set 7, 0(iy)" reduces the example code from 12 bytes to 8, and using "set 7, (hl)" reduces it to just 5 bytes. Taking this approach would also make such code safe to be shared by interrupt handlers, although only for the single bit case.

The above output was produced using SDCC : z80 3.0.2 #6484 (May 6 2011) (Solaris i386)

----------------------------------------------------------------------
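The byte counts quoted in the report (12, 8 and 5) can be checked by adding up the standard Z80 encodings of each instruction. The following is only a small sketch of that arithmetic: the struct, the array names and the helper functions are made up for illustration, the T-state figures are the documented Z80 timings rather than anything stated in the report, and the trailing ret (1 byte) is left out of all three sequences, which appears to be how the report counts.

```c
#include <assert.h>
#include <stddef.h>

/* One Z80 instruction: source text, encoded size in bytes, T-states. */
struct insn {
    const char *text;
    unsigned bytes;
    unsigned tstates;
};

/* Sequence emitted by SDCC 3.0.2 in the report (ret not counted). */
static const struct insn load_modify_store[] = {
    { "ld iy,#_bitfield", 4, 14 },
    { "ld a,0 (iy)",      3, 19 },
    { "set 7, a",         2,  8 },
    { "ld 0 (iy),a",      3, 19 },
};

/* Suggested form operating on memory through iy. */
static const struct insn set_via_iy[] = {
    { "ld iy,#_bitfield", 4, 14 },
    { "set 7, 0 (iy)",    4, 23 },
};

/* Suggested form operating on memory through hl. */
static const struct insn set_via_hl[] = {
    { "ld hl,#_bitfield", 3, 10 },
    { "set 7, (hl)",      2, 15 },
};

/* Number of elements in a fixed-size array. */
#define COUNT(a) (sizeof (a) / sizeof (a)[0])

/* Sum of the encoded sizes of all instructions in seq. */
static unsigned total_bytes(const struct insn *seq, size_t n)
{
    unsigned sum = 0;
    assert(seq != NULL);
    for (size_t i = 0; i < n; i++)
        sum += seq[i].bytes;
    return sum;
}

/* Sum of the T-states of all instructions in seq. */
static unsigned total_tstates(const struct insn *seq, size_t n)
{
    unsigned sum = 0;
    assert(seq != NULL);
    for (size_t i = 0; i < n; i++)
        sum += seq[i].tstates;
    return sum;
}
```

Calling total_bytes() on the three tables gives 12, 8 and 5, matching the figures in the report, and total_tstates() gives 60, 37 and 25, which is the execution-time saving the report alludes to.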
From: SourceForge.net <no...@so...> - 2011-05-06 20:42:21
Feature Requests item #3298447, was opened at 2011-05-06 22:08
Message generated for change (Comment added) made by spth

>Assigned to: Philipp Klaus Krause (spth)

----------------------------------------------------------------------

>Comment By: Philipp Klaus Krause (spth)
Date: 2011-05-06 22:42

Message:
The optralloc branch has already been generating the res you suggested. As for optimizing into res 7, 0 (iy): I've now added a peephole in the optralloc branch to do that (and one for the set). Out of laziness I made it a peephole for now instead of doing it earlier. I'll have a look at the iy vs. hl thing later; I'll probably make a peephole out of that one, too.

Unfortunately the iy vs. hl issue can't be solved very well in the code generator, since we don't know if we might be able to reuse the value in iy at some later time. The clean solution to this mess would be to make #_bitfield a rematerializable variable. Then the optimal register allocator in the optralloc branch would automatically generate the optimal code using hl (or use iy, if it decides that's better, e.g. when the value will be needed again later on and we could profit more by using hl for something else in between), without needing any peepholes. However, implementing that will be more work.

I'll leave this open until the optralloc branch is merged.

Philipp

----------------------------------------------------------------------
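To make the peephole idea from the comment above concrete, here is a rough sketch in C of a pass that fuses the three-instruction load/modify/store pattern into a single set or res on 0 (iy). This is not SDCC's peephole rule or its rule syntax: the function names, the ASM_LINE_LEN constant and the exact-string matching are all hypothetical and chosen only for illustration. It also assumes that register a is not used after the sequence, which a real rule would have to verify before rewriting.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ASM_LINE_LEN 32   /* hypothetical maximum length of one output line */

/* Recognise "set N, a" or "res N, a"; on success store mnemonic and bit. */
static int parse_bit_op_on_a(const char *line, char op[4], int *bit)
{
    int end = -1;
    /* %n is only reached (and end only set) if the literal "a" matched. */
    if (sscanf(line, "%3s %d, a%n", op, bit, &end) != 2)
        return 0;
    if (end < 0 || line[end] != '\0')
        return 0;
    if (strcmp(op, "set") != 0 && strcmp(op, "res") != 0)
        return 0;
    return *bit >= 0 && *bit <= 7;
}

/*
 * Copy the n input lines to out, replacing each occurrence of
 *     ld a,0 (iy) / set|res N, a / ld 0 (iy),a
 * with the single instruction "set|res N, 0 (iy)".
 * Assumption: a is dead after the sequence (not checked here).
 * out must have room for n lines. Returns the number of lines written.
 */
static size_t fuse_bit_ops(const char *const in[], size_t n,
                           char out[][ASM_LINE_LEN])
{
    size_t m = 0;
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; ) {
        char op[4];
        int bit;
        if (i + 2 < n &&
            strcmp(in[i], "ld a,0 (iy)") == 0 &&
            parse_bit_op_on_a(in[i + 1], op, &bit) &&
            strcmp(in[i + 2], "ld 0 (iy),a") == 0) {
            snprintf(out[m++], ASM_LINE_LEN, "%s %d, 0 (iy)", op, bit);
            i += 3;   /* skip the three fused instructions */
        } else {
            snprintf(out[m++], ASM_LINE_LEN, "%s", in[i]);
            i++;
        }
    }
    return m;
}
```

Fed the body of set_bit from the report (ld iy,#_bitfield, ld a,0 (iy), set 7, a, ld 0 (iy),a, ret), the pass returns three lines with "set 7, 0 (iy)" in the middle. The res_bit body from SDCC 3.0.2 is left unchanged, because it uses "and a,#0x7F" rather than res; the fusion only applies once code generation emits res, as the comment says the optralloc branch does.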
From: SourceForge.net <no...@so...> - 2011-07-27 13:49:36
Feature Requests item #3298447, was opened at 2011-05-06 22:08
Message generated for change (Comment added) made by spth

>Status: Closed

----------------------------------------------------------------------

>Comment By: Philipp Klaus Krause (spth)
Date: 2011-07-27 15:49

Message:
In sdcc 3.0.4 #6686 (probably since the optralloc merge) set and res are used as you suggested. The use of set and res is already there in code generation; however, a peephole is used to make them operate directly on 0(iy). I might add another peephole later that further transforms it to use (hl) instead (if I encounter this sequence in production code).

Philipp

----------------------------------------------------------------------