From: Philipp K. K. <pk...@sp...> - 2008-07-12 15:04:50
Hello, fellow sdcc developers,

I have now implemented most of the improvements in Z80 code generation that I've been working on since earlier this year. You can see the results at
http://sdcc.wiki.sourceforge.net/Philipp%27s+TODO+list

At the bottom of the page is a code size comparison of different compilers. As you can see, there have been substantial improvements in Z80 code generation. However, there are still cases where sdcc is much worse than other compilers.

Ironically, the worst case I found comes from sdcc's own library: the code for multiplication of 32-bit values. You can find the C source and the compiler options used on the page mentioned above. I have attached the asm output from sdcc #5198 (current svn), HITECH-C 7.80PL2 and z88dk 1.8.

Code size: sdcc 689 bytes (819 in 2.8.0-rc2), HITECH-C 268 bytes, z88dk 340 bytes.

Looking at the generated asm, we see two problems in sdcc's code:

The minor one is the calling convention: we push everything on the stack, while the other compilers pass arguments in registers.

The major problem is stack use: sdcc uses about 26 bytes of stack space for local variables, while the other compilers use about 4 bytes. Most of the code sdcc generates just moves data around within these 26 bytes.

Let's take line 64 as an example (it's the first multiplication; the others look the same). Line 64 in C is:

    t.i.hi = ((union bil *)&(a))->b.b0 * ((union bil *)&(b))->b.b2;

Let's compare HITECH-C and sdcc. Both use ix to access the local variables and arguments on the stack. Here's what HITECH-C generates:

    ld e,(ix+10)
    ld d,0
    ld l,(ix+4)
    ld h,d
    call lmul
    ld (ix+-2),l
    ld (ix+-1),h

Straightforward: fetch operands (4 instructions), multiply (1), store result (2).
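For readers without the library source at hand, here is a minimal sketch of what line 64 computes. The member names b0..b3, lo and hi come from the quoted line; the exact declaration of union bil, the fixed-width types and the helper names line64 and line64_shifts are assumptions made for illustration, not the library's actual code. On a little-endian target such as the Z80, the statement reads the byte at offset 0 of a and the byte at offset 2 of b, and stores the 16-bit product at offset 2 of t. All three offsets are compile-time constants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of "union bil": the same 32 bits viewed as four bytes,
   two 16-bit halves, or one 32-bit value. Only the member names used in
   line 64 are taken from the mail; the rest is a guess. */
union bil {
    struct { uint8_t b0, b1, b2, b3; } b;
    struct { uint16_t lo, hi; } i;
    uint32_t l;
};

/* Line 64 in isolation (hypothetical wrapper): multiply one byte of a by
   one byte of b and store the product in the upper half of t. */
uint16_t line64(uint32_t a, uint32_t b)
{
    union bil t;

    /* Check the layout assumption this sketch relies on. */
    assert(sizeof(union bil) == 4);

    t.l = 0;
    t.i.hi = ((union bil *)&(a))->b.b0 * ((union bil *)&(b))->b.b2;
    return t.i.hi;
}

/* The same value written with shifts and masks. This matches line64()
   only on a little-endian machine, where b0 is the least significant
   byte and b2 holds bits 16..23. */
uint16_t line64_shifts(uint32_t a, uint32_t b)
{
    return (uint16_t)((a & 0xFFu) * ((b >> 16) & 0xFFu));
}
```

On a little-endian host, line64(0x00000003, 0x00050000) yields 15, and both functions agree for any pair of inputs. The point of the second form is that no run-time address arithmetic is needed to express the operation.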
And now sdcc:

    ld hl,#0x0016
    add hl,sp
    ld -8 (ix),l
    ld -7 (ix),h
    ld a,-8 (ix)
    add a,#0x02
    ld -6 (ix),a
    ld a,-7 (ix)
    adc a,#0x00
    ld -5 (ix),a
    ld hl,#0x001E
    add hl,sp
    ex de,hl
    ld a,(de)
    ld -9 (ix),a
    ld hl,#0x0022
    add hl,sp
    ld -11 (ix),l
    ld -10 (ix),h
    ld a,-11 (ix)
    add a,#0x02
    ld -13 (ix),a
    ld a,-10 (ix)
    adc a,#0x00
    ld -12 (ix),a
    ld l,-13 (ix)
    ld h,-12 (ix)
    ld c,(hl)
    push de
    ld a,c
    push af
    inc sp
    ld a,-9 (ix)
    push af
    inc sp
    call __muluchar_rrx_s
    pop af
    ld b,h
    ld c,l
    pop de
    ld l,-6 (ix)
    ld h,-5 (ix)
    ld (hl),c
    inc hl
    ld (hl),b

The basic structure is the same, but it's 29 instructions for fetching the operands, 12 for the call and 5 to store the result. Most of it is address calculation: sdcc computes the addresses of a, b and t at run time (add hl,sp), adds the member offsets by hand, and spills every intermediate pointer to the stack before using it.

I think we need some high-level optimizations to improve the situation. Currently sdcc just sucks whenever code uses structures or unions.

Philipp