[sdcc-devel] We need more high-level optimizations

Hello, fellow sdcc developers,
I have now implemented most of the Z80 code generation improvements
that I have been working on since earlier this year. You can see the results at
http://sdcc.wiki.sourceforge.net/Philipp%27s+TODO+list
At the bottom of the page is a code size comparison of different
compilers. As you can see, there have been substantial improvements in
Z80 code generation. However, there are still cases where sdcc is much
worse than other compilers.
Ironically, the worst case I found comes from sdcc's own library. It's
the code for multiplication of 32-bit values.
You can find the C source and compiler options used at the page
mentioned above.
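Judging from the statement quoted further down (t.i.hi = ... b0 * ... b2), the routine builds the 32-bit product out of 8-bit partial products, which is why its C source is full of byte-wise union accesses. Below is a minimal portable sketch of that technique, assuming unsigned 32-bit operands and a result truncated to 32 bits. It extracts bytes with shifts instead of the library's union, and the names byte_of and mul32_bytewise are hypothetical, not sdcc's.

```c
#include <stdint.h>

/* Byte i of v, where byte 0 is the least significant. */
static uint8_t byte_of(uint32_t v, unsigned i)
{
    return (uint8_t)(v >> (8u * i));
}

/* Hypothetical helper: 32x32 -> 32 bit multiplication built only from
   8x8 -> 16 bit partial products, as an 8-bit target would do it.
   Partial products with i + j >= 4 only affect bits 32 and above,
   so they are skipped: 10 of the 16 products remain. */
uint32_t mul32_bytewise(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    for (unsigned i = 0; i < 4; i++) {
        for (unsigned j = 0; i + j < 4; j++) {
            /* An 8x8 product always fits in 16 bits (255 * 255 = 65025). */
            uint32_t partial = (uint32_t)byte_of(a, i) * byte_of(b, j);
            /* Bits shifted past bit 31 are discarded, i.e. the sum is mod 2^32. */
            result += partial << (8u * (i + j));
        }
    }
    return result;
}
```

Each of the 10 partial products corresponds to one byte multiplication plus an add into the right position of the 32-bit temporary, so how cheaply a compiler can address individual bytes of a union dominates the code size.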

I have attached the asm output from sdcc #5198 (current svn), HITECH-C
7.80PL2 and z88dk 1.8.
sdcc used 689 bytes (819 in 2.8.0-rc2), HITECH-C 268 bytes and z88dk 340 bytes.
Looking at the generated asm we see two problems in sdcc's code:

The minor one is the calling convention: we push everything onto the
stack, while the other compilers pass arguments in registers.

The major problem is stack use: sdcc used about 26 bytes of stack space
for local variables, while the other compilers use about 4 bytes. Most
of the code sdcc generates just moves data around within these 26 bytes.
Let's take line 64 as an example (it's the first multiplication; the others
look the same):

Line 64 in C is:
t.i.hi = ((union bil *)&(a))->b.b0 * ((union bil *)&(b))->b.b2;
Let's compare HITECH-C and sdcc. Both use ix to access the local
variables and arguments on the stack.
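The member accesses in that statement need no run-time address arithmetic: b.b0 and b.b2 are simply bytes at fixed offsets inside the operands, so a compiler can turn each into a single indexed load, as HITECH-C does. The sketch below assumes a union bil with a b member of four unsigned chars and an i member of two unsigned shorts, reconstructed from the member names used in line 64; the actual definition in sdcc's library may differ. The helper name partial_b0_b2 is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* ASSUMED layout, reconstructed from the member names used in line 64;
   the real union in sdcc's library source may differ. */
union bil {
    struct { unsigned char b0, b1, b2, b3; } b;
    struct { unsigned short lo, hi; } i;
    uint32_t l;
};

/* The member offsets are compile-time constants. */
_Static_assert(offsetof(union bil, b.b0) == 0, "b0 is the first byte");
_Static_assert(offsetof(union bil, b.b2) == 2, "b2 is the third byte");

/* Hypothetical helper computing the same value as
   ((union bil *)&(a))->b.b0 * ((union bil *)&(b))->b.b2
   by indexing the operands' bytes at constant offsets:
   no pointer arithmetic is left for run time. */
unsigned partial_b0_b2(uint32_t a, uint32_t b)
{
    const unsigned char *pa = (const unsigned char *)&a;
    const unsigned char *pb = (const unsigned char *)&b;
    return (unsigned)pa[offsetof(union bil, b.b0)] * pb[offsetof(union bil, b.b2)];
}
```

Because both offsets are constants, the address of each operand is "frame pointer + constant", which fits directly into the Z80's (ix+d) addressing mode.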

Here's what HITECH-C generates:
ld	e,(ix+10)
ld	d,0
ld	l,(ix+4)
ld	h,d
call	lmul
ld	(ix+-2),l
ld	(ix+-1),h
Straightforward: Fetch operands (4 instructions), multiply (1), store
result (2).

And now sdcc:
ld	hl,#0x0016
add	hl,sp
ld	-8 (ix),l
ld	-7 (ix),h
ld	a,-8 (ix)
add	a,#0x02
ld	-6 (ix),a
ld	a,-7 (ix)
adc	a,#0x00
ld	-5 (ix),a
ld	hl,#0x001E
add	hl,sp
ex	de,hl
ld	a,(de)
ld	-9 (ix),a
ld	hl,#0x0022
add	hl,sp
ld	-11 (ix),l
ld	-10 (ix),h
ld	a,-11 (ix)
add	a,#0x02
ld	-13 (ix),a
ld	a,-10 (ix)
adc	a,#0x00
ld	-12 (ix),a
ld	l,-13 (ix)
ld	h,-12 (ix)
ld	c,(hl)
push	de
ld	a,c
push	af
inc	sp
ld	a,-9 (ix)
push	af
inc	sp
call	__muluchar_rrx_s
pop	af
ld	b,h
ld	c,l
pop	de
ld	l,-6 (ix)
ld	h,-5 (ix)
ld	(hl),c
inc	hl
ld	(hl),b
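Judging by its name and by how the result is picked up from hl afterwards, the __muluchar_rrx_s helper called in the listing above multiplies two unsigned chars into a 16-bit result. The Z80 has no multiply instruction, so such helpers are typically shift-and-add loops. The following C sketch shows that technique; it is an assumption about what the helper computes, not a transcription of sdcc's actual routine, and muluchar_sketch is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical stand-in for an 8x8 -> 16 bit unsigned multiply helper,
   using the shift-and-add method a CPU without a multiply instruction
   (such as the Z80) would use. The product of two bytes always fits
   in 16 bits, so no overflow can occur. */
uint16_t muluchar_sketch(uint8_t x, uint8_t y)
{
    uint16_t acc = 0;
    uint16_t addend = x;      /* x shifted left by the current bit index */
    for (int bit = 0; bit < 8; bit++) {
        if (y & 1u)
            acc += addend;    /* add x << bit when this bit of y is set */
        y >>= 1;
        addend <<= 1;
    }
    return acc;
}
```

The multiplication itself is therefore a fixed, modest cost in both compilers' output; the difference lies in the instructions around the call.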
The basic structure is the same, but it's 28 instructions for fetching
the operands, 12 for calling and 5 to store the result. Most of that is
address calculation. I think we need some high-level optimizations to
improve the situation. Currently sdcc just sucks whenever code uses
structures or unions.
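To make the kind of optimization meant here a bit more concrete: in the sdcc output, an operand address is computed at run time as sp + 0x0022, spilled to the stack, and then + 2 is added byte by byte with add/adc. If the compiler kept such addresses symbolic as (base, offset) pairs until code generation, the whole chain would collapse to one constant displacement. The toy below is a hypothetical illustration of that folding, not sdcc's internal representation; addr_expr, addr_of and addr_add are made-up names.

```c
/* Toy symbolic address: base register + constant byte offset.
   Hypothetical; sdcc's real intermediate code looks different. */
typedef struct {
    const char *base;   /* e.g. "sp" or "ix" */
    int offset;         /* constant displacement in bytes */
} addr_expr;

/* &var, where var lives at a known offset from base. */
static addr_expr addr_of(const char *base, int frame_offset)
{
    addr_expr e = { base, frame_offset };
    return e;
}

/* addr + k (e.g. selecting a union/struct member) is folded into the
   offset at compile time instead of emitting add/adc instructions. */
static addr_expr addr_add(addr_expr e, int k)
{
    e.offset += k;
    return e;
}

/* Usage: &b is sp + 0x22 in the listing, selecting the member adds 2,
   so addr_add(addr_of("sp", 0x22), 2) yields sp + 0x24, a single
   constant that can be used directly in one load. */
```

With the offset known, the load can be emitted as one indexed instruction relative to the frame pointer, which is essentially what HITECH-C's four-instruction operand fetch does.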

Philipp