The best solution would be to write a (small) asm replacement, but I do not know exactly how to do it (I can write mullong.s, but how do I convince the compiler to use it?)
Meanwhile I cleaned up the C code for the long integer product (32x32->32): it uses a smaller local variable and replaces six 8x8->16 products (four of which should be 8x8->8, because the high byte is discarded) with two 16x16->16 multiplications.
The C code obviously works on all the ports, but it was developed with Z80 limitations in mind, which is why an #ifdef limits its use to the z80 environment. On the other ports this code MAY show a small gain in code size, but it SHOULD NOT give any speed improvement (it may even slow down the library), which is why it is disabled there.
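For readers who want to see the decomposition, here is a minimal sketch of a 32x32->32 product in which the two cross terms are plain 16x16->16 multiplications and only the low halves need a full 16x16->32 product. This is not the patch itself: the names mul16x16_32 and mul32_sketch are hypothetical, and the use of <stdint.h> types is an assumption made to keep the example portable.

```c
#include <stdint.h>

/* Hypothetical helper: full 16x16->32 product of the low halves,
   built from four 8x8->16 partial products. */
static uint32_t mul16x16_32(uint16_t x, uint16_t y)
{
    uint8_t x0 = (uint8_t)(x & 0xFF), x1 = (uint8_t)(x >> 8);
    uint8_t y0 = (uint8_t)(y & 0xFF), y1 = (uint8_t)(y >> 8);

    uint32_t lo  = (uint32_t)((uint16_t)x0 * (uint16_t)y0);
    uint32_t mid = (uint32_t)((uint16_t)x0 * (uint16_t)y1)
                 + (uint32_t)((uint16_t)x1 * (uint16_t)y0);
    uint32_t hi  = (uint32_t)((uint16_t)x1 * (uint16_t)y1);

    return lo + (mid << 8) + (hi << 16);
}

/* Hypothetical sketch of a 32x32->32 multiply (result modulo 2^32). */
uint32_t mul32_sketch(uint32_t a, uint32_t b)
{
    uint16_t a_lo = (uint16_t)a, a_hi = (uint16_t)(a >> 16);
    uint16_t b_lo = (uint16_t)b, b_hi = (uint16_t)(b >> 16);

    /* The two cross products land in the upper 16 bits of the result,
       so only their low 16 bits matter: two 16x16->16 multiplications. */
    uint16_t cross = (uint16_t)((uint32_t)a_hi * b_lo + (uint32_t)a_lo * b_hi);

    /* a_hi * b_hi would be shifted by 32 bits and is dropped entirely. */
    return mul16x16_32(a_lo, b_lo) + ((uint32_t)cross << 16);
}
```

The result should match the native truncated product, e.g. mul32_sketch(a, b) == (uint32_t)(a * b) for any 32-bit unsigned a and b.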
This code does not look optimal.
This site has more efficient code in Z80 assembler: http://baze.au.com/misc/z80bits.html
The optimal one is in z88dk, a native 32-bit multiply.
As I understand it, this algorithm (pointed out by aralbrec) is not used because it changes the alternate register set. Why this restriction? Or does it need to be ported?
I never claimed that this code was optimal! :-)
I repeat, "a (small) asm replacement" is the best improvement, but the proposed patch was better than the previous code.
Meanwhile, I noticed that z88dk introduced some GPL'd code in their library...
http://www.z88dk.org/forum/viewtopic.php?id=4195
Did they take care of the fact that the "special exception" is valid for SDCC?
I don't like having special cases and separate code for different ports, even if that provides a small advantage. There is a cost in maintaining the code after all.
But both the z80 and gbz80 ports would profit from this, and once RFE #372 is implemented, the r2k and r3ka ports would profit as well. On the other hand, z180, hc08, s08 and stm8 would suffer. So I enabled the use of this multiplication for the z80, gbz80, r2k and r3ka.
Patch applied in revision #8767.
Philipp