## Re: [Sdcc-user] Even faster multiplication on Z80?

 Re: [Sdcc-user] Even faster multiplication on Z80? From: - 2009-04-02 13:46:10 Dear Philipp, >> [...] anyway I can >> suggest some very small corrections: > > Implemented in sdcc 2.9.1 #5422. Great, but...I did not expect such a hurried inclusion... - the comment wrongly saying "DE = less significant word..." wasn't changed in "HL =..."; - one more byte could be saved by replacing: 36 ld l,#0 37 ld a,b 38 ld b,#16 39 42 or a 43 jr NZ,2\$ with 36 xor a 37 ld l,a 38 xor b 39 ld b,#16 42 43 jr NZ,2\$ Regards, Marco PS: I've cleaned __divu16 form div.s, it's 30% faster now... I do only have to test it a little bit more... Should I send the next patch here? or to another ML? -- http://bodrato.it/

 Re: [Sdcc-user] Even faster multiplication on Z80? From: - 2009-03-30 14:03:43 Dear Philipp, > thanks for your help with Z80 multiplication. Thank you for your answer! > study, but it has been shown, that small integer values are far more > common than bigger ones. AFAIR values 0,1,2 combined are about as likely > to occur as all other values combined in an 8 bit integer. I'm sure you are right, because "char" is often used to store "boolean" values or for small cycles. But the real question is: are those values used as operands in multiplications? If you use the byteXbyte product as a building block for some more complex mathematical function, you usually have "random" bytes to handle. If you use it to access elements of a bi-dimensional array A[x+ROWLEN*y], then the result must be small... and so must be the two operands. A far faster code for small operands is: ld hl, #0 ld d, h or a jp 3 1: add hl,de 2: rl e rl d 3: rra jp C, 1 jp NZ, 2 It assumes the two operands are in A and E; put the result in HL... It is very fast when A<32, but code size is too big to be inlined... >> Drawbacks: >> - overwrites the accumulator A > For builtin code generation this is bad, since A is used to backup B > (when B is in use). If both DE and B are used, wrap into push bc/pop pc, (saving ld a,b/ld b,a). If only B is, save it to D or E (same as current). If only DE is in use, do nothing (save push/pop). > You might want to have a look at the 16 bit multiplication in mul.s or There is a faster way for 16x16->16, but it would be slower for 16x8->16... I have to think about it a little bit longer... anyway I can suggest some very small corrections: ---------8<---------8<---------8<---------8<---------8<---------8<----- --- sdcc/device/lib/z80/mul.s 2009-01-05 11:20:47.000000000 +0100 +++ ./mul.s 2009-03-30 15:30:08.000000000 +0200 @@ -29,11 +29,11 @@ ;; DE = multiplier ;; ;; Exit conditions - ;; DE = less significant word of product + ;; HL = less significant word of product ;; ;; Register used: AF,BC,DE,HL __mul16:: - ld hl,#0 + ld l,#0 ld a,b ; ld c,c ld b,#16 @@ -41,7 +41,7 @@ ;; Optimise for the case when this side has 8 bits of data or ;; less. This is often the case with support address calls. or a - jr NZ,1\$ + jr NZ,3\$ ld b,#8 ld a,c @@ -49,6 +49,7 @@ ;; Taken from z88dk, which originally borrowed from the ;; Spectrum rom. add hl,hl +3\$: rl c rla ;DLE 27/11/98 jr NC,2\$ ---------8<---------8<---------8<---------8<---------8<---------8<----- > I wonder how these would compare at device/lib/_mulllong.c This is 32x32->32, isn't it? It should be nice to write an ad-hoc function for this, if one can use BC,DE,HL,BC',DE',HL',IX,IY,A,A'... there as a lot of register-space to work on... Maybe also 32x32->64 is possible. Regards, Marco -- http://bodrato.it/
 Re: [Sdcc-user] Even faster multiplication on Z80? From: Philipp Klaus Krause - 2009-04-01 19:15:05 -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 bodrato@... schrieb: > [...] anyway I can > suggest some very small corrections: Implemented in sdcc 2.9.1 #5422. Philipp -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAknTvSoACgkQbtUV+xsoLpoEHwCfXhCV/6yYMykTwizIxSlQImPg o3kAni7rvbA0qnHYguLu6b0TWlG3C0UW =utwx -----END PGP SIGNATURE-----
 Re: [Sdcc-user] Even faster multiplication on Z80? From: - 2009-04-02 13:46:10 Dear Philipp, >> [...] anyway I can >> suggest some very small corrections: > > Implemented in sdcc 2.9.1 #5422. Great, but...I did not expect such a hurried inclusion... - the comment wrongly saying "DE = less significant word..." wasn't changed in "HL =..."; - one more byte could be saved by replacing: 36 ld l,#0 37 ld a,b 38 ld b,#16 39 42 or a 43 jr NZ,2\$ with 36 xor a 37 ld l,a 38 xor b 39 ld b,#16 42 43 jr NZ,2\$ Regards, Marco PS: I've cleaned __divu16 form div.s, it's 30% faster now... I do only have to test it a little bit more... Should I send the next patch here? or to another ML? -- http://bodrato.it/
 Re: [Sdcc-user] Even faster multiplication on Z80? From: Philipp Klaus Krause - 2009-04-02 21:07:16 -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 bodrato@... schrieb: > Dear Philipp, > > Great, but...I did not expect such a hurried inclusion... Well, it was a very simple change, now drawbacks, I understood how it works, the regression tests passed. > > PS: I've cleaned __divu16 form div.s, it's 30% faster now... I do only > have to test it a little bit more... Should I send the next patch here? or > to another ML? No. Please either create a feature request entry in the tracker for the improvement and attach the patch to it or create a patch entry in the tracker. http://sourceforge.net/tracker/?group_id=599 Philipp -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAknVKPgACgkQbtUV+xsoLpoiPwCfRdp1UnzZlRlnMPtS+SdJdvi4 aCsAoLY+uGfe5NAlgpSY1HZgPgFPa8P1 =jCAg -----END PGP SIGNATURE-----
 Re: [Sdcc-user] Even faster multiplication on Z80? From: - 2009-04-03 08:15:48 Dear Philipp, > improvement and attach the patch to it or create a patch entry in the Done: http://sourceforge.net/support/tracker.php?aid=2727917 I did not rewrite the code, I only cleaned the existing one, please let me know. Regrads, Marco -- http://bodrato.it/
 Re: [Sdcc-user] Even faster multiplication on Z80? From: Philipp Klaus Krause - 2009-04-09 12:03:59 -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 bodrato@... schrieb: > Dear Philipp, > >> thanks for your help with Z80 multiplication. > > Thank you for your answer! > >> study, but it has been shown, that small integer values are far more >> common than bigger ones. AFAIR values 0,1,2 combined are about as likely >> to occur as all other values combined in an 8 bit integer. > > I'm sure you are right, because "char" is often used to store "boolean" > values or for small cycles. But the real question is: are those values > used as operands in multiplications? > If you use the byteXbyte product as a building block for some more complex > mathematical function, you usually have "random" bytes to handle. If you > use it to access elements of a bi-dimensional array A[x+ROWLEN*y], then > the result must be small... and so must be the two operands. I finally found some useful statistics: In "The New C Standard - An Economical and Cultural Commentary", on page 1144, see figures 1143.2. It has info on how common char values are when used as operands to * and /. Unfortunately it's not much, but it might be insightful. Philipp -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAknd5CMACgkQbtUV+xsoLppFXgCgg4LXzEmJLixT1tJ6sBA3ySfn IAMAn3kNOtDeAGj+L0Sx9YjDq7CPdhzs =rCA4 -----END PGP SIGNATURE-----
 Re: [Sdcc-user] Even faster multiplication on Z80? From: - 2009-04-15 07:57:16 Dear Philipp, >> I'm sure you are right, because "char" is often used to store "boolean" >> values or for small cycles. But the real question is: are those values >> used as operands in multiplications? >> If you use the byteXbyte product as a building block for some more >> complex >> mathematical function, you usually have "random" bytes to handle. If you >> use it to access elements of a bi-dimensional array A[x+ROWLEN*y], then >> the result must be small... and so must be the two operands. I wrote a small patch for 32x32->32 multiplication using both 8x8->16 and 16x16->16 product, this should speed up the Z80 implementation a lot... https://sourceforge.net/support/tracker.php?aid=2762462 > I finally found some useful statistics: In "The New C Standard - An > Economical and Cultural Commentary", on page 1144, see figures 1143.2. > It has info on how common char values are when used as operands to * > and /. Unfortunately it's not much, but it might be insightful. Unfortunately this is mainly about constants used in programs, and this should influence mainly the compiler part and marginally the library. The compiler should optimise r=a*0+b*1/8+2*c/1 with r=(b>>3)+(c<<1) at comile-time. The library can not check for too many special cases at run-time... I tried to add such a special handling to div/mod code, because I think that most of the times modulus is called to convert integers to strings (base 10 or 16...) with a small divisor. The result obviously is that code size increase! https://sourceforge.net/support/tracker.php?aid=2764475 One question, is it possible to have some functions with a different calling convention? Using registers to pass parameters, should obviously be very interesting for small library functions, but also the simple "stack clean-up by the callee" can (sometime) avoid a lot of coding both on the caller and on the callee side. Regards, Marco -- http://bodrato.it/