From: <MSoegtrop@MichaelSoegtrop.de>  20020321 20:42:38

Dear Scott and Michael, there is one more option besides compare/subtract, subtract and restore by copy, subtract and restore by add, and this is subtract OR add. You make one add or subtract in every cycle and no restore or compare. This algorithm looks like this: int udivtst(int a, int b) { int c,q=0 b<<=(nBitsQ1); for(c=0; c<nBitsQ; c++) { if(a<0) a+=b; else a=b; q<<=1; q+=a>=0; b>>=1; } return q; } The compares a>=0 and a<0 can be extracted from the flags of the add or subtract in the previous cycle (a compare with 0 is not really a compare!). You need the same number and size of registers as for the compare/subtract algorithm, but you save the comparison. In the appended file i have a C test program, that demonstrates the algorithm. The implementation is intended for signal processing. If you make, say a 16Bit / 8Bit => 8 Bit division, the result will be saturated to the maximum value, if it doesn't fit in the result. A division by zero results in the maximum value as well. The modulus sign of the signed version is designed in such a way, that (a+b/2)/b gives int(round(double(a)/double(b))), regardless of the sign of a and b (i hope, it does what it is designed for). If the loop is unrolled, and the result is known to fit in q (for this reason, i don't recommend this), b and q can share 1 register, and the a>=0 and a<0 is extracted from the sign bit at a location in a, which is not messed up by the q bits in q:b. This is a different bit in every cycle, but this does not hurt on a processor with bit access. If you don't need the remainder, you can leave away the extra remainder fixing code in the file. As far as i know, this is the fastest general, integer exact "one bit at a time" division algorithm on most machines. Here an extract of a paper, in which i explained why this algorithm works: If the result is positive, the corresponding resultbit is a one, and a zero otherwise. Now, if the intermediate divisionrest once became negative, we have subtracted to much, but it turns out, that if the divisor is added rather than subtracted in the following bitposition, the effect is the same, as if the subtraction, which made the intermidiate divisionrest negative would not have been executed, and the divisor would have been subtracted in the next bit position. This is cause of the following equivalence: r  2^n d + 2^{n1} d = r  2^{n1} d This trickery does only work with binary numbers. If the 2 in the above equation is replaced by a different number, the equation no longer holds. The advantage of this trickery is, that we need not figure out, whether a subtraction is possible or not. In every step, either a subtract or an add is executed. I hope this helps Best regards, Michael Original Message From: sdccuseradmin@... [mailto:sdccuseradmin@...]On Behalf Of Scott Dattalo Sent: Thursday, March 21, 2002 2:37 PM Cc: Sdccuser@... Subject: Re: [Sdccuser] need a very fast 18bit / 15bit div On Thu, 21 Mar 2002, Bernhard Held wrote: > > Besides this, > > many division algorithms i have seen are not optimal: they first compare > > and then make a subtraction or not. It is a bit tricky, but more > > efficient to leave away the comparison and switch between subtracting > > and adding. This way there is only one multi byte operation per result > > bit. I don't know whether the SDCC library implementation is of the > > latter variant. > I had a look at the "optimized" algorithm, but it has the disadvantage, that > you have to store an intermediate result. There are too less registers to > store this additional result in registers, so I had to use expensive moves > on the stack. > > The result was, that the "optimized" algorithm was a little bit slower with > an increased memory consumption. Just a comment... The difference between compare/subtract versus subtract/restore falls into the category of restoring versus nonrestoring algorithms. Depending on the processor, algorithm, and the data, one implementation can favor the other. Almost all division algorithms I've seen are of the nonrestoring type (i.e. compare first and then subtract if the compare says you need to). This makes sense if the whole operation fits into the processor's registers. But for multibyte operations, a restoring algorithm can be more efficient. Nikolai Golovchenko has implemented a restoring division algorithm for the PIC that is faster than anything out there (except perhaps an unrolled loop). Here's his 24bit divide by 16bit: http://www.piclist.com/techref/microchip/math/div/24by16.htm Scott _______________________________________________ Sdccuser mailing list Sdccuser@... https://lists.sourceforge.net/lists/listinfo/sdccuser 