|
From: David H. <dmh...@ma...> - 2007-04-26 01:22:26
|
On Apr 25, 2007, at 9:01 PM, William Hart wrote: >> Whoah. GMP definitely does *not* "divide each limb >> by that integer >> and move on". On almost every architecture it uses a >> multiply-by- >> inverse. It's much much faster than doing hardware >> division. They >> basically do two hardware multiplies per limb >> instead of one hardware >> division. Granlund (and someone else) wrote a whole >> paper on how this >> is done in GMP. I think they even do it for >> multi-limb divisors. > > Right, so presumably it does make sense to try my idea > at some point then. Yes absolutely. Not even "at some point". It's clearly the best approach for exact division, at least in the case of a single limb divisor, and where you know that the output polynomial has the same coefficient size as the input polynomial. >>> Multiplying by an inverse seems out of the question. >> >> Why? GMP does it all over the place. > > Oh, I meant at the C level. Of course they will be > doing it at a lower level, so I think it will be fast > there. Yeah, but if the coefficients of the dividend polynomial are small enough (say less than 10 limbs), and you call something like mpn_tdiv_qr for each coefficient, then you are re-doing the expensive pre-inverse step on every coefficient. What I'm suggesting is: compute the preinverse stuff once at the beginning of the whole polynomial division, and then use their lower level code to do each division *using* the preinverse. Of coure this is a bit complicated since we would need to study their undocumented interface. I think for now we should just leave a note in the todo file, and come back to it later. But I think this is a very important optimisation later on. > Zpoly_scalar_mul_ui in the mpn test code. But that is > fine. You can rip it off when you get to that part, > because I am quite certain it'll save you time since > it is such a complex algorithm. It'll definitely save > you writing a python prototype. :-) ha ha ha david |