Re: [Fastlibnt-devel] Scalar multiplication

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

On Apr 25, 2007, at 9:01 PM, William Hart wrote:

>> Whoah. GMP definitely does *not* "divide each limb
>> by that integer
>> and move on". On almost every architecture it uses a
>> multiply-by-
>> inverse. It's much much faster than doing hardware
>> division. They
>> basically do two hardware multiplies per limb
>> instead of one hardware
>> division. Granlund (and someone else) wrote a whole
>> paper on how this
>> is done in GMP. I think they even do it for
>> multi-limb divisors.
>
> Right, so presumably it does make sense to try my idea
> at some point then.

Yes absolutely. Not even "at some point". It's clearly the best  
approach for exact division, at least in the case of a single limb  
divisor, and where you know that the output polynomial has the same  
coefficient size as the input polynomial.

>>> Multiplying by an inverse seems out of the question.
>>
>> Why? GMP does it all over the place.
>
> Oh, I meant at the C level. Of course they will be
> doing it at a lower level, so I think it will be fast
> there.

Yeah, but if the coefficients of the dividend polynomial are small  
enough (say less than 10 limbs), and you call something like  
mpn_tdiv_qr for each coefficient, then you are re-doing the expensive  
pre-inverse step on every coefficient. What I'm suggesting is:  
compute the preinverse stuff once at the beginning of the whole  
polynomial division, and then use their lower level code to do each  
division *using* the preinverse.

Of coure this is a bit complicated since we would need to study their  
undocumented interface. I think for now we should just leave a note  
in the todo file, and come back to it later. But I think this is a  
very important optimisation later on.

> Zpoly_scalar_mul_ui in the mpn test code. But that is
> fine. You can rip it off when you get to that part,
> because I am quite certain it'll save you time since
> it is such a complex algorithm. It'll definitely save
> you writing a python prototype. :-)

ha ha ha

david