Re: [TCLCORE] tommath is slow?

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 454-5900

Kevin Kenny skrev 2014-01-28 13.24:
> On 01/28/2014 03:12 AM, Lars Hellström wrote:
>> Yesterday, I came across the Barrett reduction algorithm
>> (http://en.wikipedia.org/wiki/Barrett_reduction), which looks like it could
>> be well suited for this -- it's basically about replacing a general division
>> with multiplying by a precomputed inverse, which makes sense when you're
>> using the same denominator in many divisions. (I definitely need to try it
>> out in [math::numtheory::isprime] when I'm not so swamped.)
>>
>> For bignum to string conversions, one could cache numbers 10**(2*k) (say)
>> and their pseudo-inverses, extending the cache as needed when a large bignum
>> gets shimmered. Unless I'm mistaken, this would produce an O( M(n) \log n)
>> divide-and-conquer algorithm for bignum-to-string, where M(n) is the
>> complexity of matrix multiplication. I'm not sure if it would be
>> straightforward to add such a global cache of useful numbers to Tcl, though.
>>
>
> Uhm. Right.

Yes, I suppose that was a bit talking to myself; sorry about that. I did 
work out the details, some days later (_much_ more fun than grading exams). 
See below for a Tcl implementation. I measure it (a recursive Tcl script!) 
as being faster than the C-coded UpdateStringOfBignum from about a 1000 
decimal digits and up. Such is the power of a better algorithm...

> libtommath has Barrett reduction inside, if I recall correctly, but it's
> not in the Tcl API.

Indeed; now that I've done some (way overdue) RTFM, RTFS and RTFB (having an 
actual book describing the library is such an advantage), I see that there 
are plenty of goodies in there. Whatever happened to the idea of exposing 
more of them at the script level?

As for Barrett-reduction when computing modular powers, I timed that 
(comparing two Tcl scripts) as being faster than the % operation already at 
about 90 decimal digits, which (if I interpret the docs correctly) seems to 
coincide roughly with where libTomMath switches over to the Karatsuba 
algorithm for multiplication. And this is just one of the tweaks 
[math::numtheory::isprime] could use. But it would get a bit obsolete if the 
corresponding functions in libTomMath were exposed instead.

> But the fastest "quick hit" would be to do base-10
> conversion as base-10**8 instead. That should give roughly an eightfold
> speedup at very little cost in complexity.

Yes, mp_toradix_n is clearly designed to be simple and generic rather than 
quick. So not really ideal for UpdateStringOfBignum, even though it gets the 
job done.

> It would also be nice to have Burnikel-Ziegler division. I started on
> that at one point, but gave it up when Tom started rejecting our
> patches; I figured it wasn't worth the effort if it was never going to
> get upstream.

My impression (from reading a bit in the google group archives) is that Tom 
has stopped developing the libTom* libraries, and that those who took over 
were mostly interested in the crypto side, although maybe in late 2013 there 
was someone who took over libTomMath with an intent to start adding stuff 
(https://groups.google.com/forum/#!topic/libtom/NipcXPetVNI, although no 
commits seem to have made it to github yet). So perhaps there would be more 
interest now. (Also, I saw some talk about reforming the build process to 
move away from auto* tools, which is perhaps a concern for Tcl.)

Anyhow, here's how to convert positive bignums to decimal digits in 
O(n**1.585) time, not much optimised (other than to avoid division):

proc Barrett_todigits {number} {
     # This proc just computes some tables that the worker
     # Barrett_todigits_rec needs. It's little enough work
     # that they need not be cached.

     set n 1000; set nL [list $n]
     set shift 20; set shiftL [list $shift]
     set M 8389;  # (8 * 2**$shift) / $n rounded up
     set mL [list 1049]; # $M/8, rounded up

     while 1 {
         set n [expr {$n*$n}]
         if {$n > $number} then {break}
         set shift [expr {$shift<<1}]
         # The following is a Newton-Raphson step, to restore the invariant
         # that $M is (8 * 2**$shift) / $n, rounded up.
         set M2 [expr {$M*$M}]
         set M [expr {-(( (($M2*$M2*$n) >> ($shift+6)) - ($M2<<1) ) >> 3)}]

         lappend nL $n
         lappend shiftL $shift
         lappend mL [expr {-((-$M)>>3)}] ; # Divide by 8, round up
     }

     return [
         Barrett_todigits_rec $number $nL $shiftL $mL\
           [expr {[llength $nL] - 1}] 1
     ]
}

proc Barrett_todigits_rec {number nL shiftL mL index left} {
     # left is 1 when we should not pad with zeroes.
     if {$index<0} then {
         return [format [lindex {%03d %d} $left] $number]
     }
     set q [expr {
         ($number*[lindex $mL $index]) >> [lindex $shiftL $index]
     }]
     set r [expr {$number - [lindex $nL $index]*$q}]
     if {$r<0} then {
         incr q -1
         incr r [lindex $nL $index]
     }
     incr index -1
     if {$left && $q==0} then {
         return [Barrett_todigits_rec $r $nL $shiftL $mL $index 1]
     } else {
         return "[
             Barrett_todigits_rec $q $nL $shiftL $mL $index $left
         ][
             Barrett_todigits_rec $r $nL $shiftL $mL $index 0
         ]"
     }
}

In C, it would probably be more natural to code it without recursion, with a 
preallocated vector of mp_ints to store the intermediate r values, and 
storing the digits directly into what will become the objPtr->bytes[] 
vector. The digits are even being produced left-to-right.

Lars Hellström

Re: [TCLCORE] tommath is slow?

The Tool Command Language implementation

Re: [TCLCORE] tommath is slow?