|
From: William H. <ha...@ya...> - 2009-02-21 17:32:56
|
I figured out what the red line is on the left hand side of the multiplication graph: http://sage.math.washington.edu/home/wbhart/poly-KS.png It occurs for polynomials of length 15, where David Harvey's odd/even karatsuba is used. This algorithm is great for unbalanced operands, as it uses far fewer multiplies in general than ordinary karatsuba. Instead of dividing the polynomials into a first half and a second half, it works by dividing into odd exponent coefficients and even exponent coefficients. However when you work it through, you need the lengths of the original polynomials (at every level of the recursion) to be even. So David's implementation (which I borrowed for FLINT 2.0) uses a trick called peeling to make everything work. Basically at each level, if the length of a polynomial is odd, one of the coefficients is ignored and the karatsuba recursion is carried out on what is left (which is of even length), then at the end a fixup is done to handle the product of the ignored coefficient with the other polynomial. The problem is obvious. 15 is odd, so 1 is subtracted to make 14 which is 2*7. At the next level, 7 is odd, so 1 is subtracted to make 6 which is 2*3. At the next level 3 is odd, so 1 is subtracted to make 2 which is 2*1. So at every level, peeling occurs, i.e. 15 is a worst case for this algorithm. One way to fix this issue is to code up ordinary karatsuba for balanced operands. In that case the 15 x 15 polynomial multiplication gets split into two 8 x 8 multiplications and one 7 x 7. The 7 x 7 gets split into two 4 x 4 multiplications and one 3 x 3. The 3 x 3 gets split into two 2 x 2 and one 1 x 1 multiplication. With the odd/even method one has: 1 x 1 -> 1 mul 2 x 2 -> 3 mul 3 x 3 -> 8 mul 6 x 6 -> 24 mul 7 x 7 -> 37 mul 14 x 14 -> 111 mul 15 x 15 -> 140 mul With the ordinary method one has: 1 x 1 -> 1 mul 2 x 2 -> 3 mul 3 x 3 -> 7 mul 4 x 4 -> 9 mul 7 x 7 -> 25 mul 8 x 8 -> 27 mul 15 x 15 -> 79 mul nearly twice the speed for large coefficients. Bill. --- On Wed, 2/18/09, Bill Hart <goo...@go...> wrote: > From: Bill Hart <goo...@go...> > Subject: [flint-devel] FLINT 2.0 Progress > To: "Development list for FLINT" <fli...@li...> > Date: Wednesday, February 18, 2009, 4:18 PM > The new F_mpz_poly module for FLINT 2.0 is progressing well. > I've now > got the byte packing code working and coded up the > (non-truncating) > polynomial multiplication functions. Here is the result of > a profile > comparing with Magma: > > http://sage.math.washington.edu/home/wbhart/poly-KS.png > > The dark red on the left is probably an indication that the > processor > clock speed is not exactly 2.8GHz on this machine (or the > real time > clock is slightly off), and is not important. The times > should be > identical with Magma until reaching the FFT region, as both > FLINT and > Magma must use GMP and degree 0 polynomials are just > integers. > > The red dots in the middle are probably real, with the dots > at the > bottom of the middle probably tuning issues. The ones on > the right of > the middle I am not sure about. They may just be timing > irregularities > (which this machine has many of). > > The red vertical line near the left is probably just a > tuning issue. > It's doubtlessly at the crossover between karatsuba and > Schoenhage-Strassen, which I must have wrong. I'm > surprised it is so > red though. > > There are also some red dots near the right of the diagram. > Those are > probably significant. They lie in a region where Kronecker > segmentation is used after byte packing the coefficients of > the > polynomials into two large integers. > > Note that below those dots is lots of strong blue. That is > the bit > packing region, and the problems we used to have there are > completely > gone. They were due to cache issue related to the fact that > Magma was > storing a coefficient in 32 bits, not 2 limbs as the FLINT > 1.0.x and > 1.1.x series did. FLINT 2.0 uses just a single limb per > coefficient, > and the nice blue there vindicates this change. > > But the red dots in the byte packing region are almost > certainly due > to the fact that once one gets out of the region where the > output > coefficients fit into a limb, one must switch to > mpz_t's (in the new > F_mpz_t format). That switching costs some cycles for extra > comparisons and branches, and so for small coefficients in > the byte > packing region, FLINT 2.0 is slower than earlier versions > of FLINT. > I've worked hard on getting overheads down. At one > point I tried an > optimisation which should have made a difference (removing > some > unnecessary branches from the bit packing code). However > this didn't > improve things at all. So I think it's difficult to see > how I could > expect to remove those dots. They may go away by fiddling > with the > tuning of the FFT itself. > > At any rate, I'm really happy with how things have > gone. The next step > will be to work on the truncating multiplication routines, > which > should be very simple in comparison with the functions I > just coded. > Then I can move on to the division and GCD functions, etc. > > Although I coded up David Harvey's KS2 algorithm for > the bitpacked > region, I didn't include it in this profile yet. You > wouldn't notice > its effect probably, as things are already so blue at the > bottom of > the graph. But when I get around to it, I'll put up > another graph with > that included, and some of these other tuning issues > resolved. > > Bill. > > ------------------------------------------------------------------------------ > Open Source Business Conference (OSBC), March 24-25, 2009, > San Francisco, CA > -OSBC tackles the biggest issue in open source: Open > Sourcing the Enterprise > -Strategies to boost innovation and cut costs with open > source participation > -Receive a $600 discount off the registration fee with the > source code: SFAD > http://p.sf.net/sfu/XcvMzF8H > _______________________________________________ > flint-devel mailing list > fli...@li... > https://lists.sourceforge.net/lists/listinfo/flint-devel |