|
From: William H. <ha...@ya...> - 2009-02-27 17:27:50
|
I've done some more work optimising the bit and byte packing and unpacking code, and making sure the maximum number of bits/limbs per coefficient is not counted more than once anywhere. I think I've done all I can in terms of this general sort of optimisation now. Here is the result: http://sage.math.washington.edu/home/wbhart/poly-KS.png But there are still some red dots, and these are real. 1) They are not tuning issues - the FLINT 1.x tuning is identical at these points. 2) They are not timing irregularities - they show up in the same places every time the graph is drawn. 3) They do not represent especially fast points for Magma - the FLINT times go up dramatically at these points, sometimes even higher than other nearby problems which are larger! 4) All points are in the KS region where byte packing is done. Here are where some of the points lie: length bits 237 237 1020 284 5266 66 There's nothing special about any of these. They are not at any crossovers of any kind, bit, byte, limb, algorithm, etc. Note that at 237 bits there is a whole dark line across the graph. This is mystifying, as there is nothing special happening at 237 bits. The output coefficients in the 237/237 case are 2*237+8 = 482 bits. Again there just isn't anything special about that. Packing is done into fields of 61 bytes each. Each of the large integers is 115656 bits or 1808 limbs. That is just into the FFT region (the FFT starts at 1695 limbs). The only thing that should be relevant is the size of the input and output coefficients, as there is a whole dark line at 237 bits. Input coefficients fit easily into 4 limbs, output coefficients into 8 limbs. There's just nothing special about any of these things. We aren't even near any boundaries in either case. Bill. --- On Fri, 2/27/09, Bill Hart <goo...@go...> wrote: > From: Bill Hart <goo...@go...> > Subject: Re: [flint-devel] FLINT 2.0 Progress > To: "Development list for FLINT" <fli...@li...> > Date: Friday, February 27, 2009, 3:32 PM > I've now fixed the karatsuba multiplication by switching > to the > ordinary algorithm when the lengths are equal. > > http://sage.math.washington.edu/home/wbhart/poly-KS.png > > Next I will try to figure out where those remaining red > dots are > coming from. The whole graph is actually a slightly darker > ble than I > expect. I did fix one issue namely counting the maximum > number of bits > per coefficient outside the KS function and then again > inside it. I > now pass this data in. > > But I think the byte packing might not be optimal as there > is quite > dark blue in some parts of the KS region. > > Bill. > > 2009/2/21 William Hart <ha...@ya...>: > > I figured out what the red line is on the left hand > side of the multiplication graph: > > > > > http://sage.math.washington.edu/home/wbhart/poly-KS.png > > > > It occurs for polynomials of length 15, where David > Harvey's odd/even karatsuba is used. This algorithm is > great for unbalanced operands, as it uses far fewer > multiplies in general than ordinary karatsuba. > > > > Instead of dividing the polynomials into a first half > and a second half, it works by dividing into odd exponent > coefficients and even exponent coefficients. > > > > However when you work it through, you need the lengths > of the original polynomials (at every level of the > recursion) to be even. So David's implementation (which > I borrowed for FLINT 2.0) uses a trick called peeling to > make everything work. Basically at each level, if the length > of a polynomial is odd, one of the coefficients is ignored > and the karatsuba recursion is carried out on what is left > (which is of even length), then at the end a fixup is done > to handle the product of the ignored coefficient with the > other polynomial. > > > > The problem is obvious. 15 is odd, so 1 is subtracted > to make 14 which is 2*7. At the next level, 7 is odd, so 1 > is subtracted to make 6 which is 2*3. At the next level 3 is > odd, so 1 is subtracted to make 2 which is 2*1. > > > > So at every level, peeling occurs, i.e. 15 is a worst > case for this algorithm. > > > > One way to fix this issue is to code up ordinary > karatsuba for balanced operands. In that case the 15 x 15 > polynomial multiplication gets split into two 8 x 8 > multiplications and one 7 x 7. The 7 x 7 gets split into two > 4 x 4 multiplications and one 3 x 3. The 3 x 3 gets split > into two 2 x 2 and one 1 x 1 multiplication. > > > > With the odd/even method one has: > > > > 1 x 1 -> 1 mul > > 2 x 2 -> 3 mul > > 3 x 3 -> 8 mul > > 6 x 6 -> 24 mul > > 7 x 7 -> 37 mul > > 14 x 14 -> 111 mul > > 15 x 15 -> 140 mul > > > > With the ordinary method one has: > > > > 1 x 1 -> 1 mul > > 2 x 2 -> 3 mul > > 3 x 3 -> 7 mul > > 4 x 4 -> 9 mul > > 7 x 7 -> 25 mul > > 8 x 8 -> 27 mul > > 15 x 15 -> 79 mul > > > > nearly twice the speed for large coefficients. > > > > Bill. > > > > > > > > --- On Wed, 2/18/09, Bill Hart > <goo...@go...> wrote: > > > >> From: Bill Hart > <goo...@go...> > >> Subject: [flint-devel] FLINT 2.0 Progress > >> To: "Development list for FLINT" > <fli...@li...> > >> Date: Wednesday, February 18, 2009, 4:18 PM > >> The new F_mpz_poly module for FLINT 2.0 is > progressing well. > >> I've now > >> got the byte packing code working and coded up the > >> (non-truncating) > >> polynomial multiplication functions. Here is the > result of > >> a profile > >> comparing with Magma: > >> > >> > http://sage.math.washington.edu/home/wbhart/poly-KS.png > >> > >> The dark red on the left is probably an indication > that the > >> processor > >> clock speed is not exactly 2.8GHz on this machine > (or the > >> real time > >> clock is slightly off), and is not important. The > times > >> should be > >> identical with Magma until reaching the FFT > region, as both > >> FLINT and > >> Magma must use GMP and degree 0 polynomials are > just > >> integers. > >> > >> The red dots in the middle are probably real, with > the dots > >> at the > >> bottom of the middle probably tuning issues. The > ones on > >> the right of > >> the middle I am not sure about. They may just be > timing > >> irregularities > >> (which this machine has many of). > >> > >> The red vertical line near the left is probably > just a > >> tuning issue. > >> It's doubtlessly at the crossover between > karatsuba and > >> Schoenhage-Strassen, which I must have wrong. > I'm > >> surprised it is so > >> red though. > >> > >> There are also some red dots near the right of the > diagram. > >> Those are > >> probably significant. They lie in a region where > Kronecker > >> segmentation is used after byte packing the > coefficients of > >> the > >> polynomials into two large integers. > >> > >> Note that below those dots is lots of strong blue. > That is > >> the bit > >> packing region, and the problems we used to have > there are > >> completely > >> gone. They were due to cache issue related to the > fact that > >> Magma was > >> storing a coefficient in 32 bits, not 2 limbs as > the FLINT > >> 1.0.x and > >> 1.1.x series did. FLINT 2.0 uses just a single > limb per > >> coefficient, > >> and the nice blue there vindicates this change. > >> > >> But the red dots in the byte packing region are > almost > >> certainly due > >> to the fact that once one gets out of the region > where the > >> output > >> coefficients fit into a limb, one must switch to > >> mpz_t's (in the new > >> F_mpz_t format). That switching costs some cycles > for extra > >> comparisons and branches, and so for small > coefficients in > >> the byte > >> packing region, FLINT 2.0 is slower than earlier > versions > >> of FLINT. > >> I've worked hard on getting overheads down. At > one > >> point I tried an > >> optimisation which should have made a difference > (removing > >> some > >> unnecessary branches from the bit packing code). > However > >> this didn't > >> improve things at all. So I think it's > difficult to see > >> how I could > >> expect to remove those dots. They may go away by > fiddling > >> with the > >> tuning of the FFT itself. > >> > >> At any rate, I'm really happy with how things > have > >> gone. The next step > >> will be to work on the truncating multiplication > routines, > >> which > >> should be very simple in comparison with the > functions I > >> just coded. > >> Then I can move on to the division and GCD > functions, etc. > >> > >> Although I coded up David Harvey's KS2 > algorithm for > >> the bitpacked > >> region, I didn't include it in this profile > yet. You > >> wouldn't notice > >> its effect probably, as things are already so blue > at the > >> bottom of > >> the graph. But when I get around to it, I'll > put up > >> another graph with > >> that included, and some of these other tuning > issues > >> resolved. > >> > >> Bill. > >> > >> > ------------------------------------------------------------------------------ > >> Open Source Business Conference (OSBC), March > 24-25, 2009, > >> San Francisco, CA > >> -OSBC tackles the biggest issue in open source: > Open > >> Sourcing the Enterprise > >> -Strategies to boost innovation and cut costs with > open > >> source participation > >> -Receive a $600 discount off the registration fee > with the > >> source code: SFAD > >> http://p.sf.net/sfu/XcvMzF8H > >> _______________________________________________ > >> flint-devel mailing list > >> fli...@li... > >> > https://lists.sourceforge.net/lists/listinfo/flint-devel > > > > > > > > > > > ------------------------------------------------------------------------------ > > Open Source Business Conference (OSBC), March 24-25, > 2009, San Francisco, CA > > -OSBC tackles the biggest issue in open source: Open > Sourcing the Enterprise > > -Strategies to boost innovation and cut costs with > open source participation > > -Receive a $600 discount off the registration fee with > the source code: SFAD > > http://p.sf.net/sfu/XcvMzF8H > > _______________________________________________ > > flint-devel mailing list > > fli...@li... > > > https://lists.sourceforge.net/lists/listinfo/flint-devel > > > > ------------------------------------------------------------------------------ > Open Source Business Conference (OSBC), March 24-25, 2009, > San Francisco, CA > -OSBC tackles the biggest issue in open source: Open > Sourcing the Enterprise > -Strategies to boost innovation and cut costs with open > source participation > -Receive a $600 discount off the registration fee with the > source code: SFAD > http://p.sf.net/sfu/XcvMzF8H > _______________________________________________ > flint-devel mailing list > fli...@li... > https://lists.sourceforge.net/lists/listinfo/flint-devel |