|
From: Bill H. <goo...@go...> - 2009-06-29 15:32:22
|
I found another trick to save some time at the top level of the half gcd: the matrix does not need to be computed (this doesn't apply to xgcd, only to gcd). Here is the new graph: http://sage.math.washington.edu/home/wbhart/zpgcd7.png I also coded up asymptotically fast half gcd style xgcd. Despite having done this already for MPIR, it took me ages to figure out because of subtle differences in the half gcd algorithm used in FLINT. Now I need to think about normalisation. I think the only normalisation I am going to do is to make the gcd monic. This helps when using it for multimodular stuff in the Z[x] module. I'll probably also do the same for the xgcd code Bill. 2009/6/28 Bill Hart <goo...@go...>: > I think the remaining red dots on the left were simply due to poor > compilation. I wound back the optimisation to -O2 and statically > linked the libraries and there was a noticeable speedup. > > Here is the new graph: > > http://sage.math.washington.edu/home/wbhart/zpgcd6.png > > The red dots on the second line up on the right are probably due to > the modulus being 2 at random. Magma has exceptionally fast GF2 code, > and for polynomials that large, only a single timing may be done. If > the modulus happens to be 2 for Magma and 3 for FLINT at that point, > then magma will win hands down. > > One improvement I could make is to improve cache performance by doing > some of the half gcd iteration in-place. But I think this would make a > very small difference so I don't know if I can be bothered with it or > not. > > Bill. > > 2009/6/27 William Hart <ha...@ya...>: >> >> I found a way of speeding up the half gcd. >> >> Basically when one performs the half gcd step, one normally produces a matrix which keeps track of the steps undertaken so far when dealing with the top half of the coefficients. Later one then applies this matrix to the _full_ original polynomials. >> >> However, one can save part of this polynomial by matrix computation by making use of some of the information computed along the way when computing the matrix in the first place. >> >> This trick probably saves about a quarter of the time. >> >> I also managed to figure out what the red dots were on the right hand side of the graph. These were due to magma having an unfair advantage in the timing code. Basically I was choosing a single modulus for each point on the graph in magma. However for FLINT I was changing the modulus every so often. What was happening is that occasionally Magma would get a really small modulus at random and FLINT would have times averaged over a whole pile of moduli. Fixing this made the red dots go away mostly. >> >> Here is the new graph: >> >> http://sage.math.washington.edu/home/wbhart/zpgcd5.png >> >> There are still some red dots on the left of the diagram. I've no idea what these are. I checked the Magma timings and indeed Magma has exceptionally fast times in that region, the times going up immediately above and below those points on the graph!! I did check that Magma seems to return the correct results at these points, so it is not some kind of bug I don't believe. >> >> There is also a slight darkening of the blue as the diagram goes across to the right. But this is to be expected as most of the time complexity is in the multiplication code, which exhibits a similar darkening to the right. >> >> It is still the case that Magma wins hands down for modulus = 2 and quite often for modulus = 3. >> >> Bill. >> >> --- On Thu, 6/25/09, Bill Hart <goo...@go...> wrote: >> >>> From: Bill Hart <goo...@go...> >>> Subject: [flint-devel] Polynomial GCD over Z/pZ >>> To: "Development list for FLINT" <fli...@li...> >>> Date: Thursday, June 25, 2009, 10:25 AM >>> This week I've been working on the >>> speed of polynomial GCD over Z/pZ. >>> After quite a bit of work, here is a before an after >>> comparison (each >>> graph compares FLINT with Magma): >>> >>> Before: >>> >>> http://sage.math.washington.edu/home/wbhart/zpgcd.png >>> >>> After: >>> >>> http://sage.math.washington.edu/home/wbhart/zpgcd3.png >>> >>> Analysis: >>> ------------- >>> >>> * Timings were done on an AMD K8 Opteron with a very recent >>> Magma >>> >>> * Note the problem is GCD(a*b, a*c) where a, b, and c all >>> have a >>> modulus with the given number of bits and the given length >>> (axes are >>> logarithmic) >>> >>> * There are only two algorithms used: Euclidean algorithm >>> and half-gcd >>> algorithm. For moduli over 8 bits the cutoff is length 168, >>> and for 8 >>> or fewer bits the cutoff is 246, however, note that these >>> cutoffs >>> occur on the graph at half those lengths due to the fact >>> that we are >>> taking GCD(a*b, a*c) where a, b and c all have the given >>> length, i.e. >>> a*b and a*c have about twice the given length. >>> >>> * The main improvements in the last week were: >>> >>> -special code to deal with the GF2 case >>> >>> - special code to deal with division of >>> equal length polynomials >>> and polynomials whose lengths differ by 1 (an idea given to >>> me by >>> Bernard Parisse). I have implemented both rem and divrem >>> functions in >>> both cases and use them in appropriate places in both the >>> euclidean >>> and half-gcd cases. >>> >>> * The red line along the bottom is almost certainly because >>> Magma has >>> special code for GF2 (and apparently GF3). The bottom line >>> is modulus >>> = nextprime(random(1 bit)) = 2 always, the next line up is >>> modulus = >>> nextprime(random(2 bits)) = 2 or 3. I would have to merge >>> GF2X to beat >>> Magma here (I already have special code for the GF2 case, >>> but the >>> format in which the polynomials is stored is simply 1 limb >>> per >>> coefficient). >>> >>> * I have no idea why there is so much red on the right side >>> of the >>> diagram. This is in the half-gcd region. The asymptotics of >>> the >>> half-gcd are O(n log n^2 log log n). The only other things >>> used are: >>> >>> - multiplication : see >>> http://sage.math.washington.edu/home/wbhart/zpmul.png >>> (no real >>> problems there, courtesy of zn_poly) >>> >>> - division : see >>> http://sage.math.washington.edu/home/wbhart/zpdiv.png >>> (definitely no >>> problems there, we also use some zn_poly code in the >>> division code) >>> >>> - strassen multiplication, but only for 2x2 matrices >>> (I use it) >>> >>> - the half-gcd algorithm itself, which doesn't yield >>> too much >>> opportunity for optimisation that I know of >>> >>> So it is mystifying how Magma can be winning here. Any >>> ideas? >>> >>> * I have no idea why there is a brown blob at about >>> log_length = 6. >>> Overheads? I doesn't appear to be a tuning issue. However I >>> don't know >>> of any other algorithms for this problem. >>> >>> * The dark blue patch to the left of the diagram is just >>> due to >>> overheads. I spent a couple of days just working on that >>> patch. >>> >>> * The red dots at about log_length = 11 are just timing >>> irregularities, they do not occur on every graph. >>> >>> * Above 30 bits we always win, as Magma uses multiprecision >>> code in >>> this region. FLINT goes all the way out to moduli of 63 >>> bits on 64 bit >>> machines. >>> >>> Bill. >>> >>> ------------------------------------------------------------------------------ >>> _______________________________________________ >>> flint-devel mailing list >>> fli...@li... >>> https://lists.sourceforge.net/lists/listinfo/flint-devel >>> >> >> >> >> >> ------------------------------------------------------------------------------ >> _______________________________________________ >> flint-devel mailing list >> fli...@li... >> https://lists.sourceforge.net/lists/listinfo/flint-devel >> > |