|
From: Bill H. <goo...@go...> - 2009-02-18 05:18:49
|
The new F_mpz_poly module for FLINT 2.0 is progressing well. I've now got the byte packing code working and coded up the (non-truncating) polynomial multiplication functions. Here is the result of a profile comparing with Magma: http://sage.math.washington.edu/home/wbhart/poly-KS.png The dark red on the left is probably an indication that the processor clock speed is not exactly 2.8GHz on this machine (or the real time clock is slightly off), and is not important. The times should be identical with Magma until reaching the FFT region, as both FLINT and Magma must use GMP and degree 0 polynomials are just integers. The red dots in the middle are probably real, with the dots at the bottom of the middle probably tuning issues. The ones on the right of the middle I am not sure about. They may just be timing irregularities (which this machine has many of). The red vertical line near the left is probably just a tuning issue. It's doubtlessly at the crossover between karatsuba and Schoenhage-Strassen, which I must have wrong. I'm surprised it is so red though. There are also some red dots near the right of the diagram. Those are probably significant. They lie in a region where Kronecker segmentation is used after byte packing the coefficients of the polynomials into two large integers. Note that below those dots is lots of strong blue. That is the bit packing region, and the problems we used to have there are completely gone. They were due to cache issue related to the fact that Magma was storing a coefficient in 32 bits, not 2 limbs as the FLINT 1.0.x and 1.1.x series did. FLINT 2.0 uses just a single limb per coefficient, and the nice blue there vindicates this change. But the red dots in the byte packing region are almost certainly due to the fact that once one gets out of the region where the output coefficients fit into a limb, one must switch to mpz_t's (in the new F_mpz_t format). That switching costs some cycles for extra comparisons and branches, and so for small coefficients in the byte packing region, FLINT 2.0 is slower than earlier versions of FLINT. I've worked hard on getting overheads down. At one point I tried an optimisation which should have made a difference (removing some unnecessary branches from the bit packing code). However this didn't improve things at all. So I think it's difficult to see how I could expect to remove those dots. They may go away by fiddling with the tuning of the FFT itself. At any rate, I'm really happy with how things have gone. The next step will be to work on the truncating multiplication routines, which should be very simple in comparison with the functions I just coded. Then I can move on to the division and GCD functions, etc. Although I coded up David Harvey's KS2 algorithm for the bitpacked region, I didn't include it in this profile yet. You wouldn't notice its effect probably, as things are already so blue at the bottom of the graph. But when I get around to it, I'll put up another graph with that included, and some of these other tuning issues resolved. Bill. |