From: Development l. f. F. <fas...@li...> - 2007-04-17 17:31:48
I've now also run the profiles on my old G3 laptop. It really screams:
typically 30-40% faster than the old code, and there are a number of
regions where it's consistently 60-100% faster (i.e. 1.6-2.0x faster).
There don't appear to be any regions at all where it's consistently
slower.

david

On Apr 16, 2007, at 11:41 PM, David Harvey wrote:

> I've been running some profiles of some of the new development ssfft
> code (in the ssfft3 branch) against the trunk version.
>
> It's generally looking pretty good. I've done profiles on sage
> (= sage.math), martinj (= Jason Martin's machine), bsd (= William
> Stein's Xeon), and my G5 (haven't got results for that yet). The
> target function is ssfft_fft_iterative(), which is designed to handle
> L1-sized transforms using a plain iterative FFT. In particular, it's
> intended for use with short coefficients, where bitshift factoring is
> inappropriate, and with transforms small enough that FFT factoring is
> not necessary. I've been running it for transform lengths M = 16, 32,
> 64, 128, 256, 512, 1024, with a range of truncation parameters, and
> coefficient lengths n = 1, 2, 3, 4, 6, 8 limbs.
>
> For lengths >= 256, it is unconditionally faster than the old code on
> all platforms (modulo a few random data points). The speedups are
> typically:
>
> sage: 5-15%
> bsd: 15-25%
> martinj: 15-25%
>
> Some combinations get speedups of up to 40%.
>
> For lengths 64 and 128, there are a few problem areas, particularly
> for n = 3 on all platforms, although it's still mostly ahead of the
> old code.
>
> Length 32 and below is really a mixed bag. I find this surprising,
> since this code should work particularly well on small problems.
>
> On all platforms, apart from a few outliers, the new code was never
> worse than 10% slower than the old code.
>
> I still have some investigation to do to figure out what's going on
> in the slower regions. But generally I'm pretty happy so far.
>
> david
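For context on the "plain iterative FFT" mentioned above: for small,
power-of-two transform lengths the usual structure is a bit-reversal
permutation followed by log2(M) passes of in-order butterflies. The
sketch below shows that general structure over complex doubles purely
for illustration; it is not the ssfft code, which operates on n-limb
coefficients modulo 2^N + 1 (where multiplications by roots of unity
reduce to bit shifts), and whose actual interface is not reproduced
here.

/*
 * Minimal sketch of a plain in-place iterative radix-2 FFT, for
 * illustration only.  The real ssfft_fft_iterative() works on
 * multi-limb coefficients modulo 2^N + 1; complex doubles are used
 * here purely to keep the example self-contained.
 */
#include <complex.h>
#include <math.h>
#include <stdio.h>

/* In-place forward FFT of x[0..m-1], where m is a power of two. */
static void fft_iterative(double complex *x, unsigned long m)
{
    /* Bit-reversal permutation so the butterflies can run in order. */
    for (unsigned long i = 1, j = 0; i < m; i++)
    {
        unsigned long bit = m >> 1;
        for (; j & bit; bit >>= 1)
            j ^= bit;
        j ^= bit;
        if (i < j)
        {
            double complex t = x[i];
            x[i] = x[j];
            x[j] = t;
        }
    }

    /* Butterfly passes over blocks of length 2, 4, ..., m. */
    const double pi = acos(-1.0);
    for (unsigned long len = 2; len <= m; len <<= 1)
    {
        double complex wlen = cexp(-2.0 * pi * I / (double) len);
        for (unsigned long i = 0; i < m; i += len)
        {
            double complex w = 1.0;
            for (unsigned long k = 0; k < len / 2; k++)
            {
                double complex u = x[i + k];
                double complex v = w * x[i + k + len / 2];
                x[i + k] = u + v;
                x[i + k + len / 2] = u - v;
                w *= wlen;
            }
        }
    }
}

int main(void)
{
    /* Transform a length-8 impulse; every output should equal 1. */
    double complex x[8] = { 1.0 };
    fft_iterative(x, 8);
    for (int i = 0; i < 8; i++)
        printf("%.2f%+.2fi\n", creal(x[i]), cimag(x[i]));
    return 0;
}

Compile with e.g. "gcc -std=c99 -O2 fft_sketch.c -lm"; the impulse test
in main() should print eight entries equal to 1. In the actual
Schoenhage-Strassen setting the twiddle multiplications in the inner
loop become cheap bit shifts, which is why this iterative form is
attractive for the short-coefficient, L1-sized transforms described in
the message above.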