|
From: Development l. f. F. <fas...@li...> - 2007-04-17 11:31:04
|
I've been running some profiles of some of the new development ssfft code (in the ssfft3 branch) against the trunk version. The results are a little weird.

The target function is ssfft_fft_iterative(), which is designed to handle L1-sized transforms using a plain iterative FFT. In particular, it's intended for use with short coefficients, where bitshift factoring is inappropriate, and with transforms small enough that FFT factoring is not necessary. I've been running it for transform lengths M = 16, 32, 64, 128, 256, 512, 1024, with a range of truncation parameters, and coefficient lengths n = 1, 2, 3, 4, 6, 8 limbs.

I've done profiles on sage (= sage.math), martinj (= jason martin's machine), bsd (= william stein's xeon), and my G5.

======== sage, martinj, bsd ========

For lengths >= 256, it is unconditionally faster than the old code on the above platforms (modulo a few random data points). The speedups are typically:

sage: 5-15%
bsd: 15-25%
martinj: 15-25%

Some combinations get speedups of up to 40%. So this is great.

For lengths 64 and 128, there are a few problem areas, particularly for n = 3, although mostly it's still ahead of the old code.

Length 32 and below is really a mixed bag. I find this surprising; this code should work particularly well on small problems. On the above platforms, apart from a few outliers, the new code was never worse than 10% slower than the old code.

======== G5 ========

On my powerpc G5 machine, things looked BAD. The new code is typically 15-20% SLOWER than the old code. It's sometimes as much as 10% faster, but usually it's slower.

I have absolutely no idea why this is happening.

david
|
From: Development l. f. F. <fas...@li...> - 2007-04-17 17:31:48
|
I've now also run the profiles on my old G3 laptop. It really screams: typically 30-40% faster than the old code, and there are a number of regions where it's consistently 60-100% faster (i.e. 1.6-2.0x faster). There don't appear to be any regions at all where it's consistently slower.

david

On Apr 16, 2007, at 11:41 PM, David Harvey wrote:

> I've been running some profiles of some of the new development ssfft
> code (in the ssfft3 branch) against the trunk version.
>
> It's generally looking pretty good.
> [...]
> I still have some investigation to do to figure out what's going on in
> the slower regions. But generally I'm pretty happy so far.
>
> david
|
From: Development l. f. F. <fas...@li...> - 2007-04-17 11:52:47
|
Have you got the L1 cache size set correctly for the G5? Isn't it 32000?

Bill.

--- Development list for FLINT <fas...@li...> wrote:

> I've been running some profiles of some of the new development ssfft
> code (in the ssfft3 branch) against the trunk version.
> [...]
> On my powerpc G5 machine, things looked BAD. The new code is
> typically 15-20% SLOWER than the old code.
> [...]
|
From: Development l. f. F. <fas...@li...> - 2007-04-17 12:00:44
|
On Apr 17, 2007, at 7:52 AM, Development list for FLINT wrote:

> Have you got the L1 cache size set correctly for the
> G5? Isn't it 32000?

That's not the issue. The G5 speed problems are across the board, including cases that easily fit into L1.

I just ran a subset of test cases again on the G5 just as a sanity check. Got similar results.

Interestingly, on the G5, the case n = 3 is actually pretty good. On the other platforms, n = 3 was generally quite poor.

David
|
From: Development l. f. F. <fas...@li...> - 2007-04-17 17:26:26
|
Are you compiling with all the G5 compiler options:

-mcpu=970 -mtune=970 -mpowerpc64

Also, what version of gcc do you have on your G5?

Apparently at the Apple developer site you can download a version specially tuned for the G5. Dunno if it is better than just the latest gcc from the web though. Never used a Mac.

Bill.

--- Development list for FLINT <fas...@li...> wrote:
> [...]
|
From: Development l. f. F. <fas...@li...> - 2007-04-17 17:33:11
|
Also try:

-falign-loops=16 -falign-functions=16 -falign-labels=16 -falign-jumps=16

which are apparently all useful on the G5.

Bill.

--- Development list for FLINT <fas...@li...> wrote:
> [...]
|
From: Development l. f. F. <fas...@li...> - 2007-04-17 17:35:16
|
On Apr 17, 2007, at 1:26 PM, Development list for FLINT wrote:

> Are you compiling with all the G5 compiler options:
>
> -mcpu=970 -mtune=970 -mpowerpc64

No. I've been using

-m64 -funroll-loops -fexpensive-optimizations -O3

I will try your suggestion tonight at home.

> Also, what version of gcc do you have on your G5?

Can't recall; I'll find out tonight.

> Apparently at the Apple developer site you can download a version
> specially tuned for the G5. Dunno if it is better than just the latest
> gcc from the web though. Never used a Mac.

I believe I'm using the gcc that came installed, so it should already be the Apple version.

Keep in mind, I'm using the same compiler settings for profiling the old ssfft code too.

david
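(For concreteness, the combined invocation under discussion would look something like the following; -c ssfft.c is just an illustrative compilation target, not the actual profiling build command.)

gcc -O3 -funroll-loops -fexpensive-optimizations -m64 \
    -mcpu=970 -mtune=970 -mpowerpc64 \
    -c ssfft.c -o ssfft.o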
|
From: Development l. f. F. <fas...@li...> - 2007-04-17 17:47:53
|
--- Development list for FLINT <fas...@li...> wrote:
>
> On Apr 17, 2007, at 1:26 PM, Development list for FLINT wrote:
>
> > Are you compiling with all the G5 compiler options:
> >
> > -mcpu=970 -mtune=970 -mpowerpc64
>
> No. I've been using
>
> -m64 -funroll-loops -fexpensive-optimizations -O3

You should still use -funroll-loops and -O3.

> I will try your suggestion tonight at home.
>
> > Also, what version of gcc do you have on your G5?
>
> Can't recall; I'll find out tonight.
>
> > Apparently at the Apple developer site you can download a version
> > specially tuned for the G5. Dunno if it is better than just the latest
> > gcc from the web though. Never used a Mac.
>
> I believe I'm using the gcc that came installed, so it should already
> be the Apple version.

Apparently it is not. According to the Apple developer website, the one that comes with it is not the specially tuned one.

> Keep in mind, I'm using the same compiler settings
> for profiling the old ssfft code too.

Apparently the G5 also likes you to access data as soon as possible before it is "used" (whatever that means). It likes data to be accessed sequentially, in order, and hoisting accesses out of a loop rather than repeating them inside it will speed things up. I think what this means is: load the data from memory into a variable once, then have the loop act on that variable, rather than reloading it from memory on every iteration.

You should not use type conversions unless you absolutely need to, nor global variables (though I am sure you don't have any of those). Also, the G5 is particularly susceptible to slowdowns from branch mispredictions. It is much better to do:
   do B;
   if (cond)
   {
      undo B;
      do A;
   }

than to do:

   if (cond)
   {
      do A;
   }
   else
   {
      do B;
   }

if B should be done most of the time.
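For concreteness, a compilable sketch of that pattern (the operation and names here are invented, not taken from ssfft):

/* Hypothetical example only: "B" = add the increment (the common case),
   "A" = reset to zero (the rare case).  Doing B unconditionally and
   undoing it inside the rarely taken branch keeps a hard-to-predict
   branch off the common path. */
static inline long accumulate(long acc, long inc, int reset)
{
   acc += inc;        /* do B unconditionally */
   if (reset)         /* rarely true */
   {
      acc -= inc;     /* undo B */
      acc = 0;        /* do A */
   }
   return acc;
}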
Apart from these things, I can't see any sensible guidelines for developing on the G5.

The Apple website says that Apple has written special code for doing FFTs on the G5, because many developers have hand-written code for it and been sorely disappointed to find it running way slower than expected. According to the documentation, I think there are quite a lot of assembly-level optimizations for the G5 that gcc doesn't employ.
Bill.
|
|
From: Development l. f. F. <fas...@li...> - 2007-04-17 17:57:08
|
I just had a look at your code and one big performance
hit is the lack of static inline functions.
Every function which does not need to be accessed
outside the module should be static. Probably the
compiler does this automatically where possible, but
in your case your test code probably wants to access
it, making static impossible. The difference is the
overhead in function calls. Functions which can be
made static have less overhead.
The way to get around this is to introduce wrapper functions for your static functions. The wrapper functions can be called by your test code, and they in turn call your static functions. Obviously, other functions in the same module would not call the wrapper functions, but would call the static functions directly, since they are in the same file.

Everything that is short (say, less than 10 lines) should be made static inline. This is apparently particularly important on the G5.
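For example, a minimal sketch of the wrapper idea (the function names are made up for illustration, not actual ssfft functions):

#include <gmp.h>

/* Inside the module (hypothetical names): a short operation that gets
   inlined at every internal call site. */
static inline void coeff_op_core(mp_limb_t * x, unsigned long n)
{
   x[0] += n;   /* stand-in for a short coefficient operation */
}

/* Wrapper with external linkage, so the test code can still call it;
   other functions in this file call coeff_op_core() directly. */
void coeff_op(mp_limb_t * x, unsigned long n)
{
   coeff_op_core(x, n);
}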
Bill.
--- Development list for FLINT <fas...@li...> wrote:
> [...]
|
|
From: David H. <dmh...@ma...> - 2007-04-17 18:19:51
|
On Apr 17, 2007, at 1:57 PM, Development list for FLINT wrote:

> I just had a look at your code and one big performance
> hit is the lack of static inline functions.

I have some "inline" functions which are not "static inline". What is the difference between "static inline" and just "inline"? I thought it was only a linkage issue. What does plain "inline" do?

Of the remaining functions, I think the only candidates for inlining which are not already marked inline are the ones starting with "coeff". Some of these look a bit long to be inlined (code bloat), but I agree I should try inlining some of the shorter ones.

But this doesn't answer the basic question about the slowness of the code on the G5. Perhaps I need to explain what's going on with the new code, so you can see why I am perplexed.

The ssfft code has basically three layers. The bottom layer is the functions starting with "basic". These are very low-level coefficient operations on raw blocks of memory, like rotations, and bitshifts with carry handling. The middle layer is the functions starting with "coeff". These are allowed to do things like swap buffers; they make decisions about how to decompose large rotations into bitshifts and limbshifts, etc. Finally, the top layer consists of functions that call the coefficient operations in some appropriate order to carry out FFTs.

Now the bottom and middle layers have NOT changed between the trunk version and my new version. I am only fiddling with the top layer for this new code. (There is one minor change I want to make to some middle-layer code at some point, but I haven't got to that yet.) In particular, the old code has just as much inlining going on as the new code.

In fact I would argue the new code is *better* inlined, for the following reason. The old code used a table lookup to decide which of the 16 variants of the radix-4 transform to call on each block, so it was using function pointers all over the place. Surely function pointers are the arch-nemesis of inlining. The new code being profiled is basically all in one function: ssfft_fft_iterative(). It doesn't call any other FFT functions; it calls directly into the middle and bottom layers to do everything. There should be much less function call overhead than before.

david
|
From: William H. <ha...@ya...> - 2007-04-18 02:10:13
|
I just had a look at ssfft_fft_iterative() and I am wondering if you know which parts of it are taking longer than the old version. Have you profiled the individual parts to see which is taking all the time?

The innermost double for loops worry me. The compiler cannot do much about unrolling these loops, since it has no idea how long any of them are going to be. If some of the loops are definitely multiples of 4 or something, it would be best to unroll them by hand by a factor of 4. Also, I am wondering if the double for loops might work better made into a single for loop, or even better, a do..while loop if it is known that at least one iteration must execute.

The 2*half expressions that appear throughout also worry me. It would be better to make half twice as big and do a comparison (2*z <= half) so that the multiplication occurs outside the loops.

Probably there are almost no cycles spent in the outer layers anyway, and all the cycles are taken inside those functions called inside the for loops. But they *have* to be made inline. An inlined function incurs no function call overhead. I definitely wouldn't worry about code bloat here. You need the speed, not a smaller program. You are going to be going nowhere near the code cache size, and slightly larger functions aren't going to take up all the memory on the machine.

Definitely you should replace things like:

   &x[start+half+i], &x[start+i]

with statements:

   mp_limb_t ** x_start = x + start;
   mp_limb_t ** x_start_half = x + start + half;

outside the innermost for loop, and use:

   x_start_half + i, x_start + i

inside the for loop. This could be a holdup for a G5 machine. Probably the compiler sees through your &x[i] notation, which should just be x+i, but loading start every iteration probably incurs some time.

The only other thing I can suggest, if the new version is still way slower than the old one on the G5, is to count the number of times each of the other functions is called from ssfft_fft_iterative() and make sure the proportion hasn't changed in a subtle way.

Hopefully I didn't just analyse an out-of-date version of the code. I can't recall when I last updated the file.

Bill.
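(A schematic rendering of the hoisting suggested above; loop_sketch and butterfly_op are made-up names, not ssfft functions.)

#include <gmp.h>

/* Schematic only: butterfly_op stands in for whatever coefficient
   operation the real inner loop calls.  The address arithmetic
   x + start and x + start + half is done once, outside the loop,
   instead of recomputing &x[start+i] and &x[start+half+i] from
   start on every iteration. */
void loop_sketch(mp_limb_t ** x, unsigned long start, unsigned long half,
                 void (*butterfly_op)(mp_limb_t **, mp_limb_t **))
{
   mp_limb_t ** x_start      = x + start;
   mp_limb_t ** x_start_half = x + start + half;
   unsigned long i;

   for (i = 0; i < half; i++)
      butterfly_op(x_start_half + i, x_start + i);
}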
|
From: David H. <dmh...@ma...> - 2007-04-18 02:38:13
|
On Apr 17, 2007, at 10:10 PM, William Hart wrote:

> I just had a look at ssfft_fft_iterative() and I am
> wondering if you know which parts of it are taking
> longer than the old version. Have you profiled the
> individual parts to see which is taking all the time?

No, unfortunately the question doesn't really make sense, since the strategy is completely different. I can compare the whole thing, but there aren't any parts to match up and compare separately.

I'm halfway through another profile with all the compiler switches you suggested. Initial results look quite promising. We'll know more tomorrow.

> The innermost double for loops worry me. The compiler
> cannot do much about unrolling these loops, since it
[...]

These all sound like good ideas. I will add a note to myself in ssfft.c to come back to this email and think about them later on. For now I want to concentrate on getting some of the other functions written. The most interesting new idea (the bitshift factoring trick) will hopefully be deployed soon, and I'm feeling lucky :-)

Only one comment I have for the moment:

> Definitely you should replace things like:
>
>    &x[start+half+i], &x[start+i]
>
> with statements:
>
>    mp_limb_t ** x_start = x + start;
>    mp_limb_t ** x_start_half = x + start + half;
>
> outside the innermost for loop, and use:
>
>    x_start_half + i, x_start + i
>
> inside the for loop. This could be a holdup for a G5
> machine.

In my heart of hearts I agree with this advice. It makes 100% perfect sense. HOWEVER... I tried this kind of thing many times on sage.math, with various kinds of code (small prime FFT, matrix transpose, etc.), and every single time I tried it, the compiler laughed in my face and the code got slow. I never understood why. I still don't. So I have to admit I'm wary.

david
|
From: David H. <dmh...@ma...> - 2007-04-18 02:58:19
|
OK, well the G5 profile just finished and I'm still awake, and I'll simply say that the compiler flags well and truly made the problem go away.

The flags slow down the old code a bit (hmm, a little strange), and speed up the new code a lot. The new code with the flags is significantly faster than the old code with or without the flags, across all regimes being profiled. So in other words, it looks like the new code is highly susceptible to these kinds of optimisations, and the old code doesn't seem to be; actually it seems to suffer a bit.

I'm going to make a note in todo.txt to remind us to put these flags into the makefile for the G5 architecture, whenever we get around to writing a proper build script.

Bill, you are a star. Thanks very much.

david
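(Roughly what that makefile note has in mind; the FLINT_ARCH variable and the switch mechanism are hypothetical, since the build script doesn't exist yet.)

# Hypothetical sketch only: there is no real build script yet, and
# FLINT_ARCH is an invented variable name.
ifeq ($(FLINT_ARCH),G5)
CFLAGS += -mcpu=970 -mtune=970 -mpowerpc64 \
          -falign-loops=16 -falign-functions=16 \
          -falign-labels=16 -falign-jumps=16
endif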
|
From: William H. <ha...@ya...> - 2007-04-18 03:15:43
|
Great news then. Now I'm wondering where else compiler flags might help things.

As for static inline: presumably the static is redundant if the compiler is indeed able to inline it. But if not...

Actually, my comment about these things was probably more about making functions static if they are *not* inlined but are not accessed from outside the module. If the only place they are accessed from outside the module is the test code, then you should still make them static and write wrapper functions as mentioned. The compiler will simply ignore static if a function is accessed from outside the module. Actually, compilers these days probably universally ignore static. So it is the principle more than the actual word static that is important, i.e. make sure the test code is not the only thing outside the module which is accessing the function.

Oh wait, I'm talking rubbish again. You don't need wrapper functions. If you aren't linking against the test code, there won't be anything outside the module accessing the function. Therefore probably what I said about static is irrelevant.

But definitely some of those functions need to be made inline. If they were library functions, you might only inline the ones that are just a few lines long. But for internal functions like this, sometimes half-page functions should be inlined.

I just had a sense of déjà vu. I think we discussed this in detail before and came to the same conclusions.

Bill.

--- David Harvey <dmh...@ma...> wrote:
> [...]
|
From: David H. <dmh...@ma...> - 2007-04-17 18:01:47
|
On Apr 17, 2007, at 1:47 PM, Development list for FLINT wrote:

> Apparently the G5 also likes you to access data as
> soon as possible before it is "used" [...]

Damn, all that stuff you mention sounds really annoying. If this kind of thing is really the problem, I don't think I have the time now to spend reorganising loops for a specific processor, *especially* one that is probably going to disappear in less than a few years. These are mostly the kinds of things a compiler should be able to work out itself, if it knows the processor well enough. The code is running much better on every other chip.

david
|
From: William H. <ha...@ya...> - 2007-04-17 18:04:50
|
Don't worry about it for now then. We eventually need to do something similar to what Victor does, with different versions for processors which are susceptible to certain kinds of problems. But for now, if it is better on most processors, put it into the trunk so we can start using it.

Bill.

--- David Harvey <dmh...@ma...> wrote:
> [...]
|
From: David H. <dmh...@ma...> - 2007-04-17 18:23:22
|
On Apr 17, 2007, at 2:04 PM, William Hart wrote:

> Don't worry about it for now then. We eventually need
> to do something similar to what Victor does, with
> different versions for processors which are susceptible
> to certain kinds of problems. But for now, if it is
> better on most processors, put it into the trunk so we
> can start using it.

I'm not pushing it into the trunk for a while yet, for a couple of reasons:

(1) the calling interface has changed slightly, and is not totally solid yet;
(2) it's only suitable for a specific type of transform, and it will be horrible for larger transforms;
(3) it's only one piece of a larger coherent rewrite I have in mind.

I think it would be better to wait until I have written the other pieces. Otherwise it will get too confusing to maintain it all. Perhaps once I have finished all the forward transform components it will be OK to push into the trunk, and then I can work on the inverse transform separately.

david
|
From: William H. <ha...@ya...> - 2007-04-17 22:46:54
|
I can't believe this is an unsigned type. So when we use it for limbs in Zpoly_mpn_t, the sign limb can't be compared to 0 to see if it is negative. Instead I have to specifically do a comparison with -1L. That's kind of crap, because it means we can't just use any old negative number for a negative sign; it has to be a specific number so that we can do an easy comparison.

Bill.
|
From: David H. <dmh...@ma...> - 2007-04-18 00:43:37
|
On Apr 17, 2007, at 6:46 PM, William Hart wrote:

> I can't believe this is an unsigned type. So when we
> use it for limbs in Zpoly_mpn_t, the sign limb can't be
> compared to 0 to see if it is negative. Instead I have
> to specifically do a comparison with -1L.

Well, there's also mp_limb_signed_t (or perhaps it's mp_signed_limb_t? I can't remember), but that doesn't really help you, since the array has to be of a single type.

Perhaps it's worth having macros COEFF(poly, n), which returns a pointer to the limbs of the nth coefficient of poly, and also SIGN(poly, n), which returns the sign limb of the nth coefficient, cast to a signed type. Not sure if those are the right names for the macros, but something like that would certainly simplify a lot of the Zpoly_mpn code.

david
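(A minimal sketch of such macros, assuming a hypothetical Zpoly_mpn_t layout in which each coefficient occupies limbs + 1 limbs of a flat coeffs array with the sign limb stored first; the field names are guesses for illustration only.)

#include <gmp.h>

/* Assumed (hypothetical) layout: poly->coeffs is a flat mp_limb_t array,
   each coefficient occupying (poly->limbs + 1) limbs, sign limb first.
   Adjust to the real Zpoly_mpn_t layout. */

#define COEFF(poly, n) \
   ((poly)->coeffs + (n) * ((poly)->limbs + 1) + 1)

#define SIGN(poly, n) \
   ((mp_limb_signed_t) (poly)->coeffs[(n) * ((poly)->limbs + 1)])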