From: David H. <dmh...@ma...> - 2007-04-18 02:38:13
On Apr 17, 2007, at 10:10 PM, William Hart wrote:

> I just had a look at ssfft_fft_iterative() and I am
> wondering if you know which parts of it are taking
> longer than the old version. Have you profiled the
> individual parts to see which is taking all the time?

No, unfortunately the question doesn't really make sense, since the
strategy is completely different. I can compare the whole thing, but
there aren't any parts to match up and compare separately.

I'm halfway through another profile with all the compiler switches you
suggested. Initial results look quite promising. We'll know more
tomorrow.

> The innermost double for loops worry me. The compiler
> cannot do much about unrolling these loops, since it
> [...]

These all sound like good ideas. I will add a note to myself in ssfft.c
to come back to this email and think about them later on. For now I
want to concentrate on getting some of the other functions written. The
most interesting new idea (the bitshift factoring trick) will hopefully
be deployed soon, and I'm feeling lucky :-)

I have only one comment for the moment:

> Definitely you should replace things like:
>
>    &x[start+half+i], &x[start+i]
>
> with statements:
>
>    mp_limb_t ** x_start = x + start;
>    mp_limb_t ** x_start_half = x + start + half;
>
> outside the innermost for loop, and:
>
>    x_start_half+i, x_start+i
>
> inside the for loop. This could be a holdup for a G5
> machine. Probably the compiler sees through your &x[i]
> notation, which should just be x+i, but loading start
> on every iteration probably incurs some cost.

In my heart of hearts I agree with this advice. It makes 100% perfect
sense. HOWEVER.... I tried this kind of thing many times on sage.math,
with various kinds of code (small prime FFT, matrix transpose, etc.),
and every single time, the compiler laughed in my face and the code
got slower. I never understood why. I still don't. So I have to admit
I'm wary.

david
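
For concreteness, here is a minimal sketch of the transformation being
suggested, assuming a simple inner loop shape. Only the x/start/half
names come from the quoted code; the loop bounds and the butterfly()
helper are hypothetical stand-ins for the real inner operation in
ssfft.c:

    #include <gmp.h>

    /* Hypothetical stand-in for the real pointwise operation in
       ssfft.c; it exists only so the sketch compiles. */
    static void butterfly(mp_limb_t ** a, mp_limb_t ** b)
    {
        (void) a; (void) b;
    }

    /* Before: the address arithmetic on `start' and `half' is redone
       on every iteration of the inner loop. */
    void inner_loop_before(mp_limb_t ** x, long start, long half)
    {
        long i;
        for (i = 0; i < half; i++)
            butterfly(&x[start + half + i], &x[start + i]);
    }

    /* After: the loop-invariant base pointers are hoisted out of the
       loop, so each iteration only adds `i'. */
    void inner_loop_after(mp_limb_t ** x, long start, long half)
    {
        mp_limb_t ** x_start = x + start;
        mp_limb_t ** x_start_half = x + start + half;
        long i;

        for (i = 0; i < half; i++)
            butterfly(x_start_half + i, x_start + i);
    }

In principle a compiler can do this hoisting by itself, which may be
why the manual version has never paid off on sage.math.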
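
The unrolling suggestion is cut off in the quote above, so the
following is only a generic sketch of manual unrolling for such an
inner loop, reusing the hoisted pointers and the hypothetical
butterfly() from the previous sketch; the four-way factor is arbitrary:

    /* Generic four-way manual unroll of the same inner loop: four
       butterflies per pass cut the loop overhead, and a scalar tail
       handles the remainder when `half' is not a multiple of four. */
    void inner_loop_unrolled(mp_limb_t ** x_start,
                             mp_limb_t ** x_start_half, long half)
    {
        long i = 0;

        for (; i + 4 <= half; i += 4)
        {
            butterfly(x_start_half + i,     x_start + i);
            butterfly(x_start_half + i + 1, x_start + i + 1);
            butterfly(x_start_half + i + 2, x_start + i + 2);
            butterfly(x_start_half + i + 3, x_start + i + 3);
        }

        for (; i < half; i++)
            butterfly(x_start_half + i, x_start + i);
    }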