From: David H. <dmh...@ma...> - 2007-04-18 02:38:13
On Apr 17, 2007, at 10:10 PM, William Hart wrote:

> I just had a look at ssfft_fft_iterative() and I am
> wondering if you know which parts of it are taking
> longer than the old version. Have you profiled the
> individual parts to see which is taking all the time?

No, unfortunately the question doesn't really make sense, since the
strategy is completely different. I can compare the whole thing, but
there aren't any parts to match up and compare separately.

I'm halfway through another profile with all the compiler switches you
suggested. Initial results look quite promising. We'll know more
tomorrow.

> The innermost double for loops worry me. The compiler
> cannot do much about unrolling these loops, since it
> [...]

These all sound like good ideas. I will add a note to myself in ssfft.c
to come back to this email and think about them later on. For now I
want to concentrate on getting some of the other functions written. The
most interesting new idea (the bitshift factoring trick) will hopefully
be deployed soon, and I'm feeling lucky :-)

I have only one comment for the moment:

> Definitely you should replace things like:
>
>    &x[start+half+i], &x[start+i]
>
> with statements:
>
>    mp_limb_t ** x_start = x + start;
>    mp_limb_t ** x_start_half = x + start + half;
>
> outside the innermost for loop, and:
>
>    x_start_half+i, x_start+i
>
> inside the for loop. This could be a holdup for a G5
> machine. Probably the compiler sees through your &x[i]
> notation, which should just be x+i, but loading start
> on every iteration probably incurs some cost.

In my heart of hearts I agree with this advice. It makes 100% perfect
sense. HOWEVER.... I tried this kind of thing many times on sage.math,
with various kinds of code (small prime FFT, matrix transpose, etc.),
and every single time, the compiler laughed in my face and the code
got slower. I never understood why. I still don't. So I have to admit
I'm wary.

david
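
For concreteness, here is a minimal sketch of the transformation being
suggested, assuming a simple inner loop shape. Only the x/start/half
names come from the quoted code; the loop bounds and the butterfly()
helper are hypothetical stand-ins for the real inner operation in
ssfft.c:

    #include <gmp.h>

    /* Hypothetical stand-in for the real pointwise operation in
       ssfft.c; it exists only so the sketch compiles. */
    static void butterfly(mp_limb_t ** a, mp_limb_t ** b)
    {
        (void) a; (void) b;
    }

    /* Before: the address arithmetic on `start' and `half' is redone
       on every iteration of the inner loop. */
    void inner_loop_before(mp_limb_t ** x, long start, long half)
    {
        long i;
        for (i = 0; i < half; i++)
            butterfly(&x[start + half + i], &x[start + i]);
    }

    /* After: the loop-invariant base pointers are hoisted out of the
       loop, so each iteration only adds `i'. */
    void inner_loop_after(mp_limb_t ** x, long start, long half)
    {
        mp_limb_t ** x_start = x + start;
        mp_limb_t ** x_start_half = x + start + half;
        long i;

        for (i = 0; i < half; i++)
            butterfly(x_start_half + i, x_start + i);
    }

In principle a compiler can do this hoisting by itself, which may be
why the manual version has never paid off on sage.math.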
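
The unrolling suggestion is cut off in the quote above, so the
following is only a generic sketch of manual unrolling for such an
inner loop, reusing the hoisted pointers and the hypothetical
butterfly() from the previous sketch; the four-way factor is arbitrary:

    /* Generic four-way manual unroll of the same inner loop: four
       butterflies per pass cut the loop overhead, and a scalar tail
       handles the remainder when `half' is not a multiple of four. */
    void inner_loop_unrolled(mp_limb_t ** x_start,
                             mp_limb_t ** x_start_half, long half)
    {
        long i = 0;

        for (; i + 4 <= half; i += 4)
        {
            butterfly(x_start_half + i,     x_start + i);
            butterfly(x_start_half + i + 1, x_start + i + 1);
            butterfly(x_start_half + i + 2, x_start + i + 2);
            butterfly(x_start_half + i + 3, x_start + i + 3);
        }

        for (; i < half; i++)
            butterfly(x_start_half + i, x_start + i);
    }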