From: David H. <dmh...@ma...> - 2007-04-17 18:19:51
On Apr 17, 2007, at 1:57 PM, Development list for FLINT wrote:

> I just had a look at your code and one big performance
> hit is the lack of static inline functions.

I have some "inline" functions which are not "static inline". What is the difference between "static inline" and just "inline"? I thought it was only a linkage issue. What does plain "inline" do?

Of the remaining functions, I think the only candidates for inlining, which are not already marked inline, are the ones starting with "coeff". Some of these look a bit long to be inlined (code bloat), but I agree I should try inlining some of the shorter ones.

But this doesn't answer the basic question about the slowness of the code on the G5. Perhaps I need to explain what's going on with the new code, so you can see why I am perplexed.

The ssfft code has basically three layers. The bottom layer is the functions starting with "basic". These are very low level coefficient operations on raw blocks of memory, like rotations, and bitshifts with carry handling. The middle layer is the functions starting with "coeff". These are allowed to do things like swap buffers, and they make decisions about how to decompose large rotations into bitshifts and limbshifts, etc. Finally, the top layer consists of functions that call the coefficient operations in some appropriate order to carry out FFTs.

Now the bottom and middle layers have NOT changed between the trunk version and my new version. I am only fiddling with the top layer for this new code. (There is one minor change I want to make to some middle layer code at some point, but I haven't got to that yet.)

In particular the old code has just as much inlining going on as the new code. In fact I would argue the new code is *better* inlined, for the following reason. The old code used a table lookup to decide which of the 16 variants of the radix-4 transform to call on each block. So it was using function pointers all over the place.
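
For illustration only, here is a minimal sketch of the two dispatch styles being compared. It is not the actual ssfft code: the names (variant_add, variant_shift, run_via_table, run_direct) are hypothetical, and two toy operations stand in for the 16 radix-4 variants. The table version calls through a function pointer, which the compiler usually cannot inline; the direct version names its callees, so they are visible as inlining candidates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for transform variants (not real ssfft code). */
static inline void variant_add(unsigned long *x, size_t n)
{
    for (size_t i = 0; i < n; i++)
        x[i] += 1;
}

static inline void variant_shift(unsigned long *x, size_t n)
{
    for (size_t i = 0; i < n; i++)
        x[i] <<= 1;
}

typedef void (*variant_fn)(unsigned long *, size_t);

/* Table-lookup style: the variant is chosen at run time via a pointer. */
static const variant_fn variant_table[2] = { variant_add, variant_shift };

static void run_via_table(unsigned long *x, size_t n,
                          const int *which, size_t steps)
{
    for (size_t s = 0; s < steps; s++) {
        assert(which[s] >= 0 && which[s] < 2);  /* guard the table index */
        variant_table[which[s]](x, n);          /* indirect call */
    }
}

/* Direct style: one function calling the lower layer by name. */
static void run_direct(unsigned long *x, size_t n,
                       const int *which, size_t steps)
{
    for (size_t s = 0; s < steps; s++) {
        if (which[s] == 0)
            variant_add(x, n);      /* direct call, inlinable */
        else
            variant_shift(x, n);    /* direct call, inlinable */
    }
}

/* Returns 1 if both styles produce the same output on a small input. */
static int dispatch_styles_agree(void)
{
    unsigned long a[3] = { 1, 2, 3 };
    unsigned long b[3] = { 1, 2, 3 };
    const int seq[3] = { 0, 1, 0 };

    run_via_table(a, 3, seq, 3);
    run_direct(b, 3, seq, 3);

    for (size_t i = 0; i < 3; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

Both versions compute the same thing; the difference is only in how much the compiler can see at each call site. Whether that shows up in timings depends on the compiler and target, so this says nothing by itself about the G5 result.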
Surely function pointers are the arch-nemesis of inlining.

The new code being profiled is basically all in one function: ssfft_fft_iterative(). It doesn't call any other FFT functions, it calls directly into the middle and bottom layers to do everything. There should be much less function call overhead than before.

david