From: Jonathan C. R. <jon...@ba...> - 2004-04-27 15:38:42
On Apr 26, 2004, at 3:09 AM, Benjamin Herrenschmidt wrote:

> Well, there are 2 things. One is to allocate the memory in the
> SRAM, and the other is to call routines with the stack there. Normally,
> Linux has one stack per process/thread, so you need to choose carefully
> where/when to switch stack there. Then, what I'd do is to create a
> small asm trampoline that switches the stack, stores the old stack
> pointer on the new stack (for the exit path), and jumps to the routine
> that has to be executed on the internal stack. You could do that on a
> thread entry point for a whole thread to use that SDRAM stack for example.
>
> Ben

Hi,

I've given myself a crash course in ARM assembly and implemented a 'stack trampoline' pretty much the way you suggest. I _think_ there is some performance improvement, but it's very difficult to tell without a proper profiler.

In summary, I have performed the following optimisations in libmad:

1. I have moved all dynamically allocated structures to the SRAM;
2. I have moved some static data from the sampling routines too (the 'D' structure from D.dat);
3. I have moved the stack to 0x40017ffc.

Guessing from audio quality and tempi, this has given me a 10-20% performance boost.

I will attempt two more things now. First, I'll try _moving_ the sampling code (judging by profiling on an i386, this is the bottleneck) to the SRAM, along with some of the Huffman and layer 3 static data. If that fails, I'll try a hack where I write directly to the COP's audio buffer (instead of copying first to slow memory, converting to little-endian DSP, then writing back to SRAM). (We'll be doing this in the kernel version anyway.) As a last resort I will turn to hand-coding the dct/sample code. Looking at the assembly output for sample.c, there is a lot of work to be done there...

Have you got any pointers for relocating code? As I said previously, arm-elf-ld didn't like me trying to tell it where to put stuff. Should I try a kernel-style relocation? How would you guys do it?

As Bern pointed out, we really need profiling... has anybody gotten -pg to work with the arm-elf toolchain?

Cheers,
Jonathan.
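
P.S. For anyone who wants to try the trampoline idea off-target, here is a minimal sketch in portable C. It is not the ARM assembly described above: it assumes a Linux/glibc host and uses the `ucontext` calls (`getcontext`, `makecontext`, `swapcontext`) to do the same three steps, namely save the old stack state for the exit path, switch to a new stack, and jump to the routine. The names `run_on_alt_stack`, `probe_stack`, `ran_on_alt_stack` and `ALT_STACK_SIZE` are made up for illustration, and a `malloc`'d buffer stands in for the SRAM region.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <ucontext.h>

#define ALT_STACK_SIZE (64 * 1024)  /* hypothetical size, chosen for a desktop host */

static ucontext_t caller_ctx;  /* saved state of the original stack (the exit path) */
static ucontext_t alt_ctx;     /* context that runs on the alternate stack */
static char *alt_stack;        /* stand-in for the SRAM region */
int ran_on_alt_stack;          /* set by probe_stack() */

/* Hypothetical routine: records whether its locals live on the alternate stack. */
void probe_stack(void)
{
    volatile char marker = 0;
    uintptr_t sp = (uintptr_t)&marker;
    uintptr_t lo = (uintptr_t)alt_stack;

    ran_on_alt_stack = (sp >= lo && sp < lo + ALT_STACK_SIZE);
}

/* Run fn on a freshly allocated stack, then resume on the original one.
   Returns 0 on success, -1 on failure. */
int run_on_alt_stack(void (*fn)(void))
{
    int rc;

    assert(fn != NULL);

    alt_stack = malloc(ALT_STACK_SIZE);
    if (alt_stack == NULL)
        return -1;

    if (getcontext(&alt_ctx) == -1) {
        free(alt_stack);
        alt_stack = NULL;
        return -1;
    }
    alt_ctx.uc_stack.ss_sp = alt_stack;
    alt_ctx.uc_stack.ss_size = ALT_STACK_SIZE;
    alt_ctx.uc_link = &caller_ctx;  /* where to continue when fn returns */
    makecontext(&alt_ctx, fn, 0);

    /* Save the current context (including the old stack pointer) and jump. */
    rc = swapcontext(&caller_ctx, &alt_ctx);

    /* Back on the original stack, so the buffer can be released. */
    free(alt_stack);
    alt_stack = NULL;
    return rc == 0 ? 0 : -1;
}
```

Calling `run_on_alt_stack(probe_stack)` should leave `ran_on_alt_stack` non-zero, which confirms the routine really executed on the other stack. On the real target the buffer would be a fixed SRAM address rather than heap memory, and the switch would be done in a few instructions of assembly, as Ben describes.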