From: Simon J. <sje...@bl...> - 2003-08-06 23:49:31
Benno Senoner wrote:

>Interesting thoughts Simon, but I am still unsure which approach wins
>in terms of speed.
>

I think you misunderstood what I was suggesting. I'm well aware that you mustn't generate one giant loop with a cache footprint bigger than the actual cache! But it's also very inefficient to generate lots of tiny little loops and move data continually from one to the next via buffers when a single loop could have achieved the same computation and still fit in the cache. (It's not the slight overhead of the extra looping that matters. It's that any intermediate values which must cross loop boundaries are forced out of registers and into memory buffers.)

The trick is to generate loops which are just about the right size: definitely not too big for the cache but, at the same time, not needlessly fragmented into sub-loops that are too small.

IMO: The code engine for a single voice is probably just about the right size for a loop, and things would run a lot faster if the code was generated blocklessly *within that loop* than if it was generated as a lot of tiny sub-loops leaving buffers of data in RAM for each other.

The fact that a voice is designed by connecting little modules with wires doesn't mean that the compiled code must connect little loops with buffers! Giving each envelope generator, each filter, each LFO its own individual loop won't speed things up... it will slow them down.

(At higher levels of granularity than a single voice, e.g. your "200 voices" example, everything should of course be processed in blocks.)

BTW I'm not making this stuff up out of thin air... there's a fairly thorough proof of concept demo at

http://www.sbibble.pwp.blueyonder.co.uk/amble/amble-0.1.1.tar.gz

It's not a true compiler unfortunately: it generates C source code by pasting together code fragments, each representing a module, into a single internally blockless (but externally block-processing) function.
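
To make the contrast concrete, here is a rough sketch in C of the two styles for one very small voice (envelope -> amplifier -> one-pole low-pass). This is not code from amble or LinuxSampler: the names `Voice`, `voice_buffered`, `voice_fused` and `BLOCK` are made up for illustration, and the modules are deliberately trivial. Both functions take a block of input and produce a block of output; only the inside differs.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK 64  /* max frames per external block (arbitrary, assumed) */

/* Hypothetical per-voice state: linear envelope + one-pole low-pass. */
typedef struct {
    float env;       /* current envelope level */
    float env_step;  /* envelope increment per sample */
    float lp_state;  /* filter memory */
    float lp_coeff;  /* filter coefficient, 0..1 */
} Voice;

/* Style A: one tiny loop per module, modules connected by RAM buffers. */
void voice_buffered(Voice *v, const float *in, float *out, size_t n)
{
    float env_buf[BLOCK];  /* wire: envelope -> amplifier */
    float amp_buf[BLOCK];  /* wire: amplifier -> filter   */
    assert(n <= BLOCK);

    for (size_t i = 0; i < n; i++) {          /* envelope module */
        env_buf[i] = v->env;
        v->env += v->env_step;
    }
    for (size_t i = 0; i < n; i++)            /* amplifier module */
        amp_buf[i] = in[i] * env_buf[i];
    for (size_t i = 0; i < n; i++) {          /* filter module */
        v->lp_state += v->lp_coeff * (amp_buf[i] - v->lp_state);
        out[i] = v->lp_state;
    }
}

/* Style B: the same modules pasted into a single loop. The wires are
 * plain locals, which the C compiler is free to keep in registers. */
void voice_fused(Voice *v, const float *in, float *out, size_t n)
{
    float env = v->env;
    float lp = v->lp_state;
    const float step = v->env_step;
    const float k = v->lp_coeff;

    for (size_t i = 0; i < n; i++) {
        float amp = in[i] * env;   /* wire: envelope -> amplifier */
        env += step;
        lp += k * (amp - lp);      /* wire: amplifier -> filter   */
        out[i] = lp;
    }
    v->env = env;                  /* write state back once per block */
    v->lp_state = lp;
}
```

Both versions compute the same samples. In the buffered one, every value travelling along a wire is stored to and reloaded from `env_buf` or `amp_buf`; in the fused one those values never need to leave the loop body. Whether the locals really end up in registers depends on the compiler and on how many modules the voice contains.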

I/O is transferred via buffers, but internal connections between modules are modelled by local variables, and many of these get optimised away by the C compiler, becoming temps in registers as I have been describing. Not only does this work, but it delivers the performance advantages I've been talking about.

It's not as good as a true compiler could be, but with a bit of work it could actually be hacked into a quick and dirty code generator for LinuxSampler while we wait for the real compiler to arrive. It really could.

Simon Jenkins
(Bristol, UK)