Re: [Linuxsampler-devel] GUI screenshot , Linuxsampler status,thoughts etc (long)

Benno Senoner wrote:

>Interesting thoughts Simon, but I am still unsure which approach wins
>in terms of speed.
>
I think you misunderstood what I was suggesting. I'm well aware that
you mustn't generate one giant loop with a cache footprint bigger than
the actual cache!

But it's also very inefficient to generate lots of tiny little loops and move
data continually from one to the next via buffers when a single loop
could have achieved the same computation and still fit in the cache.
(It's not the slight overhead of the extra looping that matters. It's that
any intermediate values which must cross loop boundaries are forced
out of registers and into memory buffers).

The trick is to generate loops which are just about the right size:
Definitely not too big for the cache but, at the same time, not needlessly
fragmented into sub-loops that are too small.

IMO: The code engine for a single voice is probably just about the right
size for a loop, and things would run a lot faster if the code was generated
blocklessly *within that loop* than if it was generated as a lot of tiny
sub-loops leaving buffers of data in RAM for each other. The fact that a
voice is designed by connecting little modules with wires doesn't mean
that the compiled code must connect little loops with buffers! Giving each
envelope generator, each filter, each LFO its own individual loop won't
speed things up... it will slow them down.
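
To make the contrast concrete, here is a minimal sketch in C of one
voice built both ways. Everything in it is an assumption made up for
illustration (the voice_t struct, the three toy modules, the BLOCK size);
none of it comes from LinuxSampler or from the demo. Both functions
perform the same arithmetic per sample. The first pushes every
intermediate value through a memory buffer, the second keeps them in
locals that a compiler is free to hold in registers.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK 64  /* hypothetical block size */

/* Hypothetical voice state; the names are illustrative only. */
typedef struct {
    float phase, phase_inc;  /* sawtooth oscillator */
    float env, env_decay;    /* exponential decay envelope */
    float lp, lp_coeff;      /* one-pole low-pass filter */
} voice_t;

/* Style A: one small loop per module, wired together by buffers. */
void voice_subloops(voice_t *v, float *out, size_t n)
{
    float osc_buf[BLOCK], amp_buf[BLOCK];
    assert(n <= BLOCK);
    for (size_t i = 0; i < n; i++) {        /* oscillator loop */
        osc_buf[i] = 2.0f * v->phase - 1.0f;
        v->phase += v->phase_inc;
        if (v->phase >= 1.0f) v->phase -= 1.0f;
    }
    for (size_t i = 0; i < n; i++) {        /* envelope loop */
        amp_buf[i] = osc_buf[i] * v->env;
        v->env *= v->env_decay;
    }
    for (size_t i = 0; i < n; i++) {        /* filter loop */
        v->lp += v->lp_coeff * (amp_buf[i] - v->lp);
        out[i] = v->lp;
    }
}

/* Style B: the same modules inside one loop, wired by local variables.
   Only the final output crosses the loop boundary via a buffer. */
void voice_fused(voice_t *v, float *out, size_t n)
{
    float phase = v->phase, env = v->env, lp = v->lp;
    for (size_t i = 0; i < n; i++) {
        float osc = 2.0f * phase - 1.0f;    /* oscillator */
        phase += v->phase_inc;
        if (phase >= 1.0f) phase -= 1.0f;
        float amp = osc * env;              /* envelope */
        env *= v->env_decay;
        lp += v->lp_coeff * (amp - lp);     /* filter */
        out[i] = lp;
    }
    v->phase = phase; v->env = env; v->lp = lp;
}
```

Called on two copies of the same initial state, the two functions should
give the same output up to floating point rounding. Which one is faster
on a given machine is something to measure, not something this sketch
proves.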

(At higher levels of granularity than a single voice, e.g. your "200 voices"
example, everything should of course be processed in blocks).

BTW I'm not making this stuff up out of thin air... there's a fairly
thorough proof of concept demo at

http://www.sbibble.pwp.blueyonder.co.uk/amble/amble-0.1.1.tar.gz

It's not a true compiler, unfortunately: it generates C source code by
pasting together code fragments, each representing a module, into
a single internally blockless (but externally block-processing)
function.

I/O is transferred via buffers but internal connections between
modules are modelled by local variables, and many of these get
optimised away by the C compiler, becoming temps in registers
as I have been describing.
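
As a rough illustration of the fragment-pasting idea (this is not the
demo's actual code, and every name in it is hypothetical), a generator
of this kind can be sketched in a few lines of C. Each module
contributes a snippet of loop body that reads and writes plain local or
static variables, and the generator concatenates the snippets between a
fixed function header and footer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-module fragments. Connections between modules are
   just variable names (osc, amp) shared by adjacent fragments. */
static const char *fragments[] = {
    "        float osc = 2.0f * phase - 1.0f;\n"
    "        phase += phase_inc; if (phase >= 1.0f) phase -= 1.0f;\n",
    "        float amp = osc * env; env *= env_decay;\n",
    "        lp += lp_coeff * (amp - lp);\n"
    "        out[i] = lp;\n",
};

static const char *header =
    "static float phase, phase_inc = 0.01f, env = 1.0f,\n"
    "             env_decay = 0.999f, lp, lp_coeff = 0.2f;\n"
    "void voice(float *out, int n)\n"
    "{\n"
    "    for (int i = 0; i < n; i++) {\n";

static const char *footer =
    "    }\n"
    "}\n";

/* Append s to dst if it fits (including the terminator). */
static int append(char *dst, size_t cap, size_t *used, const char *s)
{
    size_t len = strlen(s);
    if (*used + len + 1 > cap) return -1;
    memcpy(dst + *used, s, len + 1);
    *used += len;
    return 0;
}

/* Write the generated C source into dst.
   Returns 0 on success, -1 if dst is too small. */
int emit_voice(char *dst, size_t cap)
{
    size_t used = 0;
    assert(dst != NULL);
    if (append(dst, cap, &used, header) != 0) return -1;
    for (size_t i = 0; i < sizeof fragments / sizeof fragments[0]; i++)
        if (append(dst, cap, &used, fragments[i]) != 0) return -1;
    return append(dst, cap, &used, footer);
}
```

The emitted text is a single function with one loop. The output goes
through the out buffer, while osc and amp exist only as locals inside
the loop body, which is what lets the C compiler turn them into
register temporaries.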

Not only does this work, but it delivers the performance advantages
I've been talking about. It's not as good as a true compiler could be,
but with a bit of work it could actually be hacked into a quick and
dirty code generator for LinuxSampler while we wait for the real
compiler to arrive. It really could.

Simon Jenkins
(Bristol, UK)