From: David O. <da...@ol...> - 2003-08-09 22:31:08
On Saturday 09 August 2003 19.01, Benno Senoner wrote:
[...]
> > > The other philosophy is not to adopt the "everything is a CV"
> > > idea, but to use typed ports and time-stamped scheduled events.
> >
> > Another way of thinking about it is to view streams of control
> > ramp events as "structured audio data". It allows for various
> > optimizations for low bandwidth data, but still doesn't have any
> > absolute bandwidth limit apart from the audio sample rate. (And
> > not even that, if you use a different unit for event timestamps -
> > but that's probably not quite as easy as it may sound.)
>
> So at this time for linuxsampler would you advocate an event
> based approach or a continuous control stream (that runs at a
> fraction of the samplerate)?
I don't like "control rate streams" and the like very much at all.
They're only slightly easier to deal with than timestamped events,
while they restrict timing accuracy in a way that's not musically
logical. They get easier to deal with if you fix the C-rate at
compile time, but then you have a quantization that may differ
between installed systems - annoying if you want to move
sounds/projects/whatever around.
I've had enough trouble with h/w synths that have this kind of
restriction (MCUs and/or IRQ driven EGs + LFOs and whatnot) that I
don't want to see anything like it in any serious synth or sampler
again. Envelopes and stuff *have* to be sample accurate for serious
sound programming. (That is, anything beyond applying slow filter
sweeps to samples. Worse than sample accurate timing essentially
rules out virtual analog synthesis, at least when it comes to
percussive sounds.)
> As far as I understood it from reading your mail it seems that you
> agree that on current machines (see filters that need to
> recalculate coefficients etc) it makes sense to use an event based
> system.
Yes, that seems to be the best compromise at the moment. Maybe that
will change some time, but I'm not sure. *If* audio rate controls and
blockless processing become state-of-the-art for native processing,
it's only because CPUs are so fast that the cost is acceptable in
relation to the gains. (Audio rate control data everywhere and
slightly simpler DSP code, I guess...)
[...]
> > Now, that is apparently impossible to implement on some
> > platforms (poor RT scheduling), but some people using broken OSes
> > is no argument for broken API designs, IMNSHO...
>
> Ok but even if there is a jitter of a few samples it is much better
> than having an event jitter equivalent to the audio fragment size.
Sure - but it seems that on Windoze, the higher priority MIDI thread
will pretty much be quantized to audio block granularity by the
scheduler, so the results are no better than polling MIDI once per
audio block.
Anyway, that's not our problem - and it's not an issue with h/w
timestamping MIDI interfaces, since they do the job properly
regardless of OS RT performance.
> It will be impossible for the user to notice that the MIDI
> pitchbend event was scheduled a few usecs too late compared to the
> ideal time. Plus, as said, it will work with relatively large audio
> fragsizes too.
Yes. Or on Win32; maybe. ;-) I don't have any first hand experience
with pure Win32. The last time I messed with it, it seemed to "sort
of" work as intended, but that was on Win95 with Win16 code running
in IRQ context, and that's a different story entirely. You can't do
that on Win32 without some serious Deep Hack Mode sessions, AFAIK.
Anyway, who cares!? ;-) It works on Linux, QNX, BeOS, Mac OS X, Irix
and probably some other platforms.
The only reason I brought it up is that Win32 failing to support this
kind of timestamped real time input was seriously suggested as an
argument to drop timestamped events in GMPI. Of course, that was
overruled, as the kind of plugins GMPI is meant for are usually
driven from the host's integrated sequencer anyway, and there's no
point in having two event systems; one with timestamps and one
without.
> > > With the streamed approach we would need some scheduling of
> > > MIDI events too, thus we would probably need to create a module
> > > that waits N samples (control samples) and then emits the
> > > event. So basically we end up in a timestamped event scenario
> > > too.
> >
> > Or the usual approach; MIDI is processed once per block and
> > quantized to block boundaries...
>
> I don't like that; it might work OK for very small fragsizes, e.g.
> 32-64 samples/block, but if you go up to, say, 512-1024, timing
> of MIDI events will suck badly.
Right. I don't like it either, but suggested it only for completeness.
(See above, on Win32 etc; they do this for RT MIDI input, because it
hardly gets any better anyway.)
> > > Not sure we gain in performance compared to the block based
> > > processing where we apply all operations sequentially on a
> > > buffer (filters, envelopes etc) like they were LADSPA modules,
> > > but without calling external modules - instead "pasting" their
> > > sources in sequence, without function calls etc.
> >
> > I suspect it is *always* a performance loss, except in a few
> > special cases and/or with very small nets and a good optimizing
> > "compiler".
>
> So it seems that the best compromise is to process audio in blocks
> but to perform all DSP operations relative to an output sample
> (relative to a voice) in one go.
Sometimes... I think this depends on the architecture (# of registers,
cache behavior etc) and on the algorithm. For most simple algorithms,
it's the memory access pattern that determines how things should be
done.
[...]
> it would be faster to do
>
> for(i = 0; i < 256; i++) {
>     output_sample[i] = do_filter(do_amplitude_modulation(input_sample[i]));
> }
>
> right?
Maybe - if you still get a properly optimized inner loop that way. Two
optimized inner loops with a hot buffer in between are probably faster
than one that isn't properly optimized.
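For illustration (a sketch, reusing the names from your example):

	/* Two optimized inner loops with a hot temporary buffer: */
	float tmp[256];
	for(i = 0; i < 256; i++)
		tmp[i] = do_amplitude_modulation(input_sample[i]);
	for(i = 0; i < 256; i++)
		output_sample[i] = do_filter(tmp[i]);

As long as tmp[] stays in the cache, this can beat the fused version
whenever the compiler manages to vectorize the two simple loops but
chokes on the combined one.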
> While for pure audio processing the approach is quite
> straightforward, when we take envelope generators etc into account
> we must inject code that checks the current timestamp (an if()
> statement) and then modifies the right values according to the
> events pending in the queue (or autogenerated events).
It shouldn't make much of a difference if the "integration" is done
properly. In Audiality, I just wrap the DSP loop inside the event
decoding loop (the normal, efficient and obvious way of handling
timestamped events), so there's only event checking overhead per
event queue (normally one per "unit") and per block - not per sample.
Code compiled from multiple plugins should be structured the same
way, and then it doesn't matter if some of these plugins do various
forms of processing that doesn't fit the "once-per-sample" model.
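Something like this, that is (a simplified sketch, not actual
Audiality code; the names are made up):

	void process_block(unit_t *u, unsigned frames)
	{
		unsigned now = 0;
		while(now < frames)
		{
			unsigned end;
			/* Apply all events scheduled for 'now' */
			while(next_event_time(u->queue) == now)
				apply_event(u, get_event(u->queue));
			/* Plain DSP until the next event, or end of block */
			end = next_event_time(u->queue);
			if(end > frames)
				end = frames;
			for( ; now < end; ++now)
				u->out[now] = dsp_tick(u, u->in[now]);
		}
	}

(next_event_time() would return something >= 'frames' when the queue
is empty.) The point is that the timestamp checking cost is per event
and per block - not per sample.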
[...]
> sample_ptr_and_len: pointer to a sample stored in RAM with
> associated len
>
>
> attack looping: a list of looping points:
> (position_to_jump, loop_len, number_of_repeats)
>
> release looping: same structure as above but it is used when
> the sampler module goes into release phase.
> Basically, when you release a key, if the sample has loops, you
> switch to the release_looping list once the current loop reaches
> its end position.
This is where it goes wrong, I think. These things are not musical
events, but rather "self triggered" chains of events, closely related
to the audio data and the inner workings of the sampler. You can't
really control this from a sequencer in a useful way, because it
requires sub-sample accurate timing as well as intimate knowledge of
how the sampler handles pitch bend, and how accurate its
pitch/position tracking is. You'll most likely get small glitches
everywhere for no obvious reason if you try this approach. Lots of
fun! ;-)
Of course, you'll need some nice way of passing this stuff as
*parameters* to the sampler - but that goes for the audio data as
well. If it can't be represented as plain values and text controls in
any useful way, it's better handled as "raw data" controls; ie just
blocks of private data. (Somewhat like MIDI SysEx dumps.)
[...]
> So I'd be interested in how the RAM sampler module described above
> could be made to work with only the RAMP event.
You just handle the private stuff as "raw data", and there's no
problem. :-) Controls are meant for things that can actually be
driven by a sequencer, or some other event generator or processor.
Also note that a control event is just a command that tells the plugin
to change a control to a specified value at a specified time. It's
not to be viewed as an object passed to the plugin, to be added to
some list or anything like that.
> BTW: you said RAMP with value = 0 means set a value.
No, RAMP(<value>, <duration>), where <duration> is 0. The value and
timestamp fields are always valid.
> But what do you set exactly to 0 , the timestamp ?
The 'duration' field. (Which, BTW, is completely ignored by controls
that support only SET operations.)
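(For the record, an event along these lines - a sketch, not the
actual Audiality struct:

	typedef struct
	{
		unsigned short	timestamp;	/* frames into current block */
		unsigned short	duration;	/* ramp length; 0 ==> SET */
		int		value;		/* ramp target value */
	} ramp_event_t;

The receiver ramps from its current value to 'value' over 'duration'
frames, starting at 'timestamp'.)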
> this would not be ideal since 0 is a legitimate value.
> It would be better to use -1 or something similar.
No, -1 (or rather, whatever it is in unsigned format) is a perfectly
valid timestamp as well, at least in Audiality. It uses 16 bit
timestamps that wrap pretty frequently, which is just fine, since
events are only for the current buffer anyway.
> OTOH this would require an additional if() statement
> (to check if it is a regular ramp or a set statement) and it could
> possibly slow down things a bit.
This cannot be avoided anyway. You'll have to put that if() somewhere
regardless, to avoid div-by-zero and other nasty issues. Consider this
code from Audiality:
(in the envelope generator:)
	case PES_DELAY:
		/* DELAY segment done; ramp to level L1 over time T1 */
		duration = S2S(p->param[APP_ENV_T1]);
		target = p->param[APP_ENV_L1];
		clos->env_state = PES_ATTACK;
		break;
This is the switch from DELAY to ATTACK. 'duration' goes into the
event's duration field, and may well be 0, either because the
APP_ENV_T1 parameter is very small, or because the sample rate is
low. (S2S() is a macro that converts seconds to samples, based on the
current system sample rate.) Since RAMP(target, 0) does exactly what
we want in that case, that's just fine. We leave the special case
handling to the receiver:
(in the voice mixer:)
	case VE_IRAMP:
		if(ev->arg2)
		{
			/* RAMP: calculate the per-sample increment */
			v->ic[ev->index].dv = ev->arg1 << RAMP_BITS;
			v->ic[ev->index].dv -= v->ic[ev->index].v;
			v->ic[ev->index].dv /= ev->arg2 + 1;
		}
		else
			/* SET: grab the target value right away */
			v->ic[ev->index].v = ev->arg1 << RAMP_BITS;
		v->ic[ev->index].aimv = ev->arg1;	/* ramp target */
		v->ic[ev->index].aimt = ev->arg2 + s;	/* ramp end time */
		break;
Here we calculate our internal ramping increments - or, if the
duration argument (ev->arg2) is 0, we just grab the target value
right away.
One would think that it would be appropriate to set .dv to 0 in the
SET case, but it's irrelevant, since what happens after the RAMP
duration is undefined, and it's illegal to leave a connected control
without input.
That is, the receiver will never end up in the "where do I go now?"
situation. In the case of Audiality, the instance is destroyed at the
very moment the RAMP event stream ends. In the case of XAP, there is
always the option of disconnecting the control, and in that case, the
disconnect() call would clear .dv as required to lock the control at
a fixed value. (Or the plugin will reconfigure itself internally to
remove the control entirely, or whatever...)
> My proposed ramping approach, which consists of
> (value_to_be_set, delta),
> does not require an if, and if you simply want to set a value
> you set delta = 0.
Delta based ramping has the error build-up issue - but <value, delta>
tuples should be ok... The <target, duration> approach has the nice
side effects of
	1) telling the plugin for how long the (possibly
	   approximated) ramp must behave nicely, and
	2) eliminating the requirement for the sender to
	   know the exact current value of controls.
(The latter can make life easier for senders in some cases, but in the
case of Audiality - which uses fixed point controls - it's mostly
about avoiding error build-up drift without introducing clicks.)
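In code, the difference is something like this (a sketch; the field
names are made up):

	/* Pure delta ramping: the sender only says "add this much
	 * per sample", so any rounding error in 'delta' accumulates,
	 * and the receiver drifts away from where the sender thinks
	 * it is. */
	c->v += c->delta;

	/* <target, duration>: the receiver derives the increment
	 * from its *own* current value, so each new ramp re-anchors
	 * the control, and error cannot build up across ramps. */
	c->dv = ((target << RAMP_BITS) - c->v) / (duration + 1);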
> But my approach has the disadvantage that if you want to do mostly
> ramping, you always have to calculate value_to_be_set at each
> event, and this could become non-trivial if you do not track the
> values within the modules.
Actually, I think you'll have to keep track of what you're doing most
of the time anyway, so that's not a major one. The <target, duration>
approach doesn't have that requirement, though...
[...]
> blockless as referred to above by me (blockless = one single
> equation but processed in blocks), or blockless using another kind
> of approach? (elaborate please)
By "blockless", I mean "no blocks", or rather "blocks of exactly one=20
sample frame". That is, no granularity below the audio rate, since=20
plugins are executed once per sample frame. C-rate =3D=3D A-rate, and=20
there's no need for timestamped events or anything like that, since=20
everything is sample accurate anyway.
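In code terms (a sketch; the names are made up):

	/* Blockless: the whole net runs once per sample frame */
	for(frame = 0; frame < length; frame++)
		for(i = 0; i < net->nplugins; i++)
			run_one_frame(net->plugins[i]);

Swap the two loops and you have the usual block based scheme, where
the per-plugin call overhead is paid once per block rather than once
per sample frame.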
> > That said, something that generates C code that's passed to a
> > good optimizing compiler might shift things around a bit,
> > especially now that there are compilers that automatically
> > generate SIMD code and stuff like that.
>
> The question is indeed whether, if you do LADSPA style processing
> (applying all DSP processing in sequence), the compiler uses SIMD
> and optimizes the processing loops, and is therefore faster than
> calculating the result as one single big equation at a time, which
> could possibly not take advantage of SIMD etc.
> But OTOH the blockless processing has the advantage that
> things are not moved around much in the cache.
> The output value of the first module is directly available as the
> input value of the next module without needing to move it to
> a temporary buffer or variable.
It's just that the cost of temporary buffers is very low, as long as
they all fit in the cache. Meanwhile, if you throw too many cache
abusing plugins into the same inner loop, you may end up with
something that thrashes the cache on a once-per-sample-frame basis...
[...]
> > > As said I dislike "everything is a CV" a bit because you cannot
> > > do what I proposed:
[...]
> > I disagree to some extent - but this is a very complex subject.
> > Have you followed the XAP discussions? I think we pretty much
> > concluded that you can get away with "everything is a control",
> > only one event type (RAMP, where duration == 0 means SET) and a
> > few data types. That's what I'm using internally in Audiality,
> > and I'm not seeing any problems with it.
>
> Ah, you are using the concept of duration.
> Isn't it a bit redundant?
No - especially not if you consider that some plugins will only
*approximate* the linear ramps. When you do that, it becomes useful
to know where the sender intends the ramp to end, to come up with a
nice approximation that hits the start and end points dead on.
> Instead of using duration one can use
> duration-less RAMP events and just generate an event that sets
> the delta ramp value to zero when you want the ramp to stop.
Yes, but since it's not useful to "just let go" of a control anyway,
this doesn't make any difference. Each control receives a continuous
stream of structured audio rate data, and that stream must be
maintained by the sender until the control is disconnected.
Of course, there are other ways of looking at it, but I don't really
see the point. Either you're connected, or you aren't. Analog CV
outputs don't have a high impedance "free" mode, AFAIK. ;-) (Or maybe
some weird variants do - but that would still be an extra feature, ie
"soft disconnect", rather than a property of CV control signals.)
> > > Basically in my model you cannot connect everything with
> > > everything (Steve says it is bad but I don't think so) but you
> > > can connect everything with "everything that makes sense to
> > > connect to".
> >
> > Well, you *can* convert back and forth, but it ain't free... You
> > can't have everything.
>
> Ok but converters will be the exception and not the rule:
One would hope so... That should be carefully considered when deciding
which controls use what format.
> for example the MIDI mapper module
>
> see the GUI screenshot message here:
> http://sourceforge.net/mailarchive/forum.php?thread_id=2841483&forum_id=12792
>
> acts as a proxy between the MIDI Input and the RAM sampler module.
> So it makes the right port types available.
> No converters are needed. It's all done internally in the best
> possible way without needless float to int conversions,
> interpreting pointers as floats and other "everything is a CV"
> oddities ;-)
*hehe*
Well, it gets harder when those modular synth dudes show up and want
to play around with everything. ;-)
[...]
> > Yes... In XAP, we tried to forget about the "argument bundling"
> > of MIDI, and just have plain controls. We came up with a nice and
> > clean design that can do everything that MIDI can, and then some,
> > still without any multiple argument events. (Well, events *have*
> > multiple arguments, but only one value argument - the others are
> > the timestamp and various addressing info.)
>
> Hmm, I did not follow the XAP discussions (I was overloaded during
> that time, as usual ;-) ) but can you briefly explain how this XAP
> model would fit the model where the MIDI IN module talks to the
> MIDI mapper, which in turn talks to the RAM sampler?
Well, the MIDI IN module would map everything to XAP Instrument
Control events, for starters.
	NoteOn(Ch, Pitch, Vel);
would become something like
	vid = Pitch;			//Use Pitch for voice ID, like MIDI...
	ALLOC_VOICE(vid);		//Tell the receiver we need a voice for 'vid'
	VCTRL(vid, VELOCITY, Vel);	//Set velocity
	VCTRL(vid, PITCH, Pitch);	//Set pitch
	VCTRL(vid, VOICE, 1);		//Voice on
and
	NoteOff(Ch, Pitch);
becomes
	vid = Pitch;
	VCTRL(vid, VOICE, 0);		//Voice off
	RELEASE_VOICE(vid);		//We won't talk about 'vid' no more.
Note that RELEASE_VOICE does *not* mean the actual voice dies
instantly. That's entirely up to the receiver. What it *does* mean is
that you give up the voice ID, so you can't control the voice from
now on. If you ALLOC_VOICE(that_same_vid), it will be assigned to a
new, independent voice.
Anyway, VELOCITY, PITCH (and any others you like) are just plain
controls. They may be continuous, "voice on latched" and/or "voice off
latched", so you can emulate the same (rather restrictive) logic that
applies to MIDI NoteOn/Off parameters. That is, we use controls that
are latched at state transitions instead of explicit arguments to
some VOICE ON/OFF event.
[...]
> > Either way, the real heavy stuff is always the DSP code. In cases
> > where it isn't, the whole plugin is usually so simple that it
> > doesn't really matter what kind of control interface you're
> > using; the DSP code fits right into the basic "standard model"
> > anyway. In such cases, an API like XAP or Audiality's internal
> > "plugin API" could provide some macros that make it all insanely
> > simple - maybe simpler than LADSPA.
>
> So you are saying that in practical terms (processing performance)
> it does not matter whether you use events or streamed control
> values?
I'm saying it doesn't matter much in terms of code complexity. (Which
is rather important - someone has to code all those kewl plugins! :-)
It *does* matter a great deal in terms of performance, though, which
is why we have to go with the slightly more (and sometimes very much
more) complicated solutions.
> I still prefer the event based system because it allows you to deal
> more easily with real time events (with sub audio block precision)
> and if you need it you can run at full sample rate.
Sure, some things actually get *simpler* with timestamped events, and
other things don't get all that complicated. After all, you can just
do the "VST style quick hack" and process all events first and then
process the audio for the full block, if you can't be arsed to do it
properly. (They actually do that in some examples that come with the
VST SDK, at least in the older versions... *heh*)
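That is, something like this (a sketch of that quick hack; the names
are made up):

	void process(plugin_t *p, float *out, unsigned frames)
	{
		event_t ev;
		/* Quantize all events to the start of the block... */
		while(get_event(p, &ev))
			apply_event(p, &ev);	/* timestamps ignored! */
		/* ...then render the whole block in one go. */
		render_audio(p, out, frames);
	}

Timing accuracy degrades to block granularity, but the plugin still
compiles and runs against the timestamped API.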
> > Anyway, need to get back to work now... :-)
>
> yeah, we unleashed those km-long mails again ... just like in the
> old times ;-) can you say infinite recursion ;-)))
*hehe* :-)
//David Olofson - Programmer, Composer, Open Source Advocate
.- The Return of Audiality! --------------------------------.
| Free/Open Source Audio Engine for use in Games or Studio. |
| RT and off-line synth. Scripting. Sample accurate timing. |
`-----------------------------------> http://audiality.org -'
--- http://olofson.net --- http://www.reologica.se ---