From: Benno S. <be...@ga...> - 2002-11-07 18:21:29
On Thu, 2002-11-07 at 17:43, Steve Harris wrote:
> > For example without timestamped events, when using not so small buffer
> > sizes, note starts/stops would get quantized to buffer boundaries,
> > introducing unacceptable timing artifacts.
>
> That is true in principle, but bear in mind that in many cases the note on
> events will be coming in over real time MIDI, and therefore will have to
> be processed as soon as possible, ie. at the start of the next process()
> block.

Do not underestimate these artifacts: perhaps they go unnoticed with 1 msec
audio fragments, but as you drive up the buffer size things begin to sound
weird. And consider that an event can arrive just a few samples before the
current fragment reaches the audio output, which means the almost
one-fragment-long delay will occur anyway. The small delay itself will go
unnoticed; the jitter correction just ensures that the delay is constant,
exactly one fragment. You can calculate the needed delay by looking at the
soundcard's frame pointer, or alternatively by using gettimeofday() / RDTSC
and scaling it to the sample rate (44100 nominal, or for even more
precision calibrate it against the real sample rate).

Read this paper to convince yourself that jitter correction is needed in
order to provide excellent timing:
http://www.rme-audio.de/english/techinfo/lola_latec.htm

The price to pay is very low, since it involves just a few calculations
outside the innermost loop. Plus, when driven from a sequencer, the sampler
can provide sample accurate audio rendering (but in that case a time
stamped protocol is probably needed; time stamped MIDI, anyone?).

> Yes, though generally the CV signals run at a lower rate than the audio
> signals (eg. 1/4th or 1/16th). Like Krate signals in Music N. Providing
> pitch data at 1/4 audio rate is more than enough and will save cycles.

Yes, this is a good idea ... perhaps allow variable CV granularity, or
better, run at a fixed 1/4 audio rate?
> > Or perhaps both models can be used ? (a time stamped event that supplies
>
> I think a mixture of the two is necessary.

I believe that too, but I'd like to hear more opinions.

> > That way we save CPU time and can avoid the statement
> > if(sample_ptr >= loop_end_pos) reset_sample_ptr();
> > within the audio rendering loop.
>
> As long as the compiler issues the branch prediction instruction correctly
> (to hint that the condition will be false), it will be fine. You can check
> this by looking at the .s output.

How do you check this? I mean, there will probably be a cmp (compare
instruction) followed by a conditional jump (jge); how can the compiler
issue branch prediction instructions on x86? I thought it was the task of
the CPU to figure that out?

> > IIWU/Fluid sounds quite nice but it seems to be quite CPU heavy.
> > I took a look at the voice generation routines and it is all pretty much
> > "hard coded", plus it uses integer math (integer, fractional parts etc.)
> > which is not as efficient as you might think.
>
> If you are referring to phase pointers, then it's not an efficiency issue;
> if you use floating point numbers then the sample will play out of key,
> only slightly, but enough that you can tell.

Out of key because a 32bit float provides only a 24bit mantissa? In my
proof of concept code I use 64bit floats for the playback pointers and it
works flawlessly even with extreme pitches. (I think that with a 52bit
mantissa it would take quite a lot of accumulated error before the tuning
change becomes noticeable.) Was this the issue, or am I missing something?

> > Regarding the JACK issues that Matthias W raised:
> > I'm no JACK expert, but I hope that JACK supports running a JACK client
> > directly in its own process space as if it were a plugin.
> > This would save unnecessary context switches, since there would be only
> > one SCHED_FIFO process running.
> > (Do the JACK ALSA I/O modules work that way?)
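On the phase pointer question above, the 24bit mantissa really is the
problem, and it can be demonstrated in a couple of lines. This is a
hypothetical standalone demo (not IIWU/Fluid code): once a float playback
pointer passes 2^24 samples (about 6.3 minutes at 44.1 kHz) it cannot hold
fractional sample positions at all, and well before that the fraction is
already too coarse for fine pitch ratios; a 64bit double has headroom to
spare.

```c
/* Demonstrate that a 32-bit float (24-bit mantissa) cannot advance a
 * playback phase pointer by a fractional step once the pointer is large,
 * while a 64-bit double (53-bit mantissa) still can. */

static int float_loses_fraction(void)
{
    volatile float phase = 16777216.0f;   /* 2^24 samples */
    volatile float next  = phase + 0.5f;  /* forces rounding to float */
    return next == phase;                 /* half-sample step rounds away */
}

static int double_keeps_fraction(void)
{
    double phase = 16777216.0;
    return phase + 0.5 == 16777216.5;     /* double still resolves it */
}
```

(The volatile qualifiers are there only to force rounding to 32-bit
precision even on compilers that evaluate float expressions in wider
registers.)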
> Yes, but there is currently no mechanism for loading an in process client
> once the engine has started, however that is merely because the function
> hasn't been written. Both the ALSA and Solaris i/o clients are in process,
> but they are loaded when the engine starts up.

Ok, at least the engine is designed to work that way. (So I guess for
maximum performance some extensions to JACK will be required, but I assume
that will not be a big problem.)

> Further to this, serial MIDI only runs at 31.250 kbaud, which means
> that the maximum resolution of a MIDI note on is 42 samples at 44.1kHz
> (30 bits at 31250 bits/s).

Yes, with 32 sample fragments (my beloved 2.1 msec latency case) you can
match MIDI resolution with at-block-boundary rendering. But I think people
will want to use bigger buffer sizes too, perhaps to increase performance
or because their particular hardware can't cope with such small buffers.
Plus there is offline audio rendering, or rendering driven from a
sequencer, where sample accurate rendering can sometimes help avoid
flanging effects caused by small delays when triggering similar or equal
waveforms.

The price to pay for handling sample accurate events is quite low, because
when no events are pending no CPU is wasted within the innermost loop.
The usual way to handle time stamped events within an audio block:

  while ((num_samples_before_event = event_pending())) {
      process(num_samples_before_event);
      handle_event();
  }
  process(num_samples_after_event);

Of course event_pending() and process() will probably be inlined or
implemented as macros in order to provide maximum performance, but as you
can see, the only overhead over an event-less system is the
event_pending() check at the beginning of the block and again after each
event (since more than one event per fragment can occur).
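Here is a self-contained sketch of that block-splitting idea. The event
storage, process() and handle_event() below are dummies I made up; the
point is only the control flow, and that the per-block cost when no events
are pending is a single comparison.

```c
#include <stddef.h>

#define BLOCK_SIZE 256

/* pending event offsets inside the current block, sorted; two events
 * may share the same offset */
static int    event_offsets[] = { 100, 100, 200 };
static size_t num_events      = 3;
static size_t next_event      = 0;

/* dummies standing in for the real DSP code */
static int samples_rendered = 0;
static int events_handled   = 0;
static void process(int nframes) { samples_rendered += nframes; }
static void handle_event(void)   { events_handled++; next_event++; }

/* render one block, splitting it at each pending event so the event
 * takes effect at its exact sample position */
static void render_block(void)
{
    int pos = 0;
    while (next_event < num_events &&
           event_offsets[next_event] < BLOCK_SIZE) {
        int at = event_offsets[next_event];
        if (at > pos) {        /* render the samples before the event */
            process(at - pos);
            pos = at;
        }
        handle_event();        /* then apply it (may repeat at same pos) */
    }
    process(BLOCK_SIZE - pos); /* render the remainder of the block */
}
```

With no events pending, render_block() degenerates to one comparison plus
a single full-block process() call, which is the "no CPU wasted" property
claimed above.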
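And one way the event queue behind event_pending() could look: a
single-producer/single-consumer lock free fifo, where checking for pending
events is just an index subtraction. This is a structural sketch only, with
names of my own invention; a production version on SMP hardware would also
need memory barriers or C11 atomics to order the index updates.

```c
#define FIFO_SIZE 256u          /* must be a power of two */

typedef struct {
    int               events[FIFO_SIZE];
    volatile unsigned write_idx; /* advanced only by the producer */
    volatile unsigned read_idx;  /* advanced only by the consumer */
} event_fifo;

/* the "is anything pending?" check: one subtraction, no locks.
 * Free-running unsigned indices make the wraparound arithmetic work. */
static unsigned fifo_pending(const event_fifo *f)
{
    return f->write_idx - f->read_idx;
}

static int fifo_push(event_fifo *f, int ev)   /* producer side */
{
    if (fifo_pending(f) == FIFO_SIZE) return 0;       /* full */
    f->events[f->write_idx % FIFO_SIZE] = ev;
    f->write_idx++;             /* publish only after the element write */
    return 1;
}

static int fifo_pop(event_fifo *f, int *ev)   /* consumer side */
{
    if (fifo_pending(f) == 0) return 0;               /* empty */
    *ev = f->events[f->read_idx % FIFO_SIZE];
    f->read_idx++;
    return 1;
}
```

Since each index is written by exactly one thread, other modules can insert
events asynchronously while the audio thread only ever pays the cost of
fifo_pending() when nothing is queued.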
When using lock free fifos or linked lists (the lock free fifo is probably
more efficient, since it allows asynchronous insertion by other modules),
you simply check for the presence of an element in the structure, which
usually involves checking a pointer or, for a lock free fifo, doing a
subtraction. Not a big deal, especially since it lies outside the
innermost loop.

> Obviously things like OSC have greater time resolution, but it shows
> that sample accurate note triggering isn't essential. It may be something
> worth dropping for efficiency.

Steve, using your own words: for the efficiency "nazis" we could always
tell the signal recompiler to #undef the event handling code and compile
an event-less version. :-)

I think the recompiler has many advantages, like easily being able to
provide "simplified" instruments which do not include LP filters,
envelopes etc. E.g. in cases where you need only sample playback without
any post processing, leave out all the DSP stuff and you get an instrument
that is faster than the "standard" ones while doing exactly what you want.

Benno

-- 
http://linuxsampler.sourceforge.net
Building a professional grade software sampler for Linux.
Please help us design and develop it.