From: Steve H. <S.W...@ec...> - 2002-11-13 17:58:47
On Wed, Nov 13, 2002 at 05:47:14 +0100, Benno Senoner wrote:
> Since we will probably go all floating point (because high precision,
> head room and flexibility over integer) you need to be careful to
> optimize the code because as we all know x86 FPUs do suck a bit.

Right, but we can use SSE on P4s (and maybe P3s if it's faster) with
gcc3. This just needs the flags I posted to l-a-d, no code changes.

> Steve H: I have added stereo mixing with volume support to better
> reflect the behaviour of a real sampler with pan support, fortunately
> the performance drop from the mono version is minimal thanks to caching.

Excellent. I thought we were wasting a lot of cycles waiting for the RAM
in the mono case.

[events and CV]
> One might say this is a waste of CPU but as Steve wrote in an earlier
> posting on this list, the rate of CV values is usually much lower (1/4 -
> 1/16) than the samplerate. This means that even if the event stream is
> very dense the added overhead is minimal.
> I think the best way to find a good compromise between flexibility
> and speed is to try out several methods and pick those with the best
> price/performance ratio.

OK, well events are more LADSPA-like, which is convenient I suppose.
This is really an internal engine thing though, so we don't have to
decide up front.

> Are FXes in soft samplers/synths usually stereo or mono ?
> Since we are using recompilation this can be made flexible but I have
> noticed that FX send channels can chew up quite some CPU.
> see this:

I think on older samplers they are stereo return (to the main mix outs);
newer samplers have many more outputs, so I don't know how they handle
it. The number of send channels is equal to the number of channels in
the sample.
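
To make the control-rate point quoted above concrete (CV values arriving
at 1/4 to 1/16 of the sample rate), here is a minimal sketch in C of a
mono-to-stereo mix where the left/right gains are read once per control
period instead of once per sample. This is only an illustration under
assumptions, not the engine's actual code: the function name
mix_mono_to_stereo, the cv_l/cv_r gain arrays and the ctrl_period
parameter are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: mix a mono source into stereo accumulation buffers.
 * cv_l and cv_r hold ONE gain value per control period, so the caller must
 * supply ceil(nframes / ctrl_period) values in each array.
 * The gain lookup then runs once per period (e.g. every 4 to 16 samples)
 * rather than once per sample. */
void mix_mono_to_stereo(const float *src, float *out_l, float *out_r,
                        size_t nframes,
                        const float *cv_l, const float *cv_r,
                        size_t ctrl_period)
{
    assert(ctrl_period > 0);

    size_t k = 0; /* index of the current control value */
    for (size_t start = 0; start < nframes; start += ctrl_period, k++) {
        size_t end = start + ctrl_period;
        if (end > nframes)
            end = nframes; /* last period may be shorter */

        /* Gains are constant for the whole period. */
        const float gl = cv_l[k];
        const float gr = cv_r[k];

        /* Branch-free inner loop: two multiply-adds per sample. */
        for (size_t i = start; i < end; i++) {
            out_l[i] += src[i] * gl;
            out_r[i] += src[i] * gr;
        }
    }
}
```

With ctrl_period = 16 the gains are fetched once per 16 output samples,
and the inner loop has no conditionals, which may matter on deeply
pipelined CPUs. Whether this beats an event-based scheme would still
need to be measured.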
> P4:
> samples/sec = 12528321.035306 mono voices at 44.1kHz = 284.088912
> efficency: 144.401951 CPU cycles/sample
>
> Athlon:
> samples/sec = 14626412.219113 mono voices at 44.1kHz = 331.664676
> efficency: 95.721219 CPU cycles/sample
>
>
> This with both gcc3.2 and 2.96. The P4 seems to suck quite a bit.

P4s really don't like branches from what I have heard (very long
pipelines). The Athlon's pipeline is much shallower. What RAM systems
did the two machines have?

> Using the Intel C / gcc compilers with SSE optimizations did not
> provide any speedup, in some cases the performance was even worse.

Even on P4?

> I heard the P4 heavily relies on optimal SSE2 optimizations in order to
> deliver maximum performance and it seems that both gcc and icc do not
> work optimally in this regard.

SSE, not SSE2 IIRC. SSE2 is still only 128 bits wide, and it uses 64-bit
floats, so it can only go two-way (versus four-way for 32-bit floats
with SSE).

- Steve