From: Ulrich K. <ul...@ch...> - 2012-03-16 01:48:57
From our Spectacular Speedups series ...

The main idea is that the remix effect does a lot of floating-point arithmetic (convert to FP, multiply by a scale factor, possibly add several scaled values, clip, convert back to integer), but it has one important use case where none of that is needed: reordering channels or selecting a subset of channels, without any actual mixing or scaling.

Of the three remix_* patches attached, apply remix_base plus either one or neither of remix_global and remix_local.

- remix_base only introduces local variables for the number of channels. This shortens some source lines and yields a slight speedup of about 5%.
- remix_global checks in start() whether every output channel is a copy of exactly one input channel. If so, it sets a flag that makes flow() run a much simpler loop without any FP arithmetic.
- remix_local is similar, but checks and records, for each output channel separately, whether it is a copy of one input channel. flow() then decides inside the inner loop between the "traditional" procedure and an integer-only copy-through.

Here are some measurements; swap and gain are included for comparison. t.wav is a very long (about two hours) stereo file at CDDA resolution. I passed it three times, for a total audio length of about six hours, to improve the SNR of the time measurements.
Running times per build. Every command is "sox -D t.wav t.wav t.wav -n" followed by the effect shown; the baseline, gain and swap rows were measured on master only (n/a).

  master  remix_base  global+base  local+base   effect
     7.5         n/a          n/a         n/a   (none)
    15.7         n/a          n/a         n/a   gain -1
    11.4         n/a          n/a         n/a   swap
    25.5        24.3         14.3        16.2   remix 2 1
    38.8        36.3         19.5        21.3   remix 1 2 2 1
    53.0        51.0         49.8        52.6   remix 1,2i - 2v0.8 1p-2,2v0.01
    52.6        51.0         49.7        45.6   remix 1,2i - 2 1p-2,2v0.01

(global+base = remix_global+remix_base, local+base = remix_local+remix_base)

"remix 2 1" and "remix 1 2 2 1" are the two test cases that only reorder channels. Both remix_global and remix_local reduce the running time significantly here, by roughly 35% to 50% compared with master. remix_local is somewhat slower than remix_global because of the "if" in its inner loop.

"remix 1,2i - 2v0.8 1p-2,2v0.01" is the opposite case: every output channel needs scaling and mixing. I can't explain why remix_global is still a bit faster than remix_base alone here; it isn't supposed to be. It is expected, though, that remix_local is a bit slower, again because of the branch in its inner loop.

"remix 1,2i - 2 1p-2,2v0.01" shows the difference between the two approaches: one output channel (the third) can be copied through directly. remix_global doesn't take advantage of this, while remix_local does, and this single channel is enough to compensate for the additional complexity in its inner loop.
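To make the trade-off above concrete, here is a minimal C sketch of the two approaches. It is not the code from the attached patches: it assumes a simplified, hypothetical representation of the parsed out-specs (`out_spec_t`, at most `MAX_IN` inputs per output) and interleaved 32-bit samples standing in for `sox_sample_t`, and it omits clip counting. The remix_global variant decides once, so its loop has no branch; the remix_local variant pays for a branch per output sample but can copy individual channels through even when others need mixing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_IN 8  /* hypothetical limit on inputs per output channel */

/* Hypothetical parsed out-spec: output = sum of input[in_chan[k]] * mult[k]. */
typedef struct {
  size_t num_in;
  size_t in_chan[MAX_IN];   /* 0-based input channel indices */
  double mult[MAX_IN];      /* scale factor per contributing input */
  bool is_copy;             /* per-channel flag used by the local variant */
} out_spec_t;

/* FP path: scale, sum, clip, round half away from zero, back to integer. */
static int32_t mix_one(const int32_t *frame, const out_spec_t *o)
{
  double sum = 0;
  for (size_t k = 0; k < o->num_in; ++k)
    sum += frame[o->in_chan[k]] * o->mult[k];
  if (sum > INT32_MAX) return INT32_MAX;
  if (sum < INT32_MIN) return INT32_MIN;
  return (int32_t)(sum < 0 ? sum - 0.5 : sum + 0.5);
}

/* start() for both variants: flag outputs that are unscaled copies of one
 * input. The return value is the single remix_global-style flag. */
static bool mark_copies(out_spec_t *out, size_t ochannels)
{
  bool all = true;
  for (size_t j = 0; j < ochannels; ++j) {
    assert(out[j].num_in <= MAX_IN);
    out[j].is_copy = out[j].num_in == 1 && out[j].mult[0] == 1.0;
    all = all && out[j].is_copy;
  }
  return all;
}

/* remix_global fast path, valid only if mark_copies() returned true.
 * len counts frames (one sample per channel). */
static void flow_global_copy(const int32_t *in, int32_t *out, size_t len,
                             const out_spec_t *spec, size_t ich, size_t och)
{
  for (size_t i = 0; i < len; ++i, in += ich)
    for (size_t j = 0; j < och; ++j)
      *out++ = in[spec[j].in_chan[0]];   /* integer copy, no FP at all */
}

/* remix_local: one branch per output sample picks copy-through or FP mix. */
static void flow_local(const int32_t *in, int32_t *out, size_t len,
                       const out_spec_t *spec, size_t ich, size_t och)
{
  for (size_t i = 0; i < len; ++i, in += ich)
    for (size_t j = 0; j < och; ++j)
      *out++ = spec[j].is_copy ? in[spec[j].in_chan[0]] : mix_one(in, &spec[j]);
}
```

For "remix 2 1" both outputs are flagged as copies, so a remix_global build would call flow_global_copy(); with one copied and one mixed output only flow_local() can skip the FP work for the copied channel.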
So the question is how often the intermediate case (some channels copied through, others actually remixed) occurs in practice. If it is rare, remix_global is the better choice; if it is common, remix_local is better. There is also a way to get the best of both worlds, at the price of increased code complexity: provide three different flow() functions, one for each possible case, and let start() choose between them.

Ulrich

PS: If you're doing tests on a 32-bit system anyway, I'd be interested in hearing about any performance effect of clips.patch. I expect it might reduce the running time for the scaling/mixing case, since the 64-bit effp->clips doesn't have to be updated as often.
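The "best of both worlds" variant described before the PS could look roughly like the sketch below: start() classifies the out-specs once and stores a pointer to one of three flow() loops, so only the genuinely mixed case pays for a per-sample branch. All names (`priv_t`, `out_spec_t`, `start_sketch()`, the `flow_*` functions, `MAX_IN`, `MAX_OUT`) are hypothetical, not SoX's API. Relating to the PS, the FP loops count clips in a local 32-bit counter and add it to a 64-bit total once per call; this is only an assumption about the kind of change clips.patch makes, not its actual contents.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_IN 8    /* hypothetical limit on inputs per output channel */
#define MAX_OUT 8   /* hypothetical limit on output channels */

typedef struct {
  size_t num_in, in_chan[MAX_IN];
  double mult[MAX_IN];
  bool is_copy;
} out_spec_t;

typedef struct priv priv_t;
typedef void (*flow_fn)(priv_t *, const int32_t *, int32_t *, size_t);

struct priv {                 /* hypothetical effect state */
  size_t ichannels, ochannels;
  out_spec_t out[MAX_OUT];
  uint64_t clips;             /* stands in for the 64-bit effp->clips */
  flow_fn flow;               /* chosen once by start_sketch() */
};

/* FP path for one output sample; clips go to a cheap local counter
 * (assumes fewer than 2^32 clips per flow() call). */
static int32_t mix_one(const int32_t *frame, const out_spec_t *o, uint32_t *clips)
{
  double sum = 0;
  for (size_t k = 0; k < o->num_in; ++k)
    sum += frame[o->in_chan[k]] * o->mult[k];
  if (sum > INT32_MAX) { ++*clips; return INT32_MAX; }
  if (sum < INT32_MIN) { ++*clips; return INT32_MIN; }
  return (int32_t)(sum < 0 ? sum - 0.5 : sum + 0.5);
}

/* Case 1: every output is a plain copy, integer only. */
static void flow_all_copy(priv_t *p, const int32_t *in, int32_t *out, size_t len)
{
  for (size_t i = 0; i < len; ++i, in += p->ichannels)
    for (size_t j = 0; j < p->ochannels; ++j)
      *out++ = in[p->out[j].in_chan[0]];
}

/* Case 2: no output is a copy, FP mix without a per-channel branch. */
static void flow_all_mix(priv_t *p, const int32_t *in, int32_t *out, size_t len)
{
  uint32_t clips = 0;
  for (size_t i = 0; i < len; ++i, in += p->ichannels)
    for (size_t j = 0; j < p->ochannels; ++j)
      *out++ = mix_one(in, &p->out[j], &clips);
  p->clips += clips;          /* one 64-bit update per call */
}

/* Case 3: some of each, branch per output sample (remix_local style). */
static void flow_mixed(priv_t *p, const int32_t *in, int32_t *out, size_t len)
{
  uint32_t clips = 0;
  for (size_t i = 0; i < len; ++i, in += p->ichannels)
    for (size_t j = 0; j < p->ochannels; ++j)
      *out++ = p->out[j].is_copy ? in[p->out[j].in_chan[0]]
                                 : mix_one(in, &p->out[j], &clips);
  p->clips += clips;
}

/* start(): classify the out-specs once and pick the matching loop. */
static void start_sketch(priv_t *p)
{
  size_t copies = 0;
  assert(p->ochannels <= MAX_OUT);
  for (size_t j = 0; j < p->ochannels; ++j) {
    out_spec_t *o = &p->out[j];
    assert(o->num_in <= MAX_IN);
    o->is_copy = o->num_in == 1 && o->mult[0] == 1.0;
    copies += o->is_copy;
  }
  p->flow = copies == p->ochannels ? flow_all_copy
          : copies == 0            ? flow_all_mix
                                   : flow_mixed;
}
```

The cost of this design is three loops to maintain instead of one; the benefit is that the per-sample branch only appears in the case where it actually saves work.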