You can subscribe to this list here.
2001 
_{Jan}

_{Feb}

_{Mar}

_{Apr}

_{May}

_{Jun}

_{Jul}

_{Aug}

_{Sep}

_{Oct}
(50) 
_{Nov}
(77) 
_{Dec}
(169) 

2002 
_{Jan}
(139) 
_{Feb}
(147) 
_{Mar}
(111) 
_{Apr}
(348) 
_{May}
(262) 
_{Jun}
(294) 
_{Jul}
(315) 
_{Aug}
(186) 
_{Sep}
(132) 
_{Oct}
(135) 
_{Nov}
(358) 
_{Dec}
(241) 
2003 
_{Jan}
(557) 
_{Feb}
(489) 
_{Mar}
(361) 
_{Apr}
(378) 
_{May}
(493) 
_{Jun}
(348) 
_{Jul}
(289) 
_{Aug}
(259) 
_{Sep}
(322) 
_{Oct}
(463) 
_{Nov}
(305) 
_{Dec}
(201) 
2004 
_{Jan}
(198) 
_{Feb}
(186) 
_{Mar}
(192) 
_{Apr}
(216) 
_{May}
(175) 
_{Jun}
(200) 
_{Jul}
(277) 
_{Aug}
(127) 
_{Sep}
(64) 
_{Oct}
(208) 
_{Nov}
(170) 
_{Dec}
(154) 
2005 
_{Jan}
(239) 
_{Feb}
(171) 
_{Mar}
(123) 
_{Apr}
(55) 
_{May}
(74) 
_{Jun}
(100) 
_{Jul}
(129) 
_{Aug}
(221) 
_{Sep}
(209) 
_{Oct}
(270) 
_{Nov}
(590) 
_{Dec}
(313) 
2006 
_{Jan}
(377) 
_{Feb}
(189) 
_{Mar}
(234) 
_{Apr}
(180) 
_{May}
(230) 
_{Jun}
(404) 
_{Jul}
(574) 
_{Aug}
(300) 
_{Sep}
(424) 
_{Oct}
(444) 
_{Nov}
(363) 
_{Dec}
(153) 
2007 
_{Jan}
(223) 
_{Feb}
(106) 
_{Mar}
(311) 
_{Apr}
(233) 
_{May}
(336) 
_{Jun}
(278) 
_{Jul}
(467) 
_{Aug}
(416) 
_{Sep}
(550) 
_{Oct}
(503) 
_{Nov}
(483) 
_{Dec}
(271) 
2008 
_{Jan}
(344) 
_{Feb}
(127) 
_{Mar}
(416) 
_{Apr}
(381) 
_{May}
(679) 
_{Jun}
(749) 
_{Jul}
(549) 
_{Aug}
(281) 
_{Sep}
(137) 
_{Oct}
(324) 
_{Nov}
(200) 
_{Dec}
(330) 
2009 
_{Jan}
(634) 
_{Feb}
(438) 
_{Mar}
(560) 
_{Apr}
(387) 
_{May}
(313) 
_{Jun}
(443) 
_{Jul}
(947) 
_{Aug}
(505) 
_{Sep}
(477) 
_{Oct}
(679) 
_{Nov}
(714) 
_{Dec}
(407) 
2010 
_{Jan}
(348) 
_{Feb}
(283) 
_{Mar}
(232) 
_{Apr}
(173) 
_{May}
(79) 
_{Jun}
(109) 
_{Jul}
(128) 
_{Aug}
(62) 
_{Sep}
(118) 
_{Oct}
(153) 
_{Nov}
(57) 
_{Dec}
(76) 
2011 
_{Jan}
(105) 
_{Feb}
(150) 
_{Mar}
(314) 
_{Apr}
(266) 
_{May}
(55) 
_{Jun}
(47) 
_{Jul}
(113) 
_{Aug}
(70) 
_{Sep}
(77) 
_{Oct}
(93) 
_{Nov}
(106) 
_{Dec}
(190) 
2012 
_{Jan}
(68) 
_{Feb}
(188) 
_{Mar}
(313) 
_{Apr}
(80) 
_{May}
(122) 
_{Jun}
(222) 
_{Jul}
(94) 
_{Aug}
(239) 
_{Sep}
(64) 
_{Oct}
(164) 
_{Nov}
(168) 
_{Dec}
(277) 
2013 
_{Jan}
(336) 
_{Feb}
(156) 
_{Mar}
(80) 
_{Apr}
(135) 
_{May}
(150) 
_{Jun}
(139) 
_{Jul}
(160) 
_{Aug}
(266) 
_{Sep}
(386) 
_{Oct}
(465) 
_{Nov}
(366) 
_{Dec}
(156) 
2014 
_{Jan}
(190) 
_{Feb}
(88) 
_{Mar}
(60) 
_{Apr}
(38) 
_{May}
(146) 
_{Jun}
(104) 
_{Jul}
(189) 
_{Aug}
(424) 
_{Sep}
(235) 
_{Oct}
(990) 
_{Nov}
(598) 
_{Dec}
(393) 
2015 
_{Jan}
(256) 
_{Feb}
(40) 
_{Mar}
(195) 
_{Apr}
(497) 
_{May}
(227) 
_{Jun}
(138) 
_{Jul}
(257) 
_{Aug}
(351) 
_{Sep}
(151) 
_{Oct}
(119) 
_{Nov}
(78) 
_{Dec}
(16) 
2016 
_{Jan}
(225) 
_{Feb}
(289) 
_{Mar}
(267) 
_{Apr}
(318) 
_{May}
(198) 
_{Jun}
(177) 
_{Jul}
(155) 
_{Aug}
(268) 
_{Sep}
(160) 
_{Oct}

_{Nov}

_{Dec}

S  M  T  W  T  F  S 






1
(7) 
2
(8) 
3
(10) 
4
(17) 
5
(10) 
6
(5) 
7
(5) 
8
(5) 
9
(7) 
10
(10) 
11
(18) 
12
(9) 
13
(6) 
14
(9) 
15
(4) 
16
(2) 
17
(5) 
18
(11) 
19
(8) 
20
(13) 
21
(12) 
22
(9) 
23
(6) 
24
(6) 
25
(7) 
26
(11) 
27
(16) 
28
(13) 
29
(23) 
30
(10) 
31
(31) 






From: Martyn Shaw <martynshaw99@go...>  20090511 23:51:05

OK, this looks neat (although I haven't checked the detail). How much speedup do we get in drawing with this? If it's significant, can you supply a patch to HEAD please? Thanks Martyn Sami Liedes wrote: > On Mon, May 11, 2009 at 04:50:22AM +0300, Sami Liedes wrote: >> #3 Move two exp()s out of DrawClipSpectrum() inner loop by math >> trickery. > >  > commit 5ad6daabf17b54ba2874a281e8b5f9fd59bf76e7 > Author: Sami Liedes <sliedes@...> > Date: Mon May 11 01:40:30 2009 +0300 > > Move two exp()s out of DrawClipSpectrum() inner loop. > > diff git a/src/TrackArtist.cpp b/src/TrackArtist.cpp > index 2257402..8dd07d0 100644 >  a/src/TrackArtist.cpp > +++ b/src/TrackArtist.cpp > @@ 1988,17 +1988,21 @@ void TrackArtist::DrawClipSpectrum(WaveTrack* track, WaveClip *clip, > bool inMaximum = false; > #endif //EXPERIMENTAL_FIND_NOTES > > + double yy2_base=exp(lmin)/f; > + float yy2 = yy2_base; > + double exp_scale_per_height = exp(scale/mid.height); > for (int yy = 0; yy < mid.height; yy++) { > if(!usePxCache) { > float h=float(yy)/mid.height; >  float yy2=expf(h*scale+lmin)/f; > + //float yy2=expf(h*scale+lmin)/f; > if (int(yy2)>=half) > yy2=half1; > if (yy2<0) > yy2=0; > float bin0 = float(yy2); > float h1=float(yy+1)/mid.height; >  float yy3=expf(h1*scale+lmin)/f; > + yy2_base *= exp_scale_per_height; > + float yy3=yy2_base; > if (int(yy3)>=half) > yy3=half1; > if (yy3<0) > @@ 2044,6 +2048,7 @@ void TrackArtist::DrawClipSpectrum(WaveTrack* track, WaveClip *clip, > if (value < 0.0) > value = float(0.0); > clip>mSpecPxCache>values[x * mid.height + yy] = value; > + yy2 = yy2_base; > } > else > value = clip>mSpecPxCache>values[x * mid.height + yy]; > >  > > >  > >  > The NEW KODAK i700 Series Scanners deliver under ANY circumstances! Your > production scanning environment may not be a perfect world  but thanks to > Kodak, there's a perfect scanner to get the job done! With the NEW KODAK i700 > Series Scanner you'll get full speed at 300 dpi even with all image > processing features enabled. http://p.sf.net/sfu/kodakcom > > >  > > _______________________________________________ > audacitydevel mailing list > audacitydevel@... > https://lists.sourceforge.net/lists/listinfo/audacitydevel 
From: Martyn Shaw <martynshaw99@go...>  20090511 23:40:07

So what is the speedup with these changes? I don't see that they can be very significant. Martyn Sami Liedes wrote: > On Mon, May 11, 2009 at 04:50:22AM +0300, Sami Liedes wrote: >> #2 Use float versions of various math functions where possible. >> Also use pow10f() instead of pow(10.0, ...) if available. > > Here's the patch. Note that pow10f() is a GNU extension, I don't know > if it's available on Windows, so I made autoconf check for it. Hope I > understood the autoconf stuff correctly, I haven't much experience > with it. > > Sami > > >  > commit 60f50d38e2d605103d281569df6a039904e70ae5 > Author: Sami Liedes <sliedes@...> > Date: Mon May 11 01:31:52 2009 +0300 > > Use float versions of math routines where available. > > diff git a/configure.in b/configure.in > index eb1b19b..793b081 100644 >  a/configure.in > +++ b/configure.in > @@ 667,6 +667,7 @@ dnl Check for lrint/lrintf > AC_C99_FUNC_LRINT > AC_C99_FUNC_LRINTF > > +AC_CHECK_LIB(m, pow10f, AC_DEFINE(HAVE_POW10F)) > > if [[ "$use_ladspa" = "yes" ]] ; then > OPTOBJS="$OPTOBJS effects/ladspa/LoadLadspa.o" > diff git a/src/Spectrum.cpp b/src/Spectrum.cpp > index ea7677e..1e437e1 100644 >  a/src/Spectrum.cpp > +++ b/src/Spectrum.cpp > @@ 27,7 +27,7 @@ static inline float todB_a(const float *x){ > #else > > static inline float todB_a(const float *x){ >  return (*(x)==0?400.f:log(*(x)**(x))*4.34294480f); > + return (*(x)==0?400.f:logf(*(x)**(x))*4.34294480f); > } > > #endif > @@ 75,7 +75,7 @@ bool ComputeSpectrum(float * data, int width, > // of the power, instead of the square root > > for (i = 0; i < windowSize; i++) >  in[i] = pow(in[i], 1.0f / 3.0f); > + in[i] = powf(in[i], 1.0f / 3.0f); > > // Take FFT > FFT(windowSize, false, in, NULL, out, out2); > @@ 128,7 +128,7 @@ bool ComputeSpectrum(float * data, int width, > for (i = 0; i < half; i++){ > float temp=(processed[i] / windowSize / windows); > if (temp > 0.0) >  processed[i] = 10*log10(temp); > + processed[i] = 10*log10f(temp); > else > processed[i] = 0; > } > diff git a/src/TrackArtist.cpp b/src/TrackArtist.cpp > index edd319c..2257402 100644 >  a/src/TrackArtist.cpp > +++ b/src/TrackArtist.cpp > @@ 1812,14 +1812,14 @@ void TrackArtist::DrawClipSpectrum(WaveTrack* track, WaveClip *clip, > sampleCount w1 = (sampleCount) ((t0*rate + x *rate *tstep) + .5); > > const float > // e=exp(1.0f), >  log2=log(2.0f), > +// e=expf(1.0f), > + log2=logf(2.0f), > f=rate/2.0f/half, >  lmin=log(float(minFreq)), >  lmax=log(float(maxFreq)), > + lmin=logf(float(minFreq)), > + lmax=logf(float(maxFreq)), > #ifdef EXPERIMENTAL_FFT_SKIP_POINTS >  lmins=log(float(minFreq)/(fftSkipPoints+1)), >  lmaxs=log(float(maxFreq)/(fftSkipPoints+1)), > + lmins=logf(float(minFreq)/(fftSkipPoints+1)), > + lmaxs=logf(float(maxFreq)/(fftSkipPoints+1)), > #else //!EXPERIMENTAL_FFT_SKIP_POINTS > lmins=lmin, > lmaxs=lmax, > @@ 1835,10 +1835,10 @@ void TrackArtist::DrawClipSpectrum(WaveTrack* track, WaveClip *clip, > for (int y = 0; y < mid.height; y++) { > float n =(float(y )/mid.height*scale2lmin2)*12; > float n2=(float(y+1)/mid.height*scale2lmin2)*12; >  float f =float(minFreq)/(fftSkipPoints+1)*pow(2.0f, n /12.0f+lmin2); >  float f2=float(minFreq)/(fftSkipPoints+1)*pow(2.0f, n2/12.0f+lmin2); >  n =log(f /440)/log2*12; >  n2=log(f2/440)/log2*12; > + float f =float(minFreq)/(fftSkipPoints+1)*powf(2.0f, n /12.0f+lmin2); > + float f2=float(minFreq)/(fftSkipPoints+1)*powf(2.0f, n2/12.0f+lmin2); > + n =logf(f /440)/log2*12; > + n2=logf(f2/440)/log2*12; > if (floor(n) < floor(n2)) > yGrid[y]=true; > else > @@ 1856,9 +1856,9 @@ void TrackArtist::DrawClipSpectrum(WaveTrack* track, WaveClip *clip, > f2bin = half/(rate/2.0f), > #endif //EXPERIMENTAL_FFT_SKIP_POINTS > bin2f = 1.0f/f2bin, >  minDistance = pow(2.0f, 2.0f/12.0f), >  i0=exp(lmin)/f, >  i1=exp(scale+lmin)/f, > + minDistance = powf(2.0f, 2.0f/12.0f), > + i0=expf(lmin)/f, > + i1=expf(scale+lmin)/f, > minColor=0.0f; > const int maxTableSize=1024; > int *indexes=new int[maxTableSize]; > @@ 1967,19 +1967,19 @@ void TrackArtist::DrawClipSpectrum(WaveTrack* track, WaveClip *clip, > } > > // The f2pix helper macro converts a frequency into a pixel coordinate. > #define f2pix(f) (log(f)lmins)/(lmaxslmins)*mid.height > +#define f2pix(f) (logf(f)lmins)/(lmaxslmins)*mid.height > > // Possibly quantize the maxima frequencies and create the pixel block limits. > for (int i=0; i < maximas; i++) { > int index=maxima[i]; > float f = float(index)*bin2f; > if (findNotesQuantize) >  { f = exp(int(log(f/440)/log2*120.5)/12.0f*log2)*440; > + { f = expf(int(logf(f/440)/log2*120.5)/12.0f*log2)*440; > maxima[i] = f*f2bin; > } >  float f0 = exp((log(f/440)/log2*241)/24.0f*log2)*440; > + float f0 = expf((logf(f/440)/log2*241)/24.0f*log2)*440; > maxima0[i] = f2pix(f0); >  float f1 = exp((log(f/440)/log2*24+1)/24.0f*log2)*440; > + float f1 = expf((logf(f/440)/log2*24+1)/24.0f*log2)*440; > maxima1[i] = f2pix(f1); > } > } > @@ 1991,14 +1991,14 @@ void TrackArtist::DrawClipSpectrum(WaveTrack* track, WaveClip *clip, > for (int yy = 0; yy < mid.height; yy++) { > if(!usePxCache) { > float h=float(yy)/mid.height; >  float yy2=exp(h*scale+lmin)/f; > + float yy2=expf(h*scale+lmin)/f; > if (int(yy2)>=half) > yy2=half1; > if (yy2<0) > yy2=0; > float bin0 = float(yy2); > float h1=float(yy+1)/mid.height; >  float yy3=exp(h1*scale+lmin)/f; > + float yy3=expf(h1*scale+lmin)/f; > if (int(yy3)>=half) > yy3=half1; > if (yy3<0) > diff git a/src/configtemplate.h b/src/configtemplate.h > index 1c439f2..12ab55a 100644 >  a/src/configtemplate.h > +++ b/src/configtemplate.h > @@ 22,6 +22,9 @@ > /* Define if you have C99's lrintf function. */ > #undef HAVE_LRINTF > > +/* Define if you have pow10f function. */ > +#undef HAVE_POW10F > + > /* Define to 1 if you have the <memory.h> header file. */ > #undef HAVE_MEMORY_H > > diff git a/src/effects/NoiseRemoval.cpp b/src/effects/NoiseRemoval.cpp > index 788ee9b..87c9921 100644 >  a/src/effects/NoiseRemoval.cpp > +++ b/src/effects/NoiseRemoval.cpp > @@ 469,7 +469,7 @@ void EffectNoiseRemoval::FillFirstHistoryWindow() > double power = > mRealFFTs[0][i] * mRealFFTs[0][i] + > mImagFFTs[0][i] * mImagFFTs[0][i]; >  mSpectrums[0][i] = (float)(10.0 * log10(power)); > + mSpectrums[0][i] = (float)(10.0 * log10f(power)); > mGains[0][i] = mNoiseGain; > } > } > @@ 584,7 +584,11 @@ void EffectNoiseRemoval::RemoveNoise() > > // Apply gain to FFT > for (j = 0; j < mSpectrumSize; j++) { >  float mult = pow(10.0, mGains[out][j] / 20.0); > +#ifdef HAVE_POW10F > + float mult = pow10f(mGains[out][j] / 20.0); > +#else > + float mult = powf(10.0, mGains[out][j] / 20.0); > +#endif > > mRealFFTs[out][j] *= mult; > mImagFFTs[out][j] *= mult; > >  > > >  > >  > The NEW KODAK i700 Series Scanners deliver under ANY circumstances! Your > production scanning environment may not be a perfect world  but thanks to > Kodak, there's a perfect scanner to get the job done! With the NEW KODAK i700 > Series Scanner you'll get full speed at 300 dpi even with all image > processing features enabled. http://p.sf.net/sfu/kodakcom > > >  > > _______________________________________________ > audacitydevel mailing list > audacitydevel@... > https://lists.sourceforge.net/lists/listinfo/audacitydevel 
From: Jan Kolar <kolar@ma...>  20090511 23:01:22

Dan Horgan napsal(a): > On Mon, May 11, 2009 at 10:42:36AM +0200, Jan Kolar wrote: > >> Sami Liedes napsal(a): >> >>> On Mon, May 11, 2009 at 04:50:22AM +0300, Sami Liedes wrote: >>> >>> >>>> #1 Replace two calls to rand() in Dither.cpp with one. Profiling >>>> indicated that the rand() calls took some 30% of CPU when removing >>>> noise when the sample format was not float. (I only realized that >>>> I could speed up noise removal a lot by using float format after >>>> profiling.) >>>> >>>> >>> This is a rather simple change. DITHER_NOISE is defined as follows: >>> >>> #define DITHER_NOISE (rand() / (float)RAND_MAX  0.5f) >>> >>> and in Dither::ShapedDither, there was >>> >>> float r = DITHER_NOISE + DITHER_NOISE; >>> >>> As far as I understand correctly, this can be replaced by >>> >>> float r = DITHER_NOISE * 2; >>> >>> without the maths becoming incorrect. This saves a rand() call in an >>> inner loop. >>> >>> >>> >> Sami, >> this two are not equivalent. >> The main reason is the difference in probability distribution: >> rand()+rand() gives much more often medium values then 2*rand(), which >> has more heavy 'tails'. >> This is similar to the following example: >> When casting two cubes (with 1,2,3,4,5,6 dots), you can get 2 by only >> one way (1+1), >> you can get 3 in two ways (1+2 and 2+1) and you can get 7 in six ways. >> (This is much different from casting eleven sided cube with 2,...,12 dots). >> >> I do not know if either rand()+rand() or 2*rand() is wrong from the >> presented fragment of code. >> > There's a comment, "Generate triangular dither, +1 LSB, flat psd" which > suggests that rand()+rand() is what is intended (since summing the two > independent uniformly distributed random variables does give a > 'triangular' distribution, similar to your example) > I am sorry that I overlooked the obvious comment. (I do not understand "flat psd", how much is that important?) Hopefully I will read the full source today (and perhaps give some comments ?) >> Even if rand()+rand() is the correct one, it *can be implemented by a >> single call* to rand() and some arithmetic. >> I can eventually do the underlying math. >> > > I tried working this out and got the following transformation: > r = g(x) = if (x < 0.5) then sqrt(2*x)1 else 1sqrt(2*(1x)) > where x = rand()/RAND_MAX > > which I think does the job, but has unpleasant branching & sqrts. I agree with the formula. > As far > as I can tell there are a couple of options for getting rid of these: > 1) A lookup table, where accuracy depends on size of the table > 2) Polynomial interpolation, where accuracy depends on degree of > polynomial > > Both will be unhappy with infinite derivative on the endpoints. But there is a different way of avoiding sqrt, though *I do not know* if it is worth of pursuing. The comment you recalled says '+1 LSB' I would wish (just for a moment) that means we only change whether the rounding is done 'up' or 'down'. (Really it can also mean sometimes to add extra +1 (if rounding is up) or extra 1 (if rounding is down)  4 cases together), It means *we can avoid sqrt*, and use the inverse function (the one containing nothing worse then r*r and branching) and float comparison to do exactly the same thing. <<<I'd rather we looked for a faster rand if we want to speed this up. (James)>>> Yes, this can perhaps make it faster by a bigger factor then 2. Perhaps? I don't know how much serious is the problem with branching (and cache etc.), especially when floats are used (and perhaps libm might contain some branching, too? [depending on which error/overflow handling model is used]) (There are certainly *tricks to avoid branching*, e.g. using the function that copies sign of one float to another float. For example, the above formula, modulo my mistakes, could read r = g(x) = copysignf( 1  sqrt( 1  copysignf( 2* xh , 1,0 /*infact fabsf()*/ ) ) , xh ) where xh= x 0,5 ) > I can show my working if required, but it might take me a while to > figure out the LaTeX. > > Dan > > >> [The second difference is that rand()+rand() sometimes might be slighter >> better pseudorandom generator, but this is not an issue here I think.] >> >> Anyway, the source deserves to present a reason for having rand()+rand() >> on the place. >> >> >> >> >>> Here's the patch from my private git repo: >>> >>>  >>> commit 21639cfd8920bd1399416b93082ede5a93208d8a >>> Author: Sami Liedes <sliedes@...> >>> Date: Mon May 11 04:30:31 2009 +0300 >>> >>> Replace two calls to rand() with one in Dither.cpp. >>> >>> diff git a/src/Dither.cpp b/src/Dither.cpp >>> index 608f2cc..3837d50 100644 >>>  a/src/Dither.cpp >>> +++ b/src/Dither.cpp >>> @@ 284,7 +284,7 @@ inline float Dither::TriangleDither(float sample) >>> inline float Dither::ShapedDither(float sample) >>> { >>> // Generate triangular dither, +1 LSB, flat psd >>>  float r = DITHER_NOISE + DITHER_NOISE; >>> + float r = DITHER_NOISE * 2; >>> // Run FIR >>> float xe = sample + mBuffer[mPhase] * SHAPED_BS[0] >>>  >>> >>> Sami >>> >>>  >>> >>>  >>> The NEW KODAK i700 Series Scanners deliver under ANY circumstances! Your >>> production scanning environment may not be a perfect world  but thanks to >>> Kodak, there's a perfect scanner to get the job done! With the NEW KODAK i700 >>> Series Scanner you'll get full speed at 300 dpi even with all image >>> processing features enabled. http://p.sf.net/sfu/kodakcom >>>  >>> >>> _______________________________________________ >>> audacitydevel mailing list >>> audacitydevel@... >>> https://lists.sourceforge.net/lists/listinfo/audacitydevel >>> >>> > > >>  >> The NEW KODAK i700 Series Scanners deliver under ANY circumstances! Your >> production scanning environment may not be a perfect world  but thanks to >> Kodak, there's a perfect scanner to get the job done! With the NEW KODAK i700 >> Series Scanner you'll get full speed at 300 dpi even with all image >> processing features enabled. http://p.sf.net/sfu/kodakcom >> _______________________________________________ >> audacitydevel mailing list >> audacitydevel@... >> https://lists.sourceforge.net/lists/listinfo/audacitydevel >> > > >  > The NEW KODAK i700 Series Scanners deliver under ANY circumstances! Your > production scanning environment may not be a perfect world  but thanks to > Kodak, there's a perfect scanner to get the job done! With the NEW KODAK i700 > Series Scanner you'll get full speed at 300 dpi even with all image > processing features enabled. http://p.sf.net/sfu/kodakcom > _______________________________________________ > audacitydevel mailing list > audacitydevel@... > https://lists.sourceforge.net/lists/listinfo/audacitydevel > 
From: Steve <stevethefiddle@gm...>  20090511 22:52:20

I've recently installed Ubuntu 9.04 (Jaunty) onto a new computer, and I'm now having trouble with the audio device selection in Audacity. When I build Audacity from source (either 1.3.7 or cvs) I can only get access to OSS. If Jack is running, then opening the Preferences menu causes Audacity to crash. However, the Ubuntu team seem to have done a big job patching Audacity, and the repo version (Audacity 1.3.7) not only works with OSS and ALSA, but with Jack as well. Pulseaudio is in the default installation of Ubuntu 9.04 and it works gracefully with the Ubuntu version of Audacity. I don't know enough about programming to understand what they have done, but I thought it may be of interest/use for you guys. https://launchpad.net/ubuntu/jaunty/+source/audacity/1.3.72 Is it likely to be possible to apply a similar patch to the cvs code? Steve 
From: Steve <stevethefiddle@gm...>  20090511 21:52:15

test  please ignore 
From: James Crook <crookj@in...>  20090511 19:47:57

Dan Horgan wrote: > > There's a comment, "Generate triangular dither, +1 LSB, flat psd" which > suggests that rand()+rand() is what is intended (since summing the two > independent uniformly distributed random variables does give a > 'triangular' distribution, similar to your example) > >> Even if rand()+rand() is the correct one, it *can be implemented by a >> single call* to rand() and some arithmetic. >> I can eventually do the underlying math. >> > > I tried working this out and got the following transformation: > r = g(x) = if (x < 0.5) then sqrt(2*x)1 else 1sqrt(2*(1x)) > where x = rand()/RAND_MAX > > which I think does the job, but has unpleasant branching & sqrts. As far > as I can tell there are a couple of options for getting rid of these: > 1) A lookup table, where accuracy depends on size of the table > 2) Polynomial interpolation, where accuracy depends on degree of > polynomial Nice thought, but we should be using a rand that is a lot faster than sqrt for this  it's not as if we've got high randomness demands on the rand. Something we should profile? We should be able to just add two rands. A table is going to be pushing other things out of cache and we'll be jumping about in it randomly too, so I'm not keen on a table. I'd rather we looked for a faster rand if we want to speed this up. James. 
From: James Crook <crookj@in...>  20090511 19:21:18

Sami Thank you for these patches. As you say, some look uncontroversial at first sight, but we need some discussion. I think for example that there are license issues around using FFTW in binary versions of Audacity that we distribute, that to use FFTW we have to exclude the possibility of a non GPL plugin being used. It's a matter of interpretation of GPL, but our policy is to go with the author of the code's intent. Please be patient with us. We are all volunteers doing this as a hobby and short of time. Please argue through the changes with us until we reach consensus. My hope / expectation is that we'll reach that fairly quickly and that the core developers will then want to offer you cvs write access so that you can make the agreed changes. That will make it easier for you to make changes in future too. James. Sami Liedes wrote: > Hi, > > I did some profiling of Audacity, as I got bored to the stuff I do > taking so long. After that I worked to improve the code to be faster. > > Since there are a number of patches to separate areas of Audacity with > different optimizations, I will split the submission to several mails. > I also expect some of them to be possibly less controversial (e.g. > using float versions of math routines when the result is assigned to a > float) than others (parallelization). So perhaps it's better to divide > the improvements to separate mails. > > Note that my improvements only affect the areas of audacity that I > encountered in my very limited activities (mostly just normalization, > noise removal and spectrum view). I'm sure there's a lot of work to do > in other areas too (I recommend profiling first). In some cases > significant speedups were achievable with rather trivial changes. > > The specific changes of the patches are as follows: > > #1 Replace two calls to rand() in Dither.cpp with one. Profiling > indicated that the rand() calls took some 30% of CPU when removing > noise when the sample format was not float. (I only realized that > I could speed up noise removal a lot by using float format after > profiling.) > > #2 Use float versions of various math functions where possible. > Also use pow10f() instead of pow(10.0, ...) if available. > > #3 Move two exp()s out of DrawClipSpectrum() inner loop by math > trickery. > > #4 Optimize AColor::GetColorGradient() as a lookup table, as it was > the bottleneck after #2 and #3. > > #5 Use the fftw library for FFT. This change alone speeds up noise > removal of a five hour 44.1 kHz mono recording by a factor of > 1.88. It probably speeds up lots of other stuff too, I assume FFT > is used a lot by effects. > > #6 Parallelize various loops using OpenMP, which AFAIK is supported > both by GCC and VC++ (but I haven't tried it in Windows land so it > may need some help to work there). This results in some > improvement in run time for noise removal on my Core 2 Quad > processor (19% on top of the other patches) and normalization. The > area that most benefits, however, is spectrogram view of the > waveform, which now does FFT in parallel (noise removal does not, > yet  it was more tricky  and that's why the improvement is > measly 19%). Together with the other patches, this causes the > spectrogram display to actually feel quite fast even with the > most narrowband 4096 point FFT and Hanning window maximized on a > 1600x1200 display. > > Together the patches result in a speedup by a factor of 2.09 in noise > removal of 32bit float data (6:21 or 381 seconds instead of 13:17 or > 797 seconds). Some of the other speedups are noticeable (like the > spectrogram view), but I didn't or couldn't measure them precisely. > > Sami > 
From: Dan Horgan <danhgn@go...>  20090511 17:27:26

On Mon, May 11, 2009 at 10:42:36AM +0200, Jan Kolar wrote: > Sami Liedes napsal(a): >> On Mon, May 11, 2009 at 04:50:22AM +0300, Sami Liedes wrote: >> >>> #1 Replace two calls to rand() in Dither.cpp with one. Profiling >>> indicated that the rand() calls took some 30% of CPU when removing >>> noise when the sample format was not float. (I only realized that >>> I could speed up noise removal a lot by using float format after >>> profiling.) >>> >> >> This is a rather simple change. DITHER_NOISE is defined as follows: >> >> #define DITHER_NOISE (rand() / (float)RAND_MAX  0.5f) >> >> and in Dither::ShapedDither, there was >> >> float r = DITHER_NOISE + DITHER_NOISE; >> >> As far as I understand correctly, this can be replaced by >> >> float r = DITHER_NOISE * 2; >> >> without the maths becoming incorrect. This saves a rand() call in an >> inner loop. >> >> > Sami, > this two are not equivalent. > The main reason is the difference in probability distribution: > rand()+rand() gives much more often medium values then 2*rand(), which > has more heavy 'tails'. > This is similar to the following example: > When casting two cubes (with 1,2,3,4,5,6 dots), you can get 2 by only > one way (1+1), > you can get 3 in two ways (1+2 and 2+1) and you can get 7 in six ways. > (This is much different from casting eleven sided cube with 2,...,12 dots). > > I do not know if either rand()+rand() or 2*rand() is wrong from the > presented fragment of code. There's a comment, "Generate triangular dither, +1 LSB, flat psd" which suggests that rand()+rand() is what is intended (since summing the two independent uniformly distributed random variables does give a 'triangular' distribution, similar to your example) > > Even if rand()+rand() is the correct one, it *can be implemented by a > single call* to rand() and some arithmetic. > I can eventually do the underlying math. I tried working this out and got the following transformation: r = g(x) = if (x < 0.5) then sqrt(2*x)1 else 1sqrt(2*(1x)) where x = rand()/RAND_MAX which I think does the job, but has unpleasant branching & sqrts. As far as I can tell there are a couple of options for getting rid of these: 1) A lookup table, where accuracy depends on size of the table 2) Polynomial interpolation, where accuracy depends on degree of polynomial I can show my working if required, but it might take me a while to figure out the LaTeX. Dan > [The second difference is that rand()+rand() sometimes might be slighter > better pseudorandom generator, but this is not an issue here I think.] > > Anyway, the source deserves to present a reason for having rand()+rand() > on the place. > > > >> Here's the patch from my private git repo: >> >>  >> commit 21639cfd8920bd1399416b93082ede5a93208d8a >> Author: Sami Liedes <sliedes@...> >> Date: Mon May 11 04:30:31 2009 +0300 >> >> Replace two calls to rand() with one in Dither.cpp. >> >> diff git a/src/Dither.cpp b/src/Dither.cpp >> index 608f2cc..3837d50 100644 >>  a/src/Dither.cpp >> +++ b/src/Dither.cpp >> @@ 284,7 +284,7 @@ inline float Dither::TriangleDither(float sample) >> inline float Dither::ShapedDither(float sample) >> { >> // Generate triangular dither, +1 LSB, flat psd >>  float r = DITHER_NOISE + DITHER_NOISE; >> + float r = DITHER_NOISE * 2; >> // Run FIR >> float xe = sample + mBuffer[mPhase] * SHAPED_BS[0] >>  >> >> Sami >> >>  >> >>  >> The NEW KODAK i700 Series Scanners deliver under ANY circumstances! Your >> production scanning environment may not be a perfect world  but thanks to >> Kodak, there's a perfect scanner to get the job done! With the NEW KODAK i700 >> Series Scanner you'll get full speed at 300 dpi even with all image >> processing features enabled. http://p.sf.net/sfu/kodakcom >>  >> >> _______________________________________________ >> audacitydevel mailing list >> audacitydevel@... >> https://lists.sourceforge.net/lists/listinfo/audacitydevel >> > >  > The NEW KODAK i700 Series Scanners deliver under ANY circumstances! Your > production scanning environment may not be a perfect world  but thanks to > Kodak, there's a perfect scanner to get the job done! With the NEW KODAK i700 > Series Scanner you'll get full speed at 300 dpi even with all image > processing features enabled. http://p.sf.net/sfu/kodakcom > _______________________________________________ > audacitydevel mailing list > audacitydevel@... > https://lists.sourceforge.net/lists/listinfo/audacitydevel 
From: David Bailes <drbailes@go...>  20090511 13:09:42

Hi, On Sun, May 10, 2009 at 11:23 PM, Jan Kolar <kolar@...> wrote: > > Putting aside the main problem that 1 and 2 move independently on Vista, > I think, it would be *worse*, and not better, if level meter would follow 2 > (together with 1). > Agreed. > > > This is the case when Help>Audio Device Info says "Capture volume is * > emulated*", is not it? > That means, recorded signal is multiplied by number <1 inside Audacity. > > Then the level meter should give* information about CLIPPING* in the sound > device, and in this case it is better that level meters do not depend on 2 > since 2 cannot avoid clipping. > (Therefore the second sentence on wiki Release_Checklist is not a bug to be > fixed, only a symptom of the problem from the first sentence: > P3 (Windows Vista only) System mixer level sliders and > Audacity level sliders act independently. > As a result, achieved recorded level only matches level > indicated on the Recording VU meter if the Audacity input slider is > at 100%. > ) > I think that's the intended meaning. I have zero knowledge about emulated mixers, so have no comments about them. David. 
From: Jan Kolar <kolar@ma...>  20090511 08:57:56

Sami Liedes napsal(a): > On Mon, May 11, 2009 at 04:50:22AM +0300, Sami Liedes wrote: > >> #1 Replace two calls to rand() in Dither.cpp with one. Profiling >> indicated that the rand() calls took some 30% of CPU when removing >> noise when the sample format was not float. (I only realized that >> I could speed up noise removal a lot by using float format after >> profiling.) >> > > This is a rather simple change. DITHER_NOISE is defined as follows: > > #define DITHER_NOISE (rand() / (float)RAND_MAX  0.5f) > > and in Dither::ShapedDither, there was > > float r = DITHER_NOISE + DITHER_NOISE; > > As far as I understand correctly, this can be replaced by > > float r = DITHER_NOISE * 2; > > without the maths becoming incorrect. This saves a rand() call in an > inner loop. > > Sami, this two are not equivalent. The main reason is the difference in probability distribution: rand()+rand() gives much more often medium values then 2*rand(), which has more heavy 'tails'. This is similar to the following example: When casting two cubes (with 1,2,3,4,5,6 dots), you can get 2 by only one way (1+1), you can get 3 in two ways (1+2 and 2+1) and you can get 7 in six ways. (This is much different from casting eleven sided cube with 2,...,12 dots). I do not know if either rand()+rand() or 2*rand() is wrong from the presented fragment of code. Even if rand()+rand() is the correct one, it *can be implemented by a single call* to rand() and some arithmetic. I can eventually do the underlying math. [The second difference is that rand()+rand() sometimes might be slighter better pseudorandom generator, but this is not an issue here I think.] Anyway, the source deserves to present a reason for having rand()+rand() on the place. > Here's the patch from my private git repo: > >  > commit 21639cfd8920bd1399416b93082ede5a93208d8a > Author: Sami Liedes <sliedes@...> > Date: Mon May 11 04:30:31 2009 +0300 > > Replace two calls to rand() with one in Dither.cpp. > > diff git a/src/Dither.cpp b/src/Dither.cpp > index 608f2cc..3837d50 100644 >  a/src/Dither.cpp > +++ b/src/Dither.cpp > @@ 284,7 +284,7 @@ inline float Dither::TriangleDither(float sample) > inline float Dither::ShapedDither(float sample) > { > // Generate triangular dither, +1 LSB, flat psd >  float r = DITHER_NOISE + DITHER_NOISE; > + float r = DITHER_NOISE * 2; > > // Run FIR > float xe = sample + mBuffer[mPhase] * SHAPED_BS[0] >  > > Sami > >  > >  > The NEW KODAK i700 Series Scanners deliver under ANY circumstances! Your > production scanning environment may not be a perfect world  but thanks to > Kodak, there's a perfect scanner to get the job done! With the NEW KODAK i700 > Series Scanner you'll get full speed at 300 dpi even with all image > processing features enabled. http://p.sf.net/sfu/kodakcom >  > > _______________________________________________ > audacitydevel mailing list > audacitydevel@... > https://lists.sourceforge.net/lists/listinfo/audacitydevel > 
From: Sami Liedes <sliedes@cc...>  20090511 02:24:33

On Mon, May 11, 2009 at 04:50:22AM +0300, Sami Liedes wrote: > #3 Move two exp()s out of DrawClipSpectrum() inner loop by math > trickery.  commit 5ad6daabf17b54ba2874a281e8b5f9fd59bf76e7 Author: Sami Liedes <sliedes@...> Date: Mon May 11 01:40:30 2009 +0300 Move two exp()s out of DrawClipSpectrum() inner loop. diff git a/src/TrackArtist.cpp b/src/TrackArtist.cpp index 2257402..8dd07d0 100644  a/src/TrackArtist.cpp +++ b/src/TrackArtist.cpp @@ 1988,17 +1988,21 @@ void TrackArtist::DrawClipSpectrum(WaveTrack* track, WaveClip *clip, bool inMaximum = false; #endif //EXPERIMENTAL_FIND_NOTES + double yy2_base=exp(lmin)/f; + float yy2 = yy2_base; + double exp_scale_per_height = exp(scale/mid.height); for (int yy = 0; yy < mid.height; yy++) { if(!usePxCache) { float h=float(yy)/mid.height;  float yy2=expf(h*scale+lmin)/f; + //float yy2=expf(h*scale+lmin)/f; if (int(yy2)>=half) yy2=half1; if (yy2<0) yy2=0; float bin0 = float(yy2); float h1=float(yy+1)/mid.height;  float yy3=expf(h1*scale+lmin)/f; + yy2_base *= exp_scale_per_height; + float yy3=yy2_base; if (int(yy3)>=half) yy3=half1; if (yy3<0) @@ 2044,6 +2048,7 @@ void TrackArtist::DrawClipSpectrum(WaveTrack* track, WaveClip *clip, if (value < 0.0) value = float(0.0); clip>mSpecPxCache>values[x * mid.height + yy] = value; + yy2 = yy2_base; } else value = clip>mSpecPxCache>values[x * mid.height + yy];  
From: Sami Liedes <sliedes@cc...>  20090511 02:21:07

On Mon, May 11, 2009 at 04:50:22AM +0300, Sami Liedes wrote: > #2 Use float versions of various math functions where possible. > Also use pow10f() instead of pow(10.0, ...) if available. Here's the patch. Note that pow10f() is a GNU extension, I don't know if it's available on Windows, so I made autoconf check for it. Hope I understood the autoconf stuff correctly, I haven't much experience with it. Sami  commit 60f50d38e2d605103d281569df6a039904e70ae5 Author: Sami Liedes <sliedes@...> Date: Mon May 11 01:31:52 2009 +0300 Use float versions of math routines where available. diff git a/configure.in b/configure.in index eb1b19b..793b081 100644  a/configure.in +++ b/configure.in @@ 667,6 +667,7 @@ dnl Check for lrint/lrintf AC_C99_FUNC_LRINT AC_C99_FUNC_LRINTF +AC_CHECK_LIB(m, pow10f, AC_DEFINE(HAVE_POW10F)) if [[ "$use_ladspa" = "yes" ]] ; then OPTOBJS="$OPTOBJS effects/ladspa/LoadLadspa.o" diff git a/src/Spectrum.cpp b/src/Spectrum.cpp index ea7677e..1e437e1 100644  a/src/Spectrum.cpp +++ b/src/Spectrum.cpp @@ 27,7 +27,7 @@ static inline float todB_a(const float *x){ #else static inline float todB_a(const float *x){  return (*(x)==0?400.f:log(*(x)**(x))*4.34294480f); + return (*(x)==0?400.f:logf(*(x)**(x))*4.34294480f); } #endif @@ 75,7 +75,7 @@ bool ComputeSpectrum(float * data, int width, // of the power, instead of the square root for (i = 0; i < windowSize; i++)  in[i] = pow(in[i], 1.0f / 3.0f); + in[i] = powf(in[i], 1.0f / 3.0f); // Take FFT FFT(windowSize, false, in, NULL, out, out2); @@ 128,7 +128,7 @@ bool ComputeSpectrum(float * data, int width, for (i = 0; i < half; i++){ float temp=(processed[i] / windowSize / windows); if (temp > 0.0)  processed[i] = 10*log10(temp); + processed[i] = 10*log10f(temp); else processed[i] = 0; } diff git a/src/TrackArtist.cpp b/src/TrackArtist.cpp index edd319c..2257402 100644  a/src/TrackArtist.cpp +++ b/src/TrackArtist.cpp @@ 1812,14 +1812,14 @@ void TrackArtist::DrawClipSpectrum(WaveTrack* track, WaveClip *clip, sampleCount w1 = (sampleCount) ((t0*rate + x *rate *tstep) + .5); const float // e=exp(1.0f),  log2=log(2.0f), +// e=expf(1.0f), + log2=logf(2.0f), f=rate/2.0f/half,  lmin=log(float(minFreq)),  lmax=log(float(maxFreq)), + lmin=logf(float(minFreq)), + lmax=logf(float(maxFreq)), #ifdef EXPERIMENTAL_FFT_SKIP_POINTS  lmins=log(float(minFreq)/(fftSkipPoints+1)),  lmaxs=log(float(maxFreq)/(fftSkipPoints+1)), + lmins=logf(float(minFreq)/(fftSkipPoints+1)), + lmaxs=logf(float(maxFreq)/(fftSkipPoints+1)), #else //!EXPERIMENTAL_FFT_SKIP_POINTS lmins=lmin, lmaxs=lmax, @@ 1835,10 +1835,10 @@ void TrackArtist::DrawClipSpectrum(WaveTrack* track, WaveClip *clip, for (int y = 0; y < mid.height; y++) { float n =(float(y )/mid.height*scale2lmin2)*12; float n2=(float(y+1)/mid.height*scale2lmin2)*12;  float f =float(minFreq)/(fftSkipPoints+1)*pow(2.0f, n /12.0f+lmin2);  float f2=float(minFreq)/(fftSkipPoints+1)*pow(2.0f, n2/12.0f+lmin2);  n =log(f /440)/log2*12;  n2=log(f2/440)/log2*12; + float f =float(minFreq)/(fftSkipPoints+1)*powf(2.0f, n /12.0f+lmin2); + float f2=float(minFreq)/(fftSkipPoints+1)*powf(2.0f, n2/12.0f+lmin2); + n =logf(f /440)/log2*12; + n2=logf(f2/440)/log2*12; if (floor(n) < floor(n2)) yGrid[y]=true; else @@ 1856,9 +1856,9 @@ void TrackArtist::DrawClipSpectrum(WaveTrack* track, WaveClip *clip, f2bin = half/(rate/2.0f), #endif //EXPERIMENTAL_FFT_SKIP_POINTS bin2f = 1.0f/f2bin,  minDistance = pow(2.0f, 2.0f/12.0f),  i0=exp(lmin)/f,  i1=exp(scale+lmin)/f, + minDistance = powf(2.0f, 2.0f/12.0f), + i0=expf(lmin)/f, + i1=expf(scale+lmin)/f, minColor=0.0f; const int maxTableSize=1024; int *indexes=new int[maxTableSize]; @@ 1967,19 +1967,19 @@ void TrackArtist::DrawClipSpectrum(WaveTrack* track, WaveClip *clip, } // The f2pix helper macro converts a frequency into a pixel coordinate. #define f2pix(f) (log(f)lmins)/(lmaxslmins)*mid.height +#define f2pix(f) (logf(f)lmins)/(lmaxslmins)*mid.height // Possibly quantize the maxima frequencies and create the pixel block limits. for (int i=0; i < maximas; i++) { int index=maxima[i]; float f = float(index)*bin2f; if (findNotesQuantize)  { f = exp(int(log(f/440)/log2*120.5)/12.0f*log2)*440; + { f = expf(int(logf(f/440)/log2*120.5)/12.0f*log2)*440; maxima[i] = f*f2bin; }  float f0 = exp((log(f/440)/log2*241)/24.0f*log2)*440; + float f0 = expf((logf(f/440)/log2*241)/24.0f*log2)*440; maxima0[i] = f2pix(f0);  float f1 = exp((log(f/440)/log2*24+1)/24.0f*log2)*440; + float f1 = expf((logf(f/440)/log2*24+1)/24.0f*log2)*440; maxima1[i] = f2pix(f1); } } @@ 1991,14 +1991,14 @@ void TrackArtist::DrawClipSpectrum(WaveTrack* track, WaveClip *clip, for (int yy = 0; yy < mid.height; yy++) { if(!usePxCache) { float h=float(yy)/mid.height;  float yy2=exp(h*scale+lmin)/f; + float yy2=expf(h*scale+lmin)/f; if (int(yy2)>=half) yy2=half1; if (yy2<0) yy2=0; float bin0 = float(yy2); float h1=float(yy+1)/mid.height;  float yy3=exp(h1*scale+lmin)/f; + float yy3=expf(h1*scale+lmin)/f; if (int(yy3)>=half) yy3=half1; if (yy3<0) diff git a/src/configtemplate.h b/src/configtemplate.h index 1c439f2..12ab55a 100644  a/src/configtemplate.h +++ b/src/configtemplate.h @@ 22,6 +22,9 @@ /* Define if you have C99's lrintf function. */ #undef HAVE_LRINTF +/* Define if you have pow10f function. */ +#undef HAVE_POW10F + /* Define to 1 if you have the <memory.h> header file. */ #undef HAVE_MEMORY_H diff git a/src/effects/NoiseRemoval.cpp b/src/effects/NoiseRemoval.cpp index 788ee9b..87c9921 100644  a/src/effects/NoiseRemoval.cpp +++ b/src/effects/NoiseRemoval.cpp @@ 469,7 +469,7 @@ void EffectNoiseRemoval::FillFirstHistoryWindow() double power = mRealFFTs[0][i] * mRealFFTs[0][i] + mImagFFTs[0][i] * mImagFFTs[0][i];  mSpectrums[0][i] = (float)(10.0 * log10(power)); + mSpectrums[0][i] = (float)(10.0 * log10f(power)); mGains[0][i] = mNoiseGain; } } @@ 584,7 +584,11 @@ void EffectNoiseRemoval::RemoveNoise() // Apply gain to FFT for (j = 0; j < mSpectrumSize; j++) {  float mult = pow(10.0, mGains[out][j] / 20.0); +#ifdef HAVE_POW10F + float mult = pow10f(mGains[out][j] / 20.0); +#else + float mult = powf(10.0, mGains[out][j] / 20.0); +#endif mRealFFTs[out][j] *= mult; mImagFFTs[out][j] *= mult;  
From: Sami Liedes <sliedes@cc...>  20090511 02:19:30

Hi, I did some profiling of Audacity, as I got bored to the stuff I do taking so long. After that I worked to improve the code to be faster. Since there are a number of patches to separate areas of Audacity with different optimizations, I will split the submission to several mails. I also expect some of them to be possibly less controversial (e.g. using float versions of math routines when the result is assigned to a float) than others (parallelization). So perhaps it's better to divide the improvements to separate mails. Note that my improvements only affect the areas of audacity that I encountered in my very limited activities (mostly just normalization, noise removal and spectrum view). I'm sure there's a lot of work to do in other areas too (I recommend profiling first). In some cases significant speedups were achievable with rather trivial changes. The specific changes of the patches are as follows: #1 Replace two calls to rand() in Dither.cpp with one. Profiling indicated that the rand() calls took some 30% of CPU when removing noise when the sample format was not float. (I only realized that I could speed up noise removal a lot by using float format after profiling.) #2 Use float versions of various math functions where possible. Also use pow10f() instead of pow(10.0, ...) if available. #3 Move two exp()s out of DrawClipSpectrum() inner loop by math trickery. #4 Optimize AColor::GetColorGradient() as a lookup table, as it was the bottleneck after #2 and #3. #5 Use the fftw library for FFT. This change alone speeds up noise removal of a five hour 44.1 kHz mono recording by a factor of 1.88. It probably speeds up lots of other stuff too, I assume FFT is used a lot by effects. #6 Parallelize various loops using OpenMP, which AFAIK is supported both by GCC and VC++ (but I haven't tried it in Windows land so it may need some help to work there). This results in some improvement in run time for noise removal on my Core 2 Quad processor (19% on top of the other patches) and normalization. The area that most benefits, however, is spectrogram view of the waveform, which now does FFT in parallel (noise removal does not, yet  it was more tricky  and that's why the improvement is measly 19%). Together with the other patches, this causes the spectrogram display to actually feel quite fast even with the most narrowband 4096 point FFT and Hanning window maximized on a 1600x1200 display. Together the patches result in a speedup by a factor of 2.09 in noise removal of 32bit float data (6:21 or 381 seconds instead of 13:17 or 797 seconds). Some of the other speedups are noticeable (like the spectrogram view), but I didn't or couldn't measure them precisely. Sami 
From: Sami Liedes <sliedes@cc...>  20090511 02:17:00

On Mon, May 11, 2009 at 04:50:22AM +0300, Sami Liedes wrote: > #6 Parallelize various loops using OpenMP, which AFAIK is supported > both by GCC and VC++ (but I haven't tried it in Windows land so it > may need some help to work there). This results in some > improvement in run time for noise removal on my Core 2 Quad > processor (19% on top of the other patches) and normalization. The > area that most benefits, however, is spectrogram view of the > waveform, which now does FFT in parallel (noise removal does not, > yet  it was more tricky  and that's why the improvement is > measly 19%). Together with the other patches, this causes the > spectrogram display to actually feel quite fast even with the > most narrowband 4096 point FFT and Hanning window maximized on a > 1600x1200 display. This patch also adds an autoconf check. I don't know what it does if you don't have OpenMP support, so it may need tweaking. Note that using threads and calling libc functions from several threads has the result that the code has to be compiled with D_REENTRANT (to tell libc that its functions need to be reentrant) and both compiled and linked with fopenmp (at least under GCC). I don't know if my autoconf changes do the thing, I had trouble with getting autoconf to acknowledge my compile options without giving them directly to configure. Also I forgot to mention from the fftw change that you need to link in lfftw3f (that's the float version of fftw 3.x.)... Sami  commit c8d6d220fb5a25e6f9eff1119dab1dbe4b9df7cf Author: Sami Liedes <sliedes@...> Date: Mon May 11 01:48:54 2009 +0300 Parallelize with OpenMP. diff git a/configure.in b/configure.in index 793b081..3575e35 100644  a/configure.in +++ b/configure.in @@ 88,6 +88,9 @@ dnl List of libraries in libsrc that need to be compiled before building dnl audacity. Each name in the list should be a _target_ in libsrc/Makefile AC_SUBST(LIBSRC_BUILD) +AC_OPENMP +AC_DEFINE(_REENTRANT,1) + dnl allow the user to specify options to configure that change the dnl name "audacity" anywhere it appears in a pathname. This allows dnl multiple versions of Audacity to be installed concurrently @@ 178,6 +181,10 @@ if test x"$debug_preference" = "xyes" ; then CXXFLAGS="${CXXFLAGS} g " fi +CFLAGS="${CFLAGS} ${OPENMP_CFLAGS}" +CPPFLAGS="${CPPFLAGS} ${OPENMP_CFLAGS}" +CXXFLAGS="${CXXFLAGS} ${OPENMP_CXXFLAGS}" + dnl Checks for typedefs, structures, and compiler characteristics. AC_C_CONST AC_TYPE_SIZE_T @@ 280,7 +287,8 @@ dnl Include libwidgetextra (via pkgconfig). Doing this gets it configured as dnl well, because it has to be configured in order to be found AUDACITY_CHECKLIB_WIDGETEXTRA dnl pull in flags for the library BUILD_LDFLAGS="$BUILD_LDFLAGS $WIDGETEXTRA_LIBS" +dnl OPENMP_CFLAGS (sic) also need to be added to LDFLAGS +BUILD_LDFLAGS="$BUILD_LDFLAGS $WIDGETEXTRA_LIBS $OPENMP_CFLAGS" CXXFLAGS="$CXXFLAGS $WIDGETEXTRA_CFLAGS" dnl Add the library to the list of libraries that need to be compiled LIBSRC_BUILD="$LIBSRC_BUILD widgetextra" diff git a/src/BlockFile.cpp b/src/BlockFile.cpp index 98de501..87766a8 100644  a/src/BlockFile.cpp +++ b/src/BlockFile.cpp @@ 220,6 +220,9 @@ void *BlockFile::CalcSummary(samplePtr buffer, sampleCount len, // Recalc 256 summaries sumLen = (len + 255) / 256; +#pragma omp parallel for default(none) \ + shared(sumLen,fbuffer,len,summary256) \ + private(min,max,sumsq,jcount,i,j) for (i = 0; i < sumLen; i++) { min = fbuffer[i * 256]; max = fbuffer[i * 256]; diff git a/src/Dither.cpp b/src/Dither.cpp index fe4e66c..608f2cc 100644  a/src/Dither.cpp +++ b/src/Dither.cpp @@ 192,11 +192,13 @@ void Dither::Apply(enum DitherType ditherType, // Only float samples can contain a value that // must be clipped. float* p = (float*)dest;  for (i = 0; i < len; i++, p++)  if (*p > 1.0)  *p = 1.0;  else if (*p < 1.0)  *p = 1.0; +#pragma omp parallel for \ + default(none) shared(p,len) private(i) + for (i = 0; i < len; i++) + if (p[i] > 1.0) + p[i] = 1.0; + else if (p[i] < 1.0) + p[i] = 1.0; } } else if (destFormat == floatSample) diff git a/src/Spectrum.cpp b/src/Spectrum.cpp index 1e437e1..b4b2c9e 100644  a/src/Spectrum.cpp +++ b/src/Spectrum.cpp @@ 49,7 +49,7 @@ bool ComputeSpectrum(float * data, int width, int i; for (i = 0; i < windowSize; i++) processed[i] = float(0.0);  int half = windowSize / 2; + const int half = windowSize / 2; float *in = new float[windowSize]; float *out = new float[windowSize]; @@ 125,6 +125,8 @@ bool ComputeSpectrum(float * data, int width, } else { // Convert to decibels // But do it safely; Inf is nobody's friend +#pragma omp parallel for default(none) \ + private(i) shared(processed, windowSize, windows) for (i = 0; i < half; i++){ float temp=(processed[i] / windowSize / windows); if (temp > 0.0) diff git a/src/WaveClip.cpp b/src/WaveClip.cpp index ce66504..475b095 100644  a/src/WaveClip.cpp +++ b/src/WaveClip.cpp @@ 646,12 +646,12 @@ bool WaveClip::GetSpectrogram(float *freq, sampleCount *where, bool autocorrelation) { int windowType;  int windowSize = gPrefs>Read(wxT("/Spectrum/FFTSize"), 256); + const int windowSize = gPrefs>Read(wxT("/Spectrum/FFTSize"), 256); #ifdef EXPERIMENTAL_FFT_SKIP_POINTS int fftSkipPoints = gPrefs>Read(wxT("/Spectrum/FFTSkipPoints"), 0L); int fftSkipPoints1 = fftSkipPoints+1; #endif //EXPERIMENTAL_FFT_SKIP_POINTS  int half = windowSize/2; + const int half = windowSize/2; gPrefs>Read(wxT("/Spectrum/WindowType"), &windowType, 3); if (mSpecCache && @@ 724,71 +724,75 @@ bool WaveClip::GetSpectrogram(float *freq, sampleCount *where, #ifdef EXPERIMENTAL_FFT_SKIP_POINTS float *buffer = new float[windowSize*fftSkipPoints1]; +#error fixme mSpecCache>fftSkipPointsOld = fftSkipPoints; #else //!EXPERIMENTAL_FFT_SKIP_POINTS  float *buffer = new float[windowSize]; #endif //EXPERIMENTAL_FFT_SKIP_POINTS mSpecCache>windowTypeOld = windowType; mSpecCache>windowSizeOld = windowSize;  for (x = 0; x < mSpecCache>len; x++)  if (recalc[x]) { +#pragma omp parallel default(none) \ + shared(recalc, windowType, autocorrelation) + { + float *buffer = new float[windowSize]; +#pragma omp for + for (x = 0; x < mSpecCache>len; x++) + if (recalc[x]) { sampleCount start = mSpecCache>where[x]; sampleCount len = windowSize; sampleCount i; if (start <= 0  start >= mSequence>GetNumSamples()) {   for (i = 0; i < (sampleCount)half; i++)  mSpecCache>freq[half * x + i] = 0;   }  else  {  float *adj = buffer;  start = windowSize >> 1;   if (start < 0) {  for (i = start; i < 0; i++)  *adj++ = 0;  len += start;  start = 0;  } + for (i = 0; i < (sampleCount)half; i++) + mSpecCache>freq[half * x + i] = 0; + } else { + float *adj = buffer; + start = windowSize >> 1; + + if (start < 0) { + for (i = start; i < 0; i++) + *adj++ = 0; + len += start; + start = 0; + } #ifdef EXPERIMENTAL_FFT_SKIP_POINTS  if (start + len*fftSkipPoints1 > mSequence>GetNumSamples()) {  len = (mSequence>GetNumSamples()  start)/fftSkipPoints1;  for (i = len*fftSkipPoints1; i < (sampleCount)windowSize; i++) + if (start + len*fftSkipPoints1 > mSequence>GetNumSamples()) { + len = (mSequence>GetNumSamples()  start)/fftSkipPoints1; + for (i = len*fftSkipPoints1; i < (sampleCount)windowSize; i++) #else //!EXPERIMENTAL_FFT_SKIP_POINTS  if (start + len > mSequence>GetNumSamples()) {  len = mSequence>GetNumSamples()  start;  for (i = len; i < (sampleCount)windowSize; i++) + if (start + len > mSequence>GetNumSamples()) { + len = mSequence>GetNumSamples()  start; + for (i = len; i < (sampleCount)windowSize; i++) #endif //EXPERIMENTAL_FFT_SKIP_POINTS  adj[i] = 0;  } + adj[i] = 0; + } +#if 0 + } +#endif  if (len > 0) + if (len > 0) #ifdef EXPERIMENTAL_FFT_SKIP_POINTS  mSequence>Get((samplePtr)adj, floatSample, start, len*fftSkipPoints1);  if (fftSkipPoints) {  // TODO: (maybe) alternatively change Get to include skipping of points  int j=0;  for (int i=0; i < len; i++) {  adj[i]=adj[j];  j+=fftSkipPoints1;  }  } + mSequence>Get((samplePtr)adj, floatSample, start, len*fftSkipPoints1); + if (fftSkipPoints) { + // TODO: (maybe) alternatively change Get to include skipping of points + int j=0; + for (int i=0; i < len; i++) { + adj[i]=adj[j]; + j+=fftSkipPoints1; + } + } #else //!EXPERIMENTAL_FFT_SKIP_POINTS  mSequence>Get((samplePtr)adj, floatSample, start, len); + mSequence>Get((samplePtr)adj, floatSample, start, len); #endif //EXPERIMENTAL_FFT_SKIP_POINTS + ComputeSpectrum(buffer, windowSize, windowSize, + mRate, &mSpecCache>freq[half * x], + autocorrelation, windowType); + } + } + delete[]buffer; + }  ComputeSpectrum(buffer, windowSize, windowSize,  mRate, &mSpecCache>freq[half * x],  autocorrelation, windowType);  }  }   delete[]buffer; delete[]recalc; delete oldCache; diff git a/src/blockfile/ODPCMAliasBlockFile.cpp b/src/blockfile/ODPCMAliasBlockFile.cpp index d6c6835..6d6df52 100644  a/src/blockfile/ODPCMAliasBlockFile.cpp +++ b/src/blockfile/ODPCMAliasBlockFile.cpp @@ 512,6 +512,9 @@ void *ODPCMAliasBlockFile::CalcSummary(samplePtr buffer, sampleCount len, sumLen = (len + 255) / 256; +#pragma omp parallel for default(none) \ + shared(sumLen,fbuffer,len,summary256) \ + private(min,max,sumsq,jcount,i,j) for (i = 0; i < sumLen; i++) { min = fbuffer[i * 256]; max = fbuffer[i * 256]; diff git a/src/effects/NoiseRemoval.cpp b/src/effects/NoiseRemoval.cpp index 87c9921..ec2c6d7 100644  a/src/effects/NoiseRemoval.cpp +++ b/src/effects/NoiseRemoval.cpp @@ 326,6 +326,8 @@ void EffectNoiseRemoval::ApplyFreqSmoothing(float *spec) float *tmp = new float[mSpectrumSize]; int i, j, j0, j1; +#pragma omp parallel for default(none) \ + private(i,j,j0,j1) shared(tmp,spec) for(i = 0; i < mSpectrumSize; i++) { j0 = wxMax(0, i  mFreqSmoothingBins); j1 = wxMin(mSpectrumSize1, i + mFreqSmoothingBins); @@ 465,6 +467,8 @@ void EffectNoiseRemoval::FillFirstHistoryWindow() FFT(mWindowSize, false, mInWaveBuffer, NULL, mRealFFTs[0], mImagFFTs[0]); +#pragma omp parallel for default(none) \ + private(i) for(i = 0; i < mSpectrumSize; i++) { double power = mRealFFTs[0][i] * mRealFFTs[0][i] + @@ 524,10 +528,12 @@ void EffectNoiseRemoval::GetProfile() // level achieved at that frequency for a minimum of // mMinSignalBlocks blocks in a row  the max of a min.  int start = mHistoryLen  mMinSignalBlocks;  int finish = mHistoryLen; + const int start = mHistoryLen  mMinSignalBlocks; + const int finish = mHistoryLen; int i, j; +#pragma omp parallel for default(none) \ + private(i,j) for (j = 0; j < mSpectrumSize; j++) { float min = mSpectrums[start][j]; for (i = start+1; i < finish; i++) { @@ 543,12 +549,14 @@ void EffectNoiseRemoval::GetProfile() void EffectNoiseRemoval::RemoveNoise() {  int center = mHistoryLen / 2;  int start = center  mMinSignalBlocks/2;  int finish = start + mMinSignalBlocks; + const int center = mHistoryLen / 2; + const int start = center  mMinSignalBlocks/2; + const int finish = start + mMinSignalBlocks; int i, j; // Raise the gain for elements in the center of the sliding history +#pragma omp parallel for default(none) \ + private(j,i) for (j = 0; j < mSpectrumSize; j++) { float min = mSpectrums[start][j]; for (i = start+1; i < finish; i++) { @@ 562,6 +570,8 @@ void EffectNoiseRemoval::RemoveNoise() // Decay the gain in both directions; // note that mOneBlockAttackDecay is a negative number, like 1.0 // dB of attenuation per block +#pragma omp parallel for default(none) \ + private(j,i) for (j = 0; j < mSpectrumSize; j++) { for (i = center + 1; i < mHistoryLen; i++) { if (mGains[i][j] < mGains[i  1][j] + mOneBlockAttackDecay) @@ 583,6 +593,8 @@ void EffectNoiseRemoval::RemoveNoise() ApplyFreqSmoothing(mGains[out]); // Apply gain to FFT +#pragma omp parallel for default(none) \ + private(j) shared(out) for (j = 0; j < mSpectrumSize; j++) { #ifdef HAVE_POW10F float mult = pow10f(mGains[out][j] / 20.0); diff git a/src/effects/Normalize.cpp b/src/effects/Normalize.cpp index d8434d5..bf3a521 100644  a/src/effects/Normalize.cpp +++ b/src/effects/Normalize.cpp @@ 265,15 +265,31 @@ void EffectNormalize::StartAnalysis() void EffectNormalize::AnalyzeData(float *buffer, sampleCount len) { int i; + float minimum=mMin, maximum=mMax; + double sum=0;  for(i=0; i<len; i++) {  if (buffer[i] < mMin)  mMin = buffer[i];  if (buffer[i] > mMax)  mMax = buffer[i];  mSum += (double)buffer[i]; +#pragma omp parallel default(none) private(i) \ + firstprivate(minimum,maximum) \ + shared(len,buffer) reduction(+: sum) + { +#pragma omp for nowait + for(i=0; i<len; i++) { + if (buffer[i] < minimum) + minimum = buffer[i]; + if (buffer[i] > maximum) + maximum = buffer[i]; + sum += (double)buffer[i]; + } +#pragma omp critical + { + if (minimum < mMin) + mMin = minimum; + if (maximum > mMax) + mMax = maximum; + } } + mSum += sum; mCount += len; } @@ 303,12 +319,17 @@ void EffectNormalize::StartProcessing() void EffectNormalize::ProcessData(float *buffer, sampleCount len) { int i; + double frameSum=0; +#pragma omp parallel for default(none) private(i) \ + shared(len,buffer) reduction(+:frameSum) for(i=0; i<len; i++) { float adjFrame = (buffer[i] + mOffset) * mMult; buffer[i] = adjFrame;  gFrameSum += fabs(adjFrame); //lda: validation. + frameSum += fabs(adjFrame); //lda: validation. } + + gFrameSum += frameSum; } //  Sami 
From: Sami Liedes <sliedes@cc...>  20090511 02:14:43

On Mon, May 11, 2009 at 04:50:22AM +0300, Sami Liedes wrote: > #1 Replace two calls to rand() in Dither.cpp with one. Profiling > indicated that the rand() calls took some 30% of CPU when removing > noise when the sample format was not float. (I only realized that > I could speed up noise removal a lot by using float format after > profiling.) This is a rather simple change. DITHER_NOISE is defined as follows: #define DITHER_NOISE (rand() / (float)RAND_MAX  0.5f) and in Dither::ShapedDither, there was float r = DITHER_NOISE + DITHER_NOISE; As far as I understand correctly, this can be replaced by float r = DITHER_NOISE * 2; without the maths becoming incorrect. This saves a rand() call in an inner loop. Here's the patch from my private git repo:  commit 21639cfd8920bd1399416b93082ede5a93208d8a Author: Sami Liedes <sliedes@...> Date: Mon May 11 04:30:31 2009 +0300 Replace two calls to rand() with one in Dither.cpp. diff git a/src/Dither.cpp b/src/Dither.cpp index 608f2cc..3837d50 100644  a/src/Dither.cpp +++ b/src/Dither.cpp @@ 284,7 +284,7 @@ inline float Dither::TriangleDither(float sample) inline float Dither::ShapedDither(float sample) { // Generate triangular dither, +1 LSB, flat psd  float r = DITHER_NOISE + DITHER_NOISE; + float r = DITHER_NOISE * 2; // Run FIR float xe = sample + mBuffer[mPhase] * SHAPED_BS[0]  Sami 
From: Sami Liedes <sliedes@cc...>  20090511 02:11:35

On Mon, May 11, 2009 at 04:50:22AM +0300, Sami Liedes wrote: > #5 Use the fftw library for FFT. This change alone speeds up noise > removal of a five hour 44.1 kHz mono recording by a factor of > 1.88. It probably speeds up lots of other stuff too, I assume FFT > is used a lot by effects. Note that the max_threads logic in InitFFT() is still not redundant even with the multithreaded fftw init code commented out, since FFT() is still called in a parallel way by the OpenMP patch (if you apply it). What is not used is the fftw support for computing a single FFT using multiple threads (it doesn't give big benefits until the FFT size is "at least a few thousand points"). However in some cases (especially spectrogram view) FFT() is called from a parallel context for separate blocks. The old code is still left there inside #if 0. Remove it if you wish. This patch also parallelizes the window functions... seems I didn't do good job separating the issues anyway :( Anyway one nice thing with OpenMP is that you should in most cases just be able to remove the #pragmas and have code that works perfectly in a single threaded environment, and that is the case here too. Sami  commit 909ad3f0e037e6bd4ca037dd773284d89c2d1679 Author: Sami Liedes <sliedes@...> Date: Mon May 11 01:43:49 2009 +0300 Use fftw for FFT. diff git a/src/FFT.cpp b/src/FFT.cpp index d9444b8..8893ee7 100644  a/src/FFT.cpp +++ b/src/FFT.cpp @@ 43,6 +43,11 @@ #include <stdlib.h> #include <stdio.h> #include <math.h> +#include <fftw3.h> + +#ifdef _OPENMP +#include <omp.h> +#endif #include "FFT.h" @@ 52,6 +57,7 @@ const int MaxFastBits = 16; /* Declare Static functions */ static int IsPowerOfTwo(int x); static int NumberOfBitsNeeded(int PowerOfTwo); + static int ReverseBits(int index, int NumBits); static void InitFFT(); @@ 92,6 +98,19 @@ int ReverseBits(int index, int NumBits) return rev; } +static int max_threads = 0; +static fftwf_plan *forward_plan_arr; +static int *forward_plan_size_arr; +static fftwf_plan *forward_plan_real_arr; +static int *forward_plan_real_size_arr; +static fftwf_plan *backward_plan_arr; +static int *backward_plan_size_arr; + +static int *fftwf_array_size_arr; +static fftwf_complex **fftwf_in_arr=NULL, **fftwf_out_arr=NULL; + +static int init_done = 0; + void InitFFT() { gFFTBitTable = new int *[MaxFastBits]; @@ 106,6 +125,40 @@ void InitFFT() len <<= 1; } + +#ifdef _OPENMP + max_threads = omp_get_num_procs(); +#else + max_threads = 1; +#endif + + forward_plan_arr = new fftwf_plan[max_threads]; + forward_plan_size_arr = new int[max_threads]; + memset(forward_plan_size_arr, 0, sizeof(int)*max_threads); + forward_plan_real_arr = new fftwf_plan[max_threads]; + forward_plan_real_size_arr = new int[max_threads]; + memset(forward_plan_real_size_arr, 0, sizeof(int)*max_threads); + backward_plan_arr = new fftwf_plan[max_threads]; + backward_plan_size_arr = new int[max_threads]; + memset(backward_plan_size_arr, 0, sizeof(int)*max_threads); + + fftwf_in_arr = new fftwf_complex *[max_threads]; + memset(fftwf_in_arr, 0, sizeof(fftwf_complex *)*max_threads); + fftwf_out_arr = new fftwf_complex *[max_threads]; + memset(fftwf_out_arr, 0, sizeof(fftwf_complex *)*max_threads); + + fftwf_array_size_arr = new int[max_threads]; + memset(fftwf_array_size_arr, 0, sizeof(int)*max_threads); + + init_done = 1; + + // Not worth doing this with threads for small FFTs +// fftwf_init_threads(); +// #ifdef _OPENMP +// fftwf_plan_with_nthreads(omp_get_num_procs()); +// #else +// fftwf_plan_with_nthreads(4); // a reasonable default? +// #endif } void DeinitFFT() @@ 116,6 +169,23 @@ void DeinitFFT() } delete[] gFFTBitTable; } + if (forward_plan_arr) { + int i; + for (i=0; i<max_threads; i++) { + if (forward_plan_size_arr[i]) + fftwf_destroy_plan(forward_plan_arr[i]); + if (forward_plan_real_size_arr[i]) + fftwf_destroy_plan(forward_plan_real_arr[i]); + if (backward_plan_size_arr[i]) + fftwf_destroy_plan(backward_plan_arr[i]); + } + delete [] forward_plan_arr; + delete [] forward_plan_size_arr; + delete [] forward_plan_real_arr; + delete [] forward_plan_real_size_arr; + delete [] backward_plan_arr; + delete [] backward_plan_size_arr; + } } inline int FastReverseBits(int i, int NumBits) @@ 126,6 +196,7 @@ inline int FastReverseBits(int i, int NumBits) return ReverseBits(i, NumBits); } + /* * Complex Fast Fourier Transform */ @@ 134,6 +205,7 @@ void FFT(int NumSamples, bool InverseTransform, float *RealIn, float *ImagIn, float *RealOut, float *ImagOut) { +#if 0 int NumBits; /* Number of bits needed to store indices */ int i, j, k, n; int BlockSize, BlockEnd; @@ 223,6 +295,143 @@ void FFT(int NumSamples, ImagOut[i] /= denom; } } + +#else + +#ifdef _OPENMP + const int threadNum = omp_get_thread_num(); + wxASSERT(threadNum < max_threads); +#else + const int threadNum = 1; +#endif + +#pragma omp critical + if (!init_done) + InitFFT(); + + fftwf_plan &forward_plan = forward_plan_arr[threadNum]; + int &forward_plan_size = forward_plan_size_arr[threadNum]; + fftwf_plan &forward_plan_real = forward_plan_real_arr[threadNum]; + int &forward_plan_real_size = forward_plan_real_size_arr[threadNum]; + fftwf_plan &backward_plan = backward_plan_arr[threadNum]; + int &backward_plan_size = backward_plan_size_arr[threadNum]; + int &fftwf_array_size = fftwf_array_size_arr[threadNum]; + fftwf_complex * &fftwf_in = fftwf_in_arr[threadNum]; + fftwf_complex * &fftwf_out = fftwf_out_arr[threadNum]; + + // We could allocate a bit less for real transform, but it would be a small + // saving of memory and no real saving of time since the array is only + // allocated once and its size is only ever increased. + if (fftwf_array_size < NumSamples) { +#pragma omp critical + { + // everything except fftw_execute is nonreentrant + if (fftwf_array_size) { + fftwf_free(fftwf_in); + fftwf_free(fftwf_out); + } + fftwf_array_size = NumSamples; + fftwf_in = (fftwf_complex *)fftwf_malloc(sizeof(fftwf_complex) * NumSamples); + fftwf_out = (fftwf_complex *)fftwf_malloc(sizeof(fftwf_complex) * NumSamples); + } + } + + // We cache plans for one size; this is no big waste since fftw internally + // caches plans and returns quickly if it's seen the same transform before. + if (!InverseTransform) { + if (ImagIn) { + if (forward_plan_size != NumSamples) { +#pragma omp critical + { + if (forward_plan_size != 0) + fftwf_destroy_plan(forward_plan); + forward_plan = + fftwf_plan_dft_1d(NumSamples, fftwf_in, + fftwf_out, + FFTW_FORWARD, FFTW_MEASURE); + } + forward_plan_size = NumSamples; + } + } else + // transforming real data + if (forward_plan_real_size != NumSamples) { +#pragma omp critical + { + if (forward_plan_real_size != 0) + fftwf_destroy_plan(forward_plan_real); + // We don't use RealIn as input since it's not necessarily + // correctly aligned for SIMD. + forward_plan_real = + fftwf_plan_dft_r2c_1d(NumSamples, + (float *)fftwf_in, + fftwf_out, + FFTW_MEASURE); + } + forward_plan_real_size = NumSamples; + } + } else { + if (backward_plan_size != NumSamples) { +#pragma omp critical + { + if (backward_plan_size != 0) + fftwf_destroy_plan(backward_plan); + backward_plan = + fftwf_plan_dft_1d(NumSamples, fftwf_in, + fftwf_out, + FFTW_BACKWARD, FFTW_MEASURE); + } + backward_plan_size = NumSamples; + } + } + + int i; + + // Copy data to fftwf input array. + if (InverseTransform  ImagIn) + for (i=0; i<NumSamples; i++) { + fftwf_in[i][0] = RealIn[i]; + if (ImagIn) + fftwf_in[i][1] = ImagIn[i]; + else + fftwf_in[i][1] = 0; + } + else { + float *in_arr = (float *)fftwf_in; + memcpy(in_arr, RealIn, sizeof(float)*NumSamples); + } + + if (!InverseTransform) { + if (ImagIn) { + fftwf_execute(forward_plan); + for (i=0; i<NumSamples; i++) { + RealOut[i] = fftwf_out[i][0]; + ImagOut[i] = fftwf_out[i][1]; + } + } else { + fftwf_execute(forward_plan_real); + RealOut[0] = fftwf_out[0][0]; + ImagOut[0] = fftwf_out[0][1]; + for (i=1; i<NumSamples/2+1; i++) { + RealOut[i] = fftwf_out[i][0]; + RealOut[NumSamplesi] = fftwf_out[i][0]; + ImagOut[i] = fftwf_out[i][1]; + ImagOut[NumSamplesi] = fftwf_out[i][1]; + } + } + } else { + fftwf_execute(backward_plan); + + /* + ** Need to normalize... + */ + float denom = (float) NumSamples; + + for (i = 0; i < NumSamples; i++) { + RealOut[i] = fftwf_out[i][0] / denom; + ImagOut[i] = fftwf_out[i][1] / denom; + } + } +#endif } /* @@ 415,6 +624,8 @@ void WindowFunc(int whichFunction, int NumSamples, float *in) if (whichFunction == 1) { // Bartlett (triangular) window +#pragma omp parallel for default(none) \ + private(i) shared(in, NumSamples) for (i = 0; i < NumSamples / 2; i++) { in[i] *= (i / (float) (NumSamples / 2)); in[i + (NumSamples / 2)] *= @@ 424,32 +635,42 @@ void WindowFunc(int whichFunction, int NumSamples, float *in) if (whichFunction == 2) { // Hamming +#pragma omp parallel for default(none) \ + private(i) shared(in, NumSamples) for (i = 0; i < NumSamples; i++)  in[i] *= 0.54  0.46 * cos(2 * M_PI * i / (NumSamples  1)); + in[i] *= 0.54  0.46 * cosf(2 * M_PI * i / (NumSamples  1)); } if (whichFunction == 3) { // Hanning +#pragma omp parallel for default(none) \ + private(i) shared(in, NumSamples) for (i = 0; i < NumSamples; i++)  in[i] *= 0.50  0.50 * cos(2 * M_PI * i / (NumSamples  1)); + in[i] *= 0.50  0.50 * cosf(2 * M_PI * i / (NumSamples  1)); } if (whichFunction == 4) { // Blackman +#pragma omp parallel for default(none) \ + private(i) shared(in, NumSamples) for (i = 0; i < NumSamples; i++) {  in[i] *= 0.42  0.5 * cos (2 * M_PI * i / (NumSamples  1)) + 0.08 * cos (4 * M_PI * i / (NumSamples  1)); + in[i] *= 0.42  0.5 * cosf(2 * M_PI * i / (NumSamples  1)) + 0.08 * cosf(4 * M_PI * i / (NumSamples  1)); } } if (whichFunction == 5) { // BlackmanHarris +#pragma omp parallel for default(none) \ + private(i) shared(in, NumSamples) for (i = 0; i < NumSamples; i++) {  in[i] *= 0.35875  0.48829 * cos(2 * M_PI * i /(NumSamples1)) + 0.14128 * cos(4 * M_PI * i/(NumSamples1))  0.01168 * cos(6 * M_PI * i/(NumSamples1)); + in[i] *= 0.35875  0.48829 * cosf(2 * M_PI * i /(NumSamples1)) + 0.14128 * cosf(4 * M_PI * i/(NumSamples1))  0.01168 * cosf(6 * M_PI * i/(NumSamples1)); } } if (whichFunction == 6) { // Welch +#pragma omp parallel for default(none) \ + private(i) shared(in, NumSamples) for (i = 0; i < NumSamples; i++) { in[i] *= 4*i/(float)NumSamples*(1(i/(float)NumSamples)); } @@ 460,11 +681,13 @@ void WindowFunc(int whichFunction, int NumSamples, float *in) // Precalculate some values, and simplify the fmla to try and reduce overhead A=2*2.5*2.5; +#pragma omp parallel for default(none) \ + private(i) shared(in, NumSamples, A) for (i = 0; i < NumSamples; i++) { // full // in[i] *= exp(0.5*(A*((iNumSamples/2)/NumSamples/2))*(A*((iNumSamples/2)/NumSamples/2))); // reduced  in[i] *= exp(A*(0.25 + ((i/(float)NumSamples)*(i/(float)NumSamples))  (i/(float)NumSamples))); + in[i] *= expf(A*(0.25 + ((i/(float)NumSamples)*(i/(float)NumSamples))  (i/(float)NumSamples))); } } @@ 472,9 +695,11 @@ void WindowFunc(int whichFunction, int NumSamples, float *in) // Gaussian (a=3.5) A=2*3.5*3.5; +#pragma omp parallel for default(none) \ + private(i) shared(in, NumSamples, A) for (i = 0; i < NumSamples; i++) { // reduced  in[i] *= exp(A*(0.25 + ((i/(float)NumSamples)*(i/(float)NumSamples))  (i/(float)NumSamples))); + in[i] *= expf(A*(0.25 + ((i/(float)NumSamples)*(i/(float)NumSamples))  (i/(float)NumSamples))); } } @@ 482,9 +707,11 @@ void WindowFunc(int whichFunction, int NumSamples, float *in) // Gaussian (a=4.5) A=2*4.5*4.5; +#pragma omp parallel for default(none) \ + private(i) shared(in, NumSamples, A) for (i = 0; i < NumSamples; i++) { // reduced  in[i] *= exp(A*(0.25 + ((i/(float)NumSamples)*(i/(float)NumSamples))  (i/(float)NumSamples))); + in[i] *= expf(A*(0.25 + ((i/(float)NumSamples)*(i/(float)NumSamples))  (i/(float)NumSamples))); } }  
From: Sami Liedes <sliedes@cc...>  20090511 02:03:15

On Mon, May 11, 2009 at 04:50:22AM +0300, Sami Liedes wrote: > #4 Optimize AColor::GetColorGradient() as a lookup table, as it was > the bottleneck after #2 and #3. The #pragma omp critical can be removed if you decide to not use OpenMP (it should be in the OpenMP patch anyway but seems I put it here by mistake). Sami  commit abafee9ce5281bf024bb459db662e3e83de56ea2 Author: Sami Liedes <sliedes@...> Date: Fri May 8 18:28:27 2009 +0300 Optimize AColor::GetColorGradient() as a lookup table. diff git a/src/AColor.cpp b/src/AColor.cpp index f6d974b..9d70d5c 100644  a/src/AColor.cpp +++ b/src/AColor.cpp @@ 490,46 +490,59 @@ void AColor::DarkMIDIChannel(wxDC * dc, int channel /* 1  16 */ ) } void GetColorGradient(float value,  bool selected,  bool grayscale,  unsigned char *red,  unsigned char *green, unsigned char *blue) {  float r, g, b;   if (grayscale) {  r = g = b = 0.84  0.84 * value;  } else {  const int gsteps = 4;  float gradient[gsteps + 1][3] = {  {float(0.75), float(0.75), float(0.75)}, // lt gray  {float(0.30), float(0.60), float(1.00)}, // lt blue  {float(0.90), float(0.10), float(0.90)}, // violet  {float(1.00), float(0.00), float(0.00)}, // red  {float(1.00), float(1.00), float(1.00)} // white  };   int left = int (value * gsteps);  int right = (left == gsteps ? gsteps : left + 1);   float rweight = (value * gsteps)  left;  float lweight = 1.0  rweight;   r = (gradient[left][0] * lweight) + (gradient[right][0] * rweight);  g = (gradient[left][1] * lweight) + (gradient[right][1] * rweight);  b = (gradient[left][2] * lweight) + (gradient[right][2] * rweight); +bool AColor::gradient_inited = 0; + +unsigned char AColor::gradient_pre[2][2][gradientSteps][3]; + +void AColor::PreComputeGradient() { +#pragma omp critical + { + if (!gradient_inited) { + gradient_inited = 1; + + for (int selected = 0; selected <= 1; selected++) + for (int grayscale = 0; grayscale <= 1; grayscale++) { + float r, g, b; + + int i; + for (i=0; i<gradientSteps; i++) { + float value = float(i)/gradientSteps; + + if (grayscale) { + r = g = b = 0.84  0.84 * value; + } else { + const int gsteps = 4; + float gradient[gsteps + 1][3] = { + {float(0.75), float(0.75), float(0.75)}, // lt gray + {float(0.30), float(0.60), float(1.00)}, // lt blue + {float(0.90), float(0.10), float(0.90)}, // violet + {float(1.00), float(0.00), float(0.00)}, // red + {float(1.00), float(1.00), float(1.00)} // white + }; + + int left = int (value * gsteps); + int right = (left == gsteps ? gsteps : left + 1); + + float rweight = (value * gsteps)  left; + float lweight = 1.0  rweight; + + r = (gradient[left][0] * lweight) + (gradient[right][0] * rweight); + g = (gradient[left][1] * lweight) + (gradient[right][1] * rweight); + b = (gradient[left][2] * lweight) + (gradient[right][2] * rweight); + } + + if (selected) { + r *= 0.77f; + g *= 0.77f; + b *= 0.885f; + } + gradient_pre[selected][grayscale][i][0] = (unsigned char) (255 * r); + gradient_pre[selected][grayscale][i][1] = (unsigned char) (255 * g); + gradient_pre[selected][grayscale][i][2] = (unsigned char) (255 * b); + } + } + } }   if (selected) {  r *= 0.77f;  g *= 0.77f;  b *= 0.885f;  }   *red = (unsigned char) (255 * r);  *green = (unsigned char) (255 * g);  *blue = (unsigned char) (255 * b); } // Indentation settings for Vim and Emacs and unique identifier for Arch, a diff git a/src/AColor.h b/src/AColor.h index fff62e1..29ec378 100644  a/src/AColor.h +++ b/src/AColor.h @@ 52,6 +52,8 @@ class AColor { static void TrackFocusPen(wxDC * dc, int level /* 0  2 */); static void SnapGuidePen(wxDC * dc); + static void PreComputeGradient(); + // Member variables static wxBrush lightBrush[2]; @@ 89,6 +91,10 @@ class AColor { static wxBrush tooltipBrush; + static bool gradient_inited; + static const int gradientSteps = 512; + static unsigned char gradient_pre[2][2][gradientSteps][3]; + private: static wxPen sparePen; static wxBrush spareBrush; @@ 96,11 +102,20 @@ class AColor { }; void GetColorGradient(float value,  bool selected,  bool grayscale,  unsigned char *red,  unsigned char *green, unsigned char *blue); +inline void GetColorGradient(float value, + bool selected, + bool grayscale, + unsigned char *red, + unsigned char *green, unsigned char *blue) { + if (!AColor::gradient_inited) + AColor::PreComputeGradient(); + + int idx = value * (AColor::gradientSteps  1); + + *red = AColor::gradient_pre[selected][grayscale][idx][0]; + *green = AColor::gradient_pre[selected][grayscale][idx][1]; + *blue = AColor::gradient_pre[selected][grayscale][idx][2]; +} #endif  
From: Martyn Shaw <martynshaw99@go...>  20090511 00:17:02

Sorry if this is a mispost. I see in http://audacityteam.org/wiki/index.php?title=Pending_Feature_Requests "Auto Complete Equalizer Graph: I created a simple equalizer graph to get somebody out of trouble recently. I think it was a lot more bother than it needed to be. Here's an illustration of an Adobe Photoshop tool. See: http://audacityteam.org/wiki/index.php?title=Image:Curves2.jpg Note that there's only one new data point on the righthand brightness curve, and yet Photoshop automatically produced a graceful, gentle, useful curve typical of a picture whose natural lighting had actually changed. It did not produce two straight lines and depend on me to painstakingly calculate the new points and put the rest of the curve in by hand. I want the equalizer work window to run like that. " So what would be a good a good curve to use? I tried a cubic spline at that wasn't a good plan. TTFN Martyn 