Thread: [Audacity-devel] [Fwd: Re: Dithering]
A free multi-track audio editor and recorder
From: Dominic M. <do...@mi...> - 2001-10-17 16:38:21
-------- Original Message --------
Subject: Re: Dithering
Date: Wed, 17 Oct 2001 05:43:28 -0400
From: Chris Johnson <jin...@so...>
To: Dominic Mazzoni <do...@mi...>

> I'm posting this to the Audacity Developers mailing list so that
> others can read and join the discussion as well.

Great- I'm not sure if my response will get there, but if you want to repost it I approve. If it's not a high-volume list I could join- though it might be unsuitable if it is largely intensive C++ talk, which I won't understand. Maybe I'm best as a 'background' resource.

> The current version of Audacity (0.97) is definitely not pro-level.
> An entire project must be one sample rate, and it only supports 16-bit
> stereo (though it can mix multiple tracks). However, most of the work
> "under the hood" is done to have Audacity support an unlimited
> number of I/O channels and 32-bit samples, and that code should be
> finished in the next few months.

Even if you're using only 16-bit samples, what is the bit depth of your mix buss? Arguably, the preferable internal format is 32-bit float: that, or something more outlandish like 64-bit float. Mastering Tools Pro runs on 32-bit float, and I believe VST plugins can operate on an internal buss running at that depth.

> Clearly dithering is important - but I don't really know much about
> dithering at all - so the first thing you can do to help us out is
> explain exactly what dithering is and when it should be used.
> Let me explain what I do know, to bring others on the list up to speed,
> and so you can correct any misconceptions I might have.

Sure. Quick answer is: "you should never need to use it until the very last step, and then it's very important".
In other words, ideally you have infinite word length, and then you reduce it to 16 bit (for 16 bit sound output hardware, or CD audio) or 24 or 32 bit (for export to high resolution formats such as the Alesis Masterlink, or high-res files such as AIFFs or the NeXT 32-bit floating point format that Apple's OSX will be based on).

> Essentially, when you divide a value like 55 by 8 you get 6.875, for
> example. Instead of always making this 6 or always 7, 87.5% of the time
> it returns 7 and 12.5% of the time it returns 6. Over a large region of
> this color, this approximates the color 55/255.

This is precisely what happens in audio- except that the large 'region' is a distribution of samples in time. To see what happens, you can go to my techdetails page and look at the very top and very bottom graphs:

<http://www.airwindows.com/dithering/TechDetails.html>

At the top is a frequency curve of what the 'banding' in your picture SOUNDS like as applied to audio. The series of spikes are overtones at various frequencies.

Dither can be as simple as 'noise' added at very quiet levels (around 1-2 least significant bits), but it is not about _covering_ quantization noise with background noise. What it does is produce a statistical averaging effect that redistributes the error. Rather than having the error show up like 'banding' in a picture (easily recognizable to the ear as thin-ness and shallowness of the sound, plus a bit of grunge), it instead distributes the error across the frequency domain, and because of the random fluctuations (with considerable high frequency content- dither is not _low frequency_ noise), the mid and low frequencies actually clean up, becoming lower in distortion and noise. This is shown in the frequency analyses- simple spectrum charts measuring signal content at those frequencies, generated from a low-level 1K tone.
> In digital audio, this comes up when you have an audio track in memory
> that's stored in 32-bit samples, but your sound card can only handle
> 16-bit samples. You want to convert from the 32-bit numbers to 16-bit
> numbers, but with as little loss of sound quality as possible.
> * Is that what dithering means in audio? Or is that too general,
>   too specific, or totally different?

No, that's what you want to do. You're trying to break up the audio equivalent of 'banding'. Pro Tools in particular has developed a bad reputation through being on the one hand wildly popular, but on the other hand having a truncated 16-bit buss until recently, producing a characteristic sound that's tough to work around.

> * Does the function above work well for audio?

I'm not enough of a C programmer to really follow the example confidently, but it looks like this equates to:

output = (x + (rnd()*8))/8 //when 'output' is an int

You can just add the dither value in and then truncate- the logic and comparisons aren't necessary. But the general principle is good- for instance, if you are keeping your internal representation in floating point (which I recommend, unless you have a high-bit-depth fixed point like 40-bit or more; some embedded audio stuff does that), you can keep it all in the area of -16386 to 16387, add a dither that's nothing more than (rnd()-0.5), and convert the result to int.

I strongly recommend experimenting with different techniques like flooring or 'ceil'ing the values, testing a very low level signal and then wildly amplifying the output, to see what your crossover point sounds like: you could end up with a dead spot around zero that will effectively be crossover distortion- in the digital domain. This will have a subtle but profound effect on the sound of your reverb tails, and in fact all sound, but especially subtleties.
I'm not kidding- once when doing azimuth correction for Mastering Tools Pro, I got a calculation wrong and was interpolating backwards between individual _samples_... and the only way I could tell was that some transient sounds were making 'tick' noises- but when I fixed it, ZANG! The tonality got way better, in a really obvious way. Magic tonality is in the details, such as how well your conversion to int handles the zero-crossing region.

> * Why are there so many different dithering algorithms, and why
>   would people choose different ones for different uses?

Ah, here's the fun part- and again there's a graphics parallel. Ever seen 'diffusion dither' compared to, say, pattern dither? That refers to 'error diffusion'. Let's imagine again that we have some samples that are floating-point, and we're converting them to int. Let's also pretend we've already got the dither noise added to it, though this is a separate process. Finally, we'll assume that values are rounded to the nearest int (though you can also do this with flooring the numbers, I think it makes more sense rounded):

30.1  30  (.1)
32.7  33  (-.3)
35.2  35  (.2)
37.8  38  (-.2)

See the values in the parentheses? That's the error from each truncation. Now, what if we took this error, the un-capturable part that's 'missing' from the truncated int version, and added it in again?

30.1  30  (.1)
32.8  33  (-.2)
35.0  35  (0)
37.8  38  (-.2)

This is a really primitive noise shaping- nobody actually does it quite this way. Technically it's feeding the whole error back in on the next sample, which would cause some wild aberrations in the high end. However, here's the point: by adding back what is lost in truncation error, it's possible to further redistribute the error- as in, further clean up the bass and midrange, even to a large extent the treble. The whole point of error distribution is to load the error into the most extreme highs, where people have a harder time hearing it.
If you look at a diffusion-dithered picture, the broad areas of tonality will be strikingly well rendered- but any tiny detail the size of a pixel is likely to be completely distorted by the dither. It's the same with audio- you're shooting for a big, lush, warm, liquid sound through most of the audio range- and the price you pay is allowing some error energy to be shoved into the extreme highs. We're still talking about 100 dB down, though this does tend to make the resulting audio sound 'bright' and illuminated.

Hope this helped- we can get more detailed whenever you like. Technically, the dithers I have available under the GPL include two variations on 'quadratic root' and 'primitive root' sequences that are fairly quick to calculate, a 'high-pass' dither that is a bear to calculate (counts as a 'Near Nyquist' dither, as it has little content below the highest frequencies), and a method for noise shaping that I call indeterminate-order noise shaping, which is rather good.

Here are the code snippets (should be readable- this isn't C, but then the code isn't hard). Position is the sample number. First, the Quadratic Residue one:

Calibration(0)=0.0799058
Calibration(1)=0.1088214
Calibration(3)=0.1178439
Calibration(4)=0.1622092
Calibration(5)=0.1787035
Calibration(9)=0.2021918
//set up values in routine init

Squared = Position*Position
Dither = Squared mod 1993
//main quadratic residue value relative to sample number
Dither = Calibration((Dither*Dither) mod 11) * 2.9606
//second level of quadratic residue compressing range further
if OddSample then
  Dither = -Dither
end if
//sign bit flips every other sample

That gives you a quadratic dither scaled for simply converting a float to an int.
Next is the Primitive Root dither:

PrimitiveRoot = (PrimitiveRoot * 2) mod 10006721
Dither = Dither + ((((PrimitiveRoot mod 5) - 2.5) / 14.157) * 1.3849)

Finally there's the highpass, which is far from realtime (though I think you could cache a section of it without too much trouble):

Rand = (rnd-0.5)*10000000000
for count = 1 to 100
  HAvg(count) = (HAvg(count) + Rand)/2
  Rand = Rand - HAvg(count)
next
Dither = Dither + (Rand * (1.847*600000000))

All this is taken live from the current Mastering Tools Pro source, including the 'trim' settings I settled on- in the program they're user-tweakable, but here I'm plugging in the defaults.

Finally, this is the code used for the indeterminate-order noise shaping- slightly simplified, because the original had to deal with interleaved stereo data:

if OddSample then
  NSOddR = (NSOddR / NSDensity) + Error
  NSEvenR = (NSEvenR / NSDensity) - Error
  Result = Result + (NSOddR * NSTrim) + Dither
else
  NSOddR = (NSOddR / NSDensity) - Error
  NSEvenR = (NSEvenR / NSDensity) + Error
  Result = Result + (NSEvenR * NSTrim) + Dither
end if

Note what's happening here: the error from one sample keeps getting applied to succeeding samples with diminishing impact. If the error started off as 1.6, and NSDensity was 2, then succeeding samples would be hit with:

-.8 .4 -.2 .1 -.05

...and so on. This means the amount of error introduced approaches 0 over enough time- meaning that the noise curve of the resulting noise shaping drops to _very_ low amounts at lower frequencies. Now that I mention it- I need to experiment with more than 2 stages in this, and with delaying the onset of the error component. But that's for another day- and it's for me to experiment with.

For the highpass dither, my value for NSDensity is 1.28649 and NSTrim is 1.846. For Quadratic, NSDensity is 1.122 and NSTrim is 1.22076. For the Primitive Root, NSDensity is 1.117 and NSTrim is 1.38433. This is mostly arrived at through experimentation and measurement of noise levels...
> After we tackle dithering, I'd love to talk about compression and
> equalization, and probably a lot more.

Absolutely! I can cover compression, limiting, lookahead limiting and even some tweaks that I'm not sure are common- I've taken to feeding the 'error' from limiting back into the waveform as DC offset, which I think is one of the tricks pro limiters such as the Waves L2 use. Also, I have variable lookahead, because excessive limiting without lookahead sounds distorted, but with lookahead it sounds 'disassociated' and diffused, and I wanted to be able to dial in my choice of how much distorted-feel to give it.

There's also a useful trick called 'sidechain compression', which is basically taking a signal, compressing the hell out of it, and then bleeding small amounts of this into the regular, uncompressed signal to bring up the details without smooshing the transient attacks of the music.

As for equalization, you guys might actually know more about it than I do, because my concept of EQ is on a much cruder level than FFT stuff (which is how it's normally done, but I don't have the math). My approach to EQ is averaging recent samples, which is a very crude and ugly lowpass filter. For highpass, you subtract it from the current sample :) This actually makes for a good-sounding EQ, but it's hell to control and has a really shallow slope. Might be worth coding up as an optional type- it's another indeterminate-order approach. The only way you can steepen the slope is by cascading them, which gets really computationally expensive.

Maybe I ought to get on the developer list, if you want me, even if I'm no C++ coder. I have to say- this is _exactly_ what I write free software for. I really hope I can bring something to your project. It may be that at some time I will rely on it for my work :)

Chris Johnson