Thread: [Audacity-devel] [Fwd: Re: Dithering]
A free multi-track audio editor and recorder
From: Dominic M. <do...@mi...> - 2001-10-17 16:38:21
-------- Original Message --------
Subject: Re: Dithering
Date: Wed, 17 Oct 2001 05:43:28 -0400
From: Chris Johnson <jin...@so...>
To: Dominic Mazzoni <do...@mi...>

> I'm posting this to the Audacity Developers mailing list so that
> others can read and join the discussion as well.

Great- I'm not sure if my response will get there, but if you want to repost it I approve. If it's not a high-volume list I could join- though it might be unsuitable if it is largely intensive C++ talk, which I won't understand. Maybe I'm best as a 'background' resource.

> The current version of Audacity (0.97) is definitely not pro-level.
> An entire project must be one sample rate, and it only supports 16-bit
> stereo (though it can mix multiple tracks). However, most of the work
> "under the hood" is done to have Audacity support an unlimited
> number of I/O channels and 32-bit samples, and that code should be
> finished in the next few months.

Even if you're using only 16-bit samples, what is the bit depth of your mix buss? Arguably, the preferable internal format is 32-bit float: that, or something more outlandish like 64-bit float. Mastering Tools Pro runs on 32-bit float, and I believe VST plugins can operate on an internal buss running at that depth.

> Clearly dithering is important - but I don't really know much about
> dithering at all - so the first thing you can do to help us out is
> explain exactly what dithering is and when it should be used.
> Let me explain what I do know, to bring others on the list up to speed,
> and so you can correct any misconceptions I might have.

Sure. Quick answer is: "you should never need to use it until the very last step, and then it's very important".
In other words, ideally you have infinite word length, and then you reduce it to 16 bit (for 16 bit sound output hardware, or CD audio) or 24 or 32 bit (for export to high resolution formats such as the Alesis Masterlink, or high-res files such as AIFFs or the NeXT 32-bit floating point format that Apple's OSX will be based on).

> Essentially, when you divide a value like 55 by 8 you get 6.875, for
> example. Instead of always making this 6 or always 7, 87.5% of the time
> it returns 7 and 12.5% of the time it returns 6. Over a large region of
> this color, this approximates the color 55/255.

This is precisely what happens in audio- except that the large 'region' is a distribution of samples in time. To see what happens, you can go to my techdetails page and look at the very top and very bottom graphs:

<http://www.airwindows.com/dithering/TechDetails.html>

At the top is a frequency curve of what the 'banding' in your picture SOUNDS like as applied to audio. The series of spikes are overtones at various frequencies.

Dither can be as simple as 'noise' added at very quiet levels (around 1-2 least significant bits), but it is not about _covering_ quantization noise with background noise. What it does is produce a statistical averaging effect that redistributes the error. Rather than having the error show up like 'banding' in a picture (easily recognizable to the ear as thin-ness and shallowness of the sound, plus a bit of grunge), it instead distributes the error across the frequency domain, and because of the random fluctuations (with considerable high frequency content- dither is not _low frequency_ noise), the mid and low frequencies actually clean up, becoming lower in distortion and noise. This is shown in the frequency analyses- simple spectrum charts measuring signal content at those frequencies, generated from a low-level 1K tone.
> In digital audio, this comes up when you have an audio track in memory
> that's stored in 32-bit samples, but your sound card can only handle
> 16-bit samples. You want to convert from the 32-bit numbers to 16-bit
> numbers, but with as little loss of sound quality as possible.
> * Is that what dithering means in audio? Or is that too general,
>   too specific, or totally different?

No, that's what you want to do. You're trying to break up the audio equivalent of 'banding'. Pro Tools in particular has developed a bad reputation through being on the one hand wildly popular, but on the other hand having a truncated 16-bit buss until recently, producing a characteristic sound that's tough to work around.

> * Does the function above work well for audio?

I'm not enough of a C programmer to really follow the example confidently, but it looks like this equates to:

output = (x + (rnd()*8))/8 //when 'output' is an int

You can just add the dither value in and then truncate- the logic and comparisons aren't necessary. But the general principle is good- for instance, if you are keeping your internal representation in floating point (which I recommend, unless you have a high-bit-depth fixed point like 40-bit or more; some embedded audio stuff does that), you can keep it all in the area of -16386 to 16387, add a dither that's nothing more than (rnd()-0.5), and convert the result to int.

I strongly recommend experimenting with different techniques like flooring or 'ceil'ing the values, testing a very low level signal and then wildly amplifying the output, to see what your crossover point sounds like: you could end up with a dead spot around zero that will effectively be crossover distortion- in the digital domain. This will have a subtle but profound effect on the sound of your reverb tails, and in fact all sound, but especially subtleties.
I'm not kidding- once when doing azimuth correction for Mastering Tools Pro, I got a calculation wrong and was interpolating backwards between individual _samples_... and the only way I could tell was that some transient sounds were making 'tick' noises- but when I fixed it, ZANG! The tonality got way better, in a really obvious way. Magic tonality is in the details, such as how well your conversion to int handles the zero-crossing region.

> * Why are there so many different dithering algorithms, and why
>   would people choose different ones for different uses?

Ah, here's the fun part- and again there's a graphics parallel. Ever seen 'diffusion dither' compared to, say, pattern dither? That refers to 'error diffusion'. Let's imagine again that we have some samples that are floating-point, and we're converting them to int. Let's also pretend we've already got the dither noise added to it, though this is a separate process. Finally, we'll assume that values are rounded to the nearest int (though you can also do this with flooring the numbers, I think it makes more sense rounded):

30.1  30  (.1)
32.7  33  (-.3)
35.2  35  (.2)
37.8  38  (-.2)

See the values in the parentheses? That's the error from each truncation. Now, what if we took this error, the un-capturable part that's 'missing' from the truncated int version, and added it in again?

30.1  30  (.1)
32.8  33  (-.2)
35.0  35  (0)
37.8  38  (-.2)

This is a really primitive noise shaping- nobody actually does it quite this way. Technically it's feeding the whole error back in on the next sample, which would cause some wild aberrations in the high end. However, here's the point: by adding back what is lost in truncation error, it's possible to further redistribute the error- as in, further clean up the bass and midrange, even to a large extent the treble. The whole point of error distribution is to load the error into the most extreme highs, where people have a harder time hearing it.
If you look at a diffusion-dithered picture, the broad areas of tonality will be strikingly well rendered- but any tiny detail the size of a pixel is likely to be completely distorted by the dither. It's the same with audio- you're shooting for a big, lush, warm, liquid sound through most of the audio range- and the price you pay is allowing some error energy to be shoved into the extreme highs. We're still talking about 100 dB down, though this does tend to make the resulting audio sound 'bright' and illuminated.

Hope this helped- we can get more detailed whenever you like. Technically, the dithers I have available under the GPL include two variations on 'quadratic root' and 'primitive root' sequences that are fairly quick to calculate, a 'high-pass' dither that is a bear to calculate (counts as a 'Near Nyquist' dither, as it has little content below the highest frequencies), and a method for noise shaping that I call indeterminate-order noise shaping, which is rather good.

Here are the code snippets (should be readable- this isn't C, but then the code isn't hard). Position is the sample number. First, the Quadratic Residue one:

Calibration(0)=0.0799058
Calibration(1)=0.1088214
Calibration(3)=0.1178439
Calibration(4)=0.1622092
Calibration(5)=0.1787035
Calibration(9)=0.2021918
//set up values in routine init

Squared = Position*Position
Dither = Squared mod 1993
//main quadratic residue value relative to sample number
Dither = Calibration((Dither*Dither) mod 11) * 2.9606
//second level of quadratic residue compressing range further
if OddSample then
  Dither = -Dither
end if
//sign bit flips every other sample

That gives you a quadratic dither scaled for simply converting a float to an int.
Next is the Primitive Root dither:

PrimitiveRoot = (PrimitiveRoot * 2) mod 10006721
Dither = Dither + ((((PrimitiveRoot mod 5) - 2.5) / 14.157) * 1.3849)

Finally there's the highpass, which is far from realtime (though I think you could cache a section of it without too much trouble):

Rand = (rnd-0.5)*10000000000
for count = 1 to 100
  HAvg(count) = (HAvg(count) + Rand)/2
  Rand = Rand - HAvg(count)
next
Dither = Dither + (Rand * (1.847*600000000))

All this is taken live from the current Mastering Tools Pro source, including the 'trim' settings I settled on- in the program they're user-tweakable, but here I'm plugging in the defaults.

Finally, this is the code used for the indeterminate-order noise shaping- slightly simplified, because the original had to deal with interleaved stereo data:

if OddSample then
  NSOddR = (NSOddR / NSDensity) + Error
  NSEvenR = (NSEvenR / NSDensity) - Error
  Result = Result + (NSOddR * NSTrim) + Dither
else
  NSOddR = (NSOddR / NSDensity) - Error
  NSEvenR = (NSEvenR / NSDensity) + Error
  Result = Result + (NSEvenR * NSTrim) + Dither
end if

Note what's happening here: the error from one sample keeps getting applied to succeeding samples with diminishing impact. If the error started off as 1.6, and NSDensity was 2, then succeeding samples would be hit with:

-.8 .4 -.2 .1 -.05

...and so on. This means the amount of error introduced approaches 0 over enough time- meaning that the noise curve of the resulting noise shaping drops to _very_ low amounts at lower frequencies. Now that I mention it- I need to experiment with more than 2 stages in this, and with delaying the onset of the error component. But that's for another day- and it's for me to experiment with.

For the highpass dither, my value for NSDensity is 1.28649 and NSTrim is 1.846. For Quadratic, NSDensity is 1.122 and NSTrim is 1.22076. For the Primitive Root, NSDensity is 1.117 and NSTrim is 1.38433. This is mostly arrived at through experimentation and measurement of noise levels...
> After we tackle dithering, I'd love to talk about compression and
> equalization, and probably a lot more.

Absolutely! I can cover compression, limiting, lookahead limiting and even some tweaks that I'm not sure are common- I've taken to feeding the 'error' from limiting back into the waveform as DC offset, which I think is one of the tricks pro limiters such as the Waves L2 use. Also, I have variable lookahead, because excessive limiting without lookahead sounds distorted, but with lookahead it sounds 'disassociated' and diffused, and I wanted to be able to dial in my choice of how much distorted-feel to give it.

There's also a useful trick called 'sidechain compression', which is basically taking a signal, compressing the hell out of it, and then bleeding small amounts of this into the regular, uncompressed signal to bring up the details without smooshing the transient attacks of the music.

As for equalization, you guys might actually know more about it than I do, because my concept of EQ is on a much cruder level than FFT stuff (which is how it's normally done, but I don't have the math). My approach to EQ is averaging recent samples, which is a very crude and ugly lowpass filter. For highpass, you subtract it from the current sample :) This actually makes for a good-sounding EQ, but it's hell to control and has a really shallow slope. Might be worth coding up as an optional type- it's another indeterminate-order approach. The only way you can steepen the slope is by cascading them, which gets really computationally expensive.

Maybe I ought to get on the developer list, if you want me, even if I'm no C++ coder. I have to say- this is _exactly_ what I write free software for. I really hope I can bring something to your project. It may be that at some time I will rely on it for my work :)

Chris Johnson