From: Doug C. <idi...@us...> - 2012-03-10 07:19:21
> I figured it would be something like this. How realistic would it be
> to gradually transition the entire pipeline to floating-point, perhaps
> with an automatic filter applied to convert to/from integer for
> filters which don't yet support it? I haven't looked at sox yet, but
> I'd be willing to contribute some patches.

I've given this some thought, but unfortunately it just hasn't been important enough to me to graduate from thought experiment to working code.

I thought that the ideal would be to simply not have any kind of "native" encoding. Sox should accept data in the encoding provided and pass it on in the encoding requested. The input format handler tells sox what encoding it produces. Effects tell sox what encoding(s) they accept and what encoding(s) they can produce. The output format handler tells sox what encoding it needs. Sox adds encoding conversion steps between each effect as needed, just like it already automatically adds channel/sample-rate/dithering/etc. steps.

In theory, this could make sox work much more efficiently. Many of the effects take the 32-bit integer input, convert it to double, perform the effect's calculation, then convert back to 32-bit integer. When several of these double-based effects are chained together, it would be much more efficient to simply leave the data in double and skip all the conversion.

That was my theory -- for a while, anyway. In practice, all effects in an audio processing chain should meet a similar minimum standard. If the minimum standard is 24 bits of precision, all effects should be designed to provide 24 bits of precision. If the standard is 120 dB of dynamic range, all effects should be able to provide that too. A piece of audio processing software will be designed with a particular standard in mind, and the effects will then be built to meet or beat that standard as efficiently as possible. And that means the effects will converge on a single "preferred" encoding.
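The negotiation described above can be sketched in a few lines. To be clear, this is a hypothetical illustration in Python, not sox's actual C API; the `Effect` class and `plan` function are invented for the example:

```python
# Hypothetical sketch of a planner that inserts encoding-conversion
# steps between effects, the way sox already inserts channel/rate/
# dither steps automatically. Not sox's real API.

class Effect:
    def __init__(self, name, accepts, produces):
        self.name = name
        self.accepts = set(accepts)    # encodings this effect can consume
        self.produces = set(produces)  # encodings this effect can emit

def plan(source_enc, effects, sink_enc):
    """Return the effect chain with explicit conversion steps added."""
    chain, current = [], source_enc
    for fx in effects:
        if current not in fx.accepts:
            target = next(iter(fx.accepts))
            chain.append(Effect("convert:%s->%s" % (current, target),
                                {current}, {target}))
            current = target
        chain.append(fx)
        # stay in the current encoding if the effect can produce it
        current = current if current in fx.produces else next(iter(fx.produces))
    if current != sink_enc:
        chain.append(Effect("convert:%s->%s" % (current, sink_enc),
                            {current}, {sink_enc}))
    return chain

# Two double-based effects chained between int32 input and int32 output:
gain = Effect("gain", {"float64"}, {"float64"})
eq   = Effect("eq",   {"float64"}, {"float64"})
steps = [fx.name for fx in plan("int32", [gain, eq], "int32")]
```

With two double-based effects chained, the planner converts once on the way in and once on the way out, rather than round-tripping through int32 at every effect boundary.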
That said, it is definitely interesting to consider whether sox's current preferred encoding is the right one. Seeing as how you essentially can't do anything useful in sox without the sample being converted to a double at least once (for example, any gain adjustment involves a floating-point multiply), I'm pretty confident that the best choice of preferred encoding for sox is a floating-point format, not an integer format. That means it would be a fight between float32 (float) and float64 (double).

As an aside, the fact that all of sox's internal floating-point work occurs in double should not prejudice the discussion. Double is used because the current standard is int32, and you lose up to 7 bits of precision when you convert from int32 to float32, so float32 couldn't be used for intermediate results if the standard encoding is int32.

In any case, there are two questions unanswered in my mind. First, which would be better as the standard encoding for sox: float32 or float64? The 24 bits of precision provided by float (or 25, depending on how you count) is more than enough for any finished product, but sox is involved in intermediate results, not just finished products. 24 bits of precision, if carefully maintained in all calculations, is probably good enough even for intermediate results, but properly maintaining the precision of floating-point numbers (ensuring that you lose minimal precision in each calculation) is very tricky, especially cross-platform.

Based on the way I use sox for my own needs (very simple usage), I'm pretty sure that my ears would never be able to tell the difference between the results of a float32-based sox and a float64-based sox. But others might be using more complicated effects chains and might notice a difference. In addition, based on the way I use sox, the processing time difference between a float32-based sox and a float64-based sox would probably never be noticed.
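The "up to 7 bits" figure is easy to check. Here's a stdlib-only Python sketch (illustrative, not sox code) that rounds values through IEEE-754 single precision:

```python
# Illustrates the precision-loss claim: float32 keeps a 24-bit
# significand (23 stored bits plus 1 implicit leading bit), so near
# full scale the spacing between adjacent float32 values is
# 2**(30-23) = 128, and the low 7 bits of an int32 sample are
# rounded away.
import struct

def to_f32(x):
    """Round a number through IEEE-754 single precision (binary32)."""
    return struct.unpack('f', struct.pack('f', float(x)))[0]

# Neighbouring int32 values collapse onto the same float32 value:
assert to_f32(2**31 - 128) == to_f32(2**31 - 65)

# Worst-case rounding error near full scale is half the 128 spacing:
err = abs(int(to_f32(2**31 - 1)) - (2**31 - 1))
assert err <= 64
```

The 23+1 significand bits are presumably also where the "24 (or 25, depending on how you count)" remark comes from: for a signed quantity like an audio sample, the sign bit can arguably be counted as one more bit.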
A float64-based sox might take a few milliseconds longer to encode something, but it probably wouldn't affect me. In fact, on x86, there might not be any difference at all. But on an ARM chip, or perhaps in the future when somebody wants to make sox do SSE-optimized (or GPU-optimized) vectorized effects calculations, the difference between a 32-bit and a 64-bit float might be more significant.

The second unanswered question is how much any of this matters. A floating-point format would make some normalization tasks easier, but the current system also works fine. And just as my ears can't tell the difference between float32 and float64, they probably also can't tell the difference between int32 and float64.

Where would this fit in a list of sox feature requests? For my needs, it wouldn't rank very high. But it probably wouldn't be especially hard to do, and it would make sox a lot more flexible.
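As a footnote to the SSE remark above, the lane arithmetic is simple (plain Python, just counting bytes; the 128-bit register width is the SSE/NEON case):

```python
# A 128-bit SIMD register holds twice as many float32 samples as
# float64 samples, so a float32 pipeline can process twice the
# samples per vector instruction and moves half the memory traffic.
from array import array

REG_BITS = 128                                       # SSE / NEON register width
f32_lanes = REG_BITS // (array('f').itemsize * 8)    # 4 single-precision lanes
f64_lanes = REG_BITS // (array('d').itemsize * 8)    # 2 double-precision lanes
buffer_ratio = array('d').itemsize / array('f').itemsize  # 2x buffer size
```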