From: Cory N. <ph...@gm...> - 2012-03-10 12:15:23
On Sat, Mar 10, 2012 at 1:19 AM, Doug Cook <idi...@us...> wrote:

> That said, it is definitely interesting to consider whether sox's
> current preferred encoding is the right one. Seeing as how you
> essentially can't do anything useful in sox without the sample being
> converted to a double at least once (for example, any gain adjustment
> involves a floating-point multiply), I'm pretty confident that the
> best choice of preferred encoding for sox is a floating-point format,
> not an integer format. That means it would be a fight between float32
> (float) and float64 (double).
>
> As an aside, the fact that all of sox's internal floating-point work
> occurs in double should not prejudice the discussion. Double is used
> because the current standard is int32, and you lose up to 7 bits of
> precision when you convert from int32 to float32, so float32 couldn't
> be used for intermediate results if the standard encoding is int32.
>
> In any case, there are two questions unanswered in my mind. First,
> which would be better as the standard encoding for sox: float32 or
> float64?
>
> The 24 bits of precision provided by float (or 25, depending on how
> you count) is more than enough for any finished product, but sox is
> involved in intermediate results, not just finished products. 24 bits
> of precision, if carefully maintained in all calculations, is probably
> good enough even for intermediate results, but properly maintaining
> the precision of floating point numbers (ensuring that you lose
> minimal precision in each calculation) is very tricky, especially
> cross-platform.

Float32 might not be so bad for intermediate results. For instance, a
negative gain will essentially drop effective bits from an int32 sample,
while a single-precision float will maintain its full precision. If
calculations are kept in double form, less simple filters would still be
able to maintain a full 24 bits of precision pretty easily.
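Both points above (int32 to float32 losing up to 7 bits, and a negative
gain dropping effective bits from an integer pipeline while a float
keeps them) can be illustrated with a toy sketch. This is not sox code;
`f32` here just emulates IEEE-754 single-precision rounding via struct:

```python
import struct

def f32(x):
    """Round a Python float to the nearest IEEE-754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

# 1. int32 -> float32 loses up to 7 bits: 31 value bits vs a 24-bit significand.
s = 2**31 - 1                        # largest int32 sample
assert int(f32(float(s))) != s       # round-trip is not exact

# 2. A negative gain drops effective bits from an integer pipeline,
#    but a float32 sample keeps its full precision.
s = 0x345678                         # 22 significant bits, exact in float32
gain = 2.0**-8                       # about -48 dB, an exact power of two

int_path = int(s * gain) << 8        # quantize after attenuation, re-amplify
flt_path = f32(f32(s * gain) / gain)

print(hex(int_path), hex(int(flt_path)))   # -> 0x345600 0x345678
```

The integer path has lost its bottom 8 bits, while the float32 path
recovers the original sample exactly (the gain here is a power of two,
so the float multiplies are exact).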
I do suppose, though, that if sox's goal is to maintain the highest
fidelity, a full double-precision pipeline would be required, however
overkill that may be.

> Based on the way I use sox for my own needs (very simple usage), I'm
> pretty sure that my ears would never be able to tell the difference
> between the results of a float32-based sox and a float64-based sox.
> But others might be using more complicated effects chains and might
> notice a difference.
>
> In addition, based on the way I use sox, the processing time
> difference between a float32-based sox and a float64-based sox would
> probably never be noticed. A float64-based sox might take a few
> milliseconds longer to encode something, but it probably wouldn't
> affect me. In fact, on an x86, there might not be any difference at
> all. But on an arm chip, or perhaps in the future when somebody wants
> to make sox do SSE-optimized (or GPU-optimized) vectorized effects
> calculations, the difference between a 32-bit and 64-bit float might
> be more significant.

Indeed. Short of div/sqrt, there's no difference in speed between single
and double on any modern x86. Depending on the complexity of the
calculations, the extra bandwidth usage of double might even be hidden
by the CPU's prefetching. With ARM getting more relevant, though, I
guess it is important to take it into consideration, and I've got no
clue how ARM handles double or prefetching.

> The second unanswered question is how much any of this matters. A
> floating-point format would make some normalization tasks easier, but
> the current system also works fine. And just as my ears can't tell the
> difference between float32 and float64, they probably also can't tell
> the difference between int32 and float64. Where would this fit in a
> list of sox feature requests? For my needs, it wouldn't rank very
> high. But it probably wouldn't be really hard, and it would make sox
> a lot more flexible.
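Doug's point that a complicated effects chain might expose the
difference between float32 and float64 intermediates can be made
concrete with a toy accumulation (again not sox code; `f32` just
emulates single-precision rounding via struct). Naively summing 100,000
samples of 0.1 drifts visibly in single precision, while double stays
essentially exact:

```python
import struct

def f32(x):
    """Round a Python float to the nearest IEEE-754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

n = 100_000
acc32 = 0.0
acc64 = 0.0
for _ in range(n):
    acc32 = f32(acc32 + f32(0.1))   # every intermediate rounded to float32
    acc64 = acc64 + 0.1             # Python floats are IEEE double

# The float32 accumulator drifts by over a unit from the ideal 10000;
# the float64 accumulator's error is down in the noise.
print(abs(acc32 - 10000.0), abs(acc64 - 10000.0))
```

A single pass like this is far cruder than anything a real filter does,
but it shows the mechanism: each single-precision intermediate rounds
against a large running value, and those rounding errors compound.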
For me it's got nothing to do with quality; I agree no human ears would
ever be able to tell the difference between int/float/double. Even the
best of DACs average less than 20 bits of precision (as a full device,
not an imaginary IC spec), and most "prosumer" devices are lucky to hit
18 bits. As far as listening goes, 16 bits is perfectly fine so long as
volume is controlled at the amp.

Short of heroic filter chains, I doubt the resulting samples would be
significantly changed for anyone either. Such filter chains would no
doubt be altering the audio enough to make a small loss of intermediate
precision go entirely unnoticed.

The current guard system does work fine for most of the filters, though
it really, really sucks when you happen upon one where it doesn't (like
compand) and you just need to take wild guesses until you find a gain
that works. This is really my only motivation here.

--
Cory Nelson
http://int64.org