From: Doug C. <idi...@us...> - 2012-03-10 07:19:21
> I figured it would be something like this. How realistic would it be
> to gradually transition the entire pipeline to floating-point, perhaps
> with an automatic filter applied to convert to/from integer for
> filters which don't yet support it? I haven't looked at sox yet, but
> I'd be willing to contribute some patches.

I've given this some thought, but unfortunately it just hasn't been important enough to me to graduate from thought experiment to working code.

I thought that the ideal would be to simply not have any kind of "native" encoding. Sox should accept data in the encoding provided and pass it on in the encoding requested. The input format handler tells sox what encoding it produces. Effects tell sox what encoding(s) they accept and what encoding(s) they can produce. The output format handler tells sox what encoding it needs. Sox adds encoding conversion steps between each effect as needed, just like it already automatically adds channel/sample-rate/dithering/etc. steps.

In theory, this could make sox work much more efficiently. Many of the effects take the 32-bit integer input, convert it to double, perform the effect's calculation, then convert back to 32-bit integer. When several of these double-based effects are chained together, it would be much more efficient to simply leave the data in double and skip all the conversion.

That was my theory -- for a while, anyway. In practice, all effects in an audio processing chain should meet a similar minimum standard. If the minimum standard is 24 bits of precision, all effects should be designed to provide 24 bits of precision. If the standard is 120 dB of dynamic range, all effects should be able to provide that too. A piece of audio processing software will be designed with a particular standard in mind, and the effects will then be built to meet or beat that standard as efficiently as possible. And that means the effects will converge on a single "preferred" encoding.
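The negotiation described above can be sketched in a few lines. To be clear, this is a hypothetical illustration in Python, not sox's actual C API; the `Effect` class and `plan` function are invented for the example:

```python
# Hypothetical sketch of a planner that inserts encoding-conversion
# steps between effects, the way sox already inserts channel/rate/
# dither steps automatically. Not sox's real API.

class Effect:
    def __init__(self, name, accepts, produces):
        self.name = name
        self.accepts = set(accepts)    # encodings this effect can consume
        self.produces = set(produces)  # encodings this effect can emit

def plan(source_enc, effects, sink_enc):
    """Return the effect chain with explicit conversion steps added."""
    chain, current = [], source_enc
    for fx in effects:
        if current not in fx.accepts:
            target = next(iter(fx.accepts))
            chain.append(Effect("convert:%s->%s" % (current, target),
                                {current}, {target}))
            current = target
        chain.append(fx)
        # stay in the current encoding if the effect can produce it
        current = current if current in fx.produces else next(iter(fx.produces))
    if current != sink_enc:
        chain.append(Effect("convert:%s->%s" % (current, sink_enc),
                            {current}, {sink_enc}))
    return chain

# Two double-based effects chained between int32 input and int32 output:
gain = Effect("gain", {"float64"}, {"float64"})
eq   = Effect("eq",   {"float64"}, {"float64"})
steps = [fx.name for fx in plan("int32", [gain, eq], "int32")]
```

With two double-based effects chained, the planner converts once on the way in and once on the way out, rather than round-tripping through int32 at every effect boundary.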
That said, it is definitely interesting to consider whether sox's current preferred encoding is the right one. Seeing as how you essentially can't do anything useful in sox without the sample being converted to a double at least once (for example, any gain adjustment involves a floating-point multiply), I'm pretty confident that the best choice of preferred encoding for sox is a floating-point format, not an integer format. That means it would be a fight between float32 (float) and float64 (double).

As an aside, the fact that all of sox's internal floating-point work occurs in double should not prejudice the discussion. Double is used because the current standard is int32, and you lose up to 7 bits of precision when you convert from int32 to float32, so float32 couldn't be used for intermediate results if the standard encoding is int32.

In any case, there are two questions unanswered in my mind. First, which would be better as the standard encoding for sox: float32 or float64? The 24 bits of precision provided by float (or 25, depending on how you count) is more than enough for any finished product, but sox is involved in intermediate results, not just finished products. 24 bits of precision, if carefully maintained in all calculations, is probably good enough even for intermediate results, but properly maintaining the precision of floating-point numbers (ensuring that you lose minimal precision in each calculation) is very tricky, especially cross-platform.

Based on the way I use sox for my own needs (very simple usage), I'm pretty sure that my ears would never be able to tell the difference between the results of a float32-based sox and a float64-based sox. But others might be using more complicated effects chains and might notice a difference. In addition, based on the way I use sox, the processing time difference between a float32-based sox and a float64-based sox would probably never be noticed.
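The "up to 7 bits" figure is easy to check. Here's a stdlib-only Python sketch (illustrative, not sox code) that rounds values through IEEE-754 single precision:

```python
# Illustrates the precision-loss claim: float32 keeps a 24-bit
# significand (23 stored bits plus 1 implicit leading bit), so near
# full scale the spacing between adjacent float32 values is
# 2**(30-23) = 128, and the low 7 bits of an int32 sample are
# rounded away.
import struct

def to_f32(x):
    """Round a number through IEEE-754 single precision (binary32)."""
    return struct.unpack('f', struct.pack('f', float(x)))[0]

# Neighbouring int32 values collapse onto the same float32 value:
assert to_f32(2**31 - 128) == to_f32(2**31 - 65)

# Worst-case rounding error near full scale is half the 128 spacing:
err = abs(int(to_f32(2**31 - 1)) - (2**31 - 1))
assert err <= 64
```

The 23+1 significand bits are presumably also where the "24 (or 25, depending on how you count)" remark comes from: for a signed quantity like an audio sample, the sign bit can arguably be counted as one more bit.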
A float64-based sox might take a few milliseconds longer to encode something, but it probably wouldn't affect me. In fact, on x86, there might not be any difference at all. But on an ARM chip, or perhaps in the future when somebody wants to make sox do SSE-optimized (or GPU-optimized) vectorized effects calculations, the difference between a 32-bit and a 64-bit float might be more significant.

The second unanswered question is how much any of this matters. A floating-point format would make some normalization tasks easier, but the current system also works fine. And just as my ears can't tell the difference between float32 and float64, they probably also can't tell the difference between int32 and float64.

Where would this fit in a list of sox feature requests? For my needs, it wouldn't rank very high. But it probably wouldn't be especially hard to do, and it would make sox a lot more flexible.
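As a footnote to the SSE remark above, the lane arithmetic is simple (plain Python, just counting bytes; the 128-bit register width is the SSE/NEON case):

```python
# A 128-bit SIMD register holds twice as many float32 samples as
# float64 samples, so a float32 pipeline can process twice the
# samples per vector instruction and moves half the memory traffic.
from array import array

REG_BITS = 128                                       # SSE / NEON register width
f32_lanes = REG_BITS // (array('f').itemsize * 8)    # 4 single-precision lanes
f64_lanes = REG_BITS // (array('d').itemsize * 8)    # 2 double-precision lanes
buffer_ratio = array('d').itemsize / array('f').itemsize  # 2x buffer size
```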