Hi,
As the title says. I think an example shows best what I mean.
When using seconds to trim, the result has the requested duration, as expected:
~~~
/tmp
▶ sox -n -r 16000 silence.wav trim 0 0.25
/tmp
▶ sox --i silence.wav
Input File : 'silence.wav'
Channels : 1
Sample Rate : 16000
Precision : 32-bit
Duration : 00:00:00.25 = 4000 samples ~ 18.75 CDDA sectors
File Size : 16.1k
Bit Rate : 515k
Sample Encoding: 32-bit Signed Integer PCM
~~~
When I use the number of samples taken from the above example instead, the duration is not as expected (1333 samples instead of 4000), as shown below. I noticed this is relative to 48000 Hz: 4000 * 16000/48000 = 1333.333. The command works as expected with -r 48000. Maybe I'm misunderstanding something, but I can't find anything in the manual that describes this behavior.
~~~
/tmp
▶ sox -n -r 16000 silence.wav trim 0s 4000s
/tmp
▶ sox --i silence.wav
Input File : 'silence.wav'
Channels : 1
Sample Rate : 16000
Precision : 32-bit
Duration : 00:00:00.08 = 1333 samples ~ 6.24844 CDDA sectors
File Size : 5.41k
Bit Rate : 520k
Sample Encoding: 32-bit Signed Integer PCM
~~~
Apparently it works as expected when it is called in a different order.
Sorry, I guess I don't really understand things very well.
Also, I don't really understand why, in the original post's case, the trim effect would behave differently for samples than for seconds.
Last edit: Emile V 2020-08-25
Well, when a time is converted to a sample count by SoX, the current rate is automatically taken into consideration. When you specify a number of samples directly, you need to do that yourself.
So, in your example from the first post, you start with a sampling rate of 48000 Hz, which is the nullfile default. Then you trim to 4000 samples, which is 1/12 of a second, before the audio is automatically converted to the specified output rate of 16000 Hz, where 1/12 of a second is approximately 1333 samples. All correct. In the second post, you specify a sampling rate of 16000 Hz for the input nullfile already, so no conversion takes place, and the trimming happens at 16000 Hz.
You could consider using verbose mode (-V) in order to see what is actually happening and in what order. You can always manually specify the necessary converting effects in your preferred order (e.g. rate 16k trim 0 4000s).
Add the -V flag for verbose output:
Compare with this, your second example:
In the first example, the null input (-n) has the default 48 kHz sample rate. The trim effect is applied to this. Finally, the output from the trim is resampled to the output rate of 16 kHz. In the second example, the input is set to 16 kHz sample rate, and the output then defaults to the same rate.
The trim effect is operating on different sample rates in the two cases, which explains the difference.
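To make the arithmetic concrete, here is a rough shell sketch of the two cases. It only reproduces the numbers from this thread and does not call SoX; the helper name samples_after_resample is made up for illustration, and the truncating integer division is an assumption that happens to match the 1333 reported by sox --i above.

```shell
# Hypothetical helper (not part of SoX): samples that remain when `count`
# samples are kept at `in_rate` and the result is then resampled to `out_rate`.
samples_after_resample() {
    count=$1 in_rate=$2 out_rate=$3
    # Shell integer division truncates: 4000 * 16000 / 48000 = 1333.33 -> 1333
    echo $(( count * out_rate / in_rate ))
}

# Case 1: trim runs at the 48 kHz nullfile default, output is resampled to 16 kHz
first=$(samples_after_resample 4000 48000 16000)
# Case 2: input is already 16 kHz, so trim runs at 16 kHz and nothing is resampled
second=$(samples_after_resample 4000 16000 16000)

echo "$first $second"
```

Under these assumptions the first case gives 1333 samples and the second gives 4000, which is the difference seen in the two posts.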
Thanks for the clarification.