I'm not an expert on SoX or digital audio, so apologies if this is incorrect. However, the silence effect has been reported to strip silences embedded in recorded speech successfully. In particular, http://osdir.com/ml/audio.sox/2006-03/msg00001.html and http://osdir.com/ml/audio.sox/2006-03/msg00005.html report that such silences are stripped with:
sox input.wav output.wav silence 1 0.2 0.5% -1 0.2 0.5%
I have downloaded SoX v14.2.0 for Windows and am testing with a recorded-speech WAV file whose silences Audacity strips cleanly at a level of -30 dB (duration 0.8 s). When I try to strip the silences with SoX, the output file is always 60 bytes (presumably empty) for every value of the level parameter except 0%; for 0% the output file is a bit larger than the input. I have tried values from 0.0001% to 50% and from -1d to -100d (zero or positive d values are not allowed; in a batch file "%%" must be used instead of "%", of course). The WAV file was produced by a Chinese S1 MP3-player-type audio recorder and works in every application I've tried it with, including Audacity.
I'd also note that the manual's explanation of how to strip embedded silences is not very clear.
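For anyone comparing the thresholds mentioned above, here is a rough sketch of how a decibel level relates to a percentage level. It assumes both are amplitude ratios relative to full scale (20*log10 convention), and that Audacity's -30 dB and SoX's "d" suffix mean the same thing; neither assumption is confirmed in this report. The function names are made up for illustration.

```python
import math


def db_to_percent(db):
    """Convert a level in dB (relative to full scale) to percent of full scale.

    Assumes an amplitude ratio, i.e. percent = 100 * 10^(dB / 20).
    """
    return 100.0 * 10.0 ** (db / 20.0)


def percent_to_db(percent):
    """Inverse of db_to_percent; percent must be greater than 0."""
    return 20.0 * math.log10(percent / 100.0)


if __name__ == "__main__":
    # The Audacity setting from the report, expressed as a percentage.
    print(round(db_to_percent(-30.0), 2))   # about 3.16
    # The 0.5% level from the mailing-list command, expressed in dB.
    print(round(percent_to_db(0.5), 2))     # about -46.02
```

Under those assumptions, the 0.5% threshold from the mailing-list command is considerably stricter (about -46 dB) than the -30 dB used in Audacity, so the two settings are not directly comparable.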
M:\>sox -V -V z.wav out.wav silence 1 0.2 -30d -1 0.2 -30d > m:z
sox: SoX v14.2.0
time: Nov 8 2008 19:35:10
uname: CYGWIN_NT-5.1 MS_Dell 1.5.25(0.156/4/2) 2008-06-12 19:34 i686
gcc: 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)
arch: 1248 48 44 L
sox formats: detected file format type `wav'
sox wav: WAV Chunk fmt
sox wav: WAV Chunk fact
sox wav: WAV Chunk LIST
sox wav: WAV Chunk data
sox wav: Reading Wave file: IMA ADPCM format, 1 channel, 8000 samp/sec
sox wav: 4000 byte/sec, 512 block align, 4 bits/samp, 2126336 data bytes
sox wav: 2 Extsize, 1017 Samps/block, 512 bytes/block 4223601 Samps/chan
Input File : 'z.wav'
Channels : 1
Sample Rate : 8000
Precision : 13-bit
Duration : 00:08:47.95 = 4223601 samples ~ 39596.3 CDDA sectors
Sample Encoding: 4-bit IMA ADPCM
Endian Type : little
Reverse Nibbles: no
Reverse Bits : no
sox sox: Overwriting `out.wav'
sox wav: Writing Wave file: IMA ADPCM format, 1 channel, 8000 samp/sec
sox wav: 4055 byte/sec, 256 block align, 4 bits/samp
Output File : 'out.wav'
Channels : 1
Sample Rate : 8000
Precision : 13-bit
Duration : 00:08:47.95 = 4223601 samples ~ 39596.3 CDDA sectors
Sample Encoding: 4-bit IMA ADPCM
Endian Type : little
Reverse Nibbles: no
Reverse Bits : no
Comment : 'Processed by SoX'
sox sox: effects chain: input 8000Hz 1 channels 13 bits (multi)
sox sox: effects chain: silence 8000Hz 1 channels 13 bits (multi)
sox sox: effects chain: output 8000Hz 1 channels 13 bits (multi)
sox wav: Finished writing Wave file, 0 data bytes 0 samples
M:\>dir *.wav
05/06/2009 15:04 60 out.wav
03/06/2009 20:10 2,126,848 z.WAV
Best wishes, pol098
I think this is a bug that occurs when the encoding is not linear PCM, so you should be able to convert to 16-bit linear PCM, strip the silence, and then convert back -- in theory, the conversion should be lossless.
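A minimal sketch of that workaround, written as a small Python helper that only builds the three sox command lines (nothing runs unless run_workaround is called). The temporary file names and function names are hypothetical. The "-e" and "-b" options are the spelling used by later SoX releases; 14.2.0 may expect different encoding flags, so check "sox --help" for the installed version.

```python
import shutil
import subprocess


def workaround_commands(src, dst, level="-30d", duration="0.2",
                        pcm_tmp="pcm_tmp.wav", stripped_tmp="stripped_tmp.wav"):
    """Return the three sox invocations as argv lists without executing them."""
    return [
        # 1. Decode the ADPCM input to 16-bit signed linear PCM.
        ["sox", src, "-e", "signed-integer", "-b", "16", pcm_tmp],
        # 2. Strip silences from the PCM copy, using the same effect
        #    arguments as the command in the original report.
        ["sox", pcm_tmp, stripped_tmp,
         "silence", "1", duration, level, "-1", duration, level],
        # 3. Re-encode the stripped audio as IMA ADPCM.
        ["sox", stripped_tmp, "-e", "ima-adpcm", dst],
    ]


def run_workaround(src, dst):
    """Run the three steps in order; raises if sox is missing or a step fails."""
    if shutil.which("sox") is None:
        raise RuntimeError("sox was not found on PATH")
    for argv in workaround_commands(src, dst):
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    for argv in workaround_commands("z.wav", "out.wav"):
        print(" ".join(argv))
```

If the ADPCM encoding is not required for the final file, step 3 can be skipped and the PCM result kept, which avoids a second ADPCM encoding pass.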
FYI, there's been a fix put in place for this for the next release.
Believed fixed in 14.3.0
This Tracker item was closed automatically by the system. It was
previously set to a Pending status, and the original submitter
did not respond within 14 days (the time period specified by
the administrator of this Tracker).