
#137 Stripping embedded silences doesn't work

Status: closed
Owner: nobody
Labels: None
Priority: 5
Updated: 2009-07-13
Created: 2009-06-05
Creator: mgs
Private: No

I'm not an expert on SoX or digital audio, so apologies if this is incorrect. However, the silence effect has been reported to successfully strip silences embedded in recorded speech. In particular, http://osdir.com/ml/audio.sox/2006-03/msg00001.html and http://osdir.com/ml/audio.sox/2006-03/msg00005.html report that embedded silences in recorded speech are successfully stripped with:
sox input.wav output.wav silence 1 0.2 0.5% -1 0.2 0.5%

I have downloaded SoX v14.2.0 for Windows, and am using a recorded speech WAV file whose silences are stripped cleanly by Audacity at a level of -30 dB (duration 0.8 s). I have attempted to strip the silences with SoX, and find that for any value of the level parameter except 0% the output file is always 60 bytes (presumably a header with no audio data); for 0% the output file is a bit larger than the input. I have tried a range of values from 0.0001% to 50%, and from -1d to -100d (zero or positive d values are not allowed; in a batch file "%%" must be used instead of "%", of course). The WAV file was produced by a Chinese S1 MP3 player-type audio recorder, and works in all the applications I've tried it with (including Audacity).
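
To compare the Audacity level (-30 dB) with the percentage thresholds tried above, the two units can be converted into each other. The sketch below assumes the `d` suffix means amplitude decibels relative to full scale, so that percent = 100 * 10^(dB/20); that reading is an assumption and is not confirmed anywhere in this report. The function names are made up for illustration.

```shell
#!/bin/sh
# Convert a decibel level to a percentage of full scale.
# Assumes amplitude decibels: percent = 100 * 10^(dB/20).
db_to_percent() {
    awk -v db="$1" 'BEGIN { printf "%.2f\n", 100 * exp(db / 20 * log(10)) }'
}

# Convert a percentage of full scale back to decibels.
percent_to_db() {
    awk -v pct="$1" 'BEGIN { printf "%.1f\n", 20 * log(pct / 100) / log(10) }'
}

db_to_percent -30     # Audacity level from this report; prints 3.16
percent_to_db 0.5     # threshold from the mailing-list command; prints -46.0
```

Under that assumption, -30d corresponds to roughly 3.16%, and the 0.5% used in the mailing-list command is a much stricter threshold of about -46 dB.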

I'd also comment that the information in the manual on stripping embedded silences is not very clear.

M:\>sox -V -V z.wav out.wav silence 1 0.2 -30d -1 0.2 -30d > m:z
sox: SoX v14.2.0
time: Nov 8 2008 19:35:10
uname: CYGWIN_NT-5.1 MS_Dell 1.5.25(0.156/4/2) 2008-06-12 19:34 i686
gcc: 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)
arch: 1248 48 44 L
sox formats: detected file format type `wav'
sox wav: WAV Chunk fmt
sox wav: WAV Chunk fact
sox wav: WAV Chunk LIST
sox wav: WAV Chunk data
sox wav: Reading Wave file: IMA ADPCM format, 1 channel, 8000 samp/sec
sox wav: 4000 byte/sec, 512 block align, 4 bits/samp, 2126336 data bytes

sox wav: 2 Extsize, 1017 Samps/block, 512 bytes/block 4223601 Samps/chan

Input File : 'z.wav'
Channels : 1
Sample Rate : 8000
Precision : 13-bit
Duration : 00:08:47.95 = 4223601 samples ~ 39596.3 CDDA sectors
Sample Encoding: 4-bit IMA ADPCM
Endian Type : little
Reverse Nibbles: no
Reverse Bits : no

sox sox: Overwriting `out.wav'
sox wav: Writing Wave file: IMA ADPCM format, 1 channel, 8000 samp/sec
sox wav: 4055 byte/sec, 256 block align, 4 bits/samp

Output File : 'out.wav'
Channels : 1
Sample Rate : 8000
Precision : 13-bit
Duration : 00:08:47.95 = 4223601 samples ~ 39596.3 CDDA sectors
Sample Encoding: 4-bit IMA ADPCM
Endian Type : little
Reverse Nibbles: no
Reverse Bits : no
Comment : 'Processed by SoX'

sox sox: effects chain: input 8000Hz 1 channels 13 bits (multi)
sox sox: effects chain: silence 8000Hz 1 channels 13 bits (multi)
sox sox: effects chain: output 8000Hz 1 channels 13 bits (multi)
sox wav: Finished writing Wave file, 0 data bytes 0 samples

M:\>dir *.wav
05/06/2009 15:04 60 out.wav
03/06/2009 20:10 2,126,848 z.WAV

Best wishes, pol098

Discussion

  • robs - 2009-06-05

    I think this is a bug that occurs when the encoding is not linear PCM, so you should be able to convert to 16-bit linear PCM, strip the silence, then convert back. In theory, the conversion should be lossless.

    FYI, there's been a fix put in place for this for the next release.

     
  • robs - 2009-06-28

    Believed fixed in 14.3.0

     
  • robs - 2009-06-28
    • status: open --> pending
     
  • SourceForge Robot

    This Tracker item was closed automatically by the system. It was
    previously set to a Pending status, and the original submitter
    did not respond within 14 days (the time period specified by
    the administrator of this Tracker).

     
  • SourceForge Robot

    • status: pending --> closed
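
The workaround suggested in the first comment (convert to 16-bit linear PCM, strip the silence, convert back) might be sketched as the shell function below. This is a sketch under assumptions, not a verified fix. The function name, temporary file names and default arguments are hypothetical. It uses the `-e` and `-b` encoding options, which are assumed to be available in SoX 14.3.0 and later; 14.2.0, the release in this report, selected encodings with older single-letter options instead. It is written for a POSIX shell rather than the Windows command prompt used in the report.

```shell
#!/bin/sh
# Sketch of the suggested workaround: decode to 16-bit linear PCM,
# strip silence there, then re-encode as IMA ADPCM.
# Temporary file names and defaults are illustrative assumptions.
strip_silence_via_pcm() {
    in=$1
    out=$2
    level=${3:--30d}       # silence threshold (default -30d, as in the report)
    duration=${4:-0.2}     # minimum duration in seconds (default 0.2)
    tmp_pcm="${out%.wav}.pcm.wav"
    tmp_cut="${out%.wav}.cut.wav"

    # 1. ADPCM -> 16-bit signed linear PCM
    sox "$in" -e signed-integer -b 16 "$tmp_pcm" &&
    # 2. strip leading and embedded silences on the PCM copy
    sox "$tmp_pcm" "$tmp_cut" silence 1 "$duration" "$level" -1 "$duration" "$level" &&
    # 3. PCM -> IMA ADPCM, matching the original encoding
    sox "$tmp_cut" -e ima-adpcm "$out"
    status=$?

    # Remove the intermediate files whether or not the chain succeeded.
    rm -f "$tmp_pcm" "$tmp_cut"
    return $status
}
```

With the files from the report, it would be called as `strip_silence_via_pcm z.wav out.wav`, optionally followed by a different level and duration. Keeping the three steps as separate sox invocations means the silence effect only ever sees linear PCM, which is the condition under which the comment expects it to work.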
     
