wave2feat help request

Speech Recognition Toolkit

Brought to you by: air, arthchan2003, awb, bhiksha, and 5 others

Forum: Help

Created: 2010-08-18

Updated: 2012-09-22

jvs - 2010-08-18

I would like to work directly with the cepstra generated by wave2feat (not
using Sphinx recognition or training) and am wondering if my generated cepstra
are correct.

The audio consists of high-quality recordings of single words with a small
amount of leading and trailing silence. The output is a sequence of roughly
100 to 200 cepstral vectors. The wave2feat options I used are:

wave2feat -i file.wav -o file.mfc -samprate 44100 -nfft 2048 -mswav yes

The nfft option was needed to avoid an error message. Looking at the generated
cepstral vectors for one case I see an occasional NaN, which I replaced with a
small float value in order to make forward progress. Subsequent examination of
the cepstral coefficients themselves shows wide magnitude swings for individual
values.

For example, the first cepstral coefficient has the values:
-1e+34 1e-34 2e-02 -2e-12 ee+21
and so on. Examination of the time behavior of other coefficients shows
similar swings. I would have expected more limited movement during the
stationary portions of the speech.

My worry is that something about the wave2feat options I needed to use is
causing a problem with the cepstra generation. I have not yet trimmed silence,
but I did check that dithering is on, so I would not expect the silence to be
an issue.

Any advice would be appreciated. Thank you.

Nickolay V. Shmyrev - 2010-08-18

wave2feat doesn't support a sampling rate of 44100 Hz. It only supports 16 kHz
and 8 kHz. You need to resample your audio with another tool before feeding it
into wave2feat.
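
A quick way to catch this before running wave2feat is to read the sampling rate from the WAV header. The following is only a minimal sketch using Python's standard `wave` module. The function name `needs_resampling` and the `SUPPORTED_RATES` tuple are hypothetical, and the set of accepted rates is taken from the statement above rather than from the wave2feat source.

```python
import wave

# Rates that the reply above says wave2feat handles (an assumption taken
# from this thread, not checked against the wave2feat source).
SUPPORTED_RATES = (8000, 16000)


def needs_resampling(wav_source):
    """Return (rate, flag) for a PCM WAV path or binary file-like object.

    flag is True when the header's sampling rate is not in SUPPORTED_RATES.
    """
    with wave.open(wav_source, "rb") as wav:
        rate = wav.getframerate()
    return rate, rate not in SUPPORTED_RATES
```

For example, `needs_resampling("file.wav")` on the 44100 Hz recording described above would return `(44100, True)`, signalling that the file should be resampled first.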

jvs - 2010-08-19

Many thanks. I used sox to resample to 16k and the results look better.

There are still a number of NaNs being emitted, which may be due to the code
converting from a double to a float without range checking. This conversion
occurs just before writing the output values.
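
To pin down where the NaNs occur, the output file can be scanned directly. The sketch below is only an illustration and rests on two assumptions not stated in this thread: that the .mfc file starts with a 4-byte integer giving the number of 32-bit floats that follow, and that each frame holds 13 coefficients. The byte order is guessed by checking which interpretation of the header matches the file size. The names `read_mfc` and `count_non_finite` are hypothetical helpers.

```python
import math
import struct


def read_mfc(data, n_ceps=13):
    """Split the raw bytes of a feature file into frames of n_ceps floats.

    Assumed layout: int32 count of float32 values, then the values.
    """
    if len(data) < 4:
        raise ValueError("file too short to contain a header")
    n_values = (len(data) - 4) // 4
    # Try both byte orders and keep the one whose header matches the size.
    for order in ("<", ">"):
        (header,) = struct.unpack(order + "i", data[:4])
        if header == n_values:
            break
    else:
        raise ValueError("header does not match file size in either byte order")
    values = struct.unpack(f"{order}{n_values}f", data[4:4 + 4 * n_values])
    return [values[i:i + n_ceps] for i in range(0, n_values, n_ceps)]


def count_non_finite(frames):
    """Count NaN or infinite values at each coefficient index."""
    counts = [0] * max((len(f) for f in frames), default=0)
    for frame in frames:
        for i, value in enumerate(frame):
            if not math.isfinite(value):
                counts[i] += 1
    return counts
```

A possible use is `count_non_finite(read_mfc(open("file.mfc", "rb").read()))`, which reports how many bad values each coefficient has and so shows whether the problem is confined to particular coefficients.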

Nickolay V. Shmyrev - 2010-08-20

I don't think it's range checking. I suggest you provide the files so that we
can reproduce your problem.
