wave2feat usage

Forum: Help

Creator: Yakun Hu

Created: 2011-03-02

Updated: 2012-09-22

Yakun Hu - 2011-03-02

Hi,

I am using SphinxTrain/wave2feat to extract MFCC features. It is said that
wave2feat only supports .wav files with a sampling rate below 20 kHz. In
fe_interface.c, I changed "FE->FRAME_SIZE =
(int32)(FE->WINDOW_LENGTH * FE->SAMPLING_RATE + 0.5);" to "FE->FRAME_SIZE =
(int32)(FE->WINDOW_LENGTH * FE->SAMPLING_RATE);". It now runs on 20 kHz .wav
files. However, every time I extract MFCC features from the same speech file,
I get different values (on average, the difference can be larger than 0.02).
For files with a sampling rate below 20 kHz, the difference is still there
but small enough to ignore (e.g. less than 0.005).
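
For readers comparing the two versions of the FRAME_SIZE line quoted above, here is a minimal sketch of the arithmetic in isolation. The function names are hypothetical, and double is used for simplicity; the field types in the real front end may differ.

```c
#include <stdint.h>

/* Frame size rounded to the nearest sample, as in the original line
   (the "+ 0.5" before the cast). Hypothetical helper, not Sphinx code. */
int32_t frame_size_rounded(double window_length_s, double sampling_rate_hz)
{
    return (int32_t)(window_length_s * sampling_rate_hz + 0.5);
}

/* Frame size truncated toward zero, as in the revised line
   (no "+ 0.5"). Hypothetical helper, not Sphinx code. */
int32_t frame_size_truncated(double window_length_s, double sampling_rate_hz)
{
    return (int32_t)(window_length_s * sampling_rate_hz);
}
```

The two results differ by at most one sample, and only when the fractional part of window length times sampling rate is 0.5 or more. For example, with an illustrative window length of 0.02563 s (not a value taken from this thread), a 20 kHz file gives 513 samples with rounding and 512 with truncation, while a 16 kHz file gives 410 either way.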

I have two questions. First, does wave2feat produce slightly different output
for the same .wav file on each run, and if so, why? Second, does this mean
that wave2feat cannot support 20 kHz files, even with the revision mentioned
above? I ask because I used the revised version for speaker recognition on a
set of 20 kHz .wav files and the performance is excellent.

Your help is highly appreciated!

Nickolay V. Shmyrev - 2011-03-02

Small differences are caused by random noise added to avoid numerical
overflow in regions of zero energy. To turn this feature off, use the option
"-dither no".
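
To illustrate why dithering makes the output differ from run to run, here is a minimal sketch of the idea. It is not the Sphinx implementation: the random generator, the "add 1 to about one sample in four" rule, and the function names are all assumptions chosen for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Small linear congruential generator so the sketch is reproducible.
   This is a stand-in, NOT the generator used by wave2feat or sphinx_fe. */
static uint32_t lcg_next(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state >> 16;
}

/* Add 1 to roughly one sample in four (hypothetical dither rule).
   A different seed perturbs different samples, so the features change. */
void add_dither(int16_t *samples, size_t n, uint32_t seed)
{
    uint32_t state = seed;
    for (size_t i = 0; i < n; ++i) {
        if (lcg_next(&state) % 4u == 0u && samples[i] < INT16_MAX) {
            samples[i] = (int16_t)(samples[i] + 1);
        }
    }
}

/* Sum of squared samples. For a silent frame this is exactly zero,
   which is the case dithering is meant to avoid before taking a log. */
double frame_energy(const int16_t *samples, size_t n)
{
    double energy = 0.0;
    for (size_t i = 0; i < n; ++i) {
        energy += (double)samples[i] * (double)samples[i];
    }
    return energy;
}
```

With a fixed seed the perturbation is identical on every run, so the features are stable. With a seed that changes between runs, every extraction of the same file gives slightly different values, which matches the behaviour described in the question.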

Yakun Hu - 2011-03-03

Many thanks. I turned the feature off and now get stable features. I still
have a question: why is the difference larger at 20 kHz than at 16 kHz? I
wonder whether wave2feat can produce accurate MFCC features for 20 kHz files.
Can I use wave2feat for 20 kHz files (with the revision I mentioned)?
Thank you very much!

Nickolay V. Shmyrev - 2011-03-03

> I still have a question, why the difference in the case of 20khz is larger
> than the one in the case of 16kHz?

The dither distorts every sample. More samples to distort means a bigger
total difference.

> I wonder if wave2feat can produce accurate MFCC feature for 20kHz files?

Yes.

> Can I use wave2feat for 20kHz files(with the revision I mentioned)?

It's actually recommended to use sphinx_fe from sphinxbase rather than
wave2feat. wave2feat will be removed soon. You should be able to use both for
20 kHz files.

I don't think your revision is important or has any reason behind it.

Yakun Hu - 2011-05-24

Thanks for the explanation. It helped a lot.
Now I wonder whether wave2feat supports two-channel speech and speech with a
sampling rate higher than 20 kHz. I downloaded some mp3 files to use as input.
Many thanks!

Yakun Hu - 2011-05-24

Thanks for the explanation. It helped a lot.

The speech files I am using as input have more than one channel and a
sampling rate higher than 20 kHz. Can I still use wave2feat to extract MFCCs?
Many thanks!

Pranav Jawale - 2011-05-24

You can split the two channels and pass the two separate files to
wave2feat/sphinx_fe.
Use the Audacity audio editor to split a stereo wav file.

Yakun Hu - 2011-05-24

Thanks a lot. So you mean I can use the result obtained from either of the
two separate files? Will they yield the same result? Also, what about the
sampling rate? Does the code support files with a sampling rate higher than
20 kHz?

Pranav Jawale - 2011-05-24

It depends on which channel your speech is in! Better still, you can merge
the two channels into a single channel (L+R) and then extract features from
it; this way you won't lose any info.

I don't know if there is any limit on the sampling rate, but even if there
is, you can downsample to 16 / 20 kHz.
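
The channel merge suggested above can be sketched as follows, assuming 16-bit interleaved stereo samples already read from the file. The function name is hypothetical. The sketch averages the two channels rather than taking a plain L+R sum, an assumption made here so that the result always fits in 16 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Merge interleaved stereo samples (L0 R0 L1 R1 ...) into mono.
   Uses (L + R) / 2 instead of L + R to avoid 16-bit overflow;
   that choice is an assumption of this sketch.
   `frames` is the number of stereo sample pairs; `mono` must hold
   at least `frames` samples. */
void downmix_stereo_to_mono(const int16_t *stereo, int16_t *mono, size_t frames)
{
    for (size_t i = 0; i < frames; ++i) {
        int32_t sum = (int32_t)stereo[2 * i] + (int32_t)stereo[2 * i + 1];
        mono[i] = (int16_t)(sum / 2);
    }
}
```

The mono buffer can then be written out as a single-channel file and passed to the feature extractor. Downsampling is a separate step that needs a low-pass filter and is not covered by this sketch.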

Yakun Hu - 2011-05-24

Thanks. Is there any tool that can be used for converting .mp3 to .wav and
also doing downsampling?
