CMU Sphinx / Forums / Help: Help with acoustic model adaptation?

Tayyab Atiq - 2014-08-15

I'm doing acoustic model adaptation for pocketsphinx. I have a few questions:

1. Must the format of the audio files used for training be .wav? Can I use .raw or .pcm (no header, no bit-rate information) file for training?

2. Since I'm using freeswitch version of pocketsphinx, and it uses "wsj1" model, I'm assuming that this model has been (a) already trained on call center-like data? Or do (b) I need to train it from one of sphinx dbs or my own data? Or (c) There is some other model which is already trained on call-center data?

3. The adaptation page says:
"Make sure you are using the full model with the mixture_weights file present."
Then it says:
"Sometimes sendump file can be converted back to mixture_weights file. This is only possible for an older sendump files."
If the answer to (2) was (a), where can I find the full model for this? What does sendump.py return if conversion is not possible?

4. After the adaptation is complete, is the mixture_weights file also updated? If I need to re-train this newly updated model, can I use this mixture_weights or do I need to convert sendump again?
- Nickolay V. Shmyrev - 2014-08-16
  
  Must the format of the audio files used for training be .wav? Can I use .raw or .pcm (no header, no bit-rate information) file for training?
  
  You can do that, but make sure that the bit width and sample rate match the feature extraction specification.
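
One way to make those parameters explicit is to wrap the headerless samples in a WAV container. This is only a minimal sketch using Python's standard `wave` module. It assumes the raw files hold 16-bit linear PCM, mono, at 8000 Hz, which is an assumption and not something confirmed in this thread. The function name `raw_to_wav` and its defaults are illustrative.

```python
import wave


def raw_to_wav(raw_path, wav_path, sample_rate=8000, sample_width=2, channels=1):
    """Wrap headerless linear PCM samples in a WAV container.

    The defaults (8 kHz, 16-bit, mono) are assumptions; they must match
    what the recorder actually wrote, or the audio will be misinterpreted.
    Returns the number of frames written.
    """
    with open(raw_path, "rb") as f:
        pcm = f.read()

    frame_size = sample_width * channels
    if len(pcm) % frame_size != 0:
        # A partial frame suggests the assumed format is wrong.
        raise ValueError("file size is not a whole number of frames")

    with wave.open(wav_path, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sample_width)
        w.setframerate(sample_rate)
        w.writeframes(pcm)

    return len(pcm) // frame_size
```

If the data is not linear PCM (for example a compressed telephony encoding), the resulting WAV will play as noise, which is itself a useful signal that the assumed format is wrong.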
  
  Since I'm using freeswitch version of pocketsphinx, and it uses "wsj1" model, I'm assuming that this model has been (a) already trained on call center-like data?
  
  No, wsj model is trained on broadcast news data
  
  Or do (b) I need to train it from one of sphinx dbs or my own data?
  
  Depends on what you want to get as a result
  
  Or (c) There is some other model which is already trained on call-center data?
  
  There are no good models for call centers specifically; the most reasonable model to try is en-us-8khz.
  
  After the adaptation is complete, is the mixture_weights file also updated? If I need to re-train this newly updated model, can I use this mixture_weights or do I need to convert sendump again?
  
  Sendump is just compressed mixture weights for limited-resource environments (mobile). If you are running on a server, you don't need to deal with sendumps. Also, it's better to use continuous models without sendump; they are more accurate.
  

Tayyab Atiq - 2014-08-18

You can do that, but make sure that the bit width and sample rate match the feature extraction specification.

I receive 8 kHz audio (a call), which is then saved in a folder by the "-rawlogdir" parameter of pocketsphinx. I'm using these saved utterances for training. Do I need to re-sample or anything?

Depends on what you want to get as a result

I want to adapt/train an acoustic model best for:
Device: Telephony
Accent: Generic English accent + South Asian (Indian-English) accent

There are no good models for call centers specifically; the most reasonable model to try is en-us-8khz.

Is this model trained on any type of data?

Sendump is just compressed mixture weights for limited-resource environments (mobile). If you are running on a server, you don't need to deal with sendumps. Also, it's better to use continuous models without sendump; they are more accurate.

You mean give the model by "-hmm" and then give the mixture_weights file by "-mixw" param?


Nickolay V. Shmyrev - 2014-08-19

I receive 8 kHz audio (a call), which is then saved in a folder by the "-rawlogdir" parameter of pocketsphinx. I'm using these saved utterances for training. Do I need to re-sample or anything?

No idea; you might receive PCM or ADPCM or something else. You need to keep an eye on this.

Accent: Generic English accent + South Asian (Indian-English) accent

Unfortunately there is no good Indian English database and model yet

Is this model trained on any type of data?

No, still it's the most accurate model you can get for free.

You mean give the model by "-hmm" and then give the mixture_weights file by "-mixw" param?

You can use continuous models with just the -hmm option. There is no need to use -mixw. Continuous models simply don't have a sendump.


Tayyab Atiq - 2014-08-20

No idea; you might receive PCM or ADPCM or something else. You need to keep an eye on this.

To play these files, I need to give these params to vlc: "--demux=rawaud --rawaud-channels 1 --rawaud-samplerate 8000 filename.raw"
And they run fine. So I guess they might be in 8k already.
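
Playback at a given rate is a good hint, though not proof of the sample width. As a further sanity check, sketched here under the assumption of 16-bit mono linear PCM (the helper name `raw_duration_seconds` is hypothetical), the duration implied by the file size can be compared with the known length of the call:

```python
import os


def raw_duration_seconds(path, sample_rate=8000, sample_width=2, channels=1):
    """Duration implied by the file size for headerless linear PCM.

    sample_width is in bytes; the defaults are assumptions, not facts
    about any particular recording.
    """
    size_bytes = os.path.getsize(path)
    bytes_per_second = sample_rate * sample_width * channels
    return size_bytes / bytes_per_second
```

If the computed duration is about half or double the real call length, the assumed sample width or rate is probably wrong.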

You can use continuous models with just the -hmm option. There is no need to use -mixw. Continuous models simply don't have a sendump.

I mean to ask how should I specify to pocketsphinx while initializing the decoder object to use mixture_weights instead of sendump? Or I don't need to?

Also, can you please elaborate or post a link to a page describing the difference between continuous, semi and PTM model?


Nickolay V. Shmyrev - 2014-08-22

And they run fine. So I guess they might be in 8k already.

Great then

I mean to ask how should I specify to pocketsphinx while initializing the decoder object to use mixture_weights instead of sendump? Or I don't need to?

mixture_weights will be used automatically if present.
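
To see which of the two files a model directory (the one passed with -hmm) actually contains, a small check like the following may help. It is a sketch using only the Python standard library and does not call pocketsphinx; the function name `mixture_weight_source` is hypothetical, and the preference order simply mirrors the statement above rather than pocketsphinx's loading code.

```python
import os


def mixture_weight_source(model_dir):
    """Report which mixture-weight file is present in a model directory.

    Returns "mixture_weights" if that file exists, otherwise "sendump"
    if only the compressed file exists, otherwise None.
    """
    if os.path.isfile(os.path.join(model_dir, "mixture_weights")):
        return "mixture_weights"
    if os.path.isfile(os.path.join(model_dir, "sendump")):
        return "sendump"
    return None
```

A result of "sendump" on a model you intend to adapt would indicate that the full model with mixture_weights is still needed.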

Also, can you please elaborate or post a link to a page describing the difference between continuous, semi and PTM model?

http://www.ar.media.kyoto-u.ac.jp/EN/bib/intl/LEE-ICASSP00.pdf
