
Help with acoustic model adaptation?

Help
2014-08-15
2016-04-28
  • Tayyab Atiq

    Tayyab Atiq - 2014-08-15

    I'm doing acoustic model adaptation for pocketsphinx and have a few questions:

    1. Must the audio files used for training be in .wav format? Can I use .raw or .pcm files (no header, no bit-rate information) for training?
    2. Since I'm using the FreeSWITCH version of pocketsphinx, which uses the "wsj1" model, am I right to assume that (a) this model has already been trained on call-center-like data? Or (b) do I need to train it on one of the Sphinx databases or on my own data? Or (c) is there some other model that is already trained on call-center data?
    3. The adaptation page says:
      "Make sure you are using the full model with the mixture_weights file present."
      Then it says:
      "Sometimes sendump file can be converted back to mixture_weights file. This is only possible for an older sendump files."
      If the answer to (2) is (a), where can I find the full model for it? What does sendump.py return if the conversion is not possible?
    4. After the adaptation is complete, is the mixture_weights file also updated? If I need to re-train this newly updated model, can I use this mixture_weights file, or do I need to convert the sendump again?
     
    • Nickolay V. Shmyrev

      Must the audio files used for training be in .wav format? Can I use .raw or .pcm files (no header, no bit-rate information) for training?

      You can do that, but make sure that the bit width and sample rate match the feature extraction specification.
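      Because a .raw file has no header, nothing in the file itself records its format. The sketch below uses only Python's standard wave module to wrap headerless PCM in a WAV header and to check a WAV header against an expected format. The target values (8000 Hz, 16-bit, mono) are an assumption chosen to suit an 8 kHz telephony model; substitute whatever your model's feature extraction actually expects. The wave module writes the bytes as-is, so the raw data is assumed to already be little-endian signed 16-bit samples. The function names are hypothetical helpers, not part of pocketsphinx.

```python
import wave

# Assumed target format for an 8 kHz telephony model (adjust to your model).
EXPECTED_RATE = 8000       # samples per second
EXPECTED_SAMPWIDTH = 2     # bytes per sample (16-bit)
EXPECTED_CHANNELS = 1      # mono


def raw_to_wav(raw_path, wav_path, rate=EXPECTED_RATE,
               sampwidth=EXPECTED_SAMPWIDTH, channels=EXPECTED_CHANNELS):
    """Wrap headerless PCM in a WAV header.

    The caller must already know (or assume) the rate, width and channel
    count, because a .raw file does not store them.
    """
    with open(raw_path, "rb") as f:
        data = f.read()
    frame_size = sampwidth * channels
    if len(data) % frame_size:
        # A partial frame suggests the assumed format is wrong.
        raise ValueError("file size is not a whole number of frames")
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sampwidth)
        w.setframerate(rate)
        w.writeframes(data)


def wav_matches_spec(path, rate=EXPECTED_RATE, sampwidth=EXPECTED_SAMPWIDTH,
                     channels=EXPECTED_CHANNELS):
    """Compare a WAV header with the expected (rate, width, channels)."""
    with wave.open(path, "rb") as w:
        actual = (w.getframerate(), w.getsampwidth(), w.getnchannels())
    expected = (rate, sampwidth, channels)
    return actual == expected, {"actual": actual, "expected": expected}
```

      A header check like this only confirms what the header claims; for headerless files the assumed format still has to be right.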

      Since I'm using the FreeSWITCH version of pocketsphinx, which uses the "wsj1" model, am I right to assume that (a) this model has already been trained on call-center-like data?

      No, the WSJ model is trained on broadcast news data.

      Or (b) do I need to train it on one of the Sphinx databases or on my own data?

      That depends on what you want to achieve.

      Or (c) is there some other model that is already trained on call-center data?

      There are no good models specifically for call centers; the most reasonable model to try is en-us-8khz.

      After the adaptation is complete, is the mixture_weights file also updated? If I need to re-train this newly updated model, can I use this mixture_weights file, or do I need to convert the sendump again?

      The sendump file is just a compressed form of the mixture weights for resource-limited environments (mobile). If you are running on a server, you don't need to deal with sendumps. Also, it's better to use continuous models, which have no sendump; they are more accurate.

       
  • Tayyab Atiq

    Tayyab Atiq - 2014-08-18

    You can do that, but make sure that the bit width and sample rate match the feature extraction specification.

    I receive 8 kHz audio (a call), which is then saved to a folder via pocketsphinx's "-rawlogdir" parameter. I'm using these saved utterances for training. Do I need to resample them or do anything else?

    That depends on what you want to achieve.

    I want to adapt/train an acoustic model best for:
    Device: Telephony
    Accent: Generic English accent + South Asian (Indian-English) accent

    There are no good models specifically for call centers; the most reasonable model to try is en-us-8khz.

    Is this model trained on any specific type of data?

    The sendump file is just a compressed form of the mixture weights for resource-limited environments (mobile). If you are running on a server, you don't need to deal with sendumps. Also, it's better to use continuous models, which have no sendump; they are more accurate.

    You mean I should pass the model with "-hmm" and the mixture_weights file with the "-mixw" parameter?

     
  • Nickolay V. Shmyrev

    I receive 8 kHz audio (a call), which is then saved to a folder via pocketsphinx's "-rawlogdir" parameter. I'm using these saved utterances for training. Do I need to resample them or do anything else?

    No idea; you might receive PCM, ADPCM or something else. You need to keep an eye on this.

    Accent: Generic English accent + South Asian (Indian-English) accent

    Unfortunately, there is no good Indian English database or model yet.

    Is this model trained on any specific type of data?

    No, but it's still the most accurate model you can get for free.

    You mean I should pass the model with "-hmm" and the mixture_weights file with the "-mixw" parameter?

    You can use continuous models with just the -hmm option; there is no need to use -mixw. Continuous models simply don't have a sendump.

     
  • Tayyab Atiq

    Tayyab Atiq - 2014-08-20

    No idea; you might receive PCM, ADPCM or something else. You need to keep an eye on this.

    To play these files, I have to give VLC these parameters: "--demux=rawaud --rawaud-channels 1 --rawaud-samplerate 8000 filename.raw"
    And they play fine, so I guess they are already 8 kHz.
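    Playback that sounds right is a good sign. Another cheap check is arithmetic on the file size: headerless audio of a given encoding has a fixed byte rate, so size divided by byte rate gives an implied duration. The sketch below compares a few candidate encodings against a duration you already know from an independent source (for example, a call log). Do not use the length a player reports, because the player computes it from the same format you are assuming. The candidate list is an assumption, the ADPCM rate assumes 4 bits per sample and ignores block headers, and encodings with the same byte rate cannot be told apart this way.

```python
import os

# Assumed candidate encodings for 8 kHz mono telephony audio: bytes per second.
CANDIDATES = {
    "16-bit PCM, 8 kHz": 8000 * 2,
    "8-bit (e.g. mu-law/A-law), 8 kHz": 8000 * 1,
    "4-bit ADPCM, 8 kHz (approx.)": 8000 // 2,
    "16-bit PCM, 16 kHz": 16000 * 2,
}


def implied_durations(num_bytes):
    """Seconds of audio each candidate encoding would imply for num_bytes."""
    return {name: num_bytes / bps for name, bps in CANDIDATES.items()}


def implied_durations_for_file(path):
    """Same as implied_durations, using the size of a file on disk."""
    return implied_durations(os.path.getsize(path))


def best_guess(num_bytes, known_seconds):
    """Candidate whose implied duration is closest to the known length."""
    durations = implied_durations(num_bytes)
    return min(durations, key=lambda name: abs(durations[name] - known_seconds))
```

    For example, a 160000-byte file that is known to last about 10 seconds is consistent with 16-bit PCM at 8 kHz.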

    You can use continuous models with just the -hmm option; there is no need to use -mixw. Continuous models simply don't have a sendump.

    What I mean is: how do I tell pocketsphinx, when initializing the decoder object, to use mixture_weights instead of sendump? Or don't I need to?

    Also, could you please elaborate, or post a link to a page describing the difference between continuous, semi-continuous and PTM models?

     
  • Nickolay V. Shmyrev

    And they play fine, so I guess they are already 8 kHz.

    Great, then.

    What I mean is: how do I tell pocketsphinx, when initializing the decoder object, to use mixture_weights instead of sendump? Or don't I need to?

    mixture_weights will be used automatically if present.
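    Since the choice is made from what is present in the model directory, the practical question is which files that directory contains. The sketch below is a hypothetical helper, not pocketsphinx code. It mirrors the behavior described in this reply: prefer a full mixture_weights file, and fall back to a sendump only if no mixture_weights file exists. The file names come from this thread; the helper's name and return values are illustrative.

```python
from pathlib import Path


def mixture_weights_source(model_dir):
    """Report which mixture-weight file a model directory provides.

    Returns "mixture_weights" if the full file is present, "sendump" if only
    the compressed form is present, or None if neither exists.
    """
    d = Path(model_dir)
    if (d / "mixture_weights").is_file():
        return "mixture_weights"  # full weights, needed for adaptation
    if (d / "sendump").is_file():
        return "sendump"  # compressed weights only
    return None
```

    Running it on the directory passed to -hmm shows whether that model can be adapted as-is or needs a mixture_weights file first.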

    Also, could you please elaborate, or post a link to a page describing the difference between continuous, semi-continuous and PTM models?

    http://www.ar.media.kyoto-u.ac.jp/EN/bib/intl/LEE-ICASSP00.pdf
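    Roughly, the three model types differ in which Gaussians their senones (tied HMM states) share. The sketch below counts Gaussians and mixture weights under simplified assumptions, not values taken from any Sphinx model or tool. Continuous models give each senone its own Gaussians. Semi-continuous models share one codebook per feature stream, assumed here to hold 256 Gaussians, across all senones. PTM models share one codebook per base phone among that phone's senones. A single feature stream is assumed for the continuous and PTM cases, and means, variances and transition matrices are ignored.

```python
def continuous(n_senones, n_density):
    """Each senone owns n_density Gaussians and one weight per Gaussian."""
    return {"gaussians": n_senones * n_density,
            "mixture_weights": n_senones * n_density}


def semi_continuous(n_senones, n_streams, codebook_size=256):
    """All senones share one codebook per stream and weight every entry."""
    return {"gaussians": n_streams * codebook_size,
            "mixture_weights": n_senones * n_streams * codebook_size}


def phonetically_tied(n_senones, n_base_phones, n_density):
    """Senones of the same base phone share that phone's codebook."""
    return {"gaussians": n_base_phones * n_density,
            "mixture_weights": n_senones * n_density}


# Hypothetical sizes, for illustration only.
EXAMPLE = {
    "continuous": continuous(5000, 32),
    "semi-continuous": semi_continuous(5000, 4),
    "PTM": phonetically_tied(5000, 40, 128),
}
```

    With these made-up sizes, the semi-continuous model has few Gaussians but millions of mixture weights. This fits the description of sendump above as a compressed form of the mixture weights, and of continuous models as having none.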

     
