
MMIE, LDA-MLLT

creative64
2010-07-26
2012-09-22
  • creative64

    creative64 - 2010-07-26

    Hi,

    I have a few basic questions about some commonly seen terms in speech recognition / Sphinx.
    A couple of lines of explanation or a pointer to
    a link describing these will be enough.

    1. Are MMIE and MLE basically different ways to arrive at HMM parameters ? Does the decoder need to know
      whether the acoustic model was trained using the MMIE or MLE algorithm ?

    2. What exactly is the LDA/MLLT feature transformation ? Does the decoder need to know whether the acoustic
      model was trained using LDA/MLLT ? Do these go together, or is there such a thing as only an LDA or only an
      MLLT transformation ?

    3. Can all of these techniques be applied to semicontinuous acoustic models ?

    4. Which of the above techniques were used to train hub4wsj_sc_8k model ?

    5. What does 8k in hub4wsj_sc_8k signify ? Is it the number of senones or the sampling frequency used to create the model ?

    Thanks and regards,

     
  • Nickolay V. Shmyrev

    1. Are MMIE and MLE basically different ways to arrive at HMM parameters ?

    They are different ways to estimate the parameters

    Does the decoder need to know if the acoustic model was trained using MMIE or
    MLE algorithms ?

    No
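
    For reference, the two criteria can be written down side by side (these are the
    standard textbook objectives, not anything Sphinx-specific). MLE raises the
    likelihood of the training data given the correct transcriptions only, while
    MMIE also pushes down the likelihood of competing transcriptions:

```latex
% MLE: maximize the likelihood of each utterance O_r given its
% correct transcription W_r
F_{\mathrm{MLE}}(\lambda) = \sum_{r} \log p_{\lambda}(O_r \mid W_r)

% MMIE: maximize the posterior of the correct transcription against
% all competing word sequences W, weighted by the language model P(W)
F_{\mathrm{MMIE}}(\lambda) = \sum_{r} \log
  \frac{p_{\lambda}(O_r \mid W_r)\, P(W_r)}
       {\sum_{W} p_{\lambda}(O_r \mid W)\, P(W)}
```

    Either way the result is an ordinary set of HMM parameters, which is why the
    decoder does not need to know which criterion was used.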

    2. What exactly is the LDA/MLLT feature transformation?

    The feature vector is multiplied by a matrix. In the LDA case this matrix is
    chosen to reduce the dimension of the feature vector by selecting
    the main components. In the MLLT case the additional property of diagonal
    covariance is improved.

    Does decoder need to know if the acoustic model was trained using LDA/MLLT?

    Yes

    Do these go together, or is there such a thing as only an LDA or only an MLLT
    transformation ?

    There can be only LDA or only MLLT, but in practice it's a single matrix which
    is the product of the MLLT matrix and the LDA matrix.
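
    As a toy illustration of the single-matrix point (the dimensions and matrices
    below are made up for the example, not taken from any real model), applying the
    two transforms in sequence is exactly the same as applying the one precomputed
    product matrix that gets stored:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: LDA projects 52-dim spliced features down to 39 dims,
# MLLT is then a square 39x39 transform that improves the fit of
# diagonal-covariance Gaussians.
lda = rng.standard_normal((39, 52))
mllt = rng.standard_normal((39, 39))
x = rng.standard_normal(52)  # one input feature vector

# Applying MLLT after LDA ...
y_two_step = mllt @ (lda @ x)

# ... equals applying the single stored matrix (MLLT times LDA) once.
combined = mllt @ lda
y_one_step = combined @ x

assert np.allclose(y_two_step, y_one_step)
assert y_one_step.shape == (39,)
```

    This is just matrix-multiplication associativity, which is why trainers can
    hand the decoder a single matrix.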

    3. Can all of these techniques be applied to semicontinuous acoustic models ?

    No

    4. Which of the above techniques were used to train the hub4wsj_sc_8k model?

    MMIE probably, I'm not sure. As a semicontinuous model it doesn't use a
    feature-space transformation.

    5. What does 8k in hub4wsj_sc_8k signify ? Is it the number of senones or
      the sampling frequency used to create the model ?

    Sample rate. See -upperf 4000 in feat.params, which means that 8 kHz audio can
    be decoded with this model.
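
    The connection between -upperf and the sample rate is that the top mel-filter
    edge must stay at or below the Nyquist frequency, i.e. half the sampling rate
    (the helper below is just an illustration of that standard rule):

```python
def upperf_fits(upperf_hz, sample_rate_hz):
    """The top mel-filter edge (-upperf) must not exceed half the
    sampling rate (the Nyquist frequency)."""
    return upperf_hz <= sample_rate_hz / 2

# hub4wsj_sc_8k: -upperf 4000 fits 8 kHz audio exactly.
assert upperf_fits(4000, 8000)

# A 16 kHz model's -upperf 6855.4976 does not fit 8 kHz audio.
assert not upperf_fits(6855.4976, 8000)
```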

     
  • creative64

    creative64 - 2010-07-26

    Thanks a lot NS.

     
  • creative64

    creative64 - 2010-07-27

    Hi NS,

    A few more questions as an afterthought:

    1. What is the bit width of the data (8 bit or 16 bit) used to create the hub4wsj_sc_8k model ? For optimum decoding accuracy, the bit width of the "data to be decoded" should match the width of the "training data", right ? Is this value specified somewhere in the model definition files ?

    2. For training an acoustic model using SphinxTrain, where exactly do I need to specify the parameters (say, in order to get a feat.params exactly like that of hub4wsj_sc_8k) ? I have 16 bit audio recorded at 16 kHz.

    Thanks and regards,

     
  • Nickolay V. Shmyrev

    1. What is the bit width of the data (8 bit or 16 bit) used to create
      the hub4wsj_sc_8k model ? For optimum decoding accuracy, the bit width of "data to
      be decoded" should match the width of "training data", right? Is this value
      specified somewhere in the model definition files?

    The bit width must always be 16. There is no such configuration option because
    8 bit is simply not supported.

    2. For training an acoustic model using SphinxTrain, where exactly do I
      need to specify the parameters (say, in order to get a feat.params exactly like
      that of hub4wsj_sc_8k) ? I have 16 bit audio recorded at 16 kHz.

    Right now you need to edit ./scripts_pl/make_feats.pl. The following section:

     $params = <<"EOP";
    -alpha 0.97
    -dither yes
    -doublebw no
    -nfilt 40
    -ncep 13
    -lowerf 133.33334
    -upperf 6855.4976
    -nfft 512
    -wlen 0.0256
    EOP
     
  • creative64

    creative64 - 2010-07-28

    Got it. Thanks NS.

    Regards,

     
  • creative64

    creative64 - 2010-07-30

    Hi NS,

    I'm trying to train a semicontinuous acoustic model for pocketsphinx with
    exactly the same parameters as for hub4wsj_sc_8k. Looking at
    ./scripts_pl/make_feats.pl and ./etc/sphinx_train.cfg, I have the following
    questions:

    1. Where do I specify
      -transform dct
      -round_filters no
      -remove_dc yes
      -svspec 0-12/13-25/26-38
      -cmninit 56,-3,1

    2. What do -svspec and -cmninit specify ?

    3. An acoustic model generated from default an4 settings has parameters like
      -alpha 0.97
      -dither yes
      -doublebw no
      -ncep 13
      which are not seen in hub4wsj_sc_8k parameters. Why ?

    4. What is the meaning of streams ? Does 1s_c_d_dd specify one stream ?

    5. Is 1s_c_d_dd interpreted as 1 stream, cepstrum coefficients, delta and double delta ?

    6. Feature vector length in this case will be 39 right ?

    7. How to interpret s2_4x ?

    Thanks and Regards,

    PS:
    FYI..

    hub4wsj_sc_8k parameters...
    -nfilt 20
    -lowerf 1
    -upperf 4000
    -wlen 0.025
    -transform dct
    -round_filters no
    -remove_dc yes
    -svspec 0-12/13-25/26-38
    -feat 1s_c_d_dd
    -agc none
    -cmn current
    -cmninit 56,-3,1
    -varnorm no

    Default an4.cd_semi_1000 parameters...
    -alpha 0.97
    -dither yes
    -doublebw no
    -nfilt 40
    -ncep 13
    -lowerf 133.33334
    -upperf 6855.4976
    -nfft 512
    -wlen 0.0256
    -transform legacy
    -feat s2_4x
    -agc none
    -cmn current
    -varnorm no

     
  • Nickolay V. Shmyrev

    1. Where do I specify -transform dct -round_filters no -remove_dc yes

    In make_feats.pl

    -svspec 0-12/13-25/26-38

    in sphinx_train.cfg configuration variable CFG_SVSPEC

    -cmninit 56,-3,1

    In feat.params after training

    2. What do -svspec and -cmninit specify ?

    svspec - a specification for subvector quantization; it specifies which
    features to put in each stream

    cmninit - the initial value for live CMN. CMN values are printed for each
    utterance. In order to guess the CMN value quickly, the initial CMN value
    should be close to the average cepstral mean.
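
    A small sketch of how a spec like "0-12/13-25/26-38" carves a feature vector
    into streams (the parser below is illustrative only, not the actual sphinxbase
    code, and it handles only the simple dash/slash form used here):

```python
def parse_svspec(svspec):
    """Turn a subvector spec like "0-12/13-25/26-38" into one list of
    feature indices per stream."""
    streams = []
    for part in svspec.split("/"):
        lo, hi = part.split("-")
        streams.append(list(range(int(lo), int(hi) + 1)))
    return streams

# Split a 39-dim 1s_c_d_dd feature vector into three 13-dim subvectors.
feature = list(range(39))  # stand-in values: index == value
streams = [[feature[i] for i in idxs]
           for idxs in parse_svspec("0-12/13-25/26-38")]

assert [len(s) for s in streams] == [13, 13, 13]
assert streams[1][0] == 13  # second stream starts at coefficient 13
```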

    3. An acoustic model generated from default an4 settings has parameters
      like -alpha 0.97 -dither yes -doublebw no -ncep 13 which are not seen in
      hub4wsj_sc_8k parameters. Why ?

    They are defaults; no need to specify them.

    4. What is the meaning of streams ?

    Each stream is modelled with its own Gaussian distribution, so if parts of the
    feature vector are in theory independent, it makes sense to use streams. You
    can find more information in a textbook.

    Does 1s_c_d_dd specify one stream ?

    Yes

    5. Is 1s_c_d_dd interpreted as 1 stream, cepstrum coefficients, delta and
      double delta ?

    Yes, but it's only an abbreviation. 1s_c_dd, for example, has no meaning.

    6. Feature vector length in this case will be 39 right ?

    Yes

    7. How to interpret s2_4x ?

    4 streams, 51 coefficients:

    1. Cepstrum without c0 (12 coefficients)
    2. Deltas with step 2 + Deltas with step 4 (24 coefficients)
    3. c0, delta c0, delta-delta c0 (3 coefficients)
    4. Delta-delta without c0 (12 coefficients)
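
    The dimensionalities above can be checked with a quick sketch (the delta
    windows below are simplified difference formulas for illustration, not the
    exact computation Sphinx performs):

```python
import numpy as np

ncep = 13
cep = np.random.default_rng(1).standard_normal((100, ncep))  # 100 frames of cepstra

# 1s_c_d_dd: cepstra + deltas + double deltas in a single stream.
delta = np.roll(cep, -2, axis=0) - np.roll(cep, 2, axis=0)       # simplified delta
ddelta = np.roll(delta, -1, axis=0) - np.roll(delta, 1, axis=0)  # simplified delta-delta
feat = np.hstack([cep, delta, ddelta])
assert feat.shape == (100, 39)  # 3 * 13 = 39 coefficients per frame

# s2_4x stream widths as listed above: 12 + 24 + 3 + 12 = 51 coefficients.
s2_4x_widths = [12, 24, 3, 12]
assert sum(s2_4x_widths) == 51
```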
     
  • creative64

    creative64 - 2010-07-31

    Thanks NS,

    Am I setting these parameters correctly in sphinx_train.cfg ?

    $CFG_VECTOR_LENGTH = 39;

    $CFG_FEATURE = "1s_c_d_dd";
    $CFG_NUM_STREAMS = 4; ----- or should it be 1 ?
    $CFG_INITIAL_NUM_DENSITIES = 256;
    $CFG_FINAL_NUM_DENSITIES = 256;

    Regards,

     
  • creative64

    creative64 - 2010-08-02

    Hi NS,

    Please ignore the above question. I've some more and have clubbed them
    together:

    1. Parameters "remove_dc", "transform", "round_filters" and "cmninit" seem to be applicable only in the decoding phase (I'm not able to find them anywhere in the SphinxTrain directory hierarchy). Is this understanding correct ?

    2. If yes, then can they all be put in feat.params after the training is done (you already mentioned that cmninit needs to be put
      there after the training) ?

    3. Do the following settings look OK in sphinx_train.cfg (for a hub4wsj_sc_8k-like semicontinuous model for pocketsphinx) ?
      $CFG_VECTOR_LENGTH = 39;
      $CFG_FEATURE = "1s_c_d_dd";
      $CFG_NUM_STREAMS = 4; ----- or should it be 1 ?
      $CFG_INITIAL_NUM_DENSITIES = 256;
      $CFG_FINAL_NUM_DENSITIES = 256;

    Thanks and regards,

     
  • Nickolay V. Shmyrev

    1. Parameters "remove_dc", "transform", "round_filters" and "cmninit" seem
      to be applicable only in the decoding phase (not able to find them anywhere in
      the SphinxTrain directory hierarchy). Is this understanding correct?

    Please don't ask me questions you can answer yourself.

    2. If yes then can they all be put in feat.params after the training is
      done (you already mentioned that cmninit needs to be put after the training) ?

    Yes, I already mentioned that

    3. Do following setting look OK in sphinx_train.cfg (for a hub4wsj_sc_8k
      like semicont model for pocketsphinx) ? $CFG_VECTOR_LENGTH = 39 $CFG_FEATURE =
      "1s_c_d_dd"; $CFG_NUM_STREAMS = 4; ----- or should it be 1 ?
      $CFG_INITIAL_NUM_DENSITIES = 256; $CFG_FINAL_NUM_DENSITIES = 256;

    No idea, why don't you just try and see if it works.

     
  • creative64

    creative64 - 2010-08-03

    Please don't ask me questions you can answer yourself.

    Wanted to be doubly sure (grepping on Win7 was behaving weirdly)... I'll be
    careful next time. Sorry about this.

    1. When I set

    $CFG_VECTOR_LENGTH = 13;
    $CFG_FEATURE = "1s_c_d_dd";
    $CFG_NUM_STREAMS = 1;
    $CFG_INITIAL_NUM_DENSITIES = 256;
    $CFG_FINAL_NUM_DENSITIES = 256;

    My training goes fine; however, when I set $CFG_VECTOR_LENGTH = 39 (which I
    think should be the case because
    $CFG_FEATURE = "1s_c_d_dd") I get the message
    "Expected vector length of 39, got 26" and the training aborts.

    2. In either of the above-mentioned cases, when I set $CFG_SVSPEC = 0-12/13-25/26-38 the training aborts and I get the following
      message in the logfile:

    ERROR: "........\src\libs\libcommon\cmd_ln.c", line 525: Expecting 'C:\User
    s\Amit\my_data\amit\Technology\Speech\Sphinx\PocketSphinx\hub4wsj_type_local_m
    odel\an4\bin\bw.exe -switch_1 <arg_1> -switch_2 <arg_2> ...'

    I also see a value of "-svspec -39.8846153846154" in this logfile, as if
    0-12/13-25/26-38 were evaluated as a mathematical expression and substituted
    for svspec.

    Any hints, comments, suggestions (note: as I mentioned earlier, my aim is to
    have a model with parameters compatible with hub4wsj_sc_8k) ?

    Note: I might be doing something fundamentally wrong here but am not able to
    figure out what.

    Thanks and regards,

     
  • Nickolay V. Shmyrev

    When I set $CFG_VECTOR_LENGTH = 13; $CFG_FEATURE = "1s_c_d_dd";
    $CFG_NUM_STREAMS = 1; $CFG_INITIAL_NUM_DENSITIES = 256;
    $CFG_FINAL_NUM_DENSITIES = 256; my training goes fine, however when I set
    $CFG_VECTOR_LENGTH = 39 (which I think should be the case because $CFG_FEATURE
    = "1s_c_d_dd") I get the message "Expected vector length of 39, got 26" and the
    training aborts.

    Vector length is the length of cepstrum vector, not feature vector. It should
    be 13, not 39.

    cases cases when I set $CFG_SVSPEC = 0-12/13-25/26-38

    You need quotes, don't you?

     $CFG_SVSPEC = "0-12/13-25/26-38";
    
     
  • creative64

    creative64 - 2010-08-04

    $CFG_SVSPEC = "0-12/13-25/26-38";

    Thanks NS. Due to my fixation with vector length and limited knowledge of
    perl, I totally missed it.

    With all the changes discussed above I'm able to train the model now. One last
    hitch remains, it seems.

    1. When I try to run this model (the one with the same parameters as hub4wsj_sc_8k), the decoder crashes when I run
      a decoding session.

    2. If I just change "-transform dct" to "-transform legacy" in feat.params, decoding works perfectly fine with excellent
      accuracy.

    PS:
    pocketsphinx 0.6, SphinxTrain nightly build (dated 12 July 2010).
    I have data for 4 speakers (100 short utterances each) totalling about 0.27 hrs: 16 kHz, mono, 16 bit.
    With this data I had earlier trained an AN4-like model (a model with parameters similar to those of the default AN4). That also
    works perfectly fine. But if I change the transform to dct in this model (not
    sure if it is permissible to do this, however...),
    this model too starts crashing like the other one,
    as if there is an issue with the dct transform.
    hub4wsj_sc_8k decodes perfectly fine without any issues.

    Any clues ?

    Thanks and regards,

    Amit.

     
  • Nickolay V. Shmyrev

    1. When I try to run this model (the one with same parameters as
      hub4wsj_sc_8k ), the decoder crashes when I run a decoding session.

    You need to provide details of the crash if you want help with this. You need
    to provide at least a backtrace.

    2. If I just change "-transform dct" to "-transform legacy" in feat.params,
      decoding works perfectly fine with excellent accuracy.

    -transform must be in scripts_pl/make_feats.pl, for the early feature-extraction
    stage. It seems you forgot to put it there and trained your model with the
    legacy transform instead of the dct one.

     
  • creative64

    creative64 - 2010-08-09

    Hi NS,

    1. I've put my scripts_pl/make_feats.pl at the following link. I'm putting -transform dct there. Is there something else wrong with the
      script ? http://www.mediafire.com/file/8go1e96680df646/make_feats.pl

    2. Do round_filters and remove_dc also need to match between training and decoding ?

    3. You need to provide details of the crash
      I had tried doing some tracing. The failure seems to happen in fsg_search.c
      (lines 1062 to 1069) in
      the while loop "while (frm == last_frm)". In "fl =
      fsg_hist_entry_fsglink(hist_entry)", fl becomes a NULL pointer, and the line
      "if ((!final) || fsg_link_to_state(fl) == fsg_model_final_state(fsg))" results
      in an error because fl is a NULL pointer. This
      happens when "bpidx" goes to 0.

    But I suspect the fundamental problem is somewhere in make_feats.pl file.

    4. Another observation (though it might be irrelevant here) is that if I run this decoding session with the "hub4wsj_sc_8k" model,
      decoding works perfectly. When I change the transform to "htk" in feat.params
      (of hub4wsj_sc_8k), the decoding still works perfectly,
      but when I change it to "legacy" it gives me more than 100% WER (it doesn't
      crash though !).

    Regards,

    Amit.

     
  • Nickolay V. Shmyrev

    I've put my scripts_pl/make_feats.pl at following link. I'm putting
    -transform dct there.

    The transform option should go together with the upperf, lowerf and nfilt
    options. You placed it incorrectly.

    2. Do round_filters and remove_dc too need to match for training and
      decoding ?

    Yes

    results in error due to fl becoming a NULL pointer.

    OK, this issue is now fixed in trunk.

    When I change the transform to "htk" in feat.params (of hub4wsj_sc_8k) , the
    decoding still works perfect but when I change it to "legacy", it gives me
    more than 100% WER (doesn't crash though !).

    dct and htk are actually identical, so this works as it should.

     
  • creative64

    creative64 - 2010-08-11

    Hi NS,

    Transform option should go together with upperf option, lowerf option and
    nfilt option. You placed it incorrectly

    When I put the transform option together with the options that you listed above
    I get an error: "ERROR: "........\src\libs\libcommon\cmd_ln.c", line 551: Unknown
    switch -transform seen", "ERROR: "........\src\libs\libcommon\cmd_ln.c",
    line 525: Expecting 'bin/wave2feat -switch_1 <arg_1> -switch_2 <arg_2> ...'"

    This had puzzled me earlier too: parameters like -transform, -remove_dc and
    -round_filters don't seem to be valid arguments for
    wave2feat, whereas they are valid for sphinx_fe (from sphinxbase).
    make_feats.pl tries to run wave2feat.

    What might be missing in my setup ?

    Regards,

    PS: Is sphinxbase also needed for training ? Currently I have only
    an4 and SphinxTrain in my training setup.

     
  • Nickolay V. Shmyrev

    This had puzzled me earlier also, parameters like "-transform, -remove_dc
    and round_filters" don't seem to be valid arguments for wave2feat whereas they
    are valid for sphinx_fe (from sphinxbase). make_feats.pl tries to run
    wave2feat.

    Yes, you need to run sphinx_fe from sphinxbase instead of wave2feat.

     
  • creative64

    creative64 - 2010-08-11

    Yes, you need to run sphinx_fe from sphinxbase instead of wave2feat.

    Thanks NS. This was the missing link that caused all the doubts... Everything
    is working fine now.

    I'm observing that when I train the model with dct, the "current overall
    likelihood per frame" comes to the order of -58; however,
    when I train with legacy it comes to the order of +15. WER is good (on the
    training set) in both cases. Just curious to know
    what this value actually signifies and whether it is related to the goodness
    of the model!

    Regards,

     
