LDA and MLLT issue

UF grad, 2008-08-30 (updated 2012-09-22)
  • UF grad

    UF grad - 2008-08-30

    I am trying to build a model with LDA and MLLT by following the instructions at

    http://www.speech.cs.cmu.edu/cmusphinx/moinmoin/LDAMLLT

    To build, I ran the script scripts_pl/RunAll_CDMLLT.pl

    That page says I should get a directory like model_parameters/rm.mllt_cd_cont_1000

    However, I do not get that directory; here is what I got:

    rm1.cd_cont_1000
    rm1.cd_cont_1000_1
    rm1.cd_cont_1000_2
    rm1.cd_cont_1000_4
    rm1.cd_cont_1000_8
    rm1.cd_cont_initial
    rm1.cd_cont_untied
    rm1.ci_cont
    rm1.ci_cont_flatinitial
    rm1.falign_ci_cont
    rm1.lda  # a file, not a directory
    rm1.lda_cont_1000
    rm1.lda_cont_flatinitial

    I then grepped the log files and saw these lines:

    46.lda_train/rm1.N-1.bw.log:ERROR: "s3gau_full_io.c", line 129: Failed to read full covariance file /home/ee01/cmuspeech/tutorial/rm1/model_parameters/rm1.cd_cont_1000/variances (expected 1245699 values, got 31941)

    47.mllt_train/rm1.N-1.bw.log:ERROR: "s3gau_full_io.c", line 129: Failed to read full covariance file /home/charoe/cmuspeech/tutorial/rm1/model_parameters/rm1.lda_cont_1000/variances (expected 904075 values, got 31175)

    I am not sure what that means. It looks like the number of matrix values differs between runs. Could it be because some of the audio was not trained successfully? Some utterances got "final state not reached", as follows:

    30.cd_hmm_untied/rm1.1-1.bw.log:ERROR: "baum_welch.c", line 331: M3/enus36 ignored
    30.cd_hmm_untied/rm1.1-1.bw.log:utt> 658 enus39 944 0 108 29 ERROR: "backward.c", line 431: final state not reached

    Is this really a problem? I thought "final state not reached" was not a critical error.
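One hedged observation on the two covariance errors above: in each log line the "expected" count is exactly the "got" count multiplied by a feature dimension (39 for the standard cepstra-plus-deltas stream; 29 matches the LDA dimension mentioned later in this thread). That would be consistent with a diagonal-covariance variances file being read where a full-covariance file was expected, since full covariance stores veclen × veclen values per Gaussian versus veclen for diagonal:

```python
# Arithmetic check on the "expected N values, got M" pairs from the logs.
# Full covariance stores veclen*veclen values per Gaussian; a diagonal file
# is therefore short by a factor of veclen.
print(31941 * 39)   # 1245699, the expected count in 46.lda_train
print(31175 * 29)   # 904075, the expected count in 47.mllt_train
```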

    I am also confused about whether LDA and MLLT are really the same thing or different.
    I have to admit I do not fully understand the theory; I guess it is just dimension reduction, but I am not sure whether they have to be used in combination, since I do get the LDA output but not the MLLT one.

     
    • UF grad

      UF grad - 2008-09-01

      Yes. I did turn on $CFG_LDAMLLT = 'yes';

      I just re-ran with RunAll.pl and still did not get it to build. The log files do not show anything that might indicate a problem, other than those "final state not reached" errors.

      Has anyone successfully trained the model with LDAMLLT using sphinx3?

       
      • Nickolay V. Shmyrev

        > Has anyone successfully trained the model with LDAMLLT using sphinx3?

        Sure, there are some MLLT models here, for example :)

        http://www.speech.cs.cmu.edu/sphinx/models/

         
    • David Huggins-Daines

      Did you remember to change the line in etc/sphinx_train.cfg that contains the $CFG_LDAMLLT variable to read:

      $CFG_LDAMLLT = 'yes';

      Aside from that, it's possible that RunAll_CDMLLT.pl is currently broken. Try again using plain old RunAll.pl...

      LDA and MLLT are two separate techniques for optimizing the acoustic features used in recognition. The idea behind LDA is to transform the feature space to maximize the separation between acoustic classes. The idea behind MLLT is to maximize the likelihood of the training data under the assumption that the covariance matrix for each class is diagonal.

      In practice it helps to apply both of them in sequence, so that's what we do.
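The LDA half of that explanation can be illustrated with a toy NumPy sketch (this is not SphinxTrain's actual code; the class data and dimensions are made up):

```python
import numpy as np

# Toy LDA: three Gaussian classes in 4 dimensions.
rng = np.random.default_rng(0)
classes = [rng.normal(loc=m, scale=1.0, size=(200, 4))
           for m in ([0, 0, 0, 0], [3, 0, 1, 0], [0, 3, 0, 1])]

mean = np.vstack(classes).mean(axis=0)
# Within-class scatter: spread of each class around its own mean.
Sw = sum(np.cov(c, rowvar=False) * (len(c) - 1) for c in classes)
# Between-class scatter: spread of the class means around the global mean.
Sb = sum(len(c) * np.outer(c.mean(0) - mean, c.mean(0) - mean)
         for c in classes)

# LDA keeps the top eigenvectors of Sw^-1 Sb: the directions that maximize
# separation between acoustic classes relative to within-class variance.
vals, vecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
order = np.argsort(-vals.real)
lda = vecs.real[:, order[:2]]          # keep the 2 most discriminative dims

projected = [c @ lda for c in classes]
```

MLLT would then estimate an additional square transform that maximizes the likelihood of the projected data under the diagonal-covariance assumption; that optimization is iterative and omitted here.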

       
  • dang-khoa.nguyen

    I have the same problem:

    47.mllt_train/rm1.N-1.bw.log:ERROR: "s3gau_full_io.c", line 129: Failed to read full covariance file /home/charoe/cmuspeech/tutorial/rm1/model_parameters/rm1.lda_cont_1000/variances (expected 904075 values, got 31175)

    and here is the LDA estimation log file:

    Sw:
    [

    ...,

    ]
    Sb:
    [

    ...,

    ]
    Traceback (most recent call last):
      File "/media/DATA2/Khoa.nd/thang.10/am_18052010/tone_on_allsyllable/lda/14persionals_5fr_5000_8gaus_18052010/python/cmusphinx/lda.py", line 77, in <module>
        lda = makelda(gauden)
      File "/media/DATA2/Khoa.nd/thang.10/am_18052010/tone_on_allsyllable/lda/14persionals_5fr_5000_8gaus_18052010/python/cmusphinx/lda.py", line 55, in makelda
        u, v = numpy.linalg.eig(BinvA)
      File "/usr/lib/python2.6/dist-packages/numpy/linalg/linalg.py", line 790, in eig
        _assertFinite(a)
      File "/usr/lib/python2.6/dist-packages/numpy/linalg/linalg.py", line 118, in _assertFinite
        raise LinAlgError, "Array must not contain infs or NaNs"
    numpy.linalg.linalg.LinAlgError: Array must not contain infs or NaNs
    Wed May 19 13:59:53 2010

    Could I know what is wrong here?
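For what it's worth, that LinAlgError means the matrix handed to numpy.linalg.eig already contained infs or NaNs, typically because the within-class scatter was singular when inverted. A small sketch of how one might locate the bad entries before the eig call (check_finite is a hypothetical helper, not part of lda.py):

```python
import numpy as np

def check_finite(m):
    """Return the (row, col) positions of non-finite entries, i.e. the
    values that make numpy.linalg.eig raise LinAlgError."""
    rows, cols = np.nonzero(~np.isfinite(m))
    return list(zip(rows.tolist(), cols.tolist()))

# Example: an inf appears when a singular scatter matrix is inverted.
bad = np.array([[1.0, 2.0], [np.inf, 3.0]])
print(check_finite(bad))   # -> [(1, 0)]
```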

     
  • Nickolay V. Shmyrev

    It looks like you are using an outdated SphinxTrain. Please update.

     
  • dang-khoa.nguyen

    I have updated to the nightly build and it works, but I got a new error:
    "init_gau: feat.c:268: feat_read_lda: Assertion `lda.lda_cols == feat_conf.blksize()' failed."

    My config is continuous training with:

    $CFG_FEATURE = "1s_c";
    $CFG_LDA_MLLT = 'yes';
    $CFG_FEAT_WINDOW = 3;
    $CFG_LDA_DIMENSION = 29;

    Could you help me fix this bug?
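On the assertion itself: `lda.lda_cols == feat_conf.blksize()` says the LDA matrix's column count must equal the size of the stacked feature block. If I read feat.c correctly (this is an assumption), a context window of W frames on each side stacks 2W+1 frames, so the block size depends on $CFG_FEAT_WINDOW, and an LDA matrix trained with one window size cannot be loaded under another. With 13-dimensional cepstra (the usual default, also an assumption here):

```python
# Hedged back-of-the-envelope: stacked feature block size per window setting,
# assuming 13-dim cepstra and 2*W+1 stacked frames for $CFG_FEAT_WINDOW = W.
base_dim = 13
for window in (2, 3, 4, 5):
    blksize = base_dim * (2 * window + 1)
    print(window, blksize)   # e.g. window 3 -> 91
```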

     
  • dang-khoa.nguyen

    I tried to install numpy:
    sudo apt-get install python-numpy
    It said:
    python-numpy is already the newest version.
    0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
    Is that OK?

     
  • Nickolay V. Shmyrev

    And what about scipy? Is it functional? It's probably easier to provide all
    the logs in the logdir subfolder; they could contain useful hints.

     
  • dang-khoa.nguyen

    Hi sir, I believe scipy is functional. I tried testing with more data (about
    15h of training data, versus about 7h before) and it works fine with the
    default config ($CFG_FEAT_WINDOW=3;).
    When feat_window is increased to 4 or 5, the error appears again, but when
    it is decreased to 2 it works fine. I guess the problem is the amount of
    training data. Do you think so?

     
  • Nickolay V. Shmyrev

    > it works fine with the default config ($CFG_FEAT_WINDOW=3;).

    Matrix estimation is not always stable and might fail, but it shouldn't
    depend on the window size. You need to investigate the mllt.py log;
    sometimes it helps just to run it a second time. Try to run

    ./scripts_pl/02.mllt_train/mllt_train.pl

    by hand until it creates model_parameters/rm1.mllt
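Repeating that step by hand can be scripted; here is a minimal sketch (the retry_until_exists helper is mine, not part of SphinxTrain):

```python
import os
import subprocess

def retry_until_exists(cmd, target, attempts=5):
    """Run cmd repeatedly until the target file appears, since the MLLT
    estimation is not always stable; give up after a few attempts."""
    for _ in range(attempts):
        if os.path.exists(target):
            return True
        subprocess.run(cmd, check=False)
    return os.path.exists(target)

# Paths as in the tutorial layout discussed above:
# retry_until_exists(["./scripts_pl/02.mllt_train/mllt_train.pl"],
#                    "model_parameters/rm1.mllt")
```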

    > I guess the problem is the amount of training data. Do you think so?

    No, it even works with an4.

     
