mixw_cb in pocketsphinx

creative64
2012-05-16
2012-09-22
  • creative64

    creative64 - 2012-05-16

    Hi

    What does the array mixw_cb (mixture weight codebook) signify in PocketSphinx?
    For an acoustic model, how is it decided whether to have it or not?

    Thanks and regards,

     
  • Nickolay V. Shmyrev

    For an acoustic model, how is it decided whether to have it or not?

    The sendump file can have several formats for the mixture weights; the format
    is detected at load time. If the sendump is quantized (it can be created with
    a Python script from SphinxTrain), then mixw_cb is loaded from the sendump.

    You can find some theory behind that here:
    "Combining Mixture Weight Pruning and Quantization for Small-Footprint Speech
    Recognition" David Huggins-Daines and Alexander I. Rudnicky. Proceedings of
    ICASSP-2009, Taipei, Taiwan, April 2009.

    http://www.cs.cmu.edu/~dhuggins/Publications/mixw_quant.pdf
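
    For readers following along: the idea behind mixw_cb is that instead of
    storing every log mixture weight exactly, the weights are clustered into a
    small shared codebook and each weight is stored as a short index into it. A
    rough sketch of that quantization step, assuming NumPy (the names are
    illustrative, not the actual SphinxTrain code):

```python
import numpy as np

def quantize_mixw(logw, n_bits=4, n_iter=10):
    """Cluster log mixture weights into a 2**n_bits-entry codebook
    (playing the role of mixw_cb) via simple k-means, and return the
    codebook plus a per-weight codeword index array."""
    flat = logw.ravel()
    # Start with codewords spread evenly over the observed range.
    cb = np.linspace(flat.min(), flat.max(), 2 ** n_bits)
    for _ in range(n_iter):
        # Assign every weight to its nearest codeword.
        idx = np.abs(flat[:, None] - cb[None, :]).argmin(axis=1)
        # Move each codeword to the mean of the weights assigned to it.
        for k in range(len(cb)):
            if (idx == k).any():
                cb[k] = flat[idx == k].mean()
    idx = np.abs(flat[:, None] - cb[None, :]).argmin(axis=1)
    return cb, idx.reshape(logw.shape).astype(np.uint8)
```

    With 4 bits the decoder only needs the 16-entry codebook plus one nibble per
    weight, which is where the footprint savings described in the paper come from.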

     
  • creative64

    creative64 - 2012-05-16

    Thanks NS,

    1. Looks like sendump quantization doesn't affect the accuracy of the model.
      Does it have anything to do with the number of senones? I.e., if I have a
      small number of senones (say 1000), will sendump quantization introduce more
      error than in a case where the number of senones is large (say 5000+)?

    2. I have a model which doesn't have mixw_cb (the model was trained with
      SphinxTrain some time back), which means the sendump is not quantized. Is
      there a way to directly convert this into a model where the sendump is
      quantized, or do I have to retrain the model from scratch?

    Thanks and regards,

     
  • Nickolay V. Shmyrev

    1. Looks like sendump quantization doesn't affect the accuracy of the model.
      Does it have anything to do with the number of senones? I.e., if I have a
      small number of senones (say 1000), will sendump quantization introduce more
      error than in a case where the number of senones is large (say 5000+)?

    Unlikely.

    1. I have a model which doesn't have mixw_cb (the model was trained with
      SphinxTrain some time back), which means the sendump is not quantized. Is
      there a way to directly convert this into a model where the sendump is
      quantized, or do I have to retrain the model from scratch?

    There is a Python script; I mentioned it above.

     
  • Nickolay V. Shmyrev

    pocketsphinx/scripts/quantize_mixw.py

     
  • Pankaj

    Pankaj - 2012-05-17

    Hi Nickolay,

    What is the command-line syntax for using quantize_mixw.py?
    We have trained a model similar to hub4 (link given below).

    http://www.mediafire.com/?cj9lmfdhhpd63px

    When I try to quantize it using the command

    python quantize_mixw.py an4.cd_semi_1000_hub4wsj_type/sendump sendumpq

    the resulting sendumpq file contains all zeroes after the FORMAT DESCRIPTION
    header.

    What could be going wrong?
    Kindly help us in converting the model.

    Regards
    Pankaj

     
  • Nickolay V. Shmyrev

    [shmyrev@gnome scripts]$ python quantize_mixw.py mixture_weights sendump1
    min log mixw: -16.117915 range: 15.931646
    Total distortion: 4.305554e+05 convergence ratio: 1.000000e+00
    Total distortion: 1.403788e+05 convergence ratio: 6.739589e-01
    Total distortion: 1.044983e+05 convergence ratio: 2.555979e-01
    Total distortion: 8.738109e+04 convergence ratio: 1.638034e-01
    Total distortion: 7.669821e+04 convergence ratio: 1.222562e-01
    Total distortion: 6.923910e+04 convergence ratio: 9.725272e-02
    Total distortion: 6.371291e+04 convergence ratio: 7.981313e-02
    Total distortion: 5.945407e+04 convergence ratio: 6.684429e-02
    Total distortion: 5.608071e+04 convergence ratio: 5.673896e-02
    Total distortion: 5.340947e+04 convergence ratio: 4.763203e-02
    [ -1.29778371 -14.67499854  -7.81410748 -13.8030043   -6.31229994
     -15.6150023   -4.93788463 -11.40759421 -13.00321106  -9.2244332
      -2.92451104 -10.42548324 -12.23019545  -2.11868971  -3.81111694]
    

    Works fine for me. The new file sendump1 looks OK. The script can also take a
    sendump file as input, but that didn't work for me for some reason. Maybe it's
    some issue with the algorithm.

     
  • Nickolay V. Shmyrev

    I've just moved the quantize_mixw script to SphinxTrain and fixed it to read
    sendump properly. Now it works with sendump too, not just with mixture
    weights.

     
  • Pankaj

    Pankaj - 2012-05-28

    Hi,

    There are still some issues with the quantize_mixw script when a sendump file
    is used as input. It seems that the script detects the wrong number of
    senones. For example, with the model mentioned earlier (link repeated below)

    http://www.mediafire.com/?cj9lmfdhhpd63px

    the resulting sendump file has 1116 senones, while it should actually contain
    1114. During initialization of PocketSphinx with this sendump file, the
    following error message is received:

    INFO: s2_semi_mgau.c(1132): Loading senones from dump file
    /examples/bin/testdata/model/an4.cd_semi_1000_hub4wsj_type/sendump
    INFO: s2_semi_mgau.c(1156): BEGIN FILE FORMAT DESCRIPTION
    ERROR: "s2_semi_mgau.c", line 1233: Number of senones mismatch: 1116 != 1114
    INFO: acmod.c(122): Attempting to use PTHMM computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
    /examples/bin/testdata/model/an4.cd_semi_1000_hub4wsj_type/means
    INFO: ms_gauden.c(292): 1 codebook, 3 feature, size
    256x13 256x13 256x13
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
    /examples/bin/testdata/model/an4.cd_semi_1000_hub4wsj_type/variances
    INFO: ms_gauden.c(292): 1 codebook, 3 feature, size
    256x13 256x13 256x13
    INFO: ms_gauden.c(356): 26 variance values floored
    INFO: ptm_mgau.c(473): Loading senones from dump file
    /examples/bin/testdata/model/an4.cd_semi_1000_hub4wsj_type/sendump
    INFO: ptm_mgau.c(497): BEGIN FILE FORMAT DESCRIPTION
    ERROR: "ptm_mgau.c", line 574: Number of senones mismatch: 1116 != 1114
    INFO: acmod.c(124): Falling back to general multi-stream GMM computation

    Regards
    Pankaj

     
  • Nickolay V. Shmyrev

    It seems that the script detects the wrong number of senones.

    I think the source sendump itself doesn't belong to this model. The original
    sendump indeed has 1116 senones; it looks like it was taken from some other
    model, or there was some issue when the sendump was created.

     
  • Pankaj

    Pankaj - 2012-05-31

    Hi,
    The sendump file was not taken from another model; it is exactly as created
    during the adaptation process using SphinxTrain. We are facing a similar
    problem with another model which was adapted from hub4wsj_sc_8k (link given
    below).

    http://www.mediafire.com/?z32w3669zp8vgae

    In this model the mdef file shows that the number of senones is 5150, but
    when we pass the sendump file as input to the quantize_mixw.py script, the
    number of senones is detected as 5152. Here, too, the difference is 2
    senones, just as in the model mentioned in the earlier post. Any idea what
    could be going wrong?
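
    One way to narrow this down is to read the expected senone count straight
    from the mdef header and compare it against what the script reports;
    PocketSphinx's mismatch error compares the sendump count with the model's
    tied-state total. A hedged helper for the ASCII mdef format (the function
    name is mine, not part of any tool):

```python
def mdef_senone_count(path):
    """Return the total tied-state (senone) count from an ASCII mdef file.
    The mdef header contains lines of the form '<count> n_tied_state'."""
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) == 2 and parts[1] == "n_tied_state":
                return int(parts[0])
    raise ValueError("no n_tied_state line found in %s" % path)
```

    If this returns 5150 for the model while the script reports 5152, the two
    extra weights are in the sendump/mixture_weights file itself, not in the
    mdef.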

     
  • Nickolay V. Shmyrev

    It seems some of your adaptation tools add two more mixture weights. Maybe a
    tool was unintentionally modified, or you are using an older version that has
    a bug. Please check them thoroughly, and please provide all the files you
    used for adaptation.

    I've just verified the adaptation process. Here the mixw count remains
    unmodified and everything works perfectly.

     
  • creative64

    creative64 - 2012-06-04

    Hi NS,

    This adaptation/training was done in the July/August 2010 time frame with the
    SphinxTrain version prevailing at that time. Those directories are not
    available now. We'll repeat those exercises with the latest SphinxTrain to
    check whether we run into the same issue again. Is there anything specific we
    need to be careful about regarding this issue?

    Thanks and regards,

     
