
inc_comp: confused about the -ninc value

Help
2004-08-03
2012-09-22
  • danial ibrahim

    danial ibrahim - 2004-08-03

    Hi,
    I want to run the inc_comp executable, but I am confused about which -ninc value I should use.
    I want to use 8 Gaussians per mixture, so what should -ninc actually be? Is it 2, because 1 Gaussian splits into 2? Or is it 3, because 8 = 2^3?
    Can anyone help me out?

    thanks.

     
    • The Grand Janitor

      It is the final number of mixtures, so you should use 8 in your case.  However, it is recommended to split the mixtures in the order 1 -> 2 -> 4 -> 8.  Running several rounds of Baum-Welch training between splits will make the models perform much better. -Arthur Chan

       
    • danial ibrahim

      danial ibrahim - 2004-08-28

      Mr. Arthur, can you check my command lines, please?
      It is a very long process and I want to confirm whether it is right before I run the whole thing. Thanks in advance :)

      step 1:
      -------
      bin/init_mixw -src_moddeffn model_architecture/ci_3s.mdef -src_ts2cbfn .cont. -src_mixwfn model_parameters/ci_mixw_f.pm -src_meanfn model_parameters/ci_means_f.pm -src_varfn model_parameters/ci_var_f.pm -src_tmatfn model_parameters/ci_tmat_f.pm -dest_moddeffn model_architecture/cd_tied_3s.mdef -dest_ts2cbfn .cont. -dest_mixwfn model_parameters/cd_tied_mixw_1gau.pm -dest_meanfn model_parameters/cd_tied_means_1gau.pm -dest_varfn model_parameters/cd_tied_var_1gau.pm -dest_tmatfn model_parameters/cd_tied_tmat_1gau.pm -feat c/0..L-1/d/0..L-1/dd/0..L-1/ -ceplen 13

      bin/bw -moddeffn model_architecture/cd_tied_3s.mdef -ts2cbfn .cont. -mixwfn model_parameters/cd_tied_mixw_1gau.pm -mwfloor 0.00001 -tmatfn model_parameters/cd_tied_tmat_1gau.pm -tpfloor 0.0001 -meanfn model_parameters/cd_tied_means_1gau.pm -varfn model_parameters/cd_tied_var_1gau.pm -dictfn etc/alpha.dict -fdictfn etc/filler.dict -ctlfn etc/alpha.fileids -part 1 -npart 1 -cepdir e:/wav/mfc -cepext feat -lsnfn etc/alpha.transcription -accumdir bwaccumdir -varfloor 0.00001 -topn 8 -abeam 1e-100 -bbeam 1e-100 -agc max -cmn current -varnorm yes -meanreest yes -varreest yes -2passvar yes -tmatreest yes -feat c/0..L-1/d/0..L-1/dd/0..L-1/ -ceplen 13

      bin/norm -accumdir bwaccumdir -feat c/0..L-1/d/0..L-1/dd/0..L-1/ -mixwfn model_parameters/cd_tied_mixw_1norm.pm -tmatfn model_parameters/cd_tied_tmat_1norm.pm -meanfn model_parameters/cd_tied_means_1norm.pm -varfn model_parameters/cd_tied_var_1norm.pm -ceplen 13

      bin/inc_comp -ninc 1 -ceplen 13 -dcountfn model_parameters/cd_tied_mixw_1norm.pm -inmixwfn model_parameters/cd_tied_mixw_1norm.pm -outmixwfn model_parameters/cd_tied_mixw_2gau.pm -inmeanfn model_parameters/cd_tied_means_1norm.pm -outmeanfn model_parameters/cd_tied_means_2gau.pm -invarfn model_parameters/cd_tied_var_1norm.pm -outvarfn model_parameters/cd_tied_var_2gau.pm -feat c/0..L-1/d/0..L-1/dd/0..L-1/

      step 2:
      -------
      bin/bw -moddeffn model_architecture/cd_tied_3s.mdef -ts2cbfn .cont. -mixwfn model_parameters/cd_tied_mixw_2gau.pm -mwfloor 0.00001 -tmatfn model_parameters/cd_tied_tmat_1norm.pm -tpfloor 0.0001 -meanfn model_parameters/cd_tied_means_2gau.pm -varfn model_parameters/cd_tied_var_2gau.pm -dictfn etc/alpha.dict -fdictfn etc/filler.dict -ctlfn etc/alpha.fileids -part 1 -npart 1 -cepdir e:/wav/mfc -cepext feat -lsnfn etc/alpha.transcription -accumdir bwaccumdir -varfloor 0.00001 -topn 8 -abeam 1e-100 -bbeam 1e-100 -agc max -cmn current -varnorm yes -meanreest yes -varreest yes -2passvar yes -tmatreest yes -feat c/0..L-1/d/0..L-1/dd/0..L-1/ -ceplen 13

      bin/norm -accumdir bwaccumdir -feat c/0..L-1/d/0..L-1/dd/0..L-1/ -mixwfn model_parameters/cd_tied_mixw_2norm.pm -tmatfn model_parameters/cd_tied_tmat_2norm.pm -meanfn model_parameters/cd_tied_means_2norm.pm -varfn model_parameters/cd_tied_var_2norm.pm -ceplen 13

      bin/inc_comp -ninc 2 -ceplen 13 -dcountfn model_parameters/cd_tied_mixw_2norm.pm -inmixwfn model_parameters/cd_tied_mixw_2norm.pm -outmixwfn model_parameters/cd_tied_mixw_4gau.pm -inmeanfn model_parameters/cd_tied_means_2norm.pm -outmeanfn model_parameters/cd_tied_means_4gau.pm -invarfn model_parameters/cd_tied_var_2norm.pm -outvarfn model_parameters/cd_tied_var_4gau.pm -feat c/0..L-1/d/0..L-1/dd/0..L-1/

      step 3:
      -------
      bin/bw -moddeffn model_architecture/cd_tied_3s.mdef -ts2cbfn .cont. -mixwfn model_parameters/cd_tied_mixw_4gau.pm -mwfloor 0.00001 -tmatfn model_parameters/cd_tied_tmat_2norm.pm -tpfloor 0.0001 -meanfn model_parameters/cd_tied_means_4gau.pm -varfn model_parameters/cd_tied_var_4gau.pm -dictfn etc/alpha.dict -fdictfn etc/filler.dict -ctlfn etc/alpha.fileids -part 1 -npart 1 -cepdir e:/wav/mfc -cepext feat -lsnfn etc/alpha.transcription -accumdir bwaccumdir -varfloor 0.00001 -topn 8 -abeam 1e-100 -bbeam 1e-100 -agc max -cmn current -varnorm yes -meanreest yes -varreest yes -2passvar yes -tmatreest yes -feat c/0..L-1/d/0..L-1/dd/0..L-1/ -ceplen 13

      bin/norm -accumdir bwaccumdir -feat c/0..L-1/d/0..L-1/dd/0..L-1/ -mixwfn model_parameters/cd_tied_mixw_4norm.pm -tmatfn model_parameters/cd_tied_tmat_4norm.pm -meanfn model_parameters/cd_tied_means_4norm.pm -varfn model_parameters/cd_tied_var_4norm.pm -ceplen 13

      bin/inc_comp -ninc 4 -ceplen 13 -dcountfn model_parameters/cd_tied_mixw_4norm.pm -inmixwfn model_parameters/cd_tied_mixw_4norm.pm -outmixwfn model_parameters/cd_tied_mixw_8gau.pm -inmeanfn model_parameters/cd_tied_means_4norm.pm -outmeanfn model_parameters/cd_tied_means_8gau.pm -invarfn model_parameters/cd_tied_var_4norm.pm -outvarfn model_parameters/cd_tied_var_8gau.pm -feat c/0..L-1/d/0..L-1/dd/0..L-1/

      step 4:
      -------
      bin/bw -moddeffn model_architecture/cd_tied_3s.mdef -ts2cbfn .cont. -mixwfn model_parameters/cd_tied_mixw_8gau.pm -mwfloor 0.00001 -tmatfn model_parameters/cd_tied_tmat_4norm.pm -tpfloor 0.0001 -meanfn model_parameters/cd_tied_means_8gau.pm -varfn model_parameters/cd_tied_var_8gau.pm -dictfn etc/alpha.dict -fdictfn etc/filler.dict -ctlfn etc/alpha.fileids -part 1 -npart 1 -cepdir e:/wav/mfc -cepext feat -lsnfn etc/alpha.transcription -accumdir bwaccumdir -varfloor 0.00001 -topn 8 -abeam 1e-100 -bbeam 1e-100 -agc max -cmn current -varnorm yes -meanreest yes -varreest yes -2passvar yes -tmatreest yes -feat c/0..L-1/d/0..L-1/dd/0..L-1/ -ceplen 13

      bin/norm -accumdir bwaccumdir -feat c/0..L-1/d/0..L-1/dd/0..L-1/ -mixwfn model_parameters/cd_tied_mixw_8norm.pm -tmatfn model_parameters/cd_tied_tmat_8norm.pm -meanfn model_parameters/cd_tied_means_8norm.pm -varfn model_parameters/cd_tied_var_8norm.pm -ceplen 13

       
    • The Grand Janitor

      I can't spot many procedural problems just from what you posted.  A few general comments:

      1. If you run bw, make sure it is run iteratively; internally at CMU we usually run 8-12 iterations.  I recommend you run at least 6 iterations between each step (see the loop sketch after this list).

      2. Another issue: it is wiser to test the model at each step rather than trying to run the whole thing in one shot.  Even if you already have the SphinxTrain scripts, I would still recommend working this way.  There are a lot of external factors that can make the training scripts die, and you want to be sure that every step of your training is correct.  So I recommend you finish step 1 and test that model first.  That way you will know whether your recognition is working.
      3. How much training data did you use, and where did you get it?  If you use too little, some of the models will get too little data, and your recognition will simply be very poor.
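
      To make point 1 concrete, here is a rough shell sketch (not an official SphinxTrain script) of several bw + norm passes over your 1-Gaussian models.  The in-place overwriting of the model files, the fixed count of 6 passes, and the accumulator clean-up are assumptions you should adapt to your setup:

      #!/bin/sh
      # Rough sketch: 6 Baum-Welch re-estimation passes over the 1-Gaussian models,
      # re-using the flags from the step-1 commands above.
      P=model_parameters
      for i in 1 2 3 4 5 6
      do
          rm -f bwaccumdir/*          # clear stale accumulator files (precautionary)
          bin/bw -moddeffn model_architecture/cd_tied_3s.mdef -ts2cbfn .cont. \
                 -mixwfn $P/cd_tied_mixw_1gau.pm -mwfloor 0.00001 \
                 -tmatfn $P/cd_tied_tmat_1gau.pm -tpfloor 0.0001 \
                 -meanfn $P/cd_tied_means_1gau.pm -varfn $P/cd_tied_var_1gau.pm \
                 -dictfn etc/alpha.dict -fdictfn etc/filler.dict \
                 -ctlfn etc/alpha.fileids -part 1 -npart 1 \
                 -cepdir e:/wav/mfc -cepext feat -lsnfn etc/alpha.transcription \
                 -accumdir bwaccumdir -varfloor 0.00001 -topn 8 \
                 -abeam 1e-100 -bbeam 1e-100 -agc max -cmn current -varnorm yes \
                 -meanreest yes -varreest yes -2passvar yes -tmatreest yes \
                 -feat c/0..L-1/d/0..L-1/dd/0..L-1/ -ceplen 13
          # norm writes the re-estimated parameters back over the same files, so the
          # next pass of bw starts from the updated model (a simplification of your
          # separate 1gau/1norm naming).
          bin/norm -accumdir bwaccumdir -feat c/0..L-1/d/0..L-1/dd/0..L-1/ \
                   -mixwfn $P/cd_tied_mixw_1gau.pm -tmatfn $P/cd_tied_tmat_1gau.pm \
                   -meanfn $P/cd_tied_means_1gau.pm -varfn $P/cd_tied_var_1gau.pm \
                   -ceplen 13
      done
      # Only after these passes would you run inc_comp to split 1 -> 2, as in your step 1.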

      BTW, if you run alphabet recognition, you have to understand that the recognition rate will be limited by the fact that the English alphabet contains a very confusable E-set (e.g. c, d, e, g, v).

      Arthur

       
      • Roger Wellington-Oguri

        Arthur,

        It's good you made the point that the training process doesn't just produce one final working model, but a series of models, each one a refinement, in one way or another, of the previous.  That's undoubtedly obvious to someone who understands the whole picture, but I was working with SphinxTrain for quite a while before I realized it.

        But this also brings up another point.  For some tasks, might it not be better to use one of the earlier, simpler models?  In my case, for example, I have a small vocabulary and am using individual words as the units for my acoustic model.  Given that I don't expect the pronunciation of the words to be influenced much by their context in the utterance, am I not likely to be best off using the initial CI model?

        Roger

         
    • The Grand Janitor

      Also, kindly call me Arthur; that is my first name :-).  I am definitely not Mr. Arthur.  Thanks,
      Arthur

       
    • danial ibrahim

      danial ibrahim - 2004-08-29

      ok, thanks Arthur :)

      Just a few questions before I can get past the init_gau stage. About running bw iteratively in each step: does it have to be run manually 6 times, or is there a command-line argument that sets the number of iterations?

      How do I test the model after the first step is done? What tools should I use?

      Also, once training is done, what do I have to do before I can run a recognition test? Does this have any connection to forced alignment?

      I use the TI46 corpus (free from the LDC: http://wave.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC93S9).
      It was uttered by 16 speakers, 8 male and 8 female.
      Each letter is repeated 10 times in the training set and 16 times in the test set, but for now I use data from only 1 speaker (26*10 = 260 utterances).

      It is true that the alphabet is very confusable, but I have tested the model with Sphinx-4 and got 0% accuracy, even though the front-end settings were the same in training and decoding. So I think something went wrong in my training.

       
    • The Grand Janitor

      How to test the model? With the decoder, of course!  You should make sure that at every stage your training results (means, variances, transition matrices, ...) are valid input for the decoder.  That also lets you debug your training the way you would debug your code.

      For iterative training, I usually use a shell script.  If you want to keep it simple, choose bash or csh.  I have used Perl as well, but calling other programs from Perl needs some special treatment.  None of them are very hard to learn in general. Of course, I can also offer you some help.
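
      Just so it is concrete, here is a very rough sketch of the kind of driver script I mean.  The run_bw_iterations helper and the echo lines are placeholders I made up (fill in the per-stage bw/norm loop and your own decode command); the inc_comp arguments simply mirror the step 1-4 commands you posted above:

      #!/bin/sh
      # Placeholder for the per-stage re-estimation loop (several bw + norm passes
      # over the $1-Gaussian models), as sketched earlier in this thread.
      run_bw_iterations () {
          echo "re-estimate the $1-Gaussian models here"
      }

      # Schedule: re-estimate, test, then split 1 -> 2 -> 4 -> 8.
      for n in 1 2 4 8
      do
          run_bw_iterations $n
          # Test before moving on; replace this echo with your decoder run.
          echo "decode and score the ${n}-Gaussian models here"
          if [ $n -lt 8 ]; then
              # The -ninc values follow the step 1-4 commands posted above
              # (going from n to 2n Gaussians per state).
              bin/inc_comp -ninc $n -ceplen 13 \
                  -dcountfn  model_parameters/cd_tied_mixw_${n}norm.pm \
                  -inmixwfn  model_parameters/cd_tied_mixw_${n}norm.pm \
                  -outmixwfn model_parameters/cd_tied_mixw_$((n * 2))gau.pm \
                  -inmeanfn  model_parameters/cd_tied_means_${n}norm.pm \
                  -outmeanfn model_parameters/cd_tied_means_$((n * 2))gau.pm \
                  -invarfn   model_parameters/cd_tied_var_${n}norm.pm \
                  -outvarfn  model_parameters/cd_tied_var_$((n * 2))gau.pm \
                  -feat c/0..L-1/d/0..L-1/dd/0..L-1/
          fi
      done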

      TI46 is a good starting point; if you were trying to train dictation on your first shot, I would probably not even talk to you. :-)

      For your project, my general advice is "the slower the better"; don't rush yourself at any step.  Make sure you understand what is going on rather than just following the procedures.  For example, using the decoder to validate results at each stage will help you a lot.

      Arthur

       
    • The Grand Janitor

      Hi Roger,
          It is good that you realized that by yourself, and it is partly our fault that we haven't spent much time explaining it.
          About your question, here are my two cents.  There is usually a trade-off between model size and model performance.  Let's take the number of mixtures as an example and say you tune only that single parameter in training.  At the beginning, performance on the test set will improve as you increase the parameter.  However, at a certain point the performance gain stops, because the model has become overfitted to the training set.  Scientists in pattern recognition and machine learning call this phenomenon overfitting.

            You can always use the model from any point; however, there is always a trade-off, and how you take it really depends on your purpose.  Some people want to build a small and fast recognizer; in that case, using CI models as you mentioned can be a good way.  Others seek high accuracy; in that case, standard CD triphone models are crucial, and other expensive methods will also be necessary.

            I just tried to give you a rough idea of what you can work on in acoustic modeling.  Speech recognition is a discipline that takes years to master.  If you are interested, I can point you to some references to further your understanding.

      Arthur

       
      • Roger Wellington-Oguri

        Yes, I am interested in good references.  I am particularly interested right now in the theory behind the selection of acoustic features.  The device I am targeting isn't capable of recording at 16000 sps, so I have to make some changes.  Also, the target machine's memory and processing power are very limited, so I need to find a sweet spot in the accuracy/processing tradeoff.

        I would appreciate any pointers you can give me.  I have a reasonably strong background in mathematics, so the references don't have to be introductory level.

        Thanks.

         
    • The Grand Janitor

      Thanks Daniel, I will fix it ASAP.  Next time, when you have a problem, go to the "Bugs" page and submit a new bug.  Sometimes the developers don't have time to handle your request immediately; we will just assign it to someone and fulfill it later.  You can also attach files on that page.

      Arthur

       
    • The Grand Janitor

      Hi Roger,
           Check out L. Rabiner's "Fundamentals of Speech Recognition" and Xuedong Huang's "Spoken Language Processing".  Both are classics.  The first one is older but simpler; the second is more up to date and more comprehensive (and also more expensive).
           I would also recommend spending your effort on understanding search algorithms and acoustic modeling rather than feature extraction.  The main reason is that on the feature-extraction side, MFCC is the standard choice, and many people have found that the bit depth, sampling rate, and front-end parameters you choose don't matter all that much.  So, if this is your first project, spend your time on the back end (search algorithms and acoustic modeling) rather than the front end.

      Arthur

       
    • The Grand Janitor

      BTW, please start another thread for new discussions.  We are a little bit off the original topic now. :-)

       

