I have done training by changing the number of Gaussian mixtures.
I find that as the number of Gaussian mixtures increases, the time taken also increases:
PocketSphinx:
4 GMM took 73 secs giving accuracy 49.20
8 GMM took 90 secs giving accuracy 51.20
16 GMM took 117 secs giving accuracy 54.117
HDecode:
4 GMM took 12 secs giving accuracy 52.86
8 GMM took 23 secs giving accuracy 56.79
16 GMM took 44 secs giving accuracy 59.48
Can you please suggest why the time taken increases as we increase the number of Gaussians, and why it is so much higher for PocketSphinx compared to HDecode?
Also, I find the reverse trend in Kaldi: as we increase the number of Gaussians the time taken decreases, though it stays around 200 secs in all runs. How and why does PocketSphinx differ from Kaldi in this trend?
Am I doing something wrong, or is this the expected trend? I am not sure.
Please advise.
Senjam
There are many parameters involved here - topn scoring, language weight, beams for decoding, multiple scoring passes. You can't simply change the number of Gaussians and keep everything else the same. It is also not clear how you measure the time; it seems you are doing something strange. You need to measure the decoding time only, not the application startup time.
You need to provide more data - command lines, models, data - to get help on this issue. Without a clear understanding of what is going on, the numbers are sort of meaningless.
Usually more Gaussians mean more computation, so it is natural that decoding will take longer. As for Kaldi, you are probably measuring something different, not the actual computation.
For PocketSphinx: I simply ran the command "sphinxtrain -s decode run", and a shell script measures the execution time of this command from start to end and finally prints the total time taken in minutes.
If this is not the right method, kindly suggest how I should do this, and which particular script's execution time I should measure.
In Kaldi I did this in the run.sh:
start=$(date +'%s')
steps/decode.sh --nj "$decode_nj" --cmd "$decode_cmd" \
  exp/mono/graph data/test exp/mono/decode_test || exit 1;
echo "It took $(($(date +'%s') - $start)) seconds"
Please suggest if this is not the proper way.
You need to take the time taken and the real-time ratio from the decoder logs; both CMUSphinx and Kaldi report them.
In cmusphinx:
In Kaldi:
Measuring the time of the whole script is sort of senseless, since it includes the initialization time and other utility time.
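As a rough illustration of this advice (not part of the original reply), something like the following could pull the decoder-reported timing figures out of the logs. The log paths and the exact wording of the timing lines are assumptions here - check what your own run actually writes:

# Hypothetical log locations; adjust to your setup.
# PocketSphinx batch decoding reports xRT (real-time) figures in its decode log:
grep -i "xRT" logdir/decode/*.log
# Kaldi's steps/decode.sh writes per-job logs; the decoder reports a real-time factor there:
grep -i "real-time factor" exp/mono/decode_test/log/decode.*.log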
Thank you Nickolay.
I have changed the language weight to 13 and the accuracy on an4 improves, with a WER of 23.8%.
Can you please let me know about the following:
topn scoring: Is this related to accuracy, or is it only for reducing training time?
I set $CFG_CI_TOPN = 8; $CFG_CD_TOPN = 8;
but all I get are CD models up to 8 Gaussians; I cannot see CI models in the model parameters.
Beams for decoding: what ranges can we use for the word beam and the global (sentence) beam?
Multiple scoring passes: which are the important scoring passes?
Also, I came to know from a search about maxhmmpf and maxwpf. Where can I set them?
Also, are the settings for the parameters mentioned above available in HTK and Kaldi?
Regards,
Shanti
Topn scoring improves training speed. It slightly reduces model accuracy because it does not score all Gaussians in every training step.
To enable CI mgau training you can set $CFG_CD_TRAIN in the config file.
Beams for decoding: the usable range is roughly 1e-10 to 1e-200.
Multiple scoring passes: you can read http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.72.3560 about that.
maxhmmpf and maxwpf: you set them in the decoder script; there is no configuration option for them in sphinxtrain (see the sketch after this reply).
Yes, they are just called differently.
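As an illustration of how those beam and pruning options are usually passed to the PocketSphinx decoder (not part of the original reply; the model and file paths below are placeholders and the values are examples, not recommendations):

# Hypothetical pocketsphinx_batch invocation; all paths and values are placeholders.
pocketsphinx_batch \
  -hmm model_parameters/an4.cd_cont_200 \
  -lm etc/an4.lm.DMP \
  -dict etc/an4.dic \
  -ctl etc/an4_test.fileids \
  -cepdir feat -cepext .mfc \
  -hyp result/an4.match \
  -lw 13 \
  -beam 1e-80 -wbeam 1e-40 \
  -maxhmmpf 10000 -maxwpf 20 \
  -topn 4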
Thank you so much Nickolay, I don't know how to thank you.
Open source is changing the world. I am new to this, but I will work harder to contribute back.