"for book-like texts you need to use Knesser-Ney discounting. For command-like texts you should use Witten-Bell discounting or Absolute discounting. You can try different methods and see which gives better perplexity on a test set".
I have followed the http://cmusphinx.sourceforge.net/wiki/tutoriallm tutorial.
After I run this command:
ngram-count -kndiscount -interpolate -text train-text.txt -lm your.lm
it gives me the error "one of modified KneserNey discounts is negative error in discount estimator for order 2".
How can I solve this problem?
Your train-text.txt is likely too small. Try other discounting in this case (drop the -kndiscount -interpolate options).
See more in C3 of this FAQ.
Last edit: Arseniy Gorin 2016-07-20
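For example, Witten-Bell discounting is one such alternative (a minimal sketch reusing the file names from this thread; -wbdiscount is a standard ngram-count option):
ngram-count -wbdiscount -text train-text.txt -lm your.lm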
Thanks a lot for your quick response Arseniy.
I have removed those options from the command line. Finally, it created the "your.lm" file.
But there are some warnings:
warning: discount coeff 1 is out of range: 0
warning: count of count 8 is zero -- lowering maxcount
warning: count of count 7 is zero -- lowering maxcount
warning: count of count 6 is zero -- lowering maxcount
warning: count of count 5 is zero -- lowering maxcount
Is this normal?
It just looks like your data are too small for proper n-gram training... If you have just a few phrases, consider building a grammar instead.
You can also set "-gtnmax 0" to drop discounting completely.
Last edit: Arseniy Gorin 2016-07-20
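If the phrases are fixed, a minimal JSGF grammar can replace the n-gram LM entirely. A sketch (the grammar name and phrases are invented for illustration):
#JSGF V1.0;
grammar commands;
public <command> = turn on the light | turn off the light | stop;
pocketsphinx can decode against such a file directly (e.g. via its -jsgf option) instead of loading an n-gram model.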
Hi Arseniy,
I have to use a language model for adaptation with the default acoustic model.
I have tried to set the gtnmax parameter just like you said:
ngram-count -gtnmax 0 -text train-text.txt -lm your.lm
The output is:
Unknown option "-gtnmax"; type "ngram-count -help" for information
Sorry, my fault. The warnings are removed with "-gt3max 0 -gt2max 0 -gt1max 0".
However, I think you had better keep the default settings. Your command still produces the LM.
To sum up, go with "ngram-count -text train-text.txt -lm your.lm".
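If you do want to drop discounting, the full command would be something like this (a sketch combining the flags above with the same files):
ngram-count -gt1max 0 -gt2max 0 -gt3max 0 -text train-text.txt -lm your.lm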
By the way, I am not sure what you mean by adaptation of the default acoustic model, but normally you do not need a language model for that. You need audio with transcripts and a dictionary covering the words in the transcriptions.
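For reference, CMUSphinx adaptation expects a list of utterance ids plus matching transcripts, along the lines of the following sketch (file names and contents are illustrative):
commands.fileids:
command_0001
command_0002
commands.transcription:
<s> turn on the light </s> (command_0001)
<s> stop </s> (command_0002)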
I have tested with the "-gt3max 0 -gt2max 0 -gt1max 0" parameters added to the command line.
The output is worse than the former one, I guess:
warning: discount coeff 1 is out of range: 0
warning: discount coeff 3 is out of range: 3.24638
warning: count of count 8 is zero -- lowering maxcount
warning: count of count 7 is zero -- lowering maxcount
warning: count of count 6 is zero -- lowering maxcount
warning: count of count 5 is zero -- lowering maxcount
warning: count of count 4 is zero -- lowering maxcount
warning: count of count 3 is zero -- lowering maxcount
warning: discount coeff 1 is out of range: 0
It would be better if my corpus had more sentences.
You're totally right. I had it stuck in my mind wrongly.
I have one more question. In the tutorial, there is this information:
"for book-like texts you need to use Knesser-Ney discounting. For command-like texts you should use Witten-Bell discounting or Absolute discounting. You can try different methods and see which gives better perplexity on a test set".
That means for book-like texts we should add the "-kndiscount" parameter.
My corpus is formed from command-like texts. What should I use as a parameter?
"-wbdiscount"? Or something else?
Thanks a lot for your quick response Arseniy.
Yes, using another type of discounting is also possible. If your commands are fixed, using no discounting should also work fine. However, I'd prefer building a grammar in this case.
According to the SRILM data sheet, for WB discounting you add:
"-wbdiscount -wbdiscount1 -wbdiscount2"
I am trying to reproduce your error with the acoustic model. I will be back when it is done.
Thanks a lot Arseniy, you are the man :)