I'm getting an error when decoding in SphinxTrain. As stated in the subject line, I can see the error in the log file, but I cannot find any inconsistency between the .dic and .lm files; as far as I can see they match exactly. Here are the .dic and .lm files and the error log. Please let me know whether there are any differences between these two files.
If there is an encoding mismatch between the two files, how can I find it?
Thank you
Last edit: ab1984 2015-08-11
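For the encoding question, one rough approach is to compare how each file's raw bytes decode. The Python sketch below is only a heuristic, not full charset detection: it reports whether a file is plain ASCII, whether it is valid UTF-8 (and the offset of the first invalid byte if not), and whether it starts with a UTF-8 byte-order mark or uses Windows CRLF line endings. The function names are hypothetical helpers, not part of SphinxTrain.

```python
import sys
from pathlib import Path


def describe_bytes(data):
    """Heuristic report on a file's raw bytes; a rough check, not full charset detection."""
    report = {
        "is_ascii": data.isascii(),                        # only 7-bit bytes present
        "has_utf8_bom": data.startswith(b"\xef\xbb\xbf"),  # BOM some editors prepend
        "has_crlf": b"\r\n" in data,                       # Windows line endings
        "valid_utf8": True,
        "first_bad_byte": None,
    }
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        report["valid_utf8"] = False
        report["first_bad_byte"] = err.start  # offset of the first byte that is not valid UTF-8
    return report


def describe_file(path):
    """Read a file as bytes and describe it."""
    return describe_bytes(Path(path).read_bytes())


if __name__ == "__main__":
    # e.g. python check_encoding.py my.dic my.lm   (hypothetical file names)
    for name in sys.argv[1:]:
        print(name, describe_file(name))
```

If one file is valid UTF-8 and the other is not, or only one of them has a BOM, that is a likely mismatch. On Linux, `file -i` on each file gives a similar quick guess.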
You created the language model incorrectly. You added <s> before every word and </s> after every word, without spaces, so every word now contains <s>. You can open the language model file and see that.
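One way to confirm this without reading the whole file by eye is to list the unigrams of the language model and compare them with the head words of the dictionary. This Python sketch assumes a standard ARPA text LM (a `\1-grams:` section of `logprob word [backoff]` lines) and a Sphinx-style dictionary with one `WORD PHONE ...` entry per line, where alternate pronunciations are written `WORD(2)`. It skips `<s>` and `</s>` when they stand alone and separately flags words that have the markers glued onto them. The helper names are illustrative, not a SphinxTrain tool.

```python
import re

SENTENCE_MARKERS = {"<s>", "</s>"}


def lm_unigrams(lm_lines):
    """Collect the words listed in the \\1-grams: section of an ARPA-format LM."""
    words = []
    in_unigrams = False
    for line in lm_lines:
        line = line.strip()
        if line == "\\1-grams:":
            in_unigrams = True
            continue
        if in_unigrams:
            if line.startswith("\\"):  # next section, e.g. \2-grams: or \end\
                break
            fields = line.split()
            if len(fields) >= 2:  # blank lines are skipped
                words.append(fields[1])
    return words


def dict_words(dic_lines):
    """Collect head words from a Sphinx-style dictionary, folding WORD(2) into WORD."""
    words = set()
    for line in dic_lines:
        fields = line.split()
        if fields:
            words.add(re.sub(r"\(\d+\)$", "", fields[0]))
    return words


def check_lm_against_dict(lm_lines, dic_lines):
    """Return (words with <s>/</s> glued on, other LM words missing from the dictionary)."""
    vocab = dict_words(dic_lines)
    glued, missing = [], []
    for word in lm_unigrams(lm_lines):
        if word in SENTENCE_MARKERS:
            continue  # standalone markers are not expected in the main .dic
        if "<s>" in word or "</s>" in word:
            glued.append(word)
        elif word not in vocab:
            missing.append(word)
    return glued, missing
```

With real files, pass `open("my.lm", encoding="utf-8")` and `open("my.dic", encoding="utf-8")` (hypothetical file names). Empty `glued` and `missing` lists mean every LM word has a dictionary entry.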
Actually, you do not need to add <s> to the training corpus at all. SRILM adds the sentence markers automatically.
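If the corpus already contains the markers, they can be stripped before building the model. A minimal Python sketch, assuming one sentence per line in the corpus; the function names are hypothetical. The cleaned lines can then be written to a file and passed to SRILM, for example `ngram-count -text corpus.txt -order 3 -lm my.lm` (file names and order are placeholders).

```python
import re

# Matches <s> or </s> wherever it occurs, including when glued to a word.
MARKER = re.compile(r"</?s>")


def clean_corpus_line(line):
    """Replace sentence-boundary tags with spaces and normalise whitespace."""
    return " ".join(MARKER.sub(" ", line).split())


def clean_corpus(lines):
    """Yield cleaned, non-empty sentences, one per input line."""
    for line in lines:
        cleaned = clean_corpus_line(line)
        if cleaned:
            yield cleaned
```

Replacing each tag with a space rather than deleting it keeps glued tokens such as `<s>HELLO</s><s>WORLD</s>` from merging into one word.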
Thank you Nickolay. It worked :)
MODULE: DECODE Decoding using models previously trained
Decoding 4896 segments starting at 0 (part 1 of 1)
0%
Aligning results to find error rate
SENTENCE ERROR: 40.2% (1970/4896) WORD ERROR RATE: 21.8% (2634/12063)
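The two rates in the last log line are simple ratios of the counts in parentheses. A quick Python check (the function name is illustrative). By the usual convention the word error count is substitutions + deletions + insertions measured against the reference words (12063 here); the log shows only the total.

```python
def error_rate(errors, total):
    """Percentage of errors out of total, as in the decode summary line."""
    return 100.0 * errors / total


print(f"SENTENCE ERROR: {error_rate(1970, 4896):.1f}%")
print(f"WORD ERROR RATE: {error_rate(2634, 12063):.1f}%")
```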