Hi, I am a newbie to CMU Sphinx.
I created the language model using: http://www.speech.cs.cmu.edu/tools/lmtool-new.html
And trained the acoustic model following: https://cmusphinx.github.io/wiki/tutorialam/
I am using the latest release from https://cmusphinx.github.io/wiki/download/
The training data is very simple: every utterance has the form "Please take the characters <digits> once again <digits>".
There are 2,000 wav files for training (5 hours in total) and 500 files for testing, which took me two days to prepare.
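For reference, the control files under etc/ follow the layout from the acoustic model tutorial; the file names below are placeholders (only the db name "apple" is taken from the align output), not necessarily the exact ones in the uploaded db:
etc/apple_train.fileids (one recording per line, path without extension):
    file_0001
    file_0002
etc/apple_train.transcription (same order, utterance id in parentheses):
    <s> please take the characters one three eight seven two once again one three eight seven two </s> (file_0001)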
The training results look very good according to the align file:
TOTAL Words: 8256 Correct: 7278 Errors: 1105
TOTAL Percent correct = 88.15% Error = 13.38% Accuracy = 86.62%
TOTAL Insertions: 127 Deletions: 281 Substitutions: 697
However, I get very bad accuracy when running PocketSphinx/Sphinx4 with my model.
For example, file_2001.wav and file_2002.wav appear in the db.align file as:
please take the characters one three eight seven two once again one three eight seven two (apple-FILE_2001)
please take the characters one three eight seven two once again one three eight seven two (apple-FILE_2001)
Words: 16 Correct: 16 Errors: 0 Percent correct = 100.00% Error = 0.00% Accuracy = 100.00%
Insertions: 0 Deletions: 0 Substitutions: 0
please take the characters one seven THREE nine one ONCE AGAIN one seven three nine ONE (apple-FILE_2002)
please take the characters one seven NINE nine one *** *** one seven three nine *** (apple-FILE_2002)
Words: 16 Correct: 12 Errors: 4 Percent correct = 75.00% Error = 25.00% Accuracy = 75.00%
But running pocketsphinx on the same files gives:
file_2001.wav : NINE ZERO EIGHT NINE TWO NINE ONE
file_2002.wav : NINE
The pocketsphinx command is:
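A minimal sketch of such a decode, assuming the model directory, language model, and dictionary produced by the steps above (the paths and the senone count in the model directory name are placeholders):
pocketsphinx_continuous \
    -hmm model_parameters/apple.cd_cont_200 \
    -lm etc/apple.lm \
    -dict etc/apple.dic \
    -infile file_2001.wav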
The training db, including etc, model_parameters and the results, can be found at: https://drive.google.com/open?id=0ByvxjxiH6xq3SEotczdNNVhacUU
And I've uploaded 10 wav files for testing.
Am I missing something in training or in how I use the model?
Last edit: kyon 2017-11-03
Running with pocketsphinx_batch gives a very good result.
The command is:
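A sketch following the testing section of the CMUSphinx acoustic model tutorial; the fileids, hypothesis file, and model paths are placeholders, not necessarily the exact values used here:
pocketsphinx_batch \
    -adcin yes \
    -cepdir wav \
    -cepext .wav \
    -ctl etc/apple_test.fileids \
    -lm etc/apple.lm \
    -dict etc/apple.dic \
    -hmm model_parameters/apple.cd_cont_200 \
    -hyp result/apple_test.hyp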
Result:
Any magic here?
Our software has several hidden tweaks to reduce captcha cracking capabilities.