Hi, I am a newbie to cmu-sphinx.
I created the language model using: http://www.speech.cs.cmu.edu/tools/lmtool-new.html
And trained the acoustic model following: https://cmusphinx.github.io/wiki/tutorialam/
Using the latest release from https://cmusphinx.github.io/wiki/download/
The training data is very simple. All utterances have the format "Please take the characters <digits> once again <digits>".
There are 2,000 wav files for training (5 hours in total) and 500 files for testing, which took me 2 days to prepare.
The training result is very good according to the align file:
TOTAL Words: 8256 Correct: 7278 Errors: 1105
TOTAL Percent correct = 88.15% Error = 13.38% Accuracy = 86.62%
TOTAL Insertions: 127 Deletions: 281 Substitutions: 697
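For reference, the three percentages in the align totals follow directly from the word and error counts. Below is a minimal sketch of that arithmetic in Python; the function name `align_summary` and its arguments are illustrative only and are not part of the SphinxTrain tooling.

```python
def align_summary(words, insertions, deletions, substitutions):
    """Recompute align-file style totals from raw counts (illustrative helper)."""
    # A reference word counts as correct if it was neither deleted nor substituted.
    correct = words - deletions - substitutions
    # Insertions also count as errors, which is why accuracy is lower than percent correct.
    errors = insertions + deletions + substitutions
    percent_correct = 100.0 * correct / words
    error_rate = 100.0 * errors / words
    accuracy = 100.0 - error_rate
    return {
        "correct": correct,
        "errors": errors,
        "percent_correct": round(percent_correct, 2),
        "error_rate": round(error_rate, 2),
        "accuracy": round(accuracy, 2),
    }


# Counts taken from the TOTAL lines of the align file quoted in this post.
summary = align_summary(words=8256, insertions=127, deletions=281, substitutions=697)
print(summary)
```

With the counts from this post, the sketch reproduces 7278 correct words, 1105 errors, and the 88.15% / 13.38% / 86.62% figures, so the reported totals are internally consistent.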
However, I got very poor accuracy when running Pocketsphinx/Sphinx4 with my model.
For example, file_2001.wav and file_2002.wav are recognized in the db.align file as:
Whereas pocketsphinx gives:
The pocketsphinx command is:
The training db, including etc/model_parameters/result, can be found at: https://drive.google.com/open?id=0ByvxjxiH6xq3SEotczdNNVhacUU
I've also uploaded 10 wav files for testing.
Am I missing something in training or in using the model?
Last edit: kyon 2017-11-03
Running with pocketsphinx_batch gives a very good result.
The command is:
Result:
Any magic here?
Our software has several hidden tweaks to reduce captcha cracking capabilities.