I would like to know which data set was used for the Chinese (Mandarin) model provided by CMU.
When I train my own Chinese model, the recognition rate is very low.
I suspect that either the data set is of poor quality or there is a problem with my training method.
The model was trained on https://catalog.ldc.upenn.edu/LDC98S73
If you use AISHELL http://www.openslr.org/33/ or AISHELL2, the accuracy should be much better.
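If it helps, here is a rough sketch of how AISHELL-1 transcripts could be turned into the .fileids/.transcription files SphinxTrain expects. The transcript file name, the speaker-subdirectory layout, and the output file names are assumptions about a typical AISHELL-1 download, not something fixed by this thread, so adjust them to your setup.

```python
#!/usr/bin/env python3
"""Convert AISHELL-1 transcripts into SphinxTrain .fileids/.transcription files.

Sketch only: the transcript file name (aishell_transcript_v0.8.txt) and the
assumption that each wav lives under a speaker subdirectory (e.g. S0002/)
describe a typical AISHELL-1 layout and may need adjusting.
"""
import sys

def convert(transcript_path, fileids_path, transcription_path):
    with open(transcript_path, encoding="utf-8") as src, \
         open(fileids_path, "w", encoding="utf-8") as fileids, \
         open(transcription_path, "w", encoding="utf-8") as trans:
        for line in src:
            parts = line.split()
            if len(parts) < 2:
                continue
            utt_id, words = parts[0], parts[1:]
            # AISHELL-1 IDs look like BAC009S0002W0122; characters 6-11 give
            # the speaker ID (S0002), which is also the wav subdirectory.
            speaker = utt_id[6:11]
            fileids.write(f"{speaker}/{utt_id}\n")
            # SphinxTrain transcription format: <s> word word ... </s> (utt_id)
            trans.write(f"<s> {' '.join(words)} </s> ({utt_id})\n")

if __name__ == "__main__":
    convert(sys.argv[1], sys.argv[2], sys.argv[3])
```

Something like `python aishell_to_sphinx.py aishell_transcript_v0.8.txt etc/db_train.fileids etc/db_train.transcription` would then produce the two files, assuming those output names match the database name in your configuration.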
Thanks.
I would also like to know what your noise (filler) entries are, such as +LAUGH+, +LIPSMACK+, +COUGH+, ...
Would you mind sharing your "sphinx_train.cfg" file with me, so that I can compare it against mine and find the differences?
The noisedict inside the model contains:
The model was trained a long time ago, so the configuration file is lost. You can use the default one; it should give you the same results.
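You can also list the fillers yourself from the downloaded model package. A minimal sketch, assuming the unpacked acoustic-model directory contains a plain-text noisedict file as standard CMUSphinx model packages do (the directory name below is an assumption):

```python
#!/usr/bin/env python3
"""Print the filler entries from a CMUSphinx acoustic model's noisedict.

Sketch only: assumes a plain-text 'noisedict' file with one filler word
followed by its filler phone(s) per line; the model directory name is a guess.
"""
import os

MODEL_DIR = "zh_cn.cd_cont_5000"  # assumed name of the unpacked model directory

with open(os.path.join(MODEL_DIR, "noisedict"), encoding="utf-8") as f:
    for line in f:
        line = line.strip()
        if not line or line.startswith(";;"):  # skip blanks and any comment lines
            continue
        word, *phones = line.split()
        print(f"{word} -> {' '.join(phones)}")
```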
Ok, let me try again.
Thank you very much.