Hello,
I need to use PocketSphinx for speech recognition, but the recognition results are very poor even though the input files are of medium quality.
I am trying to recognise French, so I used the two French acoustic models together with the French language model and dictionary, but I did not manage to get good results: only one audio file out of 288 was recognised correctly, and the others were not close to the actual transcription. As I am a newbie, I only used the basic options (-hmm, -lm, -dict, -infile).
Do you know if there is a way to improve the WER (which is about 85-90%) and get better results?
I would also like to know whether it is possible to reduce the processing time, which is a bit too long for my use.
Cheers!
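For anyone wanting to reproduce a WER figure like the one quoted above, here is a minimal sketch of the usual definition (word-level edit distance divided by the number of reference words). The function name and the plain whitespace tokenisation are assumptions for illustration; this is not the scoring tool used for the numbers in this post.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.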
To get help with accuracy, you need to describe precisely how you are using the decoder and provide the sample files you are trying to recognize.
Thank you very much for the reply, and sorry for the incomplete message.
I tried two things:
First, I launched PocketSphinx using just the provided French acoustic model, language model and dictionary. The result was low accuracy and a very high WER. For this test, I used:
Both the French F0 Broadcast News Acoustic Model and the French F2 Telephone Acoustic Model (not at the same time; I tested each in turn);
The french3g62K.lm.dmp language model;
The frenchWords62K.dic dictionary.
Then, I tried to adapt the language model and to restrict the dictionary to fit the sample data. The WER is slightly better, but the accuracy is still very low. I only tested these changes with the French F0 Broadcast News Acoustic Model.
I launched both tests using this command:
pocketsphinx_continuous -dict /path/to/frenchdictionary.dic -hmm /path/to/acoustic/model/fr_FR/french -lm /path/to/language/model/frenchmodel.lm.dmp -infile /path/to/audiofile.wav
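To run the command above over a whole folder of recordings, something like the following sketch could be used. It only reuses the four options shown in the command; the helper names are hypothetical, and it assumes that pocketsphinx_continuous is on the PATH and prints the recognised text on standard output, which should be checked against your build.

```python
import subprocess
from pathlib import Path


def build_command(wav_path, hmm_dir, lm_path, dict_path):
    """Build the same argument list as the command line shown above."""
    return [
        "pocketsphinx_continuous",
        "-dict", str(dict_path),
        "-hmm", str(hmm_dir),
        "-lm", str(lm_path),
        "-infile", str(wav_path),
    ]


def decode_folder(wav_dir, hmm_dir, lm_path, dict_path):
    """Decode every .wav file in wav_dir and return {file name: raw stdout}."""
    results = {}
    for wav in sorted(Path(wav_dir).glob("*.wav")):
        completed = subprocess.run(
            build_command(wav, hmm_dir, lm_path, dict_path),
            capture_output=True,  # assumption: hypothesis on stdout, logs on stderr
            text=True,
            check=False,
        )
        results[wav.name] = completed.stdout.strip()
    return results
```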
I have attached some of the audio samples, as well as the dictionary used for the model adaptation and the transcription file.
I hope this will help.
Thanks in advance!
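For reference, the dictionary restriction mentioned in this post can be sketched roughly as below. This is not the exact procedure that was used; it assumes the usual one-entry-per-line dictionary layout (a word followed by its phones, with alternate pronunciations marked as word(2), word(3), ...), and the function name is hypothetical.

```python
import re


def restrict_dictionary(dict_lines, transcript_words):
    """Keep only the dictionary entries whose word occurs in the transcripts."""
    wanted = {word.lower() for word in transcript_words}
    kept = []
    for line in dict_lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        # Strip an alternate-pronunciation suffix such as "(2)" before matching.
        base = re.sub(r"\(\d+\)$", "", parts[0]).lower()
        if base in wanted:
            kept.append(line.rstrip("\n"))
    return kept
```

Every word that remains in the language model still needs an entry in the restricted dictionary, so the word list should be built from the same text used to adapt the language model.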
Well, the French model is not very good. I'll try to train a Voxforge French model in a couple of days; let's see how it goes.
Well, the French model has been available for some time. There is still a long way to go to reach the best accuracy.