For adaptation, I have the voice recordings of one particular user and the corresponding transcriptions. Does it help to have several recordings of the same sentence? For example, for the word "command" I need one .wav file, but I actually have 7 different .wav files of that word, spoken by the same speaker and recorded with an array of 7 microphones. During adaptation, when running the "sphinx_fe" program, should I use all seven .wav files or only one of them? Which will give better recognition accuracy?
I ran the MAP adaptation procedure with one set of recordings and found that the recognition accuracy was better than when I ran the same procedure with 2 sets. I had expected multiple recordings to improve recognition, which is why I am asking. Please clarify.
Thank you.
Balaji.
Last edit: Balaji 2018-02-26
When I run bw.exe (as described in the adaptation tutorial), three files are created in my working directory.
I would like to understand their contents and the algorithm that creates them. Could you point me to a reference, please?
Thank you.
Balaji.
Rabiner's HMM tutorial covers HMM estimation:
http://www.ece.ucsb.edu/Faculty/Rabiner/ece259/Reprints/tutorial%20on%20hmm%20and%20applications.pdf
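To give a rough idea of what "HMM estimation" means in that tutorial, here is a minimal sketch of one forward-backward (Baum-Welch) pass that accumulates expected counts. It assumes a toy discrete HMM with made-up parameters; the function name `accumulate_counts` and all the numbers are only illustrative, and this is not the actual bw implementation, which works on acoustic feature vectors rather than discrete symbols.

```python
import numpy as np

def accumulate_counts(A, B, pi, obs):
    """One scaled forward-backward pass over a single observation sequence.

    A:  (N, N) transition probabilities
    B:  (N, M) emission probabilities for M discrete symbols
    pi: (N,)   initial state probabilities
    obs: sequence of symbol indices
    Returns expected transition counts (N, N) and emission counts (N, M).
    """
    obs = np.asarray(obs)
    T, N = len(obs), A.shape[0]
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    scale = np.zeros(T)

    # Forward pass, normalising each frame to avoid underflow.
    alpha[0] = pi * B[:, obs[0]]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]

    # Backward pass, reusing the forward scale factors.
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / scale[t + 1]

    # State occupancy per frame; each row sums to 1.
    gamma = alpha * beta

    # Expected transition counts, summed over time.
    trans_counts = np.zeros((N, N))
    for t in range(T - 1):
        outgoing = B[:, obs[t + 1]] * beta[t + 1]
        trans_counts += alpha[t][:, None] * A * outgoing[None, :] / scale[t + 1]

    # Expected emission counts, summed over time.
    emit_counts = np.zeros_like(B, dtype=float)
    for t in range(T):
        emit_counts[:, obs[t]] += gamma[t]

    return trans_counts, emit_counts

# Toy model (hypothetical values, for illustration only).
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.6, 0.4])
obs = [0, 1, 1, 0, 0]

trans_counts, emit_counts = accumulate_counts(A, B, pi, obs)

# Re-estimation step: normalise the accumulated counts row by row.
A_new = trans_counts / trans_counts.sum(axis=1, keepdims=True)
B_new = emit_counts / emit_counts.sum(axis=1, keepdims=True)
print(A_new)
print(B_new)
```

The point of the sketch is that the estimation step does not update the model directly: it first accumulates expected counts over the data, and a separate normalisation step turns those counts into new parameters. Counts from several utterances can simply be added together before normalising.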