Hi,
I'm trying to adapt an acoustic model, but on roughly 25% of my training audio files, I receive the error,
"Failed to align audio to trancript: final state of the search is not reached".
I've searched, and the most common cause of this problem seems to be audio files in the wrong format, but I've checked and double-checked and I believe my audio is fine.
Are you able to offer any more insight into why this error might occur?
I've uploaded two similar recordings to Dropbox: one works fine and the other gives the error above.
https://www.dropbox.com/sh/e4fpdb3xdezfbw2/AABJBZbs07ZUL5VgSg2utnoba?dl=0
I'm still in the process of gathering all the audio data required, so if I'm doing something wrong I'd like to know sooner rather than later so that I can correct my process.
Thanks for any assistance,
Stu.
There is nothing suspicious in these particular files. You need to provide the whole model folder to get help with this issue.
I have uploaded the model and associated data files. They should be available at that same link.
The acoustic model is en-us-5.2, which I have run through the adaptation process many times with different data.
With the default model http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Acoustic%20Model/cmusphinx-en-us-ptm-5.2.tar.gz/download
it aligns fine.
It looks like your previous adaptation made the model too specific; you over-adapted it.
Also, your audio is mostly 8 kHz, so you'd better use an 8 kHz model for reliable recognition.
Thanks for that. What makes you say the audio is mostly 8 kHz? AFAIK it should all be 16 kHz; if it's not, I'll have to correct my process.
Currently, every time I get new audio samples, I re-run the adaptation process on the acoustic model with the new data (as well as the old). Is this causing me issues? Should I be using the default model for adaptation and not be "re-adapting" my existing model?
http://cmusphinx.sourceforge.net/wiki/faq/#qwhat_is_sample_rate_and_how_does_it_affect_accuracy
Yes
I've checked all my source audio: most of it is 16 kHz, some is 44.1 kHz, and some is 48 kHz. All of it was converted to 16 kHz before being run through the adaptation process.
Unfortunately I'm gathering data from a lot of sources and I can't get it all recorded at 16 kHz. Is including the higher-frequency data (even when converted to 16 kHz) going to hurt the process?
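For anyone wanting to repeat the header check described here, the following is a minimal sketch using only Python's standard `wave` module. The function names and the expected format of 16 kHz, mono, 16-bit are assumptions for illustration, not something SphinxTrain provides. It only reads what the WAV header declares, so it cannot tell whether the audio was originally recorded at a lower rate.

```python
import wave


def wav_format(path):
    """Return (sample_rate_hz, channels, bits_per_sample) from a WAV header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getnchannels(), wav.getsampwidth() * 8


def find_mismatched(paths, expected=(16000, 1, 16)):
    """Return {path: format} for files whose header differs from `expected`.

    `expected` is an assumed target format (16 kHz, mono, 16-bit).
    """
    mismatched = {}
    for path in paths:
        fmt = wav_format(path)
        if fmt != expected:
            mismatched[path] = fmt
    return mismatched
```

Calling `find_mismatched` on the list of training files should return an empty dictionary if every header declares the assumed format.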
The problem is the bandwidth of the data, not just the sample rate; see the FAQ entry above. The bandwidth of your data is lower than that of true 16 kHz audio.
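To make the distinction between sample rate and bandwidth concrete, here is a rough sketch that estimates how much of a file's spectral energy lies above a cutoff frequency. It uses only the Python standard library, and the function names, the 512-sample frame size, and the choice of cutoff are all assumptions for illustration. Audio that was originally sampled at 8 kHz and later converted to 16 kHz cannot carry meaningful content above 4 kHz, so a fraction near zero for a 4 kHz cutoff would hint at narrowband data even when the header says 16 kHz.

```python
import cmath
import math
import struct
import wave


def read_wav_mono16(path):
    """Read a 16-bit mono WAV file; return (sample_rate_hz, list of samples)."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono audio")
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    return rate, list(struct.unpack("<%dh" % (len(raw) // 2), raw))


def fft(values):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2])
    odd = fft(values[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def energy_fraction_above(samples, rate, cutoff_hz, frame_size=512):
    """Fraction of spectral energy at or above cutoff_hz.

    Energy is summed over non-overlapping, Hann-windowed frames;
    a trailing partial frame is ignored.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_size - 1))
              for i in range(frame_size)]
    first_bin = int(math.ceil(cutoff_hz * frame_size / rate))
    high = 0.0
    total = 0.0
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = [s * w for s, w in zip(samples[start:start + frame_size], window)]
        spectrum = fft(frame)
        # Keep only the non-negative frequencies (0 .. Nyquist).
        power = [abs(c) ** 2 for c in spectrum[:frame_size // 2 + 1]]
        total += sum(power)
        high += sum(power[first_bin:])
    return high / total if total else 0.0
```

A possible use is `rate, samples = read_wav_mono16(path)` followed by `energy_fraction_above(samples, rate, 4000)`, comparing the result for a failing file against a recording known to be genuinely wideband. The pure-Python FFT is slow on long files, so checking a few seconds of speech per file is probably enough.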