I have tried to adapt the en-us acoustic model following the documentation. I have also mixed my language model with the existing en-us.lm file, following this document.
Anyway, I created a recording of "close active transaction" and tested it using sphinx4 with the models mentioned above. Sphinx4 returned "close to transaction". Footnote: my language model contains the sentence "close active transaction".
So, to get better accuracy, my acoustic model needs to be adapted. So I did.
These are my steps:
1 - Generating acoustic feature files
2 - Converting the sendump and mdef files
3 - Accumulating observation counts
4 - Creating transformation with MLLR
5 - MLLR decoding
6 - Updating the acoustic model files with MAP
7 - Recreating the adapted sendump file
8 - Tuning speech recognition accuracy: creating the hyp file
9 - Tuning speech recognition accuracy: word align script test
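For steps 8 and 9, the word alignment essentially compares each hypothesis with its reference transcript. A minimal sketch of that comparison in Python, assuming a plain word-level edit distance (this is not the actual word align script, and the function name is only illustrative):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edits needed to turn the reference words seen so far
    # into the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # match or substitution
        prev = cur
    return prev[-1] / len(ref)


wer = word_error_rate("close active transaction", "close to transaction")
print(f"WER: {wer:.2%}")
```

Running the same measurement before and after adaptation, on recordings that were not used for adaptation, shows whether the adaptation changed anything.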
The default en-us model is phonetically tied. You need to use MAP adaptation for it, not MLLR.
MAP adaptation requires about 10-20 utterances, you have just 1.
Adaptation is also not quite effective for the accented speech. It is better to train the model instead and adapt the dictionary to your pronunciation.
Adaptation is also not quite effective for the accented speech
This is really bad news for me. I spent a lot of time adapting the acoustic model automatically. I hoped that adaptation would be effective for getting better accuracy.
In the Training Acoustic Model For CMUSphinx documentation:
When you don't need to train
You need to improve accuracy - do acoustic model adaptation instead
And in the Adapting the default acoustic model tutorial:
you can adapt to your own voice to make dictation good, but you also can adapt to your particular recording environment, your audio transmission channel, your accent or accent of your users.
I know that, according to the latter tutorial, I need more recordings to adapt to accented speech.
If you are adapting to a channel, accent or some other generic property of the audio, then you need to collect a little bit more recordings manually.
Anyway, I need to make progress with adapting the acoustic model, and I really need your help to adapt more effectively.
MAP adaptation requires about 10-20 utterances, you have just 1.
Can I use 19 copies of the same recording for that? Would it work, or are different recordings necessary?
Edit: I tested 19 copies of the recording, but it doesn't change anything. Once I add an alternative phonetic spelling that fits my accent, like the one below, it works.
active(2) AA K T IH V
adapt the dictionary to your pronunciation.
Do you mean that I need to manually change each word's phonetic spelling to match the accent? We are talking about hundreds of thousands of words.
If that is what you mean, is there any tool you know of that can generate this dynamically?
Thanks.
Last edit: kk_huk 2016-07-25
I tested 19 copies of the recording, but it doesn't change anything.
It is not reasonable to copy the same recording; you need different recordings. Also, adaptation works for slight accents; it does not work for a pronunciation as different as yours.
If that is what you mean, is there any tool you know of that can generate this dynamically?
There is no such tool, you can develop it yourself.
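Such a tool does not have to be large. A rough sketch in Python, assuming the dictionary uses the usual `word PH PH ...` layout with `word(2)` marking alternates, and assuming a hand-written substitution table for the accent. The `AE` to `AA` rule is taken only from the `active(2)` example in this thread; the function name and the rule table are hypothetical:

```python
import re

# Hypothetical accent rules: dictionary phone -> phone actually spoken.
ACCENT_RULES = {"AE": "AA"}


def add_accent_variants(dict_lines, rules=ACCENT_RULES):
    """Return dictionary lines with one extra variant per affected pronunciation."""
    prons = {}  # base word -> list of pronunciations (lists of phones)
    for line in dict_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        base = re.sub(r"\(\d+\)$", "", parts[0])  # "active(2)" -> "active"
        prons.setdefault(base, []).append(parts[1:])

    out = []
    for base, variants in prons.items():
        for phones in list(variants):  # iterate over a copy while appending
            accented = [rules.get(p, p) for p in phones]
            if accented not in variants:
                variants.append(accented)
        for n, phones in enumerate(variants, start=1):
            label = base if n == 1 else f"{base}({n})"
            out.append(f"{label} {' '.join(phones)}")
    return out


for entry in add_accent_variants(["active AE K T IH V", "show SH OW"]):
    print(entry)
```

Words whose pronunciation contains no phone from the rule table are written back unchanged, so only the affected entries grow.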
This is really bad news for me. I spent a lot of time adapting the acoustic model automatically. I hoped that adaptation would be effective for getting better accuracy.
You could spend much more time going the wrong way if you avoid asking proper questions with the necessary details.
It is not reasonable to copy the same recording; you need different recordings.
Yes, I tested it; it doesn't work this way.
Also, adaptation works for slight accents; it does not work for a pronunciation as different as yours.
Could you please give me a sample scenario to test whether the adaptation process works? I need to make sure that I am adapting my acoustic model properly, because either all my test cases already worked before adaptation, or the process doesn't change the accuracy at all, as in my first (main) question in this thread.
Thanks for your response Nickolay,
If you are unable to record the arctic sentences yourself, you can download them from the festvox website:
http://festvox.org/cmu_arctic/cmu_arctic/cmu_us_slt_arctic/wav/
I can record audio properly for my adaptation process. You probably misunderstood me. I just need to verify whether I am adapting properly or not. I need to show that the adaptation process improves the accuracy.
In my adaptation process, it doesn't change anything.
For example, I spoke "just show" and recognized it using my models and dictionary in sphinx4. I should tell you that the words were spoken with an en-us accent. Sphinx4 gave me "just shawl" as the result. I checked my dic file to make sure the words were pronounced the right way.
I mean, in the "close active transaction" sample, I was saying AActive instead of AE C tive. That is why it doesn't get the right result (I guess). So I went through my dic file to check the words' pronunciation spelling.
In my dic file:
shawl SH AO L
show SH OW
I realised that I had already pronounced SHOW properly, so adaptation should be a good way to fix the wrong result and improve the accuracy. I recorded 10 different utterances of "just show" with an en-us accent and adapted my acoustic model with these recordings. After this process I tested again, and it doesn't change anything.
You did not adapt properly
I create a log file for every step of the adaptation. There are no error or warning messages.
Could you please go through my adaptation folder? In this folder there is a bat file that I run for the adaptation process, so you can also check my adaptest.bat file.
You need to record diverse sentences as explained in the tutorial, not the same sentence 10 times.
I really don't get it. Adapting with 10 different "just show" recordings didn't work, but I should expect that recordings of 10 diverse sentences will adapt my acoustic model to "just show". How can that be?
By the way, my main aim is to improve the accuracy of "just show". To improve it, I need to use recordings of sentences that differ from "just show". I spoke the sentences below, in this order.
just show
do sometimes
evil queen
ex boyfriend
fewer errors
fiber optic
field position
fifteen balls
fifty hour
grammatical rules
After the adaptation process, I tested again. By the way, my test recording is also in the Test folder. It was tested with the adapted acoustic model. The result is still wrong.
"just shawl"
1) You do not have enough data for training. The adaptation set is 20 long sentences, not 10 short ones.
2) Your language model is broken. It has a perplexity of 7000 on the test set; even the default model is better than yours. You need to prepare the language model properly, otherwise it will damage any improvement in the acoustic model.
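To put the perplexity figure in context: perplexity is the inverse geometric mean of the per-word probabilities that the language model assigns to the test text. A small sketch in Python, assuming you already have one log10 probability per test word (ARPA files store log10 values); the function name is illustrative and not part of any CMUSphinx tool:

```python
import math


def perplexity(log10_probs):
    """Perplexity from per-word log10 probabilities of a test text."""
    if not log10_probs:
        raise ValueError("need at least one word probability")
    avg_log10 = sum(log10_probs) / len(log10_probs)
    return 10 ** (-avg_log10)


# A model that spreads its probability uniformly over 7000 words
uniform = [math.log10(1 / 7000)] * 50
print(round(perplexity(uniform)))
```

In other words, a perplexity of 7000 means the model is about as uncertain as if it picked uniformly among 7000 words at every position, which is very high for a small command vocabulary.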
1) You do not have enough data for training. The adaptation set is 20 long sentences, not 10 short ones.
As I mentioned before, I just need to improve accuracy for command texts. If I use 20 long sentences, how can that help?
2) Your language model is broken. It has a perplexity of 7000 on the test set; even the default model is better than yours. You need to prepare the language model properly, otherwise it will damage any improvement in the acoustic model.
The default acoustic model was adapted using long sentences, and if I try to adapt the model using some short ones, it will not help to get a better result.
Am I right? If so, I need an acoustic model that is trained on command texts.
Is there any command-and-control acoustic model that I can use for that?
By the way, have you gone through my bat file, which contains all the command lines for the adaptation? It was created following the documentation, and I just want to make sure there are no wrong commands in it.
Thanks a lot
Last edit: kk_huk 2016-08-01
Hi guys,
I have tried to adapt the en-us acoustic model following the documentation. I have also mixed my language model with the existing en-us.lm file, following this document.
Anyway, I created a recording of "close active transaction" and tested it using sphinx4 with the models mentioned above. Sphinx4 returned "close to transaction".
Footnote: my language model contains the sentence "close active transaction".
So, to get better accuracy, my acoustic model needs to be adapted. So I did.
These are my steps:
1 - Generating acoustic feature files
2 - Converting the sendump and mdef files
3 - Accumulating observation counts
4 - Creating transformation with MLLR
5 - MLLR decoding
6 - Updating the acoustic model files with MAP
7 - Recreating the adapted sendump file
8 - Tuning speech recognition accuracy: creating the hyp file
9 - Tuning speech recognition accuracy: word align script test
I skipped steps 4 and 5 because of Nickolay's answer.
Every step has a log file, and none of them contains any error (you can check them).
I have adapted my acoustic model 10 times using the same recording to get better accuracy. However, the result is the same.
I have uploaded my models, so you can check them using this link.
What am I missing here?
Last edit: kk_huk 2016-07-23
MAP adaptation requires about 10-20 utterances, you have just 1.
Adaptation is also not quite effective for the accented speech. It is better to train the model instead and adapt the dictionary to your pronunciation.
In the Training Acoustic Model For CMUSphinx documentation:
And in the Adapting the default acoustic model tutorial:
I know that, according to the latter tutorial, I need more recordings to adapt to accented speech.
Anyway, I need to make progress with adapting the acoustic model, and I really need your help to adapt more effectively.
Edit: I tested 19 copies of the recording, but it doesn't change anything. Once I add an alternative phonetic spelling that fits my accent, like the one below, it works.
active(2) AA K T IH V
If that is what you mean, is there any tool you know of that can generate this dynamically?
Thanks.
Last edit: kk_huk 2016-07-25
It is not reasonable to copy the same recording; you need different recordings. Also, adaptation works for slight accents; it does not work for a pronunciation as different as yours.
There is no such tool, you can develop it yourself.
You could spend much more time going the wrong way if you avoid asking proper questions with the necessary details.
Yes, I tested it; it doesn't work this way.
Could you please give me a sample scenario to test whether the adaptation process works? I need to make sure that I am adapting my acoustic model properly, because either all my test cases already worked before adaptation, or the process doesn't change the accuracy at all, as in my first (main) question in this thread.
Thanks for your response Nickolay,
If you are unable to record the arctic sentences yourself, you can download them from the festvox website:
http://festvox.org/cmu_arctic/cmu_arctic/cmu_us_slt_arctic/wav/
I can record audio properly for my adaptation process. You probably misunderstood me. I just need to verify whether I am adapting properly or not. I need to show that the adaptation process improves the accuracy.
In my adaptation process, it doesn't change anything.
For example, I spoke "just show" and recognized it using my models and dictionary in sphinx4. I should tell you that the words were spoken with an en-us accent. Sphinx4 gave me "just shawl" as the result. I checked my dic file to make sure the words were pronounced the right way.
I mean, in the "close active transaction" sample, I was saying AActive instead of AE C tive. That is why it doesn't get the right result (I guess). So I went through my dic file to check the words' pronunciation spelling.
In my dic file:
shawl SH AO L
show SH OW
I realised that I had already pronounced SHOW properly, so adaptation should be a good way to fix the wrong result and improve the accuracy. I recorded 10 different utterances of "just show" with an en-us accent and adapted my acoustic model with these recordings. After this process I tested again, and it doesn't change anything.
Oh my gosh, why?
Last edit: kk_huk 2016-07-27
You did not adapt properly
You need to record diverse sentences as explained in the tutorial, not the same sentence 10 times.
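A quick sanity check on the adaptation set before running the tools, sketched in Python. It assumes the transcription file follows the tutorial layout `<s> text </s> (fileid)`; the function name and the sample file ids are hypothetical:

```python
import re

# One transcription line: optional <s> ... </s> markers, then "(fileid)".
LINE_RE = re.compile(r"^\s*(?:<s>)?\s*(.*?)\s*(?:</s>)?\s*\(([^()\s]+)\)\s*$")


def summarize_transcripts(lines):
    """Count utterances, distinct sentences and distinct words."""
    sentences = []
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            sentences.append(match.group(1).lower())
    distinct = set(sentences)
    words = {w for sentence in distinct for w in sentence.split()}
    return {
        "utterances": len(sentences),
        "distinct_sentences": len(distinct),
        "distinct_words": len(words),
    }


# Ten recordings of the same command, as tried in this thread
sample = [f"<s> just show </s> (rec_{n:02d})" for n in range(1, 11)]
print(summarize_transcripts(sample))
```

Ten recordings of one command count as ten utterances but only one distinct sentence and two distinct words, so they cover very few of the sounds the model has to adapt to, compared with the 10-20 different utterances mentioned earlier in this thread.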
Could you please go through my adaptation folder? In this folder there is a bat file that I run for the adaptation process, so you can also check my adaptest.bat file.
I really don't get it. Adapting with 10 different "just show" recordings didn't work, but I should expect that recordings of 10 diverse sentences will adapt my acoustic model to "just show". How can that be?
By the way, my main aim is to improve the accuracy of "just show". To improve it, I need to use recordings of sentences that differ from "just show". I spoke the sentences below, in this order.
After the adaptation process, I tested again. By the way, my test recording is also in the Test folder. It was tested with the adapted acoustic model. The result is still wrong.
"just shawl"
Please check my folder.
Last edit: kk_huk 2016-07-27
You have several issues
1) You do not have enough data for training. The adaptation set is 20 long sentences, not 10 short ones.
2) Your language model is broken. It has a perplexity of 7000 on the test set; even the default model is better than yours. You need to prepare the language model properly, otherwise it will damage any improvement in the acoustic model.
Hi Nickolay,
As I mentioned before, I just need to improve accuracy for command texts. If I use 20 long sentences, how can that help?
The default acoustic model was adapted using long sentences, and if I try to adapt the model using some short ones, it will not help to get a better result.
Am I right? If so, I need an acoustic model that is trained on command texts.
Is there any command-and-control acoustic model that I can use for that?
By the way, have you gone through my bat file, which contains all the command lines for the adaptation? It was created following the documentation, and I just want to make sure there are no wrong commands in it.
Thanks a lot
Last edit: kk_huk 2016-08-01
Can anyone help me? I need some help, everyone. I am badly stuck on this adaptation issue and just want some advice.