Hi,
I am running a training session using SphinxTrain. The process seems to complete successfully, since I get this message at the end of the appropriate log file: "Likelihoods have converged! Baum Welch training completed!".
However, when I look into the log files, I notice errors like this during training:
ERROR: "........\src\libs\libmodinv\gauden.c", line 1700: var (mgau= 852, feat= 0, density=3, component=0) < 0
I am not sure whether I can ignore these errors. Can anyone shed light on this?
Thanks,
Kishore
Thanks a lot, Nickolay, for your response. Please find my reply below.
>> binlm2arpa should also work. What particular problems do you have with this application?
I get the following error if I run the tool, for example, on wsj5kc.Z.DMP:
D:\lmsphinx>binlm2arpa -binary wsj5kc.Z.DMP -arpa wsj5kc.lm
Reading binary language model from wsj5kc.Z.DMP... Error : Language model file wsj5kc.Z.DMP appears to be corrupted.
Hm, the cmuclmtk binary reading/writing code is in a broken state. Please use sphinx3_lm_convert instead until it is repaired.
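For example, a DMP-to-ARPA conversion with sphinx3_lm_convert would look roughly like the sketch below. Treat the -i/-ifmt/-o/-ofmt flags as assumptions from memory, and verify them against the tool's own help output:

sphinx3_lm_convert -i wsj5kc.Z.DMP -ifmt dmp -o wsj5kc.lm -ofmt txt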
Read
https://sourceforge.net/search/index.php?group_id=1904&form_submit=Search&search_subject=1&search_body=1&type_of_search=forums&all_words=gauden.c
in particular
https://sourceforge.net/forum/message.php?msg_id=6292641
I also changed the code so it dumps a more descriptive message now. I hope that will be enough for you.
Thanks, Nickolay. Despite the errors, the acoustic model generated by SphinxTrain is working well for speech recognition.
Hi Nickolay,
I am evaluating Sphinx4 to replace a SAPI-based codebase. Sphinx4 has been very impressive so far. I am currently conducting a small experiment with SphinxTrain and Sphinx:
1. Use a TTS application with AT&T Natural Voices to read a text file and encode it as a 16 kHz, 16-bit WAV file.
2. Build an acoustic model from the WAV file generated in step 1.
3. Package the acoustic model into a JAR file and use it for decoding in Sphinx4.
4. So far so good; no problems through step 3.
However, if I generate another TTS WAV file using just a subset of the words on which the acoustic model was trained and try to decode it with the same acoustic model, I get inaccurate recognition. (In fact, the recognized words are a completely different sequence from what was spoken, though they are still a subset of those defined in the grammar/dictionary.)
I also tried generating a WAV file with the same content (i.e., the same speech) as the one produced by TTS in step 1 and decoding it with the acoustic model packaged in step 3; that recognition is also totally inaccurate.
Thanks in advance for your reply,
Kishore
> right now conducting a small experiment with SphinxTrain and Sphinx
As stated before, you do not have enough data to train the model. Also, to recognize English I suggest you use the existing models instead of shooting yourself in the foot by reinventing the wheel.
Nickolay:
As we are trying to use speech recognition in the engineering field, our intention is to extend the WSJ or any other suitable existing model by adding technical vocabulary. Hence we are planning the following; please let me know if it makes sense.
1) Take the WSJ dictionary and add the needed technical vocabulary.
2) Create WAV files from text based on the comprehensive dictionary created in step 1. For this step we are planning to use a text-to-speech converter.
3) With the above inputs, generate an extended version of the WSJ acoustic model, which we hope will support our technical vocabulary.
We are aware of the possibility of extending WSJ by using the addenda functionality; however, we need a process that will enable us to add thousands of engineering terms.
I hope this explains our intentions clearly; we look forward to your valuable input on this approach.
thanks
Kishore
I don't get why you want to train a new acoustic model; moreover, I don't think it's a good idea to use TTS output for training. To extend the dictionary to your target domain, you need the following:
1) Extend cmudict (please note that there is no such thing as a WSJ dictionary) with the pronunciations of the missing words. You could use Sequitur G2P, for example, to do this automatically, though the result will require manual review by a speech expert (see the sketch after this list).
2) Collect a lot of text from your target domain and train a language model with cmuclmtk (a pipeline sketch also follows).
That's all.
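For step 1, a minimal Sequitur G2P session could look like the sketch below. The file names (cmudict.dict, newwords.txt) and the single training pass are illustrative only; check the Sequitur documentation for the recommended model ramp-up procedure:

# train a grapheme-to-phoneme model on the existing dictionary
g2p.py --train cmudict.dict --devel 5% --write-model model-1
# transcribe a list of new engineering words, one word per line
g2p.py --model model-1 --apply newwords.txt > newwords.dict

The generated newwords.dict should then be reviewed by hand before its entries are appended to the dictionary.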
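For step 2, the usual cmuclmtk pipeline from a preprocessed text corpus to an ARPA-format language model is roughly the following sketch (corpus.txt is a hypothetical one-sentence-per-line file; see each tool's documentation for the exact options):

# count word frequencies and derive a vocabulary
text2wfreq < corpus.txt | wfreq2vocab > corpus.vocab
# map the corpus to id n-grams over that vocabulary
text2idngram -vocab corpus.vocab -idngram corpus.idngram < corpus.txt
# estimate the n-gram model and write it in ARPA format
idngram2lm -vocab_type 0 -idngram corpus.idngram -vocab corpus.vocab -arpa corpus.lm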
Nickolay:
Thanks for your suggestion. I have some questions:
There will be engineering words not in the cmudict dictionary, which will be added to it. Is it enough if we just train the language model? What about training the acoustic model for these new words?
The WSJ/Hub4 acoustic model JAR distributions contain a .dmp language model file. To merge this .dmp with our custom LM file (either in ASCII or binary), we need the ASCII version of the WSJ/Hub4 LM file. Where can we get that?
There is a tool in cmuclmtk named binlm2arpa. Is this for converting a .dmp file to ASCII text? If so, it is not able to read the .dmp file supplied with the acoustic model JAR. Is there an alternative way to convert a .dmp file to an ASCII LM?
While building a language model using the cmuclmtk tools, I am assuming that we need to provide a transcription in the form of a text file as input. Where can we get this transcription file for, say, the cmudict words? Is it necessary to give well-formed sentences in the transcription file, or is it okay if we give the word set as input?
Looking forward to your reply,
thanks,
Kishore
> What about training the acoustic model for these new words?
The acoustic model captures the properties of phones, not words. There is no need to retrain it.
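For illustration, new words only need dictionary entries written with the existing phone set. Hypothetical cmudict-style entries (not taken from cmudict itself) for two technical terms might look like:

TORSION T AO R SH AH N
CAMSHAFT K AE M SH AE F T

Since the phones (T, AO, R, ...) are already modeled acoustically, the trained model covers such words as soon as the entries exist.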
> the ascii version of wsj/hub4 lm file. Where can we get those?
I suggest you use the lm_giga language model instead of wsj/hub4; it has an ASCII version. Conversion is not a problem either: sphinx3_lm_convert from sphinx3 can do this, for example (as sketched earlier in the thread).
> If so it is not able to read the .dmp file supplied with acoustic model jar.
binlm2arpa should also work. What particular problems do you have with this application?
> Is it necessary to give well-formed sentences in the transcription file, or is it okay if we give the word set as input?
I'm not sure what transcription you are talking about here. The input for any language model is just text on the topic you are going to recognize: engineering texts, in your case. It can be a collection of newspaper articles and so on. The text needs to be preprocessed, of course; you need to remove punctuation, special characters, and so on (a preprocessing sketch follows).
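As a rough sketch of that preprocessing, assuming standard Unix tools and a hypothetical raw text file corpus.raw, something like the following produces an uppercase, punctuation-free corpus with per-sentence markers (whether you need the <s>/</s> context cues depends on how you run the cmuclmtk tools, so check their documentation):

# strip punctuation, uppercase, and wrap each line in sentence markers
sed 's/[[:punct:]]//g' corpus.raw | tr '[:lower:]' '[:upper:]' | awk '{print "<s>", $0, "</s>"}' > corpus.txt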
Your question about a word list suggests that you don't yet understand what a language model is. I suggest you Google it or read a textbook about it.