Hi, we are trying to create a model for Hindi speech recognition.
We have put the data and log file in the following location:
ftp://123.176.44.99/hindi_asr/
Credentials: anandftp (username), password (password)
Here are the steps we have followed:
We built the language model using http://www.speech.cs.cmu.edu/tools/lmtool.html. We submitted our corpus text (ftp://123.176.44.99/hindi_asr/an4_test/etc/latest_dictionary.txt) and got back the language model (ftp://123.176.44.99/hindi_asr/an4_test/etc/an4_test.lm.DMP). We transliterated this corpus text from Hindi into English (is that OK?).
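(For reference, lmtool expects the corpus as plain text with one sentence per line; a minimal sketch, reusing the transliterated sentence quoted later in this thread:)
horee kandhon par laathee rakh kar ghar se nikala to dhaniya dvaar par khadee use der tak dekhatee rahee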
Using the above language model, we created the acoustic model as per the tutorial. The training data is in ftp://123.176.44.99/hindi_asr/an4_test/wav/an4_clstk/. We have around 2,400 audio files along with their transcriptions.
After training, we tested using "sphinxtrain -s decode run". In this testing phase we supplied all of the fileids (all 2,400-odd of them), along with their transcriptions, from the training folder itself.
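(For anyone reproducing this, the standard SphinxTrain layout for those files looks roughly as follows; the hindi_0010 id is taken from the command further down, and the transcript line is illustrative:)
etc/an4_test.fileids:
an4_clstk/hindi/hindi_0010
etc/an4_test.transcription:
<s> horee kandhon par laathee rakh kar ghar se nikala to dhaniya dvaar par khadee use der tak dekhatee rahee </s> (hindi_0010)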
At the end we got the following message:
MODULE: DECODE Decoding using models previously trained
Decoding 2404 segments starting at 0 (part 1 of 1)
0%
Aligning results to find error rate
SENTENCE ERROR: 55.9% (1345/2404) WORD ERROR RATE: 7.2% (3044/42406)
We are using the same files from the training data for testing in the decoding phase. Is that OK, or should we use different audio for testing?
After this, we tried to recognize one of the audio files (one we had used in testing) with the language model, acoustic model, and dictionary created in the above steps.
command:
pocketsphinx_continuous -infile wav/an4_clstk/hindi/hindi_0010.wav -hmm model_parameters/an4.ci_cont_flatinitial/ -lm etc/an4.lm.DMP -dict etc/an4.dic
But the accuracy is very poor and we are not getting a single word recognized.
While running the test phase, the sentence error rate and word error rate were low.
Please guide us as to where we are going wrong.
Thanks
Anand
> This corpus text we have transliterated from Hindi into English (is it OK?)

It is OK, but not necessary.

> We are using the same files from the training data for testing in the decoding phase. Is that OK?

It is not recommended to use the same audio for testing; you need to split your data into train and test sets. Those should not intersect.

> While running the test phase, the sentence error rate and word error rate are low.

The final model is in an4.cd_cont_200, and it is very small due to the small data size. You need a larger dataset and the most recent pocketsphinx; the result would be [...] which is about accurate.
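(A minimal way to make such a split, assuming the usual line-aligned etc/ fileids and transcription files; the 90/10 ratio and file names here are just an example:)
# hold out roughly 10% of the utterances for testing
# (ideally shuffle or hold out whole speakers first; this takes the tail for brevity)
N=$(wc -l < etc/an4.fileids)
T=$((N / 10))
head -n $((N - T)) etc/an4.fileids       > etc/an4_train.fileids
tail -n $T         etc/an4.fileids       > etc/an4_test.fileids
head -n $((N - T)) etc/an4.transcription > etc/an4_train.transcription
tail -n $T         etc/an4.transcription > etc/an4_test.transcription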
Hi Nickolay,
Thanks for the prompt reply, but when we use it on our system (we are using pocketsphinx-5prealpha) with the following command:
pocketsphinx_continuous -infile wav/an4_clstk/hindi/hindi_0010.wav -hmm model_parameters/an4.cd_cont_200/ -lm etc/an4.lm.DMP -dict etc/an4.dic
we get the following result: OONT DIYE
I agree that our dataset is small, but in the decoding (test) phase it is able to decode properly, whereas running it through the command above gives the result shown.
In the an4.align file, the decoded result we observe is: horee kandhon par laathee rakh kar ghar se nikala to dhaniya dvaar par khadee use der tak dekhatee rahee
We are confused: are we doing something wrong? Are you using the same model that we mentioned in our link, or a different one?
You need to compile the latest sphinxbase and pocketsphinx from GitHub.
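(For reference, that build is roughly the following, assuming an autotools environment:)
git clone https://github.com/cmusphinx/sphinxbase.git
cd sphinxbase && ./autogen.sh && make && sudo make install
cd ..
git clone https://github.com/cmusphinx/pocketsphinx.git
cd pocketsphinx && ./autogen.sh && make && sudo make install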
Hi Nickolay, we set -cmninit 71 and gave it longer sentences, and we are able to get good results. Thank you.
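(In other words, the fix amounts to adding -cmninit to the decoding command from before:)
pocketsphinx_continuous -infile wav/an4_clstk/hindi/hindi_0010.wav -hmm model_parameters/an4.cd_cont_200/ -lm etc/an4.lm.DMP -dict etc/an4.dic -cmninit 71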
I also see you used English phonemes for your dictionary. That is not a good idea; it is better to use the Hindi phoneset from here:
https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/Hindi/cmusphinx-hi-5.2.tar.gz
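(To make the phoneme point concrete: with English CMUdict phones, a dictionary entry for a Hindi word can only be an approximation. A hypothetical entry such as
GHAR G AA R
loses the aspiration of "gh" entirely, whereas a Hindi phoneset can model such sounds directly.)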
Hi Anand,
We are struggling with the same issue with the latest cmusphinx4-5prealpha build. I see you have been successful in transcribing Hindi audio. I'm thinking it's something to do with the way we have set up our corpus. Can you share the FTP details so we can see how you have formed the corpus to get these results?
Thanks,
Arun