Hi everyone,
I have several doubts while following the robust group tutorial; I want to train for the Telugu language. I have prepared the Telugu dictionary, filler dictionary, and phoneme set as mentioned in the tutorial.
1) When I run the script "perl scripts_pl/make_feats.pl -ctl etc/an4_train.fileids", I see no prompt asking me to speak. At what point will it ask me to speak and record my voice?
2) Do I have to train for phonemes or for words?
I am just a beginner; can you please help me, friends?
Thank you,
Sandhya,
Mumbai, India.
> 1) When I run the script "perl scripts_pl/make_feats.pl -ctl etc/an4_train.fileids", I see no prompt asking me to speak. At what point will it ask me to speak and record my voice?
The audio files are recorded beforehand and placed into a wav folder. You can already find files from an4 there. Add your own if you need to. Don't forget to update etc/train.fileids and etc/train.transcriptions after that.
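For reference, the fileids file lists one utterance per line (the audio file's path relative to the wav folder, without extension), and the transcription file pairs each utterance's text with its id. A minimal sketch with hypothetical Telugu utterance ids:

    etc/telugu_train.fileids:
        speaker_1/utt_001
        speaker_1/utt_002

    etc/telugu_train.transcription:
        <s> NAMASTE ANDARIKI </s> (utt_001)
        <s> DHANYAVADALU </s> (utt_002)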
> 2) Do I have to train for phonemes or for words?
I don't understand this question. You are just training the model; you aren't training it for something or with something. If you are asking whether your dictionary must be phone-based or word-based, it depends on the size of your vocabulary.
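To make the distinction concrete: a phone-based dictionary maps every word to a sequence of phones from your phoneme set, while a word-based dictionary treats each whole word as its own unit. A sketch only (the Telugu word and phones here are made up for illustration):

    Phone-based entry:
        NAMASTE    N A M A S T E
    Word-based entry:
        NAMASTE    NAMASTE

A word-based dictionary can work for a small, closed vocabulary; for larger vocabularies, phone-based models generalise far better because phones are shared across words.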
Thank you, sir, for your reply.
Now I have completed the acoustic model training, and the next step is to build the language model. I used the LMtool to build the language model. Presently I want to use it in sphinx3, and that requires the .lm file to be converted into .dmp format.
My doubts are:
1) I have seen in a forum that lm_convert is the tool to convert to DMP format. How do I use it? What is the command to convert a .lm file to a .dmp file? And in which folder do I have to be in order to execute that command?
2) After the first step, what do I have to do?
Waiting for your reply, sir.
Hello nickolay,
// 1) I have seen in a forum that lm_convert is the tool to convert to DMP format. How do I use it? What is the command to convert a .lm file to a .dmp file? And in which folder do I have to be in order to execute that command?
Solved, by using lm3g2dmp.
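For anyone who finds this later: lm3g2dmp takes the ARPA-format .lm file and an output directory (check the tool's usage message, as invocations can differ between versions). Something like:

    lm3g2dmp telugu.lm .

should write telugu.lm.DMP into the current directory.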
** Presently I have an acoustic model and a language model.
1) How do I integrate this acoustic model and language model to form a speech recognition system?
> How do I integrate this acoustic model and language model to form a speech recognition system?
Probably you need to try another tutorial:
http://sphinx.subwiki.com/sphinx/index.php/Hello_World_Decoder_QuickStart_Guide
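To give a rough picture of what the integration looks like: the sphinx3 decoder is simply pointed at the acoustic model files and the language model on its command line. A sketch only, with hypothetical paths; flag names vary between sphinx3 versions, so check sphinx3_decode -help before copying:

    sphinx3_decode \
        -mdef model_architecture/telugu.ci.mdef \
        -mean model_parameters/telugu.cd_cont_200/means \
        -var model_parameters/telugu.cd_cont_200/variances \
        -mixw model_parameters/telugu.cd_cont_200/mixture_weights \
        -tmat model_parameters/telugu.cd_cont_200/transition_matrices \
        -dict etc/telugu.dic \
        -fdict etc/telugu.filler \
        -lm etc/telugu.lm.DMP \
        -ctl etc/telugu_test.fileids \
        -cepdir feat \
        -hyp telugu.hyp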
Hello nickolay, I have some doubts regarding the training data. I have searched through the forums but couldn't understand well; sorry if this was answered before...
1) I have a dictionary with 100 words. How many times do I have to train each word to get an accurate model?
2) And what is the minimum database size needed to train a speech recogniser?
> 1) I have a dictionary with 100 words. How many times do I have to train each word to get an accurate model?
Nobody will tell you this, because basically such an estimate doesn't exist. You should even avoid words like "accurate model". Learn to use quantitative estimates instead.
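For example, the standard quantitative measure is word error rate (WER), computed by aligning the decoder output against a reference transcript:

    WER = (S + D + I) / N

where S, D and I are the substitution, deletion and insertion counts and N is the number of words in the reference. The word_align.pl script from the tutorial reports exactly this.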
Also, learn by example; a similar task is covered by the RM1 resource management database. Find its description here:
http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC93S3B
> 2) And what is the minimum database size needed to train a speech recogniser?
Again, there is no such thing. If you are interested in dictation, you can find a description of dictation databases here:
http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC94S13A
Thank you, sir, for helping me till now and clarifying my doubts...
While following the robust tutorial for the AN4 database, the .wav files are converted to .sph format, whereas the quickstart subwiki manual uses .raw format...
1) Which format do I have to follow? (Presently I am following the robust tutorial and the AN4 database.)
The an4 database uses the sph format because an4 was originally distributed in that format. There is no need to convert audio from wav to sph; you can just record your database as wav files and use the configuration in etc/sphinx_train.cfg to specify the format of the files. At the top of that file there are settings to choose MSWAV instead of nist. You also need to change the extension from sph to wav in the configuration file.
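Concretely, the relevant lines near the top of etc/sphinx_train.cfg look like this (variable names as in SphinxTrain; check your own copy, since versions differ slightly):

    $CFG_WAVFILE_EXTENSION = 'wav';    # was 'sph'
    $CFG_WAVFILE_TYPE = 'mswav';       # one of: nist, mswav, raw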
1) What is the reason for this failure?
Phones will be treated as case sensitive.
Phase 1: DICT - Checking to see if the dict and filler dict agrees with the phonelist file.
Found 195 words using 29 phones
Phase 2: DICT - Checking to make sure there are not duplicate entries in the dictionary
Phase 3: CTL - Check general format; utterance length (must be positive); files exist
WARNING: CTL line does not parse correctly:
Phase 4: CTL - Checking number of lines in the transcript should match lines in control file
Phase 5: CTL - Determine amount of training data, see if n_tied_states seems reasonable.
Total Hours Training: 0.074827564102564
This is a small amount of data, no comment at this time
Phase 6: TRANSCRIPT - Checking that all the words in the transcript are in the dictionary
Words in dictionary: 192
Words in filler dictionary: 3
Phase 7: TRANSCRIPT - Checking that all the phones in the transcript are in the phonelist, and all phones in the phonelist appear at least once
Something failed: (/home/taruns/Desktop/tutorial/telugu/scripts_pl/00.verify/verify_all.pl)
> WARNING: CTL line does not parse correctly:
This line tells you that your etc/<db>_train.fileids has an empty line. You should remove it.
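A quick way to find and delete such empty lines (telugu_train.fileids is a hypothetical name here, substitute your own file):

    grep -n '^$' etc/telugu_train.fileids    # show empty lines with line numbers
    sed -i '/^$/d' etc/telugu_train.fileids  # delete them in place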
In the process of decoding, I recorded 2 sentences and followed the commands in the robust group tutorial, and the error is:
MODULE: DECODE Decoding using models previously trained
Decoding 3 segments starting at 0 (part 1 of 1)
0%
This step had 3 ERROR messages and 4 WARNING messages. Please check the log file for details.
Aligning results to find error rate
word_align.pl failed with error code 65280 at scripts_pl/decode/slave.pl line 173.
How can I overcome this problem?
Previous problem **solved**.
Sorry, nickolay, if my questions are very simple to solve...
My questions:
1) In my test transcript file, I used the same audio files that I trained on before, and I am expecting 100% accuracy since the audio files are the same in both cases. But my error rate is 75%... Why did this happen?
Thanks for being patient in answering my questions...
Can I know about these decoding errors:
1) ERROR: "cont_mgau.c", line 666: Weight normalization failed for 3 senones
2) ERROR: "vithist.c", line 818: No word exit in frame 323, using exits from frame 122
3) ERROR: "vithist.c", line 818: No word exit in frame 179, using exits from frame 74
> In my test transcript file, I used the same audio files that I trained on before, and I am expecting 100% accuracy since the audio files are the same in both cases. But my error rate is 75%... Why did this happen?
It's expected. Accuracy is never 100%; that's the basic rule you need to learn.
> 1) ERROR: "cont_mgau.c", line 666: Weight normalization failed for 3 senones
You have too many senones for the small amount of training data.
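For reference, the senone count is set by $CFG_N_TIED_STATES in etc/sphinx_train.cfg; with under 0.1 hours of audio, a much smaller value than the tutorial default is appropriate (the 200 below is only a guess to illustrate, tune it on your data):

    $CFG_N_TIED_STATES = 200;    # reduce from the tutorial default for tiny datasets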
> 3) ERROR: "vithist.c", line 818: No word exit in frame 179, using exits from frame 74
Your transcription doesn't match the recorded audio.