CMU Sphinx / Forums / Help: error at .wav file while training acoustic

Nasir Hussain - 2010-11-11

Oops, my bad. Type:

perl $SPHINXTRAINDIR/scripts_pl/RunAll.pl

Basit Mahmood - 2010-11-11

I have run this command (./scripts_pl/RunAll.pl). Here is the extract from
the tutorial:

To train just run

./scripts_pl/RunAll.pl

and it will go through all the required stages. It will take a few minutes to
train. On large databases, training could take a month.

During the stages the most important stage is the first one which checks that
everything is configured correctly and your input data is consistent. Do not
ignore the errors reported on the first 00.verify_all step.

The typical output during training will look like:

Baum welch starting for 2 Gaussian(s), iteration: 3 (1 of 1)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
Normalization for iteration: 3
Current Overall Likelihood Per Frame = 30.6558644286942
Convergence Ratio = 0.633864444461992
Baum welch starting for 2 Gaussian(s), iteration: 4 (1 of 1)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
Normalization for iteration: 4

This script processes all the steps required to train the model. In the scripts
directory (./scripts_pl), there are several directories numbered sequentially
from 00 through 99. Each directory has either a script named slave*.pl or
a single file with the extension .pl. Go through the directories in order
and execute either the slave*.pl script or the single .pl file, as
below.

perl scripts_pl/00.verify/verify_all.pl
perl scripts_pl/10.vector_quantize/slave.VQ.pl
perl scripts_pl/20.ci_hmm/slave_convg.pl
perl scripts_pl/30.cd_hmm_untied/slave_convg.pl
perl scripts_pl/40.buildtrees/slave.treebuilder.pl
perl scripts_pl/45.prunetree/slave-state-tying.pl
perl scripts_pl/50.cd_hmm_tied/slave_convg.pl
perl scripts_pl/90.deleted_interpolation/deleted_interpolation.pl

The scripts will launch jobs on your machine, and the jobs will take a few
minutes each to run through. Before you run any script, note the directory
contents of your current directory. After you run each slave*.pl note the
contents again. Several new directories will have been created. These
directories contain files which are being generated in the course of your
training. At this point you need not know about the contents of these
directories, though some of the directory names may be self explanatory and
you may explore them if you are curious.
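
As a rough sketch of the manual procedure quoted above, the stage scripts could be driven from a small Python wrapper that stops at the first failure. The script paths are copied from the tutorial extract (the directory numbers may differ between SphinxTrain versions); run_perl and run_stages are hypothetical helper names, not part of SphinxTrain.

```python
import subprocess
from typing import Callable, List, Sequence

# Stage scripts as listed in the tutorial extract; adjust the paths to
# whatever your own scripts_pl directory actually contains.
STAGES = [
    "scripts_pl/00.verify/verify_all.pl",
    "scripts_pl/10.vector_quantize/slave.VQ.pl",
    "scripts_pl/20.ci_hmm/slave_convg.pl",
    "scripts_pl/30.cd_hmm_untied/slave_convg.pl",
    "scripts_pl/40.buildtrees/slave.treebuilder.pl",
    "scripts_pl/45.prunetree/slave-state-tying.pl",
    "scripts_pl/50.cd_hmm_tied/slave_convg.pl",
    "scripts_pl/90.deleted_interpolation/deleted_interpolation.pl",
]


def run_perl(script: str) -> int:
    """Run one stage script with perl and return its exit status."""
    return subprocess.call(["perl", script])


def run_stages(stages: Sequence[str],
               runner: Callable[[str], int] = run_perl) -> List[str]:
    """Run the stages in order and stop at the first non-zero exit status.

    Returns the list of stages that completed successfully.
    """
    completed: List[str] = []
    for script in stages:
        status = runner(script)
        if status != 0:
            print(f"{script} failed with status {status}; stopping")
            break
        completed.append(script)
    return completed
```

Calling run_stages(STAGES) from the training directory (the one that contains scripts_pl) would run the stages one after another. The runner argument only exists so the loop can be tried out without perl installed.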

Now I am asking about the next step:

This script processes all the steps required to train the model. In the scripts
directory (./scripts_pl), there are several directories numbered sequentially
from 00 through 99. Each directory has either a script named slave*.pl or
a single file with the extension .pl. Go through the directories in order
and execute either the slave*.pl script or the single .pl file, as
below.
....

This is where I get the error that I told you about:

cd scripts_pl/

You have new mail in /var/spool/mail/root

cd 00.verify/

ls

verify_all.pl

verify_all.pl

-bash: verify_all.pl: command not found

perl verify_all.pl

Configuration (e.g. etc/sphinx_train.cfg) not defined
Compilation failed in require at verify_all.pl line 47.
BEGIN failed--compilation aborted at verify_all.pl line 47.
You have new mail in /var/spool/mail/root

cd ..

You have new mail in /var/spool/mail/root

cd ..

Thanks
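
The session above runs verify_all.pl from inside scripts_pl/00.verify and fails with "Configuration (e.g. etc/sphinx_train.cfg) not defined", while later in this thread the same script works when invoked as scripts_pl/00.verify/verify_all.pl. That is consistent with the configuration being looked up relative to the current directory, although this is an inference from the thread and has not been checked against the SphinxTrain source. Under that assumption, a small check like the following could locate the directory to run from; find_training_root is a hypothetical helper name.

```python
import os
from typing import Optional

# Relative location of the config file, taken from the error message.
CFG_RELPATH = os.path.join("etc", "sphinx_train.cfg")


def find_training_root(start: str,
                       cfg_relpath: str = CFG_RELPATH) -> Optional[str]:
    """Walk upward from `start` until a directory containing cfg_relpath
    is found. Returns that directory, or None if the filesystem root is
    reached without finding the file."""
    current = os.path.abspath(start)
    while True:
        if os.path.isfile(os.path.join(current, cfg_relpath)):
            return current
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            return None
        current = parent
```

For example, find_training_root(os.getcwd()) called from scripts_pl/00.verify would return the training directory two levels up, which is where the scripts should then be started from.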

Basit Mahmood - 2010-11-11

Sorry it is working :) I was confused :) sorry

perl scripts_pl/00.verify/verify_all.pl

MODULE: 00 verify training files
O.S. is case sensitive ("A" != "a").
Phones will be treated as case sensitive.
Phase 1: DICT - Checking to see if the dict and filler dict agrees with the
phonelist file.
Found 23 words using 27 phones
Phase 2: DICT - Checking to make sure there are not duplicate entries in the
dictionary
Phase 3: CTL - Check general format; utterance length (must be positive);
files exist
Phase 4: CTL - Checking number of lines in the transcript should match lines
in control file
Phase 5: CTL - Determine amount of training data, see if n_tied_states seems
reasonable.
Total Hours Training: 0.00349594017094017
This is a small amount of data, no comment at this time
Phase 6: TRANSCRIPT - Checking that all the words in the transcript are in the
dictionary
Words in dictionary: 20
Words in filler dictionary: 3
Phase 7: TRANSCRIPT - Checking that all the phones in the transcript are in
the phonelist, and all phones in the phonelist appear at least once

perl scripts_pl/01.vector_quantize/slave.VQ.pl

MODULE: 01 Vector Quantization
Skipped for continuous models
You have new mail in /var/spool/mail/root

thanks

Nasir Hussain - 2010-11-12

Hello Basit,

Sorry it is working :) I was confused :) sorry

Mistakes only happen by mistake, right?
If you run into any problem, you know where to find me :)

-Nasir

Basit Mahmood - 2010-11-12

Hello Nasir,
Hope you are well. Sorry to bother you again :). I have run all the
commands that the tutorial lists.

perl scripts_pl/00.verify/verify_all.pl
perl scripts_pl/10.vector_quantize/slave.VQ.pl
perl scripts_pl/20.ci_hmm/slave_convg.pl
perl scripts_pl/30.cd_hmm_untied/slave_convg.pl
perl scripts_pl/40.buildtrees/slave.treebuilder.pl
perl scripts_pl/45.prunetree/slave-state-tying.pl
perl scripts_pl/50.cd_hmm_tied/slave_convg.pl
perl scripts_pl/90.deleted_interpolation/deleted_interpolation.pl

But when I run the decode command, I get this:

./scripts_pl/decode/slave.pl

MODULE: DECODE Decoding using models previously trained
Decoding 7 segments starting at 0 (part 1 of 1)
0%
Aligning results to find error rate
Can't open /usr/basit/sphinx/tutorial/testdb/result/testdb-1-1.match
word_align.pl failed with error code 65280 at ./scripts_pl/decode/slave.pl
line 172.

perl scripts_pl/decode/slave.pl

MODULE: DECODE Decoding using models previously trained
Decoding 7 segments starting at 0 (part 1 of 1)
0%
Aligning results to find error rate
Can't open /usr/basit/sphinx/tutorial/testdb/result/testdb-1-1.match
word_align.pl failed with error code 65280 at scripts_pl/decode/slave.pl line
172.

cd ..

I want to ask: is this because of my small vocabulary, or is something going
wrong here?

Also, in the model_parameters directory there are several directories, like:

testdb.cd_cont_1000
testdb.cd_cont_1000_1
testdb.cd_cont_1000_2
testdb.cd_cont_1000_4
testdb.cd_cont_1000_8
testdb.cd_cont_untied
testdb.ci_cont
testdb.ci_cont_flatinitial
testdb.cd_cont_initial

Which one is the model that I should point to in my Sphinx4 XML config file?
Also, in the XML config file:

<property name="location" value="the path to the model folder, for example <your_training_folder>/model_parameters/<your_model_name>.cd_cont_<senones>"> </property>

Suppose I put this model, say testdb.cd_cont_1000, in a folder on my Windows XP
machine (E:\basit\testdb.cd_cont_1000). Then this will become

<property name="location" value="E:\basit\testdb.cd_cont_1000"> </property>

and the same goes for lm.DMP, .dic and .filler. Is that right?
Thanks

Nasir Hussain - 2010-11-12

Hello Basit,

I want to ask: is this because of my small vocabulary, or is something going
wrong here?

Frankly, I don't know about the decode step, as I haven't done it myself.

Which one is the model that I should point to in my Sphinx4 XML config file?

You can choose either testdb.cd_cont_1000 or testdb.ci_cont, depending on your
test wav files. Since I know about your wav files: for your purpose you can use
testdb.ci_cont, as you have a very small vocabulary. You could have chosen
testdb.cd_cont_1000 if you had continuous speech in your sentences.

Also, in the XML config file: <property name="location" value="the path to the model folder, for example <your_training_folder>/model_parameters/<your_model_name>.cd_cont_<senones>"> </property>
Suppose I put this model, say testdb.cd_cont_1000, in a folder on my Windows XP
machine (E:\basit\testdb.cd_cont_1000). Then this will become <property name="location" value="E:\basit\testdb.cd_cont_1000"> </property> and the same goes for
lm.DMP, .dic and .filler. Is that right? Thanks

Yes Exactly...:)

-Nasir

Basit Mahmood - 2010-11-12

Hi,
It's nice to hear from you again. Thanks :) You nailed it :) I want to ask one
thing: can I check my model to see its progress? I found this tutorial:

http://sphinx.subwiki.com/sphinx/index.php/Hello_World_Decoder_QuickStart_Guide

I have also created a file, cfgfile:

-samprate 16000
-hmm /usr/basit/sphinx/tutorial/testdb/model_parameters/testdb.cd_cont_1000
-dict /usr/basit/sphinx/tutorial/testdb/etc/testdb.dic
-fdict /usr/basit/sphinx/tutorial/testdb/etc/testdb.filler
-lm /usr/basit/sphinx/tutorial/testdb/etc/testdb.lm.DMP

but I am unable to run this command:

sphinx3_livepretend ctlfile . cfgfile

Where is sphinx3_livepretend located?
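
Independently of where the binary lives, it may be worth confirming that every path in the cfgfile above actually exists before decoding. The following is a minimal sketch that assumes the simple "-flag value" layout shown in this post; parse_cfg and missing_paths are hypothetical helper names, and the list of path-valued flags is limited to the ones that appear above.

```python
import os
from typing import Callable, Dict, List

# Flags from the cfgfile in this post whose values are filesystem paths.
PATH_FLAGS = ("-hmm", "-dict", "-fdict", "-lm")


def parse_cfg(text: str) -> Dict[str, str]:
    """Parse lines of the form '-flag value' into a dict."""
    options: Dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2 and parts[0].startswith("-"):
            options[parts[0]] = parts[1].strip()
    return options


def missing_paths(options: Dict[str, str],
                  exists: Callable[[str], bool] = os.path.exists) -> List[str]:
    """Return 'flag path' strings for path-valued flags whose target is absent."""
    return [
        f"{flag} {options[flag]}"
        for flag in PATH_FLAGS
        if flag in options and not exists(options[flag])
    ]
```

Reading the file with open("cfgfile").read(), passing it to parse_cfg, and printing missing_paths(...) would list any entry that points at a file or directory that is not there.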

Also, I want to ask one thing in advance about the .gram file. Since I will use
my own model, what words should I put in the .gram file when I create it? Only
the words that I use in my model, or can the .gram file contain any kind of
words?

Secondly, suppose I train my model again for a large vocabulary. In that case,
should the same sentence be spoken by different people? As you know, my model
contains only my own voice. When I train for a large vocabulary, the words or
sentences will have to be spoken by different people. Suppose the sentence

MY NAME IS BASIT

is said by three people: Nasir, Asad and Shahrukh. Then how should I list it
in my .fileids and .transcription files? Like this:

basit/myname, Asad/myname, Shahrukh/myname, Nasir/myname

and the .transcription file will be like

~~MY NAME IS BASIT~~ (basit/myname) (Asad/myname) (Nasir/myname)

Or, in other words, what steps should I follow when I train my model for a
large vocabulary?

Too many questions, phew :)
Thanks

Nasir Hussain - 2010-11-12

Hello Basit,

Sorry for replying late... I was kind of busy with something.

Where is sphinx3_livepretend located?

This is a Sphinx 3 command. You need to install Sphinx 3 as described in the
tutorial, and install it with sudo privileges :)

Also, I want to ask one thing in advance about the .gram file. Since I will use
my own model, what words should I put in the .gram file when I create it? Only
the words that I use in my model, or can the .gram file contain any kind of
words?

The .gram file should contain the words that you want to get recognised. It
should not contain random words that you don't want to get recognised.

Secondly, suppose I train my model again for a large vocabulary. In that case,
should the same sentence be spoken by different people? As you know, my model
contains only my own voice. When I train for a large vocabulary, the words or
sentences will have to be spoken by different people. Suppose the sentence

It totally depends on your need. If you want your model to be user-specific,
then you can record just your own voice for the acoustic model. You can
simply record paragraphs from a book or so, and that's all. But if you want
your model to recognise other people too, then you will have to add the voices
of other people to your model to get good recognition :)

is said by three people: Nasir, Asad and Shahrukh. Then how should I list it
in my .fileids and .transcription files? Like this: " basit/myname, Asad/myname,
Shahrukh/myname, Nasir/myname "

Yes, the file structure should be like this.

and the .transcription file will be like " ~~MY NAME IS BASIT~~
(basit/myname) (Asad/myname) (Nasir/myname) "

No. It should be like this:

<s> MY NAME IS BASIT </s> (basit/myname)
<s> MY NAME IS BASIT </s> (Asad/myname)
<s> MY NAME IS BASIT </s> (Nasir/myname)

-Nasir

Basit Mahmood - 2010-11-12

Thanks :) No problem :) These things happen :)
Actually, after some time I have to build a model that recognises other
people. As for the .transcription file, it is clear how I should write
it for the same sentence spoken by different people. But in
.fileids, should I separate the entries with commas or put each on its own
line? I mean, is the syntax comma-separated, like

Asad/myname, Shahrukh/myname, Nasir/myname

or newline-separated:

Asad/myname
Shahrukh/myname
Nasir/myname

As for the grammar file, I want to ask: you know that I have a very small
vocabulary in my model (testdb). Suppose I train the model with only three
words (please, only suppose :) ): hello, Nasir, Basit. Now, when I make the
grammar file, can I use words like animal, how r u, what's going wrong, in my
.gram file? My model doesn't have these words that I am using in the grammar
file. In other words, does the grammar file depend on the vocabulary used in
the model, or is it a separate thing that doesn't care about the acoustic
model and language model?

Thanks

Nasir Hussain - 2010-11-13

Hello Basit,

Actually, after some time I have to build a model that recognises other
people. As for the .transcription file, it is clear how I should write
it for the same sentence spoken by different people. But in
.fileids, should I separate the entries with commas or put each on its own
line? I mean, is the syntax comma-separated, like " Asad/myname,
Shahrukh/myname, Nasir/myname " or newline-separated " Asad/myname
Shahrukh/myname Nasir/myname "

Use new line...:D
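
Putting the two answers in this thread together (one utterance ID per line in .fileids, and one "<s> ... </s> (id)" line per utterance in .transcription), the two files for a sentence spoken by several people could be generated roughly as below. This is only a sketch of the layout described here; build_training_lists is a hypothetical helper name, and the speaker/utterance naming simply follows the examples in the posts above.

```python
from typing import List, Tuple


def build_training_lists(sentence: str,
                         speakers: List[str],
                         utt_name: str) -> Tuple[str, str]:
    """Return (fileids_text, transcription_text) for one sentence
    recorded once by each speaker."""
    fileids = []
    transcripts = []
    for speaker in speakers:
        utt_id = f"{speaker}/{utt_name}"  # e.g. Asad/myname
        fileids.append(utt_id)
        transcripts.append(f"<s> {sentence} </s> ({utt_id})")
    # One entry per line, in the same order in both files.
    return "\n".join(fileids) + "\n", "\n".join(transcripts) + "\n"
```

The point of generating both files from the same loop is that the line count and order stay in step, which is what Phase 4 of the verify output earlier in this thread checks for.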

As for the grammar file, I want to ask: you know that I have a very small
vocabulary in my model (testdb). Suppose I train the model with only three
words (please, only suppose :) ): hello, Nasir, Basit. Now, when I make the
grammar file, can I use words like animal, how r u, what's going wrong, in my
.gram file? My model doesn't have these words that I am using in the grammar
file. In other words, does the grammar file depend on the vocabulary used in
the model, or is it a separate thing that doesn't care about the acoustic
model and language model?

Lol. The grammar file depends on the dictionary file. If a word is not
present in the dictionary, then you cannot use it in your grammar. On the other
hand, the acoustic model is a purely probabilistic model. Unless you feed it
a huge amount of data, you cannot expect good results. Do you get what I am
saying?
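
The rule above (a grammar may only use words that are in the dictionary) can be checked mechanically. The sketch below assumes the usual one-entry-per-line dictionary layout, with the word first and its phones after it, and alternate pronunciations marked as WORD(2); it takes the grammar words as a plain list rather than parsing a .gram file. Both function names are hypothetical.

```python
import re
from typing import Iterable, List, Set


def dictionary_words(dict_lines: Iterable[str]) -> Set[str]:
    """Collect the words defined in a pronunciation dictionary."""
    words: Set[str] = set()
    for line in dict_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Strip an alternate-pronunciation suffix such as "(2)".
        words.add(re.sub(r"\(\d+\)$", "", fields[0]))
    return words


def words_missing_from_dictionary(grammar_words: Iterable[str],
                                  dict_lines: Iterable[str]) -> List[str]:
    """Return the grammar words that have no dictionary entry.

    The comparison is case sensitive, so HELLO and hello are different words.
    """
    known = dictionary_words(dict_lines)
    return [word for word in grammar_words if word not in known]
```

With a dictionary that only defines HELLO, NASIR and BASIT, a grammar that also uses ANIMAL would be reported as having one missing word, which matches the answer given here.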

-Nasir

Basit Mahmood - 2010-11-22

Hello Nasir,
Sorry. You know why I am saying sorry to you; I mentioned it in the email
that I sent you. Anyway, you mean that the grammar file does not depend
on the acoustic model but on the dictionary. The grammar file is not
concerned with the pronunciation of different speakers. It just relies on
the words or sentences that are defined in the dictionary. Is that right?
