I'm following the Sphinx tutorial and got stuck at the preliminary decode
step. I'm running Windows XP and I compiled the nightly builds of sphinxtrain,
sphinxbase and sphinx3 on VS C++ 2008 Express. I did the setup_tutorial for
the rm1 database with cygwin and the preliminary training. Then when I did
'perl scripts_pl/decode/slave.pl', this error came out:
Could not find executable for C:\cygwin\home\BJiaShen\rm1\bin\sphinx3_decode
at C:.cygwin/home/BJiaShen\rm1\scripts_pl\lib/SphinxTrain/Util.pm line 299.
Aligning results to find error rate
Can't open C:/cygwin/home/BJiaShen/rm1/results/rm1-1-1.match
word_align.pl failed with error code 65280 at scripts_pl/decode/slave.pl line
173
Could anyone help me with this problem? At what step of the tutorial is
sphinx3 supposed to get there? I also didn't get a word-alignment program, but
I don't suppose that's what's causing the problem. Sorry, I'm quite a newbie,
so please bear with me.
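For context, word_align.pl's job is roughly to align each hypothesis against its reference transcript and report the word error rate. A minimal illustrative sketch of that computation (this is not the actual CMU script, which also prints per-utterance alignments):

```python
def word_error_rate(reference, hypothesis):
    """WER via Levenshtein distance over words: (S + D + I) / len(reference)."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("turn the lights on", "turn lights off"))  # 0.5
```

Here one deletion ("the") and one substitution ("on" → "off") against a four-word reference give a WER of 0.5.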
You probably need to copy sphinx3_decode.exe and sphinxbase.dll from the
sphinx3 folder to the training bin folder manually.
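A sketch of that copy step in Python (the source and destination paths in the commented example are assumptions; adjust them to wherever you built sphinx3 and set up the rm1 task):

```python
import shutil
from pathlib import Path

def install_decoder(sphinx3_release, task_bin,
                    names=("sphinx3_decode.exe", "sphinxbase.dll")):
    """Copy the decoder binary and DLL into the task's bin directory."""
    src, dst = Path(sphinx3_release), Path(task_bin)
    dst.mkdir(parents=True, exist_ok=True)
    for name in names:
        shutil.copy2(src / name, dst / name)
    return [dst / n for n in names]

# Example (hypothetical paths -- adjust to your own setup):
# install_decoder("C:/cygwin/home/you/sphinx3/bin/Release",
#                 "C:/cygwin/home/you/rm1/bin")
```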
Anonymous - 2010-07-02
Hey, it works! Thank you so much! Actually, I tried something similar
yesterday: I copied all the bin/Release files from sphinx3, plus
sphinxbase.dll, to the bin for rm1 (with just sphinx3_decode.exe and
sphinxbase.dll, the second word-align error comes up), but the script was
stuck at 0% for so long that I thought it had hung. It turns out decoding
really does take a long time, about 20 minutes in my case. Again, thanks!
Anonymous
-
2010-07-02
Hey, now that I've got the tutorial going, I thought I'd try to train with my
own training data. The problem is that when I run setup_SphinxTrain.pl,
there's an error at this point:
Generating SphinxTrain configuration file in etc/sphinx_train.cfg
Backing up existing configuration file to etc/sphinx_train.cfg.orig
Can't open etc/sphinx_train.template or ./etc/sphinx_train.cfg
My command was "perl scripts_pl/setup_SphinxTrain.pl -force -task limited
-sphinxtraindir ."
When I checked the SphinxTrain/etc folder, sphinx_train.cfg had been renamed
to sphinx_train.cfg.orig, so it seems the script renames the file and then
tries to open the very file it just renamed (sphinx_train.cfg), which is
rather strange. Could anyone help me with this? If I rename the .orig back to
.cfg and rerun the command, it just renames it again and the same error comes
up.
Hello,
The setup script creates the layout in a folder. You need to run it from the
folder where you will train, not in the SphinxTrain directory. Try running
the script with the -help option to get an outline.
Anonymous - 2010-07-04
Hello, sorry for all the questions; I've got some problems running RunAll.pl.
Right now I'm using Ubuntu 10.04 and the stable releases of SphinxTrain,
Sphinx3 and sphinxbase. (The previous setup was at my office, and now I'm
trying it out at home; the office doesn't allow installation of Ubuntu.)
When I ran RunAll.pl, I got this:
Phase 2: Flat initialize
FATAL_ERROR: "corpus.c", line 262: input string too long. Truncated.
Then I went through the forums and learnt that you're supposed to have a
newline at the end of the .fileids and .transcription files. So I went ahead
and did that, and then I got this:
Phase 2: Flat initialize
FATAL_ERROR: "corpus.c", line 1647: Failed to get the files after 100 retries
of getting MFCC(about 300 seconds)
This step had 101 ERROR messages and 0 WARNING messages. Please check the log
file for details.
Something failed:
(/home/u0700322/Downloads/limited/scripts_pl/20.ci_hmm/slave_convg.pl)
Then when I went back to remove the newlines, I got this:
Phase 7: TRANSCRIPT - Checking that all the phones in the transcript are in
the phonelist, and all phones in the phonelist appear at least once
Something failed:
(/home/u0700322/Downloads/limited/scripts_pl/00.verify/verify_all.pl)
Which left me rather confused. Is there any way around this? This is my
etc folder WITHOUT the newlines at the end of those files: http://www.megaupload.com/?d=7UPSZLRD
Anonymous - 2010-07-04
Hey, I'm not sure if this information is important, but I'm using my own
training data under the task 'limited', and I set up SphinxTrain and sphinx3
using setup_SphinxTrain.pl and setup_sphinx3.pl.
You didn't set up the database properly. Your transcription file has an empty
line, which is not allowed. Your database is also too small; I don't see the
point in training this model anyway, as it's far better to use a stock model,
which is far more accurate than anything you can train yourself.
I'm sure you can do that, just try to be more careful.
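Both pitfalls from this thread (an empty line in the transcription file, and a missing final newline in the .fileids/.transcription files) can be checked mechanically before starting a run. A minimal sketch, assuming plain UTF-8 text files; this is not part of SphinxTrain:

```python
def check_corpus_file(path):
    """Report blank lines and a missing final newline, both of which
    trip up SphinxTrain's corpus reader."""
    problems = []
    with open(path, "rb") as f:
        data = f.read()
    if data and not data.endswith(b"\n"):
        problems.append("missing final newline")
    for i, line in enumerate(data.decode("utf-8").splitlines(), 1):
        if not line.strip():
            problems.append("blank line at line %d" % i)
    return problems
```

Run it over your etc/*.fileids and etc/*.transcription files; an empty result means neither problem is present.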
Anonymous - 2010-07-05
Hmm, could I ask for your advice? I'm planning to run an experiment on the
accuracy of speech recognition under noise using noise-cancelling Bluetooth
headsets. The independent variables are noise level (40, 65, 90 dBA) and
headset (one non-noise-cancelling headset and two noise-cancelling Bluetooth
headsets).
The task is command and control, you can see in the /etc that it's only about
20 commands with about as many words. I was planning to use a word model
instead of the phone model, that's why I used my own training data. Another
reason is that I'm from Singapore, and we have our own accent. Is it still
better to use an4/rm1?
The task is command and control, you can see in the /etc that it's only
about 20 commands with about as many words. I was planning to use a word model
instead of the phone model, that's why I used my own training data.
I don't see how your experiment is related to training this model, but just
so you know: to train a good model for 20 commands you need several hours of
recordings from several hundred speakers. You can't train anything good with
just 20 recordings. Word models aren't recommended for CMUSphinx either.
Another reason is that I'm from Singapore, and we have our own accent
Again, I'm not sure how accent is related to the testing task. It applies
equally to each noise condition, doesn't it?
Is it still better to use an4/rm1?
I don't see how they fit into your task.
Anonymous - 2010-07-08
The task is command and control, you can see in the /etc that it's only
about 20 commands with about as many words. I was planning to use a word model
instead of the phone model, that's why I used my own training data.
I don't see how your experiment is related to training this model, but just
so you know: to train a good model for 20 commands you need several hours of
recordings from several hundred speakers. You can't train anything good with
just 20 recordings. Word models aren't recommended for CMUSphinx either.
With regards to the word model thing, I was following this project here: http://hk.myblog.yahoo.com/jw!afd6dGGRHBRkp2laqwk198fg/article?mid=629
It suggested using a word model, and I'm not sure if I understood it
correctly, but it also seems to suggest recording your own training data.
Oh, I used my own recordings to get a word model because we need to provide
transcriptions for each recording, right? So I was thinking: since the
vocabularies for an4 and rm1 are quite limited (they don't have 'engage' and
'release', for instance, which are used in my commands), I'll need to provide
recordings of these words if I want a word model. Correct me if I'm wrong.
Another reason is that I'm from Singapore, and we have our own accent
Again, I'm not sure how accent is related to the testing task. It applies
equally to each noise condition, doesn't it?
Hmm, if the accent is different, the acoustic models for the phonemes should
be different too, right? Won't it make the recognizer less accurate if the
test data is in a Singaporean accent while the training data is in an
American accent? I need the accuracy at the low noise level to be high so
that the drop at higher noise levels is still significant, e.g. 90% to 50%
instead of 40% to 30%.
Is it still better to use an4/rm1?
I don't see how they fit into your task.
Sorry, when you said 'use stock model' (post 8), did you mean the an4/rm1
databases? That's how I interpreted it.
With regards to the word model thing, I was following this project here:
http://hk.myblog.yahoo.com/jw!afd6dGGRHBRkp2laqwk198fg/article?mid=629
It suggested using a word model, and I'm not sure if I understood it
correctly, but it seems to suggest recording your own training data as well.
This newbie blog is not really a good source of advice. It's nice that he's
trying new technology, but the way he's doing it is not correct.
Hmm.. if the accent is different, the acoustic models for the phonemes
should be different too right? Won't it cause the recognizer to be more
inaccurate if your test data is a Singaporean accent while your training data
is an American accent? I need the accuracy at low noise level to be high so
that when it drops at higher noise levels, the drop is still significant, e.g.
90% to 50%, instead of 40% to 30%.
High accuracy is gained through adaptation of the acoustic model, not by
training a new one.
Sorry when you said 'use stock model' (post 8), did you mean the an4/rm1
databases? Because that's what I interpreted it as.
When I talk about stock models, I mean the model that ships with the
pocketsphinx distribution (hub4wsj_sc_8k). That is one of the best models
available to you.
Anonymous - 2010-07-13
Hmm, if I want to recognize speech from a bone-conduction microphone, I need
to train with recordings only from that microphone, right? Will it still
work? When I look at the spectrogram of those recordings, there are pretty
much no high frequencies, which I think is typical of bone-conduction
recordings.
hmm if I want to recognize speech from a bone conduction microphone I need
to train with recordings only from that microphone right?
Not necessarily.
Will it still work?
Yes
When I look at the spectrogram for those recordings there's pretty much no
high frequencies, which I think is typical of bone conduction recordings.
The frequency response of the channel is largely normalized during feature
extraction by cepstral mean normalization. With a proper frequency range
selected, it shouldn't cause any issues.
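To illustrate cepstral mean normalization with a minimal pure-Python sketch (not the SphinxTrain implementation): a fixed channel filter multiplies the spectrum, which becomes an additive constant in the log-spectral/cepstral domain, so subtracting the per-coefficient mean over the utterance cancels it.

```python
def cepstral_mean_normalize(frames):
    """frames: a list of equal-length cepstral vectors, one per frame.
    Subtract the per-coefficient mean over the utterance, cancelling
    any constant offset introduced by the recording channel."""
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[k] for f in frames) / n for k in range(dim)]
    return [[f[k] - means[k] for k in range(dim)] for f in frames]
```

After normalization every coefficient averages to zero across the utterance, so two recordings of the same speech through different (linear, time-invariant) channels end up with much more similar features.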
Anonymous - 2010-07-14
Hello,
I tried out pocketsphinx instead of sphinx3 as you suggested. I got
pocketsphinx_continuous to work in Ubuntu 10.04 but I couldn't do the same in
Windows XP. Can you help me check where I had gone wrong? These are my steps
in Ubuntu:
1) Download, unzip and rename stable releases of pocketsphinx and sphinxbase.
2) Create language model using lmtools, download and unzip.
Now the directory looks like:
/pocketsphinx
/sphinxbase
/limited/etc/9345 (.lm and .dic are inside /9345)
3) commands:
cd sphinxbase
./configure
make
cd ../pocketsphinx
./configure
make clean all
make test
make install
cd ../limited
perl ../pocketsphinx/scripts/setup_sphinx.pl -task limited
bin/pocketsphinx_continuous -hmm ../pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k
-lm etc/9345/9345.lm -dict etc/9345/9345.dic
As you said, this model works quite well for my application, at least with a
regular air-conduction microphone. Thanks a lot for that suggestion!
Now the problem comes when I try to do the same thing with cygwin under
Windows XP. I followed the same steps, but after I run set_sphinx.pl, the /bin
is empty. Do you know why that's the case?
Anonymous - 2010-07-14
Sorry, typo: last line is "setup_sphinx.pl", not "set_sphinx.pl"
If you are using Perl from Cygwin, it might have problems. We usually
recommend ActivePerl.
Anyway, I don't see the point in running this Perl script at all. That
setup_sphinx.pl is used for training. You can copy the files yourself, can't you?
Anonymous - 2010-07-15
Hello, yeah, actually I have several questions now.
1) Yeah, I did install ActivePerl, just like the tutorial
(http://www.speech.cs.cmu.edu/sphinx/tutorial.html) suggested, but I don't
know if I did it correctly because I've never worked with Perl before. OK,
this is not a question, haha.
2) Anyway, today I built the stable releases of pocketsphinx and sphinxbase
with Visual C++ 2008 Express, and there wasn't a pocketsphinx_livecontinuous
in /pocketsphinx/src/programs/ or anywhere else in /pocketsphinx. It was the
same when I compiled with Cygwin. When I did it in Ubuntu (as I wrote in my
last post), livecontinuous was in that folder. So did I do something wrong
somewhere, or is this expected?
3) I don't know if this problem is specific to sphinx, but do I need to do
some special thing to get pocketsphinx_livecontinuous to work with my
Bluetooth headset? This is on Ubuntu. I got my headset paired with the
computer and I can record audio with the Ubuntu sound recorder using the
headset microphone, but I'm thinking Sphinx does not detect the microphone
when the script runs, because it doesn't register anything (no hypothesis)
when I speak after 'READY'. I did a forum search on this but the results were
related to Sphinx4. Could you help me here?
Jia Shen
The stable release has a build bug. You need to try a Subversion snapshot instead.
but do I need to do some special thing to get pocketsphinx_livecontinuous to
work with my Bluetooth headset?
If the Bluetooth device is not the default ALSA device, you need to specify
its name with the -adcdev option. Make sure ALSA can capture from the
Bluetooth device and that you compiled pocketsphinx with ALSA support. Also
check the mixer settings.
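Putting that advice together with the invocation earlier in the thread, the command might look like the sketch below. The device name plughw:1,0 is an example (card 1, device 0; list your own capture devices with `arecord -l`), and the model/LM paths simply mirror the Ubuntu run above:

```python
import subprocess

# Build the decoder command with an explicit ALSA capture device.
# "plughw:1,0" is a hypothetical device name -- find yours with `arecord -l`.
cmd = [
    "pocketsphinx_continuous",
    "-hmm", "../pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k",
    "-lm", "etc/9345/9345.lm",
    "-dict", "etc/9345/9345.dic",
    "-adcdev", "plughw:1,0",
]
# subprocess.run(cmd)  # uncomment to actually launch the decoder
```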