I'm trying to create an acoustic model and use it in an Asterisk-Unimrcp-PocketSphinx setup. I have about 150 utterances recorded. First I trained a 16 kHz acoustic model and used it to recognize commands spoken into a microphone; the quality is OK. Then I tried to train an 8 kHz acoustic model to use in an IVR. To do that I downsampled the wav files to 8 kHz, modified the following parameters in the etc/feat.params file:
-samprate 8000.0
-nfilt 31
-lowerf 200.00
-upperf 3500.00
-dither yes
and reran the make_feats.pl and makeFetRunAll.pl scripts. The scripts ran without errors.
After that I ran into two problems.
1. model_parameters/arm8khzR1.ci_cont/feat.params contains values that correspond to a 16 kHz model, namely
-alpha 0.97
-dither yes
-doublebw no
-nfilt 40
-ncep 13
-lowerf 133.33334
-upperf 6855.4976
-nfft 512
-wlen 0.0256
-transform legacy
-feat 1s_c_d_dd
-agc none
-cmn current
-varnorm no
which crashes unimrcpserver with the following error
FATAL_ERROR: "fe_sigproc.c", line 399: WTF, 5078.125000 < -15.625000 > 5734.375000
If I change the lowerf, upperf and nfilt parameter values to 200, 3500 and 31 respectively, unimrcpserver stops crashing but the accuracy is very low. Note that in both cases (16 kHz, 8 kHz) I use the same JSGF grammar, which contains 5 words.
I have a hunch that SphinxTrain creates a 16 kHz model despite the specified configuration.
Can you please suggest what I am missing?
Thanks,
Zaven.
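A note on the crash above: the upper edge of the mel filterbank cannot exceed half the sampling rate, so -upperf 6855.4976 only makes sense at 16 kHz and is out of range when the decoder runs at 8 kHz (Nyquist limit 4000 Hz). Below is a minimal Python sketch of that sanity check. The names parse_feat_params and check_filterbank are hypothetical helpers, not part of SphinxTrain or PocketSphinx, and attributing the fe_sigproc.c failure to this mismatch is an assumption based on the values shown.

```python
def parse_feat_params(text):
    """Parse '-name value' pairs from feat.params-style text into a dict."""
    tokens = text.split()
    # Tokens alternate between a flag name and its value.
    return {name.lstrip("-"): value
            for name, value in zip(tokens[0::2], tokens[1::2])}


def check_filterbank(params, decoder_samprate):
    """Return a list of problems for the given decoder sampling rate (Hz)."""
    nyquist = decoder_samprate / 2.0
    lowerf = float(params["lowerf"])
    upperf = float(params["upperf"])
    problems = []
    if upperf > nyquist:
        problems.append(
            f"upperf {upperf} Hz exceeds the Nyquist limit {nyquist} Hz")
    if not 0 <= lowerf < upperf:
        problems.append(
            f"lowerf {lowerf} Hz must be non-negative and below upperf")
    return problems


if __name__ == "__main__":
    model_params = parse_feat_params(
        "-nfilt 40\n-ncep 13 -lowerf 133.33334 -upperf 6855.4976")
    for problem in check_filterbank(model_params, decoder_samprate=8000.0):
        print(problem)
```

With the 8 kHz values (-lowerf 200 -upperf 3500) the same check reports no problems.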
I noticed that the make_feats.pl script, for some reason, does not use the configuration from etc\feat.params but uses hard-coded values instead (see the excerpt from the source code). I changed the hard-coded values, and now it uses the correct values to compute features for the utterances and generates a correct feat.params for the acoustic model. But the accuracy is still very low.
Theoretically, can two models generated from the same collection of utterances, but with different sampling rates (16 kHz, 8 kHz), have such a difference in recognition accuracy?
I recorded the utterances with a desktop microphone at 16 kHz, then downsampled them to 8 kHz using Audacity and trained an acoustic model for both cases.
Is a studio-quality microphone recommended for recordings, or is a common headset microphone enough?
I notice that make_feats.pl script, for some reason, did not use the
configuration from etc\feat.params but uses the hard coded one (see excerpt
from the source code).
In the development version it's all refactored.
I change the hardcoded values and now it uses correct values to feat the
utterances and generates correct feat.params for acoustic model. But the
accuracy is still very low.
Most likely you missed something else. You need to share your training folder in order to get suggestions on what to fix. See
http://cmusphinx.sourceforge.net/wiki/faq#qwhy_my_accuracy_is_poor
Theoretically can two models generated from the same collection of
utterances, but with different sampling rates (16khz, 8khz) have a such
difference in recognition accuracy?
No
I recorded the utterances using desktop microphone in 16khz quality, then
downsampled them to 8khz using audacity and train acoustic model for both
cases. Is it recommended to use studio quality microphone for recordings or
common headset microphone is enough?
There is no requirement for a good microphone. Please read the tutorial, it explains how to collect the data:
http://cmusphinx.sourceforge.net/wiki/tutorialam#data_preparation
I'm trying to train an acoustic model for the Armenian language. Since etc\feat.params is overridden by the values hard-coded in the make_feats.pl script, I modified the script and set the following values
-lowerf 200
-upperf 3500
-nfilt 31
and added '-samprate 8000', which was not present there before.
Decoding shows 50% WER, which is not bad for a 1000-word dictionary trained on 150 utterances.
But when I put the trained acoustic model into the Asterisk-Unimrcp-Pocketsphinx setup with a JSGF grammar consisting of 5 words, it recognized practically nothing.
I also tried decoding with pocketsphinx_batch, using the same 5-word JSGF grammar and 8 kHz wav recordings; the accuracy was very low.
I then trained a model omitting '-samprate 8000' (I guess it trains a 16 kHz model with a truncated frequency range) and decoded the same 8 kHz wav recordings with the same JSGF grammar: the accuracy is 100%, all words were recognized correctly. This latter acoustic model gives the same low accuracy when used in the asterisk-unimrcp-pocketsphinx setup.
Can you please advise what I missed? How can I train a model to be used with asterisk-unimrcp?
Thanks!
To get help you need to share the whole training folder, not just a few subfolders in it.
It's also recommended to use the latest sphinxtrain from a snapshot. Then you don't need to edit scripts.
But when I put trained acoustic model into Asterisk-Unimrcp-Pocketsphinx
setup with jsgf grammar consisting of 5 words, it practically did not
recognize anything.
It might be a unimrcp setup issue. You need to check the unimrcp server log for details.
I tried the latest version of sphinxtrain from the svn repository. The 'sphinxtrain -t arm8khzR3 setup' command just creates the etc folder and puts the feat.params and sphinx_train.cfg files there. I placed the other folders and executables there manually.
After training with 500 senones I got 50% WER.
But still, when I use the model in the asterisk-unimrcp setup nothing is recognized.
Any help is appreciated.
But, still, when I use the model in asterisk-unimrcp setup nothing is
recognized.
In the asterisk log I see that unimrcp does return a result
The recognized input is երկու
I'm not sure what your problem is
I also retrained your acoustic model. The WER is 2.7%, not 50%. Maybe you just
need to follow the tutorial accurately first of all and use the latest
sphinxtrain version.
I have retrained the acoustic model using the latest sphinxtrain installed from the svn repository. To check the model I recorded the words I use in the JSGF grammar and recognized them with pocketsphinx_batch; all words were recognized correctly. Now I'm sure that everything is OK with the acoustic model. Then for further debugging I enabled call recording in both asterisk and unimrcpserver. Here comes the most interesting part. First I took the call recorded by asterisk, manually cut it at the silences, made separate wav files and decoded them with pocketsphinx_batch. The accuracy was very good. Then I took the parts of the same call that were recorded and cut (!) by unimrcpserver, converted them to wav files and decoded them the same way. The accuracy was very bad. I compared two recordings (made by asterisk and unimrcpserver) of the same word in Audacity and discovered that the only difference is the duration of the silence preceding and following the spoken word. Then I tried adding some silence, copied from the beginning of the utterance, to the end (unimrcpserver cuts the recording immediately (!) after the word is spoken) and was able to decode it successfully. I found that the silence preceding and following the spoken word should be neither very short nor very long, otherwise the word is not recognized. So the source of the problem is how unimrcpserver cuts the utterances. Do you have any idea which parameter should be adjusted in unimrcpserver so that it cuts the utterances properly? Or maybe I can somehow change the utterances that I used in training so the model will not be so sensitive to the duration of the silences surrounding the word?
Regards,
Zaven
I tried all combinations of timeout and level values with no success. Changes in these parameters do not significantly change the silence periods preceding and following the spoken word. I tried the English hub4wsj_sc_8k model with the same 'timeout' and 'level' parameters and it works perfectly. I have a feeling that the hub4wsj_sc_8k model is not as sensitive to silences as the model I trained. How can these silence periods affect the recognition process? Why, if I manually cut the call recordings made by asterisk, is pocketsphinx able to recognize most of the words?
During training I made my recordings using a headset, and now I use the acoustic model with asterisk. Can that be a cause of the problem?
P.S. I use the latest svn versions of unimrcpserver and sphinxtrain.
Below please find the output from unimrcpserver.
ERROR: "fsg_search.c", line 1099: Final state not reached in frame 35
2012-07-21 17:24:23:661487 Signal Message to
2012-07-21 17:24:23:661512 Wait for incoming messages 62887642d33711e1@pocketsphinx
2012-07-21 17:24:23:661530 Process Message
2012-07-21 17:24:23:661543 Process RECOGNITION-COMPLETE Event 62887642d33711e1@speechrecog
2012-07-21 17:24:23:661554 State Transition RECOGNIZING -> RECOGNIZED 62887642d33711e1@speechrecog
2012-07-21 17:24:23:661566 Signal Message to
2012-07-21 17:24:23:661582 Wait for Messages
2012-07-21 17:24:23:661601 Process Poller Wakeup
2012-07-21 17:24:23:661613 Process Message
2012-07-21 17:24:23:661629 Send MRCPv2 Stream 10.100.77.182:1544 <->
10.100.77.182:42357
MRCP/2.0 129 RECOGNITION-COMPLETE 3 COMPLETE
Channel-Identifier: 62887642d33711e1@speechrecog
Completion-Cause: 000 success
ERROR: "fsg_search.c", line 1099: Final state not reached in frame 18
2012-07-21 17:24:24:021494 Signal Message to
2012-07-21 17:24:24:021518 Wait for incoming messages 62887642d33711e1@pocketsphinx
2012-07-21 17:24:24:021536 Process Message
2012-07-21 17:24:24:021549 Process RECOGNITION-COMPLETE Event 62887642d33711e1@speechrecog
2012-07-21 17:24:24:021560 State Transition RECOGNIZING -> RECOGNIZED 62887642d33711e1@speechrecog
2012-07-21 17:24:24:021572 Signal Message to
2012-07-21 17:24:24:021588 Wait for Messages
2012-07-21 17:24:24:021607 Process Poller Wakeup
2012-07-21 17:24:24:021620 Process Message
2012-07-21 17:24:24:021635 Send MRCPv2 Stream 10.100.77.182:1544 <->
10.100.77.182:42357
MRCP/2.0 129 RECOGNITION-COMPLETE 4 COMPLETE
Channel-Identifier: 62887642d33711e1@speechrecog
Completion-Cause: 000 success
ERROR: "fsg_search.c", line 1099: Final state not reached in frame 73
2012-07-21 17:24:25:481540 Signal Message to
2012-07-21 17:24:25:481565 Wait for incoming messages 62887642d33711e1@pocketsphinx
2012-07-21 17:24:25:481584 Process Message
2012-07-21 17:24:25:481596 Process RECOGNITION-COMPLETE Event 62887642d33711e1@speechrecog
2012-07-21 17:24:25:481608 State Transition RECOGNIZING -> RECOGNIZED 62887642d33711e1@speechrecog
2012-07-21 17:24:25:481620 Signal Message to
2012-07-21 17:24:25:481636 Wait for Messages
2012-07-21 17:24:25:481656 Process Poller Wakeup
2012-07-21 17:24:25:481668 Process Message
2012-07-21 17:24:25:481683 Send MRCPv2 Stream 10.100.77.182:1544 <->
10.100.77.182:42357
MRCP/2.0 129 RECOGNITION-COMPLETE 5 COMPLETE
Channel-Identifier: 62887642d33711e1@speechrecog
Completion-Cause: 000 success
ERROR: "fsg_search.c", line 1099: Final state not reached in frame 12
2012-07-21 17:24:25:731756 Signal Message to
2012-07-21 17:24:25:731783 Wait for incoming messages 62887642d33711e1@pocketsphinx
2012-07-21 17:24:25:731802 Process Message
2012-07-21 17:24:25:731815 Process RECOGNITION-COMPLETE Event 62887642d33711e1@speechrecog
2012-07-21 17:24:25:731827 State Transition RECOGNIZING -> RECOGNIZED 62887642d33711e1@speechrecog
2012-07-21 17:24:25:731839 Signal Message to
2012-07-21 17:24:25:731855 Wait for Messages
2012-07-21 17:24:25:731875 Process Poller Wakeup
2012-07-21 17:24:25:731888 Process Message
2012-07-21 17:24:25:731903 Send MRCPv2 Stream 10.100.77.182:1544 <->
10.100.77.182:42357
MRCP/2.0 129 RECOGNITION-COMPLETE 6 COMPLETE
Channel-Identifier: 62887642d33711e1@speechrecog
Completion-Cause: 000 success
ERROR: "fsg_search.c", line 1099: Final state not reached in frame 80
2012-07-21 17:24:27:321486 Signal Message to
2012-07-21 17:24:27:321510 Wait for incoming messages 62887642d33711e1@pocketsphinx
2012-07-21 17:24:27:321529 Process Message
2012-07-21 17:24:27:321542 Process RECOGNITION-COMPLETE Event 62887642d33711e1@speechrecog
2012-07-21 17:24:27:321553 State Transition RECOGNIZING -> RECOGNIZED 62887642d33711e1@speechrecog
2012-07-21 17:24:27:321565 Signal Message to
2012-07-21 17:24:27:321581 Wait for Messages
2012-07-21 17:24:27:321601 Process Poller Wakeup
2012-07-21 17:24:27:321613 Process Message
2012-07-21 17:24:27:321629 Send MRCPv2 Stream 10.100.77.182:1544 <->
10.100.77.182:42357
MRCP/2.0 129 RECOGNITION-COMPLETE 7 COMPLETE
Channel-Identifier: 62887642d33711e1@speechrecog
Completion-Cause: 000 success
ERROR: "fsg_search.c", line 1099: Final state not reached in frame 85
2012-07-21 17:24:29:021564 Signal Message to
2012-07-21 17:24:29:021590 Wait for incoming messages 62887642d33711e1@pocketsphinx
2012-07-21 17:24:29:021609 Process Message
2012-07-21 17:24:29:021621 Process RECOGNITION-COMPLETE Event 62887642d33711e1@speechrecog
2012-07-21 17:24:29:021633 State Transition RECOGNIZING -> RECOGNIZED 62887642d33711e1@speechrecog
2012-07-21 17:24:29:021644 Signal Message to
2012-07-21 17:24:29:021661 Wait for Messages
2012-07-21 17:24:29:021680 Process Poller Wakeup
2012-07-21 17:24:29:021693 Process Message
2012-07-21 17:24:29:021709 Send MRCPv2 Stream 10.100.77.182:1544 <->
10.100.77.182:42357
MRCP/2.0 129 RECOGNITION-COMPLETE 8 COMPLETE
Channel-Identifier: 62887642d33711e1@speechrecog
Completion-Cause: 000 success
ERROR: "fsg_search.c", line 1099: Final state not reached in frame 8
2012-07-21 17:24:30:371561 Signal Message to
2012-07-21 17:24:30:371587 Wait for incoming messages 62887642d33711e1@pocketsphinx
2012-07-21 17:24:30:371606 Process Message
2012-07-21 17:24:30:371618 Process RECOGNITION-COMPLETE Event 62887642d33711e1@speechrecog
2012-07-21 17:24:30:371629 State Transition RECOGNIZING -> RECOGNIZED 62887642d33711e1@speechrecog
2012-07-21 17:24:30:371641 Signal Message to
2012-07-21 17:24:30:371657 Wait for Messages
2012-07-21 17:24:30:371676 Process Poller Wakeup
2012-07-21 17:24:30:371689 Process Message
2012-07-21 17:24:30:371704 Send MRCPv2 Stream 10.100.77.182:1544 <->
10.100.77.182:42357
MRCP/2.0 130 RECOGNITION-COMPLETE 10 COMPLETE
Channel-Identifier: 62887642d33711e1@speechrecog
Completion-Cause: 000 success
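The frame numbers in these errors give a rough idea of how long each rejected segment was. The sketch below pulls them out of a log and converts them to seconds. It assumes the PocketSphinx default of 100 frames per second, which is an assumption since the decoder configuration is not shown here, and the helper name is hypothetical.

```python
import re

# Matches the fsg_search.c message seen in the unimrcpserver output.
FRAME_RE = re.compile(r"Final state not reached in frame (\d+)")


def failed_segment_lengths(log_text, frames_per_second=100):
    """Return the length in seconds of every segment that ended without
    reaching a final grammar state."""
    return [int(match.group(1)) / frames_per_second
            for match in FRAME_RE.finditer(log_text)]


if __name__ == "__main__":
    sample = (
        'ERROR: "fsg_search.c", line 1099: Final state not reached in frame 35\n'
        'ERROR: "fsg_search.c", line 1099: Final state not reached in frame 18\n'
    )
    for seconds in failed_segment_lengths(sample):
        print(f"{seconds:.2f} s")
```

Under that assumption, the frame counts in the excerpt above (8 to 85) correspond to segments of well under one second each.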
Sorry, since you didn't provide information about your experiments (neither the data that was used for testing nor the exact decoder configuration), it's hard to give you a detailed answer.
How thees silence periods can affect the recognition process ?
Silence doesn't affect results
Why if I manually cut the call recordings made by asterisk pocketsphinx able
to recognize most of the words?
No idea
During training process I have done my records using headset and now use the
acoustic model with asterisk. Can that be a cause of the problem ?
Unlikely
First I'd like to thank you for your assistance; I appreciate it very much.
Nickolay, I didn't provide the configuration files because I didn't know what exactly you need to examine this case. Now, as per Hiyassat's suggestion, I am sending you the needed information: https://www.dropbox.com/s/al8r6io1x5k5jj2/sphinx.zip
I did the following experiment:
1. trained an acoustic model using 150 8 kHz utterances
2. made a grammar with 5 Armenian words: mek (one), erku (two), ereq (three), ayo (yes), voch (no). Armenian words are pronounced exactly as they are written.
3. used (1) and (2) in the asterisk-unimrcp-pocketsphinx configuration.
The problem is that the accuracy is very low.
Things I have tried:
1. I enabled call recording both from asterisk (Mixmonitor) and from unimrcp. First I took the .pcm files generated by unimrcp, converted them to wav files and tried to decode them with pocketsphinx_batch
(uni_mek -860)
(uni_erku -2079)
(uni_ereq -1910)
(uni_ayo -3230)
(uni_voch -6895)
As seen from the above, nothing is recognized.
Then I manually cut the recording made by asterisk (call.wav) at the silences using the Audacity audio editor, made 5 wav files and decoded them. Here are the results
մեկ (ast_mek -695)
երկու (ast_erku -1501)
(ast_ereq -1564)
այո (ast_ayo -781)
ոչ (ast_voch -938)
4 out of 5 words were recognized correctly. I included call.wav, the .pcm files and the manually cut files in the zip file.
2. I chose values for the sensitivity level and the activity-timeout/inactivity-timeout parameters such that the unimrcp server correctly cut every spoken word and placed it in a separate .pcm file.
3. I tried to increase/decrease the volume of the call in asterisk using the Set(VOLUME(TX)=3) and Set(VOLUME(RX)=3) commands, with no success.
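The .pcm to wav conversion mentioned in item 1 can be scripted. Here is a minimal Python sketch, assuming the unimrcp recordings are headerless 16-bit little-endian mono linear PCM at 8 kHz. That format is an assumption and should be checked against the actual files; pcm_to_wav is a hypothetical helper name.

```python
import wave


def pcm_to_wav(pcm_path, wav_path, sample_rate=8000, sample_width=2,
               channels=1):
    """Wrap headerless PCM samples in a WAV container without resampling."""
    with open(pcm_path, "rb") as pcm_file:
        pcm_bytes = pcm_file.read()
    with wave.open(wav_path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # bytes per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)
```

A call such as pcm_to_wav("uni_mek.pcm", "uni_mek.wav") would then produce a file that pocketsphinx_batch can read as 8 kHz audio; the file names here are only illustrative.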
In addition to the above I'd like to mention that the English acoustic model used in the same configuration gives very good results!
Can you please suggest where I made a mistake and what else I can try?
P.S. In the zip file I included the acoustic model, dictionary and grammar in case you want to decode the wav files yourself.
Regards,
Zaven
In addition to my previous post, please see the following figure, which compares the original utterances (cut by unimrcp) with modified ones (where silences were manually truncated or added). As seen from the figure, none of the original utterances are recognized by pocketsphinx, while 4 out of 5 of the modified utterances are recognized. I hope this helps to understand the case.
zaven1, you need to provide the data files you are using, not just images.
Images can rarely say something useful. To get the fastest answer you need to
provide the whole training folder and the test data.
hiyassat didn't ask you for the right data; he confused you because he was trying to solve his own problems and jumped into the thread with an unrelated question. Not a great thing to do.
trained an acoustic model using 150 8Khz utterances
This is not sufficient to train a model, it's even worse since you are trying
to train too many senones (1000) and mixtures (8). You can find the size of
the required data in the tutorial
Decoding shows 50% WER, which is not bad for 1000 word dictionary trained on
150 utterances.
It means your model is not functional
The WER must be less than 5%
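For reference, WER is the word-level edit distance (substitutions, deletions and insertions) between the reference transcript and the decoder output, divided by the number of reference words. Here is a minimal Python sketch with a hypothetical function name. The SphinxTrain decode step reports its own figure; this only shows what the number measures.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference transcript must not be empty")
    # previous[j] = edit distance between the reference words processed
    # so far and the first j hypothesis words.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref_words)


if __name__ == "__main__":
    # One of five reference words is missing from the hypothesis.
    print(word_error_rate("mek erku ereq ayo voch", "mek erku ayo voch"))
```

By this measure a 50% WER means that, on average, every second reference word is wrong, missing, or offset by an extra word.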
I enabled both call recording from asterisk (Mixmonitor) and unimrcp.
Firstly I take .pcm files generated by unimrcp, covert it to wav files and
tried to decode with pocketsphinx_batch (uni_mek -860) (uni_erku -2079)
(uni_ereq -1910) (uni_ayo -3230) (uni_voch -6895) As see from the above
nothing is recognized. Then I tried to manually cut the recording made by
asterisk (call.wav) by silences using audacity audio editor, made 5 wav files
and decoded them. Here are the results մեկ (ast_mek -695) երկու (ast_erku
-1501) (ast_ereq -1564) այո (ast_ayo -781) ոչ (ast_voch -938) 4 from 5 words
were recognized correctly. I included call.wav, .pcm files as well as manually
cut files in the zip file.
Some parts of the asterisk recordings have zero silence regions. You need to
add "-dither yes" to feat.params in your model or in unimrcp options.
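To see whether a recording has such all-zero regions, and what dithering does about them, here is a small Python sketch. It assumes 16-bit little-endian mono PCM and adds noise of at most one least-significant bit. It only illustrates the idea and is not the sphinxbase implementation of -dither; the function names are hypothetical.

```python
import random
import struct


def _unpack_samples(pcm_bytes):
    """Decode 16-bit little-endian PCM bytes into a tuple of ints."""
    count = len(pcm_bytes) // 2
    return struct.unpack("<%dh" % count, pcm_bytes[:2 * count])


def longest_zero_run(pcm_bytes):
    """Length, in samples, of the longest run of exactly-zero samples."""
    best = run = 0
    for sample in _unpack_samples(pcm_bytes):
        run = run + 1 if sample == 0 else 0
        best = max(best, run)
    return best


def add_dither(pcm_bytes, seed=None):
    """Add -1, 0 or +1 to every sample so that long runs of digital
    silence are broken up; values are clipped to the 16-bit range."""
    rng = random.Random(seed)
    dithered = [max(-32768, min(32767, sample + rng.choice((-1, 0, 1))))
                for sample in _unpack_samples(pcm_bytes)]
    return struct.pack("<%dh" % len(dithered), *dithered)
```

Frames made entirely of zeros have zero energy, which is awkward for a front end that takes logarithms of spectral energies; the barely audible noise avoids that. In practice the -dither yes option should be used rather than editing the audio.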
Nickolay, "-dither yes" really improved the recognition accuracy. As for the number of senones and mixtures, I experimented with a lot of variants. The values 1000 and 8 are the ones used in the latest sphinxtrain folder, which I sent you. In my test environment I set 200 for senones and 2 for mixtures. Now I plan to collect 10-15 hours of data to train an acoustic model.
Thanks again for your assistance.
Zaven.
Hi Nickolay,
I'm trying to create an acoustic model and use it in an Asterisk-Unimrcp-PocketSphinx setup. I have about 150 utterances recorded. First I trained a 16 kHz acoustic model and used it to recognize commands spoken into a microphone; the quality is OK. Then I tried to train an 8 kHz acoustic model to use in an IVR. To do that I downsampled the wav files to 8 kHz, modified the following parameters in the etc/feat.params file:
-samprate 8000.0
-nfilt 31
-lowerf 200.00
-upperf 3500.00
-dither yes
and reran the make_feats.pl and makeFetRunAll.pl scripts. The scripts ran without errors.
After that I ran into two problems.
1. model_parameters/arm8khzR1.ci_cont/feat.params contains values that correspond to a 16 kHz model, namely
-alpha 0.97
-dither yes
-doublebw no
-nfilt 40
-ncep 13
-lowerf 133.33334
-upperf 6855.4976
-nfft 512
-wlen 0.0256
-transform legacy
-feat 1s_c_d_dd
-agc none
-cmn current
-varnorm no
which crashes unimrcpserver with the following error
FATAL_ERROR: "fe_sigproc.c", line 399: WTF, 5078.125000 < -15.625000 > 5734.375000
I have a hunch that SphinxTrain creates a 16 kHz model despite the specified configuration.
Can you please suggest what I am missing?
Thanks,
Zaven.
Hi Nickolay,
I noticed that the make_feats.pl script, for some reason, does not use the configuration from etc\feat.params but uses hard-coded values instead (see the excerpt from the source code). I changed the hard-coded values, and now it uses the correct values to compute features for the utterances and generates a correct feat.params for the acoustic model. But the accuracy is still very low.
Theoretically, can two models generated from the same collection of utterances, but with different sampling rates (16 kHz, 8 kHz), have such a difference in recognition accuracy?
I recorded the utterances with a desktop microphone at 16 kHz, then downsampled them to 8 kHz using Audacity and trained an acoustic model for both cases.
Is a studio-quality microphone recommended for recordings, or is a common headset microphone enough?
$default_params = <<"EOP";
-alpha 0.97
-dither yes
-doublebw no
-nfilt 40
-ncep 13
-lowerf 133.33334
-upperf 6855.4976
-nfft 512
-wlen 0.0256
EOP
# Now run sphinx_fe
$params = $default_params;
$params =~ s/\n/ /gs;  # flatten the heredoc into a single line of arguments
system("bin/wave2feat -verbose yes $params -c \"$ctl\" -$ST::CFG_WAVFILE_TYPE yes " .
"-di \"$ST::CFG_WAVFILES_DIR\" -ei \"$ST::CFG_WAVFILE_EXTENSION\" ".
"-do \"$ST::CFG_FEATFILES_DIR\" " .
"-eo \"$ST::CFG_FEATFILE_EXTENSION\"".
" @ARGV");
Thanks,
Zaven.
In the development version it's all refactored.
Most likely you missed something else. You need to share your training folder in order to get suggestions on what to fix. See
http://cmusphinx.sourceforge.net/wiki/faq#qwhy_my_accuracy_is_poor
No
There is no requirement for a good microphone. Please read the tutorial, it
explains how to collect the data:
http://cmusphinx.sourceforge.net/wiki/tutorialam#data_preparation
Hi Nickolay,
Please find etc, logdir folders as well as make_feats.pl in the following link
https://www.dropbox.com/sh/1a1ec1eema4s1sv/Kzx_WYEe_k/etc.zip
Whole training folder as well as unimrcpserver.log and asterisk.log can be
found in the below link.
https://www.dropbox.com/sh/1a1ec1eema4s1sv/-gmcPHqtAY
I'll try to train using the latest version of sphinxtrain and let you know
about the results.
Hi Nickolay,
I have retrained the acoustic model using the latest sphinxtrain installed from the svn repository. To check the model I recorded the words I use in the JSGF grammar and recognized them with pocketsphinx_batch; all words were recognized correctly. Now I'm sure that everything is OK with the acoustic model. Then for further debugging I enabled call recording in both asterisk and unimrcpserver. Here comes the most interesting part. First I took the call recorded by asterisk, manually cut it at the silences, made separate wav files and decoded them with pocketsphinx_batch. The accuracy was very good. Then I took the parts of the same call that were recorded and cut (!) by unimrcpserver, converted them to wav files and decoded them the same way. The accuracy was very bad. I compared two recordings (made by asterisk and unimrcpserver) of the same word in Audacity and discovered that the only difference is the duration of the silence preceding and following the spoken word. Then I tried adding some silence, copied from the beginning of the utterance, to the end (unimrcpserver cuts the recording immediately (!) after the word is spoken) and was able to decode it successfully. I found that the silence preceding and following the spoken word should be neither very short nor very long, otherwise the word is not recognized. So the source of the problem is how unimrcpserver cuts the utterances. Do you have any idea which parameter should be adjusted in unimrcpserver so that it cuts the utterances properly? Or maybe I can somehow change the utterances that I used in training so the model will not be so sensitive to the duration of the silences surrounding the word?
Regards,
Zaven
Maybe you want to review the documentation before asking:
http://code.google.com/p/unimrcp/wiki/PocketSphinxPlugin#4._Configuration
In recent versions it's called activity-timeout and inactivity-timeout, not
just timeout.
I tried all combinations of timeout and level values with no success. Changes in these parameters do not significantly change the silence periods preceding and following the spoken word. I tried the English hub4wsj_sc_8k model with the same 'timeout' and 'level' parameters and it works perfectly. I have a feeling that the hub4wsj_sc_8k model is not as sensitive to silences as the model I trained. How can these silence periods affect the recognition process? Why, if I manually cut the call recordings made by asterisk, is pocketsphinx able to recognize most of the words?
During training I made my recordings using a headset, and now I use the acoustic model with asterisk. Can that be a cause of the problem?
P.S. I use the latest svn versions of unimrcpserver and sphinxtrain.
Below please find the output from unimrcpserver.
2012-07-21 17:24:22:759424 Receive SIP Event Status 100 Trying
2012-07-21 17:24:22:759501 Receive SIP Event Status 100 Trying
2012-07-21 17:24:22:759538 SIP Call State
2012-07-21 17:24:22:759576 Create Session 0x7fde10001938 <new>
2012-07-21 17:24:22:759597 Remote SDP 0x7fde10001938 <new>
v=0
o=UniMRCPClient 8386702476934667627 1645802084994678102 IN IP4 10.100.77.182
s=-
c=IN IP4 10.100.77.182
t=0 0
m=application 9 TCP/MRCPv2 1
a=setup:active
a=connection:new
a=resource:speechrecog
a=cmid:1
m=audio 4020 RTP/AVP 0 8 96 101
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:96 L16/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-15
a=sendonly
a=ptime:20
a=mid:1
2012-07-21 17:24:22:759686 Signal Message to
2012-07-21 17:24:22:759721 Process Message
2012-07-21 17:24:22:759741 Dispatch Signaling Message
2012-07-21 17:24:22:759797 Add Session <62887642d33711e1>
2012-07-21 17:24:22:759817 Receive Offer 0x7fde10001938 <62887642d33711e1>
2012-07-21 17:24:22:759892 Add Control Channel 0x7fde10001938
62887642d33711e1@speechrecog
2012-07-21 17:24:22:759915 Signal Message to
2012-07-21 17:24:22:759944 Add Media Termination 0x7fde10001938
62887642d33711e1@rtp-tm
2012-07-21 17:24:22:759963 Signal Message to
2012-07-21 17:24:22:759976 Wait for Messages
2012-07-21 17:24:22:760011 Process Poller Wakeup
2012-07-21 17:24:22:760029 Process Message
2012-07-21 17:24:22:760047 Create Container for Pending Control Channels
2012-07-21 17:24:22:760071 Add Pending Control Channel
62887642d33711e1@speechrecog
2012-07-21 17:24:22:760086 Signal Message to
2012-07-21 17:24:22:760103 Wait for Messages
2012-07-21 17:24:22:760121 Process Message
2012-07-21 17:24:22:760171 Control Channel Modified 0x7fde10001938
62887642d33711e1@speechrecog
2012-07-21 17:24:22:760188 Wait for Messages
2012-07-21 17:24:22:761129 Process Message
2012-07-21 17:24:22:761191 Add Media Context 0x7fde10001938
2012-07-21 17:24:22:761355 Enable RTP Session 10.100.77.182:5000
2012-07-21 17:24:22:761399 Create Linear Audio Bridge 0x7fde10001938
2012-07-21 17:24:22:761427 Open RTP Receiver 10.100.77.182:5000 <-
10.100.77.182:4020 playout bounds adaptive skew detection
2012-07-21 17:24:22:761446 Media Path 0x7fde10001938
Source->->Decoder->->Bridge->->Sink
2012-07-21 17:24:22:761464 Signal Message to
2012-07-21 17:24:22:761575 Process Message
2012-07-21 17:24:22:761603 Media Termination Modified 0x7fde10001938
62887642d33711e1@media-tm
2012-07-21 17:24:22:761616 Media Termination Modified 0x7fde10001938
62887642d33711e1@rtp-tm
2012-07-21 17:24:22:761627 Open Channel 62887642d33711e1@pocketsphinx
2012-07-21 17:24:22:761688 Wait for Messages
2012-07-21 17:24:22:761724 Run Recognition Thread
62887642d33711e1@pocketsphinx
2012-07-21 17:24:22:761770 Signal Message to
2012-07-21 17:24:22:761803 Process Message
2012-07-21 17:24:22:761819 Engine Channel Opened 0x7fde10001938
62887642d33711e1@speechrecog
2012-07-21 17:24:22:761831 Send Answer 0x7fde10001938 <62887642d33711e1>
Status OK
2012-07-21 17:24:22:761862 Local SDP 0x7fde10001938 <62887642d33711e1>
v=0
o=UniMRCPServer 0 0 IN IP4 10.100.77.182
s=-
c=IN IP4 10.100.77.182
t=0 0
m=application 1544 TCP/MRCPv2 1
a=setup:passive
a=connection:new
a=channel:62887642d33711e1@speechrecog
a=cmid:1
m=audio 5000 RTP/AVP 0 101
a=rtpmap:0 PCMU/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-15
a=recvonly
a=ptime:20
a=mid:1
2012-07-21 17:24:22:761930 Wait for Messages
2012-07-21 17:24:22:762848 Receive SIP Event Status 200 OK
2012-07-21 17:24:22:762882 SIP Call State 0x7fde10001938
2012-07-21 17:24:22:763242 Process Signalled Descriptor
2012-07-21 17:24:22:763316 Accepted TCP/MRCPv2 Connection 10.100.77.182:1544
<-> 10.100.77.182:42357
2012-07-21 17:24:22:763349 Wait for Messages
2012-07-21 17:24:22:763697 Receive SIP Event Status 200 OK
2012-07-21 17:24:22:763728 Receive SIP Event Status 200 OK
2012-07-21 17:24:22:763742 SIP Call State 0x7fde10001938
2012-07-21 17:24:22:763754 Receive SIP Event Status 200 Call active
2012-07-21 17:24:22:763771 Wait for incoming messages
62887642d33711e1@pocketsphinx
2012-07-21 17:24:22:767385 Process Signalled Descriptor
2012-07-21 17:24:22:767426 Receive MRCPv2 Stream 10.100.77.182:1544 <->
10.100.77.182:42357
MRCP/2.0 132 SET-PARAMS 1
Channel-Identifier: 62887642d33711e1@speechrecog
Recognition-Timeout: 20000
No-Input-Timeout: 15000
2012-07-21 17:24:22:767488 Attach Control Channel
62887642d33711e1@speechrecog to Connection 10.100.77.182:1544 <->
10.100.77.182:42357
2012-07-21 17:24:22:767508 Signal Message to
2012-07-21 17:24:22:767525 Wait for Messages
2012-07-21 17:24:22:767544 Process Message
2012-07-21 17:24:22:767558 Dispatch Signaling Message
2012-07-21 17:24:22:767569 Process SET-PARAMS Request
62887642d33711e1@speechrecog
2012-07-21 17:24:22:767591 Wait for Messages
2012-07-21 17:24:22:767610 Dispatch Request SET-PARAMS
62887642d33711e1@pocketsphinx
2012-07-21 17:24:22:767624 Signal Message to
2012-07-21 17:24:22:767645 Process Message
2012-07-21 17:24:22:767658 Process SET-PARAMS Response
62887642d33711e1@speechrecog
2012-07-21 17:24:22:767671 Signal Message to
2012-07-21 17:24:22:767687 Wait for Messages
2012-07-21 17:24:22:767706 Process Poller Wakeup
2012-07-21 17:24:22:767718 Process Message
2012-07-21 17:24:22:767741 Send MRCPv2 Stream 10.100.77.182:1544 <->
10.100.77.182:42357
MRCP/2.0 80 1 200 COMPLETE
Channel-Identifier: 62887642d33711e1@speechrecog
2012-07-21 17:24:22:767783 Wait for Messages
2012-07-21 17:24:22:768128 Process Signalled Descriptor
2012-07-21 17:24:22:768194 Receive MRCPv2 Stream 10.100.77.182:1544 <->
10.100.77.182:42357
MRCP/2.0 258 DEFINE-GRAMMAR 2
Channel-Identifier: 62887642d33711e1@speechrecog
Content-Type: application/x-jsgf
Content-Id: digit
Content-Length: 101
JSGF V1.0;
grammar digits;
public <numbers> = ( այո | ոչ | մեկ | երկու | երեք);
2012-07-21 17:24:22:768243 Signal Message to
2012-07-21 17:24:22:768266 Wait for Messages
2012-07-21 17:24:22:768391 Process Message
2012-07-21 17:24:22:768415 Dispatch Signaling Message
2012-07-21 17:24:22:768426 Process DEFINE-GRAMMAR Request
62887642d33711e1@speechrecog
2012-07-21 17:24:22:768447 Wait for Messages
2012-07-21 17:24:22:768463 Wait for incoming messages
62887642d33711e1@pocketsphinx
2012-07-21 17:24:22:768486 Dispatch Request DEFINE-GRAMMAR
62887642d33711e1@pocketsphinx
2012-07-21 17:24:22:768517 Create Grammar File 62887642d33711e1@pocketsphinx
2012-07-21 17:24:22:768588 Init Config rate dictionary
62887642d33711e1@pocketsphinx
INFO: cmd_ln.c(691): Parsing command line:
\
-samprate 8000 \
-hmm /usr/local/unimrcp/data/arm/AC \
-jsgf ../data/62887642d33711e1-digit.gram \
-dict /usr/local/unimrcp/data/arm/arm8khzR1.dic \
-frate 50 \
-silprob 0.005
Current configuration:
-agc none none
-agcthresh 2.0 2.000000e+00
-alpha 0.97 9.700000e-01
-ascale 20.0 2.000000e+01
-aw 1 1
-backtrace no no
-beam 1e-48 1.000000e-48
-bestpath yes yes
-bestpathlw 9.5 9.500000e+00
-bghist no no
-ceplen 13 13
-cmn current current
-cmninit 8.0 8.0
-compallsen no no
-debug 0
-dict /usr/local/unimrcp/data/arm/arm8khzR1.dic
-dictcase no no
-dither no no
-doublebw no no
-ds 1 1
-fdict
-feat 1s_c_d_dd 1s_c_d_dd
-featparams
-fillprob 1e-8 1.000000e-08
-frate 100 50
-fsg
-fsgusealtpron yes yes
-fsgusefiller yes yes
-fwdflat yes yes
-fwdflatbeam 1e-64 1.000000e-64
-fwdflatefwid 4 4
-fwdflatlw 8.5 8.500000e+00
-fwdflatsfwin 25 25
-fwdflatwbeam 7e-29 7.000000e-29
-fwdtree yes yes
-hmm /usr/local/unimrcp/data/arm/AC
-input_endian little little
-jsgf ../data/62887642d33711e1-digit.gram
-kdmaxbbi -1 -1
-kdmaxdepth 0 0
-kdtree
-latsize 5000 5000
-lda
-ldadim 0 0
-lextreedump 0 0
-lifter 0 0
-lm
-lmctl
-lmname default default
-logbase 1.0001 1.000100e+00
-logfn
-logspec no no
-lowerf 133.33334 1.333333e+02
-lpbeam 1e-40 1.000000e-40
-lponlybeam 7e-29 7.000000e-29
-lw 6.5 6.500000e+00
-maxhmmpf -1 -1
-maxnewoov 20 20
-maxwpf -1 -1
-mdef
-mean
-mfclogdir
-min_endfr 0 0
-mixw
-mixwfloor 0.0000001 1.000000e-07
-mllr
-mmap yes yes
-ncep 13 13
-nfft 512 512
-nfilt 40 40
-nwpen 1.0 1.000000e+00
-pbeam 1e-48 1.000000e-48
-pip 1.0 1.000000e+00
-pl_beam 1e-10 1.000000e-10
-pl_pbeam 1e-5 1.000000e-05
-pl_window 0 0
-rawlogdir
-remove_dc no no
-round_filters yes yes
-samprate 16000 8.000000e+03
-seed -1 -1
-sendump
-senlogdir
-senmgau
-silprob 0.005 5.000000e-03
-smoothspec no no
-svspec
-tmat
-tmatfloor 0.0001 1.000000e-04
-topn 4 4
-topn_beam 0 0
-toprule
-transform legacy legacy
-unit_area yes yes
-upperf 6855.4976 6.855498e+03
-usewdphones no no
-uw 1.0 1.000000e+00
-var
-varfloor 0.0001 1.000000e-04
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 7e-29 7.000000e-29
-wip 0.65 6.500000e-01
-wlen 0.025625 2.562500e-02
2012-07-21 17:24:22:770706 Init Decoder 62887642d33711e1@pocketsphinx
INFO: cmd_ln.c(691): Parsing command line:
\
-nfilt 31 \
-lowerf 200 \
-upperf 3500 \
-feat 1s_c_d_dd \
-agc none \
-cmn current \
-varnorm no
Current configuration:
-agc none none
-agcthresh 2.0 2.000000e+00
-alpha 0.97 9.700000e-01
-ceplen 13 13
-cmn current current
-cmninit 8.0 8.0
-dither no no
-doublebw no no
-feat 1s_c_d_dd 1s_c_d_dd
-frate 100 50
-input_endian little little
-lda
-ldadim 0 0
-lifter 0 0
-logspec no no
-lowerf 133.33334 2.000000e+02
-ncep 13 13
-nfft 512 512
-nfilt 40 31
-remove_dc no no
-round_filters yes yes
-samprate 16000 8.000000e+03
-seed -1 -1
-smoothspec no no
-svspec
-transform legacy legacy
-unit_area yes yes
-upperf 6855.4976 3.500000e+03
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wlen 0.025625 2.562500e-02
INFO: acmod.c(242): Parsed model-specific feature parameters from
/usr/local/unimrcp/data/arm/AC/feat.params
INFO: feat.c(713): Initializing feature stream to type: '1s_c_d_dd',
ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(142): mean[0]= 12.00, mean[1..12]= 0.0
INFO: mdef.c(520): Reading model definition:
/usr/local/unimrcp/data/arm/AC/mdef
INFO: bin_mdef.c(173): Allocating 24697 * 8 bytes (192 KiB) for CD tree
INFO: tmat.c(205): Reading HMM transition probability matrices:
/usr/local/unimrcp/data/arm/AC/transition_matrices
INFO: acmod.c(117): Attempting to use SCHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
/usr/local/unimrcp/data/arm/AC/means
INFO: ms_gauden.c(292): 1117 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 8x39
INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
/usr/local/unimrcp/data/arm/AC/variances
INFO: ms_gauden.c(292): 1117 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 8x39
INFO: ms_gauden.c(354): 28666 variance values floored
INFO: acmod.c(119): Attempting to use PTHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
/usr/local/unimrcp/data/arm/AC/means
INFO: ms_gauden.c(292): 1117 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 8x39
INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
/usr/local/unimrcp/data/arm/AC/variances
INFO: ms_gauden.c(292): 1117 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 8x39
INFO: ms_gauden.c(354): 28666 variance values floored
INFO: ptm_mgau.c(800): Number of codebooks exceeds 256: 1117
INFO: acmod.c(121): Falling back to general multi-stream GMM computation
INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
/usr/local/unimrcp/data/arm/AC/means
INFO: ms_gauden.c(292): 1117 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 8x39
INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
/usr/local/unimrcp/data/arm/AC/variances
INFO: ms_gauden.c(292): 1117 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 8x39
INFO: ms_gauden.c(354): 28666 variance values floored
INFO: ms_senone.c(160): Reading senone mixture weights:
/usr/local/unimrcp/data/arm/AC/mixture_weights
INFO: ms_senone.c(211): Truncating senone logs3(pdf) values by 10 bits
INFO: ms_senone.c(218): Not transposing mixture weights in memory
INFO: ms_senone.c(277): Read mixture weights for 1117 senones: 1 features x 8
codewords
INFO: ms_senone.c(331): Mapping senones to individual codebooks
INFO: ms_mgau.c(122): The value of topn: 4
INFO: dict.c(306): Allocating 5134 * 32 bytes (160 KiB) for word entries
INFO: dict.c(321): Reading main dictionary:
/usr/local/unimrcp/data/arm/arm8khzR1.dic
INFO: dict.c(212): Allocated 15 KiB for strings, 15 KiB for phones
INFO: dict.c(324): 1035 words read
INFO: dict.c(330): Reading filler dictionary:
/usr/local/unimrcp/data/arm/AC/noisedict
INFO: dict.c(212): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(333): 3 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(404): Allocating 39^3 * 2 bytes (115 KiB) for word-initial
triphones
INFO: dict2pid.c(131): Allocated 36816 bytes (35 KiB) for word-final triphones
INFO: dict2pid.c(195): Allocated 36816 bytes (35 KiB) for single-phone word
triphones
INFO: fsg_search.c(145): FSG(beam: -1080, pbeam: -1080, wbeam: -634; wip: -26,
pip: 0)
INFO: jsgf.c(583): Defined rule: <digits.g00000>
INFO: jsgf.c(583): Defined rule: PUBLIC <digits.numbers>
INFO: fsg_model.c(215): Computing transitive closure for null transitions
INFO: fsg_model.c(270): 5 null transitions added
INFO: fsg_model.c(421): Adding silence transitions for <sil> to FSG
INFO: fsg_model.c(441): Added 9 silence word transitions
INFO: fsg_model.c(421): Adding silence transitions for to FSG
INFO: fsg_model.c(441): Added 9 silence word transitions
INFO: fsg_model.c(421): Adding silence transitions for <sil> to FSG
INFO: fsg_model.c(441): Added 9 silence word transitions
INFO: fsg_model.c(421): Adding silence transitions for </sil>
INFO: fsg_model.c(441): Added 9 silence word transitions
INFO: fsg_search.c(364): Added 2 alternate word transitions
INFO: fsg_lextree.c(108): Allocated 720 bytes (0 KiB) for left and right
context phones
INFO: fsg_lextree.c(251): 60 HMM nodes in lextree (43 leaves)
INFO: fsg_lextree.c(253): Allocated 7680 bytes (7 KiB) for all lextree nodes
INFO: fsg_lextree.c(256): Allocated 5504 bytes (5 KiB) for lextree leafnodes
2012-07-21 17:24:22:976732 Signal Message to
2012-07-21 17:24:22:976774 Process Message
2012-07-21 17:24:22:976785 Process DEFINE-GRAMMAR Response
62887642d33711e1@speechrecog
2012-07-21 17:24:22:976795 Signal Message to
2012-07-21 17:24:22:976809 Wait for Messages
2012-07-21 17:24:22:976825 Process Poller Wakeup
2012-07-21 17:24:22:976834 Process Message
2012-07-21 17:24:22:976851 Send MRCPv2 Stream 10.100.77.182:1544 <->
10.100.77.182:42357
MRCP/2.0 112 2 200 COMPLETE
Channel-Identifier: 62887642d33711e1@speechrecog
Completion-Cause: 000 success
2012-07-21 17:24:22:976877 Wait for Messages
2012-07-21 17:24:22:977466 Process Signalled Descriptor
2012-07-21 17:24:22:977487 Receive MRCPv2 Stream 10.100.77.182:1544 <->
10.100.77.182:42357
MRCP/2.0 190 RECOGNIZE 3
Channel-Identifier: 62887642d33711e1@speechrecog
Content-Type: text/uri-list
Cancel-If-Queue: false
Start-Input-Timers: true
Content-Length: 13
session:digit
2012-07-21 17:24:22:977520 Signal Message to
2012-07-21 17:24:22:977533 Wait for Messages
2012-07-21 17:24:22:977546 Process Message
2012-07-21 17:24:22:977555 Dispatch Signaling Message
2012-07-21 17:24:22:977562 Process RECOGNIZE Request
62887642d33711e1@speechrecog
2012-07-21 17:24:22:977578 Wait for Messages
2012-07-21 17:24:22:977590 Wait for incoming messages
62887642d33711e1@pocketsphinx
2012-07-21 17:24:22:977599 Dispatch Request RECOGNIZE
62887642d33711e1@pocketsphinx
2012-07-21 17:24:22:977643 Open Waveform File 62887642d33711e1@pocketsphinx
2012-07-21 17:24:22:977702 Signal Message to
2012-07-21 17:24:22:977722 Process Message
2012-07-21 17:24:22:977732 Process RECOGNIZE Response
62887642d33711e1@speechrecog
2012-07-21 17:24:22:977740 State Transition IDLE -> RECOGNIZING
62887642d33711e1@speechrecog
2012-07-21 17:24:22:977748 Signal Message to
2012-07-21 17:24:22:977759 Wait for Messages
2012-07-21 17:24:22:977772 Process Poller Wakeup
2012-07-21 17:24:22:977780 Process Message
2012-07-21 17:24:22:977791 Send MRCPv2 Stream 10.100.77.182:1544 <->
10.100.77.182:42357
MRCP/2.0 83 3 200 IN-PROGRESS
Channel-Identifier: 62887642d33711e1@speechrecog
2012-07-21 17:24:22:977812 Wait for Messages
2012-07-21 17:24:22:977953 Wait for incoming messages
62887642d33711e1@pocketsphinx
2012-07-21 17:24:23:641226 Detected Voice Activity
62887642d33711e1@pocketsphinx
2012-07-21 17:24:23:641258 Signal Message to
2012-07-21 17:24:23:641285 Process Message
2012-07-21 17:24:23:641300 Process START-OF-INPUT Event
62887642d33711e1@speechrecog
2012-07-21 17:24:23:641312 Signal Message to
2012-07-21 17:24:23:641329 Wait for Messages
2012-07-21 17:24:23:641349 Process Poller Wakeup
2012-07-21 17:24:23:641370 Process Message
2012-07-21 17:24:23:641388 Send MRCPv2 Stream 10.100.77.182:1544 <->
10.100.77.182:42357
MRCP/2.0 94 START-OF-INPUT 3 IN-PROGRESS
Channel-Identifier: 62887642d33711e1@speechrecog
2012-07-21 17:24:23:641426 Wait for Messages
2012-07-21 17:24:23:661228 Detected Voice Inactivity
62887642d33711e1@pocketsphinx
INFO: cmn_prior.c(121): cmn_prior_update: from < 8.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 >
INFO: cmn_prior.c(139): cmn_prior_update: to < -4.86 0.57 0.15 0.11 0.05 0.08
0.04 0.07 0.07 0.06 0.05 0.07 0.05 >
INFO: fsg_search.c(1030): 35 frames, 168 HMMs (4/fr), 377 senones (10/fr), 35
history entries (1/fr)
ERROR: "fsg_search.c", line 1099: Final state not reached in frame 35
2012-07-21 17:24:23:661487 Signal Message to
2012-07-21 17:24:23:661512 Wait for incoming messages
62887642d33711e1@pocketsphinx
2012-07-21 17:24:23:661530 Process Message
2012-07-21 17:24:23:661543 Process RECOGNITION-COMPLETE Event
62887642d33711e1@speechrecog
2012-07-21 17:24:23:661554 State Transition RECOGNIZING -> RECOGNIZED
62887642d33711e1@speechrecog
2012-07-21 17:24:23:661566 Signal Message to
2012-07-21 17:24:23:661582 Wait for Messages
2012-07-21 17:24:23:661601 Process Poller Wakeup
2012-07-21 17:24:23:661613 Process Message
2012-07-21 17:24:23:661629 Send MRCPv2 Stream 10.100.77.182:1544 <->
10.100.77.182:42357
MRCP/2.0 129 RECOGNITION-COMPLETE 3 COMPLETE
Channel-Identifier: 62887642d33711e1@speechrecog
Completion-Cause: 000 success
2012-07-21 17:24:23:661666 Wait for Messages
2012-07-21 17:24:23:674987 Process Signalled Descriptor
2012-07-21 17:24:23:675017 Receive MRCPv2 Stream 10.100.77.182:1544 <->
10.100.77.182:42357
MRCP/2.0 190 RECOGNIZE 4
Channel-Identifier: 62887642d33711e1@speechrecog
Content-Type: text/uri-list
Cancel-If-Queue: false
Start-Input-Timers: true
Content-Length: 13
session:digit
2012-07-21 17:24:23:675053 Signal Message to
2012-07-21 17:24:23:675072 Wait for Messages
2012-07-21 17:24:23:675090 Process Message
2012-07-21 17:24:23:675103 Dispatch Signaling Message
2012-07-21 17:24:23:675114 Process RECOGNIZE Request
62887642d33711e1@speechrecog
2012-07-21 17:24:23:675129 Wait for Messages
2012-07-21 17:24:23:675308 Dispatch Request RECOGNIZE
62887642d33711e1@pocketsphinx
2012-07-21 17:24:23:675437 Open Waveform File 62887642d33711e1@pocketsphinx
2012-07-21 17:24:23:675494 Signal Message to
2012-07-21 17:24:23:675516 Wait for incoming messages
62887642d33711e1@pocketsphinx
2012-07-21 17:24:23:675534 Process Message
2012-07-21 17:24:23:675547 Process RECOGNIZE Response
62887642d33711e1@speechrecog
2012-07-21 17:24:23:675558 State Transition RECOGNIZED -> RECOGNIZING
62887642d33711e1@speechrecog
2012-07-21 17:24:23:675569 Signal Message to
2012-07-21 17:24:23:675586 Wait for Messages
2012-07-21 17:24:23:675606 Process Poller Wakeup
2012-07-21 17:24:23:675618 Process Message
2012-07-21 17:24:23:675633 Send MRCPv2 Stream 10.100.77.182:1544 <->
10.100.77.182:42357
MRCP/2.0 83 4 200 IN-PROGRESS
Channel-Identifier: 62887642d33711e1@speechrecog
2012-07-21 17:24:23:675666 Wait for Messages
2012-07-21 17:24:23:771260 Detected Voice Activity
62887642d33711e1@pocketsphinx
2012-07-21 17:24:23:771288 Signal Message to
2012-07-21 17:24:23:771314 Process Message
2012-07-21 17:24:23:771328 Process START-OF-INPUT Event
62887642d33711e1@speechrecog
2012-07-21 17:24:23:771349 Signal Message to
2012-07-21 17:24:23:771366 Wait for Messages
2012-07-21 17:24:23:771386 Process Poller Wakeup
2012-07-21 17:24:23:771399 Process Message
2012-07-21 17:24:23:771414 Send MRCPv2 Stream 10.100.77.182:1544 <->
10.100.77.182:42357
MRCP/2.0 94 START-OF-INPUT 4 IN-PROGRESS
Channel-Identifier: 62887642d33711e1@speechrecog
2012-07-21 17:24:23:771450 Wait for Messages
2012-07-21 17:24:24:021258 Detected Voice Inactivity
62887642d33711e1@pocketsphinx
INFO: cmn_prior.c(121): cmn_prior_update: from < -4.86 0.57 0.15 0.11 0.05
0.08 0.04 0.07 0.07 0.06 0.05 0.07 0.05 >
INFO: cmn_prior.c(139): cmn_prior_update: to < 1.40 0.16 0.02 0.16 -0.18 -0.09
-0.02 -0.02 -0.10 -0.03 -0.03 0.02 -0.03 >
INFO: fsg_search.c(1030): 18 frames, 69 HMMs (3/fr), 145 senones (8/fr), 11
history entries (0/fr)
ERROR: "fsg_search.c", line 1099: Final state not reached in frame 18
2012-07-21 17:24:24:021494 Signal Message to
2012-07-21 17:24:24:021518 Wait for incoming messages
62887642d33711e1@pocketsphinx
2012-07-21 17:24:24:021536 Process Message
2012-07-21 17:24:24:021549 Process RECOGNITION-COMPLETE Event
62887642d33711e1@speechrecog
2012-07-21 17:24:24:021560 State Transition RECOGNIZING -> RECOGNIZED
62887642d33711e1@speechrecog
2012-07-21 17:24:24:021572 Signal Message to
2012-07-21 17:24:24:021588 Wait for Messages
2012-07-21 17:24:24:021607 Process Poller Wakeup
2012-07-21 17:24:24:021620 Process Message
2012-07-21 17:24:24:021635 Send MRCPv2 Stream 10.100.77.182:1544 <->
10.100.77.182:42357
MRCP/2.0 129 RECOGNITION-COMPLETE 4 COMPLETE
Channel-Identifier: 62887642d33711e1@speechrecog
Completion-Cause: 000 success
2012-07-21 17:24:24:021674 Wait for Messages
2012-07-21 17:24:24:034983 Process Signalled Descriptor
2012-07-21 17:24:24:035012 Receive MRCPv2 Stream 10.100.77.182:1544 <->
10.100.77.182:42357
MRCP/2.0 190 RECOGNIZE 5
Channel-Identifier: 62887642d33711e1@speechrecog
Content-Type: text/uri-list
Cancel-If-Queue: false
Start-Input-Timers: true
Content-Length: 13
session:digit
2012-07-21 17:24:24:035060 Signal Message to
2012-07-21 17:24:24:035084 Wait for Messages
2012-07-21 17:24:24:035102 Process Message
2012-07-21 17:24:24:035116 Dispatch Signaling Message
2012-07-21 17:24:24:035126 Process RECOGNIZE Request
62887642d33711e1@speechrecog
2012-07-21 17:24:24:035182 Wait for Messages
2012-07-21 17:24:24:035322 Dispatch Request RECOGNIZE
62887642d33711e1@pocketsphinx
2012-07-21 17:24:24:035377 Open Waveform File 62887642d33711e1@pocketsphinx
2012-07-21 17:24:24:035435 Signal Message to
2012-07-21 17:24:24:035458 Wait for incoming messages
62887642d33711e1@pocketsphinx
2012-07-21 17:24:24:035477 Process Message
2012-07-21 17:24:24:035489 Process RECOGNIZE Response
62887642d33711e1@speechrecog
2012-07-21 17:24:24:035500 State Transition RECOGNIZED -> RECOGNIZING
62887642d33711e1@speechrecog
2012-07-21 17:24:24:035512 Signal Message to
2012-07-21 17:24:24:035528 Wait for Messages
2012-07-21 17:24:24:035547 Process Poller Wakeup
2012-07-21 17:24:24:035559 Process Message
2012-07-21 17:24:24:035575 Send MRCPv2 Stream 10.100.77.182:1544 <->
10.100.77.182:42357
MRCP/2.0 83 5 200 IN-PROGRESS
Channel-Identifier: 62887642d33711e1@speechrecog
2012-07-21 17:24:24:035608 Wait for Messages
2012-07-21 17:24:25:341272 Detected Voice Activity
62887642d33711e1@pocketsphinx
2012-07-21 17:24:25:341310 Signal Message to
2012-07-21 17:24:25:341346 Process Message
2012-07-21 17:24:25:341361 Process START-OF-INPUT Event
62887642d33711e1@speechrecog
2012-07-21 17:24:25:341374 Signal Message to
2012-07-21 17:24:25:341392 Wait for Messages
2012-07-21 17:24:25:341412 Process Poller Wakeup
2012-07-21 17:24:25:341424 Process Message
2012-07-21 17:24:25:341441 Send MRCPv2 Stream 10.100.77.182:1544 <->
10.100.77.182:42357
MRCP/2.0 94 START-OF-INPUT 5 IN-PROGRESS
Channel-Identifier: 62887642d33711e1@speechrecog
2012-07-21 17:24:25:341481 Wait for Messages
2012-07-21 17:24:25:481239 Detected Voice Inactivity
62887642d33711e1@pocketsphinx
INFO: cmn_prior.c(121): cmn_prior_update: from < 1.40 0.16 0.02 0.16 -0.18
-0.09 -0.02 -0.02 -0.10 -0.03 -0.03 0.02 -0.03 >
INFO: cmn_prior.c(139): cmn_prior_update: to < 4.29 -0.31 -0.05 -0.01 -0.23
-0.10 -0.09 -0.11 -0.09 -0.07 -0.07 -0.02 -0.06 >
INFO: fsg_search.c(1030): 73 frames, 716 HMMs (9/fr), 1781 senones (24/fr),
185 history entries (2/fr)
ERROR: "fsg_search.c", line 1099: Final state not reached in frame 73
2012-07-21 17:24:25:481540 Signal Message to
2012-07-21 17:24:25:481565 Wait for incoming messages
62887642d33711e1@pocketsphinx
2012-07-21 17:24:25:481584 Process Message
2012-07-21 17:24:25:481596 Process RECOGNITION-COMPLETE Event
62887642d33711e1@speechrecog
2012-07-21 17:24:25:481608 State Transition RECOGNIZING -> RECOGNIZED
62887642d33711e1@speechrecog
2012-07-21 17:24:25:481620 Signal Message to
2012-07-21 17:24:25:481636 Wait for Messages
2012-07-21 17:24:25:481656 Process Poller Wakeup
2012-07-21 17:24:25:481668 Process Message
2012-07-21 17:24:25:481683 Send MRCPv2 Stream 10.100.77.182:1544 <->
10.100.77.182:42357
MRCP/2.0 129 RECOGNITION-COMPLETE 5 COMPLETE
Channel-Identifier: 62887642d33711e1@speechrecog
Completion-Cause: 000 success
2012-07-21 17:24:25:481720 Wait for Messages
2012-07-21 17:24:25:494852 Process Signalled Descriptor
2012-07-21 17:24:25:494882 Receive MRCPv2 Stream 10.100.77.182:1544 <->
10.100.77.182:42357
MRCP/2.0 190 RECOGNIZE 6
Channel-Identifier: 62887642d33711e1@speechrecog
Content-Type: text/uri-list
Cancel-If-Queue: false
Start-Input-Timers: true
Content-Length: 13
session:digit
2012-07-21 17:24:25:494919 Signal Message to
2012-07-21 17:24:25:494938 Wait for Messages
2012-07-21 17:24:25:494956 Process Message
2012-07-21 17:24:25:494969 Dispatch Signaling Message
2012-07-21 17:24:25:494980 Process RECOGNIZE Request
62887642d33711e1@speechrecog
2012-07-21 17:24:25:494996 Wait for Messages
2012-07-21 17:24:25:495015 Dispatch Request RECOGNIZE
62887642d33711e1@pocketsphinx
2012-07-21 17:24:25:495067 Open Waveform File 62887642d33711e1@pocketsphinx
2012-07-21 17:24:25:495167 Signal Message to
2012-07-21 17:24:25:495195 Wait for incoming messages
62887642d33711e1@pocketsphinx
2012-07-21 17:24:25:495224 Process Message
2012-07-21 17:24:25:495238 Process RECOGNIZE Response
62887642d33711e1@speechrecog
2012-07-21 17:24:25:495249 State Transition RECOGNIZED -> RECOGNIZING
62887642d33711e1@speechrecog
2012-07-21 17:24:25:495261 Signal Message to
2012-07-21 17:24:25:495277 Wait for Messages
2012-07-21 17:24:25:495409 Process Poller Wakeup
2012-07-21 17:24:25:495432 Process Message
2012-07-21 17:24:25:495449 Send MRCPv2 Stream 10.100.77.182:1544 <->
10.100.77.182:42357
MRCP/2.0 83 6 200 IN-PROGRESS
Channel-Identifier: 62887642d33711e1@speechrecog
2012-07-21 17:24:25:495481 Wait for Messages
2012-07-21 17:24:25:611253 Detected Voice Activity
62887642d33711e1@pocketsphinx
2012-07-21 17:24:25:611282 Signal Message to
2012-07-21 17:24:25:611309 Process Message
2012-07-21 17:24:25:611323 Process START-OF-INPUT Event
62887642d33711e1@speechrecog
2012-07-21 17:24:25:611335 Signal Message to
2012-07-21 17:24:25:611351 Wait for Messages
2012-07-21 17:24:25:611371 Process Poller Wakeup
2012-07-21 17:24:25:611383 Process Message
2012-07-21 17:24:25:611399 Send MRCPv2 Stream 10.100.77.182:1544 <->
10.100.77.182:42357
MRCP/2.0 94 START-OF-INPUT 6 IN-PROGRESS
Channel-Identifier: 62887642d33711e1@speechrecog
2012-07-21 17:24:25:611434 Wait for Messages
2012-07-21 17:24:25:731254 Detected Voice Inactivity
62887642d33711e1@pocketsphinx
INFO: cmn_prior.c(121): cmn_prior_update: from < 4.29 -0.31 -0.05 -0.01 -0.23
-0.10 -0.09 -0.11 -0.09 -0.07 -0.07 -0.02 -0.06 >
INFO: cmn_prior.c(139): cmn_prior_update: to < 4.75 -0.25 -0.04 -0.05 -0.25
-0.07 -0.09 -0.13 -0.11 -0.09 -0.08 -0.02 -0.09 >
INFO: fsg_search.c(1030): 12 frames, 69 HMMs (5/fr), 164 senones (13/fr), 11
history entries (0/fr)
ERROR: "fsg_search.c", line 1099: Final state not reached in frame 12
2012-07-21 17:24:25:731756 Signal Message to
2012-07-21 17:24:25:731783 Wait for incoming messages
62887642d33711e1@pocketsphinx
2012-07-21 17:24:25:731802 Process Message
2012-07-21 17:24:25:731815 Process RECOGNITION-COMPLETE Event
62887642d33711e1@speechrecog
2012-07-21 17:24:25:731827 State Transition RECOGNIZING -> RECOGNIZED
62887642d33711e1@speechrecog
2012-07-21 17:24:25:731839 Signal Message to
2012-07-21 17:24:25:731855 Wait for Messages
2012-07-21 17:24:25:731875 Process Poller Wakeup
2012-07-21 17:24:25:731888 Process Message
2012-07-21 17:24:25:731903 Send MRCPv2 Stream 10.100.77.182:1544 <->
10.100.77.182:42357
MRCP/2.0 129 RECOGNITION-COMPLETE 6 COMPLETE
Channel-Identifier: 62887642d33711e1@speechrecog
Completion-Cause: 000 success
2012-07-21 17:24:25:731943 Wait for Messages
2012-07-21 17:24:25:734692 Process Signalled Descriptor
2012-07-21 17:24:25:734722 Receive MRCPv2 Stream 10.100.77.182:1544 <->
10.100.77.182:42357
MRCP/2.0 190 RECOGNIZE 7
Channel-Identifier: 62887642d33711e1@speechrecog
Content-Type: text/uri-list
Cancel-If-Queue: false
Start-Input-Timers: true
Content-Length: 13
session:digit
2012-07-21 17:24:25:734759 Signal Message to
2012-07-21 17:24:25:734778 Wait for Messages
2012-07-21 17:24:25:734796 Process Message
2012-07-21 17:24:25:734809 Dispatch Signaling Message
2012-07-21 17:24:25:734819 Process RECOGNIZE Request
62887642d33711e1@speechrecog
2012-07-21 17:24:25:734835 Wait for Messages
2012-07-21 17:24:25:734852 Dispatch Request RECOGNIZE
62887642d33711e1@pocketsphinx
2012-07-21 17:24:25:734898 Open Waveform File 62887642d33711e1@pocketsphinx
2012-07-21 17:24:25:734950 Signal Message to
2012-07-21 17:24:25:734971 Wait for incoming messages
62887642d33711e1@pocketsphinx
2012-07-21 17:24:25:734989 Process Message
2012-07-21 17:24:25:735001 Process RECOGNIZE Response
62887642d33711e1@speechrecog
2012-07-21 17:24:25:735012 State Transition RECOGNIZED -> RECOGNIZING
62887642d33711e1@speechrecog
2012-07-21 17:24:25:735023 Signal Message to
2012-07-21 17:24:25:735040 Wait for Messages
2012-07-21 17:24:25:735059 Process Poller Wakeup
2012-07-21 17:24:25:735071 Process Message
2012-07-21 17:24:25:735095 Send MRCPv2 Stream 10.100.77.182:1544 <->
10.100.77.182:42357
MRCP/2.0 83 7 200 IN-PROGRESS
Channel-Identifier: 62887642d33711e1@speechrecog
2012-07-21 17:24:25:735129 Wait for Messages
2012-07-21 17:24:27:041259 Detected Voice Activity
62887642d33711e1@pocketsphinx
2012-07-21 17:24:27:041322 Signal Message to
2012-07-21 17:24:27:041355 Process Message
2012-07-21 17:24:27:041369 Process START-OF-INPUT Event
62887642d33711e1@speechrecog
2012-07-21 17:24:27:041383 Signal Message to
2012-07-21 17:24:27:041400 Wait for Messages
2012-07-21 17:24:27:041422 Process Poller Wakeup
2012-07-21 17:24:27:041434 Process Message
2012-07-21 17:24:27:041453 Send MRCPv2 Stream 10.100.77.182:1544 <->
10.100.77.182:42357
MRCP/2.0 94 START-OF-INPUT 7 IN-PROGRESS
Channel-Identifier: 62887642d33711e1@speechrecog
2012-07-21 17:24:27:041495 Wait for Messages
2012-07-21 17:24:27:321226 Detected Voice Inactivity
62887642d33711e1@pocketsphinx
INFO: cmn_prior.c(121): cmn_prior_update: from < 4.75 -0.25 -0.04 -0.05 -0.25
-0.07 -0.09 -0.13 -0.11 -0.09 -0.08 -0.02 -0.09 >
INFO: cmn_prior.c(139): cmn_prior_update: to < 5.47 -0.41 -0.02 -0.01 -0.25
-0.11 -0.10 -0.15 -0.12 -0.10 -0.08 -0.05 -0.08 >
INFO: fsg_search.c(1030): 80 frames, 759 HMMs (9/fr), 1940 senones (24/fr),
179 history entries (2/fr)
ERROR: "fsg_search.c", line 1099: Final state not reached in frame 80
2012-07-21 17:24:27:321486 Signal Message to
2012-07-21 17:24:27:321510 Wait for incoming messages
62887642d33711e1@pocketsphinx
2012-07-21 17:24:27:321529 Process Message
2012-07-21 17:24:27:321542 Process RECOGNITION-COMPLETE Event
62887642d33711e1@speechrecog
2012-07-21 17:24:27:321553 State Transition RECOGNIZING -> RECOGNIZED
62887642d33711e1@speechrecog
2012-07-21 17:24:27:321565 Signal Message to
2012-07-21 17:24:27:321581 Wait for Messages
2012-07-21 17:24:27:321601 Process Poller Wakeup
2012-07-21 17:24:27:321613 Process Message
2012-07-21 17:24:27:321629 Send MRCPv2 Stream 10.100.77.182:1544 <->
10.100.77.182:42357
MRCP/2.0 129 RECOGNITION-COMPLETE 7 COMPLETE
Channel-Identifier: 62887642d33711e1@speechrecog
Completion-Cause: 000 success
2012-07-21 17:24:27:321667 Wait for Messages
2012-07-21 17:24:27:335092 Process Signalled Descriptor
2012-07-21 17:24:27:335122 Receive MRCPv2 Stream 10.100.77.182:1544 <->
10.100.77.182:42357
MRCP/2.0 190 RECOGNIZE 8
Channel-Identifier: 62887642d33711e1@speechrecog
Content-Type: text/uri-list
Cancel-If-Queue: false
Start-Input-Timers: true
Content-Length: 13
session:digit
2012-07-21 17:24:27:335199 Signal Message to
2012-07-21 17:24:27:335221 Wait for Messages
2012-07-21 17:24:27:335249 Process Message
2012-07-21 17:24:27:335264 Dispatch Signaling Message
2012-07-21 17:24:27:335275 Process RECOGNIZE Request
62887642d33711e1@speechrecog
2012-07-21 17:24:27:335291 Wait for Messages
2012-07-21 17:24:27:335415 Dispatch Request RECOGNIZE
62887642d33711e1@pocketsphinx
2012-07-21 17:24:27:335484 Open Waveform File 62887642d33711e1@pocketsphinx
2012-07-21 17:24:27:335553 Signal Message to
2012-07-21 17:24:27:335575 Wait for incoming messages
62887642d33711e1@pocketsphinx
2012-07-21 17:24:27:335593 Process Message
2012-07-21 17:24:27:335606 Process RECOGNIZE Response
62887642d33711e1@speechrecog
2012-07-21 17:24:27:335617 State Transition RECOGNIZED -> RECOGNIZING
62887642d33711e1@speechrecog
2012-07-21 17:24:27:335628 Signal Message to
2012-07-21 17:24:27:335645 Wait for Messages
2012-07-21 17:24:27:335673 Process Poller Wakeup
2012-07-21 17:24:27:335686 Process Message
2012-07-21 17:24:27:335702 Send MRCPv2 Stream 10.100.77.182:1544 <->
10.100.77.182:42357
MRCP/2.0 83 8 200 IN-PROGRESS
Channel-Identifier: 62887642d33711e1@speechrecog
2012-07-21 17:24:27:335735 Wait for Messages
2012-07-21 17:24:28:651239 Detected Voice Activity
62887642d33711e1@pocketsphinx
2012-07-21 17:24:28:651287 Signal Message to
2012-07-21 17:24:28:651317 Process Message
2012-07-21 17:24:28:651331 Process START-OF-INPUT Event
62887642d33711e1@speechrecog
2012-07-21 17:24:28:651345 Signal Message to
2012-07-21 17:24:28:651364 Wait for Messages
2012-07-21 17:24:28:651386 Process Poller Wakeup
2012-07-21 17:24:28:651399 Process Message
2012-07-21 17:24:28:651418 Send MRCPv2 Stream 10.100.77.182:1544 <->
10.100.77.182:42357
MRCP/2.0 94 START-OF-INPUT 8 IN-PROGRESS
Channel-Identifier: 62887642d33711e1@speechrecog
2012-07-21 17:24:28:651465 Wait for Messages
2012-07-21 17:24:29:021260 Detected Voice Inactivity
62887642d33711e1@pocketsphinx
INFO: cmn_prior.c(121): cmn_prior_update: from < 5.47 -0.41 -0.02 -0.01 -0.25
-0.11 -0.10 -0.15 -0.12 -0.10 -0.08 -0.05 -0.08 >
INFO: cmn_prior.c(139): cmn_prior_update: to < 5.86 -0.45 -0.05 -0.06 -0.25
-0.12 -0.11 -0.16 -0.11 -0.10 -0.09 -0.04 -0.08 >
INFO: fsg_search.c(1030): 85 frames, 758 HMMs (8/fr), 1860 senones (21/fr),
185 history entries (2/fr)
ERROR: "fsg_search.c", line 1099: Final state not reached in frame 85
2012-07-21 17:24:29:021564 Signal Message to
2012-07-21 17:24:29:021590 Wait for incoming messages
62887642d33711e1@pocketsphinx
2012-07-21 17:24:29:021609 Process Message
2012-07-21 17:24:29:021621 Process RECOGNITION-COMPLETE Event
62887642d33711e1@speechrecog
2012-07-21 17:24:29:021633 State Transition RECOGNIZING -> RECOGNIZED
62887642d33711e1@speechrecog
2012-07-21 17:24:29:021644 Signal Message to
2012-07-21 17:24:29:021661 Wait for Messages
2012-07-21 17:24:29:021680 Process Poller Wakeup
2012-07-21 17:24:29:021693 Process Message
2012-07-21 17:24:29:021709 Send MRCPv2 Stream 10.100.77.182:1544 <->
10.100.77.182:42357
MRCP/2.0 129 RECOGNITION-COMPLETE 8 COMPLETE
Channel-Identifier: 62887642d33711e1@speechrecog
Completion-Cause: 000 success
2012-07-21 17:24:29:021747 Wait for Messages
2012-07-21 17:24:29:034794 Process Signalled Descriptor
2012-07-21 17:24:29:034824 Receive MRCPv2 Stream 10.100.77.182:1544 <->
10.100.77.182:42357
MRCP/2.0 190 RECOGNIZE 9
Channel-Identifier: 62887642d33711e1@speechrecog
Content-Type: text/uri-list
Cancel-If-Queue: false
Start-Input-Timers: true
Content-Length: 13
session:digit
2012-07-21 17:24:29:034864 Signal Message to
2012-07-21 17:24:29:034883 Wait for Messages
2012-07-21 17:24:29:034901 Process Message
2012-07-21 17:24:29:034915 Dispatch Signaling Message
2012-07-21 17:24:29:034925 Process RECOGNIZE Request
62887642d33711e1@speechrecog
2012-07-21 17:24:29:034942 Wait for Messages
2012-07-21 17:24:29:034959 Dispatch Request RECOGNIZE
62887642d33711e1@pocketsphinx
2012-07-21 17:24:29:035018 Open Waveform File 62887642d33711e1@pocketsphinx
2012-07-21 17:24:29:035101 Signal Message to
2012-07-21 17:24:29:035124 Wait for incoming messages
62887642d33711e1@pocketsphinx
2012-07-21 17:24:29:035191 Process Message
2012-07-21 17:24:29:035208 Process RECOGNIZE Response
62887642d33711e1@speechrecog
2012-07-21 17:24:29:035219 State Transition RECOGNIZED -> RECOGNIZING
62887642d33711e1@speechrecog
2012-07-21 17:24:29:035246 Signal Message to
2012-07-21 17:24:29:035267 Wait for Messages
2012-07-21 17:24:29:035401 Process Poller Wakeup
2012-07-21 17:24:29:035424 Process Message
2012-07-21 17:24:29:035441 Send MRCPv2 Stream 10.100.77.182:1544 <->
10.100.77.182:42357
MRCP/2.0 83 9 200 IN-PROGRESS
Channel-Identifier: 62887642d33711e1@speechrecog
2012-07-21 17:24:29:035474 Wait for Messages
2012-07-21 17:24:30:081243 Detected Voice Activity
62887642d33711e1@pocketsphinx
2012-07-21 17:24:30:081290 Signal Message to
2012-07-21 17:24:30:081319 Process Message
2012-07-21 17:24:30:081334 Process START-OF-INPUT Event
62887642d33711e1@speechrecog
2012-07-21 17:24:30:081348 Signal Message to
2012-07-21 17:24:30:081366 Wait for Messages
2012-07-21 17:24:30:081388 Process Poller Wakeup
2012-07-21 17:24:30:081400 Process Message
2012-07-21 17:24:30:081419 Send MRCPv2 Stream 10.100.77.182:1544 <->
10.100.77.182:42357
MRCP/2.0 94 START-OF-INPUT 9 IN-PROGRESS
Channel-Identifier: 62887642d33711e1@speechrecog
2012-07-21 17:24:30:081463 Wait for Messages
2012-07-21 17:24:30:131252 Get Recognition Partial Result Score
62887642d33711e1@pocketsphinx
2012-07-21 17:24:30:211232 Detected Voice Inactivity
62887642d33711e1@pocketsphinx
INFO: cmn_prior.c(121): cmn_prior_update: from < 5.86 -0.45 -0.05 -0.06 -0.25
-0.12 -0.11 -0.16 -0.11 -0.10 -0.09 -0.04 -0.08 >
INFO: cmn_prior.c(139): cmn_prior_update: to < 5.91 -0.44 -0.08 -0.08 -0.24
-0.13 -0.12 -0.17 -0.11 -0.10 -0.08 -0.04 -0.08 >
INFO: fsg_search.c(1030): 59 frames, 511 HMMs (8/fr), 1212 senones (20/fr),
140 history entries (2/fr)
INFO: fsg_search.c(1407): Start node <sil>.0:2:40
INFO: fsg_search.c(1446): End node ոչ.33:49:58 (-655)
INFO: fsg_search.c(1662): lattice start node <sil>.0 end node ոչ.33
INFO: ps_lattice.c(1352): Normalizer P(O) = alpha(ոչ:33:58) = -536870912
WARNING: "fsg_search.c", line 1155: Failed to bestpath in a lattice
2012-07-21 17:24:30:211694 Signal Message to
2012-07-21 17:24:30:211721 Wait for incoming messages
62887642d33711e1@pocketsphinx
2012-07-21 17:24:30:211739 Process Message
2012-07-21 17:24:30:211752 Process RECOGNITION-COMPLETE Event
62887642d33711e1@speechrecog
2012-07-21 17:24:30:211763 State Transition RECOGNIZING -> RECOGNIZED
62887642d33711e1@speechrecog
2012-07-21 17:24:30:211775 Signal Message to
2012-07-21 17:24:30:211792 Wait for Messages
2012-07-21 17:24:30:211812 Process Poller Wakeup
2012-07-21 17:24:30:211824 Process Message
2012-07-21 17:24:30:211840 Send MRCPv2 Stream 10.100.77.182:1544 <->
10.100.77.182:42357
MRCP/2.0 129 RECOGNITION-COMPLETE 9 COMPLETE
Channel-Identifier: 62887642d33711e1@speechrecog
Completion-Cause: 000 success
2012-07-21 17:24:30:211877 Wait for Messages
2012-07-21 17:24:30:214791 Process Signalled Descriptor
2012-07-21 17:24:30:214821 Receive MRCPv2 Stream 10.100.77.182:1544 <->
10.100.77.182:42357
MRCP/2.0 191 RECOGNIZE 10
Channel-Identifier: 62887642d33711e1@speechrecog
Content-Type: text/uri-list
Cancel-If-Queue: false
Start-Input-Timers: true
Content-Length: 13
session:digit
2012-07-21 17:24:30:214870 Signal Message to
2012-07-21 17:24:30:214892 Wait for Messages
2012-07-21 17:24:30:214911 Process Message
2012-07-21 17:24:30:214924 Dispatch Signaling Message
2012-07-21 17:24:30:214935 Process RECOGNIZE Request
62887642d33711e1@speechrecog
2012-07-21 17:24:30:214951 Wait for Messages
2012-07-21 17:24:30:214978 Dispatch Request RECOGNIZE
62887642d33711e1@pocketsphinx
2012-07-21 17:24:30:215038 Open Waveform File 62887642d33711e1@pocketsphinx
2012-07-21 17:24:30:215121 Signal Message to
2012-07-21 17:24:30:215193 Wait for incoming messages
62887642d33711e1@pocketsphinx
2012-07-21 17:24:30:215227 Process Message
2012-07-21 17:24:30:215242 Process RECOGNIZE Response
62887642d33711e1@speechrecog
2012-07-21 17:24:30:215254 State Transition RECOGNIZED -> RECOGNIZING
62887642d33711e1@speechrecog
2012-07-21 17:24:30:215265 Signal Message to
2012-07-21 17:24:30:215282 Wait for Messages
2012-07-21 17:24:30:215414 Process Poller Wakeup
2012-07-21 17:24:30:215437 Process Message
2012-07-21 17:24:30:215454 Send MRCPv2 Stream 10.100.77.182:1544 <->
10.100.77.182:42357
MRCP/2.0 84 10 200 IN-PROGRESS
Channel-Identifier: 62887642d33711e1@speechrecog
2012-07-21 17:24:30:215486 Wait for Messages
2012-07-21 17:24:30:341244 Detected Voice Activity
62887642d33711e1@pocketsphinx
2012-07-21 17:24:30:341275 Signal Message to
2012-07-21 17:24:30:341302 Process Message
2012-07-21 17:24:30:341316 Process START-OF-INPUT Event
62887642d33711e1@speechrecog
2012-07-21 17:24:30:341328 Signal Message to
2012-07-21 17:24:30:341344 Wait for Messages
2012-07-21 17:24:30:341364 Process Poller Wakeup
2012-07-21 17:24:30:341376 Process Message
2012-07-21 17:24:30:341392 Send MRCPv2 Stream 10.100.77.182:1544 <->
10.100.77.182:42357
MRCP/2.0 95 START-OF-INPUT 10 IN-PROGRESS
Channel-Identifier: 62887642d33711e1@speechrecog
2012-07-21 17:24:30:341428 Wait for Messages
2012-07-21 17:24:30:371240 Detected Voice Inactivity
62887642d33711e1@pocketsphinx
INFO: cmn_prior.c(121): cmn_prior_update: from < 5.91 -0.44 -0.08 -0.08 -0.24
-0.13 -0.12 -0.17 -0.11 -0.10 -0.08 -0.04 -0.08 >
INFO: cmn_prior.c(139): cmn_prior_update: to < 5.95 -0.46 -0.06 -0.08 -0.24
-0.13 -0.13 -0.17 -0.11 -0.10 -0.08 -0.04 -0.08 >
INFO: fsg_search.c(1030): 8 frames, 50 HMMs (6/fr), 121 senones (15/fr), 8
history entries (1/fr)
ERROR: "fsg_search.c", line 1099: Final state not reached in frame 8
2012-07-21 17:24:30:371561 Signal Message to
2012-07-21 17:24:30:371587 Wait for incoming messages
62887642d33711e1@pocketsphinx
2012-07-21 17:24:30:371606 Process Message
2012-07-21 17:24:30:371618 Process RECOGNITION-COMPLETE Event
62887642d33711e1@speechrecog
2012-07-21 17:24:30:371629 State Transition RECOGNIZING -> RECOGNIZED
62887642d33711e1@speechrecog
2012-07-21 17:24:30:371641 Signal Message to
2012-07-21 17:24:30:371657 Wait for Messages
2012-07-21 17:24:30:371676 Process Poller Wakeup
2012-07-21 17:24:30:371689 Process Message
2012-07-21 17:24:30:371704 Send MRCPv2 Stream 10.100.77.182:1544 <->
10.100.77.182:42357
MRCP/2.0 130 RECOGNITION-COMPLETE 10 COMPLETE
Channel-Identifier: 62887642d33711e1@speechrecog
Completion-Cause: 000 success
2012-07-21 17:24:30:371741 Wait for Messages
2012-07-21 17:24:30:374845 Process Signalled Descriptor
2012-07-21 17:24:30:374875 Receive MRCPv2 Stream 10.100.77.182:1544 <->
10.100.77.182:42357
MRCP/2.0 191 RECOGNIZE 11
Channel-Identifier: 62887642d33711e1@speechrecog
Content-Type: text/uri-list
Cancel-If-Queue: false
Start-Input-Timers: true
Content-Length: 13
session:digit
2012-07-21 17:24:30:374911 Signal Message to
2012-07-21 17:24:30:374930 Wait for Messages
2012-07-21 17:24:30:374948 Process Message
2012-07-21 17:24:30:374961 Dispatch Signaling Message
2012-07-21 17:24:30:374972 Process RECOGNIZE Request
62887642d33711e1@speechrecog
2012-07-21 17:24:30:374987 Wait for Messages
2012-07-21 17:24:30:375007 Dispatch Request RECOGNIZE
62887642d33711e1@pocketsphinx
2012-07-21 17:24:30:375059 Open Waveform File 62887642d33711e1@pocketsphinx
2012-07-21 17:24:30:375110 Signal Message to
2012-07-21 17:24:30:375164 Wait for incoming messages
62887642d33711e1@pocketsphinx
2012-07-21 17:24:30:375200 Process Message
2012-07-21 17:24:30:375215 Process RECOGNIZE Response
62887642d33711e1@speechrecog
2012-07-21 17:24:30:375226 State Transition RECOGNIZED -> RECOGNIZING
62887642d33711e1@speechrecog
2012-07-21 17:24:30:375238 Signal Message to
2012-07-21 17:24:30:375255 Wait for Messages
2012-07-21 17:24:30:375384 Process Poller Wakeup
2012-07-21 17:24:30:375407 Process Message
2012-07-21 17:24:30:375424 Send MRCPv2 Stream 10.100.77.182:1544 <->
10.100.77.182:42357
MRCP/2.0 84 11 200 IN-PROGRESS
Channel-Identifier: 62887642d33711e1@speechrecog
2012-07-21 17:24:30:375456 Wait for Messages
2012-07-21 17:24:40:880559 Process Signalled Descriptor
2012-07-21 17:24:40:880607 TCP/MRCPv2 Peer Disconnected 10.100.77.182:1544 <->
10.100.77.182:42357
2012-07-21 17:24:40:880674 Wait for Messages
2012-07-21 17:24:40:880929 Receive SIP Event Status 200 Session Terminated
2012-07-21 17:24:40:880956 Receive SIP Event Status 200 Session Terminated
2012-07-21 17:24:40:880970 SIP Call State 0x7fde10001938
2012-07-21 17:24:40:880988 Signal Message to
2012-07-21 17:24:40:881007 Receive SIP Event Status 200 Session Terminated
2012-07-21 17:24:40:881028 Process Message
2012-07-21 17:24:40:881043 Dispatch Signaling Message
2012-07-21 17:24:40:881053 Deactivate Session 0x7fde10001938
<62887642d33711e1>
2012-07-21 17:24:40:881072 Create and Process STOP Request
62887642d33711e1@speechrecog
2012-07-21 17:24:40:881090 Wait for Messages
2012-07-21 17:24:40:881107 Dispatch Request DEACTIVATE
62887642d33711e1@pocketsphinx
2012-07-21 17:24:40:881120 Wait for incoming messages
62887642d33711e1@pocketsphinx
INFO: cmn_prior.c(121): cmn_prior_update: from < 5.39 -0.58 -0.10 -0.14 -0.19
-0.12 -0.12 -0.14 -0.08 -0.07 -0.07 -0.03 -0.06 >
INFO: cmn_prior.c(139): cmn_prior_update: to < 5.00 -0.58 -0.09 -0.13 -0.16
-0.11 -0.11 -0.13 -0.07 -0.06 -0.06 -0.03 -0.06 >
INFO: fsg_search.c(1030): 525 frames, 5216 HMMs (9/fr), 13004 senones (24/fr),
1499 history entries (2/fr)
2012-07-21 17:24:40:881929 Signal Message to
2012-07-21 17:24:40:881956 Wait for incoming messages
62887642d33711e1@pocketsphinx
2012-07-21 17:24:40:881973 Process Message
2012-07-21 17:24:40:881986 Process DEACTIVATE Response
62887642d33711e1@speechrecog
2012-07-21 17:24:40:882004 State Transition RECOGNIZING -> IDLE
62887642d33711e1@speechrecog
2012-07-21 17:24:40:882016 Terminate Session 0x7fde10001938 <62887642d33711e1>
2012-07-21 17:24:40:882028 Remove Control Channel 0x7fde10001938
62887642d33711e1@speechrecog
2012-07-21 17:24:40:882041 Signal Message to
2012-07-21 17:24:40:882057 Subtract Media Termination 0x7fde10001938
62887642d33711e1@media-tm
2012-07-21 17:24:40:882068 Close Channel 62887642d33711e1@pocketsphinx
2012-07-21 17:24:40:882095 Process Poller Wakeup
2012-07-21 17:24:40:882108 Process Message
2012-07-21 17:24:40:882120 Remove Control Channel
62887642d33711e1@speechrecog
2012-07-21 17:24:40:882189 Destroy Container for Pending Control Channels
2012-07-21 17:24:40:882218 Mark Connection for Removal 10.100.77.182:1544 <->
10.100.77.182:42357
2012-07-21 17:24:40:882230 Signal Message to
2012-07-21 17:24:40:882242 Wait for Messages
2012-07-21 17:24:40:882353 Remove Grammar File 62887642d33711e1@pocketsphinx
2012-07-21 17:24:40:882575 Free Decoder 62887642d33711e1@pocketsphinx
2012-07-21 17:24:40:883392 Signal Message to
2012-07-21 17:24:40:883419 Subtract Media Termination 0x7fde10001938
62887642d33711e1@rtp-tm
2012-07-21 17:24:40:883431 Signal Message to
2012-07-21 17:24:40:883443 Remove Session <62887642d33711e1>
2012-07-21 17:24:40:883459 Wait for Messages
2012-07-21 17:24:40:883470 Process Message
2012-07-21 17:24:40:883481 Control Channel Removed 0x7fde10001938
62887642d33711e1@speechrecog
2012-07-21 17:24:40:883491 Wait for Messages
2012-07-21 17:24:40:883501 Process Message
2012-07-21 17:24:40:883511 Engine Channel Closed 0x7fde10001938
62887642d33711e1@speechrecog
2012-07-21 17:24:40:883521 Wait for Messages
2012-07-21 17:24:40:893104 Process Message
2012-07-21 17:24:40:893136 Destroy Audio Bridge 0x7fde10001938
2012-07-21 17:24:40:893151 Close RTP Receiver 10.100.77.182:5000 <-
10.100.77.182:4020
2012-07-21 17:24:40:893268 Remove Media Context 0x7fde10001938
2012-07-21 17:24:40:893298 Remove RTP Session 10.100.77.182:5000
2012-07-21 17:24:40:893324 Signal Message to
2012-07-21 17:24:40:893348 Process Message
2012-07-21 17:24:40:893361 Media Termination Subtracted 0x7fde10001938
62887642d33711e1@media-tm
2012-07-21 17:24:40:893372 Media Termination Subtracted 0x7fde10001938
62887642d33711e1@rtp-tm
2012-07-21 17:24:40:893383 Destroy TCP/MRCPv2 Connection 10.100.77.182:1544
<-> 10.100.77.182:42357
2012-07-21 17:24:40:893424 Session Terminated 0x7fde10001938
<62887642d33711e1>
2012-07-21 17:24:40:893455 Destroy Session <62887642d33711e1>
2012-07-21 17:24:40:893477 Wait for Messages
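A log like the one above is easier to read once the decoder lines are pulled out of the MRCP traffic. The following is a minimal sketch, not part of the original thread: the function name `summarize` is hypothetical, and the conversion to seconds assumes the decoder runs at 100 frames per second, which the log itself does not state.

```python
import re

# Per-utterance summary line printed by fsg_search.c, e.g.
# "INFO: fsg_search.c(1030): 85 frames, 758 HMMs (8/fr), ..."
FRAMES_RE = re.compile(r"fsg_search\.c\(\d+\): (\d+) frames")
# Messages in the log that indicate no usable hypothesis was produced.
FAIL_RE = re.compile(r"Final state not reached|Failed to bestpath")


def summarize(log_text, frames_per_second=100):
    """Return one (frames, seconds, failed) tuple per decoded utterance.

    frames_per_second=100 is an assumption, not a value read from the log.
    """
    results = []
    current = None
    for line in log_text.splitlines():
        match = FRAMES_RE.search(line)
        if match:
            # A new summary line starts a new utterance record.
            if current is not None:
                results.append(tuple(current))
            frames = int(match.group(1))
            current = [frames, frames / frames_per_second, False]
        elif current is not None and FAIL_RE.search(line):
            # A failure message following the summary belongs to that utterance.
            current[2] = True
    if current is not None:
        results.append(tuple(current))
    return results
```

Run over the log above, this would list the 85-, 59- and 8-frame utterances as failed, which makes very short segments (8 frames) easy to spot.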
Sorry, since you didn't provide information about your experiments, neither
the data that was used for testing nor the exact decoder configuration, it's
hard to give you a detailed answer.
Silence doesn't affect results
No idea
Unlikely
Can you provide all the configuration files of your installation:
ext.conf_dialplan.txt
pocketsphinx.xml
unimrcpclient.xml
unimrcpserver.xml
unimrcpserverstartup.log
Thank you
First, I’d like to thank you for your assistance; I appreciate it very much.
Nickolay, I didn’t provide the configuration files because I didn’t know
exactly what you needed to examine this case. Now, as per hiyassat’s
suggestion, I am sending the needed information.
https://www.dropbox.com/s/al8r6io1x5k5jj2/sphinx.zip
I did the following experiment:
1. Trained an acoustic model using 150 8 kHz utterances.
2. Made a grammar with 5 Armenian words: mek (one), erku (two), ereq (three), ayo (yes), voch (no). Armenian words are pronounced exactly as they are written.
3. Used (1) and (2) in the Asterisk-UniMRCP-PocketSphinx configuration.
The problem is that the accuracy is very low.
Things I have tried:
1. I enabled call recording in both Asterisk (MixMonitor) and unimrcp. First I took the .pcm files generated by unimrcp, converted them to wav files, and tried to decode them with pocketsphinx_batch:
(uni_mek -860)
(uni_erku -2079)
(uni_ereq -1910)
(uni_ayo -3230)
(uni_voch -6895)
As seen from the above, nothing is recognized.
Then I manually split the recording made by Asterisk (call.wav) at the
silences using the Audacity audio editor, producing 5 wav files, and decoded
them. Here are the results:
մեկ (ast_mek -695)
երկու (ast_erku -1501)
(ast_ereq -1564)
այո (ast_ayo -781)
ոչ (ast_voch -938)
4 of 5 words were recognized correctly. I included call.wav, the .pcm files,
and the manually cut files in the zip file.
2. I chose values for the sensitivity level and the activity-timeout/inactivity-timeout parameters such that the unimrcp server correctly cut out every pronounced word and placed it in a separate .pcm file.
3. I tried to increase/decrease the volume of the call in Asterisk using the Set(VOLUME(TX)=3) and Set(VOLUME(RX)=3) commands, with no success.
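The .pcm to wav conversion from item 1 can be sketched with Python's standard `wave` module. This is only an illustration: it assumes the .pcm files are headerless 16-bit signed mono audio at 8 kHz, which the thread does not confirm, and the function name `pcm_to_wav` is hypothetical.

```python
import wave


def pcm_to_wav(pcm_path, wav_path, sample_rate=8000, channels=1, sample_width=2):
    """Wrap headerless PCM samples in a WAV container without changing them.

    The defaults (8 kHz, mono, 2 bytes per sample) are assumptions about
    the recordings; adjust them if the real format differs.
    """
    with open(pcm_path, "rb") as pcm_file:
        frames = pcm_file.read()
    with wave.open(wav_path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(frames)
```

If the assumed sample rate or sample width is wrong, the resulting wav file will play at the wrong speed or as noise, so it is worth listening to one converted file before decoding.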
In addition to the above, I’d like to mention that the English acoustic model
used in the same configuration gives very good results!
Can you please suggest where I made a mistake and what else I can try?
P.S. In the zip file I included the acoustic model, dictionary and grammar in
case you want to try to decode the wav files yourself.
Regards,
Zaven
In addition to my previous post, please see the following figure, which
compares the original utterances (made by unimrcp) with the modified ones
(where silences are manually truncated or added). As seen from the figure,
none of the original utterances are recognized by pocketsphinx, while 4 of 5
modified utterances are recognized. I hope this will help to understand the
case.
The figure can be found here: https://www.dropbox.com/s/hlu2cvfb7m5u32g/utterances.jpg
zaven1, you need to provide the data files you are using, not just images.
Images can rarely say something useful. To get the fastest answer you need to
provide the whole training folder and the test data.
hiyassat didn't ask you for the right data; he confused you because he was
trying to solve his own problems and jumped into the thread with an unrelated
question. Not a great thing to do.
Here is the whole training data, including the test data:
https://www.dropbox.com/s/nxxzayex9ubr889/TrainData.zip
The utterances taken from unimrcp (which were depicted in the figure) are in
the "raw utterances" folder at the following link:
https://www.dropbox.com/s/al8r6io1x5k5jj2/sphinx.zip
Please let me know if you need anything else.
Thank you.
This is not sufficient to train a model; it's even worse since you are trying
to train too many senones (1000) and mixtures (8). You can find the size of
the required data in the tutorial.
It means your model is not functional
The WER must be less than 5%
Some parts of the Asterisk recordings contain silence regions that are exactly
zero. You need to add "-dither yes" to feat.params in your model or in the
unimrcp options.
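To check whether a recording has such all-zero stretches before deciding on dithering, a small script can scan the raw samples. The sketch below is illustrative only: it assumes headerless 16-bit PCM data, and the function name `longest_zero_run` is hypothetical. Samples that are exactly zero carry no energy, which is the situation dithering is meant to avoid by adding a tiny amount of noise.

```python
import array


def longest_zero_run(pcm_bytes):
    """Return the length, in samples, of the longest run of exactly-zero samples.

    Assumes 16-bit PCM; zero is zero in either byte order, so endianness
    does not matter for this check.
    """
    samples = array.array("h")
    # Drop a trailing odd byte, if any, so the buffer is whole samples.
    samples.frombytes(pcm_bytes[: len(pcm_bytes) // 2 * 2])
    longest = current = 0
    for sample in samples:
        if sample == 0:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest
```

A run much longer than one analysis window suggests digitally generated silence rather than a quiet room.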
Nickolay, "-dither yes" really improved the recognition accuracy. As for the
number of senones and mixtures, I experimented with a lot of variants. The
values 1000 and 8 are the ones in the latest sphinxtrain folder, which I sent
you. In my test environment I set 200 for senones and 2 for mixtures. Now I
plan to collect 10-15 hours of data to train an acoustic model.
Thanks again for your assistance.
Zaven.
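To put the senone and mixture counts discussed above in perspective, a rough estimate of the number of Gaussian parameters can be computed. This is a back-of-the-envelope sketch, not SphinxTrain's own accounting: it assumes a continuous model with diagonal covariances and a 39-dimensional feature vector (13 cepstra with deltas and delta-deltas, as suggested by the `-ncep 13` and `-feat 1s_c_d_dd` settings quoted earlier), and it ignores transition matrices. The function name is hypothetical.

```python
def gaussian_parameter_count(senones, mixtures, feature_dim=39):
    """Rough count of trainable Gaussian parameters in the acoustic model.

    Each Gaussian is assumed to have one mean and one variance per feature
    dimension plus one mixture weight; feature_dim=39 is an assumption.
    """
    per_gaussian = 2 * feature_dim + 1
    return senones * mixtures * per_gaussian


# 1000 senones with 8 mixtures versus 200 senones with 2 mixtures.
large = gaussian_parameter_count(1000, 8)
small = gaussian_parameter_count(200, 2)
print(large, small, large // small)
```

Under these assumptions the 1000/8 setup has 20 times as many parameters to estimate as the 200/2 setup, from the same 150 utterances.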