The phoneme recognition test with pocketsphinx_continuous works fine, but only one file at a time.
http://cmusphinx.sourceforge.net/wiki/phonemerecognition
Any suggestions on how to approach a bulk test for this?
(Read a whole directory, or a file listing the raw file locations?)
PS: I'm using Windows as the OS.
Use pocketsphinx_batch with a ctl file (-ctl file.ctl), as at the end of the tutorial: http://cmusphinx.sourceforge.net/wiki/tutorialadapt#testing_the_adaptation
Thanks,
But batch gives different results than pocketsphinx_continuous. Why could that be?
I'm using the following commands.
Continuous command:
pocketsphinx_continuous
-hmm model/en-us/en-us
-allphone model/en-us/en-us-phone.lm.bin
-beam 1e-20
-pbeam 1e-20
-lw 2.0
-infile testwavs/file1.wav
-allphone_ci yes
Result: SIL AY AE
Batch command:
pocketsphinx_batch
-adcin yes
-hmm model/en-us/en-us
-allphone model/en-us/en-us-phone.lm.bin
-beam 1e-20
-pbeam 1e-20
-lw 2.0
-cepdir testwavs
-cepext .wav
-ctl test/testfileids.txt
-allphone_ci yes
Result: SIL NG HH
FYI: without -adcin yes I get these errors:
INFO: batch.c(729): Decoding 'file1'
ERROR: "batch.c", line 207: File length mismatch: 0x52494646 != 0xd7a, maybe it's not MFCC file
ERROR: "batch.c", line 422: Failed to read MFCC from the file 'testwavs/file1.wav'
Last edit: Toine db 2016-02-05
@Nickolay any thoughts on my problem?
Hi Toine
I'm sorry, batch and continuous process audio differently. Batch considers the audio as a whole and normalizes volume (CMN) over the whole audio. Continuous processes audio frame by frame and normalizes using only the frames it has already seen. This is not an optimal process, as has been discussed many times on the forum, and unfortunately it needs a proper initial CMN estimate. You can set the -cmninit parameter to the values you see in the batch log output, and you will get similar results from batch and continuous. That's why we recommend decoding longer files with continuous.
We will work on a solution to make continuous more accurate from the start; it's just not there yet.
Thanks for the reply Nickolay,
If I understand correctly, batch is the closest thing to pocketsphinx output on phones?
In my phone apps I currently have a custom, self-built recognition system that detects a specific type of sound.
When that sound is detected, it sends it to pocketsphinx for recognition, so pocketsphinx is started and stopped again and again.
Is pocketsphinx able to normalize volume in this phone scenario?
(Or are the processed frames cleared when I stop the recognition in between?)
Hope to hear from you
No, ps_process_raw calls do continuous processing.
It is ok to stop and restart, just do not reinit the decoder.
Volume is reset when you reinit the decoder or when you call ps_start_stream.
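Thanks for the explanation.
Now I know functionally how to work with the decoder, but I'm not really sure how to implement it.
But I do not understand what you're trying to say with the following comment:
"No, ps_process_raw calls do continuous processing."
Do I need to use batch or continuous to get results representative of the phone?
Currently I'm using ps_start_utt and ps_end_utt in between, but I'm not sure whether this resets/reinits the decoder and the volume... the part that I really want to...
Can you give me a hint which method(s) I need to use to pause/restart pocketsphinx without triggering a reset?
Because I'm thinking of creating the following system: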
1. In the background I want to keep PocketSphinx running/listening/decoding incoming audio to keep the volume/CMN at a good level.
2. When my custom system detects my sound I want to
...2.1 Pause background pocketsphinx
...2.2 Pause the background running pocketsphinx
...2.3 Clear any current recognition (without clearing/reinitializing the volume)
...2.4 Request recognition for my detected sound
...2.5 Continue background pocketsphinx
Last question:
Does -cmninit work like a kickstart, setting an initial value that PocketSphinx then adjusts during decoding? Or does it set a constant level for PocketSphinx?
PS: I think I wrote the Windows Phone example so that it reinits the decoder again and again; I'll check this later and possibly send a pull request with adjustments.
Hope to hear from you. Sorry for all the questions about this topic, but I want to get it as good as possible. Thanks again for the last reply.
Last edit: Toine db 2016-02-12
Nickolay,
Sorry to bother you, but could you give a short answer to the 3 questions so I can improve the Windows Phone example?
(The 3 questions are at the bottom of this thread.)
I'm planning to adjust two parts in the Windows Phone example:
+ Pause (PocketSphinx in a kind of idle mode) without resetting the whole decoder, which is probably the current issue
+ Add nbest as return value
Hope to hear from you,
Toine
Last edit: Toine db 2016-02-23
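What do you mean by "No, ps_process_raw calls do continuous processing"? Do I need to use batch or continuous to get results representative of the phone?
Both types of processing give you representative results. You use batch for batch testing, continuous for decoding on the phone.
What command do I need to use to pause and restart the decoder without losing the calibrated CMN?
ps_end_utt stops the decoder. ps_start_utt starts the search. CMN is kept.
Is -cmninit a kickstart or set to be a constant?
The initial value is used only for the first utterance; for subsequent utterances CMN is recalculated.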
Thanks for the answers, I'll adjust the Windows Phone example accordingly.
@Nickolay, probably the text was a little too long. I hope you can still answer the questions I still have (mainly about the mechanism).
In short:
1. What do you mean by "No, ps_process_raw calls do continuous processing"? Do I need to use batch or continuous to get results representative of the phone?
2. What command do I need to use to pause and restart the decoder without losing the calibrated CMN?
3. Is -cmninit a kickstart or set to be a constant?
Hope to hear from you.