Hi, dear all,
I am working on recognizing repeated syllables, such as "DA DA DA DA DA DA". However, when the speaker talks fast, the recognizer outputs only one "DA", and its duration covers the whole sequence. I am using an adapted WSJ generic model, with a grammar like this:
SEQ = (DA)*
I am using the command:
pocketsphinx_batch -hmm hub4wsj_sc_8k -feat 1s_c_d_dd -ceplen 13 -ncep 13 -lw 10 -fwdflatlw 10 -bestpathlw 10 -beam 1e-80 -wbeam 1e-40 -fwdflatbeam 1e-80 -fwdflatwbeam 1e-40 -pbeam 1e-80 -lpbeam 1e-80 -lponlybeam 1e-80 -jsgf hello.gram -dict hello.dic -wip 0.2 -ctl UTDallas-concussion_test.fileids -ctloffset 0 -ctlcount 1 -cepdir . -cepext .wav -hyp hello.hyp -agc none -varnorm no -cmn current -hypseg hello.hypseg -remove_noise no -remove_silence yes -transform dct -adcin yes
It seems the recognizer skips the short pause after each syllable. Is there any way to insert a short pause after each syllable so that it recognizes all the syllables?
Thank you.
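For anyone checking output like this, here is a minimal Python sketch that counts how many times the syllable appears in one hypothesis line. It assumes each line of hello.hyp looks like `DA DA DA (utterance_id score)`; that layout and the helper name `count_syllables` are assumptions for illustration, not something confirmed in this thread.

```python
import re


def count_syllables(hyp_line, syllable="DA"):
    """Count occurrences of `syllable` in one hypothesis line.

    Assumed line layout: "DA DA DA (utterance_id score)".
    """
    # Drop a trailing parenthesised group (assumed to hold the utterance id and score).
    text = re.sub(r"\s*\([^()]*\)\s*$", "", hyp_line)
    # Count whitespace-separated tokens that match the syllable exactly.
    return sum(1 for token in text.split() if token == syllable)
```

Comparing this count with the number of syllables actually spoken makes it easy to see which files collapse into a single "DA".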
Hi tfpeach, you are still working on the same thing.
I believe you screwed something up in your model; this behaviour does not seem correct for pocketsphinx_batch. Again, it would be better if you provided an example.
Those options should have no effect; you can remove them.
Thank you, Nickolay.
I found a problem: I was testing 16000 Hz audio with the "hub4wsj_sc_8k" model. If I am not mistaken, this model should be used with 8 kHz audio. Is that right?
Thank you once again.
hub4wsj_sc_8k should work for both 16 kHz and 8 kHz audio; you just need to specify the sample rate correctly.
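As a rough illustration of matching the sample rate to the audio, the Python sketch below reads the rate from a WAV header and builds a `-samprate` argument that could be appended to the pocketsphinx_batch command. The helper name `samprate_args` is hypothetical, and the sketch assumes the input is a standard PCM WAV file.

```python
import wave


def samprate_args(wav_path):
    """Return a ["-samprate", "<rate>"] argument pair for the given WAV file."""
    # The wave module reads the sample rate straight from the file header.
    with wave.open(wav_path, "rb") as wav:
        rate = wav.getframerate()
    return ["-samprate", str(rate)]
```

For a 16 kHz recording this returns `["-samprate", "16000"]`, which can be joined onto the existing command line.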