Hi,
I have an acoustic model that recognizes Hebrew digits, and it works with the pocketsphinx command line utilities.
I am now trying to integrate it with unimrcp and asterisk, and I have a problem: unimrcp returns a recognition result to asterisk before I get a chance to say anything. I also tried the default digit grammar that comes with unimrcp together with the hub4wsj_sc_8k acoustic model from pocketsphinx and got similar results.
This is the output from unimrcpserver and pocketsphinx:
Maybe you need to configure voice activity detection in unimrcp (the documentation covers that). One possible reason for the broken voice activity detection is that the audio format you feed into the extension is not the one it expects. Another is a version incompatibility between asterisk and unimrcp.
You can dump audio which is sent to pocketsphinx in order to debug the problem.
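A quick way to check the format of a dumped file is to read its header. The sketch below assumes the dump is a WAV file (a raw headerless dump carries no format information) and that the expected format is 8 kHz, mono, 16-bit, which is a guess based on the hub4wsj_sc_8k model name and typical telephony audio; the function names are made up for illustration.

```python
import wave

def describe_wav(path):
    """Return (channels, sample_width_bytes, sample_rate_hz, duration_s) for a PCM WAV file."""
    with wave.open(path, "rb") as w:
        channels = w.getnchannels()
        width = w.getsampwidth()
        rate = w.getframerate()
        frames = w.getnframes()
    return channels, width, rate, frames / float(rate)

def matches_8k_mono_16bit(path):
    """True if the file is mono, 16-bit, 8000 Hz (the ASSUMED expected format)."""
    channels, width, rate, _ = describe_wav(path)
    return (channels, width, rate) == (1, 2, 8000)
```

If the check fails, the sample rate or channel count reported by describe_wav is the first thing to compare against what the model was trained on.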
Hi,
I have asterisk 1.8.16.0 and unimrcp 1.0.0. I read a little and it seems like there is a compatibility problem with asterisk 1.8.x. Can anyone confirm this?
Hi,
I tried using asterisk 1.6.2.9 instead and results are better. I still have a few problems:
1. Accuracy is still worse than in the pocketsphinx command line.
2. I get this warning message from unimrcp:
[Oct 14 16:30:27] WARNING[9739] res_speech_unimrcp.c: Unsuccessful completion cause:3 reason:none
Maybe this one should go to the unimrcp forum.
3. There is too much idle time from the time the voice ends to the recognition results. I tried reducing the timeouts but that didn't seem to work.
Attached are logs and my acoustic model.
https://docs.google.com/open?id=0B91Vmp4A3YOuSGRGMnU3RWgzYkU
I noticed something strange in the unimrcp server logs: It detects voice activity before I start speaking and then after 300 ms detects voice inactivity. I thought this may be due to cross-talk from the prompt so I changed the prompt to a beep but that didn't help. What am I doing wrong?
I solved the timeout problem by recompiling the energy detector code inside unimrcp and setting activity-timeout and inactivity-timeout in pocketsphinx.xml instead of just "timeout".
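To illustrate why two separate timeouts matter, here is a minimal sketch of an energy-based detector with an activity timeout and an inactivity timeout. It is not unimrcp's actual detector code; the threshold, frame length and timeout values are hypothetical, and only the two timeout names come from the settings mentioned above.

```python
def detect_events(frame_levels, frame_ms=10, threshold=2,
                  activity_timeout_ms=300, inactivity_timeout_ms=500):
    """Return a list of ("activity" | "inactivity", time_ms) events.

    frame_levels: one energy level per frame (hypothetical units).
    A state change is reported only after the opposite condition has
    held continuously for the corresponding timeout.
    """
    events = []
    speaking = False
    run_ms = 0  # how long the opposite condition has held without interruption
    for i, level in enumerate(frame_levels):
        loud = level >= threshold
        if loud != speaking:
            run_ms += frame_ms
            needed = inactivity_timeout_ms if speaking else activity_timeout_ms
            if run_ms >= needed:
                speaking = not speaking
                name = "activity" if speaking else "inactivity"
                events.append((name, (i + 1) * frame_ms))
                run_ms = 0
        else:
            run_ms = 0  # condition interrupted, start counting again
    return events
```

With these settings a burst of noise shorter than the activity timeout produces no event at all, while a short pause inside a digit string does not end the utterance unless it lasts as long as the inactivity timeout.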
The only problem I have now is accuracy. It's the same for pocketsphinx_batch and unimrcp and it's not accurate. I have a working sphinx4 program which is much more accurate.
I guess I need help with pocketsphinx settings.
In order to analyze accuracy issues you need to dump the audio you are trying to recognize to files and try to recognize them with pocketsphinx_batch. Then you can reliably compare rates.
Among the options unimrcp uses, -frate 50 is not the best choice.
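For reference, the frame rate is the number of feature frames per second, so it fixes the shift between frames. The arithmetic below is only a sketch; it assumes 8 kHz audio and that the acoustic model was trained at 100 frames per second, which as far as I know is the pocketsphinx default.

```python
def frame_shift(sample_rate_hz, frate):
    """Return (shift_ms, shift_samples) for a frame rate in frames per second."""
    shift_ms = 1000.0 / frate
    shift_samples = sample_rate_hz // frate
    return shift_ms, shift_samples

# At 8 kHz: 100 frames/s means a 10 ms shift (80 samples),
# 50 frames/s means a 20 ms shift (160 samples).
for frate in (100, 50):
    print(frate, frame_shift(8000, frate))
```

At 50 frames per second the decoder sees half as many frames per digit as a model trained at 100 would expect, which is one plausible way for accuracy to drop.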
You already helped me once to tune accuracy for the same audio for sphinx 4. The file is attached.
Training data also attached.
https://docs.google.com/open?id=0B91Vmp4A3YOuZW1GbU0ycXl2TTg
Are there any pointers you can give me on how to tune this myself? I don't want to bother you every time I need to tune for new data.
I am running train.bat for adaptation, the pocketsphinx_batch command line is there.
The point is to dump the original audio from asterisk and unimrcp and test the recognition accuracy on it with an offline recognizer, not on the audio you used for adaptation previously.
I don't have much audio from there, only recordings of my own voice.
I tried using the recordings from asterisk with my sphinx4 program and the sphinx 4 program worked great. Since my sphinx 4 program was tuned according to the audio I sent you I thought this should also work.
I am also attaching the recordings I have from the asterisk machine.
https://docs.google.com/open?id=0B91Vmp4A3YOuZXNWQ1FIYlR3VEE
Ok, and what is pocketsphinx_batch accuracy on that data? Is it bad?
Looking at the samples I see that the detector still doesn't work properly. Please check the detector timeout settings. Please note that they changed in unimrcp recently and might cause trouble.
pocketsphinx_batch's accuracy on the data is about the same as unimrcp's. It sometimes adds false words and sometimes misrecognizes.
I used 500 as the detector timeout, is that OK?
I am using unimrcp 1.0.0 (the version in the uni-ast-package 0.3.2), not the trunk version so I don't think recent changes to the detector should affect my results.
I wouldn't worry about the timeout value. Something is wrong in the voice activity detection, and the recorded utterances are clearly wrong. It does not depend on the timeout, it's a flaw in the algorithm. It's not really a pocketsphinx problem.
You might want to debug unimrcp to find the root cause.
Hi,
I debugged unimrcp's pocketsphinx plugin code and played with the timeout settings and the voice activity detector seems to fire at the correct time. I am still getting inaccurate results.
How could you tell from the audio I sent you that it's a voice activity detector problem? The plugin code doesn't wait for the voice start event to fire before saving the audio, it just saves everything it gets.
I am also getting the same inaccurate results using pocketsphinx_batch, so this makes me think my problem is not related to unimrcp. Correct me if I am wrong...
The recorded audio doesn't contain the word itself, it contains silence. So the voice activity detector didn't cut it properly.
Yes, this is a problem. It should only process the audio after the voice start, not the whole thing.
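Roughly, the gating should look like the sketch below. This is not the unimrcp plugin code; the function name and the way events are represented (a dict from frame index to "start" or "end") are made up for illustration.

```python
def gate_frames(frames, events):
    """Return only the frames between a voice-start and the following voice-end event.

    frames: list of audio frames (any objects, e.g. bytes chunks)
    events: dict mapping frame index -> "start" or "end" (hypothetical representation)
    """
    kept = []
    in_speech = False
    for i, frame in enumerate(frames):
        ev = events.get(i)
        if ev == "start":
            in_speech = True
        elif ev == "end":
            in_speech = False
        if in_speech:
            kept.append(frame)  # only these frames would reach the decoder
    return kept
```

Since a detector usually reports the start a little after speech has actually begun, a real implementation would probably also keep a short buffer of frames from just before the start event so the first digit is not clipped.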
I fixed this in unimrcp. Now I feed audio into pocketsphinx only after voice activity has started.
I am still getting inaccurate results. The attached audio is recognized as ACHAT SHTAYIM SHALOSH ARBA CHAMESH ACHAT instead of ACHAT SHTAYIM SHALOSH ARBA CHAMESH.
https://docs.google.com/open?id=0B91Vmp4A3YOucnU4R2lkaXBlUjA
Great, what was the problem with unimrcp?
As for the result not being exactly accurate, make sure you are not using -frate 50 in the client, as I wrote above.
The problem was as you said - unimrcp feeds all audio to pocketsphinx and doesn't wait for voice activity detection to start.
Should I use another frate value or remove it?
unimrcp also had another problem - it didn't cut the silence at the end of the audio (until voice activity detection fires an inactivity event). I fixed that, but I am still getting inaccurate results while getting accurate results on the same audio in my sphinx4 program.
An example:
https://docs.google.com/open?id=0B91Vmp4A3YOuVU9ON0VXTmNVeTQ
This is recognized as EFES ACHAT SHMONNE SHALOSH TESHA ACHAT SHEVA SHEVA instead of EFES SHALOSH ACHAT SHMONNE SHALOSH EFES ACHAT SHEVA SHEVA.
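To put numbers on results like these when comparing recognizers, a word error rate can be computed from the reference and the hypothesis. The sketch below is a plain word-level edit distance, not the scoring tool that ships with the Sphinx packages, and the function names are made up.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp_words) + 1))
    for i, r in enumerate(ref_words, 1):
        cur = [i]
        for j, h in enumerate(hyp_words, 1):
            cur.append(min(prev[j] + 1,               # word missing from hypothesis
                           cur[j - 1] + 1,            # extra word in hypothesis
                           prev[j - 1] + (r != h)))   # match or substitution
        prev = cur
    return prev[-1]

def wer(ref, hyp):
    """Word error rate: edit distance divided by the number of reference words."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / float(len(ref_words))

print(wer("ACHAT SHTAYIM SHALOSH ARBA CHAMESH",
          "ACHAT SHTAYIM SHALOSH ARBA CHAMESH ACHAT"))
print(wer("EFES SHALOSH ACHAT SHMONNE SHALOSH EFES ACHAT SHEVA SHEVA",
          "EFES ACHAT SHMONNE SHALOSH TESHA ACHAT SHEVA SHEVA"))
```

For the first utterance this counts one inserted word against five reference words. For the second it counts two errors (one missing word and one substitution) against nine reference words.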