We are trying to spot keywords in recordings of telephone conversations.
In each audio sample we want to find particular keywords, silence, and speakers.
Currently we use sphinx4 to recognize the voice samples and then spot keywords in the recognized text. Please have a look at the attached configuration file.
Is this the right way to spot keywords in a given audio sample?
Please give us some insight into how to meet this requirement.
Apart from that, we have a few questions:
1) How do the filler dictionary and garbage looping help in keyword spotting with sphinx4? As of now we are using the lextree linguist model.
2) What is the best way to handle voice samples recorded over the telephone: upsampling the 8 kHz samples to 16 kHz, or changing the configuration to 8 kHz?
3) Is keyword spotting available in sphinx4?
4) Can we merge our own customized language model with the generic en-us model available in sphinx?
6) How should we handle mixed-accent voice samples (e.g. en-us and Indian English)?
Thanks
Nagarjuna
Last edit: Nagarjuna 2014-09-06
1) How do the filler dictionary and garbage looping help in keyword spotting with sphinx4? As of now we are using the lextree linguist model.
There is no keyword spotting in sphinx4 yet.
2) What is the best way to handle voice samples of telephone calls: upsampling the 8 kHz samples to 16 kHz, or changing the configuration to 8 kHz?
To process 8 kHz files you need to use the en-us-8khz model and call setSampleRate(8000) on the configuration.
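Before switching to the 8 kHz setup it may help to confirm what sample rate the recordings actually have. The following is only a sketch using the standard javax.sound.sampled API, not part of sphinx4. The names TelephoneAudioCheck, sampleRateOf, isNarrowband and silentWav are made up for illustration, and it assumes the recordings are WAV files that the JDK can read.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

// Hypothetical helper, not part of sphinx4.
final class TelephoneAudioCheck {

    /** Reads the audio header and returns the sample rate in Hz. */
    static int sampleRateOf(InputStream audio)
            throws IOException, UnsupportedAudioFileException {
        // getAudioFileFormat needs mark/reset support, so wrap plain streams.
        InputStream in = audio.markSupported() ? audio : new BufferedInputStream(audio);
        return Math.round(AudioSystem.getAudioFileFormat(in).getFormat().getSampleRate());
    }

    /** True for 8 kHz audio, i.e. when the 8 kHz model and setSampleRate(8000) apply. */
    static boolean isNarrowband(int sampleRateHz) {
        return sampleRateHz == 8000;
    }

    /** Builds a short silent 16-bit mono WAV in memory, for demonstration only. */
    static byte[] silentWav(int sampleRateHz, int frames) throws IOException {
        AudioFormat format = new AudioFormat(sampleRateHz, 16, 1, true, false);
        byte[] pcm = new byte[frames * format.getFrameSize()];
        AudioInputStream stream =
                new AudioInputStream(new ByteArrayInputStream(pcm), format, frames);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        AudioSystem.write(stream, AudioFileFormat.Type.WAVE, out);
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        int rate = sampleRateOf(new ByteArrayInputStream(silentWav(8000, 800)));
        System.out.println(rate + " Hz, narrowband: " + isNarrowband(rate));
    }
}
```

In a real setup the stream would come from the call recording, and a narrowband result would be the cue to load the 8 kHz model and call setSampleRate(8000).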
3) Is keyword spotting available in sphinx4?
No.
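Since sphinx4 has no keyword spotting of its own, the approach described in the question (recognize first, then search the text) is what remains. A minimal sketch of the search step is below. The class TranscriptKeywordSpotter and its spot method are hypothetical names, the transcript is assumed to be the plain hypothesis string returned by the recognizer, and only single-word keywords are handled.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: searches already-recognized text, not audio.
final class TranscriptKeywordSpotter {

    /**
     * Maps each keyword that occurs in the transcript to the zero-based
     * word positions where it occurs. Matching ignores case and any
     * punctuation attached to a word.
     */
    static Map<String, List<Integer>> spot(String transcript, Set<String> keywords) {
        Set<String> wanted = new HashSet<>();
        for (String keyword : keywords) {
            wanted.add(keyword.toLowerCase(Locale.ROOT));
        }
        Map<String, List<Integer>> hits = new LinkedHashMap<>();
        String[] words = transcript.trim().split("\\s+");
        for (int i = 0; i < words.length; i++) {
            // Keep letters, digits and apostrophes; drop other characters.
            String word = words[i].toLowerCase(Locale.ROOT)
                    .replaceAll("[^\\p{L}\\p{Nd}']", "");
            if (wanted.contains(word)) {
                hits.computeIfAbsent(word, k -> new ArrayList<>()).add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String hypothesis = "i would like to cancel my account please cancel it today";
        System.out.println(spot(hypothesis, Set.of("cancel", "refund")));
    }
}
```

Note that this can only find keywords the recognizer got right, so its accuracy is bounded by the recognition accuracy on the call audio.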
4) We tried searching for voice samples of call-center telephone conversations in en-us and Indian English but could not find any. Can you point us to where we can get such data?
Such data is mostly proprietary. You can record a few calls yourself.
5) Can we merge our own customized language model with the generic en-us model available in sphinx?
Yes, you can use SRILM.
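For intuition about what merging two models means: SRILM's ngram tool can mix models by linear interpolation (its -mix-lm and -lambda options). The toy sketch below shows only that formula on unigram probabilities. It is not SRILM code, it ignores higher-order n-grams and backoff weights, and the class and method names are invented for illustration.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy illustration of linear interpolation; real n-gram merging is done by SRILM.
final class LanguageModelMixer {

    /**
     * Computes p(w) = lambda * pCustom(w) + (1 - lambda) * pGeneric(w)
     * for every word in either model. A word missing from one model
     * contributes probability 0 from that model.
     */
    static Map<String, Double> interpolate(Map<String, Double> custom,
                                           Map<String, Double> generic,
                                           double lambda) {
        if (lambda < 0.0 || lambda > 1.0) {
            throw new IllegalArgumentException("lambda must be in [0, 1]");
        }
        Set<String> vocabulary = new HashSet<>(custom.keySet());
        vocabulary.addAll(generic.keySet());
        Map<String, Double> mixed = new TreeMap<>();
        for (String word : vocabulary) {
            double p = lambda * custom.getOrDefault(word, 0.0)
                    + (1.0 - lambda) * generic.getOrDefault(word, 0.0);
            mixed.put(word, p);
        }
        return mixed;
    }

    public static void main(String[] args) {
        Map<String, Double> custom = Map.of("refund", 0.5, "account", 0.5);
        Map<String, Double> generic = Map.of("account", 0.25, "hello", 0.75);
        System.out.println(interpolate(custom, generic, 0.5));
    }
}
```

A larger lambda favors the customized model; the weight is usually tuned on held-out text from the target domain.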
6) How should we handle mixed-accent voice samples (e.g. en-us and Indian English)?
You can identify the accent and use the appropriate model, or you can create a joint model. Supporting two different accents at once is mostly a research task.
Modern sphinx4 does not use configuration files. See the tutorial:
http://cmusphinx.sourceforge.net/wiki/tutorialsphinx4
Thank you.