I have a telephone application and I'm using Sphinx 4 with the wsj acoustic model. I'd like to be able to detect tones (such as DTMF) as well as recognize speech. In at least one instance, I don't know whether to expect speech or a tone, so I have to be able to detect/recognize at the same time. I can think of at least three ways of solving this problem:
1. do the tone detection completely independently of the SR system.
2. add a front-end stage to detect the tone, insert a Signal in the data stream, and modify the recognizer to handle the new type of Signal.
3. add the tones I want to detect to the acoustic model.
It seems like this must be a fairly common application, so there may already be a solution that I have overlooked. Am I missing something?
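For what alternative 1 could look like, here is a minimal sketch of a standalone DTMF detector based on the Goertzel algorithm, fed the same audio but running completely independently of the Sphinx recognizer. All class and method names here are illustrative, not part of Sphinx 4 or any existing library:

```java
// Hypothetical standalone DTMF detector (alternative 1): Goertzel-based,
// independent of the speech recognition pipeline.
public class DtmfDetector {
    // The two DTMF frequency groups (Hz) and the key matrix they address.
    private static final double[] LOW  = {697, 770, 852, 941};
    private static final double[] HIGH = {1209, 1336, 1477, 1633};
    private static final char[][] KEYS = {
        {'1', '2', '3', 'A'},
        {'4', '5', '6', 'B'},
        {'7', '8', '9', 'C'},
        {'*', '0', '#', 'D'}
    };

    // Goertzel algorithm: energy at one target frequency in a sample block.
    static double goertzel(double[] samples, double freq, double sampleRate) {
        double coeff = 2 * Math.cos(2 * Math.PI * freq / sampleRate);
        double s1 = 0, s2 = 0;
        for (double x : samples) {
            double s0 = x + coeff * s1 - s2;
            s2 = s1;
            s1 = s0;
        }
        return s1 * s1 + s2 * s2 - coeff * s1 * s2;
    }

    // Returns the key whose low/high frequency pair carries the most energy.
    // A real detector would also apply an absolute energy threshold and a
    // twist check before declaring a tone; omitted here for brevity.
    static char detect(double[] samples, double sampleRate) {
        return KEYS[maxIndex(samples, LOW, sampleRate)]
                   [maxIndex(samples, HIGH, sampleRate)];
    }

    private static int maxIndex(double[] samples, double[] freqs,
                                double sampleRate) {
        int best = 0;
        double bestPower = -1;
        for (int i = 0; i < freqs.length; i++) {
            double p = goertzel(samples, freqs[i], sampleRate);
            if (p > bestPower) { bestPower = p; best = i; }
        }
        return best;
    }
}
```

Running this on each audio block in parallel with the recognizer keeps the two concerns fully separated, which is the main attraction of alternative 1.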
Hi,
I would suggest choosing alternative 1.
IMHO, DTMF recognition and speech recognition are two different things and should not be mixed, so I would avoid alternatives 2 and 3. In addition, you may run into synchronization problems if the Sphinx code changes.
In JVoiceXML http://jvoicexml.sourceforge.net we had a similar problem. We decided to handle DTMF completely independently of speech recognition.
Both are treated as separate modules that can throw a recognition event. In fact, we create a JSAPI 1.0 recognition result when a DTMF tone is detected. This way, the observer needs only a single entry point to handle recognition events.
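The single-entry-point idea can be sketched as follows: both the speech recognizer and the DTMF detector report through one listener interface, with a detected digit wrapped as an ordinary recognition result. The names below are hypothetical, not the actual JVoiceXML or JSAPI classes:

```java
import java.util.ArrayList;
import java.util.List;

// One callback type for both sources, so the observer has a single
// entry point for all recognition events.
interface RecognitionListener {
    void resultAccepted(String utterance);
}

// Hypothetical hub that the two independent modules report into.
class RecognitionHub {
    private final List<RecognitionListener> listeners = new ArrayList<>();

    void addListener(RecognitionListener l) { listeners.add(l); }

    // Called by the speech recognition module with the decoded text.
    void speechRecognized(String text) { fire(text); }

    // Called by the DTMF module; the digit is wrapped as a normal result,
    // analogous to creating a JSAPI 1.0 recognition result for a tone.
    void dtmfDetected(char digit) { fire("dtmf-" + digit); }

    private void fire(String utterance) {
        for (RecognitionListener l : listeners) l.resultAccepted(utterance);
    }
}
```

With this shape, application code subscribes once and never needs to know whether an event originated from speech or from a tone.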
hth
/dirk